Run this in your head for a second. Every transformer has a feed-forward network in every layer. Those FFNs store facts. Not in some abstract distributed way that requires a PhD to decode. Actual facts. France borders Spain. Einstein won the Nobel. Tokyo is in Japan. They're in there as vectors, packed into weight matrices, sitting in specific layers at specific positions.
What if you could just ask for them? Not by prompting the model and hoping it answers correctly. By querying the weights directly. Like a database.
That's what LARQL does. It decompiles a transformer's FFN weights into a queryable graph structure and lets you run SQL on it. Not a metaphor. Actual SQL.
USE "gemma3-4b.vindex";
DESCRIBE "France";
-- borders->country L27(1436.9), language->French L24(35.2), continent->Europe L25(14.4)
That output is not the model generating text. It's a direct lookup into the weight vectors. No inference. No GPU. No prompt engineering. You're reading the model's internal knowledge graph like a table.
The graph was always there
LARQL doesn't add a graph database to the model. It exposes the one that was already there. The framing goes like this:
Entities are nodes. Every "thing" the model knows about (France, Einstein, Python, Toyota) exists as activation patterns across layers.
Features are edges. Each feature is a gate vector and a down vector at a specific layer. One slot, one piece of knowledge.
Relations are edge labels. "capital-of," "borders," "nationality," "invented-by." These are the types of connections between entities.
Attention is the router. It decides which edges fire for a given query during inference.
LARQL takes the FFN weights, extracts these structures into something called a "vindex" (vector index), and from that point you interact with the model's knowledge using queries instead of prompts.
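That mapping can be sketched as a plain data structure. To be clear, the names here (VEdge, Vindex, describe) are illustrative stand-ins, not LARQL's actual internals:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class VEdge:
    """One FFN slot: a gate vector (what triggers it) and a down
    vector (what it writes back), at a specific layer."""
    layer: int
    gate: np.ndarray   # direction in residual space that activates this slot
    down: np.ndarray   # contribution to the residual stream when it fires
    relation: str      # probe-assigned label, e.g. "borders"
    target: str        # probe-assigned target entity, e.g. "Spain"

@dataclass
class Vindex:
    """Entity name -> edges whose gate vectors respond to that entity."""
    edges: dict = field(default_factory=dict)

    def describe(self, entity: str):
        # Mirrors the spirit of DESCRIBE: (relation, target, layer) per edge.
        return [(e.relation, e.target, e.layer)
                for e in self.edges.get(entity, [])]

# Toy usage
idx = Vindex()
idx.edges["France"] = [VEdge(27, np.zeros(4), np.zeros(4), "borders", "Spain")]
print(idx.describe("France"))  # [('borders', 'Spain', 27)]
```

The key point of the model: an "edge" is nothing more exotic than a pair of vectors plus a layer index, which is why it can be extracted from the weights and indexed offline.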
What you can actually do with it
Start simple. Ask what the model knows about an entity:
DESCRIBE "France";
-- borders, language, continent, capital with layer numbers and activation strengths
Filter by relation type:
SELECT * FROM EDGES WHERE entity="France" AND relation="borders" LIMIT 5;
Find what's near something in the model's representation space:
SELECT * FROM EDGES NEAREST "Einstein" AT layer 26 LIMIT 10;
-- Nobel cluster, physics tokens, "brain" variants
Run inference as a graph walk instead of matrix multiplication:
INFER "The capital of France is" TOP 5;
-- Paris (80%)
That inference step is doing something different from a normal forward pass. The FFN portion runs as a KNN search over gate vectors instead of a matmul over weight matrices. Attention still runs normally. But the knowledge-retrieval part is now a graph walk. A full 34-layer walk takes 0.3 milliseconds.
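Here's a minimal sketch of that substitution, with made-up dimensions and a ReLU standing in for whatever activation the real model uses. The dense version multiplies through every slot; the KNN version scores the gates, keeps the top-k, and sums only those down vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_slots, k = 64, 1024, 16

W_gate = rng.standard_normal((n_slots, d_model))  # one gate vector per slot
W_down = rng.standard_normal((n_slots, d_model))  # one down vector per slot

def ffn_dense(x):
    # Standard gated FFN: every slot participates in the matmul.
    acts = np.maximum(W_gate @ x, 0.0)
    return acts @ W_down

def ffn_knn(x, k=k):
    # Graph-walk version: score every gate, keep only the top-k slots,
    # and sum just those down vectors -- a nearest-neighbor lookup.
    scores = W_gate @ x
    top = np.argpartition(scores, -k)[-k:]
    acts = np.maximum(scores[top], 0.0)
    return acts @ W_down[top]

x = rng.standard_normal(d_model)
sparse_out = ffn_knn(x)          # only 16 of 1024 slots contribute
dense_out = ffn_dense(x)         # reference computation
```

With k equal to the number of slots the two are identical; with small k you get an approximation whose quality depends on how sparsely the FFN's knowledge actually fires, which is the bet LARQL is making.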
The part that changes things
Here's where it gets interesting. You can write to the graph.
INSERT INTO EDGES (entity, relation, target) VALUES ("Atlantis", "capital", "Poseidon");
INFER "The capital of Atlantis is" TOP 5;
-- Poseidon (99.98%)
One line. No training. No fine-tuning. No GPU. The fact gets written directly as weight vectors using the MEMIT technique under the hood, but surfaced as a database insert. The edit produces a patch file around 10 MB. The base model stays untouched.
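The core of a weight-level edit like this can be sketched as a rank-one update. This is a simplification: MEMIT proper spreads the edit across multiple layers and weights keys by covariance statistics, but the single-layer key-value update below captures the idea. All names and dimensions here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
W = rng.standard_normal((d, d))  # stand-in for one FFN down-projection

key = rng.standard_normal(d)     # activation pattern for "capital of Atlantis"
v_new = rng.standard_normal(d)   # output direction that decodes to "Poseidon"

# Rank-one update so that W_edit @ key == v_new exactly, while leaving
# directions orthogonal to `key` untouched. MEMIT's full objective adds
# a covariance-weighted least-squares term over many keys and layers.
delta = np.outer(v_new - W @ key, key) / (key @ key)
W_edit = W + delta

# `delta` is the entire edit: a small, separable patch over base weights.
```

Note that the patch is just `delta`; nothing about the base matrix needs to change on disk, which is what makes the 10 MB patch-file story possible.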
Think about what that means for deployment. Instead of distributing an 8 GB fine-tuned model, you distribute a 10 MB patch. The base model is shared infrastructure. Your modifications are a tiny overlay that can be applied, removed, or versioned like any other file.
Three layers of a transformer
One thing LARQL makes visible is the three-stage architecture inside every transformer. Early layers handle syntax. They parse the structure of what you're asking. Middle layers are the knowledge layers. This is where the FFN edges fire and facts get retrieved. Late layers are the commitment layers. They take the retrieved information and commit to an output token.
You can watch this happen in real time by querying different layers for the same entity. The early layers barely respond. The middle layers light up with relevant facts. The late layers narrow down to a specific answer. It's the kind of thing interpretability researchers have been building custom tools to visualize. LARQL lets you see it with a SELECT statement.
Polysemanticity is real and it's messy
Query a single feature at layer 26, say feature 9348, and you'll find Australia, Italy, Germany, and Spain all crammed into the same slot. One feature encoding "Western nations" as a single compressed concept. This is polysemanticity and it shows up everywhere.
It's also why you can't treat the knowledge graph as clean structured data. The entities bleed into each other. One feature encodes multiple concepts because the model has more things to remember than it has slots to put them in. The queries work, the results are meaningful, but there's noise in every lookup. If you're expecting relational database precision, you won't find it. This is a compressed representation, not a schema.
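A synthetic toy shows why one slot can answer for several entities at once. Here the "feature" is deliberately built as a blend of four country directions (a stand-in for something like feature 9348; all vectors are random, nothing here is real model data):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 128

def unit(v):
    return v / np.linalg.norm(v)

entities = {name: unit(rng.standard_normal(d))
            for name in ["Australia", "Italy", "Germany", "Spain", "Tokyo"]}

# One compressed slot: a single gate vector packing four
# "Western nation" directions into one feature.
gate = unit(entities["Australia"] + entities["Italy"]
            + entities["Germany"] + entities["Spain"])

sims = {name: float(gate @ vec) for name, vec in entities.items()}
# All four packed entities score strongly against this one gate vector,
# while the unrelated entity scores near zero. Any lookup that touches
# this slot returns the whole cluster, plus noise.
```

That's the mechanics behind the messiness: a cosine lookup against a blended direction can't distinguish the concepts that were superposed into it.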
What this doesn't cover
Only the FFN is graph-queryable. Attention remains standard matrix multiplication and you can't inspect or edit it through LARQL. The relation labels that show up in queries are discovered by probes after the fact. They're useful labels, not ground truth. And MEMIT edits can have ripple effects. Change one fact and related facts might shift in unpredictable ways. That's a well-documented limitation of weight-level editing, not specific to LARQL.
The tool is built in Rust so there's slightly more install friction than a pip package. Default config targets Apple Accelerate, so running on Linux means patching a couple of Cargo.toml files to use OpenBLAS instead. Not hard, but worth knowing before you clone and wonder why it doesn't compile.
Why I care about this
I've been building custom models with specific knowledge baked in. The problem is always verification. You train the model, you test it with prompts, and if it gives the right answers you assume the knowledge is in there. But you don't actually know where it's stored or how it's encoded. You're testing behavior, not structure.
LARQL lets you look inside. Query the edges and check that the facts you trained in are actually encoded as weight vectors at specific layers, not just surface-level behavioral responses that might disappear under different prompting. That's a different kind of confidence in your model.
Mechanistic interpretability has been locked behind research labs and custom tooling for years. This puts it behind a SQL prompt. If you can read a SELECT statement, you can inspect a transformer's knowledge. That alone makes it worth paying attention to.