Building Production-Grade Knowledge Graphs: Beyond the LLM Extraction Hype
Stop building flat RAG systems. Learn how to extract high-fidelity entities and relationships from unstructured text using Pydantic, DSPy, and Neo4j to build a graph-augmented LLM stack that actually scales.

The RAG Wall
Your RAG system is failing. You’ve tuned your chunk size, experimented with 15 different embedding models, and even tried the latest long-context LLMs, yet your bot still can't tell you how Project X impacts the Q3 roadmap when that info is scattered across four different Jira tickets and a Slack export. The problem isn’t your context window; it’s your data structure. Vector similarity is a blunt instrument. It finds things that sound the same, not things that are logically connected.
In my last role building an internal developer portal, we hit this wall hard. We had 50,000 documents, and no matter how much we optimized our embeddings, the system couldn't handle multi-hop queries like "Which microservices depend on the library that was deprecated in the last security audit?" To solve this, we had to move beyond flat vector stores and into the world of Knowledge Graphs (KGs). In 2026, the gold standard isn't just RAG; it's GraphRAG—a hybrid approach where an LLM navigates a deterministic graph of entities and relationships to provide structured, verifiable answers.
The Extraction Stack: DSPy, Pydantic, and Llama 4
In 2024, we were all writing 500-line prompts with 'few-shot' examples to extract JSON. It was brittle and expensive. Today, we use DSPy (Declarative Self-improving Language Programs) to treat our LLM calls like a compiler. Instead of tweaking prompts, we define signatures and let the optimizer search for the best instructions and few-shot demonstrations for our specific domain.
For high-fidelity extraction, you need strict typing. Pydantic 3.0 is our best friend here. It ensures that the LLM doesn't just hallucinate a relationship type like WORKS_FOR when your schema strictly requires EMPLOYED_BY.
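To make that concrete, here is a minimal sketch of how a typed schema rejects a hallucinated predicate at parse time. The `Literal` set of predicates is a hypothetical example schema, not part of the extraction module below:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

# Hypothetical closed set of relationship types for the domain
PredicateType = Literal["EMPLOYED_BY", "DEPENDS_ON", "MAINTAINS"]

class StrictRelationship(BaseModel):
    source: str
    target: str
    predicate: PredicateType  # anything outside the Literal set fails validation

# A hallucinated predicate is rejected at validation time, not discovered in the graph
try:
    StrictRelationship(source="Alice", target="Acme", predicate="WORKS_FOR")
except ValidationError:
    print("rejected hallucinated predicate")

ok = StrictRelationship(source="Alice", target="Acme", predicate="EMPLOYED_BY")
```

The failure surfaces as a `ValidationError` you can catch and retry, rather than a malformed edge you discover weeks later.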
The Extraction Module
Here is a production-ready extraction module using DSPy and Pydantic. This module handles the extraction of entities and their relationships from raw technical documentation.
```python
import dspy
from pydantic import BaseModel, Field
from typing import List, Optional

class Entity(BaseModel):
    name: str = Field(..., description="Normalized name of the entity, e.g., 'Kubernetes'")
    category: str = Field(..., description="Type: Tool, Team, Service, or Metric")
    properties: Optional[dict] = Field(default_factory=dict)

class Relationship(BaseModel):
    source: str = Field(..., description="Name of the source entity")
    target: str = Field(..., description="Name of the target entity")
    predicate: str = Field(..., description="The relationship type in UPPER_CASE, e.g., DEPENDS_ON")

class GraphExtraction(dspy.Signature):
    """Extract structured entities and relationships from technical text."""
    # The type annotations matter: they are what lets the typed predictor
    # coerce the LLM output into validated Pydantic objects.
    text: str = dspy.InputField()
    entities: List[Entity] = dspy.OutputField(desc="List of Entity objects")
    relationships: List[Relationship] = dspy.OutputField(desc="List of Relationship objects")

class KnowledgeGraphExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        # In 2026, we use Llama-4-70B for extraction due to its superior reasoning
        self.extractor = dspy.TypedPredictor(GraphExtraction)

    def forward(self, text):
        return self.extractor(text=text)

# Example usage
extractor = KnowledgeGraphExtractor()
raw_text = "Service-A depends on the Auth-Library v2.1 which is maintained by the Security-Team."
result = extractor(text=raw_text)
```
The Entity Resolution Nightmare
Extraction is the easy part. The real challenge—the thing that kills production systems—is Entity Resolution (ER). If your text says "AWS" in one paragraph and "Amazon Web Services" in another, a naive extraction creates two separate nodes. Your graph becomes a fragmented mess, and traversals fail.
In our production pipeline, we implemented a two-stage ER process:
- Blocking: Use a fast vector search (Qdrant or Milvus, for instance) to find the top 5 most similar existing nodes in the graph.
- Matching: Use a smaller, cheaper LLM (like Llama-4-8B) to perform a binary check: "Are 'AWS' and 'Amazon Web Services' the same entity in the context of Cloud Infrastructure?"
Don't skip this. If you don't resolve entities at ingestion time, you will spend weeks writing 'cleanup' scripts that never quite work. We learned this the hard way after our first graph grew to 1 million nodes, half of which were duplicates.
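The two-stage flow can be sketched as follows. This is a toy, self-contained version: cheap string similarity stands in for the vector index, and a hard-coded alias table stands in for the binary LLM check (`llm_same_entity` is where the Llama-4-8B call would go):

```python
from difflib import SequenceMatcher

# Existing canonical nodes in the graph (toy example)
graph_nodes = ["Amazon Web Services", "Microsoft Azure", "Google Cloud Platform", "Auth-Library"]

def block_candidates(mention: str, nodes: list[str], k: int = 5) -> list[str]:
    """Stage 1 (Blocking): cheap similarity ranking standing in for a
    vector index like Qdrant/Milvus. Returns the top-k candidate nodes."""
    scored = sorted(
        nodes,
        key=lambda n: SequenceMatcher(None, mention.lower(), n.lower()).ratio(),
        reverse=True,
    )
    return scored[:k]

def llm_same_entity(mention: str, candidate: str) -> bool:
    """Stage 2 (Matching): placeholder for the binary LLM check
    ("Are X and Y the same entity?"). An alias table stands in here."""
    aliases = {("amazon web services", "aws")}
    pair = tuple(sorted((mention.lower(), candidate.lower())))
    return pair in aliases or mention.lower() == candidate.lower()

def resolve(mention: str) -> str:
    for candidate in block_candidates(mention, graph_nodes):
        if llm_same_entity(mention, candidate):
            return candidate  # reuse the existing node
    return mention  # no match: create a new node

print(resolve("AWS"))  # → Amazon Web Services
```

The key property is that resolution happens before the node is written, so "AWS" never enters the graph as a second identity.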
Ingesting into Neo4j 6.2
Neo4j remains the king of graph databases. With the 6.2 release, the performance of MERGE operations on large batches has improved significantly. When ingesting, you must use parameter-driven batching. Writing a single MERGE statement for every entity is a recipe for a bottleneck.
```python
from neo4j import GraphDatabase

class GraphDBManager:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def batch_ingest(self, entities, relations):
        with self.driver.session() as session:
            session.execute_write(self._ingest_transaction, entities, relations)

    @staticmethod
    def _ingest_transaction(tx, entities, relations):
        # Batch MERGE for entities
        entity_query = """
        UNWIND $entities AS e
        MERGE (n:Node {id: e.name})
        SET n.category = e.category,
            n.last_updated = timestamp()
        """
        tx.run(entity_query, entities=entities)

        # Batch-create relationships with a dynamic type via APOC
        rel_query = """
        UNWIND $relations AS r
        MATCH (a:Node {id: r.source})
        MATCH (b:Node {id: r.target})
        CALL apoc.create.relationship(a, r.predicate, {}, b) YIELD rel
        RETURN count(rel)
        """
        tx.run(rel_query, relations=relations)
```
Pro Tip: Notice the use of `apoc.create.relationship`. In Cypher, you cannot parameterize a relationship type (`MERGE (a)-[:$type]->(b)` is invalid), so APOC lets you set the type dynamically from your LLM's output. If you re-ingest the same documents, consider `apoc.merge.relationship` instead to keep the write idempotent.
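Because the relationship type comes straight from LLM output and is spliced into the graph dynamically, it is worth sanitizing it before it reaches APOC. A minimal sketch, with a hypothetical allowlist:

```python
import re

# Hypothetical allowlist of relationship types for a DevOps schema
ALLOWED_PREDICATES = {"DEPENDS_ON", "MAINTAINED_BY", "DEPLOYED_TO"}
PREDICATE_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")

def sanitize_predicate(predicate: str) -> str:
    """Normalize an LLM-produced predicate and refuse anything outside
    the schema before it is used as a dynamic relationship type."""
    normalized = predicate.strip().upper().replace(" ", "_")
    if not PREDICATE_PATTERN.match(normalized):
        raise ValueError(f"Malformed predicate: {predicate!r}")
    if normalized not in ALLOWED_PREDICATES:
        raise ValueError(f"Predicate not in schema: {normalized}")
    return normalized

print(sanitize_predicate("depends on"))  # → DEPENDS_ON
```

This closes the gap between "the LLM said so" and "the schema allows it" at the last possible moment before the write.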
Hard Lessons from the Trenches
1. The Ontology Trap
Do not try to build a "Global Ontology of Everything." We spent three months trying to map our data to Schema.org and it was a disaster. The LLMs struggled to fit specific technical concepts into generic categories. Instead, start with a Domain-Specific Schema. If you are building a graph for DevOps, your nodes should be Service, Repo, Developer, and Deployment. Keep it narrow.
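A narrow schema can be encoded directly as types, so the constraint lives in code rather than in a wiki page. A sketch for the DevOps case (the node and edge names are illustrative):

```python
from typing import Literal
from pydantic import BaseModel

# A deliberately narrow DevOps schema: four node types, three edge types.
NodeType = Literal["Service", "Repo", "Developer", "Deployment"]
EdgeType = Literal["DEPENDS_ON", "OWNS", "DEPLOYED_FROM"]

class DevOpsEntity(BaseModel):
    name: str
    category: NodeType

class DevOpsEdge(BaseModel):
    source: str
    target: str
    predicate: EdgeType

# Anything outside the Literal sets fails validation at extraction time
edge = DevOpsEdge(source="checkout-service", target="payments-repo", predicate="DEPLOYED_FROM")
```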
2. The Cost of Extraction
Extracting a graph is 10x more expensive than simple vector embedding. In our last project, processing 10,000 documents cost us ~$400 in API credits because we were using GPT-5-Turbo for every chunk. We eventually moved to a tiered system: use a local Llama-4-70B instance for extraction and only call the heavy models for complex relationship verification.
3. Graph Hairballs
If your LLM extracts every possible relationship, you end up with a "hairball" where every node is connected to every other node (e.g., everything is connected to a node called 'The'). You need to implement a Relevance Filter. Before ingesting a relationship, ask: "Does this relationship provide unique information that isn't captured by the entity attributes?"
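A crude but effective first version of that filter can be heuristic: drop edges touching stopword-like entities and edges whose predicate carries no information. The word lists here are illustrative starting points:

```python
# Heuristic relevance filter: drop relationships that touch stopword-like
# entities or whose predicate adds nothing beyond the entities themselves.
STOPWORD_ENTITIES = {"the", "it", "this", "system", "thing"}
LOW_INFO_PREDICATES = {"RELATED_TO", "ASSOCIATED_WITH", "MENTIONS"}

def is_relevant(source: str, predicate: str, target: str) -> bool:
    if source.lower() in STOPWORD_ENTITIES or target.lower() in STOPWORD_ENTITIES:
        return False
    if predicate in LOW_INFO_PREDICATES:
        return False
    return True

triples = [
    ("Service-A", "DEPENDS_ON", "Auth-Library"),
    ("The", "RELATED_TO", "Service-A"),        # hairball edge: dropped
    ("Service-A", "ASSOCIATED_WITH", "Docs"),  # low-information: dropped
]
kept = [t for t in triples if is_relevant(*t)]
print(kept)  # → [('Service-A', 'DEPENDS_ON', 'Auth-Library')]
```

Once the heuristics stop being enough, the same function is the natural seam for the "does this add unique information?" LLM check.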
Why This Matters for 2026
We are moving away from LLMs as pure generators and toward LLMs as reasoning engines over structured data. A Knowledge Graph provides the 'Ground Truth' that prevents the model from wandering off into hallucination land. When a user asks a question, we now retrieve the relevant sub-graph, convert it to a text-based representation (like GraphML or a simplified Cypher path), and feed that into the LLM context. This results in answers that are not only accurate but also traceable back to the source nodes.
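The sub-graph-to-context step can be as simple as serializing retrieved triples into Cypher-style path strings. A minimal sketch with illustrative triples:

```python
# Serialize a retrieved sub-graph into simplified Cypher-style paths
# that fit in an LLM context window.
def subgraph_to_context(triples: list[tuple[str, str, str]]) -> str:
    lines = [f"({s})-[:{p}]->({t})" for s, p, t in triples]
    return "\n".join(lines)

subgraph = [
    ("Service-A", "DEPENDS_ON", "Auth-Library"),
    ("Auth-Library", "MAINTAINED_BY", "Security-Team"),
]
context = subgraph_to_context(subgraph)
print(context)
# (Service-A)-[:DEPENDS_ON]->(Auth-Library)
# (Auth-Library)-[:MAINTAINED_BY]->(Security-Team)
```

Because each line maps back to concrete nodes and edges, the LLM's answer can cite exactly which parts of the graph it used.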
Takeaway
Stop dumping raw text into vector stores and hoping for the best. Today, pick one small subset of your data—say, your API documentation—and define a simple schema with 3 entity types and 2 relationship types. Use the DSPy code above to extract a small graph into a local Neo4j instance. You’ll find that the ability to query "What are the upstream dependencies of X?" is worth a hundred 'semantic similarity' searches.