Essay· 6 min read

Memory in LangGraph: Checkpointing, Persistence, and Long-Term Context

Three memory tiers — in-thread, cross-thread, and external store — and exactly when each one becomes the right tool.

The simplest LangGraph agent has no memory. Every invocation starts fresh. For a one-shot task, that is fine. For anything with continuity — a support agent, a coding assistant, a workflow that spans days — it is a hard limit.

LangGraph gives you three distinct memory tiers. Most tutorials only show you one. Understanding all three is what lets you build agents that feel like they actually know the user.

The three tiers

┌─────────────────────────────────────────────────────────┐
│                    MEMORY TIERS                         │
│                                                         │
│  Tier 1: In-Thread (Checkpoints)                        │
│  ─────────────────────────────────────────────          │
│  Scope: One conversation session                        │
│  Storage: SqliteSaver, PostgresSaver, RedisSaver        │
│  Use: Remember earlier steps in the current run         │
│                                                         │
│  Tier 2: Cross-Thread (Episodic)                        │
│  ─────────────────────────────────────────────          │
│  Scope: Across multiple sessions for one user           │
│  Storage: Your own DB, keyed by user ID                 │
│  Use: "Last time you asked about X..."                  │
│                                                         │
│  Tier 3: External Store (Semantic)                      │
│  ─────────────────────────────────────────────          │
│  Scope: Long-term facts about the world / user          │
│  Storage: Vector database + structured DB               │
│  Use: User preferences, domain facts, learned patterns  │
└─────────────────────────────────────────────────────────┘

Tier 1: In-thread memory with checkpointing

Checkpointing saves the full graph state at every super-step, which in a simple sequential graph means after every node execution. If the agent crashes or is interrupted, it can resume from the last saved checkpoint rather than starting over.

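import sqlite3

from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite
from langgraph.graph import StateGraph

# In-memory for development
dev_checkpointer = MemorySaver()

# SQLite for single-instance production.
# In recent releases SqliteSaver.from_conn_string() is a context manager,
# so for a long-lived saver pass a connection to the constructor instead.
conn = sqlite3.connect("./agent_memory.db", check_same_thread=False)
prod_checkpointer = SqliteSaver(conn)

# `builder` is the StateGraph builder for your agent (wired up later in this article)
graph = builder.compile(checkpointer=prod_checkpointer)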

Once a checkpointer is attached, every invocation needs a thread_id in its config. This is the identifier that groups checkpoints into a single conversation.

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "user-123-session-456"}}

# First message
result1 = graph.invoke(
    {"messages": [HumanMessage(content="My name is Yog")]},
    config=config
)

# Second message: same thread_id, so the graph remembers the first
result2 = graph.invoke(
    {"messages": [HumanMessage(content="What is my name?")]},
    config=config
)
# -> e.g. "Your name is Yog." (exact wording depends on the model)

Without the checkpointer, the second call would have no idea who Yog is.
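
To make the role of thread_id concrete, here is a toy model in plain Python. It is not how LangGraph's checkpointers are implemented; `ToyCheckpointer` and `run_turn` are hypothetical names, used only to illustrate that state is stored and looked up per thread.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one list of snapshots per thread."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def latest(self, thread_id: str) -> dict:
        # A thread with no checkpoints starts from empty state.
        snapshots = self._threads[thread_id]
        return snapshots[-1] if snapshots else {"messages": []}

    def put(self, thread_id: str, state: dict) -> None:
        # Store a copy so later mutation cannot rewrite history.
        self._threads[thread_id].append({"messages": list(state["messages"])})

    def history(self, thread_id: str) -> list[dict]:
        return list(self._threads[thread_id])


def run_turn(saver: ToyCheckpointer, thread_id: str, user_text: str) -> dict:
    """Load the thread's last state, append the new message, save a new snapshot."""
    state = saver.latest(thread_id)
    new_state = {"messages": state["messages"] + [user_text]}
    saver.put(thread_id, new_state)
    return new_state


saver = ToyCheckpointer()
run_turn(saver, "user-123-session-456", "My name is Yog")
same_thread = run_turn(saver, "user-123-session-456", "What is my name?")
other_thread = run_turn(saver, "user-123-session-789", "What is my name?")

print(len(same_thread["messages"]))   # both messages are visible in the same thread
print(len(other_thread["messages"]))  # a different thread_id starts with a clean slate
```

The practical consequence is that the thread_id scheme is part of your design: reuse an ID and the conversation continues, mint a new one and the agent starts over.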

Inspecting and replaying checkpoints

The checkpointer exposes the full state history for any thread.

# Get all checkpoints for a thread
history = list(graph.get_state_history(config))
 
for checkpoint in history:
    print(checkpoint.config["configurable"]["checkpoint_id"])
    print(len(checkpoint.values["messages"]), "messages")
    print("---")
 
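# Resume from a specific checkpoint (time travel).
# History is ordered newest first, so history[2] is two checkpoints back.
# Invoking with its config forks a new branch from that point.
past_checkpoint = history[2]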
graph.invoke(
    {"messages": [HumanMessage(content="Go a different direction")]},
    config=past_checkpoint.config
)

This is useful for debugging. You can replay any branch of a long agentic run without re-executing the expensive steps that came before it.

Tier 2: Cross-thread episodic memory

Checkpoints only live within a thread. When a new session starts, the slate is clean. Episodic memory bridges sessions.

The pattern: before the agent responds, load relevant memories from previous sessions. After the agent finishes, save anything worth remembering.

import json
import sqlite3
from datetime import datetime, timezone

class EpisodicMemoryStore:
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                user_id TEXT,
                memory_type TEXT,
                content TEXT,
                created_at TEXT
            )
        """)

    def save(self, user_id: str, memory_type: str, content: str):
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        self.conn.execute(
            "INSERT INTO memories (user_id, memory_type, content, created_at) VALUES (?, ?, ?, ?)",
            (user_id, memory_type, content, datetime.now(timezone.utc).isoformat())
        )
        self.conn.commit()
 
    def load(self, user_id: str, limit: int = 10) -> list[dict]:
        rows = self.conn.execute(
            "SELECT memory_type, content, created_at FROM memories WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
            (user_id, limit)
        ).fetchall()
        return [{"type": r[0], "content": r[1], "at": r[2]} for r in rows]
 
memory_store = EpisodicMemoryStore("./episodic.db")

Wire this into your agent as two nodes: one that runs before the main logic and one that runs after it.

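from langchain_core.messages import SystemMessage

# AgentState is assumed to be a TypedDict with "user_id" and "messages" keys,
# where "messages" uses an append reducer such as add_messages.
# Nodes return partial updates, so an empty dict means "no change".
def load_memories_node(state: AgentState) -> AgentState: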
    user_id = state["user_id"]
    memories = memory_store.load(user_id)
    if not memories:
        return {}
 
    memory_text = "\n".join(f"- [{m['type']}] {m['content']}" for m in memories)
    system_msg = SystemMessage(content=f"Context from previous sessions:\n{memory_text}")
 
    return {"messages": [system_msg]}
 
def save_memories_node(state: AgentState) -> AgentState:
    user_id = state["user_id"]
    last_exchange = state["messages"][-2:]  # human + assistant turn
 
    # Use an LLM to decide what is worth saving
    extractor_prompt = f"""
    From this conversation exchange, extract any facts worth remembering about the user.
    Return a JSON array of objects with 'type' and 'content' fields.
    Types: preference, fact, goal, frustration
    Exchange: {[m.content for m in last_exchange]}
    """
    result = llm.invoke(extractor_prompt)
 
    try:
        facts = json.loads(result.content)
        for fact in facts:
            memory_store.save(user_id, fact["type"], fact["content"])
    except Exception:
        pass  # saving memories is best-effort, never block the response
 
    return {}
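
The extractor above assumes the model replies with bare, well-formed JSON. In practice replies are often wrapped in a Markdown code fence or contain entries of the wrong shape. Below is a minimal sketch of a more tolerant parser; `parse_facts`, `ALLOWED_TYPES`, and `FENCE` are hypothetical names, not part of LangGraph, and the allowed types simply mirror the four listed in the prompt.

```python
import json
import re

ALLOWED_TYPES = {"preference", "fact", "goal", "frustration"}
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_facts(raw: str) -> list[dict]:
    """Best-effort parse of the extractor's reply into validated fact dicts."""
    text = raw.strip()
    # Unwrap a fenced block (optionally tagged "json") if the model added one.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # not JSON at all: save nothing
    if not isinstance(data, list):
        return []
    facts = []
    for item in data:
        if not isinstance(item, dict):
            continue
        kind, content = item.get("type"), item.get("content")
        # Keep only entries with a known type and non-empty string content.
        if isinstance(kind, str) and kind in ALLOWED_TYPES \
                and isinstance(content, str) and content.strip():
            facts.append({"type": kind, "content": content.strip()})
    return facts


reply = FENCE + 'json\n[{"type": "goal", "content": " ship v2 "}]\n' + FENCE
print(parse_facts(reply))
```

A node could call `parse_facts(result.content)` and loop over the result, which keeps the best-effort behaviour while making it explicit which replies are dropped.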

Tier 3: Semantic long-term memory with a vector store

For agents that need to recall facts across a large, growing knowledge base — user preferences, domain expertise, past decisions — add a vector store as the third tier.

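from langchain_community.vectorstores import Chroma
# langchain_anthropic does not ship an embeddings class, so this example
# swaps in OpenAI embeddings; any LangChain Embeddings implementation works.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector_store = Chroma(embedding_function=embeddings, persist_directory="./long_term_memory")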
 
def remember(content: str, metadata: dict):
    vector_store.add_texts([content], metadatas=[metadata])
 
def recall(query: str, k: int = 5) -> list[str]:
    docs = vector_store.similarity_search(query, k=k)
    return [doc.page_content for doc in docs]

The node that uses this:

def semantic_recall_node(state: AgentState) -> AgentState:
    query = state["messages"][-1].content
    memories = recall(query, k=3)
 
    if not memories:
        return {}
 
    context = "\n".join(f"- {m}" for m in memories)
    recall_msg = SystemMessage(content=f"Relevant long-term context:\n{context}")
    return {"messages": [recall_msg]}
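
One thing the recall function above does not do is scope results to a user: every stored memory is a candidate for every query. The sketch below shows the shape of a user-scoped recall using a toy index in plain Python. It is an illustration only: `ToyMemoryIndex`, `embed`, and `cosine` are hypothetical, and the word-count "embedding" stands in for a real model. With a real vector store the same idea is usually expressed as a metadata filter on the search call; check the signature for your store and version.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would use a model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemoryIndex:
    def __init__(self) -> None:
        self._items: list[tuple[Counter, str, dict]] = []

    def remember(self, content: str, metadata: dict) -> None:
        self._items.append((embed(content), content, metadata))

    def recall(self, query: str, user_id: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Filter by user first, then rank the remaining memories by similarity.
        scored = [
            (cosine(q, vec), content)
            for vec, content, meta in self._items
            if meta.get("user_id") == user_id
        ]
        scored = [s for s in scored if s[0] > 0]  # drop memories with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)
        return [content for _, content in scored[:k]]


index = ToyMemoryIndex()
index.remember("prefers replies in bullet points", {"user_id": "u1"})
index.remember("billing plan is enterprise annual", {"user_id": "u1"})
index.remember("billing plan is free tier", {"user_id": "u2"})

print(index.recall("which billing plan", user_id="u1"))
```

Only the first user's billing memory comes back for that query; the other user's plan never enters the ranking.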

Putting all three tiers together

A production agent for a customer support use case uses all three:

from langgraph.graph import StateGraph, END

builder = StateGraph(AgentState)
 
# Memory loading (runs before main logic)
builder.add_node("load_episodic", load_memories_node)
builder.add_node("semantic_recall", semantic_recall_node)
 
# Core agent
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
 
# Memory saving (runs after response)
builder.add_node("save_memories", save_memories_node)
 
builder.set_entry_point("load_episodic")
builder.add_edge("load_episodic", "semantic_recall")
builder.add_edge("semantic_recall", "agent")
# ... tool routing ...
builder.add_edge("agent", "save_memories")
builder.add_edge("save_memories", END)
 
# Tier 1: in-thread checkpointing
graph = builder.compile(checkpointer=prod_checkpointer)

Choosing the right checkpointer for production

Backend         Best for                          Limitation
--------------  --------------------------------  ------------------------
MemorySaver     Development, testing              Lost on restart
SqliteSaver     Single-instance apps              No horizontal scaling
PostgresSaver   Multi-instance production         Requires Postgres
RedisSaver      High-throughput, short sessions   Needs persistence config

For most production deployments, Postgres is the right choice. The setup cost is worth the reliability.

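from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

with PostgresSaver.from_conn_string("postgresql://user:pass@host/db") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables; needed on first use
    graph = builder.compile(checkpointer=checkpointer)
    # Invoke the graph inside this block: the connection closes when the block exits.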

What memory cannot fix

Memory makes agents feel smarter. It does not make them smarter.

If your base agent gives bad answers, adding memory makes it give bad answers with confident references to things it "remembered." The failure mode is worse because it feels coherent.

Get the base agent right first. Add memory when the base behaviour is solid and continuity is genuinely the missing piece.

Also: memory grows. A user who has used your agent for a year has a lot of episodic memories. Add a retention policy — archive or summarise old memories rather than loading everything forever.

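# Note: load_older_than() and archive() are not defined on the EpisodicMemoryStore
# shown earlier; they are assumed helper methods you would add to it.
def summarise_old_memories(user_id: str, older_than_days: int = 30):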
    old = memory_store.load_older_than(user_id, older_than_days)
    if len(old) < 10:
        return  # not worth summarising yet
 
    summary_prompt = f"Summarise these memories into 3-5 key facts:\n{old}"
    summary = llm.invoke(summary_prompt).content
    memory_store.archive(user_id, old)
    memory_store.save(user_id, "summary", summary)
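
The function above relies on two store methods that the earlier EpisodicMemoryStore does not have. Here is one way they might look, as a sketch under stated assumptions: `RetentionMemoryStore` is a hypothetical variant of the store, and the `id` and `archived` columns are additions to the original schema so that rows can be flagged rather than deleted.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


class RetentionMemoryStore:
    """Hypothetical variant of EpisodicMemoryStore with archive support."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                user_id TEXT NOT NULL,
                memory_type TEXT NOT NULL,
                content TEXT NOT NULL,
                created_at TEXT NOT NULL,
                archived INTEGER NOT NULL DEFAULT 0
            )
        """)

    def save(self, user_id: str, memory_type: str, content: str,
             created_at: Optional[datetime] = None) -> None:
        # created_at is overridable so old rows can be simulated in tests.
        ts = (created_at or datetime.now(timezone.utc)).isoformat(timespec="seconds")
        self.conn.execute(
            "INSERT INTO memories (user_id, memory_type, content, created_at) "
            "VALUES (?, ?, ?, ?)",
            (user_id, memory_type, content, ts),
        )
        self.conn.commit()

    def load_older_than(self, user_id: str, days: int) -> list[dict]:
        # All timestamps are UTC ISO strings in one format, so string comparison
        # matches chronological order.
        cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat(timespec="seconds")
        rows = self.conn.execute(
            "SELECT id, memory_type, content FROM memories "
            "WHERE user_id = ? AND archived = 0 AND created_at < ? "
            "ORDER BY created_at",
            (user_id, cutoff),
        ).fetchall()
        return [{"id": r[0], "type": r[1], "content": r[2]} for r in rows]

    def archive(self, user_id: str, memories: list[dict]) -> None:
        # Flag rows instead of deleting them, so the originals stay recoverable.
        self.conn.executemany(
            "UPDATE memories SET archived = 1 WHERE user_id = ? AND id = ?",
            [(user_id, m["id"]) for m in memories],
        )
        self.conn.commit()


store = RetentionMemoryStore()
long_ago = datetime.now(timezone.utc) - timedelta(days=45)
store.save("u1", "fact", "old fact", created_at=long_ago)
store.save("u1", "fact", "new fact")
stale = store.load_older_than("u1", 30)
store.archive("u1", stale)
```

If you adopt something like this, the regular load query would also need an `archived = 0` condition so that archived rows stop appearing in the agent's context.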

Memory is infrastructure. Treat it like infrastructure — size it, maintain it, and have a plan for when it grows beyond what you expected.
