
Persistent Memory

How RARS achieves gigabytes of context and persistent memory through a dual-context architecture.

The Context Window Problem

Every AI system built on large language models faces the same constraint: the context window. The LLM can only reason about what fits in its prompt. Past a certain point, information falls out. Conversations lose coherence. The AI forgets what happened earlier.

The industry has tried several approaches to work around this. RAG (retrieval-augmented generation) fetches relevant documents from a vector store and injects them into the prompt. Long-context models extend the window to hundreds of thousands of tokens. Memory systems bolt conversation summaries or key-value stores onto chatbots. Each of these helps, but none of them solves the fundamental problem: the AI's understanding is limited to what's in the prompt at any given moment.

Two Contexts, Not One

RARS has a fundamentally different architecture. It operates with two contexts simultaneously:

The working memory is what the LLM sees each iteration: the active matrices, relevant schema documentation, and instance data from recent queries and workflow results. This is bounded by the LLM's context window, but it's actively curated by RARS to contain what's most relevant to the current work. Information flows in and out of working memory as the focus shifts.

The context graph is the collaborative, persistent knowledge graph that RARS is embedded in: every observation, every process trace, every piece of instance data, every relationship between entities. It is not bounded by the LLM's context window. It's a live graph that RARS can navigate to adapt its working memory at any point during execution.

The working memory is the AI's active attention. The context graph is the AI's persistent memory. Together, they give RARS effectively unlimited context.
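The relationship between the two contexts can be sketched as a bounded buffer curated from an unbounded store. The following Python sketch is illustrative only; the class names (`ContextGraph`, `WorkingMemory`) and the curation policy are assumptions for the example, not RARS APIs:

```python
# Hypothetical sketch of the dual-context idea: a small, bounded working
# memory that is actively curated from an unbounded persistent graph.

class ContextGraph:
    """Unbounded persistent store: holds every fact ever recorded."""
    def __init__(self):
        self.facts = []  # grows without a window limit

    def add(self, fact):
        self.facts.append(fact)

    def query(self, predicate):
        # Structured lookup: exact filtering, not similarity search.
        return [f for f in self.facts if predicate(f)]

class WorkingMemory:
    """Bounded buffer: only what fits in the LLM's prompt window."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def focus(self, graph, predicate):
        # Curate: pull only the facts relevant to the current step,
        # keeping the most recent ones to stay within the window.
        relevant = graph.query(predicate)
        self.items = relevant[-self.capacity:]

graph = ContextGraph()
for i in range(10_000):
    graph.add({"task": i, "status": "done" if i % 2 else "open"})

wm = WorkingMemory(capacity=5)
wm.focus(graph, lambda f: f["status"] == "open")

print(len(graph.facts))  # the graph holds everything
print(len(wm.items))     # working memory holds only a curated slice
```

The point of the sketch: the graph never evicts anything; only the working buffer does, and what it holds shifts with the current focus.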

Query, Don't Remember

The key insight is that RARS doesn't need to hold everything in its prompt. When it needs information that isn't in working memory, it writes a SPARQL query against the context graph and the answer comes back. The graph contains the full operational state: every observation ever made, every process that ever executed. RARS can reach any of it on demand.

This is fundamentally different from RAG. RAG retrieves static document chunks based on semantic similarity. RARS navigates its context through structured traversals against a live, typed, relationally connected knowledge graph. It can ask precise questions: "what was the status of this work order when it was last reviewed?" "which agent made this observation and as part of which process?" "what are all the tasks assigned to this person?" These aren't fuzzy similarity searches. They're exact queries against structured data.
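The difference from similarity search can be shown with a toy triple store. This is a minimal pure-Python sketch of pattern matching over typed triples; the real system queries an RDF graph with SPARQL, and all data and identifiers below are invented for illustration:

```python
# Minimal in-memory triple store illustrating "query, don't remember":
# exact structural matches against typed facts, not fuzzy retrieval.

triples = {
    ("wo-17",  "type",       "WorkOrder"),
    ("wo-17",  "status",     "approved"),
    ("wo-17",  "reviewedBy", "agent-3"),
    ("task-1", "assignedTo", "alice"),
    ("task-2", "assignedTo", "alice"),
    ("task-3", "assignedTo", "bob"),
}

def match(pattern):
    """Return triples matching an (s, p, o) pattern; None is a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What are all the tasks assigned to alice?" -- an exact query.
alice_tasks = sorted(t[0] for t in match((None, "assignedTo", "alice")))
print(alice_tasks)

# "What is the status of this work order?"
status = match(("wo-17", "status", None))[0][2]
print(status)
```

A SPARQL engine generalizes this idea with joins across patterns, but the character of the answer is the same: exact, structured, and traceable, not a nearest-neighbor guess.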

It's also different from long-context models. A long-context model can hold more tokens in the prompt, but it still has a fixed window. The context graph, by contrast, is bounded only by the hardware resources allocated to the context (gigabytes of memory), and it grows as the context accumulates observations and operational state.

This first version of the RARS engine is not necessarily optimized for billions of statements, but future versions could theoretically hold an entire business's worth of global operating state in a single context.

Persistent Across Sessions

The context graph is checkpointed (see Data Sovereignty). When you return to a context, the full graph is restored. RARS doesn't just remember the conversation. It remembers the full operational state: every entity, every observation, every process trace, every relationship.

This means RARS has genuine persistent memory. Not a summary of what happened last time. Not a vector embedding of past conversations. The actual structured state, queryable with the same precision as if the session never ended.

A context used for project management accumulates knowledge over every interaction. The tasks, the decisions, the approvals, the data from external systems, the validation findings. When you come back tomorrow, all of it is there. When you ask "what happened with the procurement request from last week?" RARS queries the graph and gives you the exact answer, traced back through the observation that recorded it and the process that produced it.
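Checkpoint and restore can be sketched as a simple serialize/deserialize round trip. The JSON format, file layout, and entity names below are assumptions for illustration; RARS's actual checkpoint mechanism is described under Data Sovereignty:

```python
# Sketch of checkpointing a context graph: persist the full structured
# state (not a summary or an embedding), then restore it in a "new
# session" and query with the same precision.

import json
import os
import tempfile

graph = [
    {"entity": "pr-42", "type": "ProcurementRequest", "status": "approved"},
    {"entity": "obs-9", "records": "pr-42", "by": "agent-finance"},
]

# Checkpoint: write the whole graph to durable storage.
path = os.path.join(tempfile.mkdtemp(), "context.json")
with open(path, "w") as f:
    json.dump(graph, f)

# --- new session: restore the full graph ---
with open(path) as f:
    restored = json.load(f)

# "What happened with the procurement request?" -- answered from the
# restored state, traced through the observation that recorded it.
request = next(e for e in restored if e.get("entity") == "pr-42")
observation = next(e for e in restored if e.get("records") == "pr-42")
print(request["status"], observation["by"])
```

Because the entire structured state survives the round trip, the restored session answers the query exactly as the original one would have.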

Semantic Memory Accumulation

Within a session, RARS also builds up semantic memory: the schema documentation for domain concepts it has encountered. As RARS discovers new domains and uses new concepts, their definitions accumulate. By the end of a conversation, RARS understands more about your operating model than it did at the beginning, because it has loaded and retained the schemas relevant to the work.

This semantic memory is curated, not exhaustive. RARS selects schemas based on their relevance to the current work. Domains that aren't relevant to the conversation aren't loaded. This keeps the working memory focused while the full context graph remains available for on-demand queries.
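Relevance-based loading can be sketched as lazy, retained lookups against a schema catalog. The catalog contents and the `SemanticMemory` class here are hypothetical, standing in for whatever registry RARS consults:

```python
# Sketch of curated semantic memory: a domain's schema is loaded the
# first time the work touches it, retained for the session, and never
# loaded at all for domains the conversation doesn't reach.

SCHEMA_CATALOG = {
    "procurement": "Schema docs for purchase orders, approvals, vendors.",
    "projects":    "Schema docs for tasks, milestones, assignments.",
    "hr":          "Schema docs for people, roles, leave requests.",
}

class SemanticMemory:
    def __init__(self):
        self.loaded = {}

    def touch(self, domain):
        # Load on first encounter; keep for the rest of the session.
        if domain not in self.loaded and domain in SCHEMA_CATALOG:
            self.loaded[domain] = SCHEMA_CATALOG[domain]

mem = SemanticMemory()
for domain in ["projects", "procurement", "projects"]:
    mem.touch(domain)

print(sorted(mem.loaded))  # "hr" was never relevant, so never loaded
```

Working memory stays focused on the two domains the work touched, while the untouched domain's schema remains available in the catalog for on-demand loading later.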

What This Means

The dual-context architecture means RARS operates with:

  • Focused attention: working memory contains what's relevant to the current step, curated and compressed for the LLM
  • Total recall: the context graph contains everything, queryable on demand with full precision
  • Persistent memory: checkpointed state means nothing is lost between sessions
  • Growing understanding: semantic memory accumulates as RARS encounters new domains during a conversation

This effectively gives RARS persistent memory and a gigabyte-scale context window. The LLM's fixed prompt window is a working memory buffer, not a limit on what RARS knows.

Summary

  • Two contexts: working memory (the LLM's prompt) and the context graph (the full persistent knowledge graph)
  • Query, don't remember: RARS queries the graph on demand instead of holding everything in the prompt
  • Not RAG: structured queries against a live, typed graph, not fuzzy similarity search over static documents
  • Not just long-context: the graph has no fixed window and grows as the context accumulates state
  • Persistent across sessions: checkpointed state means full operational history is available when you return
  • Gigabytes worth of context: the LLM's window is a buffer, not a boundary

See Also

  • Contexts and Agents: how contexts are created and how the context graph is structured
  • Provenance: how every observation is tracked with full attribution
  • RDF & SPARQL: the data representation and query language used to navigate the context graph
