Plyra Memory Retrieval Flow

recall()

Searches across all three memory layers simultaneously and returns a ranked list of results.
result = await mem.recall("LangGraph debugging")
for r in result.results:
    print(f"[{r.layer.value}] {r.content}  score={r.score:.2f}")

How scores are computed

Each result carries a composite score fused from three signals:
| Signal | Weight | Description |
| --- | --- | --- |
| Similarity | similarity_weight | Cosine similarity between query and content embeddings |
| Recency | recency_weight | How recently this memory was stored |
| Importance | importance_weight | The importance score set when storing |
Weights are configurable via MemoryConfig or env vars:
export PLYRA_SIMILARITY_WEIGHT=0.6
export PLYRA_RECENCY_WEIGHT=0.2
export PLYRA_IMPORTANCE_WEIGHT=0.2
The weights must sum to 1.0.
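As a rough sketch of the fusion described above, the composite score is a weighted sum of the three signals. The function name and structure here are illustrative, not plyra-memory's actual internals, and the defaults simply mirror the env-var example:

```python
# Illustrative sketch of the score fusion; the library's real internals
# may differ. Default weights mirror the env-var example above.

def fuse_score(
    similarity: float,
    recency: float,
    importance: float,
    similarity_weight: float = 0.6,
    recency_weight: float = 0.2,
    importance_weight: float = 0.2,
) -> float:
    """Combine the three signals into one composite score in [0, 1]."""
    total = similarity_weight + recency_weight + importance_weight
    if abs(total - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return (
        similarity_weight * similarity
        + recency_weight * recency
        + importance_weight * importance
    )

# A highly similar but old, medium-importance memory:
print(round(fuse_score(similarity=0.9, recency=0.1, importance=0.5), 2))  # 0.66
```

Because the weights sum to 1.0 and each signal lies in [0, 1], the composite score stays in [0, 1] as well.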

Filtering by layer

from plyra_memory.schema import MemoryLayer

# Search only episodic and semantic
result = await mem.recall(
    "user preferences",
    layers=[MemoryLayer.EPISODIC, MemoryLayer.SEMANTIC],
)

RecallResult fields

| Field | Type | Description |
| --- | --- | --- |
| query | str | The original query string |
| results | list[RankedMemory] | Ranked memory results across all layers |
| total_found | int | Total matches before ranking |
| layers_searched | list[MemoryLayer] | Layers that were searched |
| cache_hit | bool | Whether the result came from cache |
| latency_ms | float | Query execution time in milliseconds |
| retrieved_at | datetime | When the recall was executed |

RankedMemory fields

| Field | Type | Description |
| --- | --- | --- |
| id | str | Unique memory ID |
| layer | MemoryLayer | Which layer this came from |
| content | str | The memory content |
| score | float | Composite score (0.0–1.0) |
| similarity | float | Similarity component |
| recency | float | Recency component |
| importance | float | Importance component |
| created_at | datetime | When the memory was created |
| source_id | str | ID of the source record |
| metadata | dict[str, Any] | Custom metadata |

context_for()

Builds a token-budgeted string ready to inject into a prompt. Internally calls recall() with top_k=50, then fills the budget with the highest-scoring results.
ctx = await mem.context_for(
    "what does the user prefer?",
    token_budget=500,
)

# Inject into your prompt
prompt = f"Context from memory:\n{ctx.content}\n\nUser question: ..."
print(f"Used {ctx.memories_used} memories, {ctx.token_count} tokens")
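The budget-filling step described above can be sketched as a greedy pass over the ranked results. Everything here is an assumption for illustration: the function names are hypothetical, and the token counting is a crude whitespace approximation rather than the tokenizer plyra-memory actually uses:

```python
# Hypothetical greedy budget-fill, assuming results arrive sorted by
# composite score (as recall() returns them). Not the library's code.

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())

def fill_budget(contents: list[str], token_budget: int) -> tuple[str, int, int]:
    """Pack the highest-scoring memories into the budget, in order."""
    chosen: list[str] = []
    used = 0
    for content in contents:
        cost = estimate_tokens(content)
        if used + cost > token_budget:
            break  # the next memory would overflow the budget
        chosen.append(content)
        used += cost
    return "\n".join(chosen), used, len(chosen)

ranked = [
    "user prefers dark mode",
    "user is debugging LangGraph",
    "meeting at noon",
]
context, token_count, memories_used = fill_budget(ranked, token_budget=9)
```

With a budget of 9 "tokens", only the first two memories fit, which is why a tight budget yields fewer memories_used in the ContextResult.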

ContextResult fields

| Field | Type | Description |
| --- | --- | --- |
| query | str | The original query |
| content | str | Formatted context string ready for injection |
| token_count | int | Actual tokens used |
| token_budget | int | Budget provided |
| memories_used | int | Number of memory results included |
| cache_hit | bool | Whether the result came from cache |
| latency_ms | float | Query execution time in milliseconds |
| retrieved_at | datetime | When the context was retrieved |

Token budget

The default token budget comes from MemoryConfig.default_token_budget (default: 2048). Override per-call:
# Tight budget — most relevant memory only
ctx = await mem.context_for("query", token_budget=200)

# Generous budget — full context
ctx = await mem.context_for("query", token_budget=2000)

Embedding model

By default, plyra-memory uses all-MiniLM-L6-v2 from sentence-transformers (384-dimensional embeddings, runs locally).
export PLYRA_EMBED_MODEL=all-MiniLM-L6-v2
export PLYRA_EMBED_DIM=384
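The similarity signal is the cosine between the query embedding and each memory's content embedding. A minimal pure-Python sketch of that computation, shown on toy 3-dimensional vectors rather than the real 384-dimensional all-MiniLM-L6-v2 embeddings:

```python
import math

# Cosine similarity on toy vectors; in plyra-memory the inputs would be
# 384-dimensional embeddings produced by all-MiniLM-L6-v2.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.3, 0.5]
memory = [0.2, 0.6, 1.0]       # same direction as query -> similarity ~1.0
unrelated = [0.5, -0.1, 0.0]   # nearly orthogonal -> similarity near 0

print(round(cosine_similarity(query, memory), 4))  # 1.0
```

Cosine similarity depends only on direction, not magnitude, which is why the scaled copy of the query scores 1.0.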