## recall()

Searches across all three memory layers simultaneously and returns a ranked list of results.

### How scores are computed

Each result carries a composite score fused from three signals:

| Signal | Weight | Description |
|---|---|---|
| Similarity | similarity_weight | Cosine similarity between query and content embeddings |
| Recency | recency_weight | How recently this memory was stored |
| Importance | importance_weight | The importance score set when storing |
The weights can be adjusted through MemoryConfig or environment variables.
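The page doesn't spell out the fusion formula. A minimal sketch, assuming a simple weighted sum; the weight values here are illustrative, not the library's actual defaults:

```python
def composite_score(
    similarity: float,
    recency: float,
    importance: float,
    similarity_weight: float = 0.6,   # illustrative values; the real
    recency_weight: float = 0.2,      # defaults come from MemoryConfig
    importance_weight: float = 0.2,
) -> float:
    """Fuse the three signals into one composite score."""
    return (
        similarity_weight * similarity
        + recency_weight * recency
        + importance_weight * importance
    )
```

With weights summing to 1 and each signal in [0, 1], the composite stays in [0, 1], matching the range documented for the score field.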
### Filtering by layer
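The exact call shape for layer filtering isn't shown on this page, but since every result carries a layer field and RecallResult reports layers_searched, the effect can be illustrated in plain Python (the dataclass and layer names below are stand-ins, not the library's actual types):

```python
from dataclasses import dataclass

@dataclass
class Memory:                     # trimmed stand-in for RankedMemory
    id: str
    layer: str                    # stands in for the MemoryLayer enum
    content: str

def filter_by_layer(results: list[Memory], layers: set[str]) -> list[Memory]:
    # Keep only memories that came from one of the requested layers
    return [m for m in results if m.layer in layers]
```

Presumably recall() accepts an equivalent filter so that unwanted layers are never searched at all, which is cheaper than filtering after the fact.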
### RecallResult fields

| Field | Type | Description |
|---|---|---|
| query | str | The original query string |
| results | list[RankedMemory] | Ranked memory results across all layers |
| total_found | int | Total matches before ranking |
| layers_searched | list[MemoryLayer] | Layers that were searched |
| cache_hit | bool | Whether result came from cache |
| latency_ms | float | Query execution time in milliseconds |
| retrieved_at | datetime | When the recall was executed |
### RankedMemory fields

| Field | Type | Description |
|---|---|---|
| id | str | Unique memory ID |
| layer | MemoryLayer | Which layer this came from |
| content | str | The memory content |
| score | float | Composite score (0.0–1.0) |
| similarity | float | Similarity component |
| recency | float | Recency component |
| importance | float | Importance component |
| created_at | datetime | When memory was created |
| source_id | str | ID of source record |
| metadata | dict[str, Any] | Custom metadata |
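Since results arrive already ranked, client-side trimming is straightforward; a sketch using plain (id, score) tuples rather than the library's RankedMemory type:

```python
def take_above(
    results: list[tuple[str, float]], min_score: float, k: int
) -> list[tuple[str, float]]:
    """Keep at most k results whose composite score clears min_score.

    Assumes `results` are already ordered best-first, as recall()
    returns them."""
    return [r for r in results if r[1] >= min_score][:k]
```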
## context_for()

Builds a token-budgeted string ready to inject into a prompt. Internally calls recall() with top_k=50, then fills the budget with the highest-scoring results.
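A sketch of the fill strategy just described, assuming whitespace tokenization for illustration (the library presumably counts tokens with a real tokenizer, and whether it skips items that don't fit or stops at the first overflow isn't specified; this sketch skips):

```python
def fill_budget(ranked: list[str], token_budget: int) -> tuple[str, int, int]:
    """Greedily pack memories (already ranked best-first) into the budget.

    Returns (content, token_count, memories_used), mirroring the
    ContextResult fields of the same names."""
    picked: list[str] = []
    used = 0
    for text in ranked:
        cost = len(text.split())       # crude stand-in for a tokenizer
        if used + cost > token_budget:
            continue                   # doesn't fit; try the next result
        picked.append(text)
        used += cost
    return "\n".join(picked), used, len(picked)
```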
### ContextResult fields

| Field | Type | Description |
|---|---|---|
| query | str | The original query |
| content | str | Formatted context string ready for injection |
| token_count | int | Actual tokens used |
| token_budget | int | Budget provided |
| memories_used | int | Number of memory results included |
| cache_hit | bool | Whether result came from cache |
| latency_ms | float | Query execution time |
| retrieved_at | datetime | When the context was retrieved |
### Token budget

The default token budget comes from MemoryConfig.default_token_budget (default: 2048). It can be overridden per call.
### Embedding model

By default, plyra-memory uses all-MiniLM-L6-v2 from sentence-transformers (384-dimensional embeddings, runs locally).
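The similarity signal described earlier is cosine similarity between the query and content embeddings these models produce. For reference, the computation itself is simple (pure Python for illustration; in practice numpy or the model's own utilities would be used on the 384-dimensional vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```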