TL;DR — Memory in multi-tenant AI products is two problems: making it useful (retrieve the right context for the right person) and making it safe (never leak across users or tenants). The safety part has to be enforced at the data-access layer, not the prompt layer.

Per-user memory isolation in a multi-tenant AI productEvery retrieval query carries mandatory tenant_id + user_id predicates — enforced at the data-access layer, tested on every CI run. — /memory boundary Coaching turn tenant=T1 · user=A queryMemory() WHERE tenant_id = T1 AND user_id = A Vector DB memories + tenant_id idx ⛔ User B in T1 — invisible ⛔ Any user in T2 — invisible ✓ Tested on every CI build
Every retrieval query carries mandatory tenant_id + user_id predicates — enforced at the data-access layer, tested on every CI run.

Plain LLM chat forgets you between sessions. For products like Blink AI that promise coaching continuity, memory is part of the product. But the moment you store embeddings of one user’s messages, you have a leak surface — across users in the same tenant, and across tenants.

What “memory” actually is

  • Recent conversation — last N turns kept in the prompt as-is
  • Session state — structured data (current goal, declared commitments) re-injected on each turn
  • Long-term memory — embeddings of past messages, retrieved by similarity, scoped to the user

This article is about the third. The first two have no cross-pollination risk because they are scoped to one session by construction.

The architecture

  • One memories table or vector store per cluster, not per tenant — operational simplicity
  • Every row carries tenant_id AND user_id as indexed columns
  • Embeddings are generated with a stable model ID stored alongside
  • A single data-access helper, queryMemory(tenantId, userId, queryEmbedding, k), is the only path code can take
  • The helper builds a query that ALWAYS includes both predicates; there is no overload that omits them

What the prompt sees

At each turn, the orchestrator:

  1. Embeds the current user input
  2. Calls queryMemory for the top-K most similar past memories
  3. Re-ranks the results with a cross-encoder (optional but useful past ~50 memories per user)
  4. Injects the top 3–5 into the system prompt as “things you may remember about this user”
  5. Generates the response
  6. After the response, summarizes salient new facts and writes them back as new memories

The boundary, tested

A test creates two synthetic users in the same tenant and two more in a different tenant. Each user writes 50 memories. The test then runs ~200 queries from one user’s perspective and asserts that no result row belongs to any other user. This runs on every CI build. The boundary is verified, not hoped for.

What goes wrong in practice

  • Embedding model upgrades. Mix old and new embeddings in the same index and similarity scores become meaningless. Migrate or version-tag.
  • Soft-delete vs hard-delete. When a user is deleted, their memories must be deleted too — and verified, not assumed.
  • Right-to-be-forgotten flows. KVKK and GDPR subject access requests apply to embeddings. Wire delete paths through the same helper.
  • Prompt-injection through memory. A malicious user could embed instructions in their messages that surface later. Treat retrieved memories as untrusted input, not authoritative content.

What this buys

A continuity-feeling product where session N remembers what session 1 said, without ever surfacing anything that did not come from the same person. The model gets context-rich, the system stays safe, and compliance reviews stop being scary.

Frequently asked questions

Why not just include user history in the prompt?

Token cost and context-window pressure. After a few sessions the prompt explodes; retrieval keeps it bounded while still letting the model use the right history. Prompt-stuffing also makes prompt caching less effective.

How do you prevent cross-tenant leaks?

Filter at the data-access layer, not the prompt layer. A single helper builds every vector query with mandatory tenant_id and user_id predicates. A regression test verifies the boundary on every CI run.

Working on something similar?

T-Square architects, builds and operates production systems for learning, AI and custom software products. Talk to a senior engineer if you want a second opinion on your design or roadmap.