Agent Memory and Personalization

An agent that forgets everything between sessions is just a search bar with personality. Memory turns one-off interactions into relationships that compound.

Every time a customer comes back and your agent says "Hi, how can I help you?" like they have never met, you are training that customer to treat the agent like a disposable tool. Memory is what separates an agent that builds loyalty from one that gets ignored.

Why Memory Matters

Personalization is not a nice-to-have. Amazon attributes 35% of its revenue to personalized recommendations[1]. Netflix estimates its recommendation engine saves $1 billion per year in reduced churn[2]. These numbers come from memory: knowing what a user did before and using that knowledge to make the next interaction better.

For agents, the value compounds. A first conversation might capture that a customer runs a Shopify Plus store with 50,000 SKUs. The second conversation skips 10 minutes of context-gathering. By the fifth conversation, the agent knows the customer's tech stack, pain points, budget constraints, and preferred communication style. Each interaction gets faster and more useful.

Without memory, you pay the full context-gathering cost every single time. That cost is not just tokens and latency. It is customer patience.

Short-Term Memory

The simplest form of memory is the message array: the running list of user and assistant messages in the current conversation. This is what most people think of when they hear "agent memory." It works well for a single session, but it has hard limits.

Every model has a context window, the maximum number of tokens it can process in one request. Claude supports up to 200K tokens. GPT-4o supports 128K. Sounds like a lot, but a 40-message conversation with tool calls and system prompts can eat through context fast.

The practical fix is a sliding window. Instead of sending every message, keep the first few messages (which contain the user's initial intent and the agent's understanding of the task) and the most recent messages (which contain the current state of the conversation). A common pattern: keep the first 4 messages and the last 36. This preserves both the original context and the recent flow.
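A minimal sketch of the sliding window, assuming messages are stored as a plain list (the 4/36 split comes straight from the pattern above; the parameter names are illustrative):

```python
def sliding_window(messages, keep_head=4, keep_tail=36):
    """Trim a conversation to its first and most recent messages.

    Keeps the opening messages (the user's original intent) and the
    newest messages (the current state), dropping the middle once the
    history exceeds the window.
    """
    if len(messages) <= keep_head + keep_tail:
        return messages  # short conversations pass through untouched
    return messages[:keep_head] + messages[-keep_tail:]
```

The dropped middle is where tool-call noise tends to live, which is why this crude truncation works better in practice than it looks.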

Summarization is an alternative. After every N messages, generate a summary and replace the older messages with it. The tradeoff: summaries lose detail. Specific product names, exact prices, and nuanced preferences get flattened into generalizations. For high-stakes commerce conversations, that lost detail can mean the difference between a useful recommendation and a generic one.
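The summarization alternative can be sketched as a compaction step, where `summarize` stands in for whatever LLM call you use to produce the digest (a placeholder here, not a real API):

```python
def compact(messages, summarize, threshold=20, keep_recent=10):
    """Once the history passes `threshold` messages, replace everything
    except the most recent messages with a single summary message.

    `summarize` is any callable that turns a list of messages into a
    one-message digest -- typically an LLM call in production.
    """
    if len(messages) < threshold:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    digest = {"role": "system", "content": summarize(old)}
    return [digest] + recent
```

Note that the detail loss described above happens inside `summarize`; nothing in this structure can recover a product name the digest dropped.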

Long-Term Memory

Short-term memory dies when the session ends. Long-term memory persists facts across conversations, days, and months. The pattern: after each conversation, extract key facts (preferences, constraints, decisions, context) and store them for future retrieval.
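The extract-and-store step might look like the following sketch, where `llm` is a hypothetical completion function and a plain dict stands in for the real datastore:

```python
import json

EXTRACT_PROMPT = (
    "Extract durable facts about the user from this conversation as a "
    "JSON array of short strings (preferences, constraints, decisions)."
)

def extract_facts(transcript, llm):
    """Ask an LLM to pull persistent facts out of a finished
    conversation. `llm` is a placeholder for your completion call."""
    raw = llm(EXTRACT_PROMPT + "\n\n" + transcript)
    return json.loads(raw)

def store_facts(store, user_id, facts):
    """Append extracted facts to a per-user memory store. A dict here;
    Postgres, Redis, or a vector DB in production."""
    store.setdefault(user_id, []).extend(facts)
```

Running extraction after the conversation ends, rather than during it, keeps the memory-write latency off the user's critical path.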

Three storage layers serve different needs:

  • Postgres (or any relational DB) for structured facts: user preferences, product history, account details. Fast lookups by user ID. Easy to update and delete specific records.
  • Redis for hot data that needs sub-millisecond access: recent interactions, session state, feature flags. Expiration policies handle cleanup automatically.
  • Vector databases (Pinecone, pgvector, Qdrant) for semantic search: "what did this user say about their budget?" queries that do not map to exact key lookups. Embed the memory, store the vector, retrieve by similarity.

At inference time, the agent's system prompt gets injected with relevant memories. The user never sees this context, but the agent behaves as if it remembers. The result feels like continuity, because it is.
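The injection step is simple string assembly. A sketch, assuming memories arrive already ranked by relevance:

```python
def build_system_prompt(base_prompt, memories, max_items=10):
    """Inject the most relevant stored memories into the system prompt.

    The user never sees this text; the agent simply behaves as if it
    remembers. `max_items` caps the token cost of recall.
    """
    if not memories:
        return base_prompt
    lines = "\n".join(f"- {m}" for m in memories[:max_items])
    return f"{base_prompt}\n\nWhat you know about this user:\n{lines}"
```

Capping the injected list matters: uncapped recall quietly eats the same context window the sliding window was protecting.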

Brain Groups (Isolation)

Not all agents should share the same memories. A customer chatting on your marketing site should not have those conversations leak into an internal employee tool. A demo environment where salespeople impersonate fictional shoppers must not contaminate real customer profiles.

Brain groups solve this with memory namespaces. Agents within the same group read and write to a shared memory pool. Agents in different groups are completely walled off from each other.
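The namespacing can be illustrated with an in-memory sketch; a real system would enforce the same boundary at the database layer (separate schemas, row-level security, or separate vector collections):

```python
class BrainGroupStore:
    """Memory store namespaced by brain group. Agents in the same
    group share a pool; agents in different groups can never read
    each other's data."""

    def __init__(self):
        self._pools = {}  # group -> user_id -> list of facts

    def write(self, group, user_id, fact):
        self._pools.setdefault(group, {}).setdefault(user_id, []).append(fact)

    def read(self, group, user_id):
        # Reads are scoped to the caller's group: "demo" can never
        # see "main" memories, and vice versa.
        return self._pools.get(group, {}).get(user_id, [])
```

The important property is that the group is a required argument on every read and write; there is no ungrouped path that could leak across the wall.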

Three groups cover most architectures:

  1. Main: your public-facing agents (site chat, site voice). Shared memory means the voice agent knows what the chat agent discussed.
  2. Demo: sandbox agents for demos and testing. Isolated so demo data never touches real customer memories.
  3. Internal: employee-only agents with access to sensitive internal context. Walled off from both main and demo.

The key principle: agents that serve the same user in the same context should share memory. Agents that serve different purposes or different trust levels must not.

Memory Architecture Patterns

Four patterns dominate, each with distinct tradeoffs:

Append-only log: Every extracted fact gets timestamped and appended. Simple to implement, preserves full history, but retrieval gets slower as the log grows. Works well when you need audit trails or when memory volume is low (under 1,000 facts per user).

Entity-based profiles: Facts get organized into structured profiles, one per user, product, or topic. "Preferred color: navy" overwrites the previous value. Compact and fast to query, but you lose the history of how preferences evolved. Best for agents that need current state, not historical context.

Vector memory: Every memory gets embedded and stored as a vector. Retrieval uses semantic similarity, so the agent finds relevant memories even when the wording differs. Powerful for open-ended conversations, but adds embedding latency on write and search latency on read. Requires tuning the similarity threshold to avoid irrelevant recalls.

Hybrid: Combine structured profiles for known entities (user preferences, account data) with vector memory for unstructured context (conversation insights, expressed frustrations, feature requests). This is where most production systems land. The structured layer handles the 80% of queries that map to known fields. The vector layer catches everything else.
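A toy sketch of the hybrid pattern, with a bag-of-words stand-in for a real embedding model (the keyword vocabulary and cosine retrieval are illustrative, not a production design):

```python
class HybridMemory:
    """Structured profile for known fields plus a naive vector index
    for free-form insights."""

    def __init__(self, embed):
        self.profile = {}   # structured layer: known fields
        self.vectors = []   # unstructured layer: (vector, text) pairs
        self.embed = embed  # stand-in for a real embedding model

    def set_field(self, key, value):
        self.profile[key] = value  # overwrite semantics, like entity profiles

    def add_insight(self, text):
        self.vectors.append((self.embed(text), text))

    def recall(self, query, top_k=3):
        """Cosine-similarity retrieval over stored insights."""
        q = self.embed(query)

        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = sum(x * x for x in a) ** 0.5
            nb = sum(x * x for x in b) ** 0.5
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.vectors, key=lambda v: cos(q, v[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]
```

Queries that map to a known field hit `profile` directly; everything else falls through to `recall`. That split is the 80/20 described above.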

Privacy and Data Lifecycle

Memory creates a data obligation. Every fact you store about a user is a fact you must protect, expose on request, and delete when asked.

User visibility: Users should be able to see what an agent remembers about them. This is not just good practice; regulations require it. Build a "Your Memory" view that surfaces stored facts in plain language, not raw database rows.

Deletion rights: GDPR (Article 17) and CCPA (Section 1798.105) both grant users the right to request deletion of their personal data[3][4]. Your memory system needs a hard delete path, not just a soft delete flag. When a user requests deletion, every memory record, vector embedding, and cached reference must go.
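A hard-delete sketch, with dicts standing in for the three stores named earlier (the store names are illustrative; a real system also needs to purge backups on their own schedule):

```python
def hard_delete_user(user_id, sql_store, vector_index, cache):
    """Deletion-request handler: remove the user's data from every
    layer in one pass. No soft-delete flag -- the records are gone."""
    sql_store.pop(user_id, None)     # structured facts (Postgres)
    vector_index.pop(user_id, None)  # embeddings (vector DB)
    cache.pop(user_id, None)         # hot session state (Redis)
```

The point of routing all three deletes through one function is auditability: a single code path either completed or it did not.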

Retention policies: Not all memories should live forever. Set time-based expiration for different memory types. Session context might expire after 30 days. Purchase history might persist for 2 years. Behavioral patterns might expire after 90 days of inactivity.

Automate this with scheduled cleanup jobs rather than relying on manual review.
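Such a cleanup job might look like the sketch below, using the retention windows from the examples above (simplified to expire from creation time; a real behavioral-pattern sweep would key off last activity instead):

```python
import time

RETENTION_SECONDS = {
    "session": 30 * 86400,         # session context: 30 days
    "purchase": 2 * 365 * 86400,   # purchase history: 2 years
    "behavior": 90 * 86400,        # behavioral patterns: 90 days
}

def sweep_expired(records, now=None):
    """Scheduled-cleanup pass: drop any memory older than its type's
    retention window. Each record is (memory_type, created_at, fact)."""
    now = now if now is not None else time.time()
    return [
        (kind, ts, fact)
        for kind, ts, fact in records
        if now - ts < RETENTION_SECONDS.get(kind, 0)
    ]
```

Unknown memory types fall through to a zero-second window here, a deliberately conservative default: anything uncategorized gets dropped rather than kept forever.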

Anonymous vs authenticated: Anonymous users can still get short-term memory within a session, but long-term memory requires authentication. This is a feature, not a limitation. It gives users a clear value exchange: sign in, and the agent remembers you. Stay anonymous, and every session starts fresh.

The goal is not to collect the most data possible. It is to collect exactly enough data to make the next interaction meaningfully better, and nothing more.

How Site Scanner Helps

Site Scanner evaluates whether your site's personalization infrastructure is accessible to agents. It checks for proper server-side rendering of personalized content, validates that dynamic elements do not rely solely on client-side state, and flags patterns where agent-relevant content is locked behind JavaScript-only personalization layers.

See how your site scores.

Run a free scan at point11.ai to check your Agent Memory and Personalization and 40+ other metrics.

Scan Your Site