AI agents and memory: the missing piece for RevOps
We’ve been quietly building something at Bunny that we’re pretty excited about. An AI agent — a RevOps assistant that lives inside your billing and subscription data and can do more than just answer questions.
Not just “what’s the MRR this month?” or “when does Acme Corp’s subscription renew?” — but actually do things. Send a quote. Upgrade a subscription. Apply a promotional price. Run a 5% inflationary increase across your entire customer base. The kinds of tasks that a RevOps manager handles every day but that currently require someone to log in, find the right records, and execute them manually.
We’ve been testing it internally, and the results have been genuinely surprising. The capability ceiling is higher than we expected. What we keep running into, though, isn’t a question of what the agent can do. It’s a question of what it can remember.
The 5% problem
Here’s a scenario that came up almost immediately in our testing.
You ask the agent to apply a 5% inflationary increase to all active subscriptions. It checks the accounts, calculates the new prices, generates the amendments, and sends out the updated invoices. Task complete.
Two weeks later, a colleague — who wasn’t in the original conversation — opens up the agent and asks it to do the same thing. “Can you run a 5% inflationary increase on all accounts?”
Without memory, the agent has no idea this already happened. It will happily start the process again.
This isn’t a hypothetical edge case. In a real RevOps workflow, this kind of double-execution could mean customers getting double-billed, contracts being amended twice, or notices going out that contradict each other. For an agent with real write access to your billing system, missing context isn’t just inconvenient — it’s a trust problem.
What we actually want is for the agent to say: “Joe ran a 5% inflationary increase on the 24th of March and it completed across all 47 accounts. Are you looking to run another one, or is this a follow-up on that?”
That requires memory.
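Here’s a minimal sketch of the behavior we want, in Python. All the names here (`ActionRecord`, `request_bulk_increase`, the in-memory log) are hypothetical illustrations, not Bunny’s actual implementation: before executing a bulk operation, the agent consults a shared action log and surfaces any recent matching run instead of silently repeating it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ActionRecord:
    action: str        # e.g. "bulk_price_increase"
    actor: str         # who asked for it
    ran_at: datetime   # when it completed
    detail: str        # free-form outcome summary

# A shared, persistent log -- in-memory here purely for illustration.
action_log: list[ActionRecord] = []

def recent_matching_runs(action: str, window_days: int = 90) -> list[ActionRecord]:
    """Return prior runs of `action` inside the lookback window."""
    cutoff = datetime.now() - timedelta(days=window_days)
    return [r for r in action_log if r.action == action and r.ran_at >= cutoff]

def request_bulk_increase(actor: str, percent: float) -> str:
    prior = recent_matching_runs("bulk_price_increase")
    if prior:
        last = max(prior, key=lambda r: r.ran_at)
        # Surface the history instead of re-running the operation.
        return (f"{last.actor} already ran this on "
                f"{last.ran_at:%d %B}: {last.detail}. Run another?")
    action_log.append(ActionRecord("bulk_price_increase", actor,
                                   datetime.now(), f"{percent}% across all accounts"))
    return f"Applied {percent}% increase."
```

The first request executes; a second request from a colleague two weeks later gets the “Joe already ran this” answer rather than a duplicate execution. The key design point is that the log is shared across conversations, not scoped to one chat session.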
How we’re currently handling it
Right now, our agent loads context from structured markdown files — a set of guides, skill definitions, and action templates that tell it how to perform specific tasks. It works well for capability: the agent knows how to upgrade a subscription or generate a quote because those workflows are documented and retrievable.
But these files are static. They describe how to do things, not what has been done. And as you add more of them — more skills, more action types, more context about your particular setup — the total token count grows quickly. You can’t just dump everything into the context window and hope for the best. You’d hit limits before long, and even if you didn’t, you’d be paying for a lot of irrelevant context on every call.
The history of what happened — who asked for what, when it ran, what the outcome was — lives nowhere that the agent can access.
Enter MemPalace
A project that surfaced this week called MemPalace is taking a thoughtful swing at exactly this problem — and it comes from an unexpected place. The project was conceived by actress Milla Jovovich (The Fifth Element, Resident Evil) while she was working on a gaming project and kept running into frustrations with how AI systems lose context between sessions. She designed the architecture and brought in developer Ben Sigman to build it. The result is open-source, runs locally, and has put up some impressive benchmark numbers — 96.6% on LongMemEval, which measures how accurately a system can retrieve relevant memories from a large pool of stored conversations.

The interesting design choice is that it doesn’t rely on LLM-based summarization to compress memories. Most naive approaches to AI memory work by asking a model to summarize conversations and store the summaries. The problem is that summarization loses things — not just details, but the specifics that matter most when something goes wrong or when a decision needs to be revisited.
MemPalace stores conversations verbatim. Complete. No compression at the storage layer. Then it retrieves them semantically using ChromaDB, so when you ask “did we already run a price increase this quarter?” it’s searching actual conversation records, not a lossy summary of them.
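The shape of that idea fits in a few lines. This is our own toy sketch, not MemPalace’s code: a bag-of-words cosine similarity stands in for the real embeddings that ChromaDB would compute, but the storage discipline is the same — write the conversation whole, rank verbatim records at query time.

```python
import math
from collections import Counter

# Verbatim store: full conversation text, no summarization at write time.
memory_store: list[str] = []

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def remember(conversation: str) -> None:
    memory_store.append(conversation)  # stored whole, never compressed

def recall(query: str, k: int = 2) -> list[str]:
    """Return the k stored conversations most similar to the query."""
    q = _vec(query)
    ranked = sorted(memory_store, key=lambda m: _cosine(q, _vec(m)), reverse=True)
    return ranked[:k]
```

Because nothing is compressed on the way in, whatever detail turns out to matter later — the exact percentage, the exact accounts — is still there to be retrieved.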
The memory palace metaphor
The project is named after the ancient mnemonic technique — the method of loci — and the architecture maps to it in a way that’s actually useful rather than just decorative.
Memory is organized into:
- Wings: people or projects — so your Sales team’s interactions live separately from your Finance team’s
- Rooms: topic areas within a wing (renewals, pricing changes, new subscriptions)
- Halls: memory types that exist consistently across wings — facts, events, discoveries, decisions
- Tunnels: cross-references that connect related memories across different wings
For a RevOps context, this maps surprisingly well. A “renewal” room in the Sales wing would hold the history of all the renewal-related actions and conversations from that team. A “price increase” event in the Halls would record when it happened, who triggered it, and what the outcome was — and be queryable from any context.
The knowledge graph layer underneath it stores temporal relationships: who worked on what, when decisions were made, and crucially, whether a given fact is still valid. That last part matters for billing. “We applied a 5% increase in March” is a fact with a validity window. It happened. It shouldn’t happen again unless someone explicitly decides otherwise.
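As a loose sketch of how we read that hierarchy — our interpretation, not MemPalace’s actual schema — the structure plus the validity window might look like:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Fact:
    text: str
    recorded: date
    valid_until: Optional[date] = None   # None = still valid

    def is_valid(self, on: date) -> bool:
        return self.valid_until is None or on <= self.valid_until

@dataclass
class Room:
    name: str                  # topic area, e.g. "renewals"
    memories: list = field(default_factory=list)

@dataclass
class Wing:
    name: str                  # a person or project, e.g. "Sales"
    rooms: dict = field(default_factory=dict)    # room name -> Room

@dataclass
class Palace:
    wings: dict = field(default_factory=dict)    # wing name -> Wing
    halls: dict = field(default_factory=dict)    # memory type -> list[Fact]
    tunnels: list = field(default_factory=list)  # cross-wing (memory, memory) links
```

The March price increase would live in the “events” hall as a `Fact` that any wing can query, which is exactly the property the 5% scenario needs: the fact of the increase outlives the conversation it happened in.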
What this means for our agent
We’re not planning to just drop MemPalace into our stack tomorrow — but the design thinking here is directly relevant to where we’re going.
The pattern we’re moving toward is something like:
- Skill files for how the agent does things (already working well)
- Action logs for what the agent has done — timestamped, structured, queryable
- A retrieval layer that surfaces relevant history before the agent takes any significant action
That third piece is what MemPalace is solving at the general level. Before the agent upgrades a subscription, it should check whether a similar upgrade was recently processed. Before it sends a quote, it should know whether a quote already went out last week and what happened to it. Before it runs a bulk operation, it should surface when that operation last ran.
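One way to express that retrieval layer is as a guard wrapped around every significant action. This is a hypothetical sketch of the pattern, not our production code: any prior run inside the lookback window turns execution into a confirmation step.

```python
from datetime import datetime, timedelta
from typing import Callable

history: list[dict] = []   # structured action log: what ran, when, with what result

def guarded(action_name: str, lookback: timedelta = timedelta(days=30)):
    """Decorator: surface recent matching history before executing."""
    def wrap(fn: Callable):
        def inner(*args, **kwargs):
            cutoff = datetime.now() - lookback
            prior = [h for h in history
                     if h["action"] == action_name and h["at"] >= cutoff]
            if prior:
                # Don't execute -- hand the history back for confirmation.
                return {"status": "needs_confirmation", "prior": prior}
            result = fn(*args, **kwargs)
            history.append({"action": action_name, "at": datetime.now(),
                            "result": result})
            return {"status": "done", "result": result}
        return inner
    return wrap

@guarded("send_quote")
def send_quote(account: str) -> str:
    return f"quote sent to {account}"
```

A real version would match on the target of the action as well as its name (a quote to Acme shouldn’t block a quote to someone else), but the shape is the point: the history check sits in front of the action, not in the prompt.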
The wake-up context idea in MemPalace is also interesting for us — the idea that an agent can be initialized with a small, dense summary (~170 tokens) of the most critical recent facts, before it starts reaching for full conversation history. That’s a practical nod to the token cost reality: you can’t load everything, but you can load a smart summary and then fetch specifics on demand.
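The ~170-token figure is MemPalace’s; the budgeting logic below is our own guess at how such a summary gets assembled. A crude whitespace word count stands in for real tokenization, and the facts are assumed to arrive most-important-first.

```python
def wake_up_context(facts: list[str], token_budget: int = 170) -> str:
    """Pack the most critical recent facts into a small startup summary.

    Facts are assumed ordered most-important-first; word count is a
    rough stand-in for a real tokenizer.
    """
    lines, used = [], 0
    for fact in facts:
        cost = len(fact.split())
        if used + cost > token_budget:
            break  # everything past the budget is fetched on demand later
        lines.append(f"- {fact}")
        used += cost
    return "\n".join(lines)
```

The agent boots with this dense summary in context, then reaches into full conversation history only when a specific question demands it.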
Memory is what makes agents trustworthy
The thing that makes an AI agent genuinely useful in a RevOps context isn’t raw capability. Generating quotes, creating subscriptions, running bulk operations — those are solvable engineering problems and we’ve largely solved them.
What makes an agent trustworthy — the kind of thing a RevOps manager will actually rely on rather than just demo — is knowing that it understands its own history. That it won’t repeat a task that’s already been done. That it can explain why something happened. That if you ask it “did we send notices about the Q1 price change?” it can tell you yes, when, to which accounts, and what the response rate was.
That’s a memory problem. And it’s the right problem to be thinking hard about right now, before agents with real write access to billing systems become the norm rather than the experiment.
MemPalace is a good sign that the open-source community is thinking about it seriously. We’ll be paying close attention.