How to replace yourself with AI? (part one)
I'm on a crazy quest to replicate myself with context engineering, RAG and vector embeddings. Building the concept and the architecture behind the hypothesis.
Many people now flock to AI for mentorship.
Let’s be honest: when it comes to professional advice, most LLMs suck.
They’re decent at handling isolated tasks, but when a decision requires broader context, they will hallucinate a response by making guesses.
Those guesses might look deceptively appropriate, but they hide logical inconsistencies.
This happens because, well, LLMs do not know your full context.
They don’t know how you make decisions, your team, your colleagues, your cultural values, your business specifics and so on. Simply put, much of that context has never been digitized.
This has been the biggest barrier to agentic adoption, at least in our experience at BirMarket. As much as we’ve tried to AI-fy certain roles (e.g. the analytical part of category management), we’ve ended up with surface-level insights. If you feed surface-level data as your context, that’s what you’ll get.
Then I asked myself a crazy question - what if I were to digitize as much of my personal context as possible?
Track all my goals, my actions, my calendar, written decisions and even transcripts of discussions I’ve participated in.
What if I were to continuously update and store all the context that I physically can? Then create an index and query this database via RAG. Think of it as a “search engine of me” for LLMs.
How would talking to an LLM change? Would it still hallucinate, or would I be able to rely on it as my mentor? Would it be able to make decisions that I can personally rely on?
In the extreme, would it be able to make decisions that I would have taken myself?
So I’ve decided to figure it out.
🧬 Today’s Free Article
Mapping your personal context. How to digitize the CPO’s footprint.
Types of LLM hallucinations. Factual, temporal, context deviation and other hallucinations that we need to mitigate.
Building your AI self. Full architecture, RAG, data storage approach and process.

Mapping your context
These are the things I do regularly, grouped on a MECE basis.
Hire and develop talent; define annual strategy; empower execution; monitor KPIs and course-correct; present monthly updates to the board.
That’s pretty much the essence of a CPO: create the context in which a product team can thrive and deliver its best results. Oh, I almost forgot - I also resolve conflicts between teams.
At a high level, that’s about 95% of my role. The next step is to digitize this context end-to-end and store it in a way that reduces the risk of AI hallucinations.
That means every single task should have a digital footprint: a performance review needs a doc, a meeting needs a transcription, and so on.
There’s a caveat, though. Much of my decision-making draws on global context (my education and experience), and that cannot be fully captured by the local context alone (my current scope of work).
You can’t run a product review and replicate my approach if you haven’t gone through the same experiences I have.
But what if I’m wrong? What if a strong foundation LLM can outpace my experience and education and make better decisions?
The other question is how to evaluate “better”, but that’s a topic for a different article.
The types of LLM hallucinations
If we’re going to collect such varied context, we need to ensure we store it accurately - meaning with all the relevant metadata attached.
To figure this out, we need to understand which types of hallucinations are the most prevalent.
There is a decent study on hallucination taxonomy in LLMs that I decided to refer to.
A model can make factual errors, invent facts, diverge from the context, make time-sensitive errors, add false ethical remarks, or even descend into nonsensical rambling.
The authors make a strong claim: any computable LLM will hallucinate, and eradicating this behavior is impossible. There are only reduction strategies.
Just reading this makes me question the sanity of my initial hypothesis.
On the other hand, there are architectural guardrails that can substantially reduce this behavior. Among the top ones, the authors list retrieval grounding, or simply RAG.
So intrinsic and extrinsic hallucinations can be reduced by providing the right context. Temporal errors can be dialed down with timestamp metadata (each of my records needs a timestamp).
Other logical and contextual errors can only be mitigated with a fact-checker agent.
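To make that last point concrete, here is a minimal sketch of what such a fact-checker pass could look like: a second model call that receives the draft answer plus the retrieved snippets and flags claims the snippets don’t support. The prompt wording and model name are my own illustrative assumptions, not a finished design.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def fact_check(answer: str, snippets: list[str]) -> str:
    """Ask a second model pass to verify a draft answer against the retrieved context."""
    context = "\n\n".join(snippets)
    prompt = (
        "You are a fact-checker. Below is a draft answer and the only context "
        "it is allowed to rely on. List every claim in the answer that is NOT "
        "supported by the context, or reply 'SUPPORTED' if everything checks out.\n\n"
        f"CONTEXT:\n{context}\n\nDRAFT ANSWER:\n{answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```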
Building your AI self
Architecture
This is not something groundbreaking or new. This is a classic vector search index approach with a RAG twist and unconventional context.
For simplicity’s sake, I’ll store all of my written context in a single Google Sheet dump.
This dump would then be split into text chunks and converted into vector embeddings. There are many ways to execute this in n8n (a RAG template, the Simple Vector Store node, or the Supabase nodes).
Those vector embeddings would be reindexed every time a new data point is added. For the index itself, I’ll be using Supabase storage.
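Outside of n8n, the same indexing step can be sketched in a few lines of Python. This is only a sketch under my own assumptions: the table name `context_chunks`, the chunk size, and the embedding model are placeholders, and the Supabase table would need a pgvector `embedding` column.

```python
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()
supabase = create_client("https://<project>.supabase.co", "<service-role-key>")


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size character chunking with a small overlap between chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def index_document(doc_id: str, text: str, timestamp: str) -> None:
    """Embed each chunk and store it in a Supabase table with a pgvector column."""
    for i, chunk in enumerate(chunk_text(text)):
        embedding = openai_client.embeddings.create(
            model="text-embedding-3-small", input=chunk
        ).data[0].embedding
        supabase.table("context_chunks").insert({
            "doc_id": doc_id,
            "chunk_index": i,
            "content": chunk,
            "embedding": embedding,
            "timestamp": timestamp,  # kept on every chunk to reduce temporal errors
        }).execute()
```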
Then, when a query is made to the LLM, the vector retriever looks up semantically relevant snippets in the search index and returns 5-8 entries. Those entries are passed to the LLM with an instruction to “strictly follow them”.
This slicing into snippets helps limit token use by pinpointing only the most relevant context entries. Put simply, you don’t want to bombard your LLM with the whole context dump; it will just choke and hallucinate, at a very high cost.
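The retrieval side could look like the sketch below. It assumes a `match_documents` similarity function exists on the Supabase side (the pattern from Supabase’s pgvector examples) and that it returns the stored `content` and `timestamp` columns; the function name, top-k value, and prompt wording are all assumptions.

```python
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()
supabase = create_client("https://<project>.supabase.co", "<service-role-key>")


def answer(query: str, top_k: int = 6) -> str:
    # Embed the query with the same model used for the index.
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    # Retrieve the 5-8 most similar snippets via an assumed pgvector RPC.
    matches = supabase.rpc("match_documents", {
        "query_embedding": query_embedding,
        "match_count": top_k,
    }).execute().data

    # Prefix each snippet with its timestamp so the model can reason about time.
    snippets = "\n\n".join(f"[{m['timestamp']}] {m['content']}" for m in matches)

    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer strictly from the snippets below. "
                        "If they don't cover the question, say so."},
            {"role": "user", "content": f"SNIPPETS:\n{snippets}\n\nQUESTION: {query}"},
        ],
    )
    return response.choices[0].message.content
```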
If I were to wrap this whole process into a metaphor, I’d think of a librarian. My work context is the library. When a guest comes in asking for books, the librarian goes into the database and pulls only the snippets that match the requested authors and titles.
Storing the context
Google Sheet dump
I’ve prepared a Google Sheet with five tabs inside, each representing an area of my work:
Hiring and Headcount. Here I’ll store all my interview transcriptions and written assessments.
People Management. A place for all my 1-1 meeting transcriptions, written agendas, meeting notes, and performance review assessments of each of my direct reports. I’ll also add my direct reports’ CVs exported from LinkedIn.
Planning and Strategy. I’ll share transcriptions of each OKR review meeting alongside my personal written weekly goals.
Operating Run Tasks and Meetings. Transcriptions of all the recurring run meetings, plus all the written notes and follow-ups after each one.
Governance and Stakeholders. Transcriptions of the monthly performance review meetings alongside the written notes.
Each tab will contain eight columns (a compact sketch of one record follows this list):
id, type of the document, meeting type, topic, project name, people involved, raw data, timestamp
Where:
id - an incrementally increasing number,
type of the document - “meeting, notes, assessment etc.”,
meeting type - “performance review, 1-1, weekly performance review etc.”,
topic - the actual topic of discussion,
project name - the name of the project, obviously (if there is one),
people involved - a list of names of everyone involved in the meeting or the document,
raw data - a written dump of the transcript (a single Google Sheets cell can hold up to 50k characters, enough for a one-hour meeting),
timestamp - when the meeting happened or the data was shared.
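As a sanity check on the schema, here is what one record could look like in code. The field names simply mirror the eight columns above; the example values (the project, the person, the date) are made up for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class ContextRecord:
    """One row of a context tab, mirroring the eight columns above."""
    id: int
    document_type: str    # meeting, notes, assessment, etc.
    meeting_type: str     # performance review, 1-1, weekly review, etc.
    topic: str
    project_name: str     # empty string if there is no project
    people_involved: str  # comma-separated names
    raw_data: str         # transcript dump, up to ~50k characters per cell
    timestamp: str        # when the meeting happened or the data was shared

    def to_row(self) -> list:
        """Flatten into the list form a Sheets append call expects."""
        return list(astuple(self))


# Illustrative record; every value here is a placeholder.
record = ContextRecord(
    id=42,
    document_type="meeting",
    meeting_type="1-1",
    topic="Q3 growth experiments",
    project_name="Checkout revamp",
    people_involved="Me, Jane Doe",
    raw_data="<transcript text>",
    timestamp="2025-06-12T15:00:00",
)
```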
Data sensitivity
I’m cautious and will:
(1) avoid sharing any sensitive private data (a rough scrub sketch follows below);
(2) test this on the corporate instance of ChatGPT that we have access to internally.
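For point (1), here is the rough shape of a pre-upload scrub: a couple of regexes that mask obvious emails and phone numbers before anything leaves my machine. This is an illustrative sketch only; real PII handling would need something far more thorough.

```python
import re

# Very rough patterns; illustrative only, not a complete PII solution.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Mask obvious emails and phone numbers before the text is uploaded anywhere."""
    text = EMAIL.sub("[email removed]", text)
    text = PHONE.sub("[phone removed]", text)
    return text
```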
Recording the context
To record online meetings, I’ll be using the Teams transcription function.
For offline meetings, I’ve pre-installed a Whisper app that converts speech to text.
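For reference, the same speech-to-text step can be done with the open-source `whisper` Python package (which is what most of the desktop apps wrap). The model size and file name below are arbitrary assumptions; the package also needs ffmpeg installed locally.

```python
import whisper  # the open-source openai-whisper package; requires ffmpeg

model = whisper.load_model("base")  # "small" or "medium" trade speed for accuracy
result = model.transcribe("2025-06-12_1on1.m4a")  # file name is just an example
print(result["text"])  # the raw transcript that goes into the "raw data" column
```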
I’ll inform all the meeting participants that the conversation is going to be transcribed.
Appending the context
On a weekly basis, I’ll dump all of the newly recorded meetings and notes into the context storage.
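That weekly append can be a tiny script. The sketch below uses the `gspread` library; the spreadsheet title, tab name, and service-account setup are assumptions for illustration.

```python
import gspread

# Assumes a Google service account that has been shared on the spreadsheet.
gc = gspread.service_account(filename="service_account.json")
tab = gc.open("Personal Context Dump").worksheet("People Management")


def append_record(row: list) -> None:
    """Append one context record (the eight columns) to the bottom of the tab."""
    tab.append_row(row, value_input_option="RAW")
```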
Next Steps
For the context to be meaningful, I’ll need to dedicate the whole month to data collection.
Meanwhile, I’ll be working out the next steps in the technical architecture.
Figuring out how to best convert the context into embeddings, how to chain the entire workflow in n8n, or maybe even plug in an agent with memory that links directly to my spreadsheet.
The last step in the process would be to create evals: I’ll produce multiple outputs (one mine, another LLM-based), ask my team to assess the quality of each against a set of criteria, and have them guess which one belongs to whom.
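One way to run that blind comparison is sketched below: shuffle the author/LLM labels for each pair, collect the reviewers’ guesses, and check how often they spot the LLM. The data structures and the reviewer callable are placeholders; a guess rate near 50% would mean the AI self is indistinguishable from me.

```python
import random

# Each pair holds the same task answered by me and by the LLM-based "AI self".
pairs = [
    {"task": "Summarize the OKR review", "mine": "<my summary>", "llm": "<llm summary>"},
]


def run_blind_eval(pairs, reviewers):
    """Return the share of guesses that correctly identified the LLM output."""
    correct, total = 0, 0
    for pair in pairs:
        options = [("mine", pair["mine"]), ("llm", pair["llm"])]
        random.shuffle(options)  # reviewers only see "A" and "B", never the labels
        llm_position = "A" if options[0][0] == "llm" else "B"
        for reviewer in reviewers:
            # In practice this would be a form; here it is a stand-in callable
            # that takes the task plus both texts and returns "A" or "B".
            guess = reviewer(pair["task"], options[0][1], options[1][1])
            correct += int(guess == llm_position)
            total += 1
    return correct / total
```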