Large language models have become remarkably fluent. They write, summarize, code, plan, and converse with a confidence that would have been science fiction five years ago. And yet, faced with the same prompt, they give every user the same answer — calibrated against the median of a million training conversations.
That's a strange thing to optimize for. People are not medians. The questions we bring to an AI are shaped by who we are: what we know, how we think, what we're trying to do, who we are trying to become. A model that flattens those differences flattens its usefulness with them.
What "personalization" actually means.
In recommender systems, personalization means surfacing the right item from a fixed catalog. In ad-tech, it means predicting click-through against a user segment. Both are useful, both are well-studied, and neither captures what generative models can now do.
For us, personalization means generating the content itself, conditioned on the person in front of it. Not retrieval. Not selection. The actual words, characters, pacing and structure — produced fresh, for one viewer at a time.
Personalization is the difference between a model that can answer and a model that answers you.
That's a much harder problem. It requires representing users in a way models can act on; learning continuously from their actual interactions; and evaluating success against signals more nuanced than accuracy.
What we're working on.
Six problems sit at the centre of the lab.
- Representation. How do you compress a person — their preferences, their style, their intent — into a form a language model can condition on? We study both natural-language paths (search, retrieval, dynamic prompting) and embedding paths (persona vectors, cross-attention conditioning) as complementary mechanisms.
- Learning from interaction. Every conversation is a signal. We develop methods for dense, per-turn reward estimation — predicting user behaviour alongside generation, then comparing prediction to reality.
- Model-centric architecture. In most systems, the model generates text at the end of a pipeline it doesn’t control: memory, personalization, content selection, and interface are all orchestrated by code, with the model as one component among many. We put the model at the centre. It controls what it remembers, how it probes and learns about a user, what content to surface, and how the product behaves. When these capabilities live inside a single model rather than separate modules, they compound, and the system becomes end-to-end optimizable in ways that pipeline architectures cannot be.
- Evaluation at scale. Offline benchmarks routinely overestimate online impact by an order of magnitude. We build replay-based benchmarks and causal measurement infrastructure to close that gap.
- Information density over long horizons. A single 200-token reply is effectively solved. The unsolved problem is what happens over time. Today’s models contribute no new information of their own — they recombine what creators and users provide until the well runs dry. Conversations get more repetitive, more predictable, less interesting. This is entropy depletion, not capability failure. We study how to make generative systems that sustain information density across sessions measured in hours — borrowing from how novelists hold readers across thousands of chapters, how showrunners structure season-long arcs, how game designers keep players exploring after the hundredth hour. The answer is never “be more capable.” It’s always about information injection at the right pace. And what counts as “new” depends entirely on who’s reading — which is where personalization and long-horizon generation meet.
- Self-evolving systems. Millions of users interact with our product every day — each conversation a signal, each piece of creator content a source of new information. We build systems that treat this collective activity as a living knowledge base: extracting patterns from how users engage, what creators invent, what succeeds and fails — and feeding it back into the model as a continuous loop, not static training data.
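The per-turn signal described under "Learning from interaction" can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the lab's method: the action labels (`reply`, `regenerate`, `leave`), the function names, and the choice of negative log-probability as the comparison are all hypothetical. It shows only that a prediction made at generation time can be scored against what the user actually did, once per turn.

```python
import math


def surprise(predicted, observed):
    """Negative log-probability that the prediction gave to the observed action.

    `predicted` maps hypothetical action labels to non-negative weights;
    `observed` is the label of what the user actually did.
    """
    total = sum(predicted.values())
    p = predicted.get(observed, 0.0) / total
    return -math.log(max(p, 1e-12))  # floor avoids log(0) for unpredicted actions


def per_turn_signals(turns):
    """One score per turn: low when the prediction matched reality, high when it did not."""
    return [surprise(predicted, observed) for predicted, observed in turns]


# Hypothetical two-turn conversation: (predicted distribution, observed action).
turns = [
    ({"reply": 0.8, "regenerate": 0.1, "leave": 0.1}, "reply"),
    ({"reply": 0.8, "regenerate": 0.1, "leave": 0.1}, "leave"),
]
signals = per_turn_signals(turns)
```

How such a score becomes a reward, for example whether surprise is penalized or a particular observed action is rewarded, is a design choice the text above leaves open.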
Why this matters.
We believe the next generation of content won’t be selected from a catalog — it will be generated for you, in real time, shaped by who you are. Entertainment, companionship, education, creative tools — all of it becomes fundamentally better when the system knows its user. Not as a segment. As a person.
That’s what we’re building toward: AI-native content that feels crafted for each individual, at the scale of millions, sustained over time. Every research direction in this lab exists to make that real — and we validate it against a live product serving millions of users every day.
If any of this is the thing you can't stop thinking about — come work with us.
