Tobi Lütke’s recent article on Shopify’s AI agent, River, puts language around something strong product environments have always understood: people learn faster when the work is visible.
River lives inside Shopify’s Slack. It can read code, run tests, open pull requests, query data and inspect production traces. The important design choice is not the tool list; it is that River works in public channels.
That means prompts, corrections, reasoning paths and outputs sit where other people can search them, challenge them, reuse them and learn from them.
Lütke connects this to the German idea of a Lehrwerkstatt: a teaching workshop where the shop floor becomes the classroom. That framing matters for any product team trying to work with AI agents without losing the judgement layer that makes teams better over time.
AI can make individuals faster while making organisations worse at learning.
TL;DR
Public AI agent work gives product teams a visible record of how judgement forms before decisions become process.
Private agent threads make one operator faster, but they hide the reasoning, corrections, dead ends and context that help a company learn. Public agent work creates a shared surface for review, reusable memory and sharper decision quality.
For product leaders, the operating question is practical: which investigations, definitions, review rules and repeated workflows should become observable enough for people and agents to learn from them?
In most companies the default runs the other way. A memo gets shaped in ChatGPT. A PRD gets iterated in Claude or Cursor. A campaign analysis happens in a private thread. A personal agent synthesises research, feedback and recurring workflows with privately developed skills. The finished artefact re-enters the company later, but the thinking that produced it has disappeared.
Outputs survive. Judgement goes missing.
Strong teams expose judgement before it becomes process
The best environments I have worked in taught through exposure.
You heard the sales objection that never made it into the CRM notes. You saw an engineer reject a small reporting fix because the actual issue was a broken event definition. You watched product, growth, design and engineering argue over the same drop-off until the team found the constraint none of them could see alone.
That kind of learning does not transfer cleanly through onboarding docs.
It happens because the work is close enough for people to watch each other reason before the answer exists. A product manager sees how a senior engineer narrows an incident. A growth operator sees which product claims cannot survive contact with activation data. A founder hears the customer phrase that makes the current positioning feel overbuilt.
This is one reason healthy start-up environments can develop judgement quickly.
The work is harder to hide. People cross job descriptions because the problem demands it. Customer pain, commercial pressure and technical constraint collide before they get sanitised into process.
That collision develops taste.
It also explains why public channels are not just a Slack preference. They are an operating choice. Public work creates a record of how decisions formed, which evidence mattered, and where a team changed its mind.
Speed comes from low-latency correction
I used to reflect on patrickcollison.com/fast regularly, and it has come back to mind because it points at the same underlying force from another angle.
Fast projects move when functions blend.
Amazon Prime and the iPod moved quickly because product judgement, technical constraint, operational reality, commercial pressure and executive taste were held in a tight loop. The lesson is not that every team should copy those exact examples. The useful pattern is proximity.
The person who sees the problem, the person who understands the constraint, the person who can change the system, and the person who can say yes need to correct each other while the decision is still alive.
Slow organisations add translation layers between those people.
A product team writes the brief. Engineering finds the hidden constraint later. Growth discovers the activation issue after launch. Sales repeats an objection for weeks before it becomes product evidence. Leadership reviews the polished version of the problem once the useful mess has already been removed.
AI agents can compress those loops or create a new kind of private sprawl.
The difference depends on where the work happens and what record it leaves behind.
River points to a better AI default
River working in public accelerates organisational learning because it makes agent work observable.
An agent investigating activation should leave behind more than a chart. The valuable residue is the event trail, the disputed definition, the killed hypotheses, and the instruction saved for next time.
An agent reviewing creative should leave behind more than a list of hooks. The valuable residue is the rejected angles, the winning pattern, the evaluation rule, and the examples that now define good for that channel.
An agent helping during an incident should leave behind more than a patch. The valuable residue is how the search narrowed, which traces mattered, which theory failed, and why the final fix was considered safe.
Product managers should pay attention to this because agent adoption is not just a tooling question. It changes how product context, evidence and decision quality move through the company.
When the agent thread is private, only the operator learns.
When the agent thread is public, the team gains a reusable learning surface.
Memory becomes the operating layer
This is where agent memory starts to matter.
A public thread becomes episodic memory: what happened, what was tried, what failed, and what changed after the team saw the evidence.
Definitions become semantic memory: what activation means, which events are trusted, how the company defines a qualified lead, which creative claims are allowed, what production-ready means for a specific system.
Saved instructions become procedural memory: how the team investigates, ships, reviews, escalates and decides.
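The three memory types above can be kept deliberately separate rather than dumped into one context blob. A minimal sketch of what a team-level store might look like, with all names hypothetical and no claim about how Shopify actually implements River:

```python
from dataclasses import dataclass, field

@dataclass
class TeamMemory:
    """Illustrative split of agent memory into the three types described above."""
    episodic: list = field(default_factory=list)    # what happened: threads, outcomes
    semantic: dict = field(default_factory=dict)    # shared definitions the team trusts
    procedural: dict = field(default_factory=dict)  # saved instructions and workflows

    def record_episode(self, summary: str, outcome: str) -> None:
        # Log a public thread: what was tried and what changed afterwards.
        self.episodic.append({"summary": summary, "outcome": outcome})

    def define(self, term: str, meaning: str) -> None:
        # Store an agreed definition, e.g. what "activation" means here.
        self.semantic[term] = meaning

    def save_procedure(self, name: str, steps: list) -> None:
        # Capture a repeatable workflow for agents to inherit next time.
        self.procedural[name] = steps

memory = TeamMemory()
memory.define("activation", "user completes first checkout within 7 days")
memory.save_procedure("incident_review", ["collect traces", "rank theories", "test fix"])
memory.record_episode("activation drop investigation",
                      "event definition fixed, metric rebased")

print(memory.semantic["activation"])
```

The point of the separation is ownership: definitions and procedures can be reviewed and versioned on their own, while the episodic log stays an append-only record of public threads.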
That is more useful than treating memory as a convenience feature for remembering user preferences. In product teams, memory becomes part of the operating system. It reduces repeated explanation, protects hard-won context, and lets agents inherit the company’s way of working instead of starting from a blank prompt every time.
I wrote more about this in Agents Are Killing the Handoff PM: product work is moving towards context quality, decision rights, evidence loops, reusable skills and the review of agent output. Public agent work makes that shift easier to manage because the context is not trapped in private conversations.
The practical question for product leaders is simple: which parts of the team’s judgement should become reusable memory?
Good candidates include metric definitions, research synthesis rules, product principles, escalation paths, QA standards, creative evaluation rules, launch checklists, customer-segment notes, known non-goals and examples of work the team trusts.
Weak candidates include vague style preferences, overfitted prompts, old decisions without context, and private shortcuts that nobody has reviewed.
Agent memory needs ownership. Otherwise the company turns scattered conversations into stale context and calls it infrastructure.
Product teams need visible agent work
Product managers do not need to copy Shopify’s exact River implementation to apply the lesson.
The useful move is to stop treating AI work as a private drafting environment by default.
A product channel can keep activation investigations in view. A growth channel can review agent-generated creative analysis publicly. An engineering channel can let incident investigation show its reasoning path. A leadership channel can use agents to summarise trade-offs while preserving which assumptions changed.
The team then builds a working record of its judgement.
That record helps new hires learn faster. It gives junior people examples of senior reasoning. It makes repeated workflows easier to codify. It exposes weak definitions before they spread. It gives agents better context for the next task.
The company also gets a sharper review loop.
A private agent answer can look impressive while hiding bad assumptions. A public agent thread invites correction earlier. Someone can challenge the data source, add customer context, spot a missing constraint, or turn a good workflow into a reusable skill.
This is the real value of working in public with agents.
The company does not only get more output. It gets a better trail.
The product manager’s role gets more serious
Public AI agent work raises the bar for product managers.
The job is no longer protected by owning the ticket queue or translating between functions. Agents can already draft, search, cluster, summarise, prototype and inspect. The durable work is deciding what context matters, which evidence deserves belief, which outputs are safe, and which repeated judgement should become part of the system.
That requires product managers to work closer to the operating layer.
They need to define metric meanings before agents query the data. They need to write product principles clearly enough for humans and machines to use. They need to decide when a workflow deserves automation and when it needs senior review. They need to turn strong examples into reusable instructions without flattening judgement into formula.
AI makes weak product judgement more expensive because weak judgement can now travel faster.
A private assistant makes one person quicker. A public agent workspace can make the company smarter, provided the team treats the thread as a record worth maintaining.
That is the lesson I take from River.
Make the work observable. Let people learn from the reasoning. Give the agent enough context to inherit the company’s taste.