Working with Claude Code as a Product Lead
In October 2025 I took over an entrepreneurship project that had been drifting for two months, with a mandate to use AI to drive it forward. The team that had been on it was reassigned, and I was given fractional support from a UX researcher and an engineer. Six months later, the system I have built lets me run every layer of the main project on my own, plus three adjacent tracks: an organizational training I co-authored, the portfolio behind this site, and a personal intelligence journal for AI and EdTech that keeps me informed. Recently the main project itself has shifted from one bet to a portfolio of related bets, and I will be testing whether the same system holds up.
This page is an attempt to explain that system and what working with AI on a daily basis is teaching me.
My role as a product lead keeps changing as I work, and the value of AI as a thinking partner depends on whether I can hold on to my product taste and use it to enhance my thinking rather than replace it.
A lot of what has changed is in the timescales. One year ago, a 10,000-document analysis would have taken months to make sense of; in 2026, with the help of Claude Code, the synthesis came together in a week, once I had structured the extraction properly. A market validation exercise that typically takes six to eight weeks to stand up went live in nine days, with the strategy designed in partnership with a marketing colleague and the execution shipped end-to-end on my own. Similarly, I can stand up prototypes at real fidelity with no designer support and keep iterating on them as the product gets refined. The framing I keep hearing from people working at the frontier is that the question is no longer "can we build it" but "is this worth building," and that matches my experience over the last six months.
Leading this project is an opportunity to stretch myself by constantly reading where the leverage moves and adjusting fast. For me that part of the job is fun, and the system on this page is what I have built to give myself room to do it.
The pain points
As the product lead on this project, with fractional support and multiple AI models at my disposal, I quickly realized that I needed to prioritize my time and spend my brain power on the most valuable problems. Claude Code emerged as a powerful tool that could help me organize the work and draft artifacts at a speed I could not match on my own. With the help of an engineer, I installed Claude Code in Terminal and built a folder structure that could hold all the context in the project. Then I started asking Claude Code for project management advice and ideas on how to tackle the tasks ahead efficiently. The brainstorming power was encouraging and unblocked me; I was hooked.
The first pain point I hit was scattered context: how to give Claude Code access to all the relevant information without me copying and pasting meeting notes, chat conversations, Google Drive documents, and so on. Looking for solutions, I learned about MCP connectors and was surprised to find that asking Claude Code to install them was very easy. So I thought about the systems I was using at the organization level and made a list of which ones would deliver the most value to my practice: Slack for what is moving across the org, Asana for task state, Granola for meeting transcripts, and Drive as a one-way output to share with the org. The rules I work by now (no automated deletes, manual approval on anything that touches shared systems, and dropping connectors that fight back) came from a lot of iteration and a specific bad day with a Drive sync script.
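For anyone who wants the mechanics: Claude Code reads project-scoped connectors from a .mcp.json file at the project root. Below is a minimal sketch of the shape, with illustrative server packages and a placeholder token; the Asana and Granola connectors follow the same pattern with their own commands and credentials.

{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": { "SLACK_BOT_TOKEN": "xoxb-placeholder" }
    },
    "gdrive": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-gdrive"]
    }
  }
}

The guardrails live outside this file: read access is cheap to grant, but anything that writes to a shared system stays behind Claude Code's manual approval prompts.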
After making that context available to Claude Code, I bumped into the issue of memory between sessions. Every day I worked in the terminal making decisions, drafting documents, and managing tasks; the following morning I would spend up to 30 minutes re-orienting the model on where things stood, and throughout the day I had to course-correct constantly. So I built a daily cadence: at the end of every session I run a /session-end skill that has Claude draft where we are, what got decided, what is open, and what is next. I review, adjust, and close. A CLAUDE.md config points the next morning's session at those files, so the new session picks up where the last one left off. What makes this work is keeping the ritual every day, not the specific way Claude writes things up.
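For readers who want to reproduce the cadence, the mechanics are light: Claude Code reads CLAUDE.md at the start of every session, and a custom /session-end can be defined as a markdown file under .claude/commands/ (a skill under .claude/skills/ works the same way). An illustrative sketch of both files, not my exact wording:

.claude/commands/session-end.md:
Review this session and update the files in _status/:
- current.md: where each track stands right now
- decisions.md: append any decision made today, with a one-line rationale
- session-log.md: a dated entry covering what got done and what is open
Never delete existing entries; show me the drafts before writing anything.

CLAUDE.md (excerpt):
At the start of every session, read _status/current.md, _status/decisions.md,
and the latest entry in _status/session-log.md before doing anything else.
Treat those files as the source of truth for where the project stands.

The exact wording matters less than the contract: every session ends by writing state down, and every session starts by reading it back.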
The next pain point came when the project expanded into three parallel tracks (build, market validation, and a 12-week learning pilot), and one Claude Code session could no longer hold all of them at depth. I was correcting and redirecting Claude constantly, and the bigger problem underneath was that each track needed a different kind of help: market validation needed a Claude that could think like a domain expert in that area, build needed a Claude that could advise me on code rather than just write it, and the main instance, or orchestrator, needed enough distance to see across all three. So I split the work into one scoped instance per track, each with its own files and context, and kept a main orchestrator instance with read access to the status files across all three. I route between them manually, which makes me the bottleneck and the quality filter at the same time, and that tradeoff is the part I am still working through.
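Concretely, one way to arrange this (folder names illustrative) is a directory per track, each with its own CLAUDE.md and _status/ files, with the orchestrator running at the project root and reading, never writing, each track's status:

~/Projects/[project-name]/
├── CLAUDE.md              (orchestrator: read-only view of each track's _status/)
├── _status/
├── build/
│   ├── CLAUDE.md
│   └── _status/
├── market-validation/
│   ├── CLAUDE.md
│   └── _status/
└── pilot/
    ├── CLAUDE.md
    └── _status/

Starting Claude Code inside a track directory gives that instance only its own context; routing between tracks is me changing directories, which is exactly the manual bottleneck I just described.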
Alongside these patterns, I also defined a four-phase innovation loop for my own use and applied it to this project. The framework surfaced that I had over-invested in analysis early on and pushed me into validation faster than I would have moved otherwise.
This is the foundation of the system I built to collaborate with AI, and it is what stays consistent across my practice regardless of the project I am working on. Below is the simple file architecture. I should acknowledge that other practitioners have arrived at similar patterns, which I take as a sign that the shape fits the work.
~/Projects/[project-name]/
├── CLAUDE.md
├── _status/
│ ├── current.md
│ ├── dates.md
│ ├── decisions.md
│ └── session-log.md
└── [project-specific work]

How the work gets done
After six months of experimentation, I have found that one of the hardest parts of working with AI, once you know what is worth building, is deciding how to split the work between the system and me. If I read every line that gets drafted, the time cost goes up and the speed leverage disappears; if I accept whatever comes back without checking, quality erodes. I needed to decide where I would trust Claude Code to make decisions and drive, and where I had to maintain control. Today the split looks like this: the system I built with Claude Code owns the operational layer, where speed and scale matter, and I own the product taste layer, where judgment matters. The two layers interact every day, and I have found that, just as when collaborating with humans, there is craft in knowing how to interact with the model to produce the best outcomes efficiently.
What the system owns
Holding context across sessions through status files, so every session picks up where the last one left off.
Running parallel scoped instances across tracks (build, market validation, pilot), plus a main orchestrator.
Synthesizing at scale on large document sets, transcripts, and archives.
Building functional artifacts (landing pages, dashboards, deploys, instrumentation) without engineering or design support.
Drafting first passes for stakeholder communication once I have framed the question.
Reaching into the tools the org uses (Slack, Asana, Granola, Drive).
Running operational overhead: Asana updates, daily check-ins, sprint planning, leadership briefs.
Pressure-testing my framing before stakeholders see it, through red-team agents (a sketch follows this list).
Self-auditing the operating model to catch drift between what I committed to and what I actually shipped.
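Of these, the red-team pass is the simplest to show. An illustrative sketch of the standing prompt I mean, defined the same way as /session-end above; treat the wording as an example, not a template:

.claude/commands/red-team.md:
Read the framing or document I point you at and argue against it as a
skeptical stakeholder would. Give me: the three weakest claims and why
they are weak, the evidence that would change my mind on each, and the
question I am most likely to be asked that I cannot yet answer. Do not
soften the critique.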
What I own
Knowing what's worth working on, before we work on it. The system can build many things quickly; the choice of what to build is what I bring.
Reading users for what they need, not just what they say. The peer-convergence pattern in the pilot, now load-bearing in the cohort design, came from watching what was happening, not what people said when asked.
Pre-committing decision criteria and reading evidence honestly later. I write GO, PIVOT, and KILL thresholds before data comes in. The criteria keep me honest when the read is ambiguous.
Holding a quality bar on what we ship and what we don't. What looks finished often isn't yet, and that gap is mine to read.
Translating evidence into decisions stakeholders can authorize and trust. The conversations that get to yes with leadership and partners are mine.
Picking which bets to keep alive in the portfolio. Within the Desirability, Feasibility, and Viability framework, the keep-or-kill call is its own altitude of judgment.
What I am still figuring out
What I have not yet built into the operating model is a way to think about cost. My job requires me to experiment with AI constantly, and I have access to any tool I think is worth trying. My success as a product lead is also not measured in tokens, subscriptions, or operational costs. That is a real privilege, and it is also why I have not had to figure out cost yet.
I know the variables I need to think about: API spend, tool subscriptions, and maintenance time on one side, and outcomes produced and time saved on the other. The honest gap is connecting them. I have not started instrumenting cost-per-decision or cost-per-validated-insight yet, and until I do, I cannot tell you whether this system is actually efficient or just well-resourced.
The same gap shows up at the level of portfolio decisions. The system tracks the work inside each bet, but I do not yet have a way to evaluate bets against each other in the Desirability, Feasibility, and Viability framework that the org has introduced. As more ideas emerge from the experiments running on this system, that is the next layer of judgment the operating model needs to hold.
What stays with me
After six months of running on this system, I can say the outcomes are real. Ten thousand documents synthesized in a week, once I had structured the extraction. A market test stood up in nine days with no engineering or marketing execution support. A 12-week pilot running with no operational overhead. This project plus three adjacent tracks, as a solo operator, with no overtime. Watching the project itself take portfolio shape inside all of this, and being able to adapt to that shift and stay confident about it as a lead. None of this was possible one or two years ago, not because I was slower, but because the tools did not allow it.
What this points to, and what other practitioners I read are saying out loud, is that the bottleneck in product work is no longer capacity. It is judgment about where to allocate the capacity we now have. Catherine Wu, who heads up product development for Claude Code, described the shift in a recent conversation with Lenny Rachitsky as the move from multi-quarter roadmap alignment to "the fastest way to get something out the door." Shreyas Doshi and Ravi Mehta both frame taste as the discriminating function in product work: knowing what is worth validating, reading evidence honestly even when it is uncomfortable, and being able to explain a choice in a way that holds up under pressure. AI does not replace any of that. It just makes more of it possible in the same week.
The good news is that what stays on my side is the work that needs me in the room: the call about what is worth doing, the conversation that makes a partner say yes, the moment a participant tells you something that changes the bet. The system is giving me room to make those calls thoughtfully.
Finally, I believe there is no one-size-fits-all template. The cost of building or customizing your own tools has dropped far enough that operators shipping their own solutions are getting ahead. What I built is one operating pattern for this regime, fitted to my project, my constraints, my gaps, and the tools I can access. I recognize that access to experiment with these tools is still uneven in many ways, but if you have the privilege to do so, I encourage you to get your hands dirty and try what works for your context. The more use cases we can learn from, the better choices we can all make.