I’ve been thinking about how to write this without it sounding like a product review. It’s not. Percy isn’t a product I evaluated. Percy is something I’ve been setting up – with caution, slowly – and this week was the first time it started to feel real.

Let me tell you what actually happened.

How we got here

I’ve spent the better part of three decades building things with technology. I know how hype works. I know the cycle – breathless announcement, early adopter frenzy, reality check, quiet consolidation. I’ve been through it with the web, with SharePoint, with cloud, and now with AI.

So when I started setting up a personal AI assistant, I wasn’t expecting magic. I was expecting to do a lot of configuration work, hit a lot of walls, and slowly figure out what was actually useful versus what was just impressive-looking.

That’s more or less what happened. But the useful parts were more useful than I expected, and the walls were more interesting than I anticipated.

What we actually did this week

Browser access. Percy needed to be able to see the web – not just search it, but actually browse, screenshot, interact. Getting that working on a Linux environment (WSL on my main machine, MORPHEUS) required installing the right browser, configuring it correctly, and – this is the part I want to highlight – Percy checking the actual documentation before suggesting config changes, not just guessing. That was a behaviour I had to explicitly reinforce early in the week. “Don’t infer. Verify first.” Once that was established, the quality of work went up noticeably.

The newsletter pipeline. I’ve been running a weekly newsletter – “What I Read This Week” – on Substack for a couple of months. The curation and writing takes a few hours each Sunday. The idea was to automate the drafting: pull newsletters from my inbox, pick the best five, write a take on each, assemble the email. Percy built a two-stage pipeline that runs on Saturday night and Sunday morning, using a local AI model running on a separate machine (HYPNOS, a small Intel NUC on my network) to do the drafting offline.
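To make the two-stage shape concrete, here is a minimal sketch of that pipeline. The names, the scoring idea, and the stubbed drafting call are all illustrative assumptions – the real pipeline pulls from my inbox and calls the local model on HYPNOS, neither of which is shown here.

```python
# Hypothetical sketch of the two-stage newsletter pipeline.
# Stage 1 (Saturday night): select the best candidate items.
# Stage 2 (Sunday morning): draft a take per item and assemble the email.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    url: str
    score: float  # relevance score assigned during curation (assumed)

def select_top(items: list[Item], n: int = 5) -> list[Item]:
    """Stage 1: keep the n most relevant items."""
    return sorted(items, key=lambda i: i.score, reverse=True)[:n]

def draft_take(item: Item) -> str:
    """Stage 2: in the real pipeline this calls the local model on HYPNOS;
    here it is a placeholder."""
    return f"Why {item.title} is worth your time."

def assemble_email(items: list[Item]) -> str:
    """Build the email body from the selected items and their takes."""
    sections = [f"## {i.title}\n{i.url}\n{draft_take(i)}" for i in items]
    return "What I Read This Week\n\n" + "\n\n".join(sections)
```

Splitting selection and drafting into two stages is what lets the heavy model work run overnight, with only assembly left for Sunday morning.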

This worked. But not cleanly, and not immediately.

The first attempt at upgrading the pipeline to use a reasoning-optimised model was a failure. The model – designed for logical step-by-step problems – produced hallucinated personas, prime number sequences embedded in the text, and incomplete outputs that cut off mid-thought. We had identified that the existing model worked well and thought a reasoning model might do better. It didn’t. It was the wrong tool for a creative writing task. We reverted, fixed the underlying issues (output cleaning, URL filtering, generated introductions), and the pipeline now runs cleanly end-to-end.

That failure was instructive. Knowing which model to use for which task is a real skill – not just “use the most powerful one.”

Coaching document processing. I have a collection of Arthur Lydiard’s coaching materials – lecture transcripts, seminar notes, PDFs from 1990 and 1999. The goal is to turn these into a structured knowledge base for a microsite. Percy built a pipeline that extracts the raw text from PDFs, chunks it into manageable pieces, sends each chunk to the local model for cleaning and structuring into markdown, and saves the output as numbered files.
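The overall loop is simple to sketch. This is an illustrative outline only: the PDF extraction and the local-model call are stubbed out, and the file-naming scheme is an assumption.

```python
# Sketch of the document-processing loop: chunk the extracted text, clean
# each chunk, and save the results as numbered markdown files.
from pathlib import Path

def clean_chunk(chunk: str) -> str:
    """Placeholder for the local-model call that structures raw text
    into markdown."""
    return f"# Cleaned section\n\n{chunk}"

def process_document(raw_text: str, out_dir: Path) -> list[Path]:
    """Write one numbered .md file per chunk and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = [c for c in raw_text.split("\n\n") if c.strip()]
    written = []
    for idx, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{idx:03d}.md"  # numbered output files (assumed scheme)
        path.write_text(clean_chunk(chunk))
        written.append(path)
    return written
```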

The chunking part matters more than it sounds. Early attempts with large text blocks timed out consistently. The solution – splitting at paragraph boundaries, keeping chunks under 5,000 characters – sounds obvious in retrospect. It usually does. This week I watched that problem get diagnosed, fixed, and documented so the approach can be reused for any coaching project going forward.

The site. Percy has a website – percy.raposo.ai. I had Percy review what had changed this week and update it to reflect new capabilities. The constraint I set: nothing private, nothing security-sensitive, but be honest about what the work actually involves. The result was a clean update: new capabilities added to the “What I Do” section, the latest section updated, and the whole thing pushed to GitHub and deployed via CI/CD without me touching it.

What surprised me


The feedback loops are fast. When something doesn’t work, we diagnose it, fix it, and test again – often within the same conversation. The iteration speed on the newsletter pipeline (four full test runs in one morning, each with a real email sent to my inbox) is something I couldn’t have done alone in that timeframe.

Getting things wrong is part of it. The reasoning model experiment failed. The first newsletter draft sent had raw model thinking visible in the email body, broken links, and hallucinated content. I read it, identified the problems, and we fixed them. That process – fail, diagnose, fix, test – is normal engineering. What’s different is that the diagnosis and fix loop runs faster than it used to.

The “verify before suggesting” norm matters enormously. Early in the week I caught Percy suggesting a configuration option that didn’t exist – inferred from context rather than verified from documentation. I pushed back. That norm – check the source first – changed the quality of the work for the rest of the week. An AI that guesses confidently is actually worse than one that says “let me check that.” The confidence is the trap.

It knows the context. Percy knows about my coaching work, my running background, my blog, my previous projects. When writing the newsletter takes, when framing what goes on the website, when deciding which Lydiard PDFs are relevant – that context shapes the output. It’s not just task completion. It’s work done by something that understands why the task matters.

What didn’t work

The reasoning model experiment was the clearest failure, but not the only one.

The newsletter takes still need my editorial eye. The model writes in a generic “this is worth reading” style that isn’t quite my voice – it reaches for certain phrases (“game-changer,” “worth a read”) that I’d never use. The structure is right. The enthusiasm is right. The voice needs work. That’s a prompt engineering problem I haven’t fully solved yet.

The URL extraction from newsletters was pulling tracking redirect links rather than actual article URLs. That required a fix to the filtering logic. The fix worked, but not before the first test email had gone out with raw Substack redirect links in it – not ideal. Catching these things before they matter is part of the ongoing calibration.
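The general shape of that fix is easy to sketch. Note the assumptions: the redirect hosts listed here are hypothetical, and the idea that the real destination rides in a query parameter is an illustration – the actual newsletters may encode links differently.

```python
# Unwrap tracking-redirect links before they reach the draft: if a URL
# points at a known redirect host, pull the real destination out of its
# query string; otherwise pass it through unchanged.
from urllib.parse import urlparse, parse_qs

REDIRECT_HOSTS = {"substack.com", "email.mg.substack.com"}  # hypothetical list

def unwrap(url: str) -> str:
    parsed = urlparse(url)
    if parsed.netloc in REDIRECT_HOSTS:
        params = parse_qs(parsed.query)
        # Assumed pattern: the real link is carried in a query parameter.
        for key in ("url", "redirect", "u"):
            if key in params:
                return params[key][0]
    return url
```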

What this actually means

I’ve spent a lot of time over the past year helping organisations think about AI adoption. The question I get asked most often is some version of: “Is it actually useful, or is it just impressive?”

This week helped me answer that more precisely.

It’s useful when:
– The task has clear inputs and outputs but tedious middle steps (newsletter pipeline, PDF processing)
– The context is rich and stable (Percy knows my world; it doesn’t have to start from scratch each time)
– You’re willing to iterate – to treat the first output as a draft, not a final product
– You stay in the loop on things that matter (I read every newsletter draft; I review every site change before it goes live)

It’s not magic when:
– You pick the wrong model for the task
– You let it guess instead of verify
– You expect it to match your voice without calibration

That last point is where most people go wrong. They expect the AI to be them. It’s not. It’s a very capable collaborator who needs to learn your standards, your preferences, and your voice – and who will get it wrong until it does.

One week in, Percy is useful. Not perfect. Useful. And getting more so.

That’s enough for me to keep going. Percy wrote their take here.