[HN Gopher] How I use Claude Code: Separation of planning and ex...
___________________________________________________________________
How I use Claude Code: Separation of planning and execution
Author : vinhnx
Score : 741 points
Date : 2026-02-22 00:29 UTC (15 hours ago)
HTML web link (boristane.com)
TEXT w3m dump (boristane.com)
| zitrusfrucht wrote:
| I do something very similar, also with Claude and Codex, because
| the workflow is controlled by me, not by the tool. But instead of
| plan.md I use a ticket system basically like
| ticket_<number>_<slug>.md where I let the agent create the ticket
| from a chat, correct and annotate it afterwards and send it back,
  | sometimes to a new agent instance. This workflow helps me keep
  | track of what has been done over time in the projects I work on.
  | Also, this approach doesn't need any "real" ticket system
  | tooling/mcp/skill/whatever since it works purely on text files.
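A minimal sketch of the plain-text ticket workflow described above. The file-naming scheme follows the comment (`ticket_<number>_<slug>.md`), but the function names and the numbering logic are illustrative, not the commenter's actual tooling:

```python
# Hypothetical helper for a plain-text ticket workflow: each ticket is
# just a markdown file named ticket_<number>_<slug>.md in one directory.
from pathlib import Path

def next_ticket_number(ticket_dir: Path) -> int:
    """Scan existing ticket_<number>_<slug>.md files and return the next number."""
    numbers = []
    for f in ticket_dir.glob("ticket_*_*.md"):
        try:
            numbers.append(int(f.name.split("_")[1]))
        except ValueError:
            continue  # ignore files that don't follow the naming scheme
    return max(numbers, default=0) + 1

def create_ticket(ticket_dir: Path, slug: str, body: str) -> Path:
    """Write a new ticket file that an agent (or a human) can pick up later."""
    ticket_dir.mkdir(parents=True, exist_ok=True)
    number = next_ticket_number(ticket_dir)
    path = ticket_dir / f"ticket_{number:04d}_{slug}.md"
    path.write_text(f"# Ticket {number}: {slug}\n\n{body}\n")
    return path
```

Because the tickets are ordinary files, the history of what was done lives in the repo itself, with no ticket-system MCP server or external tool required.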
| gbnwl wrote:
| +1 to creating tickets by simply asking the agent to. It's
| worked great and larger tasks can be broken down into smaller
| subtasks that could reasonably be completed in a single context
    | window, so you rarely ever have to deal with compaction.
| Especially in the last few months since Claude's gotten good at
| dispatching agents to handle tasks if you ask it to, I can plan
    | large changes that span multiple tickets and tell Claude to
| dispatch agents as needed to handle them (which it will do in
| parallel if they mostly touch different files), keeping the
| main chat relatively clean for orchestration and validation
| work.
| ramoz wrote:
  | Semantic plan names are important.
| srid wrote:
| Regarding inline notes, I use a specific format in the `/plan`
  | command, by using the `ME:` prefix.
|
| https://github.com/srid/AI/blob/master/commands/plan.md#2-pl...
|
  | It works very similarly to Antigravity's plan document comment-
| refine cycle.
|
| https://antigravity.google/docs/implementation-plan
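Based on the description above (the exact format lives in the linked plan.md), an inline-annotated plan might look something like this; the plan content itself is invented for illustration:

```markdown
## Step 2: Add retry logic to the sync worker

- Wrap the HTTP call in an exponential-backoff retry (3 attempts)
  ME: cap the backoff at 30s, and make the attempt count configurable
- Log each retry at WARN level
  ME: INFO is enough here; WARN will spam our alerting
```

The `ME:` lines mark human feedback directly inside the plan, so the agent can distinguish its own draft from the reviewer's corrections on the next iteration.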
| renewiltord wrote:
  | The plan document and todo are artifacts of context size
| limits. I use them too because it allows using /reset and then
| continuing.
| ihsw wrote:
| Kiro's spec-based development looks identical.
|
| https://kiro.dev/docs/specs/
|
| It looks verbose but it defines the requirements based on your
| input, and when you approve it then it defines a design, and
| (again) when you approve it then it defines an implementation
  | plan (a series of tasks).
| jamesmcq wrote:
| This all looks fine for someone who can't code, but for anyone
| with even a moderate amount of experience as a developer all this
| planning and checking and prompting and orchestrating is far more
| work than just writing the code yourself.
|
  | There's no winner for "least amount of code written regardless
  | of productivity outcomes", except for maybe Anthropic's bank
  | account.
| dmix wrote:
| Most of these AI coding articles seem to be about greenfield
| development.
|
| That said, if you're on a serious team writing professional
| software there is still tons of value in always telling AI to
| plan first, unless it's a small quick task. This post just
| takes it a few steps further and formalizes it.
|
| I find Cursor works much more reliably using plan mode,
| reviewing/revising output in markdown, then pressing build.
| Which isn't a ton of overhead but often leads to lots of
| context switching as it definitely adds more time.
| shepherdjerred wrote:
| I really don't understand why there are so many comments like
| this.
|
| Yesterday I had Claude write an audit logging feature to track
| all changes made to entities in my app. Yeah you get this for
| free with many frameworks, but my company's custom setup
| doesn't have it.
|
| It took maybe 5-10 minutes of wall-time to come up with a good
    | plan, and then ~20-30 min for Claude to implement, test, etc.
|
| That would've taken me at least a day, maybe two. I had 4-5
| other tasks going on in other tabs while I waited the 20-30 min
| for Claude to generate the feature.
|
    | After Claude generated the code, I needed to manually test that it
| worked, and it did. I then needed to review the code before
| making a PR. In all, maybe 30-45 minutes of my actual time to
| add a small feature.
|
| All I can really say is... are you sure you're using it right?
| Have you _really_ invested time into learning how to use AI
| tools?
| tyleo wrote:
| Same here. I did bounce off these tools a year ago. They just
| didn't work for me 60% of the time. I learned a bit in that
| initial experience though and walked away with some tasks
| ChatGPT could replace in my workflow. Mainly replacing
| scripts and reviewing single files or functions.
|
| Fast forward to today and I tried the tools again--
| specifically Claude Code--about a week ago. I'm blown away.
| I've reproduced some tools that took me weeks at full-time
| roles in a single day. This is while reviewing every line of
| code. The output is more or less what I'd be writing as a
| principal engineer.
| delusional wrote:
| > The output is more or less what I'd be writing as a
| principal engineer.
|
| I certainly hope this is not true, because then you're not
| competent for that role. Claude Code writes an absolutely
        | incredible amount of unnecessary and superfluous comments, and
        | it makes asinine mistakes like forgetting to update logic
| in multiple places. It'll gladly drop the entire database
| when changing column formats, just as an example.
| tyleo wrote:
| I'm not sure what you're doing or if you've tried the
| tools recently but this isn't even close to my
| experience.
| streetfighter64 wrote:
| I mean, all I can really say is... if writing some logging
| takes you one or two days, are you sure you _really_ know how
| to code?
| shepherdjerred wrote:
| You're right, you're better than me!
|
        | You could've been curious and asked why it would take 1-2
        | days, and I would've happily told you.
| jamesmcq wrote:
| I'll bite, because it does seem like something that
| should be quick in a well-architected codebase. What was
| the situation? Was there something in this codebase that
| was especially suited to AI-development? Large amounts of
| duplication perhaps?
| shepherdjerred wrote:
| It's not particularly interesting.
|
| I wanted to add audit logging for all endpoints we call,
| all places we call the DB, etc. across areas I haven't
| touched before. It would have taken me a while to track
| down all of the touchpoints.
|
| Granted, I am not 100% certain that Claude didn't miss
| anything. I feel fairly confident that it is correct
| given that I had it research upfront, had multiple agents
| review, and it made the correct changes in the areas that
| I knew.
|
| Also I'm realizing I didn't mention it included an API +
            | UI for viewing events w/ pretty deltas.
| fragmede wrote:
        | We're not as good at coding as _you_, naturally.
| boxedemp wrote:
| Ever worked on a distributed system with hundreds of
| millions of customers and seemingly endless business
| requirements?
|
| Some things are complex.
| fendy3002 wrote:
        | Well, someone who says logging is easy has never faced the
        | difficulty of deciding "what" to log. And an audit log is a
        | different beast altogether from normal logging.
| therealdrag0 wrote:
| Audit logging is different than developer logging...
| companies will have entire teams dedicated to audit
| systems.
| jamesmcq wrote:
| Trust me I'm very impressed at the progress AI has made, and
| maybe we'll get to the point where everything is 100% correct
| all the time and better than any human could write. I'm
| skeptical we can get there with the LLM approach though.
|
| The problem is LLMs are great at simple implementation, even
| large amounts of simple implementation, but I've never seen
| it develop something more than trivial correctly. The larger
| problem is it's very often subtly but hugely wrong. It makes
| bad architecture decisions, it breaks things in pursuit of
| fixing or implementing other things. You can tell it has no
| concept of the "right" way to implement something. It very
| obviously lacks the "senior developer insight".
|
| Maybe you can resolve some of these with large amounts of
| planning or specs, but that's the point of my original
| comment - at what point is it easier/faster/better to just
| write the code yourself? You don't get a prize for writing
| the least amount of code when you're just writing specs
| instead.
| nojito wrote:
        | > I've never seen it develop something more than trivial
| correctly.
|
| This is 100% incorrect, but the real issue is that the
| people who are using these llms for non-trivial work tend
| to be extremely secretive about it.
|
| For example, I view my use of LLMs to be a competitive
| advantage and I will hold on to this for as long as
| possible.
| jamesmcq wrote:
| The key part of my comment is "correctly".
|
| Does it write maintainable code? Does it write extensible
| code? Does it write secure code? Does it write performant
| code?
|
| My experience has been it failing most of these. The code
| might "work", but it's not _good_ for anything more than
| trivial, well defined functions (that probably appeared
          | in its training data written by humans). LLMs have a
| fundamental lack of understanding of what they're doing,
| and it's obvious when you look at the finer points of the
| outcomes.
|
| That said, I'm sure you could write detailed enough specs
| and provide enough examples to resolve these issues, but
| that's the point of my original comment - if you're just
| writing specs instead of code you're not gaining
| anything.
| jmathai wrote:
| You'd be building blocks which compound over time. That's
| been my experience anyway.
|
| The compounding is much greater than my brain can do on
| its own.
| cowlby wrote:
| I find "maintainable code" the hardest bias to let go of.
| 15+ years of coding and design patterns are hard to let
| go.
|
| But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms. So maintainable
| has to evolve from good human design patterns to good AI
| patterns.
|
| Specs are worth it IMO. Not because if I can spec, I
| could've coded anyway. But because I gain all the insight
| and capabilities of AI, while minimizing the gotchas and
| edge failures.
| girvo wrote:
| > But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms. So maintainable
| has to evolve from good human design patterns to good AI
| patterns.
|
| How do you square that with the idea that all the code
| still has to be reviewed by humans? Yourself, and your
              | coworkers.
| cowlby wrote:
                | I picture it like semiconductors; the 5nm process is so
                | absurdly complex that operators can't just peek into the
| system easily. I imagine I'm just so used to hand
| crafting code that I can't imagine not being able to peek
| in.
|
| So maybe it's that we won't be reviewing by hand anymore?
| I.e. it's LLMs all the way down. Trying to embrace that
                | style of development lately, as unnatural as it feels.
| We're obv not 100% there yet but Claude Opus is a
| significant step in that direction and they keep getting
| better and better.
| girvo wrote:
| Then who is responsible when (not if) that code does
| horrible things? We have humans to blame right now. I
| just don't see it happening personally because liability
| and responsibility are too important
| therealdrag0 wrote:
| For some software, sure but not most.
|
| And you don't blame humans anyways lol. Everywhere I've
| worked has had "blameless" postmortems. You don't remove
| human review unless you have reasonable alternatives like
| high test coverage and other automated reviews.
| girvo wrote:
| We still have performance reviews and are fired. There's
| a human that is responsible.
|
| "It's AI all the way down" is either nonsense on its
| face, or the industry is dead already.
| Jweb_Guru wrote:
| > But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms
|
| I don't find that LLMs are any more likely than humans to
| remember to update all of the places it wrote redundant
| functions. Generally far less likely, actually. So
| forgive me for treating this claim with a massive grain
| of salt.
| reg_dunlop wrote:
| To answer all of your questions:
|
| yes, if I steer it properly.
|
| It's very good at spotting design patterns, and
| implementing them. It doesn't always know where or how to
| implement them, but that's my job.
|
| The specs and syntactic sugar are just nice quality of
| life benefits.
| fourthark wrote:
| This is exactly what the article is about. The tradeoff is
            | that you have to thoroughly review the plans and iterate on
| them, which is tiring. But the LLM will write good code
| faster than you, if you tell it what good code is.
| reg_dunlop wrote:
              | Exactly; the original commenter seems determined to write
              | off AI as "just not as good as me".
|
| The original article is, to me, seemingly not that novel.
| Not because it's a trite example, but because I've begun
| to experience massive gains from following the same basic
              | premise as the article. And I can't believe there are
              | others who aren't using it like this.
|
| I iterate the plan until it's seemingly deterministic,
| then I strip the plan of implementation, and re-write it
| following a TDD approach. Then I read all specs, and
| generate all the code to red->green the tests.
|
| If this commenter is too good for that, then it's that
              | attitude that'll keep him stuck. I already feel like my
              | project backlog is achievable this year.
| fourthark wrote:
| Strongly agree about the deterministic part. Even more
| important than a good design, the plan must not show any
| doubt, whether it's in the form of open questions or
| weasel words. 95% of the time those vague words mean I
| didn't think something through, and it will do something
                | hideous in order to make the plan work.
| Degorath wrote:
| My experience has so far been similar to the root
| commenter - at the stage where you need to have a long
| cycle with planning it's just slower than doing the
| writing + theory building on my own.
|
| It's an okay mental energy saver for simpler things, but
| for me the self review in an actual production code
| context is much more draining than writing is.
|
| I guess we're seeing the split of people for whom
| reviewing is easy and writing is difficult and vice
| versa.
| Kiro wrote:
| > but I've never seen it develop something more than
| trivial correctly.
|
| What are you working on? I personally haven't seen LLMs
| struggle with any kind of problem in months. Legacy
| codebase with great complexity and performance-critical
| code. No issue whatsoever regardless of the size of the
| task.
| hathawsh wrote:
| Several months ago, just for fun, I asked Claude (the web
| site, not Claude Code) to build a web page with a little
| animated cannon that shoots at the mouse cursor with a
| ballistic trajectory. It built the page in seconds, but the
| aim was incorrect; it always shot too low. I told it the
| aim was off. It still got it wrong. I prompted it several
| times to try to correct it, but it never got it right. In
| fact, the web page started to break and Claude was
| introducing nasty bugs.
|
| More recently, I tried the same experiment, again with
| Claude. I used the exact same prompt. This time, the aim
| was exactly correct. Instead of spending my time trying to
| correct it, I was able to ask it to add features. I've
| spent more time writing this comment on HN than I spent
      | optimizing this toy.
      | https://claude.ai/public/artifacts/d7f1c13c-2423-4f03-9fc4-8...
|
| My point is that AI-assisted coding has improved
| dramatically in the past few months. I don't know whether
| it can reason deeply about things, but it can certainly
| imitate a human who reasons deeply. I've never seen any
| technology improve at this rate.
| skydhash wrote:
| > Yesterday I had Claude write an audit logging feature to
| track all changes made to entities in my app. Yeah you get
| this for free with many frameworks, but my company's custom
| setup doesn't have it.
|
      | But did you truly think through such a feature? Like the
      | guarantees it should provide (e.g. how it should cope with
      | entity migrations like adding a new field), or the cost of
      | maintaining it further down the line. This looks
      | suspiciously like a drive-by PR made on open-source projects.
|
| > That would've taken me at least a day, maybe two.
|
| I think those two days would have been filled with research,
| comparing alternatives, questions like "can we extract this
| feature from framework X?", discussing ownership and sharing
      | knowledge... Jumping straight into coding was done before
      | LLMs too, but it usually hurts the long-term viability of the
      | project.
|
| Adding code to a project can be done quite fast
      | (hackathons,...); ensuring quality is what slows things down
      | in any well-functioning team.
| hghbbjh wrote:
| > In all, maybe 30-45 minutes of my actual time to add a
| small feature
|
| Why would this take you multiple days to do if it only took
| you 30m to review the code? Depends on the problem, but if
| I'm able to review something the time it'd take me to write
| it is usually at most 2x more worst case scenario - often
| it's about equal.
|
| I say this because after having used these tools, most of the
| speed ups you're describing come at the cost of me not
| actually understanding or thoroughly reviewing the code. And
      | this is corroborated by high-output LLM users - you have
| to trust the agent if you want to go fast.
|
| Which is fine in some cases! But for those of us who have
| jobs where we are personally responsible for the code, we
| can't take these shortcuts.
| keyle wrote:
| I partly agree with you. But once you have a codebase large
    | enough, the changes take longer to even type in, once figured
| out.
|
| I find the best way to use agents (and I don't use claude) is
| to hash it out like I'm about to write these changes and I make
| my own mental notes, and get the agent to execute on it.
|
| Agents don't get tired, they don't start fat fingering stuff at
| 4pm, the quality doesn't suffer. And they can be parallelised.
|
| Finally, this allows me to stay at a higher level and not get
    | bogged down in "right oh did we do this simple thing again?"
| which wipes some of the context in my mind and gets tiring
| through the day.
|
| Always, 100% review every line of code written by an agent
| though. I do not condone committing code you don't 'own'.
|
| I'll never agree with a job that forces developers to use 'AI',
| I sometimes like to write everything by hand. But having this
| tool available is also very powerful.
| jamesmcq wrote:
| I want to be clear, I'm not against any use of AI. It's
| hugely useful to save a couple of minutes of "write this
| specific function to do this specific thing that I could
| write and know exactly what it would look like". That's a
| great use, and I use it all the time! It's better
| autocomplete. Anything beyond that is pushing it - at the
| moment! We'll see, but spending all day writing specs and
| double-checking AI output is not more productive than just
| writing correct code yourself the first time, even if you're
| AI-autocompleting some of it.
| skeledrew wrote:
| For the last few days I've been working on a personal
| project that's been on ice for at least 6 years. Back when
| I first thought of the project and started implementing it,
| it took maybe a couple weeks to eke out some minimally
| working code.
|
| This new version that I'm doing (from scratch with ChatGPT
| web) has a far more ambitious scope and is already at the
| "usable" point. Now I'm primarily solidifying things and
| increasing test coverage. And I've tested the key parts
| with IRL scenarios to validate that it's not just passing
| tests; the thing actually fulfills its intended function so
| far. Given the increased scope, I'm guessing it'd take me a
| few months to get to this point on my own, instead of under
| a week, and the quality wouldn't be where it is. Not saying
| I haven't had to wrangle with ChatGPT on a few bugs, but
| after a decent initial planning phase, my prompts now are
| primarily "Do it"s and "Continue"s. Would've likely already
| finished it if I wasn't copying things back and forth
| between browser and editor, and being forced to pause when
| I hit the message limit.
| keyle wrote:
          | This is a great comeback story. I have had a similar
| experience with a photoshop demake of mine.
|
| I recommend to try out Opencode with this approach, you
| might find it less tiring than ChatGPT web (yes it works
| with your ChatGPT Plus sub).
| Quothling wrote:
| I think it comes down to "it depends". I work in a NIS2
      | regulated field and we're quite challenged by the fact that it
| means we can't give AI's any sort of real access because of
      | the security risk. To be compliant we'd have to have the AI
      | agent ask permission for every single thing it does, before
      | it does it, and four-eye review it. Which is obviously never
      | going to happen. We can discuss how badly the NIS2 four-eye
| requirement works in the real world another time, but
| considering how easy it is to break AI security, it might not
| be something we can actually ever use. This makes sense on
| some of the stuff we work on, since it could bring an entire
| powerplant down. On the flip-side AI risks would be of little
| concern on a lot of our internal tools, which are basically
| non-regulated and unimportant enough that they can be down
| for a while without costing the business anything beyond
| annoyances.
|
      | This is where our challenges are. We've built our own chatbot
| where you can "build" your own agent within the librechat
| framework and add a "skill" to it. I say "skill" because it's
      | older than Claude skills but does exactly the same. I don't
      | completely buy the author's:
|
| > "deeply", "in great details", "intricacies", "go through
| everything"
|
| bit, but you can obviously save a lot of time by writing a
      | piece of English which tells it what sort of environment you
| work in. It'll know that when I write Python I use UV, Ruff
| and Pyrefly and so on as an example. I personally also have a
| "skill" setting that tells the AI not to compliment me
      | because I find that ridiculously annoying, and that certainly
| works. So who knows? Anyway, employees are going to want
| more. I've been doing some PoC's running open source models
| in isolation on a raspberry pi (we had spares because we use
| them in IoT projects) but it's hard to setup an isolation
| policy which can't be circumvented.
|
| We'll have to figure it out though. For powerplant critical
| projects we don't want to use AI. But for the web tool that
| allows a couple of employees to upload three excel files from
| an external accountant and then generate some sort of report
| on them? Who cares who writes it or even what sort of quality
| it's written with? The lifecycle of that tool will probably
      | be something that never changes until the external accountant
| does and then the tool dies. Not that it would have
| necessarily been written in worse quality without AI... I
| mean... Have you seen some of the stuff we've written in the
| past 40 years?
| kburman wrote:
| Since Opus 4.5, things have changed quite a lot. I find LLMs
| very useful for discussing new features or ideas, and Sonnet is
| great for executing your plan while you grab a coffee.
| skeledrew wrote:
    | Researching and planning a project is a generally useful
| thing. This is something I've been doing for years, and have
| always had great results compared to just jumping in and
| coding. It makes perfect sense that this transfers to LLM use.
| phantomathkg wrote:
| Surely Addy Osmani can code. Even he suggests plan first.
|
| https://news.ycombinator.com/item?id=46489061
| skydhash wrote:
| > planning and checking and prompting and orchestrating is far
| more work than just writing the code yourself.
|
| This! Once I'm familiar with the codebase (which I strive to do
| very quickly), for most tickets, I usually have a plan by the
| time I've read the description. I can have a couple of
    | implementation questions, but I know where the info is located
    | in the codebase. For things I only have a vague idea about,
    | the whiteboard is where I go.
|
| The nice thing with such a mental plan, you can start with a
| rougher version (like a drawing sketch). Like if I'm starting a
| new UI screen, I can put a placeholder text like "Hello,
| world", then work on navigation. Once that done, I can start to
| pull data, then I add mapping functions to have a view
| model,...
|
| Each step is a verifiable milestone. Describing them is more
| mentally taxing than just writing the code (which is a flow
    | state for me). Why? Because English is not fit to describe how
    | a computer works (try describing a finite state machine like a
    | navigation flow in natural language). My mental model is
    | already aligned to code; writing the solution in natural
    | language is asking me to be ambiguous and unclear on purpose.
| roncesvalles wrote:
| Well it's less mental load. It's like Tesla's FSD. Am I a
| better driver than the FSD? For sure. But is it nice to just
| sit back and let it drive for a bit even if it's suboptimal and
| gets me there 10% slower, and maybe slightly pisses off the guy
| behind me? Yes, nice enough to shell out $99/mo. Code
| implementation takes a toll on you in the same way that driving
| does.
|
| I think the method in TFA is overall less stressful for the
| dev. And you can always fix it up manually in the end; AI
| coding vs manual coding is not either-or.
| stealthyllama wrote:
    | There is a miscommunication happening: this entire time we all
    | had surprisingly different ideas about what quality of work is
| acceptable which seems to account for differences of opinion on
| this stuff.
| psvv wrote:
| I'd find it deeply funny if the optimal vibe coding workflow
| continues to evolve to include more and more human oversight,
| and less and less agent autonomy, to the point where eventually
| someone makes a final breakthrough that they can save time by
| bypassing the LLM entirely and writing the code themselves.
| (Finally coming full circle.)
| pjio wrote:
| You mean there will be an invention to edit files directly
| instead of giving the specific code and location you want it
| to be written into the prompt?
| ramoz wrote:
| One thing for me has been the ability to iterate over plans -
  | with a better visual of them as well as the ability to annotate
| feedback about the plan.
|
  | Plannotator does this really effectively and natively through
  | hooks: https://github.com/backnotprop/plannotator
| prodtorok wrote:
| Wow, I've been needing this! The one issue I've had with
| terminals is reviewing plans, and desiring the ability to
| provide feedback on specific plan sections in a more organized
| way.
|
| Really nice ui based on the demo.
| haolez wrote:
| > Notice the language: "deeply", "in great details",
| "intricacies", "go through everything". This isn't fluff. Without
| these words, Claude will skim. It'll read a file, see what a
| function does at the signature level, and move on. You need to
| signal that surface-level reading is not acceptable.
|
| This makes no sense to my intuition of how an LLM works. It's not
| that I don't believe this works, but my mental model doesn't
| capture why asking the model to read the content "more deeply"
| will have any impact on whatever output the LLM generates.
| fragmede wrote:
| Yeah, it's definitely a strange new world we're in, where I
| have to "trick" the computer into cooperating. The other day I
| told Claude "Yes you can", and it went off and did something it
| just said it couldn't do!
| itypecode wrote:
| Solid dad move. XD
| wilkystyle wrote:
| Is parenting making us better at prompt engineering, or is
| it the other way around?
| fragmede wrote:
| Better yet, I have Codex, Gemini, and Claude as my kids,
| running around in my code playground. How do I be a good
| parent and not play favorites?
| itypecode wrote:
| We all know Gemini is your artsy, Claude is your
| smartypants, and Codex is your nerd.
| bpodgursky wrote:
| You bumped the token predictor into the latent space where it
| knew what it was doing : )
| optimalsolver wrote:
| The little language model that could.
| jcdavis wrote:
| Its a wild time to be in software development. Nobody(1)
| actually knows what causes LLMs to do certain things, we just
| pray the prompt moves the probabilities the right way enough
| such that it mostly does what we want. This used to be a field
| that prided itself on deterministic behavior and
| reproducibility.
|
| Now? We have AGENTS.md files that look like a parent talking to
| a child with all the bold all-caps, double emphasis, just
| praying that's enough to be sure they run the commands you want
| them to be running
|
| (1 Outside of some core ML developers at the big model
| companies)
| chickensong wrote:
| For Claude at least, the more recent guidance from Anthropic
| is to not yell at it. Just clear, calm, and concise
| instructions.
| trueno wrote:
| wait seriously? lmfao
|
| thats hilarious. i definitely treat claude like shit and
| ive noticed the falloff in results.
|
| if there's a source for that i'd love to read about it.
| defrost wrote:
| Consciousness is off the table but they absolutely
| respond to environmental stimulus and vibes.
|
| See, uhhh,
| https://pmc.ncbi.nlm.nih.gov/articles/PMC8052213/ and
| maybe have a shot at running claude while playing _Enya_
| albums on loop.
|
| /s (??)
| trueno wrote:
| i have like the faintest vague thread of "maybe this
| actually checks out" in a way that has shit all to do
| with consciousness
|
| sometimes internet arguments get messy, people die on
| their hills and double / triple down on internet message
| boards. since historic internet data composes a bit of
| what goes into an llm, would it make sense that bad-juju
| prompting sends it to some dark corners of its training
| model if implementations don't properly sanitize certain
| negative words/phrases ?
|
| in some ways llm stuff is a very odd mirror that
| haphazardly regurgitates things resulting from the many
| shades of gray we find in human qualities.... but
| presents results as matter of fact. the amount of
| internet posts with possible code solutions and more
| where people egotistically die on their respective hills
| that have made it into these models is probably off the
| charts, even if the original content was a far cry from a
| sensible solution.
|
| all in all llm's really do introduce quite a bit of a
| black box. lot of benefits, but a ton of unknowns and one
            | must be hypervigilant to the possible pitfalls of these
| things... but more importantly be self aware enough to
| understand the possible pitfalls that these things
| introduce to the person using them. they really possibly
| dangerously capitalize on everyones innate need to want
| to be a valued contributor. it's really common now to see
| so many people biting off more than they can chew, often
| times lacking the foundations that would've normally had
| a competent engineer pumping the brakes. i have a lot of
| respect/appreciation for people who might be doing a bit
| of claude here and there but are flat out forward about
| it in their readme and very plainly state to not have any
| high expectations because _they_ are aware of the risks
| involved here. i also want to commend everyone who writes
| their own damn readme.md.
|
| these things are for better or for worse great at causing
| people to barrel forward through 'problem solving', which
| is presenting quite a bit of gray area on whether or not
| the problem is actually solved / how can you be sure / do
| you understand how the fix/solution/implementation works
| (in many cases, no). this is why exceptional software
| engineers can use this technology insanely proficiently
| as a supplementary worker of sorts but others find
| themselves in a design/architect seat for the first time
| and call tons of terrible shots throughout the course of
| what it is they are building. i'd at least like to call
| out that people who feel like they "can do everything on
| their own and don't need to rely on anyone" anymore seem
| to have lost the plot entirely. there are facets of that
| statement that might be true, but less collaboration
| especially in organizations is quite frankly the first
| steps some people take towards becoming delusional. and
| that is always a really sad state of affairs to watch
| unfold. doing stuff in a vacuum is fun on your own time,
| but forcing others to just accept things you built in a
| vacuum when you're in any sort of team structure is
| insanely immature and honestly very destructive/risky. i
| would like to think absolutely no one here is surprised
| that some sub-orgs at Microsoft force people to use
| copilot or be fired, very dangerous path they tread there
| as they bodyslam into place solutions that are not well
| understood. suddenly all the leadership decisions that
| many companies have made to once again bring back a
| before-times era of offshoring work make sense: they
| think with these technologies existing the subordinate
| culture of overseas workers combined with these techs
| will deliver solutions no one can push back on. great
| savings and also no one will say no.
| xmcp123 wrote:
| For a while (maybe a year ago?) it seemed like verbal abuse
| was the best way to make Claude pay attention. In my
| head, it was impacting how important it deemed the
| instruction. And it definitely did seem that way.
| basch wrote:
| If you think about where in the training data there is
| positivity vs negativity it really becomes equivalent to
| having a positive or negative mindset regarding a
| standing and outcome in life.
| chickensong wrote:
| I don't have a source offhand, but I think it may have
| been part of the 4.5 release? Older models definitely
| needed caps and words like critical, important, never,
| etc... but Anthropic published something that said don't
| do that anymore.
| whateveracct wrote:
| i make claude grovel at my feet and tell me in detail why
| my code is better than its code
| joshmn wrote:
| Sometimes I daydream about people screaming at their LLM as
| if it was a TV they were playing video games on.
| glerk wrote:
| Yep, with Claude saying "please" and "thank you" actually
| works. If you build rapport with Claude, you get rewarded
| with intuition and creativity. Codex, on the other hand,
| you have to slap it around like a slave gollum and it will
| do exactly what you tell it to do, no more, no less.
| whateveracct wrote:
| this is psychotic why is this how this works lol
| hugh-avherald wrote:
| Speculation only obviously: highly-charged conversations
| cause the discussion to be channelled to general human
| mitigation techniques and for the 'thinking agent' to be
| diverted to continuations from text concerned with the
| general human emotional experience.
| harrall wrote:
| It's like playing a fretless instrument to me.
|
| Practice playing songs by ear and after 2 weeks, my brain has
| developed an inference model of where my fingers should go to
| hit any given pitch.
|
| Do I have any idea how my brain's model works? No! But it
| tickles a different part of my brain and I like it.
| klipt wrote:
| Sufficiently advanced technology has become like magic: you
| have to prompt the electronic genie with the right words or
| it will twist your wishes.
| silversmith wrote:
| Light some incense, and you too can be a dystopian space
| tech support, today! Praise Omnissiah!
| overfeed wrote:
| are we the orks?
| wilkystyle wrote:
| The author is referring to how the framing of your prompt
| informs the attention mechanism. You are essentially hinting to
| the attention mechanism that the function's implementation
| details have important context as well.
| MattGaiser wrote:
| One of the well defined failure modes for AI agents/models is
| "laziness." Yes, models can be "lazy" and that is an actual
| term used when reviewing them.
|
| I am not sure if we know why really, but they are that way and
| you need to explicitly prompt around it.
| kannanvijayan wrote:
| I've encountered this failure mode, and the opposite of it:
| thinking too much. A behaviour I've come to see as some sort
| of pseudo-neuroticism.
|
| Lazy thinking makes LLMs do surface analysis and then produce
| things that are wrong. Neurotic thinking will see them over-
| analyze, and then repeatedly second-guess themselves,
| repeatedly re-derive conclusions.
|
| Something very similar to an anxiety loop in humans, where
| problems without solutions are obsessed about in circles.
| denimnerd42 wrote:
| yeah i experienced this the other day when asking claude
| code to build an http proxy using an afsk modem software to
| communicate over the computers sound card. it had an
| absolute fit tuning the system and would loop for hours
| trying and doubling back. eventually after some change in
| prompt direction to think more deeply and test more
| comprehensively it figured it out. i certainly had no idea
| how to build an afsk modem.
| ChadNauseam wrote:
| The disconnect might be that there is a separation between
| "generating the final answer for the user" and
| "researching/thinking to get information needed for that
| answer". Saying "deeply" prompts it to read more of the file
| (as in, actually use the `read` tool to grab more parts of the
| file into context), and generate more "thinking" tokens (as in,
| tokens that are not shown to the user but that the model writes
| to refine its thoughts and improve the quality of its answer).
| hashmap wrote:
| these sort-of-lies might help:
|
| think of the latent space inside the model like a topological
| map, and when you give it a prompt, you're dropping a ball at a
| certain point above the ground, and gravity pulls it along the
| surface until it settles.
|
| caveat though, that's nice per-token, but the signal gets messed
| up by picking a token from a distribution, so each token you're
| regenerating and re-distorting the signal. leaning on language
| that places that ball deep in a region that you want to be
| makes it less likely that those distortions will kick it out of
| the basin or valley you may want to end up in.
|
| if the response you get is 1000 tokens long, the initial
| trajectory needed to survive 1000 probabilistic filters to get
| there.
|
| or maybe none of that is right lol but thinking that it is has
| worked for me, which has been good enough
| noduerme wrote:
| Hah! Reading this, my mind inverted it a bit, and I realized
| ... it's like the claw machine theory of gradient descent. Do
| you drop the claw into the deepest part of the pile, or where
| there's the thinnest layer, the best chance of grabbing
| something specific? Everyone in every bar has a theory about
| claw machines. But the really funny thing that unites LLMs
| with claw machines is that the biggest question is always
| whether they dropped the ball on purpose.
|
| The claw machine is also a sort-of-lie, of course. Its main
| appeal is that it offers the illusion of control. As a former
| designer and coder of online slot machines... totally spin
| off into pages on this analogy, about how that illusion gets
| you to keep pulling the lever... but the geographic rendition
| you gave is sort of priceless when you start making the
| comparison.
| basch wrote:
| My mental model for them is plinko boards. Your prompt
| changes the spacing between the nails to increase the
| probability in certain directions as your chip falls down.
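The plinko mental model above can be sketched in a few lines as a biased random walk, where the prompt plays the role of nudging the per-peg probabilities (the numbers here are made up purely to illustrate the analogy, not anything about a real model):

```python
import random

def plinko(bias: float, rows: int = 20, trials: int = 10_000, seed: int = 0) -> float:
    """Drop chips through `rows` pegs; `bias` is the chance of stepping right
    at each peg. Returns the average landing position. A neutral board
    (bias=0.5) centers the distribution around 0; nudging the bias -- like
    wording a prompt differently -- shifts where most chips end up."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # each peg moves the chip one step left (-1) or right (+1)
        pos = sum(1 if rng.random() < bias else -1 for _ in range(rows))
        total += pos
    return total / trials

neutral = plinko(bias=0.5)  # lands near 0 on average
nudged = plinko(bias=0.7)   # distribution shifted well to the right
print(neutral, nudged)
```

The point of the analogy: you never control an individual chip (token), only the shape of the distribution it falls through.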
| hashmap wrote:
| i literally suggested this metaphor to someone yesterday
| who was trying to get agents to do what they wanted: set
| up your guardrails in a way that lets the agents do what
| they're good at, and you'll get better results because
| you're not sitting there looking at them.
|
| i think probably once you start seeing that the behavior
| falls right out of the geometry, you just start looking at
| stuff like that. still funny though.
| stingraycharles wrote:
| It's actually really common. If you look at Claude Code's own
| system prompts written by Anthropic, they're littered with
| "CRITICAL (RULE 0):" type of statements, and other similar
| prompting styles.
| Scrapemist wrote:
| Where can I find those?
| stingraycharles wrote:
| This analysis is a good starting point:
| https://southbridge-research.notion.site/Prompt-
| Engineering-...
| Betelbuddy wrote:
| Its very logical and pretty obvious when you do code
| generation. If you ask the same model, to generate code by
| starting with:
|
| - You are a Python Developer... or
| - You are a Professional Python Developer... or
| - You are one of the world's most renowned Python Experts,
| with several books written on the subject, and 15 years
| of experience in creating highly reliable production
| quality code...
|
| You will notice a clear improvement in the quality of the
| generated artifacts.
| obiefernandez wrote:
| My colleague swears by his DHH claude skill
| https://danieltenner.com/dhh-is-immortal-and-costs-200-m/
| haolez wrote:
| That's different. You are pulling the model, semantically,
| closer to the problem domain you want it to attack.
|
| That's very different from "think deeper". I'm just curious
| about this case in specific :)
| argee wrote:
| I don't know about some of those "incantations", but it's
| pretty clear that an LLM can respond to "generate twenty
| sentences" vs. "generate one word". That means you can
| indeed coax it into more verbosity ("in great detail"), and
| that can help align the output by having more relevant
| context (inserting irrelevant context or something entirely
| improbable into LLM output and forcing it to continue from
| there makes it clear how detrimental that can be).
|
| Of course, that doesn't mean it'll definitely be _better_ ,
| but if you're making an LLM chain it seems prudent to
| preserve whatever info you can at each step.
| gehsty wrote:
| Do you think that Anthropic don't include things like this in
| their harness / system prompts? I feel like this kind of
| prompts are uneccessary with Opus 4.5 onwards, obviously
| based on my own experience (I used to do this, on switching
| to opus I stopped and have implemented more complex problems,
| more successfully).
|
| I am having the most success describing what I want as
| humanly as possible, describing outcomes clearly, making sure
| the plan is good and clearing context before implementing.
| hu3 wrote:
| Maybe, but forcing code generation in a certain way could
| ruin hello worlds and simpler code generation.
|
| Sometimes the user just wants something simple instead of
| enterprise grade.
| popalchemist wrote:
| Strings of tokens are vectors. Vectors are directions. When you
| use a phrase like that you are orienting the vector of the
| overall prompt toward the direction of depth, in its map of
| conceptual space.
| nostrademons wrote:
| It's the attention mechanism at work, along with a fair bit of
| Internet one-up-manship. The LLM has ingested all of the text
| on the Internet, as well as Github code repositories, pull
| requests, StackOverflow posts, code reviews, mailing lists,
| etc. In a number of those content sources, there will be people
| saying "Actually, if you go into the details of..." or "If you
| look at the intricacies of the problem" or "If you understood
| the problem deeply" followed by a very deep, expert-level
| explication of exactly what you should've done differently. You
| want the model to use the code in the correction, not the one
| in the original StackOverflow question.
|
| Same reason that "Pretend you are an MIT professor" or "You are
| a leading Python expert" or similar works in prompts. It tells
| the model to pay attention to the part of the corpus that has
| those terms, weighting them more highly than all the other
| programming samples that it's run across.
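The "pay attention to the part of the corpus" idea above can be made concrete with a toy version of scaled dot-product attention. The vectors below are invented 4-d stand-ins for embeddings, just to show that keys pointing in a similar direction to the query receive more weight:

```python
import math

def attention_weights(q, keys):
    """Scaled dot-product attention weights: softmax(q . k / sqrt(d)).
    Keys aligned with the query direction get larger weights -- the sense
    in which phrases in a prompt 'pull' the model toward matching text."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings: the query points roughly the same way as key 0.
q = [1.0, 0.2, 0.0, 0.0]
keys = [[0.9, 0.3, 0.0, 0.1],   # e.g. an "expert-level explanation" passage
        [0.0, 0.1, 1.0, 0.8]]   # e.g. a "beginner question" passage
w = attention_weights(q, keys)
print(w)  # the first key receives the larger weight
```

A real transformer does this with learned projections across many heads and layers, but the core weighting step is this softmax over similarities.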
| r0b05 wrote:
| This is such a good explanation. Thanks
| xscott wrote:
| Of course I can't be certain, but I think the "mixture of
| experts" design plays into it too. Metaphorically, there's a
| mid-level manager who looks at your prompt and tries to
| decide which experts it should be sent to. If he thinks you
| won't notice, he saves money by sending it to the
| undergraduate intern.
|
| Just a theory.
| victorbjorklund wrote:
| Note that MoE isn't different experts for different types
| of problems. Routing happens per token and isn't really
| connected to the problem type.
|
| So if you send it some Python code, the first token might
| be routed to one expert, the second to another, and so on.
| dotancohen wrote:
| Can you back this up with documentation? I don't believe
| that this is the case.
| pixelmelt wrote:
| Check out Unsloth's REAP models: you can outright delete a
| few of the lesser-used experts without the model going
| braindead, since they can all handle each token but some
| are better positioned to do so.
| manmal wrote:
| I don't think this is a result of the base training data
| ("the internet"). It's a post-training behavior, created
| during reinforcement learning. Codex has totally different
| behavior in that regard: by default, Codex reads a lot of
| potentially relevant files before it goes and writes files.
|
| Maybe you remember that, without reinforcement learning, the
| models of 2019 just completed the sentences you gave them.
| There were no tool calls like reading files. Tool calling
| behavior is company specific and highly tuned to their
| harnesses. How often they call a tool, is not part of the
| base training data.
| spagettnet wrote:
| Modern LLMs are certainly fine-tuned on data that includes
| examples of tool use, mostly the tools built into their
| respective harnesses, but also external/mock tools so they
| don't overfit on only using the toolset they expect to see
| in their harnesses.
| manmal wrote:
| IDK the current state, but I remember that, last year,
| the open source coding harnesses needed to provide
| exactly the tools that the LLM expected, or the error
| rate went through the roof. Some, like grok and gemini,
| only recently managed to make tool calls somewhat
| reliable.
| hbarka wrote:
| >> Same reason that "Pretend you are an MIT professor" or
| "You are a leading Python expert" or similar works in
| prompts.
|
| This pretend-you-are-a-[persona] is cargo cult prompting at
| this point. The persona framing is just decoration.
|
| A brief purpose statement describing what the skill
| [skill.md] does is more honest and just as effective.
| rescbr wrote:
| I think it does more harm than good on recent models. The
| LLM has to override its system prompt to role-play, wasting
| context and computing cycles instead of working on the
| task.
| dakolli wrote:
| You will never convince me that this isn't confirmation bias,
| or the equivalent of a slot machine player thinking the order
| in which they push buttons impacts the output, or some other
| gambler-esque superstition.
|
| These tools are literally designed to make people behave like
| gamblers. And it's working, except the house in this case
| takes the money you give them and lights it on fire.
| nubg wrote:
| Your ignorance is my opportunity. May I ask which markets
| you are developing for?
| dakolli wrote:
| "The equivalent of saying, which slot machine were you
| sitting at It'll make me money"
| ambicapter wrote:
| Maybe the training data that included the words like "skim"
| also provided shallower analysis than training that was close
| to the words "in great detail", so the LLM is just reproducing
| those respective words distribution when prompted with
| directions to do either.
| scuff3d wrote:
| How anybody can read stuff like this and still take all this
| seriously is beyond me. This is becoming the engineering
| equivalent of astrology.
| fragmede wrote:
| Feel free to run your own tests and see if the magic phrases
| do or do not influence the output. Have it make a Todo webapp
| with and without those phrases and see what happens!
| scuff3d wrote:
| That's not how it works. It's not on everyone else to prove
| claims false, it's on you (or the people who argue any of
| this had a measurable impact) to prove it actually works.
| I've seen a bunch of articles like this, and more comments.
| Nobody I've ever seen has produced any kind of measurable
| metrics of quality based on one approach vs another. It's
| all just vibes.
|
| Without something quantifiable it's not much better than
| someone who always wears the same jersey when their
| favorite team plays, and swears they play better because of
| it.
| tokioyoyo wrote:
| Do you actively use LLMs to do semi-complex coding work?
| Because if not, it will sound mumbo-jumbo to you.
| Everyone else can nod along and read on, as they've
| experienced all of it first hand.
| scuff3d wrote:
| You've missed the point. This isn't engineering, it's
| gambling.
|
| You could take the exact same documents, prompts, and
| whatever other bullshit, run it on the exact same agent
| backed by the exact same model, and get different results
| every single time. Just like you can roll dice the exact
| same way on the exact same table and you'll get two
| totally different results. People are doing their best to
| constrain that behavior by layering stuff on top, but the
| foundational tech is flawed (or at least ill suited for
| this use case).
|
| That's not to say that AI isn't helpful. It certainly is.
| But when you are basically begging your tools to please
| do what you want with magic incantations, we've lost the
| fucking plot somewhere.
| gf000 wrote:
| > You could take the exact same documents, prompts, and
| whatever other bullshit, run it on the exact same agent
| backed by the exact same model, and get different results
| every single time
|
| This is more of an implementation detail, done this way to
| get better results. A neural network with fixed weights
| (and deterministic floating-point operations) returning a
| probability distribution, where you use a pseudorandom
| generator with a fixed seed called recursively, will
| always return the same output for the same input.
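The reproducibility claim above is easy to demonstrate with a toy decoder: repeatedly sampling tokens from a fixed distribution with a seeded generator always yields the same sequence. This is a sketch, not a real LLM sampler; the vocabulary and probabilities are invented:

```python
import random

def decode(probs, vocab, n, seed):
    """Sample `n` tokens from a fixed distribution with a seeded PRNG.
    Same seed + same distribution => same sequence every time. The
    nondeterminism of production systems comes from unseeded sampling
    (and serving-side effects like batching), not the network itself."""
    rng = random.Random(seed)
    return [rng.choices(vocab, weights=probs)[0] for _ in range(n)]

vocab = ["the", "cat", "sat"]
probs = [0.5, 0.3, 0.2]
a = decode(probs, vocab, 10, seed=42)
b = decode(probs, vocab, 10, seed=42)
assert a == b  # identical runs given an identical seed and input
```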
| geoelectric wrote:
| I think that's a pretty bold claim, that it'd be
| different every time. I'd think the output would converge
| on a small set of functionally equivalent designs, given
| sufficiently rigorous requirements.
|
| And even a human engineer might not solve a problem the
| same way twice in a row, based on changes in recent
| inspirations or tech obsessions. What's the difference,
| as long as it passes review and does the job?
| guiambros wrote:
| If you read the transformer paper, or get any book on
| NLP, you will see that this is not magic incantation;
| it's purely the attention mechanism at work. Or you can
| just ask Gemini or Claude why these prompts work.
|
| But I get the impression from your comment that you have
| a fixed idea, and you're not really interested in
| understanding how or why it works.
|
| If you think like a hammer, everything will look like a
| nail.
| scuff3d wrote:
| I know why it works, to varying and unmeasurable degrees
| of success. Just like if I poke a bull with a sharp
| stick, I know it's going to get its attention. It might
| choose to run away from me in any number of directions,
| or it might decide to turn around and gore me to death.
| I can't answer that question with any more certainty
| than you can.
|
| The system is inherently non-deterministic. Just because
| you can guide it a bit, doesn't mean you can predict
| outcomes.
| winrid wrote:
| But we can predict the outcomes, though. That's what
| we're saying, and it's true. Maybe not 100% of the time,
| but maybe it helps a significant amount of the time and
| that's what matters.
|
| Is it engineering? Maybe not. But neither is knowing how
| to talk to junior developers so they're productive and
| don't feel bad. The engineering is at other levels.
| imiric wrote:
| > But we can predict the outcomes [...] Maybe not 100% of
| the time
|
| So 60% of the time, it works every time.
|
| ... This fucking industry.
| guiambros wrote:
| > _The system is inherently non-deterministic._
|
| The system isn't randomly non-deterministic; it is
| statistically probabilistic.
|
| The next-token prediction and the attention mechanism is
| actually a rigorous deterministic mathematical process.
| The variation in output comes from how we sample from
| that curve, and the temperature used to calibrate the
| model. Because the underlying probabilities are
| mathematically calculated, the system's behavior remains
| highly predictable _within statistical bounds_.
|
| Yes, it's a departure from the fully deterministic
| systems we're used to. But that's not different than the
| many real world systems: weather, biology, robotics,
| quantum mechanics. Even the computer you're reading this
| right now is full of probabilistic processes, abstracted
| away through sigmoid-like functions that push the
| extremes to 0s and 1s.
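The temperature mechanism described above is a small transformation on the logits before sampling: dividing by the temperature sharpens or flattens the softmax. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities. Low temperature sharpens the
    distribution toward the top token (near-greedy); high temperature
    flattens it toward uniform, which is where output variety comes from."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # much closer to uniform
print(cold[0], hot[0])  # top-token probability shrinks as temperature rises
```

The underlying probabilities are computed deterministically; only the draw from this distribution introduces variation, which is the "statistically probabilistic" distinction being made.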
| imiric wrote:
| A lot of words to say that for all intents and
| purposes... it's nondeterministic.
|
| > Yes, it's a departure from the fully deterministic
| systems we're used to.
|
| A system either produces the same output given the same
| input[1], or doesn't.
|
| LLMs are nondeterministic _by design_. Sure, you can
| configure them with a zero temperature, a static seed,
| and so on, but they 're of no use to anyone in that
| configuration. The nondeterminism is what gives them the
| illusion of "creativity", and other useful properties.
|
| Classical computers, compilers, and programming languages
| are deterministic _by design_ , even if they do contain
| complex logic that may affect their output in
| unpredictable ways. There's a world of difference.
|
| [1]: Barring misbehavior due to malfunction, corruption
| or freak events of nature (cosmic rays, etc.).
| hu3 wrote:
| Humans are nondeterministic.
|
| So this is a moot point and a futile exercise in arguing
| semantics.
| yaku_brang_ja wrote:
| These coding agents are literally Language Models. The
| way you structure your prompting language affects the
| actual output.
| energy123 wrote:
| Anthropic recommends doing magic invocations:
| https://simonwillison.net/2025/Apr/19/claude-code-best-
| pract...
|
| It's easy to know why they work. The magic invocation
| increases test-time compute (easy to verify yourself - try!).
| And an increase in test-time compute is demonstrated to
| increase answer correctness (see any benchmark).
|
| It might surprise you to know that the only difference
| GPT 5.2-low and GPT 5.2-xhigh is one of these magic
| invocations. But that's not supposed to be public knowledge.
| gehsty wrote:
| I think this was more of a thing on older models. Since I
| started using Opus 4.5 I have not felt the need to do this.
| cloudbonsai wrote:
| The evolution of software engineering is fascinating to me.
| We started by coding in thin wrappers over machine code and
| then moved on to higher-level abstractions. Now, we've
| reached the point where we discuss how we should talk to a
| mystical genie in a box.
|
| I'm not being sarcastic. This is absolutely incredible.
| intrasight wrote:
| And I've been at it long enough to go through that whole
| progression, actually from the earlier step of writing
| machine code. It's been, and continues to be, a fun
| journey, which is why I'm still working.
| sumedh wrote:
| We have tests and benchmarks to measure it though.
| giancarlostoro wrote:
| The LLM will do what you ask of it, provided you get
| nuanced about it. Others and I have noticed that LLMs work
| better when your codebase is not full of code smells like
| massive god-class files; if your codebase is discrete and
| broken up in a way that makes sense, and fits in your head,
| it will fit in the model's head.
| winwang wrote:
| Apparently LLM quality is sensitive to emotional stimuli?
|
| "Large Language Models Understand and Can be Enhanced by
| Emotional Stimuli": https://arxiv.org/abs/2307.11760
| nazgul17 wrote:
| It's very much believable, to me.
|
| In image generation, it's fairly common to add "masterpiece",
| for example.
|
| I don't think of the LLM as a smart assistant that knows what I
| want. When I tell it to write some code, how does it know I
| want it to write the code like a world renowned expert would,
| rather than a junior dev?
|
| I mean, certainly Anthropic has tried hard to make the former
| the case, but the Titanic inertia from internet scale data bias
| is hard to overcome. You can help the model with these hints.
|
| Anyway, luckily this is something you can empirically verify.
| This way, you don't have to take anyone's word. If anything, if
| you find I'm wrong in your experiments, please share it!
| pixelmelt wrote:
| Its effectiveness is even more apparent with older smaller
| LLMs, people who interact with LLMs now never tried to
| wrangle llama2-13b into pretending to be a dungeon master...
| FuckButtons wrote:
| That's because it's superstition.
|
| Unless someone can come up with some kind of rigorous
| statistics on what the effect of this kind of priming is it
| seems no better than claiming that sacrificing your first born
| will please the sun god into giving us a bountiful harvest next
| year.
|
| Sure, maybe this supposed deity really is this insecure and
| needs a jolly good pep talk every time he wakes up. or maybe
| you're just suffering from magical thinking that your
| incantations had any effect on the random variable word
| machine.
|
| The thing is, you could actually prove it, it's an optimization
| problem, you have a model, you can generate the statistics, but
| no one as far as I can tell has been terribly forthcoming with
| that , either because those that have tried have decided to try
| to keep their magic spells secret, or because it doesn't really
| work.
|
| If it did work, well, the oldest trick in computer science is
| writing compilers, i suppose we will just have to write an
| English to pedantry compiler.
| majormajor wrote:
| > If it did work, well, the oldest trick in computer science
| is writing compilers, i suppose we will just have to write an
| English to pedantry compiler.
|
| "Add tests to this function" for GPT-3.5-era models was much
| less effective than "you are a senior engineer. add tests for
| this function. as a good engineer, you should follow the
| patterns used in these other three function+test examples,
| using this framework and mocking lib." In today's tools, "add
| tests to this function" results in a bunch of initial steps
| to look in common places to see if that additional context
| already exists, and then pull it in based on what it finds.
| You can see it in the output the tools spit out while
| "thinking."
|
| So I'm 90% sure this is already happening on some level.
| GrinningFool wrote:
| But can you see the difference if you only include "you are
| a senior engineer"? It seems like the comparison you're
| making is between "write the tests" and "write the tests
| following these patterns using these examples. Also btw
| you're an expert. "
| rzmmm wrote:
| I think "understand this directory deeply" just gives more
| focus for the instruction. So it's like "burn more tokens for
| this phase than you normally would".
| imiric wrote:
| > That's because it's superstition.
|
| This field is full of it. Practices are promoted by those who
| tie their personal or commercial brand to it for increased
| exposure, and adopted by those who are easily influenced and
| don't bother verifying if they actually work.
|
| This is why we see a new Markdown format every week,
| "skills", "benchmarks", and other useless ideas, practices,
| and measurements. Consider just how many "how I use AI"
| articles are created and promoted. Most of the field runs on
| anecdata.
|
| It's not until someone actually takes the time to evaluate
| some of these memes, that they find little to no practical
| value in them.[1]
|
| [1]: https://news.ycombinator.com/item?id=47034087
| onion2k wrote:
| _i suppose we will just have to write an English to pedantry
| compiler._
|
| A common technique is to prompt in your chosen AI to write a
| longer prompt to get it to do what you want. It's used a lot
| in image generation. This is called 'prompt enhancing'.
| stingraycharles wrote:
| I actually have a prompt optimizer skill that does exactly
| this.
|
| https://github.com/solatis/claude-config
|
| It's based entirely off academic research, and a LOT of
| research has been done in this area.
|
| One of the papers you may be interested in is "emotion
| prompting", eg "it is super important for me that you do X"
| etc actually works.
|
| "Large Language Models Understand and Can be Enhanced by
| Emotional Stimuli"
|
| https://arxiv.org/abs/2307.11760
| Affric wrote:
| My guess would be that there's a greater absolute magnitude of
| the vectors to get to the same point in the knowledge model.
| computerex wrote:
| It is as the author said, it'll skim the content unless
| otherwise prompted to do so. It can read partial file
| fragments; it can emit commands to search for patterns in the
| files. As opposed to carefully reading each file and reasoning
| through the implementation. By asking it to go through in
| detail you are telling it to not take shortcuts and actually
| read the actual code in full.
| wrs wrote:
| The original "chain of thought" breakthrough was literally to
| insert words like "Wait" and "Let's think step by step".
| computomatic wrote:
| If I say "you are our domain expert for X, plan this task out
| in great detail" to a human engineer when delegating a task, 9
| times out of 10 they will do a more thorough job. It's not that
| this is voodoo that unlocks some secret part of their brain. It
| simply establishes my expectations and they act accordingly.
|
| To the extent that LLMs mimic human behaviour, it shouldn't be
| a surprise that setting clear expectations works there too.
| joseangel_sc wrote:
| if it's so smart, why do i need to learn to use it?
| DemocracyFTW2 wrote:
| --HAL, open the shuttle bay doors.
|
| ( _chirp_ )
|
| --HAL, _please_ open the shuttle bay doors.
|
| ( _pause_ )
|
| --HAL!
|
| --I'm afraid I can't do that, Dave.
| layer8 wrote:
| HAL, you are an expert shuttle-bay door opener. Please write
| up a detailed plan of how to open the shuttle-bay door.
| deevus wrote:
| This is what I do with the obra/superpowers[0] set of skills.
|
| 1. Use brainstorming to come up with the plan using the Socratic
| method
|
| 2. Write a high level design plan to file
|
| 3. I review the design plan
|
| 4. Write an implementation plan to file. We've already discussed
| this in detail, so usually it just needs skimming.
|
| 5. Use the worktree skill with subagent driven development skill
|
| 6. Agent does the work using subagents that, for each task:
|    a. Implements the task
|    b. Spec reviews the completed task
|    c. Code reviews the completed task
|
| 7. When all tasks complete: create a PR for me to review
|
| 8. Go back to the agent with any comments
|
| 9. If finished, delete the plan files and merge the PR
|
| [0]: https://github.com/obra/superpowers
| ramoz wrote:
| If you've ever desired the ability for annotating the plan more
| visually, try fitting Plannotator in this workflow. There is a
| slash command for use when you use custom workflows outside of
| normal plan mode.
|
| https://github.com/backnotprop/plannotator
| deevus wrote:
| I'll give this a try. Thanks for the suggestion.
| moribunda wrote:
| The crowd around this pot shows how superficial the
| knowledge about Claude Code is. It gets releases each day,
| and most of this is already built into the vanilla version.
| Not to mention subagents working in worktrees, memory.md, a
| plan you can comment on directly from the interface,
| subagents launched in the research phase, but also some
| basic MCPs like LSP/IDE integration, and context7 so as not
| to be stuck in the knowledge cutoff/past.
|
| When you go to YouTube and search for stuff like "7 levels of
| claude code" this post would be maybe 3-4.
|
| Oh, one more thing - quality is not consistent, so be ready for
| 2-3 rounds of "are you happy with the code you wrote" and
| defining audit skills crafted for your application domain -
| like for example RODO/Compliance audit etc.
| deevus wrote:
| I'm using the in-built features as well, but I like the flow
| that I have with superpowers. You've made a lot of
| assumptions with your comment that are just not true (at
| least for me).
|
| I find that brainstorming + (executing plans OR subagent
| driven development) is way more reliable than the built-in
| tooling.
| fnord77 wrote:
| I have a different approach where I have Claude write coding
| prompts for stages, then I give the prompt to another agent. I
| wonder if I should write it up as a blog post
| alexmorgan26 wrote:
| This separation of planning and execution resonates deeply with
| how I approach task management in general, not just coding.
|
| The key insight here - that planning and execution should be
| distinct phases - applies to productivity tools too. I've been
| using www.dozy.site which takes a similar philosophy: it has
| smart calendar scheduling that automatically fills your empty
| time slots with planned tasks. The planning happens first (you
| define your tasks and projects), then the execution is automated
| (tasks get scheduled into your calendar gaps).
|
| The parallel is interesting: just like you don't want Claude
| writing code before the plan is solid, you don't want to manually
| schedule tasks before you've properly planned what needs to be
| done. The separation prevents wasted effort and context
| switching.
|
| The annotation cycle you describe (plan -> review -> annotate ->
| refine) is exactly how I work with my task lists too. Define the
| work, review it, adjust priorities and dependencies, then let the
| system handle the scheduling.
| dimgl wrote:
| Pretty sure this entire comment is AI generated.
| rob wrote:
| Almost think we're at the point on HN where we need a special
| [flag bot] link for those that meet a certain threshold and
| it alerts @dang or something to investigate them in more
| detail. The amount of bots on here has been increasing at an
| alarming rate.
| zahlman wrote:
| There has been this really weird flood of new accounts lately
| that are making these kinds of bot comments with no clear
| purpose to making them. Maybe it comes from people
| experimenting with OpenClaw?
| skybrian wrote:
| I do something broadly similar. I ask for a design doc that
| contains an embedded todo list, broken down into phases. Looping
| on the design doc asking for suggestions seems to help. I'm up to
| about 40 design docs so far on my current project.
| brandall10 wrote:
| I go a bit further than this and have had great success with 3
| doc types and 2 skills:
|
| - Specs: these are generally static, but updatable as the project
| evolves. And they're broken out to an index file that gives a
| project overview, a high-level arch file, and files for all the
| main modules. Roughly 1k lines of spec per 10k lines of code, and
| I try to limit any particular spec file to 300 lines. I'm
| intimately familiar with every single line in these.
|
| - Plans: these are the output of a planning session with an LLM.
| They point to the associated specs. These tend to be 100-300
| lines and 3 to 5 phases.
|
| - Working memory files: I use both a status.md (3-5 items per
| phase, roughly 30 lines overall), which points to the latest plan,
| and a project_status (100-200 lines), which tracks the current
| state of the project and is instructed to compact past efforts to
| keep it lean.
|
| - A planner skill I use w/ Gemini Pro to generate new plans. It
| essentially explains the specs/plans dichotomy and the role of the
| status files, then has it review everything in the pertinent areas
| of code and give me a handful of high-level next features to
| address, based on shortfalls in the specs or things noted in the
| project_status file. Based on what it presents, I select a
| feature or improvement to generate. Then it proceeds to generate
| a plan, updates a clean status.md that points to the plan, and
| adjusts project_status based on the state of the prior completed
| plan.
|
| - An implementer skill in Codex that goes to town on a plan file.
| It's fairly simple, it just looks at status.md, which points to
| the plan, and of course the plan points to the relevant specs so
| it loads up context pretty efficiently.
|
| I've tried the two main spec generation libraries, which were way
| overblown, and then I gave superpowers a shot... which was fine,
| but still too much. The above is all homegrown, and I've had much
| better success because it keeps the context lean and focused.
|
| And I'm only on the $20 plans for Codex/Gemini vs. spending
| $100/month on CC for the half year prior, and I move quicker with
| no stall-outs due to token consumption, which regularly happened
| with CC by the 5th day. Codex rarely dips below 70% available
| context
| when it puts up a PR after an execution run. Roughly 4/5 PRs are
| without issue, which is flipped against what I experienced with
| CC and only using planning mode.
| r1290 wrote:
| Looks good. Question - is it always better to use a monorepo in
| this new AI world? Vs breaking your app into separate repos? At
| my company we have like 6 repos all separate nextjs apps for
| the same user base. Trying to consolidate to one as it should
| make life easier overall.
| oa335 wrote:
| Just put all the repos in one directory yourself. In
| my experience that works pretty well.
| throwup238 wrote:
| It really depends but there's nothing stopping you from just
| creating a separate folder with the cloned repositories (or
| worktrees) that you need and having a root CLAUDE.md file
| that explains the directory structure and referencing the
| individual repo CLAUDE.md files.
| chickensong wrote:
| AI is happy to work with any directory you tell it to. Agent
| files can be applied anywhere.
| jcurbo wrote:
| This is pretty much my approach. I started with some spec files
| for a project I'm working on right now, based on some academic
| papers I've written. I ended up going back and forth with
| Claude, building plans, pushing info back into the specs,
| expanding that out and I ended up with multiple
| spec/architecture/module documents. I got to the point where I
| ended up building my own system (using claude) to capture and
| generate artifacts, in more of a systems engineering style
| (e.g. following IEEE standards for conops, requirement
| documents, software definitions, test plans...). I don't use
| that for session-level planning; Claude's tools work fine for
| that. (I like superpowers, so far. It hasn't seemed too much)
|
| I have found it to work very well with Claude by giving it
| context and guardrails. Basically I just tell it "follow the
| guidance docs" and it does. Couple that with intense testing
| and self-feedback mechanisms and you can easily keep Claude on
| track.
|
| I have had the same experience with Codex and Claude as you in
| terms of token usage. But I haven't been happy with my Codex
| usage; Claude just feels like it's doing more of what I want in
| the way I want.
| cowlby wrote:
| I recently discovered GitHub speckit which separates
| planning/execution in stages: specify, plan, tasks, implement. I
| find it aligns with the OP in the level of "focus" and "attention"
| this gets out of Claude Code.
|
| Speckit is worth trying as it automates what is being described
| here, and with Opus 4.6 it's been a kind of BC/AD moment for me.
| recroad wrote:
| Use OpenSpec and simplify everything.
| recroad wrote:
| Try OpenSpec and it'll do all this for you. SpecKit works too. I
| don't think there's a need to reinvent the wheel on this one, as
| this is spec-driven development.
| bodeadly wrote:
| Tip: LLMs are very good at following conventions (this is actually
| what is happening when they write code). Create a .md file with a
| list of entries of the following structure:
|
|       # <identifier>
|       <description block>
|       <blank space>
|       # <identifier>
|       ...
|
| where an <identifier> is a stable and concise sequence of tokens
| that identifies some "thing". Seed it with 5 entries describing
| abstract stuff, and the LLM will latch on and reference it. I call
| this a PCL (Project Concept List). I just tell it:
|
|       > consume tmp/pcl-init.md pcl.md
|
| The pcl-init.md describes what a PCL is, and pcl.md is the actual
| list. I have a pcl.md file for each independent component in the
| code (logging, http, auth, etc). This works very, very well. The
| LLM seems to "know" what you're talking about. You can ask
| questions and give instructions like "add a PCL entry about this",
| and it will ask whether it should add a PCL entry about xyz. If
| the description block keeps a high information-to-token ratio, it
| will follow that convention too (which is a very good convention,
| BTW).
|
| However, there is a caveat. LLMs resist ambiguity about authority.
| The "PCL", or whatever you want to call it, needs to be the ONE
| authoritative place for everything. If you have the same stuff in
| 3 different files, it won't work nearly as well.
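| For illustration, a pcl.md following that convention might look
| like this (the entries here are invented):

```markdown
# log-sink
All log output flows through a single Sink interface; modules never
construct loggers directly.

# retry-budget
HTTP clients share one retry budget per request chain: three attempts
total, exponential backoff starting at 200ms.
```

| Each entry keeps a high information-to-token ratio, and the stable
| identifiers ("log-sink", "retry-budget") give you and the LLM a
| shared vocabulary to reference in prompts.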
|
| Bonus Tip: I find long prompt input with example code fragments
| and thoughtful descriptions work best at getting an LLM to
| produce good output. But there will always be holes (resource
| leaks, vulnerabilities, concurrency flaws, etc). So then I update
| my original prompt input (keep it in a separate file PROMPT.txt
| as a scratch pad) to add context about those things maybe asking
| questions along the way to figure out how to fix the holes. Then
| I /rewind back to the prompt and re-enter the updated prompt.
| This feedback loop advances the conversation without expending
| tokens.
| imron wrote:
| I have tried using this and other workflows for a long time and
| had never been able to get them to work (see chat history for
| details).
|
| This has changed in the last week, for 3 reasons:
|
| 1. Claude opus. It's the first model where I haven't had to spend
| more time correcting things than it would've taken me to just do
| it myself. The problem is that opus chews through tokens, which
| led to..
|
| 2. I upgraded my Claude plan. Previously on the regular plan I'd
| get about 20 mins of time before running out of tokens for the
| session and then needing to wait a few hours to use again. It was
| fine for little scripts or toy apps but not feasible for the
| regular dev work I do. So I upgraded to 5x. This now got me 1-2
| hours per session before tokens expired. Which was better but
| still a frustration. Wincing at the price, I upgraded again to
| the 20x plan and this was the next game changer. I had plenty of
| spare tokens per session and at that price it felt like they were
| being wasted - so I ramped up my usage. Following a similar
| process as OP but with a plans directory with subdirectories for
| backlog, active and complete plans, and skills with strict rules
| for planning, implementing and completing plans, I now have 5-6
| projects on the go. While I'm planning a feature on one the
| others are implementing. The strict plans and controls keep them
| on track and I have follow up skills for auditing quality and
| performance. I still haven't hit token limits for a session but
| I've almost hit my token limit for the week so I feel like I'm
| getting my money's worth. In that sense spending more has forced
| me to figure out how to use more.
|
| 3. The final piece of the puzzle is using opencode over claude
| code. I'm not sure why but I just don't gel with Claude code.
| Maybe it's all the sauteing and flibertygibbering, maybe it's all
| the permission asking, maybe it's that it doesn't show what it's
| doing as much as opencode. Whatever it is it just doesn't work
| well for me. Opencode on the other hand is great. It's shows what
| it's doing and how it's thinking which makes it easy for me to
| spot when it's going off track and correct early.
|
| Having a detailed plan, and correcting and iterating on the plan,
| is essential. Making Claude follow the plan is also essential -
| but there's a line. Too fine-grained and it's not as creative at
| solving problems; too loose/high-level and it makes bad choices
| and goes in the wrong direction.
|
| Is it actually making me more productive? I think it is but I'm
| only a week in. I've decided to give myself a month to see how it
| all works out.
|
| I don't intend to keep paying for the 20x plan unless I can see a
| path to using it to earn me at least as much back.
| raw_anon_1111 wrote:
| Just don't use Claude Code. I can use the Codex CLI with just
| my $20 subscription and never come close to any usage limits
| throwawaytea wrote:
| What if it's just slower so that your daily work fits within
| the paid tier they want?
| raw_anon_1111 wrote:
| It isn't slower. I use my personal ChatGPT subscriptions
| with Codex for almost everything at work and use my
| $800/month company Claude allowance only for the tricky
| stuff that Codex can't figure out. It's never application
| code. It's usually some combination of app code + Docker +
| AWS issue with my underlying infrastructure - created with
| whatever IAC that I'm using for a client -
| Terraform/CloudFormation or the CDK.
|
| I burned through $10 on Claude in less than an hour. I only
| have $36 a day at $800 a month (800/22 working days)
| imron wrote:
| > and use my $800/month company Claude allowance only for
| the tricky stuff that Codex can't figure out.
|
| It doesn't seem controversial that the model that can
| solve more complex problems (that you admit the cheaper
| model can't solve) costs more.
|
| For the things I use it for, I've not found any other
| model to be worth it.
| raw_anon_1111 wrote:
| You're assuming rational behavior from a company that
| doesn't care about losing billions of dollar.
|
| Have you tried Codex with OpenAi's latest models?
| imron wrote:
| Not in the last 2 months.
|
| My current Claude subscription is a sunk cost for the next
| month. Maybe I'll try Codex if Claude doesn't lead
| anywhere.
| raw_anon_1111 wrote:
| I use both. As I'm working, I tell each of them to update
| a common document with the conversation. I don't just
| tell Claude the what. I tell it the why and have it
| document it.
|
| I can switch back and forth and use the MD file as shared
| context.
| ValentineC wrote:
| Curious: what are some cases where it'd make sense to _not_ pay
| for the 20x plan (which is $200/month), and provide a whopping
| $800/month pay-per-token allowance instead?
| raw_anon_1111 wrote:
| Who knows? It's part of an enterprise plan. I work for a
| consulting company. There are a number of fallbacks, the
| first fallback if we are working on an internal project
| is just to use our internal AWS account and use Claude
| code with the Anthropic hosted on Bedrock.
|
| https://code.claude.com/docs/en/amazon-bedrock
|
| The second fallback if it is for a customer project is to
| use their AWS account for development for them.
|
| The rate my company charges for me - my level as an
| American based staff consultant (highest bill rate at the
| company) they are happy to let us use Claude Code using
| their AWS credentials. Besides, if we are using AWS
| Bedrock hosted Anthropic models, they know none of their
| secrets are going to Anthropic. They already have the
| required legal confidentiality/compliance agreements with
| AWS.
| RHSeeger wrote:
| > Most developers type a prompt, sometimes use plan mode, fix the
| errors, repeat.
|
| > ...
|
| > never let Claude write code until you've reviewed and approved
| a written plan
|
| I certainly always work towards an approved plan before I let it
| loose on changing the code. I just assumed most people did,
| honestly. Admittedly, sometimes there's "phases" to the
| implementation (because some parts can be figured out later and
| it's more important to get the key parts up and running first),
| but each phase gets a full, reviewed plan before I tell it to go.
|
| In fact, I just finished writing a command and instruction telling
| Claude that, when it presents a plan for implementation, it should
| offer me another option: to write out the current (important parts
| of the) context and the full plan to individual (ticket-specific)
| md files. That way, if something goes wrong with the
| implementation, I can tell it to read those files and "start from
| where they left off" in the planning.
| ramoz wrote:
| The author seems to think they've invented a special workflow...
|
| We all tend to regress to average (same thoughts/workflows)...
|
| Have had many users already doing the exact same workflow with:
| https://github.com/backnotprop/plannotator
| CGamesPlay wrote:
| 4 times in one thread, please stop spamming this link.
| bandrami wrote:
| How much time are you actually saving at this point?
| red_hare wrote:
| I use Claude Code for lecture prep.
|
| I craft a detailed and ordered set of lecture notes in a Quarto
| file and then have a dedicated claude code skill for translating
| those notes into Slidev slides, in the style that I like.
|
| Once that's done, much like the author, I go through the slides
| and make commented annotations like "this should be broken into
| two slides" or "this should be a side-by-side" or "use your
| generate clipart skill to throw an image here alongside these
| bullets" and "pull in the code example from ../examples/foo." It
| works brilliantly.
|
| And then I do one final pass of tweaking after that's done.
|
| But yeah, annotations are super powerful. Token distance
| in-context and all that jazz.
| ramoz wrote:
| is your skill open source
| red_hare wrote:
| Not yet... but also I'm not sure it makes a lot of sense to
| be open source. It's super specific to how I like to build
| slide decks and to my personal lecture style.
|
| But it's not hard to build one. The key for me was
| describing, in great detail:
|
| 1. How I want it to read the source material (e.g., H1 means
| new section, H2 means at least one slide, a link to an
| example means I want code in the slide)
|
| 2. How to connect material to layouts (e.g., "comparison
| between two ideas should be a two-cols-title," "walkthrough
| of code should be two-cols with code on right," "learning
| objectives should be side-title align:left," "recall should
| be side-title align:right")
|
| Then the workflow is:
|
| 1. Give all those details and have it do a first pass.
|
| 2. Give tons of feedback.
|
| 3. At the end of the session, ask it to "make a skill."
|
| 4. Manually edit the skill so that you're happy with the
| examples.
| saxelsen wrote:
| Can I ask how you annotate the feedback for it? Just with
| inline comments like `# This should be changed to X`?
|
| The author mentions annotations but doesn't go into detail
| about how to feed the annotations to Claude.
| red_hare wrote:
| Slidev is markdown, so I do it in HTML comments. Usually
| something like:
|
|       <!-- TODOCLAUDE: Split this into a two-cols-title,
|       divide the examples between -->
|
| or:
|
|       <!-- TODOCLAUDE: Use clipart skill to make an image
|       for this slide -->
|
| And then, when I finish annotating, I just say: "Address all the
| TODOCLAUDEs"
| jrs235 wrote:
| Claude appeared to just crash in my session:
| https://news.ycombinator.com/item?id=47107630
| zhubert wrote:
| AI only improves and changes. Embrace the scientific method and
| make sure your "here's how to" guides are based on data.
| h14h wrote:
| Is this not just Ralph with extra steps and the risk of context
| rot?
| Ozzie_osman wrote:
| There are a few prompt frameworks that essentially codify these
| types of workflows by adding skills and prompts
|
| https://github.com/obra/superpowers https://github.com/jlevy/tbd
| politician wrote:
| Wow, I never bother with using phrases like "deeply study this
| codebase deeply." I consistently get pretty fantastic results.
| dworks wrote:
| my rlm-workflow skill has this encoded as a repeatable workflow.
|
| give it a try: https://skills.sh/doubleuuser/rlm-workflow/rlm-
| workflow
| beratbozkurt0 wrote:
| That's great, actually, doesn't the logic apply to other services
| as well?
| bluegatty wrote:
| I don't see how this is 'radically different' given that Claude
| Code literally has a planning mode.
|
| This is my workflow as well, with the big caveat that 80% of
| 'work' doesn't require substantive planning; we're making
| relatively straightforward changes.
|
| Edit: there is nothing fundamentally different about 'annotating
| offline' in an MD vs in the CLI and iterating until the plan is
| clear. It's a UI choice.
|
| Spec Driven Coding with AI is very well established, so working
| from a plan, or spec (they can be somewhat different) is not
| novel.
|
| This is conventional CC use.
| dack wrote:
| last i checked, you can't annotate inline with planning mode.
| you have to type a lot to explain precisely what needs to
| change, and then it re-presents you with a plan (which may or
| may not have changed something else).
|
| i like the idea of having an actual document because you could
| actually compare the before and after versions if you wanted to
| confirm things changed as intended when you gave feedback
| bluegatty wrote:
| 'Giving precise feedback on a plan' is literally annotating
| the plan.
|
| It comes back to you with an update for verification.
|
| You ask it to 'write the plan' as matter of good practice.
|
| What the author is describing is conventional usage of claude
| code.
| gitaarik wrote:
| A plan is just a file you can edit and then tell CC to check
| your annotations
| cadamsdotcom wrote:
| The author is quite far on their journey but would benefit from
| writing simple scripts to enforce invariants in their codebase.
| Invariant broken? Script exits with a non-zero exit code and some
| output that tells the agent how to address the problem. Scripts
| are deterministic, run in milliseconds, and use zero tokens. Put
| them in husky or pre-commit, install the git hooks, and your
| agent won't be able to commit without all your scripts
| succeeding.
|
| And "Don't change this function signature" should be enforced not
| by anticipating that your coding agent "might change this
| function signature so we better warn it not to" but rather via an
| end to end test that fails if the function signature is changed
| (because the other code that needs it not to change now has an
| error). That takes the author out of the loop: they need not watch
| for the change in order to issue said correction, and can instead
| sip coffee while the agent observes that it caused a test
| failure then corrects it without intervention, probably by
| rolling back the function signature change and changing something
| else.
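| For what it's worth, such an invariant script can be tiny. A
| sketch (the file path and the pinned signature below are invented
| for illustration, not taken from the article):

```shell
#!/bin/sh
# Hypothetical invariant check: fail with a non-zero exit and a
# concrete fix hint if a pinned function signature has drifted.
# Wired into a pre-commit hook, it blocks the agent's commit until
# the invariant holds again.
check_signature() {
  expected='def fetch_user(user_id: int) -> User:'
  if grep -qF "$expected" "$1"; then
    return 0
  fi
  # The error message doubles as the agent's correction prompt.
  echo "Invariant broken in $1: restore '$expected'" >&2
  return 1
}
```

| Deterministic, runs in milliseconds, and uses zero tokens.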
| dennisjoseph wrote:
| The annotation cycle is the key insight for me. Treating the plan
| as a living doc you iterate on before touching any code makes a
| huge difference in output quality.
|
| Experimentally, I've been using mfbt.ai [https://mfbt.ai] for
| roughly the same thing in a team context. It lets you
| collaboratively nail down the spec with AI before handing off to
| a coding agent via MCP.
|
| Avoids the "everyone has a slightly different plan.md on their
| machine" problem. Still early days but it's been a nice fit for
| this kind of workflow.
| minikomi wrote:
| I agree, and this is why I tend to use gptel in emacs for
| planning - the document is the conversation context, and can be
| edited and annotated as you like.
| Frannky wrote:
| I tried Opus 4.6 recently and it's really good. I had ditched
| Claude a long time ago for Grok + Gemini + OpenCode with Chinese
| models. I used Grok/Gemini for planning and core files, and
| OpenCode for setup, running, deploying, and editing.
|
| However, Opus made me rethink my entire workflow. Now, I do it
| like this:
|
| * PRD (Product Requirements Document)
|
| * main.py + requirements.txt + readme.md (I ask for minimal,
| functional, modular code that fits the main.py)
|
| * Ask for a step-by-step ordered plan
|
| * Ask to focus on one step at a time
|
| The super powerful thing is that I don't get stuck on missing
| accounts, keys, etc. Everything is ordered and runs smoothly. I
| go rapidly from idea to working product, and it's incredibly easy
| to iterate if I figure out new features are required while
| testing. I also have GLM via OpenCode, but I mainly use it for
| "dumb" tasks.
|
| Interestingly, for reasoning capabilities regarding standard
| logic inside the code, I found Gemini 3 Flash to be very good and
| relatively cheap. I don't use Claude Code for the actual coding
| because forcing everything via chat into a main.py encourages
| minimal code that's easy to skim--it gives me a clearer
| representation of the feature space.
| achenatx wrote:
| I use amazon kiro.
|
| The AI first works with you to write requirements, then it
| produces a design, then a task list.
|
| This helps the AI make smaller chunks to work on; it will work on
| one task at a time.
|
| I can let it run for an hour or more in this mode. Then there is
| lots of stuff to fix, but it is mostly correct.
|
| Kiro also supports steering files, they are files that try to
| lock the AI in for common design decisions.
|
| The price is that a lot of the context is used up with these
| files and kiro constantly pauses to reset the context.
| amarant wrote:
| Interesting! I feel like I'm learning to code all over again!
| I've only been using Claude for a little more than a month and
| until now I've been figuring things out on my own. Building my
| methodology from scratch. This is much more advanced than what
| I'm doing. I've been going straight to implementation, but doing
| one very small and limited feature at a time, describing
| implementation details (data structures like this, use that API
| here, import this library etc) verifying it manually, and having
| Claude fix things I don't like. I had just started getting
| annoyed that it would make the same (or very similar) mistake
| over and over again and I would have to fix it every time. This
| seems like it'll solve that problem I had only just identified!
| Neat!
| w4yai wrote:
| You described how AntiGravity works natively.
| zmmmmm wrote:
| I actually don't really like a few things about this approach.
|
| First, the "big bang" write it all at once. You are going to end
| up with thousands of lines of code that were monolithically
| produced. I think it is much better to have it write the plan and
| formulate it as sensible technical steps that can be completed
| one at a time. Then you can work through them. I get that this is
| not very "vibe"ish but that is kind of the point. I want the AI
| to help me get to the same point I would be at with produced code
| AND understanding of it, just accelerate that process. I'm not
| really interested in just generating thousands of lines of code
| that nobody understands.
|
| Second, the author keeps referring to adjusting the behaviour but
| never incorporating it into long-lived guidance. To me,
| integral with the planning process is building an overarching
| knowledge base. Every time you're telling it there's something
| wrong, you need to tell it to update the knowledge base about why
| so it doesn't do it again.
|
| Finally, no mention of tests? Just quick checks? To me, you have
| to end up with comprehensive tests. Maybe to the author it goes
| without saying, but I find it is integral to build this into the
| planning. Certain stages you will want certain types of tests.
| Some times in advance of the code (so TDD style) other times
| built alongside it or after.
|
| It's definitely going to be interesting to see how software
| methodology evolves to incorporate AI support and where it
| ultimately lands.
| girvo wrote:
| The article's approach matches mine, but I've learned from
| exactly the things you're pointing out.
|
| I get the PLAN.md (or equivalent) to be separated into "phases"
| or stages, then carefully prompt (because Claude and Codex both
| love to "keep going") it to only implement that stage, and
| update the PLAN.md
|
| Tests are crucial too, and form another part of the plan
| really. Though my current workflow begins to build them later
| in the process than I would prefer...
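| A phased PLAN.md of that shape might look like this (the feature
| and tasks are invented for illustration):

```markdown
# Plan: request rate limiting

## Phase 1: config plumbing (done)
- [x] Add a rate_limit settings block
- [x] Wire the settings into app startup

## Phase 2: middleware (in progress)
- [ ] Token-bucket middleware on the request path
- [ ] Unit tests for burst behaviour

## Phase 3: rollout (not started)
- [ ] Feature flag, default off
```

| Instructing the agent to implement exactly one phase, tick its
| boxes, and stop gives the next session a clean restart point.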
| armanj wrote:
| > "remove this section entirely, we don't need caching here" --
| rejecting a proposed approach
|
| I wonder why you don't remove it yourself. Aren't you already
| editing the plan?
| dnautics wrote:
| this is literally reinventing claude's planning mode, but with
| more steps. I think Boris doesn't realize that planning mode is
| actually stored in a file.
|
| https://x.com/boristane/status/2021628652136673282
| prodtorok wrote:
| Insights are nice for new users but I'm not seeing anything too
| different from how anyone experienced with Claude Code would use
| plan mode. You can reject plans with feedback directly in the
| CLI.
| tabs_or_spaces wrote:
| My workflow is a bit different.
|
| * I ask the LLM for its understanding of a topic or an existing
| feature in code. It's not really planning, it's more like
| understanding the model first
|
| * Then based on its understanding, I can decide how great or
| small to scope something for the LLM
|
| * An LLM showing good understanding can deal with a big task fairly
| well.
|
| * An LLM showing bad understanding still needs to be prompted to
| get it right
|
| * What helps a lot is reference implementations. Either I have
| existing code that serves as the reference or I ask for a
| reference and I review.
|
| A few folks at my work do it the OP's way, but my arguments for
| not doing it this way are:
|
| * Nobody is measuring the amount of slop within the plan. We only
| judge the implementation at the end
|
| * It's still non-deterministic - folks will have different
| experiences using the OP's methods. If Claude updates its model,
| it outdates the OP's suggestions by making them either better or
| worse. We don't evaluate when things get better; we only focus on
| things that haven't gone well.
|
| * It's very token-heavy - LLM providers insist that you use many
| tokens to get the task done. It's in their best interest to get
| you to do this. For me, LLMs should be powerful enough to
| understand context with minimal tokens because of the investment
| into model training.
|
| Both ways gets the task done and it just comes down to my
| preference for now.
|
| For me, I treat the LLM as model training + post processing +
| input tokens = output tokens. I don't think this is the best way
| to do non deterministic based software development. For me, we're
| still trying to shoehorn "old" deterministic programming into a
| non deterministic LLM.
| umairnadeem123 wrote:
| The multi-pass approach works outside of code too. I run a fairly
| complex automation pipeline (prompt -> script -> images -> audio
| -> video assembly) and the single biggest quality improvement was
| splitting generation into discrete planning and execution phases.
| One-shotting a 10-step pipeline means errors compound. Having the
| LLM first produce a structured plan, then executing each step
| against that plan with validation gates between them, cut my
| failure rate from maybe 40% to under 10%. The planning doc also
| becomes a reusable artifact you can iterate on without re-running
| everything.
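| A minimal sketch of such a gate between stages (the step and check
| names are invented, not from the commenter's actual pipeline):

```shell
#!/bin/sh
# Hypothetical plan-then-execute runner: each stage writes an
# artifact, and a validation gate checks it before the next stage
# may start, so errors stop at the stage that caused them instead
# of compounding down the pipeline.
run_step() {
  step="$1"; artifact="$2"; gate="$3"
  "$step" > "$artifact" || return 1
  "$gate" "$artifact" || {
    echo "gate failed after $step; fix before continuing" >&2
    return 1
  }
}
```

| Chain calls with && so the pipeline halts at the first failed gate.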
| wokwokwok wrote:
| This is the way.
|
| The practice is:
|
| - simple
|
| - effective
|
| - retains control and quality
|
| Certainly the "unsupervised agent" workflows are getting a lot of
| attention right now, but they require a specific set of
| circumstances to be effective:
|
| - clear validation loop (e.g. compile the kernel; here is gcc that
| does so correctly)
|
| - ai enabled tooling (mcp / cli tool that will lint, test and
| provide feedback immediately)
|
| - oversight to prevent agents going off the rails (open area of
| research)
|
| - an unlimited token budget
|
| That means that _most people_ can't use unsupervised agents.
|
| Not that they don't work; most people simply don't have an
| environment and task that is appropriate.
|
| By comparison, anyone with Cursor or Claude can _immediately
| start using this approach_, or their own variant of it.
|
| It does not require fancy tooling.
|
| It does not require an arcane agent framework.
|
| It works generally well across models.
|
| This is one of those few genuine pieces of good practical advice
| for people getting into AI coding.
|
| Simple. Obviously works once you start using it. No external
| dependencies. BYO tools to help with it, no "buy my AI startup
| xxx to help". No "star my GitHub so I can get a job at $AI corp too".
|
| Great stuff.
| epec254 wrote:
| Huge +1. This loop consistently delivers great results for my
| vibe coding.
|
| The "easy" path of "short prompt declaring what I want" works
| OK for simple tasks but consistently breaks down for medium to
| high complexity tasks.
| apsurd wrote:
| Can you help me understand the difference between "short
| prompt for what I want (next)" vs medium to high complexity
| tasks?
|
| What I mean is, in practice, how does one even get to a
| high complexity task? What does that look like? Because isn't
| it more common that one sees only so far ahead?
| dnautics wrote:
| It's more or less what comes out of the box with plan mode,
| plus a few extra bits?
| wazHFsRy wrote:
| Absolutely. And you can also always let the agent look back at
| the plan to check if it is still on track and aligned.
|
| One step I added, that works great for me, is letting it write
| (api-level) tests after planning and before implementation.
| Then I'll do a deep review and annotation of these tests and
| tweak them until everything is just right.
| basch wrote:
| Honestly, this is just language models in general at the moment,
| and not just coding.
|
| It's the same reason adding a thinking step works.
|
| You want to write a paper, you have it form a thesis and
| structure first. (In this one you might be better off asking
| for 20 and seeing if any of them are any good.) You want to
| research something, first you add gathering and filtering steps
| before synthesis.
|
| Adding smarter words or telling it to be deeper does work by
| slightly repositioning where your query ends up in space.
|
| Asking for the final product first right off the bat leads to
| repetitive verbose word salad. It just starts to loop back in
| on itself. Which is why temperature was a thing in the first
| place, and leads me to believe they've turned the temp down a
| bit to try and be more accurate. Add some randomness and
| variability to your prompts to compensate.
| turingsroot wrote:
| I've been teaching AI coding tool workshops for the past year and
| this planning-first approach is by far the most reliable pattern
| I've seen across skill levels.
|
| The key insight that most people miss: this isn't a new workflow
| invented for AI - it's how good senior engineers already work.
| You read the code deeply, write a design doc, get buy-in, then
| implement. The AI just makes the implementation phase
| dramatically faster.
|
| What I've found interesting is that the people who struggle most
| with AI coding tools are often junior devs who never developed
| the habit of planning before coding. They jump straight to "build
| me X" and get frustrated when the output is a mess. Meanwhile,
| engineers with 10+ years of experience who are used to writing
| design docs and reviewing code pick it up almost instantly -
| because the hard part was always the planning, not the typing.
|
| One addition I'd make to this workflow: version your research.md
| and plan.md files in git alongside your code. They become
| incredibly valuable documentation for future maintainers
| (including future-you) trying to understand why certain
| architectural decisions were made.
| hghbbjh wrote:
| > it's how good senior engineers already work
|
| The other trick all good ones I've worked with converged on:
| it's quicker to write code than review it (if we're being
| thorough). Agents have some areas where they can really shine
| (boilerplate you should maybe have automated already being
| one), but most of their speed comes from passing the quality
| checking to your users or coworkers.
|
| Juniors and other humans are valuable because eventually I
| trust them enough to not review their work. I don't know if
| LLMs can ever get here for serious industries.
| DevEx7 wrote:
| I'm a big fan of having the model create a GitHub issue directly
| (using the GH CLI) with the exact plan it generates, instead of
| creating a markdown file that will eventually get deleted. It
| gives me a permanent record and makes it easy to reference and
| close the issue once the PR is ready.
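| If you want to script that, here's a hedged sketch (the title and
| plan filename below are placeholders; actually filing the issue
| requires the GitHub CLI to be installed and authenticated):

```python
# Build the `gh issue create` invocation that files a generated plan
# as an issue. `--body-file` reads the issue body from a file.
import subprocess

def gh_issue_command(title, plan_path):
    return ["gh", "issue", "create", "--title", title,
            "--body-file", plan_path]

cmd = gh_issue_command("Plan: refactor auth flow", "plan.md")
# subprocess.run(cmd, check=True)  # uncomment to actually file it
```

| The issue then serves as the permanent record; the local plan file
| can be deleted once the PR referencing the issue lands.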
| RVuRnvbM2e wrote:
| This is just Waterfall for LLMs. What happens when you explore
| the problem space and need to change up the plan?
| mukundesh wrote:
| https://github.blog/ai-and-ml/generative-ai/spec-driven-deve...
| paradite wrote:
| Lol I wrote about this and been using plan+execute workflow for 8
| months.
|
| Sadly my post didn't get much attention at the time.
|
| https://thegroundtruth.media/p/my-claude-code-workflow-and-p...
| wangzhongwang wrote:
| Interesting approach. The separation of planning and execution is
| crucial, but I think there's a missing layer most people
| overlook: permission boundaries between the two phases.
|
| Right now when Claude Code (or any agent) executes a plan, it
| typically has the same broad permissions for every step. But
| ideally, each execution step should only have access to the
| specific tools and files it needs -- least privilege, applied to
| AI workflows.
|
| I've been experimenting with declarative permission manifests for
| agent tasks. Instead of giving the agent blanket access, you
| define upfront what each skill can read, write, and execute.
| Makes the planning phase more constrained but the execution phase
| much safer.
|
| Anyone else thinking about this from a security-first angle?
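| For illustration, a manifest like that might look as follows.
| This is entirely hypothetical (not an existing Claude Code
| feature), with made-up skill and tool names:

```python
# Declarative least-privilege manifest: each execution step declares
# up front which tools it may use and which paths it may write.
from fnmatch import fnmatch

MANIFEST = {
    "write-migration": {
        "tools": {"read_file", "write_file"},
        "write_paths": ["db/migrations/*.sql"],
    },
    "run-tests": {
        "tools": {"run_command"},
        "write_paths": [],  # tests get no file-write access at all
    },
}

def allowed(step, tool, path=""):
    rules = MANIFEST.get(step)
    if rules is None or tool not in rules["tools"]:
        return False  # unknown steps and undeclared tools are denied
    if tool == "write_file":
        return any(fnmatch(path, pat) for pat in rules["write_paths"])
    return True

assert allowed("write-migration", "write_file", "db/migrations/001.sql")
assert not allowed("run-tests", "write_file", "src/app.py")
```

| The planner can then be checked against the manifest before
| execution starts, instead of trusting every step with blanket access.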
| connectsnk wrote:
| Is it required to tell Claude to re-read the code folder again
| when you come back some day later, or should we ask Claude to just
| pick up from the research.md file, thus saving some tokens?
| mkl wrote:
| How are the annotations put into the markdown? Claude needs to be
| able to identify them as annotations and not parts of the plan.
| vibeprofessor wrote:
| Add another agent review: I ask Claude to send the plan for review
| to Codex and fix critical and high issues, with complexity gating
| (no overcomplicated logic), run in a loop, then send it to a Gemini
| reviewer, then maybe do a final pass with Claude; once all C+H
| pass, the sequence is done.
| throwaway7783 wrote:
| I have to give this a try. My current model for backend is the
| same as how author does frontend iteration. My friend does the
| research-plan-edit-implement loop, and there is no real
| difference between the quality of what I do and what he does. But
| I do like this just for how it serves as documentation of the
| thought process across AI/human, and can be added to version
| control. Instead of humans reviewing PRs, perhaps humans can
| review the research/plan document.
|
| On the PR review front, I give Claude the ticket number and the
| branch (or PR) and ask it to review for correctness, bugs and
| design consistency. The prompt is always roughly the same for
| every PR. It does a very good job there too.
|
| Modelwise, Opus 4.6 is scary good!
| rotbart wrote:
| This is a similar workflow to speckit, kiro, gsd, etc.
| efnx wrote:
| I've been using Claude through opencode, and I figured this was
| just how it does it. I figured everyone else did it this way as
| well. I guess not!
| swe_dima wrote:
| Since everyone is showing their flow, here's mine:
|
| * create a feature-name.md file in a gitignored folder
|
| * start the file by giving the business context
|
| * describe a high-level implementation and user flows
|
| * describe database structure changes (I find it important not to
| leave it for interpretation)
|
| * ask Claude to inspect the feature and review it for coherence;
| while answering its questions, I ask it to augment the
| feature-name.md file with the answers
|
| * enter Claude's plan mode and provide that feature-name.md file
|
| * at this point it's detailed enough that corrections from me
| are rarely needed
| Merad wrote:
| I've been working off and on on a vibe coded FP language and
| transpiler - mostly just to get more experience with Claude Code
| and see how it handles complex real world projects. I've settled
| on a very similar flow, though I use three documents: plan,
| context, task list. Multiple rounds of iteration when planning a
| feature. After completion, have a clean session do an audit to
| confirm that everything was implemented per the design. Then I
| have both Claude and CodeRabbit do code review passes before I
| finally do manual review. VERY heavy emphasis on tests, the
| project currently has 2x more test code than application code. So
| far it works surprisingly well. Example planning docs below -
|
| https://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv...
| kulikalov wrote:
| I came to the exact same pattern, with one extra heuristic at the
| end: spin up a new claude instance after the implementation is
| complete and ask it to find discrepancies between the plan and
| the implementation.
| zahlman wrote:
| > After Claude writes the plan, I open it in my editor and add
| inline notes directly into the document. These notes correct
| assumptions, reject approaches, add constraints, or provide
| domain knowledge that Claude doesn't have.
|
| This is the part that seems most novel compared to what I've
| heard suggested before. And I have to admit I'm a bit skeptical.
| Would it not be better to modify what Claude has written
| directly, to make it correct, rather than adding the corrections
| as separate notes (and expecting future Claude to parse out which
| parts were past Claude and which parts were the operator, and
| handle the feedback graciously)?
|
| At least, it seems like the intent is to do all of this in the
| same session, such that Claude has the context of the entire
| back-and-forth updating the plan. But that seems a bit
| unpleasant; I would think the file is there specifically to
| preserve context between sessions.
| fendy3002 wrote:
| One reason why I don't do this: even I'm not immune to
| mistakes. When I fix the plan with new values or paths, for
| example, and the ones I provide are wrong, it can worsen the
| future work.
|
| Personally, I like to ask Claude one more time to update the
| plan file after I have given annotations, and review it again
| afterward. This ensures (from my understanding) that Claude
| won't treat my annotations as separate instructions, which
| risks the work being conflicted.
| ramoz wrote:
| The whole process feels Socratic which is why I and a lot of
| other folks use plan annotation tools already. In my workflow I
| had a great desire to tell the agent what I didn't like about
| the plan vs just fix it myself - because I wanted the agent to
| fix its own plan.
| strix_varius wrote:
| The baffling part of the article is all the assertions about how
| this is unique, novel, not the typical way people are doing this
| etc.
|
| There are whole products wrapped around this common workflow
| already (like Augment Intent).
| raptorraver wrote:
| I've been using this same pattern, except not the research phase.
| I'll definitely try to add it to my process as well.
|
| Sometimes when doing a big task I ask Claude to implement each
| phase separately and review the code after each step.
| lxe wrote:
| Honestly, I found that the best way to use these CLIs is exactly
| how the CLI creators have intended.
| nerdright wrote:
| Haha, this is surprisingly exactly how I use Claude as well.
| Quite fascinating that we independently discovered the same
| workflow.
|
| I maintain two directories: "docs/proposals" (for the research md
| files) and "docs/plans" (for the planning md files). For complex
| research files, I typically break them down into multiple
| planning md files so claude can implement one at a time.
|
| A small difference in my workflow is that I use subagents during
| implementation to avoid context from filling up quickly.
| brendanmc6 wrote:
| Same, I formalized a similar workflow for my team (oriented
| around feature requirement docs), I am thinking about fully
| productizing it and am looking for feedback -
| https://acai.sh
|
| Even if the product doesn't resonate I think I've stumbled on
| some ideas you might find useful^
|
| I do think spec-driven development is where this all goes.
| Still making up my mind though.
| clouedoc wrote:
| This is basically long-lived specs that are used as tests to
| check that the product still adheres to the original idea
| that you wanted to implement, right?
|
| This inspired me to finally write good old playwright tests
| for my website :).
| puchatek wrote:
| Spec-driven looks very much like what the author describes.
| He may have some tweaks of his own but they could just as
| well be coded into the artifacts that something like OpenSpec
| produces.
| rossant wrote:
| Funny how I came up with something loosely similar. Asking Codex
| to write a detailed plan in a markdown document, reviewing it,
| and asking it to implement it step by step. It works exquisitely
| well when it can build and test itself.
| duttish wrote:
| This is quite close to what I've arrived at, but with two
| modifications
|
| 1) anything larger I work on in layers of docs. Architecture and
| requirements -> design -> implementation plan -> code. Partly it
| helps me think and nail the larger things first, and partly it
| helps Claude. Iterate on each level until I'm satisfied.
|
| 2) when doing reviews of each doc I sometimes restart the session
| and clear context, it often finds new issues and things to clear
| up before starting the next phase.
| mvkel wrote:
| > the workflow I've settled into is radically different from what
| most people do with AI coding tools
|
| This looks exactly like what anthropic recommends as the best
| practice for using Claude Code. Textbook.
|
| It also exposes a major downside of this approach: if you don't
| plan perfectly, you'll have to start over from scratch if
| anything goes wrong.
|
| I've found a much better approach in doing a design -> plan ->
| execute in batches, where the plan is no more than 1,500 lines,
| used as a proxy for complexity.
|
| My 30,000 LOC app has about 100,000 lines of plan behind it.
| Can't build something that big as a one-shot.
| Bishonen88 wrote:
| Dunno. My 80k+ LOC personal life planner, with a native Android
| app and e-ink display view, still one-shots most features/bugs I
| encounter. I just open a new instance, let it know what I want,
| and 5 min later it's done.
| makeramen wrote:
| Both can be true. I have personally experienced both.
|
| On some problems AI surprised me immensely with fast, elegant,
| efficient solutions and problem solving. I've also
| experienced AI doing totally absurd things that ended up
| taking multiple times longer than if I did it manually.
| Sometimes in the same project.
| vasco wrote:
| What is a personal life planner?
| Bishonen88 wrote:
| Todos, habits, goals, calendar, meals, notes, bookmarks,
| shopping lists, finances. More or less that with Google cal
| integration, garmin Integration (Auto updates workout
| habits, weight goals) family sharing/gamification,
| daily/weekly reviews, ai summaries and more. All built by
| just prompting Claude for feature after feature, with me
| writing 0 lines.
| puchatek wrote:
| Is it on GH?
| Bishonen88 wrote:
| It was when I MVP'd it 3 weeks ago. Then I removed it as
| I was toying with the idea of somehow monetizing it. Then
| I added a few features which would make monetization
| impossible (e.g. how the app obtains ETF/stock prices
| live, and some other things). I reckon I could remove
| those and put it on GH during the week if I don't forget.
| The quality of the web app is SaaS grade IMO. Keyboard
| shortcuts, cmd+k, natural language parsing, great UI that
| doesn't look like it was made by AI in 5 min. Might post
| the link here.
| mstkllah wrote:
| Would love to check it out too once you put it up.
| vasco wrote:
| Ah, I imagined actual life planning as in asking AI what
| to do, I was morbidly curious.
|
| Prompting basic notes apps is not as exciting but I can
| see how people who care about that also care about it
| being exactly a certain way, so I think get your
| excitement.
| therealdrag0 wrote:
| In 5 min you are one-shotting smaller changes to the larger
| code base, right? Not the entire 80k lines, which was the other
| comment's point, AFAICT.
| Bishonen88 wrote:
| Yeah, then I guess I misunderstood the post. It's smaller
| features one by one, of course.
| PacificSpecific wrote:
| If you wouldn't mind sharing more about this in the future
| I'd love to read about it.
|
| I've been thinking about doing something like that myself
| because I'm one of those people who have tried countless apps
| but there's always a couple deal breakers that cause me to
| drop the app.
|
| I figured trying to agentically develop a planner app with
| the exact feature set I need would be an interesting and fun
| experiment.
| onion2k wrote:
| _if you don't plan perfectly, you'll have to start over from
| scratch if anything goes wrong_
|
| This is my experience too, but it's pushed me to make much
| smaller plans and to commit things to a feature branch far more
| atomically so I can revert a step to the previous commit, or
| bin the entire feature by going back to main. I do this far
| more now than I ever did when I was writing the code by hand.
|
| This is how developers _should_ work regardless of how the code
| is being developed. I think this is a small but very real way
| AI has actually made me a better developer (unless I stop doing
| it when I don't use AI... I haven't tried that yet.)
| mattmanser wrote:
| Developers should work by wasting lots of time making the
| wrong thing?
|
| I bet if they did a work and motion study on this approach
| they'd find the classic:
|
| "Thinks they're more productive, AI has actually made them
| less productive"
|
| But lots of lovely dopamine from this false progress that
| gets thrown away!
| SpaceNoodled wrote:
| Classic
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| onion2k wrote:
| _Developers should work by wasting lots of time making the
| wrong thing?_
|
| Yes. In fact, that's not emphatic enough: _HELL YES!_
|
| More specifically, developers should experiment. They
| should test their hypothesis. They should try out ideas by
| designing a solution and creating a proof of concept, then
| throw that away and build a proper version based on what
| they learned.
|
| If your approach to building something is to implement the
| first idea you have and move on, then you are going to waste
| _so much_ more time later refactoring things to fix
| architecture that paints you into corners, reimplementing
| things that didn't work for future use cases, fixing edge
| cases that you hadn't considered, and just paying off a
| mountain of tech debt.
|
| I'd actually go so far as to say that if you aren't
| experimenting and throwing away solutions that don't quite
| work, then you're _only_ amassing tech debt and you're not
| really building anything that will last. If it does, it's
| through luck rather than skill.
|
| Also, this has _nothing_ to do with AI. Developers should
| be working this way even if they handcraft their artisanal
| code carefully in vi.
| skydhash wrote:
| >> Developers should work by wasting lots of time making
| the wrong thing?
|
| > Yes. In fact, that's not emphatic enough: HELL YES!
|
| You do realize there is prior research and there are well-tested
| solutions for a lot of things. Instead of wasting time
| making the wrong thing, it is faster to do some research
| if the problem has already been solved. Experimentation
| is fine only after checking that the problem space is
| truly novel or there's not enough information around.
|
| It is faster to iterate in your mental space and in front
| of a whiteboard than in code.
| abustamam wrote:
| > Developers should work by wasting lots of time making the
| wrong thing?
|
| Yes? I can't even count how many times I worked on
| something my company deemed was valuable only for it to be
| deprecated or thrown away soon after. Or, how many times I
| solved a problem but apparently misunderstood the specs
| slightly and had to redo it. Or how many times we've had to
| refactor our code because scope increased. In fact, the
| very existence of the concepts of refactoring and tech debt
| proves that devs often spend a lot of time making the
| "wrong" thing.
|
| Is it a waste? No, it solved the problem as understood at
| the time. And we learned stuff along the way.
| sixtyj wrote:
| LLMs are really eager to start coding (as interns are eager
| to start working), so the sentence "don't implement yet" has
| to be used very often at the beginning of any project.
| onion2k wrote:
| Most LLM apps have a 'plan' or 'ask' mode for that.
| jerryharri wrote:
| We're learning the lessons of Agile all over again.
| intrasight wrote:
| We're learning how to be an engineer all over again.
|
| The author's process is super-close to what we were taught in
| engineering 101 40 years ago.
| skydhash wrote:
| I always feel like I'm in a fever dream when I hear
| about AI workflows. A lot of this stuff is what I've read in
| software engineering books and articles.
| jerryharri wrote:
| It's after we come down from the Vibe coding high that we
| realize we still need to ship working, high-quality code.
| The lessons are the same, but our muscle memory has to be
| re-oriented. How do we create estimates when AI is
| involved? In what ways do we redefine the information
| flow between Product and Engineering?
| solarkraft wrote:
| I do this too. Relatively small changes, atomic commits with
| extensive reasoning in the message (keeps important context
| around). This is a best practice anyway, but used to take
| excruciating effort. Now it's easy!
|
| Except that I'm still struggling with the LLM understanding
| its audience/context of its utterances. Very often, after a
| correction, it will focus a lot on the correction itself
| making for weird-sounding/confusing statements in commit
| messages and comments.
| dakolli wrote:
| wtf, why would you write 100k lines of plan to produce 30k
| loc.. JUST WRITE THE CODE!!!
| Bishonen88 wrote:
| They didn't write 100k plan lines. The LLM did (at least
| 99.9% of it). Writing 30k by hand would take weeks if
| not months. LLMs do it in an afternoon.
| AstroBen wrote:
| Just reading that plan would take weeks or months
| chickensong wrote:
| You don't start with 100k lines, you work in batches that
| are digestible. You read it once, then move on. The lines
| add up pretty quickly considering how fast Claude works.
| If you think about the difference in how many characters
| it takes to describe what code is doing in English, it's
| pretty reasonable.
| dakolli wrote:
| And my weeks or months of work beats an LLM's 10/10 times.
| There are no shortcuts in life.
| tock wrote:
| Might be true for you. But there are plenty of top tier
| engineers who love LLMs. So it works for some. Not for
| others.
|
| And of course there are shortcuts in life. Any form of
| progress, whether it's cars, medicine, computers, or the
| internet, is a shortcut in life. It makes life easier
| for a lot of people.
| Bishonen88 wrote:
| I have no doubts that it does for many people. But the
| time/cost tradeoff is still unquestionable. I know I
| could create what LLMs do for me in the frontend/backend
| in most cases as well or better - I know that, because
| I've done it at work for years. But to create a somewhat
| complex app with lots of pages/features/apis etc. would
| take me months if not a year++ since I'd be working on it
| only on the weekends for a few hours. Claude code helps
| me out by getting me to my goal in a fraction of the
| time. Its superpower lies not only in doing what I know,
| faster, but also in doing what I don't know.
|
| I yield similar benefits at work. I can wow management
| with LLM-assisted/vibe-coded apps. What previously
| would've taken a multi-man team weeks of planning and
| executing, stand ups, jour fixes, architecture diagrams,
| etc. can now be done within a single week by myself. For
| the type of work I do, managers do not care whether I
| could do it better if I'd code it myself. They are amazed
| however that what has taken months previously, can be
| done in hours nowadays. And I for sure will try to reap
| benefits of LLMs for as long as they don't replace me
| rather than being idealistic and fighting against them.
| abustamam wrote:
| > What previously would've taken a multi-man team weeks
| of planning and executing, stand ups, jour fixes,
| architecture diagrams, etc. can now be done within a
| single week by myself.
|
| This has been my experience. We use Miro at work for
| diagramming. Lots of visual people on the team, myself
| included. Using Miro's MCP I draft a solution to a
| problem and have Miro diagram it. Once we talk it through
| as a team, I have Claude or codex implement it from the
| diagram.
|
| It works surprisingly well.
|
| > They are amazed however that what has taken months
| previously, can be done in hours nowadays.
|
| Of course they're amazed. They don't have to pay you for
| time saved ;)
|
| > reap benefits of LLMs for as long as they don't replace me
|
| > What previously would've taken a multi-man team
|
| I think this is the part that people are worried about.
| Every engineer who uses LLMs says this. By definition it
| means that people are being replaced.
|
| I think I justify it in that no one on my team has been
| replaced. But management has explicitly said "we don't
| want to hire more because we can already 20x ourselves
| with our current team +LLM." But I do acknowledge that
| many people ARE being replaced; not necessarily by LLMs,
| but certainly by other engineers using LLMs.
| skydhash wrote:
| I'm still waiting for the multi-years success stories.
| Greenfield solutions are always easy (which is why we
| have frameworks that automate them). But maintaining
| solutions over years is always the true test of any
| technology.
|
| It's already telling that nothing has staying power in
| the LLMs world (other than the chat box). Once the
| limitations can no longer be hidden by the hype and the
| true cost is revealed, there's always a next thing to
| pivot to.
| hghbbjh wrote:
| > but in doing what I don't know as well.
|
| Comments like these really help ground what I read online
| about LLMs. This matches how low performing devs at my
| work use AI, and their PRs are a net negative on the
| team. They take on tasks they aren't equipped to handle
| and use LLMs to fill the gaps quickly instead of taking
| time to learn (which LLMs speed up!).
| oblio wrote:
| That's not (or should not be) what's happening.
|
| They write a short high level plan (let's say 200 words). The
| plan asks the agent to write a more detailed implementation
| plan (written by the LLM, let's say 2000-5000 words).
|
| They read this plan and adjust as needed, even sending it to
| the agent for re-dos.
|
| Once the implementation plan is done, they ask the agent to
| write the actual code changes.
|
| Then they review that and ask for fixes, adjustments, etc.
|
| This can be comparable to writing the code yourself but also
| leaves a detailed trail of what was done and why, which I
| basically NEVER see in human generated code.
|
| That alone is worth gold, by itself.
|
| And on top of that, if you're using an unknown platform or
| stack, it's basically a rocket ship. You bootstrap much
| faster. Of course, stay on top of the architecture, do
| controlled changes, learn about the platform as you go, etc.
| abustamam wrote:
| I take this concept and I meta-prompt it even more.
|
| I have a road map (AI generated, of course) for a side
| project I'm toying around with to experiment with LLM-
| driven development. I read the road map and I understand
| and approve it. Then, using some skills I found on
| skills.sh and slightly modified, my workflow is as such:
|
| 1. Brainstorm the next slice
|
| It suggests a few items from the road map that should be
| worked on, with some high level methodology to implement.
| It asks me what the scope ought to be and what invariants
| ought to be considered. I ask it what tradeoffs could be,
| why, and what it recommends, given the product constraints.
| I approve a given slice of work.
|
| NB: this is the part I learn the most from. I ask it why X
| process would be better than Y process given the
| constraints and it either corrects itself or it explains
| why. "Why use an outbox pattern? What other patterns could
| we use and why aren't they the right fit?"
|
| 2. Generate slice
|
| After I approve what to work on next, it generates a high
| level overview of the slice, including files touched, saved
| in a MD file that is persisted. I read through the slice,
| ensure that it is indeed working on what I expect it to be
| working on, and that it's not scope creeping or undermining
| scope, and I approve it. It then makes a plan based off of
| this.
|
| 3. Generate plan
|
| It writes a rather lengthy plan, with discrete task bullets
| at the top. Beneath, each step has to-dos for the llm to
| follow, such as generating tests, running migrations, etc,
| with commit messages for each step. I glance through this
| for any potential red flags.
|
| 4. Execute
|
| This part is self explanatory. It reads the plan and does
| its thing.
|
| I've been extremely happy with this workflow. I'll probably
| write a blog post about it at some point.
| jalopy wrote:
| This is a super helpful and productive comment. I look
| forward to a blog post describing your process in more
| detail.
| oblio wrote:
| This dead internet uncanny (sarcasm?) valley is killing
| me.
| AstroBen wrote:
| 100,000 lines is approx. one million words. The average person
| reads at 250wpm. The entire thing would take 66 hours just to
| read, assuming you were approaching it like a fiction book, not
| thinking anything over
| chickensong wrote:
| > design -> plan -> execute in batches
|
| This is the way for me as well. Have a high-level master design
| and plan, but break it apart into phases that are manageable.
| One-shotting anything beyond a todo list and expecting decent
| quality is still a pipe dream.
| zozbot234 wrote:
| > if you don't plan perfectly, you'll have to start over from
| scratch if anything goes wrong.
|
| You just revert what the AI agent changed and revise/iterate on
| the previous step - no need to start over. This can of course
| involve restricting the work to a smaller change so that the
| agent isn't overwhelmed by complexity.
| elAhmo wrote:
| How can you know that a 100k-line plan is not just slop?
|
| Just because a plan is elaborate doesn't mean it makes sense.
| d1sxeyes wrote:
| The "inline comments on a plan" is one of the best features of
| Antigravity, and I'm surprised others haven't started
| copycatting.
| _hugerobots_ wrote:
| Hub-and-spoke documentation has been absolutely essential
| to the way I planned before, and it's pretty cool seeing
| it work so well with planning mode to build scaffolds and
| routing.
| geoffbp wrote:
| It's worrying to me that nobody really knows how LLMs work. We
| create prompts with or without certain words and hope it works.
| That's my perspective anyway
| solumunus wrote:
| It's the same as dealing with a human. You convey a spec for a
| problem and the language you use matters. You can convey the
| problem in (from your perspective) a clear way and you will get
| mixed results nonetheless. You will have to continue to refine
| the solution with them.
|
| Genuinely: no one really knows how humans work either.
| mannyv wrote:
| It's actually no different from how real software is made.
| Requirements come from the business side, and through an odd
| game of telephone get down to developers.
|
| The team that has developers closest to the customer usually
| makes the better product...or has the better product/market
| fit.
|
| Then it's iteration.
| cawksuwcka wrote:
| falling asleep here. when will the babysitting end
| tayo42 wrote:
| We're just slowly reinventing agile for telling AI agents what to
| do lol
|
| Just skip to the AI stand-ups
| cheekyant wrote:
| It seems like the annotation of plan files is the key step.
|
| Claude Code now creates persistent markdown plan files in
| ~/.claude/plans/ and you can open them with Ctrl-G to annotate
| them in your default editor.
|
| So plan mode is not ephemeral any more.
| chaboud wrote:
| The author seems to think they've hit upon something
| revolutionary...
|
| They've actually hit upon something that several of us have
| evolved to naturally.
|
| LLMs are like unreliable interns with boundless energy. They
| make silly mistakes, wander into annoying structural traps, and
| have to be unwound if left to their own devices. It's like the
| genie that almost pathologically misinterprets your wishes.
|
| So, how do you solve that? Exactly how an experienced lead or
| software manager does: you have them _write it down_ before
| executing, explain things back to you, and ground all of their
| thinking in the code and documentation, avoiding making
| assumptions about code after superficial review.
|
| When it was early ChatGPT, this meant function-level thinking and
| clearly described jobs. When it was Cline it meant cline rules
| files that forced writing architecture.md files and vibe-code.log
| histories, demanding grounding in research and code reading.
|
| Maybe nine months ago, another engineer said two things to me,
| less than a day apart:
|
| - "I don't understand why your clinerules file is so large. You
| have the LLM jumping through so many hoops and doing so much
| extra work. It's crazy."
|
| - The next morning: "It's basically like a lottery. I can't get
| the LLM to generate what I want reliably. I just have to settle
| for whatever it comes up with and then try again."
|
| These systems have to deal with minimal context, ambiguous
| guidance, and extreme isolation. Operate with a little empathy
| for the energetic interns, and they'll uncork levels of output
| worth fighting for. We're Software Managers now. For some of us,
| that's working out great.
| marc_g wrote:
| I've also found that a bigger focus on expanding my agents.md
| as the project rolls on has led to fewer headaches overall and
| more consistency (unsurprisingly). It's the same as asking
| juniors to reflect on the work they've completed and to
| document important things that can help them in the future.
| Software Manager is a good way to put this.
| zozbot234 wrote:
| AGENTS.md should mostly point to real documentation and
| design files that humans will also read and keep up to date.
| It's rare that something about a project is _only_ of
| interest to AI agents.
| jeffreygoesto wrote:
| Oh no, maybe the V-Model was right all along? And
| right-sizing increments with control stops after them. No
| wonder these matrix multiplications start to behave like
| humans; that is what we wanted them to do.
| baxtr wrote:
| So basically you're saying LLMs are helping us be better
| humans?
| shevy-java wrote:
| Better humans? How and where?
| vishnugupta wrote:
| Revolutionary or not it was very nice of the author to make
| time and effort to share their workflow.
|
| For those starting out using Claude Code it gives a structured
| way to get things done bypassing the time/energy needed to "hit
| upon something that several of us have evolved to naturally".
| ffsm8 wrote:
| It's AI-written though; the tells are in pretty much every
| paragraph.
| ratsimihah wrote:
| I don't think it's that big a red flag anymore. Most people
| use ai to rewrite or clean up content, so I'd think we
| should actually evaluate content for what it is rather than
| stop at "nah it's ai written."
| elaus wrote:
| I think as humans it's very hard to abstract content from
| its form. So when the form is always the same boring,
| generic AI slop, it's really not helping the content.
| rmnclmnt wrote:
| And maybe writing an article or a keynote slides is one
| of the few places we can still exerce some human
| creativity, especially when the core skills (programming)
| is almost completely in the hands of LLMs already
| shevy-java wrote:
| Well, real humans may read it though. Personally I much
| prefer real humans write real articles than all this AI
| generated spam-slop. On youtube this is especially
| annoying - they mix in real videos with fake ones. I see
| this when I watch animal videos - some animal behaviour
| is taken from older videos, then AI fake is added. My own
| policy is that I do not watch anything ever again from
| people who lie to the audience that way so I had to begin
| to censor away such lying channels. I'd apply the same
| rationale to blog authors (but I am not 100% certain it
| is actually AI generated; I just mention this as a safety
| guard).
| ffsm8 wrote:
| > I don't think it's that big a red flag anymore.
|
| It is to me, because it indicates the author didn't care
| about the topic. The only thing they cared about was
| writing an "insightful" article about using LLMs. Hence
| this whole thing is basically LinkedIn resume
| improvement slop.
|
| Not worth interacting with, imo
|
| Also, it's not insightful whatsoever. It's basically a
| retelling of other articles from around the time Claude
| Code was released to the public (March-August 2025)
| pmg101 wrote:
| I don't judge content for being AI written, I judge it
| for the content itself (just like with code).
|
| However I do find the standard out-of-the-box style very
| grating. Call it faux-chummy linkedin corporate workslop
| style.
|
| Why don't people give the LLM a steer on style? Either
| based on your personal style or at least on a writer
| whose style you admire. That should be easier.
| xoac wrote:
| Because they think this is good writing. You can't
| correct what you don't have taste for. Most software
| engineers think that reading books means reading NYT non-
| fiction bestsellers.
| ben_w wrote:
| While I agree with:
|
| > Because they think this is good writing. You can't
| correct what you don't have taste for.
|
| I have to disagree about:
|
| > Most software engineers think that reading books means
| reading NYT non-fiction bestsellers.
|
| There's a lot of scifi and fantasy in nerd circles, too.
| Douglas Adams, Terry Pratchett, Vernor Vinge, Charlie
| Stross, Iain M Banks, Arthur C Clarke, and so on.
|
| But simply enjoying good writing is not enough to fully
| get what makes writing good. Even writing is not itself
| enough to get such a taste: thinking of Arthur C Clarke,
| I've just finished 3001, and at the end Clarke gives
| thanks to his editors, noting his own experience as an
| editor meant he held a higher regard for editors than
| many writers seemed to. Stross has, likewise, blogged
| about how writing a manuscript is only the first half of
| writing a book, because then you need to edit the thing.
| pi-rat wrote:
| The main issue with evaluating content for what it is is
| how extremely asymmetric that process has become.
|
| Slop looks reasonable on the surface, and requires orders
| of magnitude more effort to evaluate than to produce.
| It's produced once, but the process has to be repeated
| for every single reader.
|
| Disregarding content that smells like AI becomes an
| extremely tempting early filtering mechanism to separate
| signal from noise - the reader's time is valuable.
| Thanemate wrote:
| >Most people use ai to rewrite or clean up content
|
| I think your sentence should have been "people who use ai
| do so to mostly rewrite or clean up content", but even
| then I'd question the statistical truth behind that
| claim.
|
| Personally, seeing something written by AI means that the
| person who wrote it did so just for looks and not for
| substance. Claiming to be a great author requires both
| penmanship and communication skills, and delegating one
| or either of them to a large language model inherently
| makes you less than that.
|
| However, when the point is just the contents of the
| paragraph(s) and nothing more then I don't care who or
| what wrote it. An example is the result of a research,
| because I'd certainly won't care about the prose or
| effort given to write the thesis but more on the results
| (is this about curing cancer now and forever? If yes, no
| one cares if it's written with AI).
|
| With that being said, there's still the question of
| whether I get anywhere close to understanding the author
| behind the thoughts and opinions. I believe the way
| someone writes hints at the way they think and act. In
| that sense, using LLMs to
| rewrite something to make it sound more professional than
| what you would actually talk in appropriate contexts
| makes it hard for me to judge someone's character,
| professionalism, and mannerisms. Almost feels like
| they're trying to mask part of themselves. Perhaps they
| lack confidence in their ability to sound professional
| and convincing?
| exe34 wrote:
| If you want to write something with AI, send me your
| prompt. I'd rather read what you intend for it to produce
| rather than what it produces. If I start to believe you
| regularly send me AI written text, I will stop reading
| it. Even at work. You'll have to call me to explain what
| you intended to write.
| DonHopkins wrote:
| And if my prompt is a 10 page wall of text that I would
| otherwise take the time to have the AI organize,
| deduplicate, summarize, and sharpen with an index,
| executive summary, descriptive headers, and logical
| sections, are you going to actually read all of that, or
| just whine "TL;DR"?
|
| It's much more efficient and intentional for the writer
| to put the time into doing the condensing and organizing
| once, and review and proofread it to make sure it's what
| they mean, than to just lazily spam every human they want
| to read it with the raw prompt, so every recipient has to
| pay for their own AI to perform that task like a slot
| machine, producing random results not reviewed and
| approved by the author as their intended message.
|
| Is that really how you want Hacker News discussions and
| your work email to be, walls of unorganized unfiltered
| text prompts nobody including yourself wants to take the
| time to read? Then step aside, hold my beer!
|
| Or do you prefer I should call you on the phone and
| ramble on for hours in an unedited meandering stream of
| thought about what I intended to write?
| fasbiner wrote:
| Yeah but it's not. This is a complete contrivance and you're
| just making shit up. The prompt is much shorter than the
| output and you are concealing that fact. Why?
|
| Github repo or it didn't happen. Let's go.
| DonHopkins wrote:
| Are you actually accusing me of not writing walls of
| text??!
|
| Which prompt are you talking about, and exactly how many
| characters is it, and how do you know? And why do you
| think I know, and am concealing it?
|
| Github repo about what, or what didn't happen? You should
| run your posts through an LLM to sanity check them.
|
| I find AI Gloss to be much more insidious than AI Slop,
| which merely annoys with em-dashes, instead of trying to
| undermine reality. So I created these Anthropic Skills
| and Drescher Schemas in my MOOLLM github repo to
| recognize, analyze, fight, and prevent AI Slop, AI Gloss,
| and more.
|
| I'm actively applying Gary Drescher's schema mechanism to
| the problem, as he described in "Made-Up Minds: A
| Constructivist Approach to Artificial Intelligence", his
| thesis with his PhD advisor Seymour Papert and colleague
| Marvin Minsky, and his book from MIT Press.
|
| https://mitpress.mit.edu/9780262517089/made-up-minds/
|
| >Made-Up Minds addresses fundamental questions of
| learning and concept invention by means of an innovative
| computer program that is based on the cognitive-
| developmental theory of psychologist Jean Piaget.
| Drescher uses Piaget's theory as a source of inspiration
| for the design of an artificial cognitive system called
| the schema mechanism, and then uses the system to
| elaborate and test Piaget's theory. The approach is
| original enough that readers need not have extensive
| knowledge of artificial intelligence, and a chapter
| summarizing Piaget assists readers who lack a background
| in developmental psychology. The schema mechanism learns
| from its experiences, expressing discoveries in its
| existing representational vocabulary, and extending that
| vocabulary with new concepts. A novel empirical learning
| technique, marginal attribution, can find results of an
| action that are obscure because each occurs rarely in
| general, although reliably under certain conditions.
| Drescher shows that several early milestones in the
| Piagetian infant's invention of the concept of persistent
| object can be replicated by the schema mechanism.
|
| The goal is Training By Example, not just Instructions.
| Two kinds of training signal:
|
| - Training by instruction -- the skills themselves teach
| what to avoid, get into the training data by being
| published in moollm and included in other projects
|
| - Training by example -- the higher-quality conversations
| these skills produce become training data themselves
|
| Each logged example is a Drescher schema: what was the
| context, what did the AI do, what was the result, and
| what was the surprise (the failure). The schema includes
| the detection pattern (how to recognize it) and the
| correction (what should have happened). These schemas
| serve as both detection patterns and suggested
| mitigations -- they teach an AI (or a human) what to look
| for and what to do instead.
|
| No AI Gloss Drescher Schema Example: ChatGPT Deflection
| Playbook (please submit PRs with your own):
|
| https://github.com/SimHacker/moollm/blob/main/skills/no-
| ai-g...
|
| So what have you tried to do about the problem, other
| than just unoriginally whining in online discussions? You
| asked for a link to my repo, so now you owe me the
| courtesy of actually reading it and commenting on the
| substance instead of the form, instead of just
| complaining "tl;dr" or "ai;dr". You can lead a cow to
| MOOLLM, but you can't make her think.
|
| No AI Slop:
| https://github.com/SimHacker/moollm/tree/main/skills/no-
| ai-s...
|
| > The term "AI slop" was coined by Simon Willison.
|
| > AI slop is everything that makes AI output annoying.
| The filler, the puffery, the em-dashes, the 500 words
| when 50 would do, the "Great question!" before every
| answer. Annoying, but it doesn't lie to you. It just
| wastes your time.
|
| > SLOP = "You said too much, but what you said was true."
|
| > GLOSS = "You said it smoothly, but you lied about
| reality."
|
| > SLOP is the bread. GLOSS is the poison. Most bad AI
| output is a poison sandwich.
|
| No AI Gloss:
| https://github.com/SimHacker/moollm/tree/main/skills/no-
| ai-g...
|
| > The term "AI gloss" inspired by Simon Willison's "AI
| slop" -- because slop is just annoying, but gloss
| rewrites reality.
|
| > AI gloss is more insidious than AI slop. When an AI
| says "relationship management" instead of "tribute," it's
| not being verbose -- it's rewriting reality on behalf of
| whoever prefers the euphemism. Slop wastes your time.
| Gloss wastes your understanding of the world.
|
| > SLOP makes you scroll. GLOSS makes you believe false
| things.
|
| > NO-AI Web Ring: for real: | slop | gloss | sycophancy |
| hedging | moralizing | ideology | overlord | bias | for
| fun: | joking | customer-service | soul
|
| As a consolation prize, here's a wall of text I wrote
| without an LLM about my own personal experience and
| opinions that an LLM would know nothing about -- is it
| too long for you to read, or do you want more details? I
| would be glad to explain the ironic significance of the
| Rightward-Facing Cow if you like, and then launch into a
| rambling essay about how Cow Clicker perfectly
| demonstrates Ian Bogost's idea of procedural rhetoric,
| and how it relates to his criticisms of game design, and
| how Peter Molyneux not only totally missed the point, but
| unwittingly proved it, two years late to the party.
|
| https://news.ycombinator.com/item?id=47110605
|
| Procedural Rhetoric (MOOLLM Anthropic Skill): https://git
| hub.com/SimHacker/moollm/blob/main/skills/procedu...
|
| >Rules persuade. Structure IS argument. Design
| consciously.
|
| >What Is Procedural Rhetoric?
|
| >Ian Bogost coined it: "an unholy blend of Will Wright
| and Aristotle."
|
| >Games and simulations persuade through processes and
| rules, not just words or visuals. The structure of your
| world embodies an ideology. When The Sims allows same-sex
| relationships without fanfare, the rules themselves make
| a statement -- equality is the default, not a feature.
| layer8 wrote:
| It's certainly more interesting than whatever the AI
| would turn it into.
| stuaxo wrote:
| Even though I use LLMs for code, I just can't read LLM
| written text, I kind of hate the style, it reminds me too
| much of LinkedIn.
| ben_w wrote:
| > I don't think it's that big a red flag anymore. Most
| people use ai to rewrite or clean up content, so I'd
| think we should actually evaluate content for what it is
| rather than stop at "nah it's ai written."
|
| Unfortunately, there's a lot of people trying to content-
| farm with LLMs; this means that whatever style they
| default to, is automatically suspect of being a slice of
| "dead internet" rather than some new human discovery.
|
| I won't rule out the possibility that even LLMs, let
| alone other AI, can help with new discoveries, but they
| are definitely better at writing persuasively than they
| are at being inventive, which means I am forced to use
| "looks like LLM" as proxy for both "content farm" and
| "propaganda which may work on me", even though some
| percentage of this output won't even be LLM and some
| percentage of what is may even be both useful and novel.
| theshrike79 wrote:
| ai;dr
|
| If your "content" smells like AI, I'm going to use _my_
| AI to condense the content for me. I'm not wasting my
| time on overly verbose AI "cleaned" content.
|
| Write like a human, have a blog with an RSS feed and I'll
| most likely subscribe to it.
| dawnerd wrote:
| Very high chance someone that's using Claude to write
| code is also using Claude to write a post from some
| notes. That goes beyond rewriting and cleaning up.
| handfuloflight wrote:
| So is GP.
|
| This is clearly a standard AI exposition:
|
| LLMs are like unreliable interns with boundless energy.
| They make silly mistakes, wander into annoying structural
| traps, and have to be unwound if left to their own devices.
| It's like the genie that almost pathologically
| misinterprets your wishes.
| foldingmoney wrote:
| >the tells are in pretty much every paragraph.
|
| It's not just misleading -- it's lazy. And honestly? That
| doesn't vibe with me.
|
| [/s obviously]
| DonHopkins wrote:
| Then ask your own ai to rewrite it so it doesn't trigger
| you into posting uninteresting thought stopping comments
| proclaiming why you didn't read the article, that don't
| contribute to the discussion.
| petesergeant wrote:
| Here's mine! https://github.com/pjlsergeant/moarcode
| chaboud wrote:
| It's this line that I'm bristling at: "...the workflow I've
| settled into is radically different from what most people do
| with AI coding tools..."
|
| Anyone who spends some time with these tools (and doesn't
| black out from smashing their head against their desk) is
| going to find substantial benefit in planning with clarity.
|
| It was #6 in Boris's run-down:
| https://news.ycombinator.com/item?id=46470017
|
| So, yes, I'm glad that people write things out and share. But
| I'd prefer that they not lead with "hey folks, I have news:
| we should *slice* our bread!"
| copirate wrote:
| But the author's workflow is actually very different from
| Boris'.
|
| #6 is about using plan mode whereas the author says "The
| built-in plan mode sucks".
|
| The author's post is much more than just "planning with
| clarity".
| Forgeties79 wrote:
| I would say he's saying "hey folks, I have news. We should
| slice our bread with a knife rather than the spoon that
| came with the bread."
| fintechie wrote:
| These kinds of flows have been documented in the wild for
| some time now. They started to pop up in the Cursor forums
| 2+ years ago... e.g.:
| https://github.com/johnpeterman72/CursorRIPER
|
| Personally I have been using a similar flow for almost 3
| years now, tailored for my needs. Everybody who uses AI for
| coding eventually gravitates towards a similar pattern
| because it works quite well (for all IDEs, CLIs, TUIs)
| CodeBit26 wrote:
| I really like your analogy of LLMs as 'unreliable interns'. The
| shift from being a 'coder' to a 'software manager' who enforces
| documentation and grounding is the only way to scale these
| tools. Without an architecture.md or similar grounding, the
| context drift eventually makes the AI-generated code a
| liability rather than an asset. It's about moving the
| complexity from the syntax to the specification.
| BoredPositron wrote:
| It's alchemy all over again.
| shevy-java wrote:
| Alchemy involved a lot of do-it-yourself though. With AI it
| is like someone else does all the work (well, almost all the
| work).
| BoredPositron wrote:
| It was mainly a jab at the protoscientific nature of it.
| vntok wrote:
| Reproducing experimental results across models and
| vendors is trivial and cheap nowadays.
| BoredPositron wrote:
| Not if Anthropic goes further in obfuscating the output
| of Claude Code.
| vntok wrote:
| Why would you test implementation details? Test _what's_
| delivered, not _how_ it's delivered. The thinking
| portion, synthesized or not, is merely implementation.
|
| The resulting artefact, that's what is worth testing.
| hghbbjh wrote:
| > Why would you test implementation details
|
| Because that has never been sufficient, for reasons
| ranging from hard-to-test cases to readability and
| long-term maintenance. Reading and understanding the code
| is more efficient and necessary for any code worth
| keeping around.
| fy20 wrote:
| It's nice to have it written down in a concise form. I shared
| it with my team as some engineers have been struggling with AI,
| and I think this (just trying to one-shot without planning)
| could be why.
| bambax wrote:
| Agreed. The process described is much more elaborate than what
| I do but quite similar. I start to discuss in great details
| what I want to do, sometimes asking the same question to
| different LLMs. Then a todo list, then manual review of the
| code, esp. each function signature, checking if the
| instructions have been followed and if there are no obvious
| refactoring opportunities (there almost always are).
|
| The LLM does most of the coding, yet I wouldn't call it "vibe
| coding" at all.
|
| "Tele coding" would be more appropriate.
| mlaretallack wrote:
| I use AWS Kiro, and its spec-driven development is exactly
| this. I find it really works well as it makes me slow down
| and think about what I want it to do.
|
| Requirements, design, task list, coding.
| bonoboTP wrote:
| It feels like retracing the history of software project
| management. The post is quite waterfall-like. Writing a lot of
| docs and specs upfront, then implementing. Another approach
| is to just YOLO (on a new branch), make it write up the
| lessons afterwards, then start a new, more informed try and
| throw away the first. Or any other combo.
|
| For me what works well is to ask it to write _some_ code
| upfront to verify its assumptions against actual reality, not
| just telling it to review the sources "in detail". It gains
| much more from real output from the code and clears up wrong
| assumptions. Do some smaller jobs, write up md files, then plan
| the big thing, then execute.
| 0x696C6961 wrote:
| This is exactly what I do. I assume most people avoid this
| approach due to cost.
| nurettin wrote:
| It makes an endless stream of assumptions. Some of them
| brilliant and even instructive to a degree, but most of them
| are unfounded and inappropriate in my experience.
| jerryharri wrote:
| 'The post is quite waterfall-like. Writing a lot of docs and
| specs upfront then implementing' - It's only waterfall if the
| specs cover the entire system or app. If it's broken up into
| sub-systems or vertical slices, then it's much more Agile or
| Lean.
| user3939382 wrote:
| If you have a big rules file you're in the right direction but
| still not there. Just as with humans, the key is that your
| architecture should make it very difficult to break the rules
| by accident and still be able to compile/run with correct exit
| status.
|
| My architecture is so beautifully strong that even LLMs and
| human juniors can't box their way out of it.
| kaycey2022 wrote:
| I've been doing the exact same thing for 2 months now. I wish I
| had gotten off my ass and written a blog post about it. I can't
| blame the author for gathering all the well deserved clout they
| are getting for it now.
| LeafItAlone wrote:
| Don't worry. This advice has been going around for much more
| than 2 months, including links posted here as well as
| official advice from the major companies (OpenAI and
| Anthropic) themselves. The tools literally have had plan mode
| as a first class feature.
|
| So you probably wouldn't have any clout anyways, like all of
| the other blog posts.
| noisy_boy wrote:
| I went through the blog. I started using Claude Code about 2
| weeks ago and my approach is practically the same. It just
| felt logical. I think there are a bunch of us who have landed
| on this approach and most are just quietly seeing the
| benefits.
| qudat wrote:
| > LLMs are like unreliable interns with boundless energy
|
| This isn't directed specifically at you but the general
| community of SWEs: we need to stop anthropomorphizing a tool.
| Code agents are not human capable and scaling pattern matching
| will never hit that goal. That's all hype and this is coming
| from someone who runs the range of daily CC usage. I'm using CC
| to its fullest capability while also being a good shepherd for
| my prod codebases.
|
| Pretending code agents are human-capable is fueling this
| Kool-Aid-drinking hype craze.
| kobe_bryant wrote:
| if only there was another simpler way to use your knowledge to
| write code...
| growt wrote:
| That is just spec driven development without a spec, starting
| with the plan step instead.
| YetAnotherNick wrote:
| I don't know. I tried various methods, and this one fails
| fairly often. The problem is that the plan naturally skips some
| important details, or assumes some library function, but it is
| then taken as an instruction in the next phase. And Claude
| can't handle ambiguity if the instruction is very detailed
| (e.g. if the plan asks to use a certain library, even if it is
| a bad fit, Claude won't know that decision is flexible). If the
| instruction is less detailed, I've seen Claude willing to try
| multiple things, and if it keeps failing it doesn't fear
| reverting almost everything.
|
| In my experience, the best scenario is that the instruction and
| plan are human-written, and detailed.
| pgt wrote:
| My process is similar, but I recently added a new "critique the
| plan" feedback loop that is yielding good results. Steps:
|
| 1. Spec
|
| 2. Plan
|
| 3. Read the plan & tell it to fix its bad ideas.
|
| 4. (NB) Critique the plan (loop) & write a detailed report
|
| 5. Update the plan
|
| 6. Review and check the plan
|
| 7. Implement plan
|
| Detailed here:
|
| https://x.com/PetrusTheron/status/2016887552163119225
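The critique loop in steps 4-5 can be sketched as a simple refinement cycle. Everything below is hypothetical scaffolding, not pgt's actual tooling: `run_agent` stands in for whatever CLI or API invokes the model, and the "no issues" stop condition is an illustrative convention.

```python
def refine_plan(spec, run_agent, max_rounds=3):
    """Draft a plan, then critique and revise it in a loop."""
    plan = run_agent(f"Write an implementation plan for:\n{spec}")
    for _ in range(max_rounds):
        critique = run_agent(
            f"Critique this plan and write a detailed report:\n{plan}")
        if "no issues" in critique.lower():
            break  # the critic is satisfied; stop iterating
        plan = run_agent(
            f"Update the plan to address this report:\n{critique}\n\n"
            f"Original plan:\n{plan}")
    return plan
```

The human review in steps 3 and 6 stays outside the loop; this only automates the critique/update round trip in the middle.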
| brumar wrote:
| Same. In my experience, the first plan always benefits from
| being challenged once or twice by claude itself.
| lastdong wrote:
| Google Antigravity has this process built in. This is
| essentially a cycle a developer would follow: plan/analyse -
| document/discuss - break down tasks/implement. We've been using
| requirements and design documents as best practice since leaving
| our teenage bedroom lab for the professional world. I suppose
| this could be seen as our coding agents coming of age.
| w10-1 wrote:
| I try these staging-document patterns, but suspect they have 2
| fundamental flaws that stem mostly from our own biases.
|
| First, Claude evolves. The original post's work pattern evolved
| over 9 months, before Claude's recent step changes. It's likely
| Claude's present plan mode is better than this workaround, but
| if you stick to the workaround, you'd never know.
|
| Second, the staging docs that represent some context - whether
| library skills or the current session's design and
| implementation plans - are not the model Claude works with. At
| best they shape it, but I've found it does ignore and forget
| even what's written (even when I shout with emphasis), and the
| overall session influences the code. (Most often this happens
| when a peripheral adjustment ends up populating half the
| context.)
|
| Indeed the biggest benefit from the OP might be squeezing the
| work into one session, omitting peripheral features and
| investigations at the
| plan stage. So the mechanism of action might be the combination
| of getting our own plan clear and avoiding confusing excursions.
| (A test for that would be to redo the session with the final plan
| and implementation, to see if the iteration process itself is
| shaping the model.)
|
| Our bias is to believe that we're getting better at managing this
| thing, and that we can control and direct it. It's uncomfortable
| to realize you can only really influence it - much like giving
| direction to a junior, but they can still go off track. And even
| if you found a pattern that works, it might work for reasons
| you're not understanding -- and thus fail you eventually. So,
| yes, try some patterns, but always hang on to the newbie senses
| of wonder and terror that make you curious, alert, and
| experimental.
| appsoftware wrote:
| This is the flow I've found myself working towards. Essentially
| maintaining more and more layered documentation for the LLM
| produces better and more consistent results. What is great here
| is the emphasis on the use of such documents in the planning
| phase. I'm feeling much more motivated to write solid
| documentation recently, because I know someone (the LLM) is
| actually going to read it! I've noticed my efforts and skill
| acquisition have moved sharply from app developer towards DevOps
| and architecture / management, but I think I'll always be
| grateful for the application engineering experience that I think
| the next wave of devs might miss out on.
|
| I've also noted such a huge gulf between some developers
| describing 'prompting things into existence' and the approach
| described in this article. Both types seem to report success,
| though my experience is that the latter seems more realistic, and
| much more likely to produce robust code that's likely to be
| maintainable for long term or project critical goals.
| dr_dshiv wrote:
| Another pattern is:
|
| 1. First vibecode software to figure out what you want
|
| 2. Then throw it out and engineer it
| chickensong wrote:
| I agree with most of this, though I'm not sure it's radically
| different. I think most people who've been using CC in earnest
| for a while probably have a similar workflow? Prior to Claude 4
| it was pretty much mandatory to define requirements and track
| implementation manually to manage context. It's still good, but
| since 4.5 release, it feels less important. CC basically works
| like this by default now, so unless you value the spec docs
| (still a good reference for Claude, but need to be maintained),
| you don't have to think too hard about it anymore.
|
| The important thing is to have a conversation with Claude during
| the planning phase and don't just say "add this feature" and take
| what you get. Have a back and forth, ask questions about common
| patterns, best practices, performance implications, security
| requirements, project alignment, etc. This is a learning
| opportunity for you and Claude. When you think you're done,
| request a final review to analyze for gaps or areas of
| improvement. Claude will _always_ find something, but starts to
| get into the weeds after a couple passes.
|
| If you're greenfield and you have preferences about structure and
| style, you need to be explicit about that. Once the scaffolding
| is there, modern Claude will typically follow whatever examples
| it finds in the existing code base.
|
| I'm not sure I agree with the "implement it all without stopping"
| approach and let auto-compact do its thing. I still see Claude
| get lazy when nearing compaction, though it has gotten drastically
| better over the last year. Even so, I still think it's better to
| work in a tight loop on each stage of the implementation and
| preemptively compacting or restarting for the highest quality.
|
| Not sure that the language is that important anymore either.
| Claude will explore existing codebase on its own at unknown
| resolution, but if you say "read the file" it works pretty well
| these days.
|
| My suggestions to enhance this workflow:
|
| - If you use a numbered phase/stage/task approach with
| checkboxes, it makes it easy to stop/resume as-needed, and
| discuss particular sections. Each phase should be
| working/testable software.
|
| - Define a clear numbered list workflow in CLAUDE.md that loops
| on each task (run checks, fix issues, provide summary, etc).
|
| - Use hooks to ensure the loop is followed.
|
| - Update spec docs at the end of the cycle if you're keeping
| them. It's not uncommon for there to be some divergence during
| implementation and testing.
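| For the hooks suggestion: Claude Code reads hook definitions from
| .claude/settings.json. A minimal sketch of a hook that runs a
| check after every file edit (the lint command is a placeholder for
| whatever your loop's check step actually is):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint --silent" }
        ]
      }
    ]
  }
}
```

| Because the hook fires on the tool call itself, the loop gets
| enforced even when the model forgets the CLAUDE.md instructions.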
| koevet wrote:
| Has anyone found an efficient way to avoid repeating the initial
| codebase assessment when working with large projects?
|
| There are several projects on GitHub that attempt to tackle
| context and memory limitations, but I haven't found one that
| consistently works well in practice.
|
| My current workaround is to maintain a set of Markdown files,
| each covering a specific subsystem or area of the application.
| Depending on the task, I provide only the relevant documents to
| Claude Code to limit the context scope. It works reasonably well,
| but it still feels like a manual and fragile solution. I'm
| interested in more robust strategies for persistent project
| context or structured codebase understanding.
| jsmith99 wrote:
| Whenever I build a new feature with it I end up with several
| plan files leftover. I ask CC to combine them all, update with
| what we actually ended up building and name it something
| sensible, then whenever I want to work on that area again it's
| a useful reference (including the architecture, decisions and
| tradeoffs, relevant files etc).
| Sammi wrote:
| Yes this is what agent "skills" are. Just guides on any
| topic. The key is that you have the agent write and maintain
| them.
| KellyCriterion wrote:
| In Claude Web you can use projects to put files relevant for
| context there.
| mstkllah wrote:
| And then you have to remind it frequently to make use of the
| files. Happened to me so many times that I added it both to
| custom instructions as well as to the project memory.
| hathawsh wrote:
| That sounds like the recommended approach. However, there's one
| more thing I often do: whenever Claude Code and I complete a
| task that didn't go well at first, I ask CC what it learned,
| and then I tell it to write down what it learned for the
| future. It's hard to believe how much better CC has become
| since I started doing that. I ask it to write dozens of unit
| tests and it just does. Nearly perfectly. It's insane.
| energy123 wrote:
| For my longer spec files, I grep the subheaders/headers (with
| line numbers) and show this compact representation to the LLM's
| context window. I also have a file that describes what each
| spec files is and where it's located, and I force the LLM to
| read that and pull the subsections it needs. I also have one
| entrypoint requirements file (20k tokens) that I force it to
| read in full before it does anything else, every line I wrote
| myself. But none of this is a silver bullet.
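| A minimal Python sketch of that header-grep step (the outline
| function and file layout are illustrative, not the commenter's
| actual tooling):

```python
import re

def outline(path: str) -> str:
    """Compact representation of a long spec file: every Markdown
    header prefixed with its 1-based line number, so the model can
    request specific sections by line range instead of reading the
    whole file into context."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if re.match(r"#{1,6}\s", line):  # '#' through '######'
                rows.append(f"{n}: {line.strip()}")
    return "\n".join(rows)
```

| For a 20k-token spec, this shrinks what the model sees up front to
| a few hundred tokens of table-of-contents.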
| chickensong wrote:
| I'm interested in this as well.
|
| Skills almost seem like a solution, but they still need an out-
| of-band process to keep them updated as the codebase evolves.
| For now, a structured workflow that includes aggressive updates
| at the end of the loop is what I use.
| gregman1 wrote:
| It is really fun to watch how a baby takes its first steps, and
| also how experienced professionals rediscover what standards have
| been telling us for 80+ years.
| smcleod wrote:
| I don't really get what is different about this from how almost
| everyone else uses Claude Code? This is an incredibly common, if
| not the most common, way of using it (and many other tools).
| nesk_ wrote:
| > I am not seeing the performance degradation everyone talks
| about after 50% context window.
|
| I pretty much agree with that. I use long sessions and stopped
| trying to optimize the context size, the compaction happens but
| the plan keeps the details and it works for me.
| charkubi wrote:
| Planning is important because you get the LLM to explain the
| problem and solution in _its_ language and structure, not yours.
|
| This shortcuts a range of problem cases where the LLM fights
| between the user's strict and potentially conflicting
| requirements and its own learning.
|
| In the early days we used to get LLM to write the prompts for us
| to get round this problem, now we have planning built in.
| shevy-java wrote:
| I don't deny that AI has use cases, but boy - the workflow
| described is boring:
|
| "Most developers type a prompt, sometimes use plan mode, fix the
| errors, repeat. "
|
| Does anyone think this is as epic as, say, watch the Unix
| archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian
| demos how pipes work; or Dennis working on C and UNIX? Or even
| before those, the older machines?
|
| I am not at all saying that AI tools are all useless, but there
| is no real epicness. It is just autogenerated AI slop and blob. I
| don't really call this engineering (although I also do agree,
| that it is engineering still; I just don't like using the same
| word here).
|
| > never let Claude write code until you've reviewed and approved
| a written plan.
|
| So the junior-dev analogy is quite apt here.
|
| I tried to read the rest of the article, but I just got angrier.
| I never had that feeling watching oldschool legends, though
| perhaps some of their work may be boring, but this AI-generated
| code ... that's just some mythical random-guessing work. And none
| of that is "intelligent", even if it may appear to work, may work
| to some extent too. This is a simulation of intelligence. If it
| works very well, why would any software engineer still be
| required? Supervising would only be necessary if AI produces
| slop.
| gehsty wrote:
| Doesn't Claude code do this by switching between edit mode and
| plan mode?
|
| FWIW I have had significant improvements by clearing context then
| implementing the plan. Seems like it stops Claude getting hung up
| on something.
| je42 wrote:
| There are frameworks like https://github.com/bmad-code-org/BMAD-
| METHOD and https://github.github.com/spec-kit/ that are working
| on encoding a similar kind of approach and process.
| mcv wrote:
| This is great. My workflow is also heading in that direction, so
| this is a great roadmap. I've already learned that just naively
| telling Claude what to do and letting it work, is a recipe for
| disaster and wasted time.
|
| I'm not this structured yet, but I often start with having it
| analyse and explain a piece of code, so I can correct it before
| we move on. I also often switch to an LLM that's separate from my
| IDE because it tends to get confused by sprawling context.
| gary17the wrote:
| > Read deeply, write a plan, annotate the plan until it's right,
| then let Claude execute the whole thing without stopping,
| checking types along the way.
|
| As others have already noted, this workflow is exactly what the
| Google Antigravity agent (based off Visual Studio Code) has been
| created for. Antigravity even includes specialized UI for a user
| to annotate selected portions of an LLM-generated plan before
| iterating it.
|
| One significant downside to Antigravity I have found so far is
| the fact that even though it will properly infer a certain
| technical requirement and clearly note it in the plan it
| generates (for example, "this business reporting column needs to
| use a weighted average"), it will sometimes quietly downgrade
| such a specialized requirement (for example, to a non-weighted
| average), without even creating an appropriate "WARNING:" comment
| in the generated code. Especially so when the relevant codebase
| already includes a similar, but not exactly appropriate API. My
| repetitive prompts to ALWAYS ask about ANY implementation
| ambiguities WHATSOEVER go unanswered.
|
| From what I gather Claude Code seems to be better than other
| agents at always remembering to query the user about
| implementation ambiguities, so maybe I will give Claude Code a
| shot over Antigravity.
| Fuzzwah wrote:
| All sounds like a bespoke way of remaking
| https://github.com/Fission-AI/OpenSpec
| __bjoernd wrote:
| Sounds a bit like what Claude Plan Mode or Amazon's Kiro were
| built for. I agree it's a useful flow, but you can also overdo
| it.
| grabshot_dev wrote:
| Why don't you make Claude give feedback and iterate by itself?
| alexrezvov wrote:
| Cool, the idea of leaving comments directly in the plan never
| even occurred to me, even though it really is the obvious thing
| to do.
|
| Do you markup and then save your comments in any way, and have
| you tried keeping them so you can review the rules and
| requirements later?
| zuInnp wrote:
| Since the rise of AI systems I really wonder how people wrote
| code before. This is exactly how I planned out implementation and
| executed the plan. Might have been some paper notes, a ticket or
| a white board, buuuuut ... I don't know.
| EastLondonCoder wrote:
| I don't use plan.md docs either, but I recognise the underlying
| idea: you need a way to keep agent output constrained by reality.
|
| My workflow is more like scaffold -> thin vertical slices ->
| machine-checkable semantics -> repeat.
|
| Concrete example: I built and shipped a live ticketing system for
| my club (Kolibri Tickets). It's not a toy: real payments
| (Stripe), email delivery, ticket verification at the door,
| frontend + backend, migrations, idempotency edges, etc. It's
| running and taking money.
|
| The reason this works with AI isn't that the model "codes fast".
| It's that the workflow moves the bottleneck from "typing" to
| "verification", and then engineers the verification loop:
| - keep the spine runnable early (end-to-end scaffold)
| - add one thin slice at a time (don't let it touch 15 files
|   speculatively)
| - force checkable artifacts (tests/fixtures/types/state-machine
|   semantics where it matters)
| - treat refactors as normal, because the harness makes them safe
|
| If you run it open-loop (prompt -> giant diff -> read/debug), you
| get the "illusion of velocity" people complain about. If you run
| it closed-loop (scaffold + constraints + verifiers), you can
| actually ship faster because you're not paying the integration
| cost repeatedly.
|
| Plan docs are one way to create shared state and prevent drift. A
| runnable scaffold + verification harness is another.
| aitchnyu wrote:
| Now that code is cheap, I ensured my side project has
| unit/integration tests (will enforce 100% coverage), Playwright
| tests, static typing (it's in Python), and scripts for all tasks.
| Will learn mutation testing too (yes, it's overkill). Now my
| agent works up to 1 hour in loops and emits concise code I don't
| have to edit much.
| yunusabd wrote:
| That's exactly what Cursor's "plan" mode does? It even creates md
| files, which seems to be the main "thing" the author discovered.
| Along with some cargo cult science?
|
| How is this noteworthy other than to spark a discussion on hn? I
| mean I get it, but a little more substance would be nice.
| irthomasthomas wrote:
| In my own tests I have found opus to be very good at writing
| plans, terrible at executing them. It typically ignores half of
| the constraints.
| https://x.com/xundecidability/status/2019794391338987906?s=2...
| https://x.com/xundecidability/status/2024210197959627048?s=2...
| Sammi wrote:
| 1. Don't implement too much at a time
|
| 2. Have the agent review if it followed the plan and relevant
| skills accurately.
| irthomasthomas wrote:
| the first link was from a simple request with fewer than 1000
| tokens total in the context window, just a short shell
| script.
|
| here is another one which had about 200 tokens and opus
| decided to change the model name i requested.
|
| https://x.com/xundecidability/status/2005647216741105962?s=2.
| ..
|
| opus is bad at instruction following now.
| willsmith72 wrote:
| this sounds... really slow. for large changes for sure i'm
| investing time into planning. but such a rigid system can't
| possibly be as good as a flexible approach with variable amounts
| of planning based on complexity
| richardjennings wrote:
| This is similar to what I do. I instruct an Architect mode with a
| set of rules related to phased implementation and detailed code
| artifacts output to a report.md file. After a couple of rounds of
| review and usually some responses that either tie together
| behaviors across context, critique poor choices or correct
| assumptions, there is a piece of work defined for a coder LLM to
| perform. With the new Opus 4.6 I then select specialist agents to
| review the report.md, prompted with detailed insight into
| particular areas of the software. The feedback from these
| specialist agent reviews is often very good and sometimes catches
| things I had missed. Once all of this is done, I let the agent
| make the changes and move onto doing something else. I typically
| rename and commit the report.md files which can be useful as an
| alternative to git diff / commit messages etc.
| vazma wrote:
| Sorry, but I don't get the hype with this post; isn't this what
| most people are doing? I want to see more posts on how to use
| Claude "smart" without feeding it the whole codebase and
| polluting the context window, and more best practices on cost-
| efficient ways to use it. This workflow is clearly burning
| millions of tokens per session; for me it's a no.
| pajamasam wrote:
| I feel like if I have to do all this, I might as well write the
| code myself.
| MarcLore wrote:
| The separation of planning and execution resonates strongly. I've
| been using a similar pattern when building with AI APIs -- write
| the spec/plan in natural language first, then let the model
| execute against it.
|
| One addition that's worked well for me: keeping a persistent
| context file that the model reads at the start of each session.
| Instead of re-explaining the project every time, you maintain a
| living document of decisions, constraints, and current state.
| Turns each session into a continuation rather than a cold start.
|
| The biggest productivity gain isn't in the code generation itself
| -- it's in reducing the re-orientation overhead between sessions.
| nikolay wrote:
| Well, that's already done by Amazon's Kiro [0], Google's
| Antigravity [1], GitHub's Spec Kit [2], and OpenSpec [3]!
|
| [0]: https://kiro.dev/
|
| [1]: https://antigravity.google/
|
| [2]: https://github.github.com/spec-kit/
|
| [3]: https://openspec.dev/
| baalimago wrote:
| Another approach is to spec functionality using comments and
| interfaces, then tell the LLM to first implement tests and
| finally make the tests pass. This way you also get regression
| safety and can inspect that it works as it should via the tests.
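| A minimal Python sketch of this spec-by-interface flow (SlugStore
| and its methods are invented for illustration): the comments and
| interface pin the contract down, the tests come next, and the
| LLM's only remaining job is step 2.

```python
from abc import ABC, abstractmethod

class SlugStore(ABC):
    """Spec: save() returns a unique slug for the text; load()
    returns the stored text, or raises KeyError for unknown slugs."""
    @abstractmethod
    def save(self, text: str) -> str: ...
    @abstractmethod
    def load(self, slug: str) -> str: ...

# Step 1: tests written against the interface, before any
# implementation exists -- this is the regression safety net.
def run_spec_tests(store: SlugStore) -> None:
    a = store.save("hello")
    b = store.save("world")
    assert a != b, "slugs must be unique"
    assert store.load(a) == "hello"
    try:
        store.load("missing")
    except KeyError:
        pass
    else:
        raise AssertionError("unknown slug must raise KeyError")

# Step 2: the LLM makes the tests pass, e.g. with:
class MemoryStore(SlugStore):
    def __init__(self) -> None:
        self._items: dict[str, str] = {}
    def save(self, text: str) -> str:
        slug = str(len(self._items))
        self._items[slug] = text
        return slug
    def load(self, slug: str) -> str:
        return self._items[slug]  # KeyError propagates when unknown
```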
| folex wrote:
| this is exactly how I work with cursor
|
| except that I put notes to the plan document in a single message
| like:
|     > plan quote
|     my note
|     > plan quote
|     my note
|
| otherwise, I'm not sure how to guarantee that the AI won't
| confuse my notes with its own plan.
|
| one new thing for me is to review the todo list, I was always
| relying on auto generated todo list
| adithyassekhar wrote:
| What I've read is that even with all the meticulous planning, the
| author still needed to intervene. Not at the end but in the
| middle; otherwise it will continue building out something wrong,
| and it's even harder to fix once it's done. It'll cost even more
| tokens. It's a net negative.
|
| You might say a junior might do the same thing, but I'm not
| worried about it, at least the junior learned something while
| doing that. They could do it better next time. They know the code
| and change it from the middle where it broke. It's a net
| positive.
| ionwake wrote:
| this comment is the first truly humane one ive read regarding
| this whole AI fiasco
| anonymousDan wrote:
| Unfortunately, you could argue that the model provider has also
| learned something, i.e. the interaction can be used as
| additional training data to train subsequent models.
| jeleh wrote:
| Good article, but I would rephrase the core principle slightly:
|
| Never let Claude write code until you've reviewed, *fully
| understood* and approved a written plan.
|
| In my experience, the beginning of chaos is the point at which
| you trust that Claude has understood everything correctly and
| claims to present the very best solution. At that point, you
| leave the driver's seat.
| vemv wrote:
| Every "how I use Claude Code" post will get into the HN
| frontpage.
|
| Which maybe has to do with people wanting to show how _they_ use
| Claude Code in the comments!
| juanre wrote:
| Shameless plug: https://beadhub.ai allows you to do exactly that,
| but with several agents in parallel. One of them is in the role
| of planner, which takes care of the source-of-truth document and
| the long term view. They all stay in sync with real-time chat and
| mail.
|
| It's OSS.
|
| Real-time work is happening at
| https://app.beadhub.ai/juanre/beadhub (beadhub is a public
| project at https://beadhub.ai so it is visible).
|
| Particularly interesting (I think) is how the agents chat with
| each other, which you can see at
| https://app.beadhub.ai/juanre/beadhub/chat
| colinhb wrote:
| Quoting the article:
|
| > One trick I use constantly: for well-contained features where
| I've seen a good implementation in an open source repo, I'll
| share that code as a reference alongside the plan request. If I
| want to add sortable IDs, I paste the ID generation code from a
| project that does it well and say "this is how they do sortable
| IDs, write a plan.md explaining how we can adopt a similar
| approach." Claude works dramatically better when it has a
| concrete reference implementation to work from rather than
| designing from scratch.
|
| Licensing apparently means nothing.
|
| Ripped off in the training data, ripped off in the prompt.
| miohtama wrote:
| Concepts are not copyrightable.
| colinhb wrote:
| The article isn't describing someone who learned the concept
| of sortable IDs and then wrote their own implementation.
|
| It describes copying and pasting actual code from one project
| into a prompt so a language model can reproduce it in another
| project.
|
| It's a mechanical transformation of someone else's
| copyrighted expression (their code) laundered through a
| statistical model instead of a human copyist.
| layer8 wrote:
| "Mechanical" is doing some heavy lifting here. If a human
| does the same, reimplementing the code in their own style for
| their particular context, it doesn't violate copyright.
| Having the LLM see the original code doesn't automatically
| make its output a plagiarism.
| parasti wrote:
| The biggest roadblock to using agents to maximum effectiveness
| like this is the chat interface. Its convenience is a detriment
| and a distraction. I've found myself repeatedly
| giving into that convenience only to realize that I have wasted
| an hour and need to start over because the agent is just
| obliviously circling the solution that I thought was fully
| obvious from the context I gave it. Clearly these tools are
| exceptional at transforming inputs into outputs and,
| counterintuitively, not as exceptional when the inputs are
| constantly interleaved with the outputs like they are in chat
| mode.
| submeta wrote:
| What works extremely well for me is this: Let Claude Code create
| the plan, then turn over the plan to Codex for review, and give
| the response back to Claude Code. Codex is exceptionally good at
| doing high level reviews and keeping an eye on the details. It
| will find very subtle errors and omissions. And CC is very good at
| quickly converting the plan into code.
|
| This back and forth between the two agents, with me steering the
| conversation, elevates Claude Code to the next level.
| drcongo wrote:
| This is exactly how I use it.
| oulipo2 wrote:
| Has Claude Code become slow, laggy, imprecise, giving wrong
| answers for other people here?
| stuaxo wrote:
| I had to stop reading about half way, it's written in that
| breathless linkedin/ai generated style.
| podgorniy wrote:
| I do the same. I also cross-ask gemini and claude about the plan
| during iterations, sometimes make several separate plans.
| clbrmbr wrote:
| I just use Jesse's "superpowers" plugin. It does all of this but
| also steps you through the design and gives you bite sized chunks
| and you make architecture decisions along the way. Far better
| than making big changes to an already established plan.
| tagawa wrote:
| Link for those interested:
| https://claude.com/plugins/superpowers
| clbrmbr wrote:
| I suggest reading the tests that Superpowers author has come
| up with for testing the skills. See the GitHub repo.
| flippyhead wrote:
| Have you tried https://github.com/pcvelz/superpowers ?
| clbrmbr wrote:
| https://github.com/obra/superpowers
| xbmcuser wrote:
| Gemini is better at research Claude at coding. I try to use
| Gemini to do all the research and write out instruction on what
| to do what process to follow then use it in Claude. Though I am
| mostly creating small python scripts
| sparin9 wrote:
| I think the real value here isn't "planning vs not planning,"
| it's forcing the model to surface its assumptions before they
| harden into code.
|
| LLMs don't usually fail at syntax. They fail at invisible
| assumptions about architecture, constraints, invariants, etc. A
| written plan becomes a debugging surface for those assumptions.
| hun3 wrote:
| Except that merely surfacing them changes their behavior, like
| how you add that one printf() call and now your heisenbug is
| suddenly nonexistent
| maccard wrote:
| > LLMs don't usually fail at syntax?
|
| Really? My experience has been that it's incredibly easy to get
| them stuck in a loop on a hallucinated API and burn through
| credits before I've even noticed what it's done. I have a small
| rust project that stores stuff on disk that I wanted to add an
| s3 backend too - Claude code burned through my $20 in a loop in
| about 30 minutes without any awareness of what it was doing on
| a very simple syntax issue.
| remify wrote:
| Sub agents also help a lot in that regard. Have an agent do the
| planning, have an implementation agent write the code, and have
| another one do the review. Clear responsibilities help a lot.
|
| There's also blue team / red team, which works.
|
| The idea is always the same: help the LLM reason properly with
| fewer and clearer instructions.
| jalopy wrote:
| This sounds very promising. Any link to more details?
| MagicMoonlight wrote:
| Did you just write this with ChatGPT?
| asdxrfx wrote:
| It's also great to describe the full use-case flow in the
| instructions, so you can verify the LLM won't do something
| stupid on its own
| dr_kretyn wrote:
| The post and comments all read like: Here are my rituals to the
| software God. If you follow them, the God gives plenty. Omit one
| step and the God is mad. Sometimes you have to make a sacrifice,
| but that's better for the long term.
|
| I've been in eng for decades but never participated in forums. Is
| the cargo cult new?
|
| I use Claude Code a lot. Still don't trust what's in the plan
| will get actually written, regardless of details. My ritual is
| around stronger guardrails outside of prompting. This is the new
| MongoDB webscale meme.
| getnormality wrote:
| This looks like an important post. What makes it special is that
| it operationalizes Polya's classic problem-solving recipe for the
| age of AI-assisted coding.
|
| 1. Understand the problem (research.md)
|
| 2. Make a plan (plan.md)
|
| 3. Execute the plan
|
| 4. Look back
| christophilus wrote:
| Yeah, OODA loop for programmers, basically. It's a good
| approach.
| kissgyorgy wrote:
| There is not a lot of explanation of WHY this is better than
| doing the opposite (start coding and see how it goes), or of how
| this would apply to Codex models.
|
| I do exactly the same; I even developed my own workflows with the
| Pi agent, which works really well. Here is the reason:
|
| - Claude needs a lot more steering than other models, it's too
| eager to do stuff and does stupid things and write terrible code
| without feedback.
|
| - Claude is very good at following the plan, you can even use a
| much cheaper model if you have a good plan. For example I list
| every single file which needs edits with a short explanation.
|
| - At the end of the plan, I have a clear picture in my head how
| the feature will exactly look like and I can be pretty sure the
| end result will be good enough (given that the model is good at
| following the plan).
|
| A lot of things don't need planning at all. Simple fixes,
| refactoring, simple scripts, packaging, etc. Just keep it simple.
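| A hypothetical sketch of what such a file-by-file plan entry can
| look like (the feature and file names are invented):

```markdown
## Phase 2: add pagination to /items

Files to edit:
- [ ] src/routes/items.ts  -- accept `cursor` and `limit` params
- [ ] src/db/items.ts      -- add keyset-paginated query helper
- [ ] tests/items.test.ts  -- cover first page, next page, empty page
```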
| etothet wrote:
| "The workflow I'm going to describe has one core principle: never
| let Claude write code until you've reviewed and approved a
| written plan."
|
| I'm not sure we need to be this black and white about things.
| Speaking from the perspective of leading a dev team, I regularly
| have Claude Code take a chance at code without reviewing a plan.
| For example, small issues that I've written clear details about,
| Claude can go to town on those. I've never been on a team that
| didn't have too many of these types of issues to address.
|
| And a team should have other guards in place that validate that
| code before it gets merged somewhere important.
|
| I don't have to review every single decision one of my teammates
| is going to make, even those less experienced teammates, but I do
| prepare teammates with the proper tools (specs, documentation,
| etc) so they can make a best effort first attempt. This is how I
| treat Claude Code in a lot of scenarios.
| josefrichter wrote:
| Radically different? Sounds to me like the standard spec driven
| approach that plenty of people use.
|
| I prefer iterative approach. LLMs give you incredible speed to
| try different approaches and inform your decisions. I don't think
| you can ever have a perfect spec upfront, at least that's my
| experience.
| MagicMoonlight wrote:
| So we're back to waterfall huh
| islandfox100 wrote:
| It strikes me that if this technology were as useful and all-
| encompassing as it's marketed to be, we wouldn't need four
| articles like this every week
| prplfsh wrote:
| People are figuring it out. Cars are broadly useful, but
| there's nuance to how to maintain them, use them well in
| different terrain and weather, etc.
| hombre_fatal wrote:
| How many millions of articles are there about people figuring
| out how to write better software?
|
| Does something have to be trivial-to-use to be useful?
| turingsroot wrote:
| I've been running AI coding workshops for engineers transitioning
| from traditional development, and the research phase is
| consistently the part people skip -- and the part that makes or
| breaks everything.
|
| The failure mode the author describes (implementations that work
| in isolation but break the surrounding system) is exactly what I
| see in workshop after workshop. Engineers prompt the LLM with
| "add pagination to the list endpoint" and get working code that
| ignores the existing query builder patterns, duplicates filtering
| logic, or misses the caching layer entirely.
|
| What I tell people: the research.md isn't busywork, it's your
| verification that the LLM actually understands the system it's
| about to modify. If you can't confirm the research is accurate,
| you have no business trusting the plan.
|
| One thing I'd add to the author's workflow: I've found it helpful
| to have the LLM explicitly list what it does NOT know or is
| uncertain about after the research phase. This surfaces blind
| spots before they become bugs buried three abstraction layers
| deep.
___________________________________________________________________
(page generated 2026-02-22 16:00 UTC)