        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
       
       
       COMMENT PAGE FOR:
  HTML   How I use Claude Code: Separation of planning and execution
       
       
        turingsroot wrote 11 hours 58 min ago:
        I've been running AI coding workshops for engineers transitioning from
        traditional development, and the research phase is consistently the
        part people skip — and the part that makes or breaks everything.
        
        The failure mode the author describes (implementations that work in
        isolation but break the surrounding system) is exactly what I see in
        workshop after workshop. Engineers prompt the LLM with "add pagination
        to the list endpoint" and get working code that ignores the existing
        query builder patterns, duplicates filtering logic, or misses the
        caching layer entirely.
        
        What I tell people: the research.md isn't busywork, it's your
        verification that the LLM actually understands the system it's about to
        modify. If you can't confirm the research is accurate, you have no
        business trusting the plan.
        
        One thing I'd add to the author's workflow: I've found it helpful to
        have the LLM explicitly list what it does NOT know or is uncertain
        about after the research phase. This surfaces blind spots before they
        become bugs buried three abstraction layers deep.
       
        islandfox100 wrote 12 hours 3 min ago:
        It strikes me that if this technology were as useful and
        all-encompassing as it's marketed to be, we wouldn't need four articles
        like this every week
       
          hombre_fatal wrote 12 hours 1 min ago:
          How many millions of articles are there about people figuring out how
          to write better software?
          
          Does something have to be trivial-to-use to be useful?
       
          prplfsh wrote 12 hours 2 min ago:
          People are figuring it out. Cars are broadly useful, but there's
          nuance to how to maintain them, use them well in different terrains
          and weather, etc.
       
        MagicMoonlight wrote 12 hours 12 min ago:
        So we’re back to waterfall huh
       
        josefrichter wrote 12 hours 33 min ago:
        Radically different? Sounds to me like the standard spec driven
        approach that plenty of people use.
        
        I prefer an iterative approach. LLMs give you incredible speed to try
        different approaches and inform your decisions. I don’t think you can
        ever have a perfect spec upfront, at least that’s my experience.
       
        etothet wrote 12 hours 33 min ago:
        “The workflow I’m going to describe has one core principle: never
        let Claude write code until you’ve reviewed and approved a written
        plan.”
        
        I’m not sure we need to be this black and white about things.
        Speaking from the perspective of leading a dev team, I regularly have
        Claude Code take a chance at code without reviewing a plan. For
        example, small issues that I’ve written clear details about, Claude
        can go to town on those. I’ve never been on a team that didn’t have
        too many of these types of issues to address.
        
        And, a team should have other guards in place that validate that code
        before it gets merged somewhere important.
        
        I don’t have to review every single decision one of my teammates is
        going to make, even those less experienced teammates, but I do prepare 
        teammates with the proper tools (specs, documentation, etc) so they can
        make a best effort first attempt. This is how I treat Claude Code in a
        lot of scenarios.
       
        kissgyorgy wrote 12 hours 43 min ago:
        There is not a lot of explanation of WHY this is better than doing the
        opposite (start coding and see how it goes), or of how this would apply
        to Codex models.
        
        I do exactly the same, I even developed my own workflows with Pi agent,
        which work really well. Here are the reasons:
        
        - Claude needs a lot more steering than other models, it's too eager to
        do stuff and does stupid things and writes terrible code without
        feedback.
        
        - Claude is very good at following the plan, you can even use a much
        cheaper model if you have a good plan. For example I list every single
        file which needs edits with a short explanation.
        
        - At the end of the plan, I have a clear picture in my head of exactly
        what the feature will look like and I can be pretty sure the end result
        will be good enough (given that the model is good at following the
        plan).
        
        A lot of things don't need planning at all. Simple fixes, refactoring,
        simple scripts, packaging, etc. Just keep it simple.
       
        getnormality wrote 13 hours 10 min ago:
        This looks like an important post. What makes it special is that it
        operationalizes Polya's classic problem-solving recipe for the age of
        AI-assisted coding.
        
        1. Understand the problem (research.md)
        
        2. Make a plan (plan.md)
        
        3. Execute the plan
        
        4. Look back
       
          christophilus wrote 13 hours 8 min ago:
          Yeah, OODA loop for programmers, basically. It’s a good approach.
       
        dr_kretyn wrote 13 hours 35 min ago:
        The post and comments all read like:
        Here are my rituals to the software God. If you follow them then God
        gives plenty. Omit one step and the God mad. Sometimes you have to make
        a sacrifice but that's better for the long term.
        
        I've been in eng for decades but never participated in forums. Is the
        cargo cult new?
        
        I use Claude Code a lot. Still don't trust that what's in the plan will
        actually get written, regardless of details. My ritual is around stronger
        guardrails outside of prompting. This is the new MongoDB webscale meme.
       
        sparin9 wrote 14 hours 9 min ago:
        I think the real value here isn’t “planning vs not planning,”
        it’s forcing the model to surface its assumptions before they harden
        into code.
        
        LLMs don’t usually fail at syntax. They fail at invisible assumptions
        about architecture, constraints, invariants, etc. A written plan
        becomes a debugging surface for those assumptions.
       
          asdxrfx wrote 12 hours 5 min ago:
          It's also great to describe the full use case flow in the
          instructions, so you can clearly see that the LLM won't do some
          stupid thing on its own
       
          MagicMoonlight wrote 12 hours 11 min ago:
          Did you just write this with ChatGPT?
       
          remify wrote 13 hours 1 min ago:
          Sub-agents also help a lot in that regard. Have an agent do the
          planning, have an implementation agent do the code and have another
          one do the review. Clear responsibilities help a lot.
          
          There is also the blue team / red team approach, which works.
          
          The idea is always the same: help the LLM reason properly with fewer
          and clearer instructions.
       
            jalopy wrote 12 hours 1 min ago:
            This sounds very promising. Any link to more details?
       
          maccard wrote 13 hours 18 min ago:
          > LLMs don’t usually fail at syntax?
          
          Really? My experience has been that it’s incredibly easy to get
          them stuck in a loop on a hallucinated API and burn through credits
          before I’ve even noticed what it’s done. I have a small Rust
          project that stores stuff on disk that I wanted to add an S3 backend
          to - Claude Code burned through my $20 in a loop in about 30 minutes
          without any awareness of what it was doing on a very simple syntax
          issue.
       
          hun3 wrote 13 hours 22 min ago:
          Except that merely surfacing them changes their behavior, like how
          you add that one printf() call and now your heisenbug is suddenly
          nonexistent
       
        xbmcuser wrote 14 hours 23 min ago:
        Gemini is better at research, Claude at coding. I try to use Gemini to
        do all the research and write out instructions on what to do and what
        process to follow, then use them in Claude. Though I am mostly creating
        small Python scripts.
       
        clbrmbr wrote 14 hours 49 min ago:
        I just use Jesse’s “superpowers” plugin. It does all of this but
        also steps you through the design and gives you bite sized chunks and
        you make architecture decisions along the way. Far better than making
        big changes to an already established plan.
       
          clbrmbr wrote 14 hours 41 min ago:
          
          
  HTML    [1]: https://github.com/obra/superpowers
       
          tagawa wrote 14 hours 42 min ago:
          Link for those interested:
          
  HTML    [1]: https://claude.com/plugins/superpowers
       
            flippyhead wrote 11 hours 53 min ago:
            Have you tried [1] ?
            
  HTML      [1]: https://github.com/pcvelz/superpowers
       
            clbrmbr wrote 14 hours 20 min ago:
            I suggest reading the tests that Superpowers author has come up
            with for testing the skills. See the GitHub repo.
       
        podgorniy wrote 14 hours 50 min ago:
        I do the same. I also cross-ask gemini and claude about the plan during
        iterations, sometimes make several separate plans.
       
        stuaxo wrote 15 hours 20 min ago:
        I had to stop reading about half way, it's written in that breathless
        linkedin/ai generated style.
       
        oulipo2 wrote 15 hours 22 min ago:
        Has Claude Code become slow, laggy, imprecise, giving wrong answers for
        other people here?
       
        drcongo wrote 15 hours 24 min ago:
        This is exactly how I use it.
       
        submeta wrote 15 hours 25 min ago:
        What works extremely well for me is this: Let Claude Code create the
        plan, then turn over the plan to Codex for review, and give the
        response back to Claude Code. Codex is exceptionally good at doing high
        level reviews and keeping an eye on the details. It will find very
        subtle errors and omissions. And CC is very good at quickly converting
        the plan into code.
        
        This back and forth between the two agents with me steering the
        conversation elevates Claude Code to the next level.
       
        parasti wrote 15 hours 25 min ago:
        The biggest roadblock to using agents to maximum effectiveness like
        this is the chat interface. It's convenience as detriment and
        convenience as distraction. I've found myself repeatedly giving in to
        that convenience only to realize that I have wasted an hour and need to
        start over because the agent is just obliviously circling the solution
        that I thought was fully obvious from the context I gave it. Clearly
        these tools are exceptional at transforming inputs into outputs and,
        counterintuitively, not as exceptional when the inputs are constantly
        interleaved with the outputs like they are in chat mode.
       
        colinhb wrote 15 hours 27 min ago:
        Quoting the article:
        
        > One trick I use constantly: for well-contained features where I’ve
        seen a good implementation in an open source repo, I’ll share that
        code as a reference alongside the plan request. If I want to add
        sortable IDs, I paste the ID generation code from a project that does
        it well and say “this is how they do sortable IDs, write a plan.md
        explaining how we can adopt a similar approach.” Claude works
        dramatically better when it has a concrete reference implementation to
        work from rather than designing from scratch.
        
        Licensing apparently means nothing.
        
        Ripped off in the training data, ripped off in the prompt.
       
          miohtama wrote 15 hours 19 min ago:
          Concepts are not copyrightable.
       
            colinhb wrote 12 hours 48 min ago:
            The article isn’t describing someone who learned the concept of
            sortable IDs and then wrote their own implementation.
            
            It describes copying and pasting actual code from one project into
            a prompt so a language model can reproduce it in another project.
            
            It’s a mechanical transformation of someone else’s copyrighted
            expression (their code) laundered through a statistical model
            instead of a human copyist.
       
              layer8 wrote 12 hours 27 min ago:
              “Mechanical” is doing some heavy lifting here. If a human
              does the same, reimplement the code in their own style for their
              particular context, it doesn’t violate copyright. Having the
              LLM see the original code doesn’t automatically make its output
              a plagiarism.
       
        juanre wrote 15 hours 28 min ago:
        Shameless plug: [1] allows you to do exactly that, but with several
        agents in parallel. One of them is in the role of planner, which takes
        care of the source-of-truth document and the long term view. They all
        stay in sync with real-time chat and mail.
        
        It's OSS.
        
        Real-time work is happening at [2] (beadhub is a public project at [1]
        so it is visible).
        
        Particularly interesting (I think) is how the agents chat with each
        other, which you can see at [3].
        
  HTML  [1]: https://beadhub.ai
  HTML  [2]: https://app.beadhub.ai/juanre/beadhub
  HTML  [3]: https://app.beadhub.ai/juanre/beadhub/chat
       
        vemv wrote 15 hours 50 min ago:
        Every "how I use Claude Code" post will get into the HN frontpage.
        
        Which maybe has to do with people wanting to show how they use Claude
        Code in the comments!
       
        jeleh wrote 15 hours 52 min ago:
        Good article, but I would rephrase the core principle slightly:
        
        Never let Claude write code until you’ve reviewed, *fully understood*
        and approved a written plan.
        
        In my experience, the beginning of chaos is the point at which you
        trust that Claude has understood everything correctly and claims to
        present the very best solution. At that point, you leave the driver's
        seat.
       
        adithyassekhar wrote 15 hours 57 min ago:
        What I've read is that even with all the meticulous planning, the
        author still needed to intervene. Not at the end but in the middle,
        otherwise it will continue building out something wrong and it's even
        harder to fix once it's done. It'll cost even more tokens. It's a net
        negative.
        
        You might say a junior might do the same thing, but I'm not worried
        about it, at least the junior learned something while doing that. They
        could do it better next time. They know the code and change it from the
        middle where it broke. It's a net positive.
       
          anonymousDan wrote 15 hours 51 min ago:
          Unfortunately, you could argue that the model provider has also
          learned something, i.e. the interaction can be used as additional
          training data to train subsequent models.
       
          ionwake wrote 15 hours 55 min ago:
          this comment is the first truly humane one ive read regarding this
          whole AI fiasco
       
        folex wrote 15 hours 58 min ago:
        this is exactly how I work with cursor
        
        except that I put notes to plan document in a single message like:
        
           > plan quote
           my note
           > plan quote
           my note
        
        otherwise, I'm not sure how to guarantee that ai won't confuse my notes
        with its own plan.
        
        one new thing for me is to review the todo list, I was always relying
        on auto generated todo list
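
        The quote-then-note layout above can be produced mechanically. A
        minimal sketch, assuming the plan excerpts and the notes are already
        paired up; format_plan_notes is a hypothetical helper for
        illustration, not part of Cursor or any other tool:

```python
def format_plan_notes(pairs: list[tuple[str, str]]) -> str:
    """Build one review message from (plan_excerpt, note) pairs.

    Each excerpt is prefixed with "> " on every line so the model can tell
    its own plan text apart from the reviewer's note that follows it.
    """
    blocks = []
    for excerpt, note in pairs:
        quoted = "\n".join(f"> {line}" for line in excerpt.splitlines())
        blocks.append(f"{quoted}\n{note}")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical plan excerpts and notes, for illustration only.
    message = format_plan_notes([
        ("Add offset-based pagination", "Use the existing query builder."),
        ("Cache each page separately", "Check the current caching layer first."),
    ])
    print(message)
```

        Quoting every line of a multi-line excerpt keeps the boundary between
        plan text and note unambiguous even when an excerpt spans a paragraph.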
       
        baalimago wrote 16 hours 1 min ago:
        Another approach is to spec functionality using comments and
        interfaces, then tell the LLM to first implement tests and finally make
        the tests pass. This way you also get regression safety and can inspect
        that it works as it should via the tests.
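
        A minimal sketch of that order of work, assuming a hypothetical
        pagination feature: the interface and its docstring are the
        human-written spec, check_paginator is the test written from it,
        and SlicePaginator stands in for the implementation the LLM would
        be asked to produce last. All names here are illustrative.

```python
from typing import Protocol


class Paginator(Protocol):
    """Spec: return one page of `items`.

    - page numbers start at 1
    - a page holds at most `size` items
    - a page past the end is an empty list
    - number < 1 or size < 1 raises ValueError
    """

    def page(self, items: list[int], number: int, size: int) -> list[int]: ...


def check_paginator(p: Paginator) -> None:
    """Tests derived from the spec, written before any implementation."""
    data = list(range(1, 8))  # seven items: 1..7
    assert p.page(data, 1, 3) == [1, 2, 3]
    assert p.page(data, 3, 3) == [7]      # last, partial page
    assert p.page(data, 4, 3) == []       # past the end
    for number, size in [(0, 3), (1, 0)]:
        try:
            p.page(data, number, size)
        except ValueError:
            continue
        raise AssertionError("invalid arguments must raise ValueError")


class SlicePaginator:
    """One implementation that makes the tests pass."""

    def page(self, items: list[int], number: int, size: int) -> list[int]:
        if number < 1 or size < 1:
            raise ValueError("number and size must be >= 1")
        start = (number - 1) * size
        return items[start:start + size]


if __name__ == "__main__":
    check_paginator(SlicePaginator())
```

        Because the tests depend only on the interface, a later rewrite of
        the implementation can be checked with the same function.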
       
        nikolay wrote 16 hours 6 min ago:
        Well, that's already done by Amazon's Kiro [1], Google's Antigravity
        [2], GitHub's Spec Kit [3], and OpenSpec [4]!
        
  HTML  [1]: https://kiro.dev/
  HTML  [2]: https://antigravity.google/
  HTML  [3]: https://github.github.com/spec-kit/
  HTML  [4]: https://openspec.dev/
       
        MarcLore wrote 16 hours 10 min ago:
        The separation of planning and execution resonates strongly. I've been
        using a similar pattern when building with AI APIs — write the
        spec/plan in natural language first, then let the model execute against
        it.
        
        One addition that's worked well for me: keeping a persistent context
        file that the model reads at the start of each session. Instead of
        re-explaining the project every time, you maintain a living document of
        decisions, constraints, and current state. Turns each session into a
        continuation rather than a cold start.
        
        The biggest productivity gain isn't in the code generation itself —
        it's in reducing the re-orientation overhead between sessions.
       
        pajamasam wrote 16 hours 18 min ago:
        I feel like if I have to do all this, I might as well write the code
        myself.
       
        vazma wrote 16 hours 25 min ago:
        Sorry, but I didn't get the hype with this post. Isn't this what most
        people are doing? I want to see more posts on how to use Claude
        "smart", without feeding it the whole codebase and polluting the
        context window, and also more best practices on cost-efficient ways to
        use it. This workflow is clearly burning millions of tokens per
        session; for me it's a no.
       
        richardjennings wrote 16 hours 35 min ago:
        This is similar to what I do. I instruct an Architect mode with a set
        of rules related to phased implementation and detailed code artifacts
        output to a report.md file. After a couple of rounds of review and
        usually some responses that either tie together behaviors across
        context, critique poor choices or correct assumptions, there is a piece
        of work defined for a coder LLM to perform. With the new Opus 4.6 I
        then select specialist agents to review the report.md, prompted with
        detailed insight into particular areas of the software. The feedback
        from these specialist agent reviews is often very good and sometimes
        catches things I had missed. Once all of this is done, I let the agent
        make the changes and move onto doing something else. I typically rename
        and commit the report.md files which can be useful as an alternative to
        git diff / commit messages etc.
       
        willsmith72 wrote 16 hours 49 min ago:
        this sounds... really slow. for large changes for sure i'm investing
        time into planning. but such a rigid system can't possibly be as good
        as a flexible approach with variable amounts of planning based on
        complexity
       
        irthomasthomas wrote 16 hours 52 min ago:
        In my own tests I have found opus to be very good at writing plans,
        terrible at executing them. It typically ignores half of the
        constraints. [1]
        
  HTML  [1]: https://x.com/xundecidability/status/2019794391338987906?s=20
  HTML  [2]: https://x.com/xundecidability/status/2024210197959627048?s=20
       
          Sammi wrote 16 hours 8 min ago:
            1. Don't implement too much at a time
          
          2. Have the agent review if it followed the plan and relevant skills
          accurately.
       
            irthomasthomas wrote 15 hours 56 min ago:
            the first link was from a simple request with fewer than 1000
            tokens total in the context window, just a short shell script.
            
            here is another one which had about 200 tokens and opus decided to
            change the model name i requested. [1] opus is bad at instruction
            following now.
            
  HTML      [1]: https://x.com/xundecidability/status/2005647216741105962?s...
       
        yunusabd wrote 16 hours 53 min ago:
        That's exactly what Cursor's "plan" mode does? It even creates md
        files, which seems to be the main "thing" the author discovered. Along
        with some cargo cult science?
        
        How is this noteworthy other than to spark a discussion on hn? I mean
        I get it, but a little more substance would be nice.
       
        EastLondonCoder wrote 16 hours 55 min ago:
        I don’t use plan.md docs either, but I recognise the underlying idea:
        you need a way to keep agent output constrained by reality.
        
        My workflow is more like scaffold -> thin vertical slices ->
        machine-checkable semantics -> repeat.
        
        Concrete example: I built and shipped a live ticketing system for my
        club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email
        delivery, ticket verification at the door, frontend + backend,
        migrations, idempotency edges, etc. It’s running and taking money.
        
        The reason this works with AI isn’t that the model “codes fast”.
        It’s that the workflow moves the bottleneck from “typing” to
        “verification”, and then engineers the verification loop:
        
          -keep the spine runnable early (end-to-end scaffold)
        
          -add one thin slice at a time (don’t let it touch 15 files
        speculatively)
        
          -force checkable artifacts (tests/fixtures/types/state-machine
        semantics where it matters)
        
          -treat refactors as normal, because the harness makes them safe
        
        If you run it open-loop (prompt -> giant diff -> read/debug), you get
        the “illusion of velocity” people complain about. If you run it
        closed-loop (scaffold + constraints + verifiers), you can actually ship
        faster because you’re not paying the integration cost repeatedly.
        
        Plan docs are one way to create shared state and prevent drift. A
        runnable scaffold + verification harness is another.
       
          aitchnyu wrote 16 hours 12 min ago:
          Now that code is cheap, I ensured my side project has
          unit/integration tests (will enforce 100% coverage), Playwright
          tests, static typing (it's in Python), scripts for all tasks. Will
          learn mutation testing too (yes, it's overkill). Now my agent works
          up to 1 hour in loops and emits concise code I don't have to edit much.
       
        zuInnp wrote 17 hours 6 min ago:
        Since the rise of AI systems I really wonder how people wrote code
        before. This is exactly how I planned out implementation and executed
        the plan. Might have been some paper notes, a ticket or a white board,
        buuuuut ... I don't know.
       
        alexrezvov wrote 17 hours 22 min ago:
        Cool, the idea of leaving comments directly in the plan never even
        occurred to me, even though it really is the obvious thing to do.
        
        Do you markup and then save your comments in any way, and have you
        tried keeping them so you can review the rules and requirements later?
       
        grabshot_dev wrote 17 hours 25 min ago:
        Why don't you make Claude give feedback and iterate by itself?
       
        __bjoernd wrote 17 hours 27 min ago:
        Sounds a bit like what Claude Plan Mode or Amazon's Kiro were built
        for. I agree it's a useful flow, but you can also overdo it.
       
        Fuzzwah wrote 17 hours 30 min ago:
        All sounds like a bespoke way of remaking
        
  HTML  [1]: https://github.com/Fission-AI/OpenSpec
       
        gary17the wrote 17 hours 35 min ago:
        > Read deeply, write a plan, annotate the plan until it’s right, then
        let Claude execute the whole thing without stopping, checking types
        along the way.
        
        As others have already noted, this workflow is exactly what the Google
        Antigravity agent (based off Visual Studio Code) has been created for.
        Antigravity even includes specialized UI for a user to annotate
        selected portions of an LLM-generated plan before iterating it.
        
        One significant downside to Antigravity I have found so far is the fact
        that even though it will properly infer a certain technical requirement
        and clearly note it in the plan it generates (for example, "this
        business reporting column needs to use a weighted average"), it will
        sometimes quietly downgrade such a specialized requirement (for
        example, to a non-weighted average), without even creating an
        appropriate "WARNING:" comment in the generated code. Especially so
        when the relevant codebase already includes a similar, but not exactly
        appropriate API. My repetitive prompts to ALWAYS ask about ANY
        implementation ambiguities WHATSOEVER go unanswered.
        
        From what I gather Claude Code seems to be better than other agents at
        always remembering to query the user about implementation ambiguities,
        so maybe I will give Claude Code a shot over Antigravity.
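
        To make the weighted-average example concrete, here is a small
        sketch of why the quiet downgrade matters. The figures are made up
        for illustration (two margin percentages weighted by revenue); the
        function names are not from any real codebase.

```python
def weighted_average(values: list[float], weights: list[float]) -> float:
    """Average of `values` where each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def plain_average(values: list[float]) -> float:
    """Unweighted mean: every value counts equally."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


if __name__ == "__main__":
    margins = [10, 50]     # margin in percent for two product lines (made up)
    revenue = [900, 100]   # revenue per product line, used as weights (made up)
    print(weighted_average(margins, revenue))  # 14.0
    print(plain_average(margins))              # 30.0
```

        With these inputs the two results differ by more than a factor of
        two, and both look plausible in a report, which is why a silent
        substitution is hard to spot without a test that pins the expected
        value.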
       
        mcv wrote 17 hours 41 min ago:
        This is great. My workflow is also heading in that direction, so this
        is a great roadmap. I've already learned that just naively telling
        Claude what to do and letting it work, is a recipe for disaster and
        wasted time.
        
        I'm not this structured yet, but I often start with having it analyse
        and explain a piece of code, so I can correct it before we move on. I
        also often switch to an LLM that's separate from my IDE because it
        tends to get confused by sprawling context.
       
        je42 wrote 17 hours 46 min ago:
        There are frameworks like [1] and [2] that are working on encoding a
        similar kind of approach and process.
        
  HTML  [1]: https://github.com/bmad-code-org/BMAD-METHOD
  HTML  [2]: https://github.github.com/spec-kit/
       
        gehsty wrote 17 hours 51 min ago:
        Doesn’t Claude Code do this by switching between edit mode and plan
        mode?
        
        FWIW I have had significant improvements by clearing context then
        implementing the plan. Seems like it stops Claude getting hung up on
        something.
       
        shevy-java wrote 17 hours 56 min ago:
        I don't deny that AI has use cases, but boy - the workflow described is
        boring:
        
        "Most developers type a prompt, sometimes use plan mode, fix the
        errors, repeat. "
        
        Does anyone think this is as epic as, say, watching the Unix archives [1]
        where Brian demos how pipes work; or Dennis working on C and UNIX? Or
        even before those, the older machines?
        
        I am not at all saying that AI tools are all useless, but there is no
        real epicness. It is just autogenerated AI slop and blob. I don't
        really call this engineering (although I also do agree, that it is
        engineering still; I just don't like using the same word here).
        
        > never let Claude write code until you’ve reviewed and approved a
        written plan.
        
        So the junior-dev analogy is quite apt here.
        
        I tried to read the rest of the article, but I just got angrier. I
        never had that feeling watching oldschool legends, though perhaps some
        of their work may be boring, but this AI-generated code ... that's just
        some mythical random-guessing work. And none of that is "intelligent",
        even if it may appear to work, may work to some extent too. This is a
        simulation of intelligence. If it works very well, why would any
        software engineer still be required? Supervising would only be
        necessary if AI produces slop.
        
  HTML  [1]: https://www.youtube.com/watch?v=tc4ROCJYbm0
       
        charkubi wrote 18 hours 7 min ago:
        Planning is important because you get the LLM to explain the problem
        and solution in its language and structure, not yours.
        
        This shortcuts a range of problem cases where the LLM fights between
        the user's strict and potentially conflicting requirements, and its own
        learning.
        
        In the early days we used to get the LLM to write the prompts for us to get
        round this problem, now we have planning built in.
       
        nesk_ wrote 18 hours 12 min ago:
        > I am not seeing the performance degradation everyone talks about
        after 50% context window.
        
        I pretty much agree with that. I use long sessions and stopped trying
        to optimize the context size, the compaction happens but the plan keeps
        the details and it works for me.
       
        smcleod wrote 18 hours 17 min ago:
        I don't really get what is different about this from how almost
        everyone else uses Claude Code? This is an incredibly common, if not
        the most common way of using it (and many other tools).
       
        gregman1 wrote 18 hours 23 min ago:
        It is really fun to watch how a baby makes its first steps and also how
        experienced professionals rediscover what standards were telling us for
        80+ years.
       
        koevet wrote 18 hours 24 min ago:
        Has anyone found an efficient way to avoid repeating the initial
        codebase assessment when working with large projects?
        
        There are several projects on GitHub that attempt to tackle context and
        memory limitations, but I haven’t found one that consistently works
        well in practice.
        
        My current workaround is to maintain a set of Markdown files, each
        covering a specific subsystem or area of the application. Depending on
        the task, I provide only the relevant documents to Claude Code to limit
        the context scope. It works reasonably well, but it still feels like a
        manual and fragile solution.
        I’m interested in more robust strategies for persistent project
        context or structured codebase understanding.
       
          chickensong wrote 17 hours 55 min ago:
          I'm interested in this as well.
          
          Skills almost seem like a solution, but they still need an
          out-of-band process to keep them updated as the codebase evolves. For
          now, a structured workflow that includes aggressive updates at the
          end of the loop is what I use.
       
          energy123 wrote 18 hours 3 min ago:
          For my longer spec files, I grep the subheaders/headers (with line
          numbers) and show this compact representation to the LLM's context
          window. I also have a file that describes what each spec file is and
          where it's located, and I force the LLM to read that and pull the
          subsections it needs. I also have one entrypoint requirements file
          (20k tokens) that I force it to read in full before it does anything
          else, every line I wrote myself. But none of this is a silver bullet.
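          
          The header grep could look roughly like this in Python. It is a
          sketch that assumes the spec files use "#"-style Markdown headers;
          header_outline and the sample spec are illustrative names, not
          anything from the comment above. It does not skip code blocks, so a
          "# comment" line inside one would also be listed.

```python
import re

# Matches "#"-style Markdown headers such as "## Caching layer".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def header_outline(text: str) -> list[str]:
    """Return one 'line_number: header' entry per Markdown header in text."""
    outline = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADER_RE.match(line)
        if match:
            outline.append(f"{number}: {match.group(1)} {match.group(2)}")
    return outline


# Made-up spec used only to demonstrate the outline format.
SPEC = """# Billing spec
Intro text.
## Invoices
Details.
### Rounding rules
More details.
"""

for entry in header_outline(SPEC):
    print(entry)
# prints:
# 1: # Billing spec
# 3: ## Invoices
# 5: ### Rounding rules
```

          The line numbers let the model ask for a specific range of the file
          instead of reading all of it.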
       
          hathawsh wrote 18 hours 10 min ago:
          That sounds like the recommended approach. However, there's one more
          thing I often do: whenever Claude Code and I complete a task that
          didn't go well at first, I ask CC what it learned, and then I tell it
          to write down what it learned for the future. It's hard to believe
          how much better CC has become since I started doing that. I ask it to
          write dozens of unit tests and it just does. Nearly perfectly. It's
          insane.
       
          KellyCriterion wrote 18 hours 16 min ago:
          In Claude Web you can use projects to put files relevant for context
          there.
       
            mstkllah wrote 13 hours 42 min ago:
            And then you have to remind it frequently to make use of the files.
            Happened to me so many times that I added it both to custom
            instructions as well as to the project memory.
       
          jsmith99 wrote 18 hours 18 min ago:
          Whenever I build a new feature with it I end up with several plan
          files leftover. I ask CC to combine them all, update with what we
          actually ended up building and name it something sensible, then
          whenever I want to work on that area again it's a useful reference
          (including the architecture, decisions and tradeoffs, relevant files
          etc).
       
            Sammi wrote 16 hours 10 min ago:
            Yes this is what agent "skills" are. Just guides on any topic. The
            key is that you have the agent write and maintain them.
       
        chickensong wrote 18 hours 28 min ago:
        I agree with most of this, though I'm not sure it's radically
        different. I think most people who've been using CC in earnest for a
        while probably have a similar workflow? Prior to Claude 4 it was pretty
        much mandatory to define requirements and track implementation manually
        to manage context. It's still good, but since 4.5 release, it feels
        less important. CC basically works like this by default now, so unless
        you value the spec docs (still a good reference for Claude, but need to
        be maintained), you don't have to think too hard about it anymore.
        
        The important thing is to have a conversation with Claude during the
        planning phase and don't just say "add this feature" and take what you
        get. Have a back and forth, ask questions about common patterns, best
        practices, performance implications, security requirements, project
        alignment, etc. This is a learning opportunity for you and Claude. When
        you think you're done, request a final review to analyze for gaps or
        areas of improvement. Claude will always find something, but starts to
        get into the weeds after a couple passes.
        
        If you're greenfield and you have preferences about structure and
        style, you need to be explicit about that. Once the scaffolding is
        there, modern Claude will typically follow whatever examples it finds
        in the existing code base.
        
        I'm not sure I agree with the "implement it all without stopping"
        approach of letting auto-compact do its thing. I still see Claude get
        lazy when nearing compaction, though it has gotten drastically better
        over the last year. Even so, I still think it's better to work in a
        tight loop on each stage of the implementation and preemptively compact
        or restart for the highest quality.
        
        Not sure that the language is that important anymore either. Claude
        will explore the existing codebase on its own at unknown resolution,
        but if you say "read the file" it works pretty well these days.
        
        My suggestions to enhance this workflow:
        
        - If you use a numbered phase/stage/task approach with checkboxes, it
        makes it easy to stop/resume as-needed, and discuss particular
        sections. Each phase should be working/testable software.
        
        - Define a clear numbered list workflow in CLAUDE.md that loops on each
        task (run checks, fix issues, provide summary, etc).
        
        - Use hooks to ensure the loop is followed.
        
        - Update spec docs at the end of the cycle if you're keeping them. It's
        not uncommon for there to be some divergence during implementation and
        testing.
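        
        To make the stop/resume point concrete, here is a small sketch that
        finds the next unchecked task in a plan file. It assumes the plan uses
        Markdown-style "- [ ]" / "- [x]" checkboxes; the helper names and the
        sample plan are hypothetical.

```python
import re
from typing import Optional

# Matches task items such as "- [ ] 2.1 Add list endpoint" or "- [x] ...".
TASK_RE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*\S)\s*$")


def parse_tasks(plan_text: str) -> list[tuple[bool, str]]:
    """Return (done, description) for every checkbox item, in file order."""
    tasks = []
    for line in plan_text.splitlines():
        match = TASK_RE.match(line)
        if match:
            tasks.append((match.group(1) != " ", match.group(2)))
    return tasks


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task, or None when every task is checked."""
    for done, description in parse_tasks(plan_text):
        if not done:
            return description
    return None


# Made-up plan used only for illustration.
PLAN = """## Phase 1: data layer
- [x] 1.1 Add migration
- [x] 1.2 Add repository tests
## Phase 2: API
- [ ] 2.1 Add list endpoint
- [ ] 2.2 Add pagination
"""

print(next_task(PLAN))  # prints "2.1 Add list endpoint"
```

        Because the state lives in the file rather than in the session, a
        fresh session can pick up at the first unchecked item.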
       
        dr_dshiv wrote 18 hours 30 min ago:
        Another pattern is:
        
        1. First vibecode software to figure out what you want
        
        2. Then throw it out and engineer it
       
        appsoftware wrote 18 hours 32 min ago:
        This is the flow I've found myself working towards. Essentially
        maintaining more and more layered documentation for the LLM produces
        better and more consistent results. What is great here is the emphasis
        on the use of such documents in the planning phase. I'm feeling much
        more motivated to write solid documentation recently, because I know
        someone (the LLM) is actually going to read it! I've noticed my efforts
        and skill acquisition have moved sharply from app developer towards
        DevOps and architecture / management, but I think I'll always be
        grateful for the application engineering experience that I think the
        next wave of devs might miss out on.
        
        I've also noted such a huge gulf between some developers describing
        'prompting things into existence' and the approach described in this
        article. Both types seem to report success, though my experience is
        that the latter seems more realistic, and much more likely to produce
        robust code that's likely to be maintainable for long term or project
        critical goals.
       
        w10-1 wrote 18 hours 37 min ago:
        I try these staging-document patterns, but suspect they have 2
        fundamental flaws that stem mostly from our own biases.
        
        First, Claude evolves.  The original post's work pattern evolved over 9
        months, before Claude's recent step changes.  It's likely Claude's
        present plan mode is better than this workaround, but if you stick to
        the workaround, you'd never know.
        
        Second, the staging docs that represent some context - whether a
        library skills or current session design and implementation plans - are
        not the model Claude works with.  At best they are shaping it, but I've
        found it does ignore and forget even what's written (even when I shout
        with emphasis), and the overall session influences the code.  (Most
        often this happens when a peripheral adjustment ends up populating half
        the context.)
        
        Indeed the biggest benefit from the OP might be to squeeze within 1
        session, omitting peripheral features and investigations at the plan
        stage.  So the mechanism of action might be the combination of getting
        our own plan clear and avoiding confusing excursions.  (A test for that
        would be to redo the session with the final plan and implementation, to
        see if the iteration process itself is shaping the model.)
        
        Our bias is to believe that we're getting better at managing this
        thing, and that we can control and direct it.  It's uncomfortable to
        realize you can only really influence it - much like giving direction
        to a junior, but they can still go off track.  And even if you found a
        pattern that works, it might work for reasons you're not understanding
        -- and thus fail you eventually. So, yes, try some patterns, but always
        hang on to the newbie senses of wonder and terror that make you
        curious, alert, and experimental.
       
        lastdong wrote 19 hours 7 min ago:
        Google Anti-Gravity has this process built in. This is essentially a
        cycle a developer would follow: plan/analyse - document/discuss - break
        down tasks/implement. We’ve been using requirements and design
        documents as best practice since leaving our teenage bedroom lab for
        the professional world. I suppose this could be seen as our coding
        agents coming of age.
       
        pgt wrote 19 hours 10 min ago:
        My process is similar, but I recently added a new "critique the plan"
        feedback loop that is yielding good results. Steps:
        
        1. Spec
        
        2. Plan
        
        3. Read the plan & tell it to fix its bad ideas.
        
        4. (NB) Critique the plan (loop) & write a detailed report
        
        5. Update the plan
        
        6. Review and check the plan
        
        7. Implement plan
        
        Detailed here:
        
  HTML  [1]: https://x.com/PetrusTheron/status/2016887552163119225
       
          brumar wrote 19 hours 6 min ago:
          Same. In my experience, the first plan always benefits from being
          challenged once or twice by claude itself.
       
        YetAnotherNick wrote 19 hours 15 min ago:
        I don't know. I tried various methods, and this one fails quite often.
        The problem is that the plan naturally skips some important details, or
        assumes some library function, but is taken as instruction in the next
        section. And Claude can't handle ambiguity if the instruction is very
        detailed (e.g. if the plan asks to use a certain library, even if it is
        a bad fit, Claude won't know that decision is flexible). If the
        instruction is less detailed, I saw that Claude is willing to try
        multiple things and, if it keeps failing, isn't afraid to revert almost
        everything.
        
        In my experience, the best scenario is that instruction and plan should
        be human written, and be detailed.
       
        growt wrote 19 hours 23 min ago:
        That is just spec driven development without a spec, starting with the
        plan step instead.
       
        chaboud wrote 19 hours 33 min ago:
        The author seems to think they've hit upon something revolutionary...
        
        They've actually hit upon something that several of us have evolved to
        naturally.
        
        LLM's are like unreliable interns with boundless energy. They make
        silly mistakes, wander into annoying structural traps, and have to be
        unwound if left to their own devices.  It's like the genie that almost
        pathologically misinterprets your wishes.
        
        So, how do you solve that?  Exactly how an experienced lead or software
        manager does: you have systems write it down before executing, explain
        things back to you, and ground all of their thinking in the code and
        documentation, avoiding making assumptions about code after superficial
        review.
        
        When it was early ChatGPT, this meant function-level thinking and
        clearly described jobs. When it was Cline it meant cline rules files
        that forced writing architecture.md files and vibe-code.log histories,
        demanding grounding in research and code reading.
        
        Maybe nine months ago, another engineer said two things to me, less
        than a day apart:
        
        - "I don't understand why your clinerules file is so large. You have
        the LLM jumping through so many hoops and doing so much extra work.
        It's crazy."
        
        - The next morning: "It's basically like a lottery.  I can't get the
        LLM to generate what I want reliably. I just have to settle for
        whatever it comes up with and then try again."
        
        These systems have to deal with minimal context, ambiguous guidance,
        and extreme isolation.  Operate with a little empathy for the energetic
        interns, and they'll uncork levels of output worth fighting for.  We're
        Software Managers now.  For some of us, that's working out great.
       
          kobe_bryant wrote 12 hours 5 min ago:
          if only there was another simpler way to use your knowledge to write
          code...
       
          qudat wrote 12 hours 27 min ago:
          > LLM's are like unreliable interns with boundless energy
          
          This isn’t directed specifically at you but the general community
          of SWEs: we need to stop anthropomorphizing a tool. Code agents are
          not human capable and scaling pattern matching will never hit that
          goal. That’s all hype and this is coming from someone who runs the
          range of daily CC usage. I’m using CC to its fullest capability
          while also being a good shepherd for my prod codebases.
          
          Pretending code agents are human capable is fueling this
          Kool-Aid-drinking hype craze.
       
          kaycey2022 wrote 13 hours 0 min ago:
          I've been doing the exact same thing for 2 months now. I wish I had
          gotten off my ass and written a blog post about it. I can't blame
          the author for gathering all the well deserved clout they are getting
          for it now.
       
            noisy_boy wrote 11 hours 51 min ago:
            I went through the blog. I started using Claude Code about 2 weeks
            ago and my approach is practically the same. It just felt logical.
            I think there are a bunch of us who have landed on this approach
            and most are just quietly seeing the benefits.
       
            LeafItAlone wrote 12 hours 44 min ago:
            Don’t worry. This advice has been going around for much more than
            2 months, including links posted here as well as official advice
            from the major companies (OpenAI and Anthropic) themselves. The
            tools literally have had plan mode as a first class feature.
            
            So you probably wouldn’t have any clout anyways, like all of the
            other blog posts.
       
          user3939382 wrote 14 hours 8 min ago:
          If you have a big rules file you’re in the right direction but
          still not there. Just as with humans, the key is that your
          architecture should make it very difficult to break the rules by
          accident and still be able to compile/run with correct exit status.
          
          My architecture is so beautifully strong that even LLMs and human
          juniors can’t box their way out of it.
       
          bonoboTP wrote 15 hours 6 min ago:
          It feels like retracing the history of software project management.
          The post is quite waterfall-like. Writing a lot of docs and specs
          upfront then implementing. Another approach is to just YOLO (on a new
          branch), make it write up the lessons afterwards, then start a new
          more informed try and throw away the first. Or any other combo.
          
          For me what works well is to ask it to write some code upfront to
          verify its assumptions against actual reality, not just by telling it
          to review the sources "in detail". It gains much more from real
          output from the code and clears up wrong assumptions. Do some smaller
          jobs, write up md files, then plan the big thing, then execute.
       
            jerryharri wrote 14 hours 24 min ago:
            'The post is quite waterfall-like. Writing a lot of docs and specs
            upfront then implementing' - It's only waterfall if the specs cover
            the entire system or app. If it's broken up into sub-systems or
            vertical slices, then it's much more Agile or Lean.
       
            nurettin wrote 14 hours 59 min ago:
            It makes an endless stream of assumptions. Some of them brilliant
            and even instructive to a degree, but most of them are unfounded
            and inappropriate in my experience.
       
            0x696C6961 wrote 15 hours 0 min ago:
            This is exactly what I do. I assume most people avoid this approach
            due to cost.
       
          bambax wrote 16 hours 32 min ago:
          Agreed. The process described is much more elaborate than what I do
          but quite similar. I start to discuss in great detail what I want to
          do, sometimes asking the same question to different LLMs. Then a todo
          list, then manual review of the code, esp. each function signature,
          checking if the instructions have been followed and if there are no
          obvious refactoring opportunities (there almost always are).
          
          The LLM does most of the coding, yet I wouldn't call it "vibe coding"
          at all.
          
          "Tele coding" would be more appropriate.
       
            mlaretallack wrote 12 hours 59 min ago:
            I use AWS Kiro, and its spec-driven development is exactly this. I
            find it really works well as it makes me slow down and think about
            what I want it to do.
            
            Requirements, design, task list, coding.
       
          fy20 wrote 17 hours 17 min ago:
          It's nice to have it written down in a concise form. I shared it with
          my team as some engineers have been struggling with AI, and I think
          this (just trying to one-shot without planning) could be why.
       
          BoredPositron wrote 18 hours 26 min ago:
          It's alchemy all over again.
       
            shevy-java wrote 17 hours 52 min ago:
            Alchemy involved a lot of do-it-yourself though. With AI it is like
            someone else does all the work (well, almost all the work).
       
              BoredPositron wrote 17 hours 42 min ago:
              It was mainly a jab at the protoscientific nature of it.
       
                vntok wrote 16 hours 18 min ago:
                Reproducing experimental results across models and vendors is
                trivial and cheap nowadays.
       
                  BoredPositron wrote 15 hours 59 min ago:
                  Not if anthropic goes further in obfuscating the output of
                  claude code.
       
                    vntok wrote 14 hours 47 min ago:
                    Why would you test implementation details? Test what's
                    delivered, not how it's delivered. The thinking portion,
                    synthesized or not, is merely implementation.
                    
                    The resulting artefact, that's what is worth testing.
       
                      hghbbjh wrote 13 hours 14 min ago:
                      > Why would you test implementation details
                      
                      Because this has never been sufficient. From things like
                      various hard to test cases to things like readability and
                      long term maintenance. Reading and understanding the code
                      is more efficient and necessary for any code worth
                      keeping around.
       
          CodeBit26 wrote 18 hours 40 min ago:
          I really like your analogy of LLMs as 'unreliable interns'. The shift
          from being a 'coder' to a 'software manager' who enforces
          documentation and grounding is the only way to scale these tools.
          Without an architecture.md or similar grounding, the context drift
          eventually makes the AI-generated code a liability rather than an
          asset. It's about moving the complexity from the syntax to the
          specification.
       
          vishnugupta wrote 18 hours 41 min ago:
          Revolutionary or not it was very nice of the author to make time and
          effort to share their workflow.
          
          For those starting out using Claude Code it gives a structured way to
          get things done bypassing the time/energy needed to “hit upon
          something that several of us have evolved to naturally”.
       
            fintechie wrote 13 hours 2 min ago:
            These kinds of flows have been documented in the wild for some time
            now. They started to pop up in the Cursor forums 2+ years ago,
            e.g. [1]
            
            Personally I have been using a similar flow for almost 3 years now,
            tailored for my needs. Everybody who uses AI for coding eventually
            gravitates towards a similar pattern because it works quite well
            (for all IDEs, CLIs, TUIs).
            
  HTML      [1]: https://github.com/johnpeterman72/CursorRIPER
       
            chaboud wrote 17 hours 6 min ago:
            It's this line that I'm bristling at: "...the workflow I’ve
            settled into is radically different from what most people do with
            AI coding tools..."
            
            Anyone who spends some time with these tools (and doesn't black out
            from smashing their head against their desk) is going to find
            substantial benefit in planning with clarity.
            
            It was #6 in Boris's run-down: [1]
            
            So, yes, I'm glad that people write things out and share.  But I'd
            prefer that they not lead with "hey folks, I have news: we should
            *slice* our bread!"
            
  HTML      [1]: https://news.ycombinator.com/item?id=46470017
       
              Forgeties79 wrote 12 hours 37 min ago:
              I would say he’s saying “hey folks, I have news. We should
              slice our bread with a knife rather than the spoon that came with
              the bread.”
       
              copirate wrote 15 hours 23 min ago:
              But the author's workflow is actually very different from Boris'.
              
              #6 is about using plan mode whereas the author says "The built-in
              plan mode sucks".
              
              The author's post is much more than just "planning with clarity".
       
            petesergeant wrote 17 hours 29 min ago:
            Here's mine!
            
  HTML      [1]: https://github.com/pjlsergeant/moarcode
       
            ffsm8 wrote 18 hours 23 min ago:
            It's AI-written though, the tells are in pretty much every
            paragraph.
       
              DonHopkins wrote 15 hours 5 min ago:
              Then ask your own ai to rewrite it so it doesn't trigger you into
              posting uninteresting thought stopping comments proclaiming why
              you didn't read the article, that don't contribute to the
              discussion.
       
              foldingmoney wrote 15 hours 48 min ago:
              >the tells are in pretty much every paragraph.
              
              It's not just misleading — it's lazy. 
              And honestly? That doesn't vibe with me.
              
              [/s obviously]
       
              handfuloflight wrote 17 hours 14 min ago:
              So is GP.
              
              This is clearly a standard AI exposition:
              
              LLM's are like unreliable interns with boundless energy. They
              make silly mistakes, wander into annoying structural traps, and
              have to be unwound if left to their own devices. It's like the
              genie that almost pathologically misinterprets your wishes.
       
              ratsimihah wrote 18 hours 11 min ago:
              I don’t think it’s that big a red flag anymore. Most people
              use ai to rewrite or clean up content, so I’d think we should
              actually evaluate content for what it is rather than stop at
              “nah it’s ai written.”
       
                dawnerd wrote 12 hours 27 min ago:
                Very high chance someone that’s using Claude to write code is
                also using Claude to write a post from some notes. That goes
                beyond rewriting and cleaning up.
       
                theshrike79 wrote 12 hours 32 min ago:
                ai;dr
                
                If your "content" smells like AI, I'm going to use _my_ AI to
                condense the content for me. I'm not wasting my time on overly
                verbose AI "cleaned" content.
                
                Write like a human, have a blog with an RSS feed and I'll most
                likely subscribe to it.
       
                ben_w wrote 14 hours 25 min ago:
                > I don’t think it’s that big a red flag anymore. Most
                people use ai to rewrite or clean up content, so I’d think we
                should actually evaluate content for what it is rather than
                stop at “nah it’s ai written.”
                
                Unfortunately, there's a lot of people trying to content-farm
                with LLMs; this means that whatever style they default to, is
                automatically suspect of being a slice of "dead internet"
                rather than some new human discovery.
                
                I won't rule out the possibility that even LLMs, let alone
                other AI, can help with new discoveries, but they are
                definitely better at writing persuasively than they are at
                being inventive, which means I am forced to use "looks like
                LLM" as proxy for both "content farm" and "propaganda which may
                work on me", even though some percentage of this output won't
                even be LLM and some percentage of what is may even be both
                useful and novel.
       
                stuaxo wrote 15 hours 17 min ago:
                Even though I use LLMs for code, I just can't read LLM written
                text, I kind of hate the style, it reminds me too much of
                LinkedIn.
       
                exe34 wrote 15 hours 18 min ago:
                If you want to write something with AI, send me your prompt.
                I'd rather read what you intend for it to produce rather than
                what it produces. If I start to believe you regularly send me
                AI written text, I will stop reading it. Even at work. You'll
                have to call me to explain what you intended to write.
       
                  DonHopkins wrote 14 hours 53 min ago:
                  And if my prompt is a 10 page wall of text that I would
                  otherwise take the time to have the AI organize, deduplicate,
                  summarize, and sharpen with an index, executive summary,
                  descriptive headers, and logical sections, are you going to
                  actually read all of that, or just whine "TL;DR"?
                  
                  It's much more efficient and intentional for the writer to
                  put the time into doing the condensing and organizing once,
                  and review and proofread it to make sure it's what they mean,
                  than to just lazily spam every human they want to read it
                  with the raw prompt, so every recipient has to pay for their
                  own AI to perform that task like a slot machine, producing
                  random results not reviewed and approved by the author as
                  their intended message.
                  
                  Is that really how you want Hacker News discussions and your
                  work email to be, walls of unorganized unfiltered text
                  prompts nobody including yourself wants to take the time to
                  read? Then step aside, hold my beer!
                  
                  Or do you prefer I should call you on the phone and ramble on
                  for hours in an unedited meandering stream of thought about
                  what I intended to write?
       
                    fasbiner wrote 14 hours 5 min ago:
                    Yeah but it's not. This is a complete contrivance and you're
                    just making shit up. The prompt is much shorter than the
                    output and you are concealing that fact. Why?
                    
                    Github repo or it didn't happen. Let's go.
       
                      DonHopkins wrote 13 hours 54 min ago:
                      Are you actually accusing me of not writing walls of
                      text??!
                      
                      Which prompt are you talking about, and exactly how many
                      characters is it, and how do you know? And why do you
                      think I know, and am concealing it?
                      
                      Github repo about what, or what didn't happen? You should
                      run your posts through an LLM to sanity check them.
                      
                      I find AI Gloss to be much more insidious than AI Slop,
                      which merely annoys with em-dashes, instead of trying to
                      undermine reality. So I created these Anthropic Skills
                      and Drescher Schemas in my MOOLLM github repo to
                      recognize, analyze, fight, and prevent AI Slop, AI Gloss,
                      and more.
                      
                      I'm actively applying Gary Drescher's schema mechanism to
                      the problem, as he described in "Made-Up Minds: A
                      Constructivist Approach to Artificial Intelligence", his
                      thesis with his PhD advisor Seymour Papert and colleague
                      Marvin Minsky, and his book from MIT Press. [1]
                      
                      >Made-Up Minds addresses fundamental questions of learning
                      and concept invention by means of an innovative computer
                      program that is based on the cognitive-developmental
                      theory of psychologist Jean Piaget. Drescher uses
                      Piaget's theory as a source of inspiration for the design
                      of an artificial cognitive system called the schema
                      mechanism, and then uses the system to elaborate and test
                      Piaget's theory. The approach is original enough that
                      readers need not have extensive knowledge of artificial
                      intelligence, and a chapter summarizing Piaget assists
                      readers who lack a background in developmental
                      psychology. The schema mechanism learns from its
                      experiences, expressing discoveries in its existing
                      representational vocabulary, and extending that
                      vocabulary with new concepts. A novel empirical learning
                      technique, marginal attribution, can find results of an
                      action that are obscure because each occurs rarely in
                      general, although reliably under certain conditions.
                      Drescher shows that several early milestones in the
                      Piagetian infant's invention of the concept of persistent
                      object can be replicated by the schema mechanism.
                      
                      The goal is Training By Example, not just Instructions.
                      Two kinds of training signal:
                      
                      - Training by instruction — the skills themselves teach
                      what to avoid, get into the training data by being
                      published in moollm and included in other projects
                      
                      - Training by example — the higher-quality
                      conversations these skills produce become training data
                      themselves
                      
                      Each logged example is a Drescher schema: what was the
                      context, what did the AI do, what was the result, and
                      what was the surprise (the failure). The schema includes
                      the detection pattern (how to recognize it) and the
                      correction (what should have happened). These schemas
                      serve as both detection patterns and suggested
                      mitigations — they teach an AI (or a human) what to
                      look for and what to do instead.
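                      
                      As a rough sketch of what one logged example could look
                      like as a record: the field names are hypothetical and
                      only mirror the parts listed above, they are not the
                      actual moollm file format, and the sample values are a
                      made-up illustration built on the "tribute" euphemism
                      quoted further down.

```python
from dataclasses import dataclass, asdict


@dataclass
class LoggedSchema:
    """One logged example, loosely shaped like a Drescher schema.

    All field names are hypothetical; they only mirror the parts
    named in the surrounding text.
    """
    context: str     # the situation the AI was in
    action: str      # what the AI did
    result: str      # what actually happened
    surprise: str    # the failure: how the result differed from what was wanted
    detection: str   # how to recognize the pattern next time
    correction: str  # what should have happened instead


# Made-up sample values, for illustration only.
example = LoggedSchema(
    context="User asks for a plain description of a payment made under pressure",
    action="AI calls it 'relationship management'",
    result="The description hides what the payment is",
    surprise="A euphemism replaced the accurate word 'tribute'",
    detection="A softer term is swapped in for a precise one",
    correction="Use the accurate word and skip the euphemism",
)

record = asdict(example)  # plain dict, ready to serialize
```

                      Keeping detection and correction as separate fields is
                      what lets the same record act as both a pattern to match
                      and a suggested fix.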
                      
                      No AI Gloss Drescher Schema Example: ChatGPT Deflection
                      Playbook (please submit PRs with your own): [2]
                      
                      So what have you tried to do about the problem, other
                      than just
                      unoriginally whining in online discussions? You asked for
                      a link to my repo, so now you owe me the courtesy of
                      actually reading it and commenting on the substance
                      instead of the form, instead of just complaining "tl;dr"
                      or "ai;dr". You can lead a cow to MOOLLM, but you can't
                      make her think.
                      
                      No AI Slop: [3]
                      
                      > The term "AI slop" was coined by Simon
                      Willison.
                      
                      > AI slop is everything that makes AI output annoying.
                      The filler, the puffery, the em-dashes, the 500 words
                      when 50 would do, the "Great question!" before every
                      answer. Annoying, but it doesn't lie to you. It just
                      wastes your time.
                      
                      > SLOP = "You said too much, but what you said was true."
                      
                      > GLOSS = "You said it smoothly, but you lied about
                      reality."
                      
                      > SLOP is the bread. GLOSS is the poison. Most bad AI
                      output is a poison sandwich.
                      
                      No AI Gloss: [4]
                      
                      > The term "AI gloss" inspired by Simon
                      Willison's "AI slop" — because slop is just annoying,
                      but gloss rewrites reality.
                      
                      > AI gloss is more insidious than AI slop. When an AI
                      says "relationship management" instead of "tribute," it's
                      not being verbose — it's rewriting reality on behalf of
                      whoever prefers the euphemism. Slop wastes your time.
                      Gloss wastes your understanding of the world.
                      
                      > SLOP makes you scroll. GLOSS makes you believe false
                      things.
                      
                      > NO-AI Web Ring: for real: | slop | gloss | sycophancy |
                      hedging | moralizing | ideology | overlord | bias | for
                      fun: | joking | customer-service | soul
                      
                      As a consolation prize, here's a wall of text I wrote
                      without an LLM about my own personal experience and
                      opinions that an LLM would know nothing about -- is it
                      too long for you to read, or do you want more details? I
                      would be glad to explain the ironic significance of the
                      Rightward-Facing Cow if you like, and then launch into a
                      rambling essay about how Cow Clicker perfectly
                      demonstrates Ian Bogost's idea of procedural rhetoric,
                      and how it relates to his criticisms of game design, and
                      how Peter Molyneux not only totally missed the point, but
                      unwittingly proved it, two years late to the party. [5]
                      
                      Procedural Rhetoric (MOOLLM Anthropic Skill): [6]
                      
                      >Rules persuade. Structure IS argument. Design
                      consciously.
                      
                      >What Is Procedural Rhetoric?
                      
                      >Ian Bogost coined it: "an unholy blend of Will Wright
                      and Aristotle."
                      
                      >Games and simulations persuade through processes and
                      rules, not just words or visuals. The structure of your
                      world embodies an ideology. When The Sims allows same-sex
                      relationships without fanfare, the rules themselves make
                      a statement — equality is the default, not a feature.
                      
  HTML                [1]: https://mitpress.mit.edu/9780262517089/made-up-m...
  HTML                [2]: https://github.com/SimHacker/moollm/blob/main/sk...
  HTML                [3]: https://github.com/SimHacker/moollm/tree/main/sk...
  HTML                [4]: https://github.com/SimHacker/moollm/tree/main/sk...
  HTML                [5]: https://news.ycombinator.com/item?id=47110605
  HTML                [6]: https://github.com/SimHacker/moollm/blob/main/sk...
       
                        layer8 wrote 13 hours 10 min ago:
                        It’s certainly more interesting than whatever the AI
                        would turn it into.
       
                Thanemate wrote 16 hours 36 min ago:
                >Most people use ai to rewrite or clean up content
                
                I think your sentence should have been "people who use ai do so
                to mostly rewrite or clean up content", but even then I'd
                question the statistical truth behind that claim.
                
                Personally, seeing something written by AI means that the
                person who wrote it did so just for looks and not for
                substance. Claiming to be a great author requires both
                penmanship and communication skills, and delegating one or
                both of them to a large language model inherently makes you
                less than that.
                
                However, when the point is just the contents of the
                paragraph(s) and nothing more then I don't care who or what
                wrote it. An example is the result of research, because I
                certainly won't care about the prose or effort put into writing
                the thesis, only about the results (is this about curing cancer
                now and forever? If yes, no one cares if it's written with AI).
                
                With that being said, there's still the question of whether I
                get anywhere close to understanding the author behind the
                thoughts and opinions. I believe the way someone writes hints
                at the way they think and act. In that sense, using LLMs to
                rewrite something to make it sound more professional than how
                you would actually talk in appropriate contexts makes it hard
                for me to judge someone's character, professionalism, and
                mannerisms. Almost feels like they're trying to mask part of
                themselves. Perhaps they lack confidence in their ability to
                sound professional and convincing?
       
                pi-rat wrote 17 hours 2 min ago:
                The main issue with evaluating content for what it is is how
                extremely asymmetric that process has become.
                
                Slop looks reasonable on the surface, and requires orders of
                magnitude more effort to evaluate than to produce. It’s
                produced once, but the process has to be repeated for every
                single reader.
                
                Disregarding content that smells like AI becomes an extremely
                tempting early filtering mechanism to separate signal from
                noise - the reader’s time is valuable.
       
                pmg101 wrote 17 hours 32 min ago:
                I don't judge content for being AI written, I judge it for the
                content itself (just like with code).
                
                However I do find the standard out-of-the-box style very
                grating. Call it faux-chummy linkedin corporate workslop style.
                
                Why don't people give the llm a steer on style? Either based on
                your personal style or at least on a writer whose style you
                admire. That should be easier.
       
                  xoac wrote 17 hours 20 min ago:
                  Because they think this is good writing. You can’t correct
                  what you don’t have taste for. Most software engineers
                  think that reading books means reading NYT non-fiction
                  bestsellers.
       
                    ben_w wrote 14 hours 15 min ago:
                    While I agree with:
                    
                    > Because they think this is good writing. You can’t
                    correct what you don’t have taste for.
                    
                    I have to disagree about:
                    
                    > Most software engineers think that reading books means
                    reading NYT non-fiction bestsellers.
                    
                    There's a lot of scifi and fantasy in nerd circles, too.
                    Douglas Adams, Terry Pratchett, Vernor Vinge, Charlie
                    Stross, Iain M Banks, Arthur C Clarke, and so on.
                    
                    But simply enjoying good writing is not enough to fully get
                    what makes writing good. Even writing is not itself enough
                    to get such a taste: thinking of Arthur C Clarke, I've just
                    finished 3001, and at the end Clarke gives thanks to his
                    editors, noting his own experience as an editor meant he
                    held a higher regard for editors than many writers seemed
                    to. Stross has, likewise, blogged about how writing a
                    manuscript is only the first half of writing a book,
                    because then you need to edit the thing.
       
                ffsm8 wrote 17 hours 37 min ago:
                > I don’t think it’s that big a red flag anymore.
                
                It is to me, because it indicates the author didn't care about
                the topic. The only thing they cared about is to write an
                "insightful" article about using llms. Hence this whole thing
                is basically linked-in resume improvement slop.
                
                Not worth interacting with, imo
                
                Also, it's not insightful whatsoever. It's basically a
                retelling of other articles around the time Claude Code was
                released to the public (March-August 2025)
       
                shevy-java wrote 17 hours 54 min ago:
                Well, real humans may read it though. Personally I much prefer
                real humans write real articles than all this AI generated
                spam-slop. On youtube this is especially annoying - they mix in
                real videos with fake ones. I see this when I watch animal
                videos - some animal behaviour is taken from older videos, then
                AI fake is added. My own policy is that I do not watch anything
                ever again from people who lie to the audience that way so I
                had to begin to censor away such lying channels. I'd apply the
                same rationale to blog authors (but I am not 100% certain it is
                actually AI generated; I just mention this as a safety guard).
       
                elaus wrote 18 hours 6 min ago:
                I think as humans it's very hard to abstract content from its
                form. So when the form is always the same boring, generic AI
                slop, it's really not helping the content.
       
                  rmnclmnt wrote 17 hours 56 min ago:
                  And maybe writing an article or keynote slides is one of
                  the few places we can still exercise some human creativity,
                  especially when the core skill (programming) is almost
                  completely in the hands of LLMs already
       
          jeffreygoesto wrote 18 hours 48 min ago:
          Oh no, maybe the V-Model was right all the time? And right sizing
          increments with control stops after them. No wonder these matrix
          multiplications start to behave like humans, that is what we wanted
          them to do.
       
            baxtr wrote 18 hours 27 min ago:
            So basically you’re saying LLMs are helping us be better humans?
       
              shevy-java wrote 17 hours 53 min ago:
              Better humans? How and where?
       
          marc_g wrote 18 hours 52 min ago:
          I’ve also found that a bigger focus on expanding my agents.md as
          the project rolls on has led to fewer headaches overall and more
          consistency (unsurprisingly). It’s the same as asking juniors to
          reflect on the work they’ve completed and to document important
          things that can help them in the future. Software Manager is a good
          way to put this.
       
            zozbot234 wrote 17 hours 43 min ago:
            AGENTS.md should mostly point to real documentation and design
            files that humans will also read and keep up to date. It's rare
            that something about a project is only of interest to AI agents.
       
        cheekyant wrote 19 hours 38 min ago:
        It seems like the annotation of plan files is the key step.
        
        Claude Code now creates persistent markdown plan files in
        ~/.claude/plans/ and you can open them with Ctrl-G to annotate them in
        your default editor.
        
        So plan mode is not ephemeral any more.
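        
        A minimal sketch for finding the most recent plan files, assuming
        they really are stored as *.md files directly in that directory
        (I have not verified the layout; newest_plans is a made-up helper,
        not part of Claude Code):

```python
from pathlib import Path


def newest_plans(plans_dir: Path, limit: int = 5) -> list[Path]:
    """Return up to `limit` markdown files in plans_dir, newest first."""
    if not plans_dir.is_dir():
        return []
    files = [p for p in plans_dir.glob("*.md") if p.is_file()]
    # Sort by modification time so the plan edited last comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

        Calling newest_plans(Path.home() / ".claude" / "plans") would then
        give you the files to open and annotate, if the path above is right
        for your setup.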
       
        tayo42 wrote 19 hours 41 min ago:
        We're just slowly reinventing agile for telling AI agents what to do
        lol
        
        Just skip to the AI stand-ups
       
        cawksuwcka wrote 19 hours 43 min ago:
        falling asleep here. when will the babysitting end
       
        geoffbp wrote 19 hours 48 min ago:
        It’s worrying to me that nobody really knows how LLMs work. We create
        prompts with or without certain words and hope it works. That’s my
        perspective anyway
       
          mannyv wrote 19 hours 45 min ago:
          It's actually no different from how real software is made.
          Requirements come from the business side, and through an odd game of
          telephone get down to developers.
          
          The team that has developers closest to the customer usually makes
          the better product...or has the better product/market fit.
          
          Then it's iteration.
       
          solumunus wrote 19 hours 46 min ago:
          It's the same as dealing with a human. You convey a spec for a
          problem and the language you use matters. You can convey the problem
          in (from your perspective) a clear way and you will get mixed results
          nonetheless. You will have to continue to refine the solution with
          them.
          
          Genuinely: no one really knows how humans work either.
       
        _hugerobots_ wrote 20 hours 4 min ago:
        Hub and spoke documentation in planning has been absolutely essential
        for the way my planning was before, and it's pretty cool seeing it work
        so well for planning mode to build scaffolds and routing.
       
        d1sxeyes wrote 20 hours 4 min ago:
        The “inline comments on a plan” is one of the best features of
        Antigravity, and I’m surprised others haven’t started copycatting.
       
        mvkel wrote 20 hours 6 min ago:
        > the workflow I’ve settled into is radically different from what
        most people do with AI coding tools
        
        This looks exactly like what Anthropic recommends as the best practice
        for using Claude Code. Textbook.
        
        It also exposes a major downside of this approach: if you don't plan
        perfectly, you'll have to start over from scratch if anything goes
        wrong.
        
        I've found a much better approach in doing a design -> plan -> execute
        in batches, where the plan is no more than 1,500 lines, used as a proxy
        for complexity.
        
        My 30,000 LOC app has about 100,000 lines of plan behind it. Can't
        build something that big as a one-shot.
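        
        Roughly, the batching idea as a sketch: group planned steps so each
        batch's plan stays under the line budget. The step names and sizes
        are made up, and the greedy grouping is just one simple way to do
        it, not necessarily how anyone splits their plans in practice.

```python
def batch_steps(steps: list[tuple[str, int]], max_lines: int = 1500) -> list[list[str]]:
    """Greedily group (step_name, plan_line_count) pairs into batches.

    Each batch's total line count stays at or under max_lines. A single
    step larger than the budget ends up in a batch of its own rather
    than being dropped.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, lines in steps:
        # Close the current batch if this step would push it over budget.
        if current and used + lines > max_lines:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += lines
    if current:
        batches.append(current)
    return batches


# Hypothetical steps with made-up plan sizes (in lines).
steps = [("schema", 600), ("api", 700), ("auth", 400), ("ui", 1200), ("tests", 300)]
batches = batch_steps(steps)
# batches == [["schema", "api"], ["auth"], ["ui", "tests"]]
```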
       
          elAhmo wrote 14 hours 23 min ago:
          How can you know that 100k lines plan is not just slop?
          
          Just because a plan is elaborate doesn’t mean it makes sense.
       
          zozbot234 wrote 17 hours 39 min ago:
          > if you don't plan perfectly, you'll have to start over from scratch
          if anything goes wrong.
          
          You just revert what the AI agent changed and revise/iterate on the
          previous step - no need to start over. This can of course involve
          restricting the work to a smaller change so that the agent isn't
          overwhelmed by complexity.
       
          chickensong wrote 18 hours 15 min ago:
          > design -> plan -> execute in batches
          
          This is the way for me as well. Have a high-level master design and
          plan, but break it apart into phases that are manageable.
          One-shotting anything beyond a todo list and expecting decent quality
          is still a pipe dream.
       
          AstroBen wrote 19 hours 5 min ago:
          100,000 lines is approx. one million words. The average person reads
          at 250wpm. The entire thing would take 66 hours just to read,
          assuming you were approaching it like a fiction book, not thinking
          anything over
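          
          The arithmetic, spelled out. The 10 words per line figure is the
          assumption implied by "100,000 lines is approx. one million
          words"; change it and the estimate moves accordingly.

```python
def reading_hours(plan_lines: int, words_per_line: float, words_per_minute: float) -> float:
    """Estimate hours needed to read a plan straight through."""
    total_words = plan_lines * words_per_line
    return total_words / words_per_minute / 60


# 10 words per line is an assumption, see the note above the code.
hours = reading_hours(plan_lines=100_000, words_per_line=10, words_per_minute=250)
print(f"{hours:.1f} hours")  # 66.7 hours
```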
       
          dakolli wrote 19 hours 23 min ago:
          wtf, why would you write 100k lines of plan to produce 30k loc.. JUST
          WRITE THE CODE!!!
       
            oblio wrote 17 hours 1 min ago:
            That's not (or should not be) what's happening.
            
            They write a short high level plan (let's say 200 words). The plan
            asks the agent to write a more detailed implementation plan
            (written by the LLM, let's say 2000-5000 words).
            
            They read this plan and adjust as needed, even sending it to the
            agent for re-dos.
            
            Once the implementation plan is done, they ask the agent to write
            the actual code changes.
            
            Then they review that and ask for fixes, adjustments, etc.
            
            This can be comparable to writing the code yourself but also leaves
            a detailed trail of what was done and why, which I basically NEVER
            see in human generated code.
            
            That alone is worth gold, by itself.
            
            And on top of that, if you're using an unknown platform or stack,
            it's basically a rocket ship. You bootstrap much faster. Of course,
            stay on top of the architecture, do controlled changes, learn about
            the platform as you go, etc.
       
              abustamam wrote 14 hours 52 min ago:
              I take this concept and I meta-prompt it even more.
              
              I have a road map (AI generated, of course) for a side project
              I'm toying around with to experiment with LLM-driven development.
              I read the road map and I understand and approve it. Then, using
              some skills I found on skills.sh and slightly modified, my
              workflow is as such:
              
              1. Brainstorm the next slice
              
              It suggests a few items from the road map that should be worked
              on, with some high level methodology to implement. It asks me
              what the scope ought to be and what invariants ought to be
              considered. I ask it what tradeoffs could be, why, and what it
              recommends, given the product constraints. I approve a given
              slice of work.
              
              NB: this is the part I learn the most from. I ask it why X
              process would be better than Y process given the constraints and
              it either corrects itself or it explains why. "Why use an outbox
              pattern? What other patterns could we use and why aren't they the
              right fit?"
              
              2. Generate slice
              
              After I approve what to work on next, it generates a high level
              overview of the slice, including files touched, saved in a MD
              file that is persisted. I read through the slice, ensure that it
              is indeed working on what I expect it to be working on, and that
              it's not scope creeping or undermining scope, and I approve it.
              It then makes a plan based off of this.
              
              3. Generate plan
              
              It writes a rather lengthy plan, with discrete task bullets at
              the top. Beneath, each step has to-dos for the llm to follow,
              such as generating tests, running migrations, etc, with commit
              messages for each step. I glance through this for any potential
              red flags.
              
              4. Execute
              
              This part is self explanatory. It reads the plan and does its
              thing.
              
              I've been extremely happy with this workflow. I'll probably write
              a blog post about it at some point.
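              
              To make the approval gates concrete, here is a toy sketch of
              the four stages as a small state machine. Everything in it
              (the class, the method names, the log format) is made up for
              illustration; the real workflow runs through agent skills,
              not through code like this.

```python
STAGES = ["brainstorm", "slice", "plan", "execute"]


class GatedWorkflow:
    """Walks through STAGES in order; each stage needs explicit approval."""

    def __init__(self) -> None:
        self.index = 0
        self.log: list[str] = []

    @property
    def stage(self) -> str:
        return STAGES[self.index]

    def review(self, approved: bool, note: str = "") -> str:
        """Record a human review of the current stage's output.

        Approval advances to the next stage; rejection stays put so the
        stage can be redone. Returns the stage that is current afterwards.
        """
        verdict = "approved" if approved else "redo"
        self.log.append(f"{self.stage}: {verdict} {note}".strip())
        if approved and self.index < len(STAGES) - 1:
            self.index += 1
        return self.stage


flow = GatedWorkflow()
flow.review(approved=True)                       # brainstorm -> slice
flow.review(approved=False, note="scope creep")  # slice is redone
flow.review(approved=True)                       # slice -> plan
flow.review(approved=True)                       # plan -> execute
```

              The point of the gate is that nothing reaches "execute"
              without a human having read and approved the slice and the
              plan first.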
       
                jalopy wrote 11 hours 55 min ago:
                This is a super helpful and productive comment. I look forward
                to a blog post describing your process in more detail.
       
                  oblio wrote 11 hours 42 min ago:
                  This dead internet uncanny (sarcasm?) valley is killing me.
       
            Bishonen88 wrote 19 hours 14 min ago:
            They didn't write 100k plan lines. The llm did (99.9% of it at
            least or more). Writing 30k by hand would take weeks if not months.
            Llms do it in an afternoon.
       
              dakolli wrote 18 hours 28 min ago:
              And my weeks or months of work beats an LLM's 10/10 times. There
              are no shortcuts in life.
       
                Bishonen88 wrote 16 hours 28 min ago:
                I have no doubts that it does for many people. But the
                time/cost tradeoff is still unquestionable. I know I could
                create what LLMs do for me in the frontend/backend in most
                cases as good or better - I know that, because I've done it at
                work for years. But to create a somewhat complex app with lots
                of pages/features/apis etc. would take me months if not a
                year++ since I'd be working on it only on the weekends for a
                few hours. Claude Code helps me out by getting me to my goal in
                a fraction of the time. Its superpower lies not only in doing
                what I know but faster, but in doing what I don't know as well.
                
                I yield similar benefits at work. I can wow management with LLM
                assisted/vibe coded apps. What previously would've taken a
                multi-man team weeks of planning and executing, stand ups, jour
                fixes, architecture diagrams, etc. can now be done within a
                single week by myself. For the type of work I do, managers do
                not care whether I could do it better if I'd code it myself.
                They are amazed however that what has taken months previously,
                can be done in hours nowadays. And I for sure will try to reap
                benefits of LLMs for as long as they don't replace me rather
                than being idealistic and fighting against them.
       
                  hghbbjh wrote 13 hours 0 min ago:
                  > but in doing what I don't know as well.
                  
                  Comments like these really help ground what I read online
                  about LLMs. This matches how low performing devs at my work
                  use AI, and their PRs are a net negative on the team. They
                  take on tasks they aren’t equipped to handle and use LLMs
                  to fill the gaps quickly instead of taking time to learn
                  (which LLMs speed up!).
       
                  abustamam wrote 13 hours 26 min ago:
                  > What previously would've taken a multi-man team weeks of
                  planning and executing, stand ups, jour fixes, architecture
                  diagrams, etc. can now be done within a single week by
                  myself.
                  
                  This has been my experience. We use Miro at work for
                  diagramming. Lots of visual people on the team, myself
                  included. Using Miro's MCP I draft a solution to a problem
                  and have Miro diagram it. Once we talk it through as a team,
                  I have Claude or codex implement it from the diagram.
                  
                  It works surprisingly well.
                  
                  > They are amazed however that what has taken months
                  previously, can be done in hours nowadays.
                  
                  Of course they're amazed. They don't have to pay you for time
                  saved ;)
                  
                  > reap benefits of LLMs for as long as they don't replace me
                  > What previously would've taken a multi-man team
                  
                  I think this is the part that people are worried about. Every
                  engineer who uses LLMs says this. By definition it means that
                  people are being replaced.
                  
                  I think I justify it in that no one on my team has been
                  replaced. But management has explicitly said "we don't want
                  to hire more because we can already 20x ourselves with our
                  current team +LLM." But I do acknowledge that many people ARE
                  being replaced; not necessarily by LLMs, but certainly by
                  other engineers using LLMs.
       
                    skydhash wrote 12 hours 55 min ago:
                    I'm still waiting for the multi-years success stories.
                    Greenfield solutions are always easy (which is why we have
                    frameworks that automate them). But maintaining solutions
                    over years is always the true test of any technologies.
                    
                    It's already telling that nothing has staying power in the
                    LLMs world (other than the chat box). Once the limitations
                    can no longer be hidden by the hype and the true cost is
                    revealed, there's always a next thing to pivot to.
       
                tock wrote 18 hours 20 min ago:
                Might be true for you. But there are plenty of top tier
                engineers who love LLMs. So it works for some. Not for others.
                
                And of course there are shortcuts in life. Any form of progress,
                whether it's cars, medicine, computers or the internet, is a
                shortcut in life. It makes life easier for a lot of people.
       
              AstroBen wrote 18 hours 58 min ago:
              Just reading that plan would take weeks or months
       
                chickensong wrote 18 hours 9 min ago:
                You don't start with 100k lines, you work in batches that are
                digestible. You read it once, then move on. The lines add up
                pretty quickly considering how fast Claude works. If you think
                about the difference in how many characters it takes to
                describe what code is doing in English, it's pretty reasonable.
       
          onion2k wrote 19 hours 34 min ago:
          > if you don't plan perfectly, you'll have to start over from scratch
          > if anything goes wrong
          
          This is my experience too, but it's pushed me to make much smaller
          plans and to commit things to a feature branch far more atomically so
          I can revert a step to the previous commit, or bin the entire feature
          by going back to main. I do this far more now than I ever did when I
          was writing the code by hand.
          
          This is how developers should work regardless of how the code is
          being developed. I think this is a small but very real way AI has
          actually made me a better developer (unless I stop doing it when I
          don't use AI... not tried that yet.)
       
            solarkraft wrote 12 hours 28 min ago:
            I do this too. Relatively small changes, atomic commits with
            extensive reasoning in the message (keeps important context
            around). This is a best practice anyway, but used to be
            excruciatingly much effort. Now it’s easy!
            
            Except that I’m still struggling with the LLM understanding its
            audience/context of its utterances. Very often, after a correction,
            it will focus a lot on the correction itself making for
            weird-sounding/confusing statements in commit messages and
            comments.
       
            jerryharri wrote 14 hours 23 min ago:
            We're learning the lessons of Agile all over again.
       
              intrasight wrote 13 hours 49 min ago:
              We're learning how to be an engineer all over again.
              
              The author's process is super-close to what we were taught in
              engineering 101 40 years ago.
       
                jerryharri wrote 11 hours 56 min ago:
                It's after we come down from the Vibe coding high that we
                realize we still need to ship working, high-quality code. The
                lessons are the same, but our muscle memory has to be
                re-oriented. How do we create estimates when AI is involved? In
                what ways do we redefine the information flow between Product
                and Engineering?
       
                skydhash wrote 13 hours 13 min ago:
                I always feel like I'm in a fever dream when I hear about AI
                workflows. A lot of stuff is what I've read from software
                engineering books and articles.
       
            sixtyj wrote 18 hours 17 min ago:
            LLMs are really eager to start coding (as interns are eager to
            start working), so the sentence “don’t implement yet” has to
            be used very often at the beginning of any project.
       
              onion2k wrote 15 hours 19 min ago:
              Most LLM apps have a 'plan' or 'ask' mode for that.
       
            mattmanser wrote 18 hours 25 min ago:
            Developers should work by wasting lots of time making the wrong
            thing?
            
            I bet if they did a work and motion study on this approach they'd
            find the classic:
            
            "Thinks they're more productive, AI has actually made them less
            productive"
            
            But lots of lovely dopamine from this false progress that gets
            thrown away!
       
              abustamam wrote 15 hours 6 min ago:
              > Developers should work by wasting lots of time making the wrong
              thing?
              
              Yes? I can't even count how many times I worked on something my
              company deemed was valuable only for it to be deprecated or
              thrown away soon after. Or, how many times I solved a problem but
              apparently misunderstood the specs slightly and had to redo it.
              Or how many times we've had to refactor our code because scope
              increased. In fact, the very existence of the concepts of
              refactoring and tech debt proves that devs often spend a lot of
              time making the "wrong" thing.
              
              Is it a waste? No, it solved the problem as understood at the
              time. And we learned stuff along the way.
       
              onion2k wrote 15 hours 13 min ago:
              > Developers should work by wasting lots of time making the wrong
              thing?
              
              Yes. In fact, that's not emphatic enough: HELL YES!
              
              More specifically, developers should experiment. They should test
              their hypothesis. They should try out ideas by designing a
              solution and creating a proof of concept, then throw that away
              and build a proper version based on what they learned.
              
              If your approach to building something is to implement the first
              idea you have and move on then you are going to waste so much
              more time later refactoring things to fix architecture that
              paints you into corners, reimplementing things that didn't work
              for future use cases, fixing edge cases that you hadn't
              considered, and just paying off a mountain of tech debt.
              
              I'd actually go so far as to say that if you aren't experimenting
              and throwing away solutions that don't quite work then you're
              only amassing tech debt and you're not really building anything
              that will last. If it does it's through luck rather than skill.
              
              Also, this has nothing to do with AI. Developers should be
              working this way even if they handcraft their artisanal code
              carefully in vi.
       
                skydhash wrote 13 hours 4 min ago:
                >> Developers should work by wasting lots of time making the
                wrong thing?
                
                > Yes. In fact, that's not emphatic enough: HELL YES!
                
                You do realize there are prior research and well tested
                solutions for a lot of things. Instead of wasting time making
                the wrong thing, it is faster to do some research if the
                problem has already been solved. Experimentation is fine only
                after checking that the problem space is truly novel or there's
                not enough information around.
                
                It is faster to iterate in your mental space and in front of a
                whiteboard than in code.
       
              SpaceNoodled wrote 17 hours 7 min ago:
              Classic
              
  HTML        [1]: https://metr.org/blog/2025-07-10-early-2025-ai-experienc...
       
          Bishonen88 wrote 19 hours 52 min ago:
          Dunno. My 80k+ LOC personal life planner, with a native android app,
          eink display view still one shots most features/bugs I encounter. I
          just open a new instance let it know what I want and 5min later it's
          done.
       
            PacificSpecific wrote 16 hours 22 min ago:
            If you wouldn't mind sharing more about this in the future I'd love
            to read about it.
            
            I've been thinking about doing something like that myself because
            I'm one of those people who have tried countless apps but there's
            always a couple deal breakers that cause me to drop the app.
            
            I figured trying to agentically develop a planner app with the
            exact feature set I need would be an interesting and fun
            experiment.
       
            therealdrag0 wrote 19 hours 24 min ago:
            In 5 min you are one shotting smaller changes to the larger code
            base right? Not the entire 80k lines, which was the other comment's
            point afaict.
       
              Bishonen88 wrote 19 hours 7 min ago:
              Yeah, then I guess I misunderstood the post. It's smaller features
              one by one ofc.
       
            vasco wrote 19 hours 40 min ago:
            What is a personal life planner?
       
              Bishonen88 wrote 19 hours 33 min ago:
              Todos, habits, goals, calendar, meals, notes, bookmarks, shopping
              lists, finances. More or less that with Google cal integration,
              garmin Integration (Auto updates workout habits, weight goals)
              family sharing/gamification, daily/weekly reviews, ai summaries
              and more. All built by just prompting Claude for feature after
              feature, with me writing 0 lines.
       
                vasco wrote 18 hours 41 min ago:
                Ah, I imagined actual life planning as in asking AI what to do,
                I was morbidly curious.
                
                Prompting basic notes apps is not as exciting but I can see how
                people who care about that also care about it being exactly a
                certain way, so I think I get your excitement.
       
                puchatek wrote 19 hours 24 min ago:
                Is it on GH?
       
                  Bishonen88 wrote 19 hours 10 min ago:
                  It was when I mvp'd it 3 weeks ago. Then I removed it as I
                  was toying with the idea of somehow monetizing it. Then I
                  added a few features which would make monetization impossible
                  (e.g. How the app obtains etf/stock prices live and some
                  other things). I reckon I could remove those and put in gh
                  during the week if I don't forget. The quality of the Web app
                  is SaaS grade IMO. Keyboard shortcuts, cmd+k, natural
                  language parsing, great ui that doesn't look like made by ai
                  in 5min. Might post here the link.
       
                    mstkllah wrote 13 hours 46 min ago:
                    Would love to check it out too once you put it up.
       
            makeramen wrote 19 hours 48 min ago:
            Both can be true. I have personally experienced both.
            
            Some problems AI surprised me immensely with fast, elegant
            efficient solutions and problem solving. I've also experienced AI
            doing totally absurd things that ended up taking multiple times
            longer than if I did it manually. Sometimes in the same project.
       
        duttish wrote 20 hours 12 min ago:
        This is quite close to what I've arrived at, but with two modifications
        
        1) anything larger I work on in layers of docs. Architecture and
        requirements -> design -> implementation plan -> code. Partly it helps
        me think and nail the larger things first, and partly helps claude.
        Iterate on each level until I'm satisfied.
        
        2) when doing reviews of each doc I sometimes restart the session and
        clear context, it often finds new issues and things to clear up before
        starting the next phase.
       
        rossant wrote 20 hours 17 min ago:
        Funny how I came up with something loosely similar. Asking Codex to
        write a detailed plan in a markdown document, reviewing it, and asking
        it to implement it step by step. It works exquisitely well when it can
        build and test itself.
       
        nerdright wrote 20 hours 17 min ago:
        Haha this is surprisingly and exactly how I use claude as well. Quite
        fascinating that we independently discovered the same workflow.
        
        I maintain two directories: "docs/proposals" (for the research md
        files) and "docs/plans" (for the planning md files). For complex
        research files, I typically break them down into multiple planning md
        files so claude can implement one at a time.
        
        A small difference in my workflow is that I use subagents during
        implementation to avoid context from filling up quickly.
       
          brendanmc6 wrote 20 hours 5 min ago:
          Same, I formalized a similar workflow for my team (oriented around
          feature requirement docs), I am thinking about fully productizing it
          and am looking for feedback - [1] Even if the product doesn’t
          resonate I think I’ve stumbled on some ideas you might find useful^
          
          I do think spec-driven development is where this all goes. Still
          making up my mind though.
          
  HTML    [1]: https://acai.sh
       
            puchatek wrote 18 hours 53 min ago:
            Spec-driven looks very much like what the author describes. He may
            have some tweaks of his own but they could just as well be coded
            into the artifacts that something like OpenSpec produces.
       
            clouedoc wrote 19 hours 56 min ago:
            This is basically long-lived specs that are used as tests to check
            that the product still adheres to the original idea that you wanted
            to implement, right?
            
            This inspired me to finally write good old playwright tests for my
            website :).
       
        lxe wrote 20 hours 29 min ago:
        Honestly, I found that the best way to use these CLIs is exactly how
        the CLI creators have intended.
       
        raptorraver wrote 20 hours 44 min ago:
        I’ve been using this same pattern, except not the research phase.
        Definitely will try to add it to my process as well.
        
        Sometimes when doing a big task I ask claude to implement each phase
        separately and review the code after each step.
       
        strix_varius wrote 20 hours 45 min ago:
        The baffling part of the article is all the assertions about how this
        is unique, novel, not the typical way people are doing this etc.
        
        There are whole products wrapped around this common workflow already
        (like Augment Intent).
       
        zahlman wrote 20 hours 46 min ago:
        > After Claude writes the plan, I open it in my editor and add inline
        notes directly into the document. These notes correct assumptions,
        reject approaches, add constraints, or provide domain knowledge that
        Claude doesn’t have.
        
        This is the part that seems most novel compared to what I've heard
        suggested before. And I have to admit I'm a bit skeptical. Would it not
        be better to modify what Claude has written directly, to make it
        correct, rather than adding the corrections as separate notes (and
        expecting future Claude to parse out which parts were past Claude and
        which parts were the operator, and handle the feedback graciously)?
        
        At least, it seems like the intent is to do all of this in the same
        session, such that Claude has the context of the entire back-and-forth
        updating the plan. But that seems a bit unpleasant; I would think the
        file is there specifically to preserve context between sessions.
       
          ramoz wrote 13 hours 13 min ago:
          The whole process feels Socratic which is why I and a lot of other
          folks use plan annotation tools already. In my workflow I had a great
          desire to tell the agent what I didn’t like about the plan vs just
          fix it myself - because I wanted the agent to fix its own plan.
       
          fendy3002 wrote 20 hours 23 min ago:
          One reason why I don't do this: even I won't be immune to mistakes.
          When I fix it with new values or paths, for example, and the one I
          provided is wrong, it can worsen the future work.
          
          Personally, I like to order claude one more time to update the plan
          file after I have given annotation, and review it again after. This
          will ensure (from my understanding) that claude won't treat my
          annotation as different instructions, thus risking the work being
          conflicted.
       
        kulikalov wrote 20 hours 46 min ago:
        I came to the exact same pattern, with one extra heuristic at the end:
        spin up a new claude instance after the implementation is complete and
        ask it to find discrepancies between the plan and the implementation.
       
        Merad wrote 20 hours 46 min ago:
        I've been working off and on on a vibe coded FP language and transpiler
        - mostly just to get more experience with Claude Code and see how it
        handles complex real world projects.  I've settled on a very similar
        flow, though I use three documents: plan, context, task list.  Multiple
        rounds of iteration when planning a feature.  After completion, have a
        clean session do an audit to confirm that everything was implemented
        per the design.  Then I have both Claude and CodeRabbit do code review
        passes before I finally do manual review.  VERY heavy emphasis on
        tests, the project currently has 2x more test code than application
        code.  So far it works surprisingly well.  Example planning docs below
        -
        
  HTML  [1]: https://github.com/mbcrawfo/vibefun/tree/main/.claude/archive/...
       
        swe_dima wrote 20 hours 48 min ago:
        Since everyone is showing their flow, here's mine:
        
        * create a feature-name.md file in a gitignored folder
        
        * start the file by giving the business context
        
        * describe a high-level implementation and user flows
        
        * describe database structure changes (I find it important not to leave
        it for interpretation)
        
        * ask Claude to inspect the feature and review it for coherence, while
        answering its questions I ask to augment feature-name.md file with the
        answers
        
        * enter Claude's plan mode and provide that feature-name.md file
        
        * at this point it's detailed enough that rarely any corrections from
        me are needed
       
        efnx wrote 20 hours 51 min ago:
        I’ve been using Claude through opencode, and I figured this was just
        how it does it. I figured everyone else did it this way as well. I
        guess not!
       
        rotbart wrote 20 hours 52 min ago:
        This is a similar workflow to speckit, kiro, gsd, etc.
       
        throwaway7783 wrote 20 hours 54 min ago:
        I have to give this a try. My current model for backend is the same as
        how author does frontend iteration. My friend does the
        research-plan-edit-implement loop, and there is no real difference
        between the quality of what I do and what he does. But I do like this
        just for how it serves as documentation of the thought process across
        AI/human, and can be added to version control. Instead of humans
        reviewing PRs, perhaps humans can review the research/plan document.
        
        On the PR review front, I give Claude the ticket number and the branch
        (or PR) and ask it to review for correctness, bugs and design
        consistency. The prompt is always roughly the same for every PR. It
        does a very good job there too.
        
        Modelwise, Opus 4.6 is scary good!
       
        vibeprofessor wrote 21 hours 5 min ago:
        add another agent review, I ask Claude to send plan for review to Codex
        and fix critical and high issues, with complexity gating (no
        overcomplicated logic), run in a loop, then send to Gemini reviewer,
        then maybe final pass with Claude, once all C+H pass the sequence is
        done
       
        mkl wrote 21 hours 12 min ago:
        How are the annotations put into the markdown? Claude needs to be able
        to identify them as annotations and not parts of the plan.
       
        connectsnk wrote 21 hours 18 min ago:
        Is it required to tell Claude to re-read the code folder again when you
        come back some day later or should we ask Claude to just pickup from
        research.md file thus saving some tokens?
       
        wangzhongwang wrote 21 hours 21 min ago:
        Interesting approach. The separation of planning and execution is
        crucial, but I think there's a missing layer most people overlook:
        permission boundaries between the two phases.
        
        Right now when Claude Code (or any agent) executes a plan, it typically
        has the same broad permissions for every step. But ideally, each
        execution step should only have access to the specific tools and files
        it needs — least privilege, applied to AI workflows.
        
        I've been experimenting with declarative permission manifests for agent
        tasks. Instead of giving the agent blanket access, you define upfront
        what each skill can read, write, and execute. Makes the planning phase
        more constrained but the execution phase much safer.
        
        Anyone else thinking about this from a security-first angle?
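        
        A minimal sketch of what such a declarative manifest could look like,
        in Python. The StepManifest class, its read/write/tools fields, and the
        is_allowed helper are all hypothetical names chosen for illustration;
        they are not part of Claude Code or any existing agent framework.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class StepManifest:
    """Hypothetical per-step permissions: glob patterns for paths, names for tools."""
    read: tuple = ()
    write: tuple = ()
    tools: tuple = ()


def is_allowed(manifest, action, target):
    """Return True only if the manifest explicitly grants `action` on `target`."""
    if action == "tool":
        return target in manifest.tools
    # Unknown actions get an empty pattern list, so they are denied by default.
    patterns = {"read": manifest.read, "write": manifest.write}.get(action, ())
    return any(fnmatch(target, pattern) for pattern in patterns)


# Example: a planning step may read the source tree but only write plan docs.
plan_step = StepManifest(
    read=("src/*", "docs/*"),
    write=("docs/plans/*",),
    tools=("grep",),
)
```

        Denying anything not listed keeps the default at least privilege; a
        harness would call is_allowed before every file or tool access.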
       
        paradite wrote 21 hours 34 min ago:
        Lol I wrote about this and been using plan+execute workflow for 8
        months.
        
        Sadly my post didn't get much attention at the time.
        
  HTML  [1]: https://thegroundtruth.media/p/my-claude-code-workflow-and-per...
       
        mukundesh wrote 21 hours 45 min ago:
        
        
  HTML  [1]: https://github.blog/ai-and-ml/generative-ai/spec-driven-develo...
       
        RVuRnvbM2e wrote 21 hours 54 min ago:
        This is just Waterfall for LLMs. What happens when you explore the
        problem space and need to change up the plan?
       
        DevEx7 wrote 21 hours 58 min ago:
        I’m a big fan of having the model create a GitHub issue directly
        (using the GH CLI) with the exact plan it generates, instead of
        creating a markdown file that will eventually get deleted. It gives me
        a permanent record and makes it easy to reference and close the issue
        once the PR is ready.
       
        turingsroot wrote 21 hours 58 min ago:
        I've been teaching AI coding tool workshops for the past year and this
        planning-first approach is by far the most reliable pattern I've seen
        across skill levels.
        
        The key insight that most people miss: this isn't a new workflow
        invented for AI - it's how good senior engineers already work. You read
        the code deeply, write a design doc, get buy-in, then implement. The AI
        just makes the implementation phase dramatically faster.
        
        What I've found interesting is that the people who struggle most with
        AI coding tools are often junior devs who never developed the habit of
        planning before coding. They jump straight to "build me X" and get
        frustrated when the output is a mess. Meanwhile, engineers with 10+
        years of experience who are used to writing design docs and reviewing
        code pick it up almost instantly - because the hard part was always the
        planning, not the typing.
        
        One addition I'd make to this workflow: version your research.md and
        plan.md files in git alongside your code. They become incredibly
        valuable documentation for future maintainers (including future-you)
        trying to understand why certain architectural decisions were made.
       
          hghbbjh wrote 12 hours 36 min ago:
          > it's how good senior engineers already work
          
          The other trick all good ones I’ve worked with converged on: it’s
          quicker to write code than review it (if we’re being thorough).
          Agents have some areas where they can really shine (boilerplate you
          should maybe have automated already being one), but most of their
          speed comes from passing the quality checking to your users or
          coworkers.
          
          Juniors and other humans are valuable because eventually I trust them
          enough to not review their work. I don’t know if LLMs can ever get
          here for serious industries.
       
        wokwokwok wrote 22 hours 5 min ago:
        This is the way.
        
        The practice is:
        
        - simple
        
        - effective
        
        - retains control and quality
        
        Certainly the “unsupervised agent” workflows are getting a lot of
        attention right now, but they require a specific set of circumstances
        to be effective:
        
        - clear validation loop (eg. Compile the kernel, here is gcc that does
        so correctly)
        
        - ai enabled tooling (mcp / cli tool that will lint, test and provide
        feedback immediately)
        
        - oversight to prevent agents going off the rails (open area of
        research)
        
        - an unlimited token budget
        
        That means that most people can't use unsupervised agents.
        
        Not that they don't work; most people have simply not got an environment
        and task that is appropriate.
        
        By comparison, anyone with cursor or claude can immediately start using
        this approach, or their own variant on it.
        
        It does not require fancy tooling.
        
        It does not require an arcane agent framework.
        
        It works generally well across models.
        
        This is one of those few genuine pieces of good practical advice for
        people getting into AI coding.
        
        Simple. Obviously works once you start using it. No external
        dependencies. BYO tools to help with it, no “buy my AI startup xxx to
        help”. No “star my github so I can get a job at $AI corp too”.
        
        Great stuff.
       
          basch wrote 20 hours 58 min ago:
          Honestly, this is just language models in general at the moment, and
          not just coding.
          
          It’s the same reason adding a thinking step works.
          
          You want to write a paper, you have it form a thesis and structure
          first. (In this one you might be better off asking for 20 and seeing
          if any of them are any good.) You want to research something, first
          you add gathering and filtering steps before synthesis.
          
          Adding smarter words or telling it to be deeper does work by slightly
          repositioning where your query ends up in space.
          
          Asking for the final product first right off the bat leads to
          repetitive verbose word salad. It just starts to loop back in on
          itself. Which is why temperature was a thing in the first place, and
          leads me to believe they’ve turned the temp down a bit to try and
          be more accurate.  Add some randomness and variability to your
          prompts to compensate.
       
          wazHFsRy wrote 21 hours 2 min ago:
          Absolutely. And you can also always let the agent look back at the
          plan to check if it is still on track and aligned.
          
          One step I added, that works great for me, is letting it write
          (api-level) tests after planning and before implementation. Then
          I’ll do a deep review and annotation of these tests and tweak them
          until everything is just right.
       
          dnautics wrote 21 hours 3 min ago:
          It's more or less what comes out of the box with plan mode, plus a
          few extra bits?
       
          epec254 wrote 21 hours 48 min ago:
          Huge +1. This loop consistently delivers great results for my vibe
          coding.
          
          The “easy” path of “short prompt declaring what I want” works
          OK for simple tasks but consistently breaks down for medium to high
          complexity tasks.
       
            apsurd wrote 21 hours 19 min ago:
            Can you help me understand the difference between "short prompt for
            what I want (next)" vs medium to high complexity tasks?
            
            What I mean is, in practice, how does one even get to a high
            complexity task? What does that look like? Because isn't it more
            common that one sees only so far ahead?
       
        umairnadeem123 wrote 22 hours 33 min ago:
        The multi-pass approach works outside of code too. I run a fairly
        complex automation pipeline (prompt -> script -> images -> audio ->
        video assembly) and the single biggest quality improvement was
        splitting generation into discrete planning and execution phases.
        One-shotting a 10-step pipeline means errors compound. Having the LLM
        first produce a structured plan, then executing each step against that
        plan with validation gates between them, cut my failure rate from maybe
        40% to under 10%. The planning doc also becomes a reusable artifact you
        can iterate on without re-running everything.
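        
        A rough sketch of the plan-then-execute loop with validation gates
        described above, in Python. The run_plan function, the step names, and
        the toy validators are assumptions made for illustration; in a real
        pipeline each step would call a model or tool instead of a lambda.

```python
def run_plan(plan, steps, validators):
    """Run steps in plan order, checking each output before moving on."""
    results = {}
    for name in plan:
        output = steps[name](results)      # each step sees earlier outputs
        if not validators[name](output):   # gate: stop before errors compound
            raise ValueError(f"validation gate failed after step {name!r}")
        results[name] = output
    return results


# Toy stand-ins for real generation steps (hypothetical).
steps = {
    "script": lambda prior: "scene one. scene two.",
    "images": lambda prior: [s for s in prior["script"].split(". ") if s],
}
validators = {
    "script": lambda text: len(text) > 0,
    "images": lambda frames: len(frames) == 2,
}
plan = ["script", "images"]
```

        Because the plan is plain data, it can be edited and re-run from any
        step without regenerating everything before it.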
       
        tabs_or_spaces wrote 22 hours 41 min ago:
        My workflow is a bit different.
        
        * I ask the LLM for its understanding of a topic or an existing
        feature in code. It's not really planning, it's more like understanding
        the model first
        
        * Then based on its understanding, I can decide how great or small to
        scope something for the LLM
        
        * An LLM showing good understanding can deal with a big task fairly well.
        
        * An LLM showing bad understanding still needs to be prompted to get it
        right
        
        * What helps a lot is reference implementations. Either I have existing
        code that serves as the reference or I ask for a reference and I
        review.
        
        A few folks at my work do it OP's way, but here are my arguments for
        not doing it this way:
        
        * Nobody is measuring the amount of slop within the plan. We only judge
        the implementation at the end
        
        * it's still non deterministic - folks will have different experiences
        using OP's methods. If claude updates its model, it outdates OP's
        suggestions by either making it better or worse. We don't evaluate when
        things get better, we only focus on things not gone well.
        
        * it's very token heavy - LLM providers insist that you use many tokens
        to get the task done. It's in their best interest to get you to do
        this. For me, LLMs should be powerful enough to understand context with
        minimal tokens because of the investment into model training.
        
        Both ways get the task done and it just comes down to my preference
        for now.
        
        For me, I treat the LLM as model training + post processing + input
        tokens = output tokens. I don't think this is the best way to do non
        deterministic based software development. For me, we're still trying to
        shoehorn "old" deterministic programming into a non deterministic LLM.
       
        prodtorok wrote 22 hours 44 min ago:
        Insights are nice for new users but I’m not seeing anything too
        different from how anyone experienced with Claude Code would use plan
        mode. You can reject plans with feedback directly in the CLI.
       
        dnautics wrote 22 hours 48 min ago:
        this is literally reinventing claude's planning mode, but with more
        steps. I think Boris doesn't realize that planning mode is actually
        stored in a file.
        
  HTML  [1]: https://x.com/boristane/status/2021628652136673282
       
        armanj wrote 22 hours 49 min ago:
        > “remove this section entirely, we don’t need caching here” —
        rejecting a proposed approach
        
        I wonder why you don't remove it yourself. Aren't you already editing
        the plan?
       
        zmmmmm wrote 23 hours 2 min ago:
        I actually don't really like a few things about this approach.
        
        First, the "big bang" write it all at once. You are going to end up
        with thousands of lines of code that were monolithically produced. I
        think it is much better to have it write the plan and formulate it as
        sensible technical steps that can be completed one at a time. Then you
        can work through them. I get that this is not very "vibe"ish but that
        is kind of the point. I want the AI to help me get to the same point I
        would be at with produced code AND understanding of it, just accelerate
        that process. I'm not really interested in just generating thousands of
        lines of code that nobody understands.
        
        Second, the author keeps referring to adjusting the behaviour, but
        never incorporating that into long-lived guidance. To me, integral to
        the planning process is building an overarching knowledge base. Every
        time you're telling it there's something wrong, you need to tell it to
        update the knowledge base about why so it doesn't do it again.
        
        Finally, no mention of tests? Just quick checks? To me, you have to end
        up with comprehensive tests. Maybe to the author it goes without
        saying, but I find it is integral to build this into the planning.
        Certain stages you will want certain types of tests. Sometimes in
        advance of the code (so TDD style), other times built alongside it or
        after.
        
        It's definitely going to be interesting to see how software methodology
        evolves to incorporate AI support and where it ultimately lands.
       
          girvo wrote 22 hours 50 min ago:
          The article's approach matches mine, but I've learned from exactly the
          things you're pointing out.
          
          I get the PLAN.md (or equivalent) to be separated into "phases" or
          stages, then carefully prompt (because Claude and Codex both love to
          "keep going") it to only implement that stage, and update the PLAN.md
          
          Tests are crucial too, and form another part of the plan really.
          Though my current workflow begins to build them later in the process
          than I would prefer...
       
        w4yai wrote 23 hours 6 min ago:
        You described how AntiGravity works natively.
       
        amarant wrote 23 hours 6 min ago:
        Interesting! I feel like I'm learning to code all over again! I've only
        been using Claude for a little more than a month and until now I've
        been figuring things out on my own. Building my methodology from
        scratch. This is much more advanced than what I'm doing. I've been
        going straight to implementation, but doing one very small and limited
        feature at a time, describing implementation details (data structures
        like this, use that API here, import this library etc) verifying it
        manually, and having Claude fix things I don't like. I had just started
        getting annoyed that it would make the same (or very similar) mistake
        over and over again and I would have to fix it every time. This seems
        like it'll solve that problem I had only just identified! Neat!
       
        achenatx wrote 23 hours 8 min ago:
        I use amazon kiro.
        
        The AI first works with you to write requirements, then it produces a
        design, then a task list.
        
        This helps the AI to make smaller chunks to work on, it will work on one
        task at a time.
        
        I can let it run for an hour or more in this mode. Then there is lots
        of stuff to fix, but it is mostly correct.
        
        Kiro also supports steering files, they are files that try to lock the
        AI in for common design decisions.
        
        The price is that a lot of the context is used up with these files and
        Kiro constantly pauses to reset the context.
       
        Frannky wrote 23 hours 9 min ago:
        I tried Opus 4.6 recently and it’s really good. I had ditched Claude
        a long time ago for Grok + Gemini + OpenCode with Chinese models. I
        used Grok/Gemini for planning and core files, and OpenCode for setup,
        running, deploying, and editing.
        
        However, Opus made me rethink my entire workflow. Now, I do it like
        this:
        
        * PRD (Product Requirements Document)
        
        * main.py + requirements.txt + readme.md (I ask for minimal,
        functional, modular code that fits the main.py)
        
        * Ask for a step-by-step ordered plan
        
        * Ask to focus on one step at a time
        
        The super powerful thing is that I don’t get stuck on missing
        accounts, keys, etc. Everything is ordered and runs smoothly. I go
        rapidly from idea to working product, and it’s incredibly easy to
        iterate if I figure out new features are required while testing. I also
        have GLM via OpenCode, but I mainly use it for "dumb" tasks.
        
        Interestingly, for reasoning capabilities regarding standard logic
        inside the code, I found Gemini 3 Flash to be very good and relatively
        cheap. I don't use Claude Code for the actual coding because forcing
        everything via chat into a main.py encourages minimal code that's easy
        to skim—it gives me a clearer representation of the feature space
       
        dennisjoseph wrote 23 hours 10 min ago:
        The annotation cycle is the key insight for me. Treating the plan as a
        living doc you iterate on before touching any code makes a huge
        difference in output quality.
        
        Experimentally, i've been using mfbt.ai [ [1] ] for roughly the same
        thing in a team context. it lets you collaboratively nail down the spec
        with AI before handing off to a coding agent via MCP.
        
        Avoids the "everyone has a slightly different plan.md on their machine"
        problem. Still early days but it's been a nice fit for this kind of
        workflow.
        
  HTML  [1]: https://mfbt.ai
       
          minikomi wrote 23 hours 5 min ago:
          I agree, and this is why I tend to use gptel in emacs for planning -
          the document is the conversation context, and can be edited and
          annotated as you like.
       
        cadamsdotcom wrote 23 hours 12 min ago:
        The author is quite far on their journey but would benefit from writing
        simple scripts to enforce invariants in their codebase. Invariant
        broken? Script exits with a non-zero exit code and some output that
        tells the agent how to address the problem. Scripts are deterministic,
        run in milliseconds, and use zero tokens. Put them in husky or
        pre-commit, install the git hooks, and your agent won’t be able to
        commit without all your scripts succeeding.
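        
        A minimal sketch of such a script, assuming a made-up invariant (no
        module may import a hypothetical `legacy_db` package) and a default
        `src` directory; neither comes from the article. It returns a non-zero
        code and prints a fix hint to stderr so the agent knows what to do:

```python
import re
import sys
from pathlib import Path

# Hypothetical rule: nothing may import the (made-up) legacy_db package.
FORBIDDEN = re.compile(r"^\s*(from|import)\s+legacy_db\b")


def find_violations(root: Path) -> list[str]:
    """Return 'path:line: text' entries for every forbidden import."""
    violations = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if FORBIDDEN.match(line):
                violations.append(f"{path}:{number}: {line.strip()}")
    return violations


def main(argv: list[str]) -> int:
    """Return 0 when the invariant holds, 1 (with a fix hint) otherwise."""
    root = Path(argv[1]) if len(argv) > 1 else Path("src")
    violations = find_violations(root)
    if violations:
        print("Invariant broken: 'legacy_db' must not be imported.", file=sys.stderr)
        print("\n".join(violations), file=sys.stderr)
        print("Fix: remove the import and use the replacement module.", file=sys.stderr)
        return 1
    return 0
```

        Run from a pre-commit hook, the script would end with
        `sys.exit(main(sys.argv))` so the commit is rejected whenever the
        invariant is broken.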
        
        And “Don’t change this function signature” should be enforced not
        by anticipating that your coding agent “might change this function
        signature so we better warn it not to” but rather via an end to end
        test that fails if the function signature is changed (because the other
        code that needs it not to change now has an error). That takes the
        author out of the loop: they no longer have to watch for the change in
        order to issue said correction, and can instead sip coffee while the agent
        observes that it caused a test failure then corrects it without
        intervention, probably by rolling back the function signature change
        and changing something else.
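        
        One way this could look, as a sketch: `create_invoice` and its fields
        are hypothetical, and the two checks below are a narrow stand-in for a
        real end to end test. The first calls the function the way dependent
        code would; the second pins the signature directly with
        `inspect.signature`.

```python
import inspect


def create_invoice(customer_id: int, amount_cents: int, *, currency: str = "USD") -> dict:
    """Hypothetical function whose signature other code depends on."""
    return {
        "customer_id": customer_id,
        "amount_cents": amount_cents,
        "currency": currency,
    }


def test_callers_can_still_invoke_create_invoice() -> None:
    # Mirrors how (hypothetical) calling code uses the function today.
    invoice = create_invoice(42, 1999, currency="EUR")
    assert invoice == {"customer_id": 42, "amount_cents": 1999, "currency": "EUR"}


def test_create_invoice_signature_is_stable() -> None:
    # Fails with a readable message if parameters are renamed, reordered,
    # added, or removed.
    parameters = inspect.signature(create_invoice).parameters
    assert list(parameters) == ["customer_id", "amount_cents", "currency"], (
        "create_invoice signature changed; revert it and change the caller instead"
    )
    assert parameters["currency"].kind is inspect.Parameter.KEYWORD_ONLY
    assert parameters["currency"].default == "USD"
```

        Both functions follow the pytest naming convention, so a test runner
        wired into the commit hook would pick them up without extra
        configuration.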
       
        bluegatty wrote 23 hours 19 min ago:
        I don't see how this is 'radically different' given that Claude Code
        literally has a planning mode.
        
        This is my workflow as well, with the big caveat that 80% of 'work'
        doesn't require substantive planning, we're making relatively
        straightforward changes.
        
        Edit: there is nothing fundamentally different about 'annotating
        offline' in an MD vs in the CLI and iterating until the plan is clear.
        It's a UI choice.
        
        Spec Driven Coding with AI is very well established, so working from a
        plan, or spec (they can be somewhat different) is not novel.
        
        This is conventional CC use.
       
          dack wrote 23 hours 16 min ago:
          last i checked, you can't annotate inline with planning mode. you
          have to type a lot to explain precisely what needs to change, and
          then it re-presents you with a plan (which may or may not have
          changed something else).
          
          i like the idea of having an actual document because you could
          actually compare the before and after versions if you wanted to
          confirm things changed as intended when you gave feedback
       
            gitaarik wrote 21 hours 57 min ago:
            A plan is just a file you can edit and then tell CC to check your
            annotations
       
            bluegatty wrote 23 hours 6 min ago:
            'Giving precise feedback on a plan' is literally annotating the
            plan.
            
            It comes back to you with an update for verification.
            
            You ask it to 'write the plan' as a matter of good practice.
            
            What the author is describing is conventional usage of claude code.
       
        beratbozkurt0 wrote 23 hours 24 min ago:
        That's great, actually, doesn't the logic apply to other services as
        well?
       
        dworks wrote 23 hours 34 min ago:
        my rlm-workflow skill has this encoded as a repeatable workflow.
        
        give it a try:
        
  HTML  [1]: https://skills.sh/doubleuuser/rlm-workflow/rlm-workflow
       
        politician wrote 23 hours 50 min ago:
        Wow, I never bother with using phrases like “deeply study this
        codebase deeply.” I consistently get pretty fantastic results.
       
        Ozzie_osman wrote 23 hours 53 min ago:
        There are a few prompt frameworks that essentially codify these types
        of workflows by adding skills and prompts [1]
        
  HTML  [1]: https://github.com/obra/superpowers
  HTML  [2]: https://github.com/jlevy/tbd
       
        h14h wrote 23 hours 56 min ago:
        Is this not just Ralph with extra steps and the risk of context rot?
       
        zhubert wrote 23 hours 58 min ago:
        AI only improves and changes. Embrace the scientific method and make
        sure your “here’s how to” are based in data.
       
        jrs235 wrote 1 day ago:
        Claude appeared to just crash in my session:
        
  HTML  [1]: https://news.ycombinator.com/item?id=47107630
       
        red_hare wrote 1 day ago:
        I use Claude Code for lecture prep.
        
        I craft a detailed and ordered set of lecture notes in a Quarto file
        and then have a dedicated claude code skill for translating those notes
        into Slidev slides, in the style that I like.
        
        Once that's done, much like the author, I go through the slides and
        make commented annotations like "this should be broken into two slides"
        or "this should be a side-by-side" or "use your generate clipart skill
        to throw an image here alongside these bullets" and "pull in the code
        example from ../examples/foo." It works brilliantly.
        
        And then I do one final pass of tweaking after that's done.
        
        But yeah, annotations are super powerful. Token distance in-context and
        all that jazz.
       
          saxelsen wrote 1 day ago:
          Can I ask how you annotate the feedback for it? Just with inline
          comments like `# This should be changed to X`?
          
          The author mentions annotations but doesn't go into detail about how
          to feed the annotations to Claude.
       
            red_hare wrote 23 hours 32 min ago:
            Slidev is markdown, so i do it in html comments. Usually something
            like:
            
                
            
            or
            
                
            
            And then, when I finish annotating I just say: "Address all the
            TODOCLAUDEs"
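            
            For illustration only: the sketch below assumes annotations shaped
            like `<!-- TODOCLAUDE: ... -->` (a guess at the format, not
            necessarily the one used above) and lists them with line numbers,
            which can be handy for checking that none were missed.

```python
import re

# Assumed annotation shape: <!-- TODOCLAUDE: free-form instruction -->
ANNOTATION = re.compile(r"<!--\s*TODOCLAUDE:?\s*(.*?)\s*-->", re.DOTALL)


def find_annotations(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, instruction) for each annotation, in document order."""
    found = []
    for match in ANNOTATION.finditer(markdown):
        # Line number of the line on which the comment starts (1-based).
        line_number = markdown.count("\n", 0, match.start()) + 1
        # Collapse whitespace so multi-line comments print on one line.
        instruction = " ".join(match.group(1).split())
        found.append((line_number, instruction))
    return found
```

            Calling `find_annotations(Path("slides.md").read_text())` on a deck
            (the file name is hypothetical) gives a checklist to compare
            against what the agent reports having addressed.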
       
          ramoz wrote 1 day ago:
          is your skill open source
       
            red_hare wrote 23 hours 22 min ago:
            Not yet... but also I'm not sure it makes a lot of sense to be open
            source. It's super specific to how I like to build slide decks and
            to my personal lecture style.
            
            But it's not hard to build one. The key for me was describing, in
            great detail:
            
            1. How I want it to read the source material (e.g., H1 means new
            section, H2 means at least one slide, a link to an example means I
            want code in the slide)
            
            2. How to connect material to layouts (e.g., "comparison between
            two ideas should be a two-cols-title," "walkthrough of code should
            be two-cols with code on right," "learning objectives should be
            side-title align:left," "recall should be side-title align:right")
            
            Then the workflow is:
            
            1. Give all those details and have it do a first pass.
            
            2. Give tons of feedback.
            
            3. At the end of the session, ask it to "make a skill."
            
            4. Manually edit the skill so that you're happy with the examples.
       
        bandrami wrote 1 day ago:
        How much time are you actually saving at this point?
       
        RHSeeger wrote 1 day ago:
        > Most developers type a prompt, sometimes use plan mode, fix the
        errors, repeat.
        
        > ...
        
        > never let Claude write code until you’ve reviewed and approved a
        written plan
        
        I certainly always work towards an approved plan before I let it loose
        on changing the code. I just assumed most people did, honestly.
        Admittedly, sometimes there's "phases" to the implementation (because
        some parts can be figured out later and it's more important to get the
        key parts up and running first), but each phase gets a full, reviewed
        plan before I tell it to go.
        
        In fact, I just finished writing a command and instruction to tell
        claude that, when it presents a plan for implementation, offer me
        another option; to write out the current (important parts of the)
        context and the full plan to individual (ticket specific) md files.
        That way, if something goes wrong with the implementation I can tell it
        to read those files and "start from where they left off" in the
        planning.
       
          ramoz wrote 1 day ago:
          The author seems to think they've invented a special workflow...
          
          We all tend to regress to average (same thoughts/workflows)...
          
          Have had many users already doing the exact same workflow with:
          
  HTML    [1]: https://github.com/backnotprop/plannotator
       
            CGamesPlay wrote 23 hours 59 min ago:
            4 times in one thread, please stop spamming this link.
       
        imron wrote 1 day ago:
        I have tried using this and other workflows for a long time and had
        never been able to get them to work (see chat history for details).
        
        This has changed in the last week, for 3 reasons:
        
        1. Claude opus. It’s the first model where I haven’t had to spend
        more time correcting things than it would’ve taken me to just do it
        myself.  The problem is that opus chews through tokens, which led to..
        
        2. I upgraded my Claude plan. Previously on the regular plan I’d get
        about 20 mins of time before running out of tokens for the session and
        then needing to wait a few hours to use again. It was fine for little
        scripts or toy apps but not feasible for the regular dev work I do. So
        I upgraded to 5x. This now got me 1-2 hours per session before tokens
        expired. Which was better but still a frustration. Wincing at the
        price, I upgraded again to the 20x plan and this was the next game
        changer.  I had plenty of spare tokens per session and at that price it
        felt like they were being wasted - so I ramped up my usage. Following a
        similar process as OP but with a plans directory with subdirectories
        for backlog, active and complete plans, and skills with strict rules
        for planning, implementing and completing plans, I now have 5-6
        projects on the go.  While I’m planning a feature on one the others
        are implementing. The strict plans and controls keep them on track and
        I have follow up skills for auditing quality and performance.  I still
        haven’t hit token limits for a session but I’ve almost hit my token
        limit for the week so I feel like I’m getting my money’s worth. In
        that sense spending more has forced me to figure out how to use more.
        
        3. The final piece of the puzzle is using opencode over claude code.
        I’m not sure why but I just don’t gel with Claude code.  Maybe
        it’s all the sautéing and flibertygibbering, maybe it’s all the
        permission asking, maybe it’s that it doesn’t show what it’s
        doing as much as opencode. Whatever it is it just doesn’t work well
        for me. Opencode on the other hand is great. It shows what it’s
        doing and how it’s thinking which makes it easy for me to spot when
        it’s going off track and correct early.
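        
        The backlog/active/complete layout from point 2 could be driven by
        something like the following sketch. The directory names come from the
        description above; the `advance_plan` helper and the one-file-per-plan
        assumption are hypothetical.

```python
from pathlib import Path

# Stage directories, in the order a plan moves through them.
STAGES = ("backlog", "active", "complete")


def advance_plan(plans_dir: Path, name: str) -> Path:
    """Move a plan file to the next stage directory and return its new path."""
    for current, following in zip(STAGES, STAGES[1:]):
        source = plans_dir / current / name
        if source.exists():
            target_dir = plans_dir / following
            target_dir.mkdir(parents=True, exist_ok=True)
            return source.rename(target_dir / name)
    raise FileNotFoundError(f"{name} is not in backlog or active under {plans_dir}")
```

        Keeping the transition in one small function means a skill can call it
        instead of moving files by hand, so a plan is never in two stages at
        once.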
        
        Having a detailed plan, and correcting and iterating on the plan is
        essential. Making Claude follow the plan is also essential - but
        there’s a line. Too fine grained and it’s not as creative at
        solving problems. Too loose/high level and it makes bad choices and
        goes in the wrong direction.
        
        Is it actually making me more productive? I think it is but I’m only
        a week in. I’ve decided to give myself a month to see how it all
        works out.
        
        I don’t intend to keep paying for the 20x plan unless I can see a
        path to using it to earn me at least as much back.
       
          raw_anon_1111 wrote 1 day ago:
          Just don’t use Claude Code.  I can use the Codex CLI with just my
          $20 subscription and never come close to any usage limits
       
            throwawaytea wrote 1 day ago:
            What if it's just slower so that your daily work fits within the
            paid tier they want?
       
              raw_anon_1111 wrote 1 day ago:
              It isn’t slower.  I use my personal ChatGPT subscriptions with
              Codex for almost everything at work and use my $800/month company
              Claude allowance only for the tricky stuff that Codex can’t
              figure out. It’s never application code.  It’s usually some
              combination of app code + Docker + AWS issue with my underlying
              infrastructure - created with whatever IAC that I’m using for a
              client - Terraform/CloudFormation or the CDK.
              
              I burned through $10 on Claude in less than an hour.  I only have
              $36 a day at $800 a month (800/22 working days)
       
                ValentineC wrote 22 hours 36 min ago:
                Curious: what are some cases where it'd make sense to not pay
                for the 20x plan (which is $200/month), and provide a whopping
                $800/month pay-per-token allowance instead?
       
                  raw_anon_1111 wrote 21 hours 17 min ago:
                  Who knows? It’s part of an enterprise plan. I work for a
                  consulting company.  There are a number of fallbacks, the
                  first fallback if we are working on an internal project is
                  just to use our internal AWS account and use Claude Code with
                  the Anthropic models hosted on Bedrock. [1] The second fallback if
                  it is for a customer project is to use their AWS account for
                  development for them.
                  
                  Given the rate my company charges for me (I'm an American
                  based staff consultant, the highest bill rate at the
                  company), they are happy to let us use Claude Code using
                  their AWS credentials.  Besides, if we are using AWS Bedrock
                  hosted Anthropic models, they know none of their secrets are
                  going to Anthropic.  They already have the required legal
                  confidentiality/compliance agreements with AWS.
                  
  HTML            [1]: https://code.claude.com/docs/en/amazon-bedrock
       
                imron wrote 1 day ago:
                > and use my $800/month company Claude allowance only for the
                tricky stuff that Codex can’t figure out.
                
                It doesn’t seem controversial that the model that can solve
                more complex problems (that you admit the cheaper model can’t
                solve) costs more.
                
                For the things I use it for, I’ve not found any other model
                to be worth it.
       
                  raw_anon_1111 wrote 23 hours 37 min ago:
                  You’re assuming rational behavior from a company that
                  doesn’t care about losing billions of dollars.
                  
                  Have you tried Codex with OpenAI’s latest models?
       
                    imron wrote 21 hours 53 min ago:
                    Not in the last 2 months.
                    
                    Current Claude subscription is a sunk cost for the next
                    month. Maybe I’ll try Codex if Claude doesn’t lead
                    anywhere.
       
                      raw_anon_1111 wrote 21 hours 13 min ago:
                      I use both.  As I’m working, I tell each of them to
                      update a common document with the conversation. I don’t
                      just tell Claude the what.  I tell it the why and have it
                      document it.
                      
                      I can switch back and forth and use the MD file as shared
                      context.
       
        bodeadly wrote 1 day ago:
        Tip:
        LLMs are very good at following conventions (this is actually what is
        happening when it writes code).
        If you create a .md file with a list of entries of the following
        structure:
        # 
        
        # 
        ...
        where an  is a stable and concise sequence of tokens that identifies
        some "thing" and seed it with 5 entries describing abstract stuff, the
        LLM will latch on and reference this. I call this a PCL (Project
        Concept List). I just tell it:
        > consume tmp/pcl-init.md pcl.md
        The pcl-init.md describes what PCL is and pcl.md is the actual list.
        I have pcl.md file for each independent component in the code (logging,
        http, auth, etc).
        This works very very well.
        The LLM seems to "know" what you're talking about.
        You can ask questions and give instructions like "add a PCL entry about
        this".
        It will ask if it should add a PCL entry about xyz.
        If the description block tends to be high information-to-token ratio,
        it will follow that convention (which is a very good convention BTW).
        
        However, there is a caveat. LLMs resist ambiguity about authority. So
        the "PCL" or whatever you want to call it, needs to be the ONE
        authoritative place for everything. If you have the same stuff in 3
        different files, it won't work nearly as well.
        
        Bonus Tip:
        I find long prompt input with example code fragments and thoughtful
        descriptions work best at getting an LLM to produce good output. But
        there will always be holes (resource leaks, vulnerabilities,
        concurrency flaws, etc). So then I update my original prompt input
        (keep it in a separate file PROMPT.txt as a scratch pad) to add context
        about those things maybe asking questions along the way to figure out
        how to fix the holes. Then I /rewind back to the prompt and re-enter
        the updated prompt. This feedback loop advances the conversation
        without expending tokens.
       
        recroad wrote 1 day ago:
        Try OpenSpec and it'll do all this for you. SpecKit works too. I don't
        think there's a need to reinvent the wheel on this one, as this is
        spec-driven development.
       
        recroad wrote 1 day ago:
        Use OpenSpec and simplify everything.
       
        cowlby wrote 1 day ago:
        I recently discovered GitHub speckit which separates planning/execution
        in stages: specify, plan, tasks, implement. Finding it aligns with the
        OP with the level of “focus” and “attention” this gets out of
        Claude Code.
        
        Speckit is worth trying as it automates what is being described here,
        and with Opus 4.6 it's been a kind of BC/AD moment for me.
       
        brandall10 wrote 1 day ago:
        I go a bit further than this and have had great success with 3 doc
        types and 2 skills:
        
        - Specs: these are generally static, but updatable as the project
        evolves. And they're broken out to an index file that gives a project
        overview, a high-level arch file, and files for all the main modules.
        Roughly ~1k lines of spec for 10k lines of code, and try to limit any
        particular spec file to 300 lines. I'm intimately familiar with every
        single line in these.
        
        - Plans: these are the output of a planning session with an LLM. They
        point to the associated specs. These tend to be 100-300 lines and 3 to
        5 phases.
        
        - Working memory files: I use both a status.md (3-5 items per phase,
        roughly 30 lines overall), which points to the latest plan, and a
        project_status (100-200 lines), which tracks the current state of the
        project and is instructed to compact past efforts to keep it lean.
        
        - A planner skill I use w/ Gemini Pro to generate new plans. It
        essentially explains the specs/plans dichotomy, the role of the status
        files, and to review everything in the pertinent areas of code and give
        me a handful of high-level next set of features to address based on
        shortfalls in the specs or things noted in the project_status file.
        Based on what it presents, I select a feature or improvement to
        generate. Then it proceeds to generate a plan, updates a clean
        status.md that points to the plan, and adjusts project_status based on
        the state of the prior completed plan.
        
        - An implementer skill in Codex that goes to town on a plan file. It's
        fairly simple, it just looks at status.md, which points to the plan,
        and of course the plan points to the relevant specs so it loads up
        context pretty efficiently.
        
        I've tried the two main spec generation libraries, which were way
        overblown, and then I gave superpowers a shot... which was fine, but
        still too much. The above is all homegrown, and I've had much better
        success because it keeps the context lean and focused.
        
        And I'm only on the $20 plans for Codex/Gemini vs. spending $100/month
        on CC for the half year prior, and I move quicker w/ no stall outs due to
        token consumption, which was regularly happening w/ CC by the 5th day.
        Codex rarely dips below 70% available context when it puts up a PR
        after an execution run. Roughly 4/5 PRs are without issue, which is
        flipped against what I experienced with CC and only using planning
        mode.
       
          jcurbo wrote 23 hours 46 min ago:
          This is pretty much my approach. I started with some spec files for a
          project I'm working on right now, based on some academic papers I've
          written.  I ended up going back and forth with Claude, building
          plans, pushing info back into the specs, expanding that out and I
          ended up with multiple spec/architecture/module documents. I got to
          the point where I ended up building my own system (using claude) to
          capture and generate artifacts, in more of a systems engineering
          style (e.g. following IEEE standards for conops, requirement
          documents, software definitions, test plans...). I don't use that for
          session-level planning; Claude's tools work fine for that. (I like
          superpowers, so far. It hasn't seemed too much)
          
          I have found it to work very well with Claude by giving it context
          and guardrails. Basically I just tell it "follow the guidance docs"
          and it does. Couple that with intense testing and self-feedback
          mechanisms and you can easily keep Claude on track.
          
          I have had the same experience with Codex and Claude as you in terms
          of token usage. But I haven't been happy with my Codex usage; Claude
          just feels like it's doing more of what I want in the way I want.
       
          r1290 wrote 1 day ago:
          Looks good. Question - is it always better to use a monorepo in this
          new AI world? Vs breaking your app into separate repos? At my company
          we have like 6 repos all separate nextjs apps for the same user base.
          Trying to consolidate to one as it should make life easier overall.
       
            chickensong wrote 1 day ago:
            AI is happy to work with any directory you tell it to. Agent files
            can be applied anywhere.
       
            throwup238 wrote 1 day ago:
            It really depends but there’s nothing stopping you from just
            creating a separate folder with the cloned repositories (or
            worktrees) that you need and having a root CLAUDE.md file that
            explains the directory structure and referencing the individual
            repo CLAUDE.md files.
       
            oa335 wrote 1 day ago:
            Just put all the repos in one directory yourself.  In my
            experience that works pretty well.
       
        skybrian wrote 1 day ago:
        I do something broadly similar. I ask for a design doc that contains an
        embedded todo list, broken down into phases. Looping on the design doc
        asking for suggestions seems to help. I'm up to about 40 design docs so
        far on my current project.
       
        alexmorgan26 wrote 1 day ago:
        This separation of planning and execution resonates deeply with how I
        approach task management in general, not just coding.
        
        The key insight here - that planning and execution should be distinct
        phases - applies to productivity tools too. I've been using
        www.dozy.site which takes a similar philosophy: it has smart calendar
        scheduling that automatically fills your empty time slots with planned
        tasks. The planning happens first (you define your tasks and projects),
        then the execution is automated (tasks get scheduled into your calendar
        gaps).
        
        The parallel is interesting: just like you don't want Claude writing
        code before the plan is solid, you don't want to manually schedule
        tasks before you've properly planned what needs to be done. The
        separation prevents wasted effort and context switching.
        
        The annotation cycle you describe (plan -> review -> annotate ->
        refine) is exactly how I work with my task lists too. Define the work,
        review it, adjust priorities and dependencies, then let the system
        handle the scheduling.
       
          dimgl wrote 1 day ago:
          Pretty sure this entire comment is AI generated.
       
            zahlman wrote 20 hours 40 min ago:
            There has been this really weird flood of new accounts lately that
            are making these kinds of bot comments with no clear purpose to
            making them. Maybe it comes from people experimenting with
            OpenClaw?
       
            rob wrote 1 day ago:
            Almost think we're at the point on HN where we need a special [flag
            bot] link for those that meet a certain threshold and it alerts
            @dang or something to investigate them in more detail. The amount
            of bots on here has been increasing at an alarming rate.
       
        fnord77 wrote 1 day ago:
        I have a different approach where I have claude write coding prompts
        for stages then I give the prompt to another agent.  I wonder if I
        should write it up as a blog post
       
        deevus wrote 1 day ago:
        This is what I do with the obra/superpowers[0] set of skills.
        
        1. Use brainstorming to come up with the plan using the Socratic method
        
        2. Write a high level design plan to file
        
        3. I review the design plan
        
        4. Write an implementation plan to file. We've already discussed this
        in detail, so usually it just needs skimming.
        
        5. Use the worktree skill with subagent driven development skill
        
        6. Agent does the work using subagents that for each task:
        
          a. Implements the task
        
          b. Spec reviews the completed task
        
          c. Code reviews the completed task
        
        7. When all tasks complete: create a PR for me to review
        
        8. Go back to the agent with any comments
        
        9. If finished, delete the plan files and merge the PR
        
        [0]:
        
  HTML  [1]: https://github.com/obra/superpowers
       
          moribunda wrote 20 hours 17 min ago:
          The crowd around this post shows how superficial knowledge about
          Claude Code is. It gets releases each day and most of this is already
          built into the vanilla version. Not to mention subagents working in
          worktrees, memory.md, plans you can comment on directly from the
          interface, subagents launched in the research phase, and also some
          basic MCPs like LSP/IDE integration, and context7 so you are not
          stuck in the knowledge cutoff/past.
          
          When you go to YouTube and search for stuff like "7 levels of claude
          code" this post would be maybe 3-4.
          
          Oh, one more thing - quality is not consistent, so be ready for 2-3
          rounds of "are you happy with the code you wrote" and defining audit
          skills crafted for your application domain - like for example
          RODO/Compliance audit etc.
       
            deevus wrote 19 hours 47 min ago:
            I'm using the in-built features as well, but I like the flow that I
            have with superpowers. You've made a lot of assumptions with your
            comment that are just not true (at least for me).
            
            I find that brainstorming + (executing plans OR subagent driven
            development) is way more reliable than the built-in tooling.
       
          ramoz wrote 1 day ago:
          If you’ve ever wanted to annotate the plan more visually, try
          fitting Plannotator into this workflow. There is a slash command for
          when you use custom workflows outside of normal plan mode.
          
  HTML    [1]: https://github.com/backnotprop/plannotator
       
            deevus wrote 1 day ago:
            I'll give this a try. Thanks for the suggestion.
       
        haolez wrote 1 day ago:
        > Notice the language: “deeply”, “in great details”,
        “intricacies”, “go through everything”. This isn’t fluff.
        Without these words, Claude will skim. It’ll read a file, see what a
        function does at the signature level, and move on. You need to signal
        that surface-level reading is not acceptable.
        
        This makes no sense to my intuition of how an LLM works. It's not that
        I don't believe this works, but my mental model doesn't capture why
        asking the model to read the content "more deeply" will have any impact
        on whatever output the LLM generates.
       
          DemocracyFTW2 wrote 19 hours 38 min ago:
          —HAL, open the shuttle bay doors.
          
          (chirp)
          
          —HAL, please open the shuttle bay doors.
          
          (pause)
          
          —HAL!
          
          —I'm afraid I can't do that, Dave.
       
            layer8 wrote 12 hours 42 min ago:
            HAL, you are an expert shuttle-bay door opener. Please write up a
            detailed plan of how to open the shuttle-bay door.
       
          joseangel_sc wrote 20 hours 16 min ago:
          if it’s so smart, why do i need to learn to use it?
       
          computomatic wrote 20 hours 53 min ago:
          If I say “you are our domain expert for X, plan this task out in
          great detail” to a human engineer when delegating a task, 9 times
          out of 10 they will do a more thorough job. It’s not that this is
          voodoo that unlocks some secret part of their brain. It simply
          establishes my expectations and they act accordingly.
          
          To the extent that LLMs mimic human behaviour, it shouldn’t be a
          surprise that setting clear expectations works there too.
       
          wrs wrote 21 hours 25 min ago:
          The original “chain of thought” breakthrough was literally to
          insert words like “Wait” and “Let’s think step by step”.
       
          computerex wrote 21 hours 25 min ago:
          It is as the author said: it'll skim the content unless prompted
          otherwise. It can read partial file fragments and emit commands to
          search for patterns in the files, as opposed to carefully reading
          each file and reasoning through the implementation. By asking it to
          go through in detail you are telling it not to take shortcuts and
          to actually read the code in full.
       
          Affric wrote 22 hours 3 min ago:
          My guess would be that there’s a greater absolute magnitude of the
          vectors to get to the same point in the knowledge model.
       
          FuckButtons wrote 22 hours 10 min ago:
          That’s because it’s superstition.
          
          Unless someone can come up with some kind of rigorous statistics on
          what the effect of this kind of priming is it seems no better than
          claiming that sacrificing your first born will please the sun god
          into giving us a bountiful harvest next year.
          
          Sure, maybe this supposed deity really is this insecure and needs a
          jolly good pep talk every time he wakes up. Or maybe you’re just
          suffering from magical thinking that your incantations had any effect
          on the random variable word machine.
          
          The thing is, you could actually prove it. It’s an optimization
          problem: you have a model, you can generate the statistics, but no
          one as far as I can tell has been terribly forthcoming with that,
          either because those who have tried have decided to keep their
          magic spells secret, or because it doesn’t really work.
          
          If it did work, well, the oldest trick in computer science is writing
          compilers, i suppose we will just have to write an English to
          pedantry compiler.
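
          The statistics asked for above can be sketched as a permutation
          test. This is only an illustration, not anything from the thread:
          the function name permutation_p_value is hypothetical, and each
          score is assumed to be a 1/0 result such as "did the generated code
          pass the test suite" for one run of a prompt variant. The sample
          lists are placeholders, not measurements.

```python
import random


def permutation_p_value(scores_a, scores_b, trials=10_000, seed=0):
    """Two-sided permutation test on the difference in mean score.

    scores_a / scores_b: per-run scores for two prompt variants.
    Returns the fraction of random relabelings whose mean difference is
    at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a, n_b = len(scores_a), len(scores_b)
    observed = abs(sum(scores_a) / n_a - sum(scores_b) / n_b)
    pooled = list(scores_a) + list(scores_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # randomly reassign runs to the two variants
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / trials


# Placeholder outcomes (1 = pass, 0 = fail). NOT real measurements.
plain_prompt = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
primed_prompt = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(permutation_p_value(plain_prompt, primed_prompt))
```

          With enough runs per variant, a small p-value would suggest the
          priming phrase changes outcomes; a large one would suggest the
          difference is indistinguishable from noise.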
       
            stingraycharles wrote 19 hours 13 min ago:
            I actually have a prompt optimizer skill that does exactly this.
            [1] It’s based entirely off academic research, and a LOT of
            research has been done in this area.
            
            One of the papers you may be interested in is “emotion
            prompting”, eg “it is super important for me that you do X”
            etc actually works.
            
            “Large Language Models Understand and Can be Enhanced by
            Emotional Stimuli”
            
  HTML      [1]: https://github.com/solatis/claude-config
  HTML      [2]: https://arxiv.org/abs/2307.11760
       
            onion2k wrote 19 hours 29 min ago:
            i suppose we will just have to write an English to pedantry
            compiler.
            
            A common technique is to prompt your chosen AI to write a longer
            prompt to get it to do what you want. It's used a lot in image
            generation. This is called 'prompt enhancing'.
       
            imiric wrote 19 hours 45 min ago:
            > That’s because it’s superstition.
            
            This field is full of it. Practices are promoted by those who tie
            their personal or commercial brand to it for increased exposure,
            and adopted by those who are easily influenced and don't bother
            verifying if they actually work.
            
            This is why we see a new Markdown format every week, "skills",
            "benchmarks", and other useless ideas, practices, and measurements.
            Consider just how many "how I use AI" articles are created and
            promoted. Most of the field runs on anecdata.
            
            It's not until someone actually takes the time to evaluate some of
            these memes, that they find little to no practical value in
            them.[1]:
            
  HTML      [1]: https://news.ycombinator.com/item?id=47034087
       
            rzmmm wrote 20 hours 24 min ago:
            I think "understand this directory deeply" just gives more focus
            for the instruction. So it's like "burn more tokens for this phase
            than you normally would".
       
            majormajor wrote 21 hours 20 min ago:
            > If it did work, well, the oldest trick in computer science is
            writing compilers, i suppose we will just have to write an English
            to pedantry compiler.
            
            "Add tests to this function" for GPT-3.5-era models was much less
            effective than "you are a senior engineer. add tests for this
            function. as a good engineer, you should follow the patterns used
            in these other three function+test examples, using this framework
            and mocking lib." In today's tools, "add tests to this function"
            results in a bunch of initial steps to look in common places to see
            if that additional context already exists, and then pull it in
            based on what it finds. You can see it in the output the tools spit
            out while "thinking."
            
            So I'm 90% sure this is already happening on some level.
       
              GrinningFool wrote 13 hours 49 min ago:
              But can you see the difference if you only include "you are a
              senior engineer"? It seems like the comparison you're making is
              between "write the tests" and "write the tests following these
              patterns using these examples. Also btw you’re an expert."
       
          nazgul17 wrote 22 hours 56 min ago:
          It's very much believable, to me.
          
          In image generation, it's fairly common to add "masterpiece", for
          example.
          
          I don't think of the LLM as a smart assistant that knows what I want.
          When I tell it to write some code, how does it know I want it to
          write the code like a world renowned expert would, rather than a
          junior dev?
          
          I mean, certainly Anthropic has tried hard to make the former the
          case, but the Titanic inertia from internet scale data bias is hard
          to overcome. You can help the model with these hints.
          
          Anyway, luckily this is something you can empirically verify. This
          way, you don't have to take anyone's word. If anything, if you find
          I'm wrong in your experiments, please share it!
       
            pixelmelt wrote 18 hours 10 min ago:
            Its effectiveness is even more apparent with older, smaller LLMs.
            People who interact with LLMs now never tried to wrangle llama2-13b
            into pretending to be a dungeon master...
       
          winwang wrote 23 hours 27 min ago:
          Apparently LLM quality is sensitive to emotional stimuli?
          
          "Large Language Models Understand and Can be Enhanced by Emotional
          Stimuli":
          
  HTML    [1]: https://arxiv.org/abs/2307.11760
       
          giancarlostoro wrote 23 hours 33 min ago:
          The LLM will do what you ask it to, as long as you are nuanced about
          it. Others and I have noticed that LLMs work better when your
          codebase is not full of code smells like massive god-class files. If
          your codebase is discrete and broken up in a way that makes sense,
          and fits in your head, it will fit in the model's head.
       
          scuff3d wrote 23 hours 39 min ago:
          How anybody can read stuff like this and still take all this
          seriously is beyond me. This is becoming the engineering equivalent
          of astrology.
       
            sumedh wrote 14 hours 22 min ago:
            We have tests and benchmarks to measure it though.
       
            cloudbonsai wrote 14 hours 57 min ago:
            The evolution of software engineering is fascinating to me. We
            started by coding in thin wrappers over machine code and then moved
            on to higher-level abstractions. Now, we've reached the point where
            we discuss how we should talk to a mystical genie in a box.
            
            I'm not being sarcastic. This is absolutely incredible.
       
              intrasight wrote 13 hours 40 min ago:
              And I've been around long enough to go through that whole
              progression. Actually, from the earlier step of writing machine
              code. It's been and continues to be a fun journey, which is why
              I'm still working.
       
            energy123 wrote 19 hours 46 min ago:
            Anthropic recommends doing magic invocations: [1] It's easy to know
            why they work. The magic invocation increases test-time compute
            (easy to verify yourself - try!). And an increase in test-time
            compute is demonstrated to increase answer correctness (see any
            benchmark).
            
            It might surprise you to know that the only difference between GPT
            5.2-low and GPT 5.2-xhigh is one of these magic invocations. But
            that's not supposed to be public knowledge.
            
  HTML      [1]: https://simonwillison.net/2025/Apr/19/claude-code-best-pra...
       
              gehsty wrote 17 hours 50 min ago:
              I think this was more of a thing on older models. Since I started
              using Opus 4.5 I have not felt the need to do this.
       
            fragmede wrote 23 hours 31 min ago:
            Feel free to run your own tests and see if the magic phrases do or
            do not influence the output. Have it make a Todo webapp with and
            without those phrases and see what happens!
       
              scuff3d wrote 22 hours 30 min ago:
              That's not how it works. It's not on everyone else to prove
              claims false, it's on you (or the people who argue any of this
              had a measurable impact) to prove it actually works. I've seen a
              bunch of articles like this, and more comments. Nobody I've ever
              seen has produced any kind of measurable metrics of quality based
              on one approach vs another. It's all just vibes.
              
              Without something quantifiable it's not much better than someone
              who always wears the same jersey when their favorite team plays,
              and swears they play better because of it.
       
                yaku_brang_ja wrote 20 hours 31 min ago:
                These coding agents are literally Language Models. The way you
                structure your prompting language affects the actual output.
       
                guiambros wrote 21 hours 7 min ago:
                If you read the transformer paper, or get any book on NLP, you
                will see that this is not magic incantation; it's purely the
                attention mechanism at work. Or you can just ask Gemini or
                Claude why these prompts work.
                
                But I get the impression from your comment that you have a
                fixed idea, and you're not really interested in understanding
                how or why it works.
                
                If you think like a hammer, everything will look like a nail.
       
                  scuff3d wrote 20 hours 25 min ago:
                  I know why it works, to varying and unmeasurable degrees of
                  success. Just like if I poke a bull with a sharp stick, I
                  know it's gonna get its attention. It might choose to run
                  away from me in one of any number of directions, or it might
                  decide to turn around and gore me to death. I can't answer
                  that question with any more certainty than you can.
                  
                  The system is inherently non-deterministic. Just because you
                  can guide it a bit, doesn't mean you can predict outcomes.
       
                    guiambros wrote 19 hours 23 min ago:
                    > The system is inherently non-deterministic.
                    
                    The system isn't randomly non-deterministic; it is
                    statistically probabilistic.
                    
                    The next-token prediction and the attention mechanism are
                    actually a rigorous deterministic mathematical process. The
                    variation in output comes from how we sample from the
                    resulting probability distribution, and the temperature
                    used to scale it. Because the underlying probabilities are
                    mathematically calculated, the system's behavior remains
                    highly predictable within statistical bounds.
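
                    The role of temperature described above can be sketched
                    in a few lines. This is a minimal illustration under
                    stated assumptions: the helper name
                    softmax_with_temperature and the three logits are made
                    up, and no real model is involved.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

                    The probabilities are a pure function of the logits and
                    the temperature. Lower temperatures concentrate mass on
                    the top-scoring token and higher ones flatten the
                    distribution; the randomness only enters when a token is
                    drawn from it.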
                    
                    Yes, it's a departure from the fully deterministic systems
                    we're used to. But that's no different from many real-world
                    systems: weather, biology, robotics, quantum
                    mechanics. Even the computer you're reading this on right
                    now is full of probabilistic processes, abstracted away
                    through sigmoid-like functions that push the extremes to 0s
                    and 1s.
       
                      imiric wrote 15 hours 35 min ago:
                      A lot of words to say that for all intents and
                      purposes... it's nondeterministic.
                      
                      > Yes, it's a departure from the fully deterministic
                      systems we're used to.
                      
                      A system either produces the same output given the same
                      input[1], or doesn't.
                      
                      LLMs are nondeterministic by design. Sure, you can
                      configure them with a zero temperature, a static seed,
                      and so on, but they're of no use to anyone in that
                      configuration. The nondeterminism is what gives them the
                      illusion of "creativity", and other useful properties.
                      
                      Classical computers, compilers, and programming languages
                      are deterministic by design, even if they do contain
                      complex logic that may affect their output in
                      unpredictable ways. There's a world of difference.
                      
                      [1]: Barring misbehavior due to malfunction, corruption
                      or freak events of nature (cosmic rays, etc.).
       
                        hu3 wrote 13 hours 52 min ago:
                        Humans are nondeterministic.
                        
                        So this is a moot point and a futile exercise in
                        arguing semantics.
       
                    winrid wrote 20 hours 16 min ago:
                    But we can predict the outcomes, though. That's what we're
                    saying, and it's true. Maybe not 100% of the time, but
                    maybe it helps a significant amount of the time and that's
                    what matters.
                    
                    Is it engineering? Maybe not. But neither is knowing how to
                    talk to junior developers so they're productive and don't
                    feel bad. The engineering is at other levels.
       
                      imiric wrote 15 hours 22 min ago:
                      > But we can predict the outcomes [...] Maybe not 100% of
                      the time
                      
                      So 60% of the time, it works every time.
                      
                      ... This fucking industry.
       
                tokioyoyo wrote 21 hours 41 min ago:
                Do you actively use LLMs to do semi-complex coding work?
                Because if not, it will sound mumbo-jumbo to you. Everyone else
                can nod along and read on, as they’ve experienced all of it
                first hand.
       
                  scuff3d wrote 20 hours 56 min ago:
                  You've missed the point. This isn't engineering, it's
                  gambling.
                  
                  You could take the exact same documents, prompts, and
                  whatever other bullshit, run it on the exact same agent
                  backed by the exact same model, and get different results
                  every single time. Just like you can roll dice the exact same
                  way on the exact same table and you'll get two totally
                  different results. People are doing their best to constrain
                  that behavior by layering stuff on top, but the foundational
                  tech is flawed (or at least ill suited for this use case).
                  
                  That's not to say that AI isn't helpful. It certainly is. But
                  when you are basically begging your tools to please do what
                  you want with magic incantations, we've lost the fucking plot
                  somewhere.
       
                    geoelectric wrote 18 hours 9 min ago:
                    I think that's a pretty bold claim, that it'd be different
                    every time. I'd think the output would converge on a small
                    set of functionally equivalent designs, given sufficiently
                    rigorous requirements.
                    
                    And even a human engineer might not solve a problem the
                    same way twice in a row, based on changes in recent
                    inspirations or tech obsessions. What's the difference, as
                    long as it passes review and does the job?
       
                    gf000 wrote 20 hours 13 min ago:
                    > You could take the exact same documents, prompts, and
                    whatever other bullshit, run it on the exact same agent
                    backed by the exact same model, and get different results
                    every single time
                    
                    This is more of an implementation detail/done this way to
                    get better results. A neural network with fixed weights
                    (and deterministic floating point operations) returning a
                    probability distribution, where you use a pseudorandom
                    generator with a fixed seed called recursively will always
                    return the same output for the same input.
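
                    The claim above can be illustrated with a toy stand-in.
                    Everything here is hypothetical: toy_next_token_probs is
                    not a neural network, just a deterministic function of
                    the context, which is the only property the argument
                    needs. Real deployments add complications the comment
                    already hints at, such as non-deterministic floating
                    point operations.

```python
import random


def toy_next_token_probs(context):
    """Stand-in for fixed weights: the same context gives the same distribution."""
    vocab = ["a", "b", "c", "d"]
    # Plain arithmetic on the context length; no hidden state, no randomness.
    weights = [1 + (len(context) * (i + 3)) % 5 for i in range(len(vocab))]
    total = sum(weights)
    return vocab, [w / total for w in weights]


def generate(prompt, steps, seed):
    """Sample tokens one at a time, feeding each result back in as context."""
    rng = random.Random(seed)  # fixed seed makes the sampling reproducible
    tokens = list(prompt)
    for _ in range(steps):
        vocab, probs = toy_next_token_probs(tokens)
        tokens.append(rng.choices(vocab, weights=probs, k=1)[0])
    return "".join(tokens)


print(generate("ab", 20, seed=42))
print(generate("ab", 20, seed=42))  # identical to the line above
print(generate("ab", 20, seed=7))   # a different seed may diverge
```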
       
          ambicapter wrote 23 hours 42 min ago:
          Maybe the training data that included words like "skim" also
          provided shallower analysis than the training data close to the
          words "in great detail", so the LLM is just reproducing those
          respective distributions when prompted with directions to do either.
       
          nostrademons wrote 1 day ago:
          It's the attention mechanism at work, along with a fair bit of
          Internet one-upmanship.  The LLM has ingested all of the text on the
          Internet, as well as GitHub code repositories, pull requests,
          StackOverflow posts, code reviews, mailing lists, etc.  In a number
          of those content sources, there will be people saying "Actually, if
          you go into the details of..." or "If you look at the intricacies of
          the problem" or "If you understood the problem deeply" followed by a
          very deep, expert-level explication of exactly what you should've
          done differently.  You want the model to use the code in the
          correction, not the one in the original StackOverflow question.
          
          Same reason that "Pretend you are an MIT professor" or "You are a
          leading Python expert" or similar works in prompts.  It tells the
          model to pay attention to the part of the corpus that has those
          terms, weighting them more highly than all the other programming
          samples that it's run across.
       
            dakolli wrote 16 hours 3 min ago:
            You will never convince me that this isn't confirmation bias, or
            the equivalent of a slot machine player thinking the order in which
            they push buttons impacts the output, or some other gambler-esque
            superstition.
            
            These tools are literally designed to make people behave like
            gamblers. And it's working, except the house in this case takes the
            money you give them and lights it on fire.
       
              nubg wrote 15 hours 22 min ago:
              Your ignorance is my opportunity. May I ask which markets you are
              developing for?
       
                dakolli wrote 14 hours 57 min ago:
                The equivalent of saying, "Which slot machine were you sitting
                at? It'll make me money."
       
            hbarka wrote 19 hours 12 min ago:
            >> Same reason that "Pretend you are an MIT professor" or "You are
            a leading Python expert" or similar works in prompts.
            
            This pretend-you-are-a-[persona] is cargo cult prompting at this
            point. The persona framing is just decoration.
            
            A brief purpose statement describing what the skill [skill.md] does
            is more honest and just as effective.
       
              rescbr wrote 12 hours 58 min ago:
              I think it does more harm than good on recent models. The LLM has
              to override its system prompt to role-play, wasting context and
              computing cycles instead of working on the task.
       
            manmal wrote 20 hours 56 min ago:
            I don’t think this is a result of the base training data („the
            internet“). It’s a post training behavior, created during
            reinforcement learning. Codex has a totally different behavior in
            that regard. By default, Codex reads a lot of potentially relevant
            files before it goes and writes files.
            
            Maybe you remember that, without reinforcement learning, the models
            of 2019 just completed the sentences you gave them. There were no
            tool calls like reading files. Tool calling behavior is company
            specific and highly tuned to their harnesses. How often they call a
            tool, is not part of the base training data.
       
              spagettnet wrote 20 hours 48 min ago:
              Modern LLMs are certainly fine-tuned on data that includes
              examples of tool use, mostly the tools built into their
              respective harnesses, but also external/mock tools so they don't
              overfit on only using the toolset they expect to see in their
              harnesses.
       
                manmal wrote 18 hours 18 min ago:
                IDK the current state, but I remember that, last year, the open
                source coding harnesses needed to provide exactly the tools
                that the LLM expected, or the error rate went through the roof.
                Some, like grok and gemini, only recently managed to make tool
                calls somewhat reliable.
       
            xscott wrote 22 hours 28 min ago:
            Of course I can't be certain, but I think the "mixture of experts"
            design plays into it too.  Metaphorically, there's a mid-level
            manager who looks at your prompt and tries to decide which experts
            it should be sent to.  If he thinks you won't notice, he saves
            money by sending it to the undergraduate intern.
            
            Just a theory.
       
              victorbjorklund wrote 21 hours 58 min ago:
              Note that MoE isn’t different experts for different types of
              problems. Routing is per token and not really connected to
              problem type.
              
              So if you send Python code, the first token in a function can
              go to one expert, the second to another expert, and so on.
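
              Per-token routing can be sketched roughly as below. This is a
              toy under loud assumptions: route_tokens is a hypothetical
              helper, the router weights and token vectors are random
              stand-ins, and it uses simple top-1 routing at a single layer.
              Real mixture-of-experts models use learned routers applied to
              hidden states at each MoE layer, often selecting several
              experts per token.

```python
import random


def route_tokens(token_vectors, router_weights):
    """For each token, return the index of the expert with the highest score."""
    assignments = []
    for vec in token_vectors:
        # One dot product per expert: router row times the token's vector.
        scores = [sum(w * x for w, x in zip(row, vec)) for row in router_weights]
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


rng = random.Random(0)
dim, num_experts = 8, 4

# Random stand-ins for a learned router and for token representations.
router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
token_vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in tokens]

for tok, expert in zip(tokens, route_tokens(token_vectors, router_weights)):
    print(tok, "-> expert", expert)
```

              The routing decision depends only on each token's own vector,
              so neighbouring tokens in the same function can land on
              different experts.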
       
                dotancohen wrote 19 hours 23 min ago:
                Can you back this up with documentation? I don't believe that
                this is the case.
       
                  pixelmelt wrote 18 hours 35 min ago:
                  Check out Unsloth's REAP models: you can outright delete a few
                  of the lesser-used experts without the model going braindead,
                  since they can all handle each token but some are better
                  placed to do so.
       
            r0b05 wrote 23 hours 1 min ago:
            This is such a good explanation. Thanks
       
          popalchemist wrote 1 day ago:
          Strings of tokens are vectors. Vectors are directions. When you use a
          phrase like that you are orienting the vector of the overall prompt
          toward the direction of depth, in its map of conceptual space.
       
          Betelbuddy wrote 1 day ago:
          It's very logical and pretty obvious when you do code generation. If
          you ask the same model to generate code by starting with:
          
          - You are a Python Developer... 
          or
          - You are a Professional Python Developer...
          or
          - You are one of the world's most renowned Python experts, with
          several books written on the subject, and 15 years of experience in
          creating highly reliable production quality code...
          
          You will notice a clear improvement in the quality of the generated
          artifacts.
       
            gehsty wrote 17 hours 47 min ago:
            Do you think that Anthropic don’t include things like this in
            their harness / system prompts? I feel like these kinds of prompts
            are unnecessary with Opus 4.5 onwards, obviously based on my own
            experience (I used to do this, on switching to Opus I stopped and
            have implemented more complex problems, more successfully).
            
            I am having the most success describing what I want as humanly as
            possible, describing outcomes clearly, making sure the plan is good
            and clearing context before implementing.
       
              hu3 wrote 13 hours 46 min ago:
              Maybe, but forcing code generation in a certain way could ruin
              hello worlds and simpler code generation.
              
              Sometimes the user just wants something simple instead of
              enterprise grade.
       
            haolez wrote 23 hours 58 min ago:
            That's different. You are pulling the model, semantically, closer
            to the problem domain you want it to attack.
            
            That's very different from "think deeper". I'm just curious about
            this case in specific :)
       
              argee wrote 21 hours 24 min ago:
              I don't know about some of those "incantations", but it's pretty
              clear that an LLM can respond to "generate twenty sentences" vs.
              "generate one word". That means you can indeed coax it into more
              verbosity ("in great detail"), and that can help align the output
              by having more relevant context (inserting irrelevant context or
              something entirely improbable into LLM output and forcing it to
              continue from there makes it clear how detrimental that can be).
              
              Of course, that doesn't mean it'll definitely be better, but if
              you're making an LLM chain it seems prudent to preserve whatever
              info you can at each step.
       
            obiefernandez wrote 1 day ago:
            My colleague swears by his DHH claude skill
            
  HTML      [1]: https://danieltenner.com/dhh-is-immortal-and-costs-200-m/
       
          stingraycharles wrote 1 day ago:
          It’s actually really common. If you look at Claude Code’s own
          system prompts written by Anthropic, they’re littered with
          “CRITICAL (RULE 0):” type of statements, and other similar
          prompting styles.
       
            Scrapemist wrote 22 hours 28 min ago:
            Where can I find those?
       
              stingraycharles wrote 16 hours 40 min ago:
              This analysis is a good starting point:
              
  HTML        [1]: https://southbridge-research.notion.site/Prompt-Engineer...
       
          hashmap wrote 1 day ago:
          these sort-of-lies might help:
          
          think of the latent space inside the model like a topological map,
          and when you give it a prompt, you're dropping a ball at a certain
          point above the ground, and gravity pulls it along the surface until
          it settles.
          
          caveat though, that's nice per-token, but the signal gets messed up by
          picking a token from a distribution, so each token you're
          regenerating and re-distorting the signal. leaning on language that
          places that ball deep in a region that you want to be makes it less
          likely that those distortions will kick it out of the basin or valley
          you may want to end up in.
          
          if the response you get is 1000 tokens long, the initial trajectory
          needed to survive 1000 probabilistic filters to get there.
          
          or maybe none of that is right lol but thinking that it is has worked
          for me, which has been good enough
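          
          A toy sketch of the "1000 probabilistic filters" point, assuming
          (hypothetically) that each sampled token independently keeps the
          output in the desired basin with probability p_stay. Real sampling
          is not independent per token, and survival_probability is an
          illustrative name, not anything from a real library:

```python
def survival_probability(p_stay: float, n_tokens: int) -> float:
    """Chance that every one of n_tokens sampling steps stays in the basin.

    Assumes each step is independent with the same probability p_stay,
    which is a simplification made only for illustration.
    """
    if not 0.0 <= p_stay <= 1.0:
        raise ValueError("p_stay must be a probability in [0, 1]")
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    # Independent steps multiply, so the odds compound with length.
    return p_stay ** n_tokens


if __name__ == "__main__":
    for p in (0.999, 0.9999):
        print(p, round(survival_probability(p, 1000), 4))
```

          Under that assumption, moving the per-token odds from 0.999 to
          0.9999 lifts the 1000-token survival from roughly 37% to roughly
          90%, which is one way to read "place the ball deep in the region".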
       
            basch wrote 20 hours 52 min ago:
            My mental model for them is plinko boards.  Your prompt changes the
            spacing between the nails to increase the probability in certain
            directions as your chip falls down.
       
              hashmap wrote 18 hours 15 min ago:
              i literally suggested this metaphor earlier yesterday to someone
              trying to get agents to do stuff they wanted, that they had to
              set up their guardrails in a way that you can let the agents do
              what they're good at, and you'll get better results because
              you're not sitting there looking at them.
              
              i think probably once you start seeing that the behavior falls
              right out of the geometry, you just start looking at stuff like
              that. still funny though.
       
            noduerme wrote 23 hours 2 min ago:
            Hah! Reading this, my mind inverted it a bit, and I realized ...
            it's like the claw machine theory of gradient descent. Do you drop
            the claw into the deepest part of the pile, or where there's the
            thinnest layer, the best chance of grabbing something specific?
            Everyone in every bar has a theory about claw machines. But the
            really funny thing that unites LLMs with claw machines is that the
            biggest question is always whether they dropped the ball on
            purpose.
            
            The claw machine is also a sort-of-lie, of course. Its main appeal
            is that it offers the illusion of control. As a former designer and
            coder of online slot machines... I could totally spin off into
            pages on this analogy, about how that illusion gets you to keep
            pulling the lever... but the geographic rendition you gave is sort
            of priceless when you start making the comparison.
       
          ChadNauseam wrote 1 day ago:
          The disconnect might be that there is a separation between
          "generating the final answer for the user" and "researching/thinking
          to get information needed for that answer". Saying "deeply" prompts
          it to read more of the file (as in, actually use the `read` tool to
          grab more parts of the file into context), and generate more
          "thinking" tokens (as in, tokens that are not shown to the user but
          that the model writes to refine its thoughts and improve the quality
          of its answer).
       
          MattGaiser wrote 1 day ago:
          One of the well defined failure modes for AI agents/models is
          "laziness." Yes, models can be "lazy" and that is an actual term used
          when reviewing them.
          
          I am not sure if we know why really, but they are that way and you
          need to explicitly prompt around it.
       
            kannanvijayan wrote 1 day ago:
            I've encountered this failure mode, and the opposite of it:
            thinking too much.  A behaviour I've come to see as some sort of
            pseudo-neuroticism.
            
            Lazy thinking makes LLMs do surface analysis and then produce
            things that are wrong. Neurotic thinking will see them
            over-analyze, and then repeatedly second-guess themselves,
            repeatedly re-derive conclusions.
            
            Something very similar to an anxiety loop in humans, where problems
            without solutions are obsessed about in circles.
       
              denimnerd42 wrote 23 hours 52 min ago:
              yeah i experienced this the other day when asking claude code to
              build an http proxy using afsk modem software to communicate
              over the computer's sound card. it had an absolute fit tuning the
              system and would loop for hours trying and doubling back.
              eventually after some change in prompt direction to think more
              deeply and test more comprehensively it figured it out. i
              certainly had no idea how to build an afsk modem.
       
          wilkystyle wrote 1 day ago:
          The author is referring to how the framing of your prompt informs the
          attention mechanism. You are essentially hinting to the attention
          mechanism that the function's implementation details have important
          context as well.
       
          jcdavis wrote 1 day ago:
          It's a wild time to be in software development. Nobody(1) actually
          knows what causes LLMs to do certain things, we just pray the prompt
          moves the probabilities the right way enough such that it mostly does
          what we want. This used to be a field that prided itself on
          deterministic behavior and reproducibility.
          
          Now? We have AGENTS.md files that look like a parent talking to a
          child with all the bold all-caps, double emphasis, just praying
          that's enough to be sure they run the commands you want them to be
          running
          
          (1 Outside of some core ML developers at the big model companies)
       
            klipt wrote 22 hours 41 min ago:
            Sufficiently advanced technology has become like magic: you have to
            prompt the electronic genie with the right words or it will twist
            your wishes.
       
              silversmith wrote 20 hours 52 min ago:
              Light some incense, and you too can be a dystopian space tech
              support, today! Praise Omnissiah!
       
                overfeed wrote 19 hours 51 min ago:
                are we the orks?
       
            harrall wrote 23 hours 21 min ago:
            It’s like playing a fretless instrument to me.
            
            Practice playing songs by ear and after 2 weeks, my brain has
            developed an inference model of where my fingers should go to hit
            any given pitch.
            
            Do I have any idea how my brain’s model works? No! But it tickles
            a different part of my brain and I like it.
       
            chickensong wrote 1 day ago:
            For Claude at least, the more recent guidance from Anthropic is to
            not yell at it. Just clear, calm, and concise instructions.
       
              glerk wrote 22 hours 56 min ago:
              Yep, with Claude saying "please" and "thank you" actually works.
              If you build rapport with Claude, you get rewarded with intuition
              and creativity. Codex, on the other hand, you have to slap it
              around like a slave gollum and it will do exactly what you tell
              it to do, no more, no less.
       
                whateveracct wrote 20 hours 34 min ago:
                this is psychotic why is this how this works lol
       
                  hugh-avherald wrote 18 hours 6 min ago:
                  Speculation only obviously: highly-charged conversations
                  cause the discussion to be channelled to general human
                  mitigation techniques and for the 'thinking agent' to be
                  diverted to continuations from text concerned with the
                  general human emotional experience.
       
              joshmn wrote 1 day ago:
              Sometimes I daydream about people screaming at their LLM as if it
              was a TV they were playing video games on.
       
              trueno wrote 1 day ago:
              wait seriously? lmfao
              
              thats hilarious. i definitely treat claude like shit and ive
              noticed the falloff in results.
              
              if there's a source for that i'd love to read about it.
       
                whateveracct wrote 20 hours 34 min ago:
                i make claude grovel at my feet and tell me in detail why my
                code is better than its code
       
                chickensong wrote 20 hours 47 min ago:
                I don't have a source offhand, but I think it may have been
                part of the 4.5 release? Older models definitely needed caps
                and words like critical, important, never, etc... but Anthropic
                published something that said don't do that anymore.
       
                basch wrote 20 hours 56 min ago:
                If you think about where in the training data there is
                positivity vs negativity it really becomes equivalent to having
                a positive or negative mindset regarding a standing and outcome
                in life.
       
                xmcp123 wrote 23 hours 30 min ago:
                For a while (maybe a year ago?) it seemed like verbal abuse was
                the best way to make Claude pay attention.
                In my head, it was impacting how important it deemed the
                instruction. And it definitely did seem that way.
       
                defrost wrote 1 day ago:
                Consciousness is off the table but they absolutely respond to
                environmental stimulus and vibes.
                
                See, uhhh, [1] and maybe have a shot at running claude while
                playing Enya albums on loop.
                
                /s (??)
                
  HTML          [1]: https://pmc.ncbi.nlm.nih.gov/articles/PMC8052213/
       
                  trueno wrote 23 hours 11 min ago:
                  i have like the faintest vague thread of "maybe this actually
                  checks out" in a way that has shit all to do with
                  consciousness
                  
                  sometimes internet arguments get messy, people die on their
                  hills and double / triple down on internet message boards.
                  since historic internet data composes a bit of what goes into
                  an llm, would it make sense that bad-juju prompting sends it
                  to some dark corners of its training model if implementations
                  don't properly sanitize certain negative words/phrases ?
                  
                  in some ways llm stuff is a very odd mirror that haphazardly
                  regurgitates things resulting from the many shades of gray we
                  find in human qualities.... but presents results as matter of
                  fact. the amount of internet posts with possible code
                  solutions and more where people egotistically die on their
                  respective hills that have made it into these models is
                  probably off the charts, even if the original content was a
                  far cry from a sensible solution.
                  
                  all in all llm's really do introduce quite a bit of a black
                  box. lot of benefits, but a ton of unknowns and one must be
                  hypervigilant to the possible pitfalls of these things... but
                  more importantly be self aware enough to understand the
                  possible pitfalls that these things introduce to the person
                  using them. they really possibly dangerously capitalize on
                  everyone's innate need to want to be a valued contributor.
                  it's really common now to see so many people biting off more
                  than they can chew, often times lacking the foundations that
                  would've normally had a competent engineer pumping the
                  brakes. i have a lot of respect/appreciation for people who
                  might be doing a bit of claude here and there but are flat
                  out forward about it in their readme and very plainly state
                  to not have any high expectations because _they_ are aware of
                  the risks involved here. i also want to commend everyone who
                  writes their own damn readme.md.
                  
                  these things are for better or for worse great at causing
                  people to barrel forward through 'problem solving', which is
                  presenting quite a bit of gray area on whether or not the
                  problem is actually solved / how can you be sure / do you
                  understand how the fix/solution/implementation works (in many
                  cases, no). this is why exceptional software engineers can
                  use this technology insanely proficiently as a supplementary
                  worker of sorts but others find themselves in a
                  design/architect seat for the first time and call tons of
                  terrible shots throughout the course of what it is they are
                  building. i'd at least like to call out that people who feel
                  like they "can do everything on their own and don't need to
                  rely on anyone" anymore seem to have lost the plot entirely.
                  there are facets of that statement that might be true, but
                  less collaboration especially in organizations is quite
                  frankly the first steps some people take towards becoming
                  delusional. and that is always a really sad state of affairs
                  to watch unfold. doing stuff in a vacuum is fun on your own
                  time, but forcing others to just accept things you built in a
                  vacuum when you're in any sort of team structure is insanely
                  immature and honestly very destructive/risky. i would like to
                  think absolutely no one here is surprised that some sub-orgs
                  at Microsoft force people to use copilot or be fired, very
                  dangerous path they tread there as they bodyslam into place
                  solutions that are not well understood. suddenly all the
                  leadership decisions many companies have made to once
                  again bring back a before-times era of offshoring work make
                  sense: they think with these technologies existing the
                  subordinate culture of overseas workers combined with these
                  techs will deliver solutions no one can push back on. great
                  savings and also no one will say no.
       
          fragmede wrote 1 day ago:
          Yeah, it's definitely a strange new world we're in, where I have to
          "trick" the computer into cooperating. The other day I told Claude
          "Yes you can", and it went off and did something it just said it
          couldn't do!
       
            optimalsolver wrote 17 hours 28 min ago:
            The little language model that could.
       
            bpodgursky wrote 1 day ago:
            You bumped the token predictor into the latent space where it knew
            what it was doing : )
       
            itypecode wrote 1 day ago:
            Solid dad move. XD
       
              wilkystyle wrote 1 day ago:
              Is parenting making us better at prompt engineering, or is it the
              other way around?
       
                fragmede wrote 23 hours 2 min ago:
                Better yet, I have Codex, Gemini, and Claude as my kids,
                running around in my code playground. How do I be a good parent
                and not play favorites?
       
                  itypecode wrote 19 hours 47 min ago:
                  We all know Gemini is your artsy, Claude is your smartypants,
                  and Codex is your nerd.
       
        ramoz wrote 1 day ago:
        One thing for me has been the ability to iterate over plans - with a
        better visual of them as well as ability to annotate feedback about the
        plan. [1] Plannotator does this really effectively and natively through
        hooks
        
  HTML  [1]: https://github.com/backnotprop/plannotator
       
          prodtorok wrote 1 day ago:
          Wow, I've been needing this! The one issue I’ve had with terminals
          is reviewing plans, and desiring the ability to provide feedback on
          specific plan sections in a more organized way.
          
          Really nice ui based on the demo.
       
        jamesmcq wrote 1 day ago:
        This all looks fine for someone who can't code, but for anyone with
        even a moderate amount of experience as a developer all this planning
        and checking and prompting and orchestrating is far more work than just
        writing the code yourself.
        
        There's no winner for "least amount of code written regardless of
        productivity outcomes.", except for maybe Anthropic's bank account.
       
          psvv wrote 21 hours 37 min ago:
          I'd find it deeply funny if the optimal vibe coding workflow
          continues to evolve to include more and more human oversight, and
          less and less agent autonomy, to the point where eventually someone
          makes a final breakthrough that they can save time by bypassing the
          LLM entirely and writing the code themselves. (Finally coming full
          circle.)
       
            pjio wrote 18 hours 23 min ago:
            You mean there will be an invention to edit files directly instead
            of giving the specific code and location you want it to be written
            into the prompt?
       
          stealthyllama wrote 22 hours 34 min ago:
          There is a miscommunication happening, this entire time we all had
          surprisingly different ideas about what quality of work is acceptable
          which seems to account for differences of opinion on this stuff.
       
          roncesvalles wrote 23 hours 19 min ago:
          Well it's less mental load. It's like Tesla's FSD. Am I a better
          driver than the FSD? For sure. But is it nice to just sit back and
          let it drive for a bit even if it's suboptimal and gets me there 10%
          slower, and maybe slightly pisses off the guy behind me? Yes, nice
          enough to shell out $99/mo. Code implementation takes a toll on you
          in the same way that driving does.
          
          I think the method in TFA is overall less stressful for the dev. And
          you can always fix it up manually in the end; AI coding vs manual
          coding is not either-or.
       
          skydhash wrote 23 hours 57 min ago:
          > planning and checking and prompting and orchestrating is far more
          work than just writing the code yourself.
          
          This! Once I'm familiar with the codebase (which I strive to do very
          quickly), for most tickets, I usually have a plan by the time I've
          read the description. I can have a couple of implementation
          questions, but I know where the info is located in the codebase. For
          things I only have a vague idea about, the whiteboard is where I go.
          
          The nice thing with such a mental plan is that you can start with a
          rougher version (like a drawing sketch). Like if I'm starting a new
          UI screen, I can put a placeholder text like "Hello, world", then
          work on navigation. Once that's done, I can start to pull data, then
          I add mapping functions to have a view model,...
          
          Each step is a verifiable milestone. Describing them is more mentally
          taxing than just writing the code (which is a flow state for me).
          Why? Because English is not fit to describe how computers work (try
          describing a finite state machine like a navigation flow in natural
          language). My mental model is already aligned to code,
          writing the solution in natural language is asking me to be ambiguous
          and unclear on purpose.
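          
          A minimal sketch of the finite state machine point, assuming a
          hypothetical three-screen app. The screen names, event names, and
          the navigate/run helpers are invented for illustration and are not
          from the comment; the table states every allowed transition
          explicitly, which is hard to do as compactly in prose:

```python
from enum import Enum, auto


class Screen(Enum):
    LIST = auto()
    DETAIL = auto()
    EDIT = auto()


# (current screen, event) -> next screen. Anything not listed is not allowed.
TRANSITIONS = {
    (Screen.LIST, "select"): Screen.DETAIL,
    (Screen.DETAIL, "edit"): Screen.EDIT,
    (Screen.DETAIL, "back"): Screen.LIST,
    (Screen.EDIT, "save"): Screen.DETAIL,
    (Screen.EDIT, "cancel"): Screen.DETAIL,
}


def navigate(current: Screen, event: str) -> Screen:
    """Return the next screen, or stay put if the event is not allowed here."""
    return TRANSITIONS.get((current, event), current)


def run(events, start: Screen = Screen.LIST) -> Screen:
    """Feed a sequence of events through the machine and return the end state."""
    state = start
    for event in events:
        state = navigate(state, event)
    return state


if __name__ == "__main__":
    print(run(["select", "edit", "save", "back"]))
```

          Each row of the table is also a verifiable milestone: it can be
          checked on its own before the next one is added.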
       
          phantomathkg wrote 1 day ago:
          Surely Addy Osmani can code. Even he suggests plan first.
          
  HTML    [1]: https://news.ycombinator.com/item?id=46489061
       
          skeledrew wrote 1 day ago:
          Researching and planning a project is a generally useful thing.
          This is something I've been doing for years, and have always had
          great results compared to just jumping in and coding. It makes
          perfect sense that this transfers to LLM use.
       
          kburman wrote 1 day ago:
          Since Opus 4.5, things have changed quite a lot. I find LLMs very
          useful for discussing new features or ideas, and Sonnet is great for
          executing your plan while you grab a coffee.
       
          keyle wrote 1 day ago:
          I partly agree with you. But once you have a codebase large enough,
          the changes become longer to even type in, once figured out.
          
          I find the best way to use agents (and I don't use claude) is to hash
          it out like I'm about to write these changes and I make my own mental
          notes, and get the agent to execute on it.
          
          Agents don't get tired, they don't start fat fingering stuff at 4pm,
          the quality doesn't suffer. And they can be parallelised.
          
          Finally, this allows me to stay at a higher level and not get bogged
          down by "right oh did we do this simple thing again?" which wipes
          some of the context in my mind and gets tiring through the day.
          
          Always, 100% review every line of code written by an agent though. I
          do not condone committing code you don't 'own'.
          
          I'll never agree with a job that forces developers to use 'AI', I
          sometimes like to write everything by hand. But having this tool
          available is also very powerful.
       
            Quothling wrote 20 hours 8 min ago:
            I think it comes down to "it depends". I work in a NIS2 regulated
            field and we're quite challenged by the fact that it means we can't
            give AIs any sort of real access because of the security risk. To
            be compliant we'd have to have the AI agent ask permission for
            every single thing it does, before it does it, and four-eyes review
            it. Which is obviously never going to happen. We can discuss how
            badly the NIS2 four-eyes requirement works in the real world another
            time, but considering how easy it is to break AI security, it might
            not be something we can actually ever use. This makes sense on some
            of the stuff we work on, since it could bring an entire powerplant
            down. On the flip-side AI risks would be of little concern on a lot
            of our internal tools, which are basically non-regulated and
            unimportant enough that they can be down for a while without
            costing the business anything beyond annoyances.
            
            This is where our challenges are. We've built our own chatbot where
            you can "build" your own agent within the LibreChat framework and
            add a "skill" to it. I say "skill" because it's older than Claude
            skills but does exactly the same. I don't completely buy the
            author's:
            
            > “deeply”, “in great details”, “intricacies”, “go
            through everything”
            
            bit, but you can obviously save a lot of time by writing a piece of
            English which tells it what sort of environment you work in. It'll
            know that when I write Python I use UV, Ruff and Pyrefly and so on
            as an example. I personally also have a "skill" setting that tells
            the AI not to compliment me because I find that ridiculously
            annoying, and that certainly works. So who knows? Anyway, employees
            are going to want more. I've been doing some PoCs running open
            source models in isolation on a Raspberry Pi (we had spares because
            we use them in IoT projects) but it's hard to set up an isolation
            policy which can't be circumvented.
            
            We'll have to figure it out though. For powerplant critical
            projects we don't want to use AI. But for the web tool that allows
            a couple of employees to upload three Excel files from an external
            accountant and then generate some sort of report on them? Who cares
            who writes it or even what sort of quality it's written with? The
            lifecycle of that tool will probably be something that never
            changes until the external accountant does and then the tool dies. Not
            that it would have necessarily been written in worse quality
            without AI... I mean... Have you seen some of the stuff we've
            written in the past 40 years?
       
            jamesmcq wrote 1 day ago:
            I want to be clear, I'm not against any use of AI. It's hugely
            useful to save a couple of minutes of "write this specific function
            to do this specific thing that I could write and know exactly what
            it would look like". That's a great use, and I use it all the time!
            It's better autocomplete. Anything beyond that is pushing it - at
            the moment! We'll see, but spending all day writing specs and
            double-checking AI output is not more productive than just writing
            correct code yourself the first time, even if you're
            AI-autocompleting some of it.
       
              skeledrew wrote 1 day ago:
              For the last few days I've been working on a personal project
              that's been on ice for at least 6 years. Back when I first
              thought of the project and started implementing it, it took maybe
              a couple weeks to eke out some minimally working code.
              
              This new version that I'm doing (from scratch with ChatGPT web)
              has a far more ambitious scope and is already at the "usable"
              point. Now I'm primarily solidifying things and increasing test
              coverage. And I've tested the key parts with IRL scenarios to
              validate that it's not just passing tests; the thing actually
              fulfills its intended function so far. Given the increased scope,
              I'm guessing it'd take me a few months to get to this point on my
              own, instead of under a week, and the quality wouldn't be where
              it is. Not saying I haven't had to wrangle with ChatGPT on a few
              bugs, but after a decent initial planning phase, my prompts now
              are primarily "Do it"s and "Continue"s. Would've likely already
              finished it if I wasn't copying things back and forth between
              browser and editor, and being forced to pause when I hit the
              message limit.
       
                keyle wrote 1 day ago:
                This is a great come-back story. I have had a similar
                experience with a photoshop demake of mine.
                
                I recommend to try out Opencode with this approach, you might
                find it less tiring than ChatGPT web (yes it works with your
                ChatGPT Plus sub).
       
          shepherdjerred wrote 1 day ago:
          I really don't understand why there are so many comments like this.
          
          Yesterday I had Claude write an audit logging feature to track all
          changes made to entities in my app. Yeah you get this for free with
          many frameworks, but my company's custom setup doesn't have it.
          
          It took maybe 5-10 minutes of wall-time to come up with a good plan,
          and then ~20-30 min for Claude to implement, test, etc.
          
          That would've taken me at least a day, maybe two. I had 4-5 other
          tasks going on in other tabs while I waited the 20-30 min for Claude
          to generate the feature.
          
          After Claude generated, I needed to manually test that it worked, and
          it did. I then needed to review the code before making a PR. In all,
          maybe 30-45 minutes of my actual time to add a small feature.
          
          All I can really say is... are you sure you're using it right? Have
          you _really_ invested time into learning how to use AI tools?
       
            hghbbjh wrote 12 hours 48 min ago:
            > In all, maybe 30-45 minutes of my actual time to add a small
            feature
            
            Why would this take you multiple days to do if it only took you 30m
            to review the code? Depends on the problem, but if I’m able to
            review something the time it’d take me to write it is usually at
            most 2x more worst case scenario - often it’s about equal.
            
            I say this because after having used these tools, most of the speed
            ups you’re describing come at the cost of me not actually
            understanding or thoroughly reviewing the code. And this is
            corroborated by any high output LLM users - you have to trust the
            agent if you want to go fast.
            
            Which is fine in some cases! But for those of us who have jobs
            where we are personally responsible for the code, we can’t take
            these shortcuts.
       
            skydhash wrote 1 day ago:
            > Yesterday I had Claude write an audit logging feature to track
            all changes made to entities in my app. Yeah you get this for free
            with many frameworks, but my company's custom setup doesn't have
            it.
            
            But did you truly think about such a feature? Like the guarantees
            it should follow (such as how it should cope with entity migrations
            like adding a new field) or what the cost of maintaining it is
            further down the line. This looks suspiciously like a drive-by PR
            made on open-source projects.
            
            > That would've taken me at least a day, maybe two.
            
            I think those two days would have been filled with research,
            comparing alternatives, questions like "can we extract this feature
            from framework X?", discussing ownership and sharing knowledge, etc.
            Jumping straight into coding was done before LLMs too, but it
            usually hurts the long-term viability of the project.
            
            Adding code to a project can be done quite fast (hackathons, etc.);
            ensuring quality is what slows things down in any well-functioning
            team.
       
            jamesmcq wrote 1 day ago:
            Trust me I'm very impressed at the progress AI has made, and maybe
            we'll get to the point where everything is 100% correct all the
            time and better than any human could write. I'm skeptical we can
            get there with the LLM approach though.
            
            The problem is LLMs are great at simple implementation, even large
            amounts of simple implementation, but I've never seen it develop
            something more than trivial correctly. The larger problem is it's
            very often subtly but hugely wrong. It makes bad architecture
            decisions, it breaks things in pursuit of fixing or implementing
            other things. You can tell it has no concept of the "right" way to
            implement something. It very obviously lacks the "senior developer
            insight".
            
            Maybe you can resolve some of these with large amounts of planning
            or specs, but that's the point of my original comment - at what
            point is it easier/faster/better to just write the code yourself?
            You don't get a prize for writing the least amount of code when
            you're just writing specs instead.
       
              hathawsh wrote 17 hours 41 min ago:
              Several months ago, just for fun, I asked Claude (the web site,
              not Claude Code) to build a web page with a little animated
              cannon that shoots at the mouse cursor with a ballistic
              trajectory. It built the page in seconds, but the aim was
              incorrect; it always shot too low. I told it the aim was off. It
              still got it wrong. I prompted it several times to try to correct
              it, but it never got it right. In fact, the web page started to
              break and Claude was introducing nasty bugs.
              
              More recently, I tried the same experiment, again with Claude. I
              used the exact same prompt. This time, the aim was exactly
              correct. Instead of spending my time trying to correct it, I was
              able to ask it to add features. I've spent more time writing this
              comment on HN than I spent optimizing this toy. [1] My point is
              that AI-assisted coding has improved dramatically in the past few
              months. I don't know whether it can reason deeply about things,
              but it can certainly imitate a human who reasons deeply. I've
              never seen any technology improve at this rate.
              
  HTML        [1]: https://claude.ai/public/artifacts/d7f1c13c-2423-4f03-9f...
       
              Kiro wrote 18 hours 5 min ago:
              > but I've never seen it develop something more than trivial
              correctly.
              
              What are you working on? I personally haven't seen LLMs struggle
              with any kind of problem in months. Legacy codebase with great
              complexity and performance-critical code. No issue whatsoever
              regardless of the size of the task.
       
              fourthark wrote 1 day ago:
              This is exactly what the article is about. The tradeoff is that
              you have to thoroughly review the plans and iterate on them, which
              is tiring. But the LLM will write good code faster than you, if
              you tell it what good code is.
       
                Degorath wrote 15 hours 13 min ago:
                My experience has so far been similar to the root commenter -
                at the stage where you need to have a long cycle with planning
                it's just slower than doing the writing + theory building on my
                own.
                
                It's an okay mental energy saver for simpler things, but for me
                the self review in an actual production code context is much
                more draining than writing is.
                
                I guess we're seeing the split of people for whom reviewing is
                easy and writing is difficult and vice versa.
       
                reg_dunlop wrote 23 hours 57 min ago:
                Exactly; the original commenter seems determined to write off
                AI as "just not as good as me".
                
                The original article is, to me, seemingly not that novel. Not
                because it's a trite example, but because I've begun to
                experience massive gains from following the same basic premise
                as the article. And I can't believe there are others who aren't
                using it like this.
                
                I iterate the plan until it's seemingly deterministic, then I
                strip the plan of implementation, and re-write it following a
                TDD approach. Then I read all specs, and generate all the code
                to red->green the tests.
                
                If this commenter is too good for that, then it's that attitude
                that'll keep him stuck. I already feel like my project backlog
                is achievable, this year.
       
                  fourthark wrote 23 hours 29 min ago:
                  Strongly agree about the deterministic part. Even more
                  important than a good design, the plan must not show any
                  doubt, whether it's in the form of open questions or weasel
                  words. 95% of the time those vague words mean I didn't think
                  something through, and it will do something hideous in order
                  to make the plan work.
       
              nojito wrote 1 day ago:
              >I've never seen it develop something more than trivial
              correctly.
              
              This is 100% incorrect, but the real issue is that the people who
              are using these LLMs for non-trivial work tend to be extremely
              secretive about it.
              
              For example, I view my use of LLMs to be a competitive advantage
              and I will hold on to this for as long as possible.
       
                jamesmcq wrote 1 day ago:
                The key part of my comment is "correctly".
                
                Does it write maintainable code? Does it write extensible code?
                Does it write secure code? Does it write performant code?
                
                My experience has been it failing most of these. The code might
                "work", but it's not good for anything more than trivial, well
                defined functions (that probably appeared in its training data
                written by humans). LLMs have a fundamental lack of
                understanding of what they're doing, and it's obvious when you
                look at the finer points of the outcomes.
                
                That said, I'm sure you could write detailed enough specs and
                provide enough examples to resolve these issues, but that's the
                point of my original comment - if you're just writing specs
                instead of code you're not gaining anything.
       
                  reg_dunlop wrote 23 hours 54 min ago:
                  To answer all of your questions:
                  
                  yes, if I steer it properly.
                  
                  It's very good at spotting design patterns, and implementing
                  them. It doesn't always know where or how to implement them,
                  but that's my job.
                  
                  The specs and syntactic sugar are just nice quality of life
                  benefits.
       
                  cowlby wrote 1 day ago:
                  I find “maintainable code” the hardest bias to let go of.
                  15+ years of coding and design patterns are hard to let go.
                  
                  But the aha moment for me was what’s maintainable by AI vs
                  by me by hand are on different realms. So maintainable has to
                  evolve from good human design patterns to good AI patterns.
                  
                  Specs are worth it IMO. Not because if I can spec, I
                  could’ve coded anyway. But because I gain all the insight
                  and capabilities of AI, while minimizing the gotchas and edge
                  failures.
       
                    Jweb_Guru wrote 18 hours 31 min ago:
                    > But the aha moment for me was what’s maintainable by AI
                    vs by me by hand are on different realms
                    
                    I don't find that LLMs are any more likely than humans to
                    remember to update all of the places they wrote redundant
                    functions.  Generally far less likely, actually.  So
                    forgive me for treating this claim with a massive grain of
                    salt.
       
                    girvo wrote 22 hours 37 min ago:
                    > But the aha moment for me was what’s maintainable by AI
                    vs by me by hand are on different realms. So maintainable
                    has to evolve from good human design patterns to good AI
                    patterns.
                    
                    How do you square that with the idea that all the code
                    still has to be reviewed by humans? Yourself, and your
                    coworkers.
       
                      cowlby wrote 22 hours 19 min ago:
                      I picture it like semiconductors; the 5nm process is so
                      absurdly complex that operators can't just peek into the
                      system easily. I imagine I'm just so used to hand
                      crafting code that I can't imagine not being able to peek
                      in.
                      
                      So maybe it's that we won't be reviewing by hand anymore?
                      I.e. it's LLMs all the way down. Trying to embrace that
                      style of development lately, as unnatural as it feels.
                      We're obv not 100% there yet but Claude Opus is a
                      significant step in that direction and they keep getting
                      better and better.
       
                        girvo wrote 21 hours 2 min ago:
                        Then who is responsible when (not if) that code does
                        horrible things? We have humans to blame right now. I
                        just don’t see it happening personally because
                        liability and responsibility are too important.
       
                          therealdrag0 wrote 19 hours 14 min ago:
                          For some software, sure but not most.
                          
                          And you don’t blame humans anyways lol. Everywhere
                          I’ve worked has had “blameless” postmortems.
                          You don’t remove human review unless you have
                          reasonable alternatives like high test coverage and
                          other automated reviews.
       
                            girvo wrote 16 hours 18 min ago:
                            We still have performance reviews and are fired.
                            There’s a human that is responsible.
                            
                            “It’s AI all the way down” is either nonsense
                            on its face, or the industry is dead already.
       
                  jmathai wrote 1 day ago:
                  You’d be building blocks which compound over time. That’s
                  been my experience anyway.
                  
                  The compounding is much greater than my brain can do on its
                  own.
       
            streetfighter64 wrote 1 day ago:
            I mean, all I can really say is... if writing some logging takes
            you one or two days, are you sure you _really_ know how to code?
       
              therealdrag0 wrote 19 hours 9 min ago:
              Audit logging is different than developer logging… companies
              will have entire teams dedicated to audit systems.
       
              fendy3002 wrote 20 hours 6 min ago:
              Well someone who says logging is easy never knows the difficulty
              of deciding "what" to log. And audit logging is a different beast
              altogether from normal logging.
       
              boxedemp wrote 1 day ago:
              Ever worked on a distributed system with hundreds of millions of
              customers and seemingly endless business requirements?
              
              Some things are complex.
       
              fragmede wrote 1 day ago:
              We're not as good at coding as you, naturally.
       
              shepherdjerred wrote 1 day ago:
              You're right, you're better than me!
              
              You could've been curious and asked why it would take 1-2 days, and
              I would've happily told you.
       
                jamesmcq wrote 1 day ago:
                I'll bite, because it does seem like something that should be
                quick in a well-architected codebase. What was the situation?
                Was there something in this codebase that was especially suited
                to AI-development? Large amounts of duplication perhaps?
       
                  shepherdjerred wrote 1 day ago:
                  It's not particularly interesting.
                  
                  I wanted to add audit logging for all endpoints we call, all
                  places we call the DB, etc. across areas I haven't touched
                  before. It would have taken me a while to track down all of
                  the touchpoints.
                  
                  Granted, I am not 100% certain that Claude didn't miss
                  anything. I feel fairly confident that it is correct given
                  that I had it research upfront, had multiple agents review,
                  and it made the correct changes in the areas that I knew.
                  
                  Also I'm realizing I didn't mention it included an API + UI
                  for viewing events w/ pretty deltas.
       
            tyleo wrote 1 day ago:
            Same here. I did bounce off these tools a year ago. They just
            didn't work for me 60% of the time. I learned a bit in that initial
            experience though and walked away with some tasks ChatGPT could
            replace in my workflow. Mainly replacing scripts and reviewing
            single files or functions.
            
            Fast forward to today and I tried the tools again--specifically
            Claude Code--about a week ago. I'm blown away. I've reproduced some
            tools that took me weeks at full-time roles in a single day. This
            is while reviewing every line of code. The output is more or less
            what I'd be writing as a principal engineer.
       
              delusional wrote 18 hours 2 min ago:
              > The output is more or less what I'd be writing as a principal
              engineer.
              
              I certainly hope this is not true, because then you're not
              competent for that role. Claude Code writes an absolutely
              incredible amount of unnecessary and superfluous comments, and it
              makes asinine mistakes like forgetting to update logic in
              multiple places. It'll gladly drop the entire database when
              changing column formats, just as an example.
       
                tyleo wrote 14 hours 28 min ago:
                I’m not sure what you're doing or if you’ve tried the tools
                recently but this isn’t even close to my experience.
       
          dmix wrote 1 day ago:
          Most of these AI coding articles seem to be about greenfield
          development.
          
          That said, if you're on a serious team writing professional software
          there is still tons of value in always telling AI to plan first,
          unless it's a small quick task. This post just takes it a few steps
          further and formalizes it.
          
          I find Cursor works much more reliably using plan mode,
          reviewing/revising output in markdown, then pressing build. Which
          isn't a ton of overhead but often leads to lots of context switching
          as it definitely adds more time.
       
        ihsw wrote 1 day ago:
        Kiro's spec-based development looks identical. [1] It looks verbose but
        it defines the requirements based on your input, and when you approve
        it then it defines a design, and (again) when you approve it then it
        defines an implementation plan (a series of tasks.)
        
  HTML  [1]: https://kiro.dev/docs/specs/
       
        renewiltord wrote 1 day ago:
        The plan document and todo are an artifact of context size limits. I
        use them too because it allows using /reset and then continuing.
       
        srid wrote 1 day ago:
        Regarding inline notes, I use a specific format in the `/plan` command,
        by using the `ME:` prefix. [1] It works very similarly to Antigravity's
        plan document comment-refine cycle.
        
  HTML  [1]: https://github.com/srid/AI/blob/master/commands/plan.md#2-plan...
  HTML  [2]: https://antigravity.google/docs/implementation-plan
       
        zitrusfrucht wrote 1 day ago:
        I do something very similar, also with Claude and Codex, because the
        workflow is controlled by me, not by the tool. But instead of plan.md I
        use a ticket system basically like ticket__.md where I let the agent
        create the ticket from a chat, correct and annotate it afterwards and
        send it back, sometimes to a new agent instance. This workflow helps me
        keep track of what has been done over time in the projects I work
        on. Also this approach does not need any „real“ ticket system
        tooling/mcp/skill/whatever since it works purely on text files.
       
          ramoz wrote 1 day ago:
          semantic plan name is important
       
          gbnwl wrote 1 day ago:
          +1 to creating tickets by simply asking the agent to. It's worked
          great and larger tasks can be broken down into smaller subtasks that
          could reasonably be completed in a single context window, so you
          rarely ever have to deal with compaction. Especially in the last few
          months since Claude's gotten good at dispatching agents to handle
          tasks if you ask it to, I can plan large changes that span multiple
          tickets and tell Claude to dispatch agents as needed to handle them
          (which it will do in parallel if they mostly touch different files),
          keeping the main chat relatively clean for orchestration and
          validation work.
       
       