[HN Gopher] How I use Claude Code: Separation of planning and ex...
___________________________________________________________________
How I use Claude Code: Separation of planning and execution
Author : vinhnx
Score : 741 points
Date : 2026-02-22 00:29 UTC (15 hours ago)
HTML web link (boristane.com)
TEXT w3m dump (boristane.com)
| zitrusfrucht wrote:
| I do something very similar, also with Claude and Codex, because
| the workflow is controlled by me, not by the tool. But instead of
| plan.md I use a ticket system basically like
| ticket_<number>_<slug>.md where I let the agent create the ticket
| from a chat, correct and annotate it afterwards and send it back,
  | sometimes to a new agent instance. This workflow helps me keep
  | track of what has been done over time in the projects I work on.
  | Also, this approach doesn't need any "real" ticket system
  | tooling/mcp/skill/whatever since it works purely on text files.
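A minimal sketch of the plain-text ticket workflow described above. The file-naming scheme follows the comment (`ticket_<number>_<slug>.md`), but the function names and the numbering logic are illustrative, not the commenter's actual tooling:

```python
# Hypothetical helper for a plain-text ticket workflow: each ticket is
# just a markdown file named ticket_<number>_<slug>.md in one directory.
from pathlib import Path

def next_ticket_number(ticket_dir: Path) -> int:
    """Scan existing ticket_<number>_<slug>.md files and return the next number."""
    numbers = []
    for f in ticket_dir.glob("ticket_*_*.md"):
        try:
            numbers.append(int(f.name.split("_")[1]))
        except ValueError:
            continue  # ignore files that don't follow the naming scheme
    return max(numbers, default=0) + 1

def create_ticket(ticket_dir: Path, slug: str, body: str) -> Path:
    """Write a new ticket file that an agent (or a human) can pick up later."""
    ticket_dir.mkdir(parents=True, exist_ok=True)
    number = next_ticket_number(ticket_dir)
    path = ticket_dir / f"ticket_{number:04d}_{slug}.md"
    path.write_text(f"# Ticket {number}: {slug}\n\n{body}\n")
    return path
```

Because the tickets are ordinary files, the history of what was done lives in the repo itself, with no ticket-system MCP server or external tool required.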
| gbnwl wrote:
| +1 to creating tickets by simply asking the agent to. It's
| worked great and larger tasks can be broken down into smaller
| subtasks that could reasonably be completed in a single context
    | window, so you rarely ever have to deal with compaction.
| Especially in the last few months since Claude's gotten good at
| dispatching agents to handle tasks if you ask it to, I can plan
    | large changes that span multiple tickets and tell Claude to
| dispatch agents as needed to handle them (which it will do in
| parallel if they mostly touch different files), keeping the
| main chat relatively clean for orchestration and validation
| work.
| ramoz wrote:
  | Semantic plan names are important.
| srid wrote:
| Regarding inline notes, I use a specific format in the `/plan`
  | command, by using the `ME:` prefix.
|
| https://github.com/srid/AI/blob/master/commands/plan.md#2-pl...
|
  | It works very similarly to Antigravity's plan document comment-
| refine cycle.
|
| https://antigravity.google/docs/implementation-plan
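Based on the description above (the exact format lives in the linked plan.md), an inline-annotated plan might look something like this; the plan content itself is invented for illustration:

```markdown
## Step 2: Add retry logic to the sync worker

- Wrap the HTTP call in an exponential-backoff retry (3 attempts)
  ME: cap the backoff at 30s, and make the attempt count configurable
- Log each retry at WARN level
  ME: INFO is enough here; WARN will spam our alerting
```

The `ME:` lines mark human feedback directly inside the plan, so the agent can distinguish its own draft from the reviewer's corrections on the next iteration.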
| renewiltord wrote:
  | The plan document and todo are artifacts of context size
| limits. I use them too because it allows using /reset and then
| continuing.
| ihsw wrote:
| Kiro's spec-based development looks identical.
|
| https://kiro.dev/docs/specs/
|
| It looks verbose but it defines the requirements based on your
| input, and when you approve it then it defines a design, and
| (again) when you approve it then it defines an implementation
  | plan (a series of tasks).
| jamesmcq wrote:
| This all looks fine for someone who can't code, but for anyone
| with even a moderate amount of experience as a developer all this
| planning and checking and prompting and orchestrating is far more
| work than just writing the code yourself.
|
  | There's no winner for "least amount of code written regardless
  | of productivity outcomes", except for maybe Anthropic's bank
  | account.
| dmix wrote:
| Most of these AI coding articles seem to be about greenfield
| development.
|
| That said, if you're on a serious team writing professional
| software there is still tons of value in always telling AI to
| plan first, unless it's a small quick task. This post just
| takes it a few steps further and formalizes it.
|
| I find Cursor works much more reliably using plan mode,
| reviewing/revising output in markdown, then pressing build.
| Which isn't a ton of overhead but often leads to lots of
| context switching as it definitely adds more time.
| shepherdjerred wrote:
| I really don't understand why there are so many comments like
| this.
|
| Yesterday I had Claude write an audit logging feature to track
| all changes made to entities in my app. Yeah you get this for
| free with many frameworks, but my company's custom setup
| doesn't have it.
|
| It took maybe 5-10 minutes of wall-time to come up with a good
    | plan, and then ~20-30 min for Claude to implement, test, etc.
|
| That would've taken me at least a day, maybe two. I had 4-5
| other tasks going on in other tabs while I waited the 20-30 min
| for Claude to generate the feature.
|
    | After Claude generated the code, I needed to manually test that it
| worked, and it did. I then needed to review the code before
| making a PR. In all, maybe 30-45 minutes of my actual time to
| add a small feature.
|
| All I can really say is... are you sure you're using it right?
| Have you _really_ invested time into learning how to use AI
| tools?
| tyleo wrote:
| Same here. I did bounce off these tools a year ago. They just
| didn't work for me 60% of the time. I learned a bit in that
| initial experience though and walked away with some tasks
| ChatGPT could replace in my workflow. Mainly replacing
| scripts and reviewing single files or functions.
|
| Fast forward to today and I tried the tools again--
| specifically Claude Code--about a week ago. I'm blown away.
| I've reproduced some tools that took me weeks at full-time
| roles in a single day. This is while reviewing every line of
| code. The output is more or less what I'd be writing as a
| principal engineer.
| delusional wrote:
| > The output is more or less what I'd be writing as a
| principal engineer.
|
| I certainly hope this is not true, because then you're not
| competent for that role. Claude Code writes an absolutely
        | incredible amount of unnecessary and superfluous comments, and
        | it makes asinine mistakes like forgetting to update logic
| in multiple places. It'll gladly drop the entire database
| when changing column formats, just as an example.
| tyleo wrote:
| I'm not sure what you're doing or if you've tried the
| tools recently but this isn't even close to my
| experience.
| streetfighter64 wrote:
| I mean, all I can really say is... if writing some logging
| takes you one or two days, are you sure you _really_ know how
| to code?
| shepherdjerred wrote:
| You're right, you're better than me!
|
        | You could've been curious and asked why it would take 1-2
        | days, and I would've happily told you.
| jamesmcq wrote:
| I'll bite, because it does seem like something that
| should be quick in a well-architected codebase. What was
| the situation? Was there something in this codebase that
| was especially suited to AI-development? Large amounts of
| duplication perhaps?
| shepherdjerred wrote:
| It's not particularly interesting.
|
| I wanted to add audit logging for all endpoints we call,
| all places we call the DB, etc. across areas I haven't
| touched before. It would have taken me a while to track
| down all of the touchpoints.
|
| Granted, I am not 100% certain that Claude didn't miss
| anything. I feel fairly confident that it is correct
| given that I had it research upfront, had multiple agents
| review, and it made the correct changes in the areas that
| I knew.
|
| Also I'm realizing I didn't mention it included an API +
            | UI for viewing events w/ pretty deltas.
| fragmede wrote:
        | We're not as good at coding as _you_, naturally.
| boxedemp wrote:
| Ever worked on a distributed system with hundreds of
| millions of customers and seemingly endless business
| requirements?
|
| Some things are complex.
| fendy3002 wrote:
        | Well, someone who says logging is easy has never faced the
        | difficulty of deciding "what" to log. And an audit log is a
        | different beast altogether from normal logging.
| therealdrag0 wrote:
| Audit logging is different than developer logging...
| companies will have entire teams dedicated to audit
| systems.
| jamesmcq wrote:
| Trust me I'm very impressed at the progress AI has made, and
| maybe we'll get to the point where everything is 100% correct
| all the time and better than any human could write. I'm
| skeptical we can get there with the LLM approach though.
|
| The problem is LLMs are great at simple implementation, even
| large amounts of simple implementation, but I've never seen
| it develop something more than trivial correctly. The larger
| problem is it's very often subtly but hugely wrong. It makes
| bad architecture decisions, it breaks things in pursuit of
| fixing or implementing other things. You can tell it has no
| concept of the "right" way to implement something. It very
| obviously lacks the "senior developer insight".
|
| Maybe you can resolve some of these with large amounts of
| planning or specs, but that's the point of my original
| comment - at what point is it easier/faster/better to just
| write the code yourself? You don't get a prize for writing
| the least amount of code when you're just writing specs
| instead.
| nojito wrote:
        | > I've never seen it develop something more than trivial
| correctly.
|
| This is 100% incorrect, but the real issue is that the
| people who are using these llms for non-trivial work tend
| to be extremely secretive about it.
|
| For example, I view my use of LLMs to be a competitive
| advantage and I will hold on to this for as long as
| possible.
| jamesmcq wrote:
| The key part of my comment is "correctly".
|
| Does it write maintainable code? Does it write extensible
| code? Does it write secure code? Does it write performant
| code?
|
| My experience has been it failing most of these. The code
| might "work", but it's not _good_ for anything more than
| trivial, well defined functions (that probably appeared
          | in its training data written by humans). LLMs have a
| fundamental lack of understanding of what they're doing,
| and it's obvious when you look at the finer points of the
| outcomes.
|
| That said, I'm sure you could write detailed enough specs
| and provide enough examples to resolve these issues, but
| that's the point of my original comment - if you're just
| writing specs instead of code you're not gaining
| anything.
| jmathai wrote:
| You'd be building blocks which compound over time. That's
| been my experience anyway.
|
| The compounding is much greater than my brain can do on
| its own.
| cowlby wrote:
| I find "maintainable code" the hardest bias to let go of.
| 15+ years of coding and design patterns are hard to let
| go.
|
| But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms. So maintainable
| has to evolve from good human design patterns to good AI
| patterns.
|
| Specs are worth it IMO. Not because if I can spec, I
| could've coded anyway. But because I gain all the insight
| and capabilities of AI, while minimizing the gotchas and
| edge failures.
| girvo wrote:
| > But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms. So maintainable
| has to evolve from good human design patterns to good AI
| patterns.
|
| How do you square that with the idea that all the code
| still has to be reviewed by humans? Yourself, and your
              | coworkers.
| cowlby wrote:
                | I picture it like semiconductors; the 5nm process is so
                | absurdly complex that operators can't just peek into the
| system easily. I imagine I'm just so used to hand
| crafting code that I can't imagine not being able to peek
| in.
|
| So maybe it's that we won't be reviewing by hand anymore?
| I.e. it's LLMs all the way down. Trying to embrace that
                | style of development lately, as unnatural as it feels.
| We're obv not 100% there yet but Claude Opus is a
| significant step in that direction and they keep getting
| better and better.
| girvo wrote:
| Then who is responsible when (not if) that code does
| horrible things? We have humans to blame right now. I
| just don't see it happening personally because liability
| and responsibility are too important
| therealdrag0 wrote:
| For some software, sure but not most.
|
| And you don't blame humans anyways lol. Everywhere I've
| worked has had "blameless" postmortems. You don't remove
| human review unless you have reasonable alternatives like
| high test coverage and other automated reviews.
| girvo wrote:
| We still have performance reviews and are fired. There's
| a human that is responsible.
|
| "It's AI all the way down" is either nonsense on its
| face, or the industry is dead already.
| Jweb_Guru wrote:
| > But the aha moment for me was what's maintainable by AI
| vs by me by hand are on different realms
|
| I don't find that LLMs are any more likely than humans to
| remember to update all of the places it wrote redundant
| functions. Generally far less likely, actually. So
| forgive me for treating this claim with a massive grain
| of salt.
| reg_dunlop wrote:
| To answer all of your questions:
|
| yes, if I steer it properly.
|
| It's very good at spotting design patterns, and
| implementing them. It doesn't always know where or how to
| implement them, but that's my job.
|
| The specs and syntactic sugar are just nice quality of
| life benefits.
| fourthark wrote:
| This is exactly what the article is about. The tradeoff is
            | that you have to thoroughly review the plans and iterate on
| them, which is tiring. But the LLM will write good code
| faster than you, if you tell it what good code is.
| reg_dunlop wrote:
              | Exactly; the original commenter seems determined to write
              | off AI as "just not as good as me".
|
| The original article is, to me, seemingly not that novel.
| Not because it's a trite example, but because I've begun
| to experience massive gains from following the same basic
              | premise as the article. And I can't believe there are
              | others who aren't using it like this.
|
| I iterate the plan until it's seemingly deterministic,
| then I strip the plan of implementation, and re-write it
| following a TDD approach. Then I read all specs, and
| generate all the code to red->green the tests.
|
| If this commenter is too good for that, then it's that
              | attitude that'll keep him stuck. I already feel like my
              | project backlog is achievable this year.
| fourthark wrote:
| Strongly agree about the deterministic part. Even more
| important than a good design, the plan must not show any
| doubt, whether it's in the form of open questions or
| weasel words. 95% of the time those vague words mean I
| didn't think something through, and it will do something
                | hideous in order to make the plan work.
| Degorath wrote:
| My experience has so far been similar to the root
| commenter - at the stage where you need to have a long
| cycle with planning it's just slower than doing the
| writing + theory building on my own.
|
| It's an okay mental energy saver for simpler things, but
| for me the self review in an actual production code
| context is much more draining than writing is.
|
| I guess we're seeing the split of people for whom
| reviewing is easy and writing is difficult and vice
| versa.
| Kiro wrote:
| > but I've never seen it develop something more than
| trivial correctly.
|
| What are you working on? I personally haven't seen LLMs
| struggle with any kind of problem in months. Legacy
| codebase with great complexity and performance-critical
| code. No issue whatsoever regardless of the size of the
| task.
| hathawsh wrote:
| Several months ago, just for fun, I asked Claude (the web
| site, not Claude Code) to build a web page with a little
| animated cannon that shoots at the mouse cursor with a
| ballistic trajectory. It built the page in seconds, but the
| aim was incorrect; it always shot too low. I told it the
| aim was off. It still got it wrong. I prompted it several
| times to try to correct it, but it never got it right. In
| fact, the web page started to break and Claude was
| introducing nasty bugs.
|
| More recently, I tried the same experiment, again with
| Claude. I used the exact same prompt. This time, the aim
| was exactly correct. Instead of spending my time trying to
| correct it, I was able to ask it to add features. I've
| spent more time writing this comment on HN than I spent
      | optimizing this toy.
      | https://claude.ai/public/artifacts/d7f1c13c-2423-4f03-9fc4-8...
|
| My point is that AI-assisted coding has improved
| dramatically in the past few months. I don't know whether
| it can reason deeply about things, but it can certainly
| imitate a human who reasons deeply. I've never seen any
| technology improve at this rate.
| skydhash wrote:
| > Yesterday I had Claude write an audit logging feature to
| track all changes made to entities in my app. Yeah you get
| this for free with many frameworks, but my company's custom
| setup doesn't have it.
|
      | But did you truly think through such a feature? Like the
      | guarantees it should provide (e.g. how it should cope with
      | entity migrations like adding a new field), or the cost of
      | maintaining it further down the line. This looks
      | suspiciously like a drive-by PR made on open-source projects.
|
| > That would've taken me at least a day, maybe two.
|
| I think those two days would have been filled with research,
| comparing alternatives, questions like "can we extract this
| feature from framework X?", discussing ownership and sharing
      | knowledge... Jumping straight into coding was done before
      | LLMs too, but it usually hurts the long-term viability of the
      | project.
|
| Adding code to a project can be done quite fast
      | (hackathons,...); ensuring quality is what slows things down
      | in any well-functioning team.
| hghbbjh wrote:
| > In all, maybe 30-45 minutes of my actual time to add a
| small feature
|
| Why would this take you multiple days to do if it only took
| you 30m to review the code? Depends on the problem, but if
| I'm able to review something the time it'd take me to write
| it is usually at most 2x more worst case scenario - often
| it's about equal.
|
| I say this because after having used these tools, most of the
| speed ups you're describing come at the cost of me not
| actually understanding or thoroughly reviewing the code. And
      | this is corroborated by high-output LLM users - you have
| to trust the agent if you want to go fast.
|
| Which is fine in some cases! But for those of us who have
| jobs where we are personally responsible for the code, we
| can't take these shortcuts.
| keyle wrote:
| I partly agree with you. But once you have a codebase large
    | enough, the changes take longer to even type in, once figured
| out.
|
| I find the best way to use agents (and I don't use claude) is
| to hash it out like I'm about to write these changes and I make
| my own mental notes, and get the agent to execute on it.
|
| Agents don't get tired, they don't start fat fingering stuff at
| 4pm, the quality doesn't suffer. And they can be parallelised.
|
| Finally, this allows me to stay at a higher level and not get
    | bogged down in "right oh did we do this simple thing again?"
| which wipes some of the context in my mind and gets tiring
| through the day.
|
| Always, 100% review every line of code written by an agent
| though. I do not condone committing code you don't 'own'.
|
| I'll never agree with a job that forces developers to use 'AI',
| I sometimes like to write everything by hand. But having this
| tool available is also very powerful.
| jamesmcq wrote:
| I want to be clear, I'm not against any use of AI. It's
| hugely useful to save a couple of minutes of "write this
| specific function to do this specific thing that I could
| write and know exactly what it would look like". That's a
| great use, and I use it all the time! It's better
| autocomplete. Anything beyond that is pushing it - at the
| moment! We'll see, but spending all day writing specs and
| double-checking AI output is not more productive than just
| writing correct code yourself the first time, even if you're
| AI-autocompleting some of it.
| skeledrew wrote:
| For the last few days I've been working on a personal
| project that's been on ice for at least 6 years. Back when
| I first thought of the project and started implementing it,
| it took maybe a couple weeks to eke out some minimally
| working code.
|
| This new version that I'm doing (from scratch with ChatGPT
| web) has a far more ambitious scope and is already at the
| "usable" point. Now I'm primarily solidifying things and
| increasing test coverage. And I've tested the key parts
| with IRL scenarios to validate that it's not just passing
| tests; the thing actually fulfills its intended function so
| far. Given the increased scope, I'm guessing it'd take me a
| few months to get to this point on my own, instead of under
| a week, and the quality wouldn't be where it is. Not saying
| I haven't had to wrangle with ChatGPT on a few bugs, but
| after a decent initial planning phase, my prompts now are
| primarily "Do it"s and "Continue"s. Would've likely already
| finished it if I wasn't copying things back and forth
| between browser and editor, and being forced to pause when
| I hit the message limit.
| keyle wrote:
          | This is a great comeback story. I have had a similar
| experience with a photoshop demake of mine.
|
| I recommend to try out Opencode with this approach, you
| might find it less tiring than ChatGPT web (yes it works
| with your ChatGPT Plus sub).
| Quothling wrote:
| I think it comes down to "it depends". I work in a NIS2
      | regulated field and we're quite challenged by the fact that it
| means we can't give AI's any sort of real access because of
      | the security risk. To be compliant we'd have to have the AI
      | agent ask permission for every single thing it does, before
      | it does it, and four-eye review it. Which is obviously never
      | going to happen. We can discuss how badly the NIS2 four-eye
| requirement works in the real world another time, but
| considering how easy it is to break AI security, it might not
| be something we can actually ever use. This makes sense on
| some of the stuff we work on, since it could bring an entire
| powerplant down. On the flip-side AI risks would be of little
| concern on a lot of our internal tools, which are basically
| non-regulated and unimportant enough that they can be down
| for a while without costing the business anything beyond
| annoyances.
|
      | This is where our challenges are. We've built our own chatbot
| where you can "build" your own agent within the librechat
| framework and add a "skill" to it. I say "skill" because it's
      | older than Claude skills but does exactly the same. I don't
      | completely buy the author's:
|
| > "deeply", "in great details", "intricacies", "go through
| everything"
|
| bit, but you can obviously save a lot of time by writing a
      | piece of English which tells it what sort of environment you
| work in. It'll know that when I write Python I use UV, Ruff
| and Pyrefly and so on as an example. I personally also have a
| "skill" setting that tells the AI not to compliment me
      | because I find that ridiculously annoying, and that certainly
| works. So who knows? Anyway, employees are going to want
| more. I've been doing some PoC's running open source models
| in isolation on a raspberry pi (we had spares because we use
| them in IoT projects) but it's hard to setup an isolation
| policy which can't be circumvented.
|
| We'll have to figure it out though. For powerplant critical
| projects we don't want to use AI. But for the web tool that
| allows a couple of employees to upload three excel files from
| an external accountant and then generate some sort of report
| on them? Who cares who writes it or even what sort of quality
| it's written with? The lifecycle of that tool will probably
      | be something that never changes until the external accountant
| does and then the tool dies. Not that it would have
| necessarily been written in worse quality without AI... I
| mean... Have you seen some of the stuff we've written in the
| past 40 years?
| kburman wrote:
| Since Opus 4.5, things have changed quite a lot. I find LLMs
| very useful for discussing new features or ideas, and Sonnet is
| great for executing your plan while you grab a coffee.
| skeledrew wrote:
    | Researching and planning a project is a generally useful
| thing. This is something I've been doing for years, and have
| always had great results compared to just jumping in and
| coding. It makes perfect sense that this transfers to LLM use.
| phantomathkg wrote:
| Surely Addy Osmani can code. Even he suggests plan first.
|
| https://news.ycombinator.com/item?id=46489061
| skydhash wrote:
| > planning and checking and prompting and orchestrating is far
| more work than just writing the code yourself.
|
| This! Once I'm familiar with the codebase (which I strive to do
| very quickly), for most tickets, I usually have a plan by the
| time I've read the description. I can have a couple of
    | implementation questions, but I know where the info is located
    | in the codebase. For things I only have a vague idea about,
    | the whiteboard is where I go.
|
| The nice thing with such a mental plan, you can start with a
| rougher version (like a drawing sketch). Like if I'm starting a
| new UI screen, I can put a placeholder text like "Hello,
| world", then work on navigation. Once that done, I can start to
| pull data, then I add mapping functions to have a view
| model,...
|
| Each step is a verifiable milestone. Describing them is more
| mentally taxing than just writing the code (which is a flow
    | state for me). Why? Because English is not fit to describe how
    | a computer works (try describing a finite state machine like a
    | navigation flow in natural language). My mental model is
    | already aligned to code; writing the solution in natural
    | language is asking me to be ambiguous and unclear on purpose.
| roncesvalles wrote:
| Well it's less mental load. It's like Tesla's FSD. Am I a
| better driver than the FSD? For sure. But is it nice to just
| sit back and let it drive for a bit even if it's suboptimal and
| gets me there 10% slower, and maybe slightly pisses off the guy
| behind me? Yes, nice enough to shell out $99/mo. Code
| implementation takes a toll on you in the same way that driving
| does.
|
| I think the method in TFA is overall less stressful for the
| dev. And you can always fix it up manually in the end; AI
| coding vs manual coding is not either-or.
| stealthyllama wrote:
    | There is a miscommunication happening: this entire time we all
    | had surprisingly different ideas about what quality of work is
| acceptable which seems to account for differences of opinion on
| this stuff.
| psvv wrote:
| I'd find it deeply funny if the optimal vibe coding workflow
| continues to evolve to include more and more human oversight,
| and less and less agent autonomy, to the point where eventually
| someone makes a final breakthrough that they can save time by
| bypassing the LLM entirely and writing the code themselves.
| (Finally coming full circle.)
| pjio wrote:
| You mean there will be an invention to edit files directly
| instead of giving the specific code and location you want it
| to be written into the prompt?
| ramoz wrote:
| One thing for me has been the ability to iterate over plans -
  | with a better visual of them as well as the ability to annotate
| feedback about the plan.
|
  | Plannotator does this really effectively and natively through
  | hooks: https://github.com/backnotprop/plannotator
| prodtorok wrote:
| Wow, I've been needing this! The one issue I've had with
| terminals is reviewing plans, and desiring the ability to
| provide feedback on specific plan sections in a more organized
| way.
|
| Really nice ui based on the demo.
| haolez wrote:
| > Notice the language: "deeply", "in great details",
| "intricacies", "go through everything". This isn't fluff. Without
| these words, Claude will skim. It'll read a file, see what a
| function does at the signature level, and move on. You need to
| signal that surface-level reading is not acceptable.
|
| This makes no sense to my intuition of how an LLM works. It's not
| that I don't believe this works, but my mental model doesn't
| capture why asking the model to read the content "more deeply"
| will have any impact on whatever output the LLM generates.
| fragmede wrote:
| Yeah, it's definitely a strange new world we're in, where I
| have to "trick" the computer into cooperating. The other day I
| told Claude "Yes you can", and it went off and did something it
| just said it couldn't do!
| itypecode wrote:
| Solid dad move. XD
| wilkystyle wrote:
| Is parenting making us better at prompt engineering, or is
| it the other way around?
| fragmede wrote:
| Better yet, I have Codex, Gemini, and Claude as my kids,
| running around in my code playground. How do I be a good
| parent and not play favorites?
| itypecode wrote:
| We all know Gemini is your artsy, Claude is your
| smartypants, and Codex is your nerd.
| bpodgursky wrote:
| You bumped the token predictor into the latent space where it
| knew what it was doing : )
| optimalsolver wrote:
| The little language model that could.
| jcdavis wrote:
| Its a wild time to be in software development. Nobody(1)
| actually knows what causes LLMs to do certain things, we just
| pray the prompt moves the probabilities the right way enough
| such that it mostly does what we want. This used to be a field
| that prided itself on deterministic behavior and
| reproducibility.
|
| Now? We have AGENTS.md files that look like a parent talking to
| a child with all the bold all-caps, double emphasis, just
| praying that's enough to be sure they run the commands you want
| them to be running
|
| (1 Outside of some core ML developers at the big model
| companies)
| chickensong wrote:
| For Claude at least, the more recent guidance from Anthropic
| is to not yell at it. Just clear, calm, and concise
| instructions.
| trueno wrote:
| wait seriously? lmfao
|
| thats hilarious. i definitely treat claude like shit and
| ive noticed the falloff in results.
|
| if there's a source for that i'd love to read about it.
| defrost wrote:
| Consciousness is off the table but they absolutely
| respond to environmental stimulus and vibes.
|
| See, uhhh,
| https://pmc.ncbi.nlm.nih.gov/articles/PMC8052213/ and
| maybe have a shot at running claude while playing _Enya_
| albums on loop.
|
| /s (??)
| trueno wrote:
| i have like the faintest vague thread of "maybe this
| actually checks out" in a way that has shit all to do
| with consciousness
|
| sometimes internet arguments get messy, people die on
| their hills and double / triple down on internet message
| boards. since historic internet data composes a bit of
| what goes into an llm, would it make sense that bad-juju
| prompting sends it to some dark corners of its training
| model if implementations don't properly sanitize certain
| negative words/phrases ?
|
| in some ways llm stuff is a very odd mirror that
| haphazardly regurgitates things resulting from the many
| shades of gray we find in human qualities.... but
| presents results as matter of fact. the amount of
| internet posts with possible code solutions and more
| where people egotistically die on their respective hills
| that have made it into these models is probably off the
| charts, even if the original content was a far cry from a
| sensible solution.
|
| all in all llm's really do introduce quite a bit of a
| black box. lot of benefits, but a ton of unknowns and one
            | must be hypervigilant to the possible pitfalls of these
| things... but more importantly be self aware enough to
| understand the possible pitfalls that these things
| introduce to the person using them. they really possibly
| dangerously capitalize on everyones innate need to want
| to be a valued contributor. it's really common now to see
| so many people biting off more than they can chew, often
| times lacking the foundations that would've normally had
| a competent engineer pumping the brakes. i have a lot of
| respect/appreciation for people who might be doing a bit
| of claude here and there but are flat out forward about
| it in their readme and very plainly state to not have any
| high expectations because _they_ are aware of the risks
| involved here. i also want to commend everyone who writes
| their own damn readme.md.
|
| these things are for better or for worse great at causing
| people to barrel forward through 'problem solving', which
| is presenting quite a bit of gray area on whether or not
| the problem is actually solved / how can you be sure / do
| you understand how the fix/solution/implementation works
| (in many cases, no). this is why exceptional software
| engineers can use this technology insanely proficiently
| as a supplementary worker of sorts but others find
| themselves in a design/architect seat for the first time
| and call tons of terrible shots throughout the course of
| what it is they are building. i'd at least like to call
| out that people who feel like they "can do everything on
| their own and don't need to rely on anyone" anymore seem
| to have lost the plot entirely. there are facets of that
| statement that might be true, but less collaboration
| especially in organizations is quite frankly the first
| steps some people take towards becoming delusional. and
| that is always a really sad state of affairs to watch
| unfold. doing stuff in a vacuum is fun on your own time,
| but forcing others to just accept things you built in a
| vacuum when you're in any sort of team structure is
| insanely immature and honestly very destructive/risky. i
| would like to think absolutely no one here is surprised
| that some sub-orgs at Microsoft force people to use
| copilot or be fired, very dangerous path they tread there
| as they bodyslam into place solutions that are not well
| understood. suddenly all the leadership decisions that
| many companies have made to once again bring back a
| before-times era of offshoring work make sense: they
| think with these technologies existing the subordinate
| culture of overseas workers combined with these techs
| will deliver solutions no one can push back on. great
| savings and also no one will say no.
| xmcp123 wrote:
| For a while (maybe a year ago?) it seemed like verbal abuse
| was the best way to make Claude pay attention. In my
| head, it was impacting how important it deemed the
| instruction. And it definitely did seem that way.
| basch wrote:
| If you think about where in the training data there is
| positivity vs negativity it really becomes equivalent to
| having a positive or negative mindset regarding a
| standing and outcome in life.
| chickensong wrote:
| I don't have a source offhand, but I think it may have
| been part of the 4.5 release? Older models definitely
| needed caps and words like critical, important, never,
| etc... but Anthropic published something that said don't
| do that anymore.
| whateveracct wrote:
| i make claude grovel at my feet and tell me in detail why
| my code is better than its code
| joshmn wrote:
| Sometimes I daydream about people screaming at their LLM as
| if it was a TV they were playing video games on.
| glerk wrote:
| Yep, with Claude saying "please" and "thank you" actually
| works. If you build rapport with Claude, you get rewarded
| with intuition and creativity. Codex, on the other hand,
| you have to slap it around like a slave gollum and it will
| do exactly what you tell it to do, no more, no less.
| whateveracct wrote:
| this is psychotic why is this how this works lol
| hugh-avherald wrote:
| Speculation only obviously: highly-charged conversations
| cause the discussion to be channelled to general human
| mitigation techniques and for the 'thinking agent' to be
| diverted to continuations from text concerned with the
| general human emotional experience.
| harrall wrote:
| It's like playing a fretless instrument to me.
|
| Practice playing songs by ear and after 2 weeks, my brain has
| developed an inference model of where my fingers should go to
| hit any given pitch.
|
| Do I have any idea how my brain's model works? No! But it
| tickles a different part of my brain and I like it.
| klipt wrote:
| Sufficiently advanced technology has become like magic: you
| have to prompt the electronic genie with the right words or
| it will twist your wishes.
| silversmith wrote:
| Light some incense, and you too can be a dystopian space
| tech support, today! Praise Omnissiah!
| overfeed wrote:
| are we the orks?
| wilkystyle wrote:
| The author is referring to how the framing of your prompt
| informs the attention mechanism. You are essentially hinting to
| the attention mechanism that the function's implementation
| details have important context as well.
| MattGaiser wrote:
| One of the well defined failure modes for AI agents/models is
| "laziness." Yes, models can be "lazy" and that is an actual
| term used when reviewing them.
|
| I am not sure if we know why really, but they are that way and
| you need to explicitly prompt around it.
| kannanvijayan wrote:
| I've encountered this failure mode, and the opposite of it:
| thinking too much. A behaviour I've come to see as some sort
| of pseudo-neuroticism.
|
| Lazy thinking makes LLMs do surface analysis and then produce
| things that are wrong. Neurotic thinking will see them over-
| analyze, and then repeatedly second-guess themselves,
| repeatedly re-derive conclusions.
|
| Something very similar to an anxiety loop in humans, where
| problems without solutions are obsessed about in circles.
| denimnerd42 wrote:
| yeah i experienced this the other day when asking claude
| code to build an http proxy using an afsk modem software to
| communicate over the computers sound card. it had an
| absolute fit tuning the system and would loop for hours
| trying and doubling back. eventually after some change in
| prompt direction to think more deeply and test more
| comprehensively it figured it out. i certainly had no idea
| how to build an afsk modem.
| ChadNauseam wrote:
| The disconnect might be that there is a separation between
| "generating the final answer for the user" and
| "researching/thinking to get information needed for that
| answer". Saying "deeply" prompts it to read more of the file
| (as in, actually use the `read` tool to grab more parts of the
| file into context), and generate more "thinking" tokens (as in,
| tokens that are not shown to the user but that the model writes
| to refine its thoughts and improve the quality of its answer).
| hashmap wrote:
| these sort-of-lies might help:
|
| think of the latent space inside the model like a topological
| map, and when you give it a prompt, you're dropping a ball at a
| certain point above the ground, and gravity pulls it along the
| surface until it settles.
|
| caveat though, that's nice per-token, but the signal gets messed
| up by picking a token from a distribution, so each token you're
| regenerating and re-distorting the signal. leaning on language
| that places that ball deep in a region that you want to be
| makes it less likely that those distortions will kick it out of
| the basin or valley you may want to end up in.
|
| if the response you get is 1000 tokens long, the initial
| trajectory needed to survive 1000 probabilistic filters to get
| there.
|
| or maybe none of that is right lol but thinking that it is has
| worked for me, which has been good enough
| noduerme wrote:
| Hah! Reading this, my mind inverted it a bit, and I realized
| ... it's like the claw machine theory of gradient descent. Do
| you drop the claw into the deepest part of the pile, or where
| there's the thinnest layer, the best chance of grabbing
| something specific? Everyone in every bar has a theory about
| claw machines. But the really funny thing that unites LLMs
| with claw machines is that the biggest question is always
| whether they dropped the ball on purpose.
|
| The claw machine is also a sort-of-lie, of course. Its main
| appeal is that it offers the illusion of control. As a former
| designer and coder of online slot machines... totally spin
| off into pages on this analogy, about how that illusion gets
| you to keep pulling the lever... but the geographic rendition
| you gave is sort of priceless when you start making the
| comparison.
| basch wrote:
| My mental model for them is plinko boards. Your prompt
| changes the spacing between the nails to increase the
| probability in certain directions as your chip falls down.
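The plinko mental model above can be sketched in a few lines as a biased random walk, where the prompt plays the role of nudging the per-peg probabilities (the numbers here are made up purely to illustrate the analogy, not anything about a real model):

```python
import random

def plinko(bias: float, rows: int = 20, trials: int = 10_000, seed: int = 0) -> float:
    """Drop chips through `rows` pegs; `bias` is the chance of stepping right
    at each peg. Returns the average landing position. A neutral board
    (bias=0.5) centers the distribution around 0; nudging the bias -- like
    wording a prompt differently -- shifts where most chips end up."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # each peg moves the chip one step left (-1) or right (+1)
        pos = sum(1 if rng.random() < bias else -1 for _ in range(rows))
        total += pos
    return total / trials

neutral = plinko(bias=0.5)  # lands near 0 on average
nudged = plinko(bias=0.7)   # distribution shifted well to the right
print(neutral, nudged)
```

The point of the analogy: you never control an individual chip (token), only the shape of the distribution it falls through.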
| hashmap wrote:
| i literally suggested this metaphor to someone yesterday
| who was trying to get agents to do what they wanted: set
| up your guardrails in a way that lets the agents do what
| they're good at, and you'll get better results because
| you're not sitting there looking at them.
|
| i think probably once you start seeing that the behavior
| falls right out of the geometry, you just start looking at
| stuff like that. still funny though.
| stingraycharles wrote:
| It's actually really common. If you look at Claude Code's own
| system prompts written by Anthropic, they're littered with
| "CRITICAL (RULE 0):" type of statements, and other similar
| prompting styles.
| Scrapemist wrote:
| Where can I find those?
| stingraycharles wrote:
| This analysis is a good starting point:
| https://southbridge-research.notion.site/Prompt-
| Engineering-...
| Betelbuddy wrote:
| Its very logical and pretty obvious when you do code
| generation. If you ask the same model, to generate code by
| starting with:
|
| - You are a Python Developer... or
| - You are a Professional Python Developer... or
| - You are one of the world's most renowned Python Experts,
| with several books written on the subject, and 15 years
| of experience in creating highly reliable production
| quality code...
|
| You will notice a clear improvement in the quality of the
| generated artifacts.
| obiefernandez wrote:
| My colleague swears by his DHH claude skill
| https://danieltenner.com/dhh-is-immortal-and-costs-200-m/
| haolez wrote:
| That's different. You are pulling the model, semantically,
| closer to the problem domain you want it to attack.
|
| That's very different from "think deeper". I'm just curious
| about this case in specific :)
| argee wrote:
| I don't know about some of those "incantations", but it's
| pretty clear that an LLM can respond to "generate twenty
| sentences" vs. "generate one word". That means you can
| indeed coax it into more verbosity ("in great detail"), and
| that can help align the output by having more relevant
| context (inserting irrelevant context or something entirely
| improbable into LLM output and forcing it to continue from
| there makes it clear how detrimental that can be).
|
| Of course, that doesn't mean it'll definitely be _better_ ,
| but if you're making an LLM chain it seems prudent to
| preserve whatever info you can at each step.
| gehsty wrote:
| Do you think that Anthropic don't include things like this in
| their harness / system prompts? I feel like this kind of
| prompts are uneccessary with Opus 4.5 onwards, obviously
| based on my own experience (I used to do this, on switching
| to opus I stopped and have implemented more complex problems,
| more successfully).
|
| I am having the most success describing what I want as
| humanly as possible, describing outcomes clearly, making sure
| the plan is good and clearing context before implementing.
| hu3 wrote:
| Maybe, but forcing code generation in a certain way could
| ruin hello worlds and simpler code generation.
|
| Sometimes the user just wants something simple instead of
| enterprise grade.
| popalchemist wrote:
| Strings of tokens are vectors. Vectors are directions. When you
| use a phrase like that you are orienting the vector of the
| overall prompt toward the direction of depth, in its map of
| conceptual space.
| nostrademons wrote:
| It's the attention mechanism at work, along with a fair bit of
| Internet one-up-manship. The LLM has ingested all of the text
| on the Internet, as well as Github code repositories, pull
| requests, StackOverflow posts, code reviews, mailing lists,
| etc. In a number of those content sources, there will be people
| saying "Actually, if you go into the details of..." or "If you
| look at the intricacies of the problem" or "If you understood
| the problem deeply" followed by a very deep, expert-level
| explication of exactly what you should've done differently. You
| want the model to use the code in the correction, not the one
| in the original StackOverflow question.
|
| Same reason that "Pretend you are an MIT professor" or "You are
| a leading Python expert" or similar works in prompts. It tells
| the model to pay attention to the part of the corpus that has
| those terms, weighting them more highly than all the other
| programming samples that it's run across.
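The "pay attention to the part of the corpus" idea above can be made concrete with a toy version of scaled dot-product attention. The vectors below are invented 4-d stand-ins for embeddings, just to show that keys pointing in a similar direction to the query receive more weight:

```python
import math

def attention_weights(q, keys):
    """Scaled dot-product attention weights: softmax(q . k / sqrt(d)).
    Keys aligned with the query direction get larger weights -- the sense
    in which phrases in a prompt 'pull' the model toward matching text."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings: the query points roughly the same way as key 0.
q = [1.0, 0.2, 0.0, 0.0]
keys = [[0.9, 0.3, 0.0, 0.1],   # e.g. an "expert-level explanation" passage
        [0.0, 0.1, 1.0, 0.8]]   # e.g. a "beginner question" passage
w = attention_weights(q, keys)
print(w)  # the first key receives the larger weight
```

A real transformer does this with learned projections across many heads and layers, but the core weighting step is this softmax over similarities.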
| r0b05 wrote:
| This is such a good explanation. Thanks
| xscott wrote:
| Of course I can't be certain, but I think the "mixture of
| experts" design plays into it too. Metaphorically, there's a
| mid-level manager who looks at your prompt and tries to
| decide which experts it should be sent to. If he thinks you
| won't notice, he saves money by sending it to the
| undergraduate intern.
|
| Just a theory.
| victorbjorklund wrote:
| Note that MoE isn't different experts for different types
| of problems. Routing happens per token and isn't really
| connected to the problem type.
|
| So if you send it some Python code, the first token might
| be routed to one expert, the second to another, and so on.
| dotancohen wrote:
| Can you back this up with documentation? I don't believe
| that this is the case.
| pixelmelt wrote:
| Check out Unsloth's REAP models: you can outright delete a
| few of the lesser-used experts without the model going
| braindead, since they can all handle each token but some
| are better positioned to do so.
| manmal wrote:
| I don't think this is a result of the base training data
| ("the internet"). It's a post-training behavior, created
| during reinforcement learning. Codex has totally different
| behavior in that regard: by default, Codex reads a lot of
| potentially relevant files before it goes and writes files.
|
| Maybe you remember that, without reinforcement learning, the
| models of 2019 just completed the sentences you gave them.
| There were no tool calls like reading files. Tool calling
| behavior is company specific and highly tuned to their
| harnesses. How often they call a tool, is not part of the
| base training data.
| spagettnet wrote:
| Modern LLMs are certainly fine-tuned on data that includes
| examples of tool use, mostly the tools built into their
| respective harnesses, but also external/mock tools so they
| don't overfit on only using the toolset they expect to see
| in their harnesses.
| manmal wrote:
| IDK the current state, but I remember that, last year,
| the open source coding harnesses needed to provide
| exactly the tools that the LLM expected, or the error
| rate went through the roof. Some, like grok and gemini,
| only recently managed to make tool calls somewhat
| reliable.
| hbarka wrote:
| >> Same reason that "Pretend you are an MIT professor" or
| "You are a leading Python expert" or similar works in
| prompts.
|
| This pretend-you-are-a-[persona] is cargo cult prompting at
| this point. The persona framing is just decoration.
|
| A brief purpose statement describing what the skill
| [skill.md] does is more honest and just as effective.
| rescbr wrote:
| I think it does more harm than good on recent models. The
| LLM has to override its system prompt to role-play, wasting
| context and computing cycles instead of working on the
| task.
| dakolli wrote:
| You will never convince me that this isn't confirmation bias,
| or the equivalent of a slot machine player thinking the order
| in which they push buttons impacts the output, or some other
| gambler-esque superstition.
|
| These tools are literally designed to make people behave like
| gamblers. And it's working, except the house in this case
| takes the money you give them and lights it on fire.
| nubg wrote:
| Your ignorance is my opportunity. May I ask which markets
| you are developing for?
| dakolli wrote:
| "The equivalent of saying, which slot machine were you
| sitting at It'll make me money"
| ambicapter wrote:
| Maybe the training data that included the words like "skim"
| also provided shallower analysis than training that was close
| to the words "in great detail", so the LLM is just reproducing
| those respective words distribution when prompted with
| directions to do either.
| scuff3d wrote:
| How anybody can read stuff like this and still take all this
| seriously is beyond me. This is becoming the engineering
| equivalent of astrology.
| fragmede wrote:
| Feel free to run your own tests and see if the magic phrases
| do or do not influence the output. Have it make a Todo webapp
| with and without those phrases and see what happens!
| scuff3d wrote:
| That's not how it works. It's not on everyone else to prove
| claims false, it's on you (or the people who argue any of
| this had a measurable impact) to prove it actually works.
| I've seen a bunch of articles like this, and more comments.
| Nobody I've ever seen has produced any kind of measurable
| metrics of quality based on one approach vs another. It's
| all just vibes.
|
| Without something quantifiable it's not much better than
| someone who always wears the same jersey when their
| favorite team plays, and swears they play better because of
| it.
| tokioyoyo wrote:
| Do you actively use LLMs to do semi-complex coding work?
| Because if not, it will sound mumbo-jumbo to you.
| Everyone else can nod along and read on, as they've
| experienced all of it first hand.
| scuff3d wrote:
| You've missed the point. This isn't engineering, it's
| gambling.
|
| You could take the exact same documents, prompts, and
| whatever other bullshit, run it on the exact same agent
| backed by the exact same model, and get different results
| every single time. Just like you can roll dice the exact
| same way on the exact same table and you'll get two
| totally different results. People are doing their best to
| constrain that behavior by layering stuff on top, but the
| foundational tech is flawed (or at least ill suited for
| this use case).
|
| That's not to say that AI isn't helpful. It certainly is.
| But when you are basically begging your tools to please
| do what you want with magic incantations, we've lost the
| fucking plot somewhere.
| gf000 wrote:
| > You could take the exact same documents, prompts, and
| whatever other bullshit, run it on the exact same agent
| backed by the exact same model, and get different results
| every single time
|
| This is more of an implementation detail, done this way to
| get better results. A neural network with fixed weights
| (and deterministic floating-point operations) returning a
| probability distribution, where you use a pseudorandom
| generator with a fixed seed called recursively, will
| always return the same output for the same input.
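The reproducibility claim above is easy to demonstrate with a toy decoder: repeatedly sampling tokens from a fixed distribution with a seeded generator always yields the same sequence. This is a sketch, not a real LLM sampler; the vocabulary and probabilities are invented:

```python
import random

def decode(probs, vocab, n, seed):
    """Sample `n` tokens from a fixed distribution with a seeded PRNG.
    Same seed + same distribution => same sequence every time. The
    nondeterminism of production systems comes from unseeded sampling
    (and serving-side effects like batching), not the network itself."""
    rng = random.Random(seed)
    return [rng.choices(vocab, weights=probs)[0] for _ in range(n)]

vocab = ["the", "cat", "sat"]
probs = [0.5, 0.3, 0.2]
a = decode(probs, vocab, 10, seed=42)
b = decode(probs, vocab, 10, seed=42)
assert a == b  # identical runs given an identical seed and input
```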
| geoelectric wrote:
| I think that's a pretty bold claim, that it'd be
| different every time. I'd think the output would converge
| on a small set of functionally equivalent designs, given
| sufficiently rigorous requirements.
|
| And even a human engineer might not solve a problem the
| same way twice in a row, based on changes in recent
| inspirations or tech obsessions. What's the difference,
| as long as it passes review and does the job?
| guiambros wrote:
| If you read the transformer paper, or get any book on
| NLP, you will see that this is not magic incantation;
| it's purely the attention mechanism at work. Or you can
| just ask Gemini or Claude why these prompts work.
|
| But I get the impression from your comment that you have
| a fixed idea, and you're not really interested in
| understanding how or why it works.
|
| If you think like a hammer, everything will look like a
| nail.
| scuff3d wrote:
| I know why it works, to varying and unmeasurable degrees
| of success. Just like if I poke a bull with a sharp
| stick, I know it's going to get its attention. It might
| choose to run away from me in any number of directions,
| or it might decide to turn around and gore me to death.
| I can't answer that question with any more certainty
| than you can.
|
| The system is inherently non-deterministic. Just because
| you can guide it a bit, doesn't mean you can predict
| outcomes.
| winrid wrote:
| But we can predict the outcomes, though. That's what
| we're saying, and it's true. Maybe not 100% of the time,
| but maybe it helps a significant amount of the time and
| that's what matters.
|
| Is it engineering? Maybe not. But neither is knowing how
| to talk to junior developers so they're productive and
| don't feel bad. The engineering is at other levels.
| imiric wrote:
| > But we can predict the outcomes [...] Maybe not 100% of
| the time
|
| So 60% of the time, it works every time.
|
| ... This fucking industry.
| guiambros wrote:
| > _The system is inherently non-deterministic._
|
| The system isn't randomly non-deterministic; it is
| statistically probabilistic.
|
| The next-token prediction and the attention mechanism is
| actually a rigorous deterministic mathematical process.
| The variation in output comes from how we sample from
| that curve, and the temperature used to calibrate the
| model. Because the underlying probabilities are
| mathematically calculated, the system's behavior remains
| highly predictable _within statistical bounds_.
|
| Yes, it's a departure from the fully deterministic
| systems we're used to. But that's not different than the
| many real world systems: weather, biology, robotics,
| quantum mechanics. Even the computer you're reading this
| right now is full of probabilistic processes, abstracted
| away through sigmoid-like functions that push the
| extremes to 0s and 1s.
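The temperature mechanism described above is a small transformation on the logits before sampling: dividing by the temperature sharpens or flattens the softmax. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities. Low temperature sharpens the
    distribution toward the top token (near-greedy); high temperature
    flattens it toward uniform, which is where output variety comes from."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # much closer to uniform
print(cold[0], hot[0])  # top-token probability shrinks as temperature rises
```

The underlying probabilities are computed deterministically; only the draw from this distribution introduces variation, which is the "statistically probabilistic" distinction being made.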
| imiric wrote:
| A lot of words to say that for all intents and
| purposes... it's nondeterministic.
|
| > Yes, it's a departure from the fully deterministic
| systems we're used to.
|
| A system either produces the same output given the same
| input[1], or doesn't.
|
| LLMs are nondeterministic _by design_. Sure, you can
| configure them with a zero temperature, a static seed,
| and so on, but they 're of no use to anyone in that
| configuration. The nondeterminism is what gives them the
| illusion of "creativity", and other useful properties.
|
| Classical computers, compilers, and programming languages
| are deterministic _by design_ , even if they do contain
| complex logic that may affect their output in
| unpredictable ways. There's a world of difference.
|
| [1]: Barring misbehavior due to malfunction, corruption
| or freak events of nature (cosmic rays, etc.).
| hu3 wrote:
| Humans are nondeterministic.
|
| So this is a moot point and a futile exercise in arguing
| semantics.
| yaku_brang_ja wrote:
| These coding agents are literally Language Models. The
| way you structure your prompting language affects the
| actual output.
| energy123 wrote:
| Anthropic recommends doing magic invocations:
| https://simonwillison.net/2025/Apr/19/claude-code-best-
| pract...
|
| It's easy to know why they work. The magic invocation
| increases test-time compute (easy to verify yourself - try!).
| And an increase in test-time compute is demonstrated to
| increase answer correctness (see any benchmark).
|
| It might surprise you to know that the only difference
| GPT 5.2-low and GPT 5.2-xhigh is one of these magic
| invocations. But that's not supposed to be public knowledge.
| gehsty wrote:
| I think this was more of a thing on older models. Since I
| started using Opus 4.5 I have not felt the need to do this.
| cloudbonsai wrote:
| The evolution of software engineering is fascinating to me.
| We started by coding in thin wrappers over machine code and
| then moved on to higher-level abstractions. Now, we've
| reached the point where we discuss how we should talk to a
| mystical genie in a box.
|
| I'm not being sarcastic. This is absolutely incredible.
| intrasight wrote:
| And I've been at it long enough to go through that whole
| progression, actually from the earlier step of writing
| machine code. It's been, and continues to be, a fun
| journey, which is why I'm still working.
| sumedh wrote:
| We have tests and benchmarks to measure it though.
| giancarlostoro wrote:
| The LLM will do what you ask of it, provided you get
| nuanced about it. Others and I have noticed that LLMs work
| better when your codebase is not full of code smells like
| massive god-class files; if your codebase is discrete and
| broken up in a way that makes sense, and fits in your head,
| it will fit in the model's head.
| winwang wrote:
| Apparently LLM quality is sensitive to emotional stimuli?
|
| "Large Language Models Understand and Can be Enhanced by
| Emotional Stimuli": https://arxiv.org/abs/2307.11760
| nazgul17 wrote:
| It's very much believable, to me.
|
| In image generation, it's fairly common to add "masterpiece",
| for example.
|
| I don't think of the LLM as a smart assistant that knows what I
| want. When I tell it to write some code, how does it know I
| want it to write the code like a world renowned expert would,
| rather than a junior dev?
|
| I mean, certainly Anthropic has tried hard to make the former
| the case, but the Titanic inertia from internet scale data bias
| is hard to overcome. You can help the model with these hints.
|
| Anyway, luckily this is something you can empirically verify.
| This way, you don't have to take anyone's word. If anything, if
| you find I'm wrong in your experiments, please share it!
| pixelmelt wrote:
| Its effectiveness is even more apparent with older smaller
| LLMs, people who interact with LLMs now never tried to
| wrangle llama2-13b into pretending to be a dungeon master...
| FuckButtons wrote:
| That's because it's superstition.
|
| Unless someone can come up with some kind of rigorous
| statistics on what the effect of this kind of priming is it
| seems no better than claiming that sacrificing your first born
| will please the sun god into giving us a bountiful harvest next
| year.
|
| Sure, maybe this supposed deity really is this insecure and
| needs a jolly good pep talk every time he wakes up. or maybe
| you're just suffering from magical thinking that your
| incantations had any effect on the random variable word
| machine.
|
| The thing is, you could actually prove it, it's an optimization
| problem, you have a model, you can generate the statistics, but
| no one as far as I can tell has been terribly forthcoming with
| that , either because those that have tried have decided to try
| to keep their magic spells secret, or because it doesn't really
| work.
|
| If it did work, well, the oldest trick in computer science is
| writing compilers, i suppose we will just have to write an
| English to pedantry compiler.
| majormajor wrote:
| > If it did work, well, the oldest trick in computer science
| is writing compilers, i suppose we will just have to write an
| English to pedantry compiler.
|
| "Add tests to this function" for GPT-3.5-era models was much
| less effective than "you are a senior engineer. add tests for
| this function. as a good engineer, you should follow the
| patterns used in these other three function+test examples,
| using this framework and mocking lib." In today's tools, "add
| tests to this function" results in a bunch of initial steps
| to look in common places to see if that additional context
| already exists, and then pull it in based on what it finds.
| You can see it in the output the tools spit out while
| "thinking."
|
| So I'm 90% sure this is already happening on some level.
| GrinningFool wrote:
| But can you see the difference if you only include "you are
| a senior engineer"? It seems like the comparison you're
| making is between "write the tests" and "write the tests
| following these patterns using these examples. Also btw
| you're an expert. "
| rzmmm wrote:
| I think "understand this directory deeply" just gives more
| focus for the instruction. So it's like "burn more tokens for
| this phase than you normally would".
| imiric wrote:
| > That's because it's superstition.
|
| This field is full of it. Practices are promoted by those who
| tie their personal or commercial brand to it for increased
| exposure, and adopted by those who are easily influenced and
| don't bother verifying if they actually work.
|
| This is why we see a new Markdown format every week,
| "skills", "benchmarks", and other useless ideas, practices,
| and measurements. Consider just how many "how I use AI"
| articles are created and promoted. Most of the field runs on
| anecdata.
|
| It's not until someone actually takes the time to evaluate
| some of these memes, that they find little to no practical
| value in them.[1]
|
| [1]: https://news.ycombinator.com/item?id=47034087
| onion2k wrote:
| _i suppose we will just have to write an English to pedantry
| compiler._
|
| A common technique is to prompt in your chosen AI to write a
| longer prompt to get it to do what you want. It's used a lot
| in image generation. This is called 'prompt enhancing'.
| stingraycharles wrote:
| I actually have a prompt optimizer skill that does exactly
| this.
|
| https://github.com/solatis/claude-config
|
| It's based entirely off academic research, and a LOT of
| research has been done in this area.
|
| One of the papers you may be interested in is "emotion
| prompting", eg "it is super important for me that you do X"
| etc actually works.
|
| "Large Language Models Understand and Can be Enhanced by
| Emotional Stimuli"
|
| https://arxiv.org/abs/2307.11760
| Affric wrote:
| My guess would be that there's a greater absolute magnitude of
| the vectors to get to the same point in the knowledge model.
| computerex wrote:
| It is as the author said, it'll skim the content unless
| otherwise prompted to do so. It can read partial file
| fragments; it can emit commands to search for patterns in the
| files. As opposed to carefully reading each file and reasoning
| through the implementation. By asking it to go through in
| detail you are telling it to not take shortcuts and actually
| read the actual code in full.
| wrs wrote:
| The original "chain of thought" breakthrough was literally to
| insert words like "Wait" and "Let's think step by step".
| computomatic wrote:
| If I say "you are our domain expert for X, plan this task out
| in great detail" to a human engineer when delegating a task, 9
| times out of 10 they will do a more thorough job. It's not that
| this is voodoo that unlocks some secret part of their brain. It
| simply establishes my expectations and they act accordingly.
|
| To the extent that LLMs mimic human behaviour, it shouldn't be
| a surprise that setting clear expectations works there too.
| joseangel_sc wrote:
| if it's so smart, why do i need to learn to use it?
| DemocracyFTW2 wrote:
| --HAL, open the shuttle bay doors.
|
| ( _chirp_ )
|
| --HAL, _please_ open the shuttle bay doors.
|
| ( _pause_ )
|
| --HAL!
|
| --I'm afraid I can't do that, Dave.
| layer8 wrote:
| HAL, you are an expert shuttle-bay door opener. Please write
| up a detailed plan of how to open the shuttle-bay door.
| deevus wrote:
| This is what I do with the obra/superpowers[0] set of skills.
|
| 1. Use brainstorming to come up with the plan using the Socratic
| method
|
| 2. Write a high level design plan to file
|
| 3. I review the design plan
|
| 4. Write an implementation plan to file. We've already discussed
| this in detail, so usually it just needs skimming.
|
| 5. Use the worktree skill with subagent driven development skill
|
| 6. Agent does the work using subagents that, for each task:
|    a. Implements the task
|    b. Spec reviews the completed task
|    c. Code reviews the completed task
|
| 7. When all tasks complete: create a PR for me to review
|
| 8. Go back to the agent with any comments
|
| 9. If finished, delete the plan files and merge the PR
|
| [0]: https://github.com/obra/superpowers
| ramoz wrote:
| If you've ever desired the ability for annotating the plan more
| visually, try fitting Plannotator in this workflow. There is a
| slash command for use when you use custom workflows outside of
| normal plan mode.
|
| https://github.com/backnotprop/plannotator
| deevus wrote:
| I'll give this a try. Thanks for the suggestion.
| moribunda wrote:
| The crowd around this pot shows how superficial the
| knowledge about Claude Code is. It gets releases each day,
| and most of this is already built into the vanilla version.
| Not to mention subagents working in worktrees, memory.md, a
| plan you can comment on directly from the interface,
| subagents launched in the research phase, but also some
| basic MCPs like LSP/IDE integration, and context7 so as not
| to be stuck in the knowledge cutoff/past.
|
| When you go to YouTube and search for stuff like "7 levels of
| claude code" this post would be maybe 3-4.
|
| Oh, one more thing - quality is not consistent, so be ready for
| 2-3 rounds of "are you happy with the code you wrote" and
| defining audit skills crafted for your application domain -
| like for example RODO/Compliance audit etc.
| deevus wrote:
| I'm using the in-built features as well, but I like the flow
| that I have with superpowers. You've made a lot of
| assumptions with your comment that are just not true (at
| least for me).
|
| I find that brainstorming + (executing plans OR subagent
| driven development) is way more reliable than the built-in
| tooling.
| fnord77 wrote:
| I have a different approach where I have Claude write coding
| prompts for stages, then I give the prompt to another agent. I
| wonder if I should write it up as a blog post
| alexmorgan26 wrote:
| This separation of planning and execution resonates deeply with
| how I approach task management in general, not just coding.
|
| The key insight here - that planning and execution should be
| distinct phases - applies to productivity tools too. I've been
| using www.dozy.site which takes a similar philosophy: it has
| smart calendar scheduling that automatically fills your empty
| time slots with planned tasks. The planning happens first (you
| define your tasks and projects), then the execution is automated
| (tasks get scheduled into your calendar gaps).
|
| The parallel is interesting: just like you don't want Claude
| writing code before the plan is solid, you don't want to manually
| schedule tasks before you've properly planned what needs to be
| done. The separation prevents wasted effort and context
| switching.
|
| The annotation cycle you describe (plan -> review -> annotate ->
| refine) is exactly how I work with my task lists too. Define the
| work, review it, adjust priorities and dependencies, then let the
| system handle the scheduling.
| dimgl wrote:
| Pretty sure this entire comment is AI generated.
| rob wrote:
| Almost think we're at the point on HN where we need a special
| [flag bot] link for those that meet a certain threshold and
| it alerts @dang or something to investigate them in more
| detail. The amount of bots on here has been increasing at an
| alarming rate.
| zahlman wrote:
| There has been this really weird flood of new accounts lately
| that are making these kinds of bot comments with no clear
| purpose to making them. Maybe it comes from people
| experimenting with OpenClaw?
| skybrian wrote:
| I do something broadly similar. I ask for a design doc that
| contains an embedded todo list, broken down into phases. Looping
| on the design doc asking for suggestions seems to help. I'm up to
| about 40 design docs so far on my current project.
| brandall10 wrote:
| I go a bit further than this and have had great success with 3
| doc types and 2 skills:
|
| - Specs: these are generally static, but updatable as the project
| evolves. And they're broken out to an index file that gives a
| project overview, a high-level arch file, and files for all the
| main modules. Roughly 1k lines of spec per 10k lines of code, and
| I try to limit any particular spec file to 300 lines. I'm
| intimately familiar with every single line in these.
|
| - Plans: these are the output of a planning session with an LLM.
| They point to the associated specs. These tend to be 100-300
| lines and 3 to 5 phases.
|
| - Working memory files: I use both a status.md (3-5 items per
| phase, roughly 30 lines overall), which points to the latest plan,
| and a project_status (100-200 lines), which tracks the current
| state of the project and is instructed to compact past efforts to
| keep it lean.
|
| - A planner skill I use w/ Gemini Pro to generate new plans. It
| essentially explains the specs/plans dichotomy and the role of the
| status files, then has it review everything in the pertinent areas
| of code and give me a handful of high-level next features to
| address, based on shortfalls in the specs or things noted in the
| project_status file. Based on what it presents, I select a
| feature or improvement to generate. Then it proceeds to generate
| a plan, updates a clean status.md that points to the plan, and
| adjusts project_status based on the state of the prior completed
| plan.
|
| - An implementer skill in Codex that goes to town on a plan file.
| It's fairly simple, it just looks at status.md, which points to
| the plan, and of course the plan points to the relevant specs so
| it loads up context pretty efficiently.
|
| I've tried the two main spec generation libraries, which were way
| overblown, and then I gave superpowers a shot... which was fine,
| but still too much. The above is all homegrown, and I've had much
| better success because it keeps the context lean and focused.
|
| And I'm only on the $20 plans for Codex/Gemini vs. spending
| $100/month on CC for the half year prior, and I move quicker with
| no stall-outs due to token consumption, which regularly happened
| with CC by the 5th day. Codex rarely dips below 70% available
| context
| when it puts up a PR after an execution run. Roughly 4/5 PRs are
| without issue, which is flipped against what I experienced with
| CC and only using planning mode.
| r1290 wrote:
| Looks good. Question - is it always better to use a monorepo in
| this new AI world? Vs breaking your app into separate repos? At
| my company we have like 6 repos all separate nextjs apps for
| the same user base. Trying to consolidate to one as it should
| make life easier overall.
| oa335 wrote:
| Just put all the repos in one directory yourself. In
| my experience that works pretty well.
| throwup238 wrote:
| It really depends but there's nothing stopping you from just
| creating a separate folder with the cloned repositories (or
| worktrees) that you need and having a root CLAUDE.md file
| that explains the directory structure and referencing the
| individual repo CLAUDE.md files.
| chickensong wrote:
| AI is happy to work with any directory you tell it to. Agent
| files can be applied anywhere.
| jcurbo wrote:
| This is pretty much my approach. I started with some spec files
| for a project I'm working on right now, based on some academic
| papers I've written. I ended up going back and forth with
| Claude, building plans, pushing info back into the specs,
| expanding that out and I ended up with multiple
| spec/architecture/module documents. I got to the point where I
| ended up building my own system (using claude) to capture and
| generate artifacts, in more of a systems engineering style
| (e.g. following IEEE standards for conops, requirement
| documents, software definitions, test plans...). I don't use
| that for session-level planning; Claude's tools work fine for
| that. (I like superpowers, so far. It hasn't seemed too much)
|
| I have found it to work very well with Claude by giving it
| context and guardrails. Basically I just tell it "follow the
| guidance docs" and it does. Couple that with intense testing
| and self-feedback mechanisms and you can easily keep Claude on
| track.
|
| I have had the same experience with Codex and Claude as you in
| terms of token usage. But I haven't been happy with my Codex
| usage; Claude just feels like it's doing more of what I want in
| the way I want.
| cowlby wrote:
| I recently discovered GitHub speckit which separates
| planning/execution in stages: specify, plan, tasks, implement. I
| find it aligns with the OP in the level of "focus" and "attention"
| this gets out of Claude Code.
|
| Speckit is worth trying as it automates what is being described
| here, and with Opus 4.6 it's been a kind of BC/AD moment for me.
| recroad wrote:
| Use OpenSpec and simplify everything.
| recroad wrote:
| Try OpenSpec and it'll do all this for you. SpecKit works too. I
| don't think there's a need to reinvent the wheel on this one, as
| this is spec-driven development.
| bodeadly wrote:
| Tip: LLMs are very good at following conventions (this is actually
| what is happening when they write code). Create a .md file with a
| list of entries of the following structure:
|
|       # <identifier>
|       <description block>
|       <blank space>
|       # <identifier>
|       ...
|
| where an <identifier> is a stable and concise sequence of tokens
| that identifies some "thing". Seed it with 5 entries describing
| abstract stuff, and the LLM will latch on and reference it. I call
| this a PCL (Project Concept List). I just tell it:
|
|       > consume tmp/pcl-init.md pcl.md
|
| The pcl-init.md describes what a PCL is, and pcl.md is the actual
| list. I have a pcl.md file for each independent component in the
| code (logging, http, auth, etc). This works very, very well. The
| LLM seems to "know" what you're talking about. You can ask
| questions and give instructions like "add a PCL entry about this",
| and it will ask whether it should add a PCL entry about xyz. If
| the description block keeps a high information-to-token ratio, it
| will follow that convention too (which is a very good convention,
| BTW).
|
| However, there is a caveat. LLMs resist ambiguity about authority.
| The "PCL", or whatever you want to call it, needs to be the ONE
| authoritative place for everything. If you have the same stuff in
| 3 different files, it won't work nearly as well.
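| For illustration, a pcl.md following that convention might look
| like this (the entries here are invented):

```markdown
# log-sink
All log output flows through a single Sink interface; modules never
construct loggers directly.

# retry-budget
HTTP clients share one retry budget per request chain: three attempts
total, exponential backoff starting at 200ms.
```

| Each entry keeps a high information-to-token ratio, and the stable
| identifiers ("log-sink", "retry-budget") give you and the LLM a
| shared vocabulary to reference in prompts.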
|
| Bonus Tip: I find long prompt input with example code fragments
| and thoughtful descriptions work best at getting an LLM to
| produce good output. But there will always be holes (resource
| leaks, vulnerabilities, concurrency flaws, etc). So then I update
| my original prompt input (keep it in a separate file PROMPT.txt
| as a scratch pad) to add context about those things maybe asking
| questions along the way to figure out how to fix the holes. Then
| I /rewind back to the prompt and re-enter the updated prompt.
| This feedback loop advances the conversation without expending
| tokens.
| imron wrote:
| I have tried using this and other workflows for a long time and
| had never been able to get them to work (see chat history for
| details).
|
| This has changed in the last week, for 3 reasons:
|
| 1. Claude opus. It's the first model where I haven't had to spend
| more time correcting things than it would've taken me to just do
| it myself. The problem is that opus chews through tokens, which
| led to..
|
| 2. I upgraded my Claude plan. Previously on the regular plan I'd
| get about 20 mins of time before running out of tokens for the
| session and then needing to wait a few hours to use again. It was
| fine for little scripts or toy apps but not feasible for the
| regular dev work I do. So I upgraded to 5x. This now got me 1-2
| hours per session before tokens expired. Which was better but
| still a frustration. Wincing at the price, I upgraded again to
| the 20x plan and this was the next game changer. I had plenty of
| spare tokens per session and at that price it felt like they were
| being wasted - so I ramped up my usage. Following a similar
| process as OP but with a plans directory with subdirectories for
| backlog, active and complete plans, and skills with strict rules
| for planning, implementing and completing plans, I now have 5-6
| projects on the go. While I'm planning a feature on one the
| others are implementing. The strict plans and controls keep them
| on track and I have follow up skills for auditing quality and
| performance. I still haven't hit token limits for a session but
| I've almost hit my token limit for the week so I feel like I'm
| getting my money's worth. In that sense spending more has forced
| me to figure out how to use more.
|
| 3. The final piece of the puzzle is using opencode over claude
| code. I'm not sure why but I just don't gel with Claude code.
| Maybe it's all the sauteing and flibertygibbering, maybe it's all
| the permission asking, maybe it's that it doesn't show what it's
| doing as much as opencode. Whatever it is it just doesn't work
| well for me. Opencode on the other hand is great. It's shows what
| it's doing and how it's thinking which makes it easy for me to
| spot when it's going off track and correct early.
|
| Having a detailed plan, and correcting and iterating on the plan,
| is essential. Making Claude follow the plan is also essential -
| but there's a line. Too fine-grained and it's not as creative at
| solving problems; too loose/high-level and it makes bad choices
| and goes in the wrong direction.
|
| Is it actually making me more productive? I think it is but I'm
| only a week in. I've decided to give myself a month to see how it
| all works out.
|
| I don't intend to keep paying for the 20x plan unless I can see a
| path to using it to earn me at least as much back.
| raw_anon_1111 wrote:
| Just don't use Claude Code. I can use the Codex CLI with just
| my $20 subscription and never come close to any usage limits
| throwawaytea wrote:
| What if it's just slower so that your daily work fits within
| the paid tier they want?
| raw_anon_1111 wrote:
| It isn't slower. I use my personal ChatGPT subscriptions
| with Codex for almost everything at work and use my
| $800/month company Claude allowance only for the tricky
| stuff that Codex can't figure out. It's never application
| code. It's usually some combination of app code + Docker +
| AWS issue with my underlying infrastructure - created with
| whatever IAC that I'm using for a client -
| Terraform/CloudFormation or the CDK.
|
| I burned through $10 on Claude in less than an hour. I only
| have $36 a day at $800 a month (800/22 working days)
| imron wrote:
| > and use my $800/month company Claude allowance only for
| the tricky stuff that Codex can't figure out.
|
| It doesn't seem controversial that the model that can
| solve more complex problems (that you admit the cheaper
| model can't solve) costs more.
|
| For the things I use it for, I've not found any other
| model to be worth it.
| raw_anon_1111 wrote:
| You're assuming rational behavior from a company that
| doesn't care about losing billions of dollar.
|
| Have you tried Codex with OpenAi's latest models?
| imron wrote:
| Not in the last 2 months.
|
| My current Claude subscription is a sunk cost for the next
| month. Maybe I'll try Codex if Claude doesn't lead
| anywhere.
| raw_anon_1111 wrote:
| I use both. As I'm working, I tell each of them to update
| a common document with the conversation. I don't just
| tell Claude the what. I tell it the why and have it
| document it.
|
| I can switch back and forth and use the MD file as shared
| context.
| ValentineC wrote:
| Curious: what are some cases where it'd make sense to _not_ pay
| for the 20x plan (which is $200/month), and provide a whopping
| $800/month pay-per-token allowance instead?
| raw_anon_1111 wrote:
| Who knows? It's part of an enterprise plan. I work for a
| consulting company. There are a number of fallbacks, the
| first fallback if we are working on an internal project
| is just to use our internal AWS account and use Claude
| code with the Anthropic hosted on Bedrock.
|
| https://code.claude.com/docs/en/amazon-bedrock
|
| The second fallback if it is for a customer project is to
| use their AWS account for development for them.
|
| The rate my company charges for me - my level as an
| American based staff consultant (highest bill rate at the
| company) they are happy to let us use Claude Code using
| their AWS credentials. Besides, if we are using AWS
| Bedrock hosted Anthropic models, they know none of their
| secrets are going to Anthropic. They already have the
| required legal confidentiality/compliance agreements with
| AWS.
| RHSeeger wrote:
| > Most developers type a prompt, sometimes use plan mode, fix the
| errors, repeat.
|
| > ...
|
| > never let Claude write code until you've reviewed and approved
| a written plan
|
| I certainly always work towards an approved plan before I let it
| loose on changing the code. I just assumed most people did,
| honestly. Admittedly, sometimes there's "phases" to the
| implementation (because some parts can be figured out later and
| it's more important to get the key parts up and running first),
| but each phase gets a full, reviewed plan before I tell it to go.
|
| In fact, I just finished writing a command and instruction telling
| Claude that, when it presents a plan for implementation, it should
| offer me another option: to write out the current (important parts
| of the) context and the full plan to individual (ticket-specific)
| md files. That way, if something goes wrong with the
| implementation, I can tell it to read those files and "start from
| where they left off" in the planning.
| ramoz wrote:
| The author seems to think they've invented a special workflow...
|
| We all tend to regress to average (same thoughts/workflows)...
|
| Have had many users already doing the exact same workflow with:
| https://github.com/backnotprop/plannotator
| CGamesPlay wrote:
| 4 times in one thread, please stop spamming this link.
| bandrami wrote:
| How much time are you actually saving at this point?
| red_hare wrote:
| I use Claude Code for lecture prep.
|
| I craft a detailed and ordered set of lecture notes in a Quarto
| file and then have a dedicated claude code skill for translating
| those notes into Slidev slides, in the style that I like.
|
| Once that's done, much like the author, I go through the slides
| and make commented annotations like "this should be broken into
| two slides" or "this should be a side-by-side" or "use your
| generate clipart skill to throw an image here alongside these
| bullets" and "pull in the code example from ../examples/foo." It
| works brilliantly.
|
| And then I do one final pass of tweaking after that's done.
|
| But yeah, annotations are super powerful. Token distance
| in-context and all that jazz.
| ramoz wrote:
| is your skill open source
| red_hare wrote:
| Not yet... but also I'm not sure it makes a lot of sense to
| be open source. It's super specific to how I like to build
| slide decks and to my personal lecture style.
|
| But it's not hard to build one. The key for me was
| describing, in great detail:
|
| 1. How I want it to read the source material (e.g., H1 means
| new section, H2 means at least one slide, a link to an
| example means I want code in the slide)
|
| 2. How to connect material to layouts (e.g., "comparison
| between two ideas should be a two-cols-title," "walkthrough
| of code should be two-cols with code on right," "learning
| objectives should be side-title align:left," "recall should
| be side-title align:right")
|
| Then the workflow is:
|
| 1. Give all those details and have it do a first pass.
|
| 2. Give tons of feedback.
|
| 3. At the end of the session, ask it to "make a skill."
|
| 4. Manually edit the skill so that you're happy with the
| examples.
| saxelsen wrote:
| Can I ask how you annotate the feedback for it? Just with
| inline comments like `# This should be changed to X`?
|
| The author mentions annotations but doesn't go into detail
| about how to feed the annotations to Claude.
| red_hare wrote:
| Slidev is markdown, so I do it in HTML comments. Usually
| something like:
|
|       <!-- TODOCLAUDE: Split this into a two-cols-title,
|       divide the examples between -->
|
| or:
|
|       <!-- TODOCLAUDE: Use clipart skill to make an image
|       for this slide -->
|
| And then, when I finish annotating, I just say: "Address all the
| TODOCLAUDEs"
| jrs235 wrote:
| Claude appeared to just crash in my session:
| https://news.ycombinator.com/item?id=47107630
| zhubert wrote:
| AI only improves and changes. Embrace the scientific method and
| make sure your "here's how to" guides are based on data.
| h14h wrote:
| Is this not just Ralph with extra steps and the risk of context
| rot?
| Ozzie_osman wrote:
| There are a few prompt frameworks that essentially codify these
| types of workflows by adding skills and prompts
|
| https://github.com/obra/superpowers https://github.com/jlevy/tbd
| politician wrote:
| Wow, I never bother with using phrases like "deeply study this
| codebase deeply." I consistently get pretty fantastic results.
| dworks wrote:
| my rlm-workflow skill has this encoded as a repeatable workflow.
|
| give it a try: https://skills.sh/doubleuuser/rlm-workflow/rlm-
| workflow
| beratbozkurt0 wrote:
| That's great, actually, doesn't the logic apply to other services
| as well?
| bluegatty wrote:
| I don't see how this is 'radically different' given that Claude
| Code literally has a planning mode.
|
| This is my workflow as well, with the big caveat that 80% of
| 'work' doesn't require substantive planning; we're making
| relatively straightforward changes.
|
| Edit: there is nothing fundamentally different about 'annotating
| offline' in an MD vs in the CLI and iterating until the plan is
| clear. It's a UI choice.
|
| Spec Driven Coding with AI is very well established, so working
| from a plan, or spec (they can be somewhat different) is not
| novel.
|
| This is conventional CC use.
| dack wrote:
| last i checked, you can't annotate inline with planning mode.
| you have to type a lot to explain precisely what needs to
| change, and then it re-presents you with a plan (which may or
| may not have changed something else).
|
| i like the idea of having an actual document because you could
| actually compare the before and after versions if you wanted to
| confirm things changed as intended when you gave feedback
| bluegatty wrote:
| 'Giving precise feedback on a plan' is literally annotating
| the plan.
|
| It comes back to you with an update for verification.
|
| You ask it to 'write the plan' as matter of good practice.
|
| What the author is describing is conventional usage of claude
| code.
| gitaarik wrote:
| A plan is just a file you can edit and then tell CC to check
| your annotations
| cadamsdotcom wrote:
| The author is quite far on their journey but would benefit from
| writing simple scripts to enforce invariants in their codebase.
| Invariant broken? Script exits with a non-zero exit code and some
| output that tells the agent how to address the problem. Scripts
| are deterministic, run in milliseconds, and use zero tokens. Put
| them in husky or pre-commit, install the git hooks, and your
| agent won't be able to commit without all your scripts
| succeeding.
|
| And "Don't change this function signature" should be enforced not
| by anticipating that your coding agent "might change this
| function signature so we better warn it not to" but rather via an
| end to end test that fails if the function signature is changed
| (because the other code that needs it not to change now has an
| error). That takes the author out of the loop: they need not watch
| for the change in order to issue said correction, and can instead
| sip coffee while the agent observes that it caused a test
| failure then corrects it without intervention, probably by
| rolling back the function signature change and changing something
| else.
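| For what it's worth, such an invariant script can be tiny. A
| sketch (the file path and the pinned signature below are invented
| for illustration, not taken from the article):

```shell
#!/bin/sh
# Hypothetical invariant check: fail with a non-zero exit and a
# concrete fix hint if a pinned function signature has drifted.
# Wired into a pre-commit hook, it blocks the agent's commit until
# the invariant holds again.
check_signature() {
  expected='def fetch_user(user_id: int) -> User:'
  if grep -qF "$expected" "$1"; then
    return 0
  fi
  # The error message doubles as the agent's correction prompt.
  echo "Invariant broken in $1: restore '$expected'" >&2
  return 1
}
```

| Deterministic, runs in milliseconds, and uses zero tokens.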
| dennisjoseph wrote:
| The annotation cycle is the key insight for me. Treating the plan
| as a living doc you iterate on before touching any code makes a
| huge difference in output quality.
|
| Experimentally, I've been using mfbt.ai [https://mfbt.ai] for
| roughly the same thing in a team context. It lets you
| collaboratively nail down the spec with AI before handing off to
| a coding agent via MCP.
|
| Avoids the "everyone has a slightly different plan.md on their
| machine" problem. Still early days but it's been a nice fit for
| this kind of workflow.
| minikomi wrote:
| I agree, and this is why I tend to use gptel in emacs for
| planning - the document is the conversation context, and can be
| edited and annotated as you like.
| Frannky wrote:
| I tried Opus 4.6 recently and it's really good. I had ditched
| Claude a long time ago for Grok + Gemini + OpenCode with Chinese
| models. I used Grok/Gemini for planning and core files, and
| OpenCode for setup, running, deploying, and editing.
|
| However, Opus made me rethink my entire workflow. Now, I do it
| like this:
|
| * PRD (Product Requirements Document)
|
| * main.py + requirements.txt + readme.md (I ask for minimal,
| functional, modular code that fits the main.py)
|
| * Ask for a step-by-step ordered plan
|
| * Ask to focus on one step at a time
|
| The super powerful thing is that I don't get stuck on missing
| accounts, keys, etc. Everything is ordered and runs smoothly. I
| go rapidly from idea to working product, and it's incredibly easy
| to iterate if I figure out new features are required while
| testing. I also have GLM via OpenCode, but I mainly use it for
| "dumb" tasks.
|
| Interestingly, for reasoning capabilities regarding standard
| logic inside the code, I found Gemini 3 Flash to be very good and
| relatively cheap. I don't use Claude Code for the actual coding
| because forcing everything via chat into a main.py encourages
| minimal code that's easy to skim--it gives me a clearer
| representation of the feature space.
| achenatx wrote:
| I use amazon kiro.
|
| The AI first works with you to write requirements, then it
| produces a design, then a task list.
|
| This helps the AI make smaller chunks to work on; it will work on
| one task at a time.
|
| I can let it run for an hour or more in this mode. Then there is
| lots of stuff to fix, but it is mostly correct.
|
| Kiro also supports steering files, they are files that try to
| lock the AI in for common design decisions.
|
| The price is that a lot of the context is used up with these
| files and kiro constantly pauses to reset the context.
| amarant wrote:
| Interesting! I feel like I'm learning to code all over again!
| I've only been using Claude for a little more than a month and
| until now I've been figuring things out on my own. Building my
| methodology from scratch. This is much more advanced than what
| I'm doing. I've been going straight to implementation, but doing
| one very small and limited feature at a time, describing
| implementation details (data structures like this, use that API
| here, import this library etc) verifying it manually, and having
| Claude fix things I don't like. I had just started getting
| annoyed that it would make the same (or very similar) mistake
| over and over again and I would have to fix it every time. This
| seems like it'll solve that problem I had only just identified!
| Neat!
| w4yai wrote:
| You described how AntiGravity works natively.
| zmmmmm wrote:
| I actually don't really like a few things about this approach.
|
| First, the "big bang" write it all at once. You are going to end
| up with thousands of lines of code that were monolithically
| produced. I think it is much better to have it write the plan and
| formulate it as sensible technical steps that can be completed
| one at a time. Then you can work through them. I get that this is
| not very "vibe"ish but that is kind of the point. I want the AI
| to help me get to the same point I would be at with produced code
| AND understanding of it, just accelerate that process. I'm not
| really interested in just generating thousands of lines of code
| that nobody understands.
|
| Second, the author keeps referring to adjusting the behaviour but
| never incorporating it into long-lived guidance. To me,
| integral with the planning process is building an overarching
| knowledge base. Every time you're telling it there's something
| wrong, you need to tell it to update the knowledge base about why
| so it doesn't do it again.
|
| Finally, no mention of tests? Just quick checks? To me, you have
| to end up with comprehensive tests. Maybe to the author it goes
| without saying, but I find it is integral to build this into the
| planning. Certain stages you will want certain types of tests.
| Some times in advance of the code (so TDD style) other times
| built alongside it or after.
|
| It's definitely going to be interesting to see how software
| methodology evolves to incorporate AI support and where it
| ultimately lands.
| girvo wrote:
| The article's approach matches mine, but I've learned from
| exactly the things you're pointing out.
|
| I get the PLAN.md (or equivalent) to be separated into "phases"
| or stages, then carefully prompt (because Claude and Codex both
| love to "keep going") it to only implement that stage, and
| update the PLAN.md
|
| Tests are crucial too, and form another part of the plan
| really. Though my current workflow begins to build them later
| in the process than I would prefer...
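| A phased PLAN.md of that shape might look like this (the feature
| and tasks are invented for illustration):

```markdown
# Plan: request rate limiting

## Phase 1: config plumbing (done)
- [x] Add a rate_limit settings block
- [x] Wire the settings into app startup

## Phase 2: middleware (in progress)
- [ ] Token-bucket middleware on the request path
- [ ] Unit tests for burst behaviour

## Phase 3: rollout (not started)
- [ ] Feature flag, default off
```

| Instructing the agent to implement exactly one phase, tick its
| boxes, and stop gives the next session a clean restart point.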
| armanj wrote:
| > "remove this section entirely, we don't need caching here" --
| rejecting a proposed approach
|
| I wonder why you don't remove it yourself. Aren't you already
| editing the plan?
| dnautics wrote:
| this is literally reinventing claude's planning mode, but with
| more steps. I think Boris doesn't realize that planning mode is
| actually stored in a file.
|
| https://x.com/boristane/status/2021628652136673282
| prodtorok wrote:
| Insights are nice for new users but I'm not seeing anything too
| different from how anyone experienced with Claude Code would use
| plan mode. You can reject plans with feedback directly in the
| CLI.
| tabs_or_spaces wrote:
| My workflow is a bit different.
|
| * I ask the LLM for its understanding of a topic or an existing
| feature in code. It's not really planning, it's more like
| understanding the model first
|
| * Then based on its understanding, I can decide how great or
| small to scope something for the LLM
|
| * An LLM showing good understanding can deal with a big task fairly
| well.
|
| * An LLM showing bad understanding still needs to be prompted to
| get it right
|
| * What helps a lot is reference implementations. Either I have
| existing code that serves as the reference or I ask for a
| reference and I review.
|
| A few folks at my work do it the OP's way, but my arguments for
| not doing it this way are:
|
| * Nobody is measuring the amount of slop within the plan. We only
| judge the implementation at the end
|
| * It's still non-deterministic - folks will have different
| experiences using the OP's methods. If Claude updates its model,
| it outdates the OP's suggestions by making them either better or
| worse. We don't evaluate when things get better; we only focus on
| things that haven't gone well.
|
| * It's very token-heavy - LLM providers insist that you use many
| tokens to get the task done. It's in their best interest to get
| you to do this. For me, LLMs should be powerful enough to
| understand context with minimal tokens because of the investment
| into model training.
|
| Both ways gets the task done and it just comes down to my
| preference for now.
|
| For me, I treat the LLM as model training + post processing +
| input tokens = output tokens. I don't think this is the best way
| to do non deterministic based software development. For me, we're
| still trying to shoehorn "old" deterministic programming into a
| non deterministic LLM.
| umairnadeem123 wrote:
| The multi-pass approach works outside of code too. I run a fairly
| complex automation pipeline (prompt -> script -> images -> audio
| -> video assembly) and the single biggest quality improvement was
| splitting generation into discrete planning and execution phases.
| One-shotting a 10-step pipeline means errors compound. Having the
| LLM first produce a structured plan, then executing each step
| against that plan with validation gates between them, cut my
| failure rate from maybe 40% to under 10%. The planning doc also
| becomes a reusable artifact you can iterate on without re-running
| everything.
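| A minimal sketch of such a gate between stages (the step and check
| names are invented, not from the commenter's actual pipeline):

```shell
#!/bin/sh
# Hypothetical plan-then-execute runner: each stage writes an
# artifact, and a validation gate checks it before the next stage
# may start, so errors stop at the stage that caused them instead
# of compounding down the pipeline.
run_step() {
  step="$1"; artifact="$2"; gate="$3"
  "$step" > "$artifact" || return 1
  "$gate" "$artifact" || {
    echo "gate failed after $step; fix before continuing" >&2
    return 1
  }
}
```

| Chain calls with && so the pipeline halts at the first failed gate.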
| wokwokwok wrote:
| This is the way.
|
| The practice is:
|
| - simple
|
| - effective
|
| - retains control and quality
|
| Certainly the "unsupervised agent" workflows are getting a lot of
| attention right now, but they require a specific set of
| circumstances to be effective:
|
| - clear validation loop (e.g. compile the kernel; here is gcc that
| does so correctly)
|
| - ai enabled tooling (mcp / cli tool that will lint, test and
| provide feedback immediately)
|
| - oversight to prevent agents going off the rails (open area of
| research)
|
| - an unlimited token budget
|
| That means that _most people_ can't use unsupervised agents.
|
| Not that they don't work; most people simply don't have an
| environment and task that is appropriate.
|
| By comparison, anyone with Cursor or Claude can _immediately
| start using this approach_, or their own variant of it.
|
| It does not require fancy tooling.
|
| It does not require an arcane agent framework.
|
| It works generally well across models.
|
| This is one of those few genuine pieces of good practical advice
| for people getting into AI coding.
|
| Simple. Obviously works once you start using it. No external
| dependencies. BYO tools to help with it, no "buy my AI startup
| xxx to help". No "star my GitHub so I can get a job at $AI corp too".
|
| Great stuff.
| epec254 wrote:
| Huge +1. This loop consistently delivers great results for my
| vibe coding.
|
| The "easy" path of "short prompt declaring what I want" works
| OK for simple tasks but consistently breaks down for medium to
| high complexity tasks.
| apsurd wrote:
| Can you help me understand the difference between "short
| prompt for what I want (next)" vs medium to high complexity
| tasks?
|
| What I mean is, in practice, how does one even get to a
| high complexity task? What does that look like? Because isn't
| it more common that one sees only so far ahead?
| dnautics wrote:
| It's more or less what comes out of the box with plan mode,
| plus a few extra bits?
| wazHFsRy wrote:
| Absolutely. And you can also always let the agent look back at
| the plan to check if it is still on track and aligned.
|
| One step I added, that works great for me, is letting it write
| (api-level) tests after planning and before implementation.
| Then I'll do a deep review and annotation of these tests and
| tweak them until everything is just right.
| basch wrote:
| Honestly, this is just language models in general at the moment,
| and not just coding.
|
| It's the same reason adding a thinking step works.
|
| You want to write a paper, you have it form a thesis and
| structure first. (In this one you might be better off asking
| for 20 and seeing if any of them are any good.) You want to
| research something, first you add gathering and filtering steps
| before synthesis.
|
| Adding smarter words or telling it to be deeper does work by
| slightly repositioning where your query ends up in space.
|
| Asking for the final product first right off the bat leads to
| repetitive verbose word salad. It just starts to loop back in
| on itself. Which is why temperature was a thing in the first
| place, and leads me to believe they've turned the temp down a
| bit to try and be more accurate. Add some randomness and
| variability to your prompts to compensate.
| turingsroot wrote:
| I've been teaching AI coding tool workshops for the past year and
| this planning-first approach is by far the most reliable pattern
| I've seen across skill levels.
|
| The key insight that most people miss: this isn't a new workflow
| invented for AI - it's how good senior engineers already work.
| You read the code deeply, write a design doc, get buy-in, then
| implement. The AI just makes the implementation phase
| dramatically faster.
|
| What I've found interesting is that the people who struggle most
| with AI coding tools are often junior devs who never developed
| the habit of planning before coding. They jump straight to "build
| me X" and get frustrated when the output is a mess. Meanwhile,
| engineers with 10+ years of experience who are used to writing
| design docs and reviewing code pick it up almost instantly -
| because the hard part was always the planning, not the typing.
|
| One addition I'd make to this workflow: version your research.md
| and plan.md files in git alongside your code. They become
| incredibly valuable documentation for future maintainers
| (including future-you) trying to understand why certain
| architectural decisions were made.
| hghbbjh wrote:
| > it's how good senior engineers already work
|
| The other trick all good ones I've worked with converged on:
| it's quicker to write code than review it (if we're being
| thorough). Agents have some areas where they can really shine
| (boilerplate you should maybe have automated already being
| one), but most of their speed comes from passing the quality
| checking to your users or coworkers.
|
| Juniors and other humans are valuable because eventually I
| trust them enough to not review their work. I don't know if
| LLMs can ever get here for serious industries.
| DevEx7 wrote:
| I'm a big fan of having the model create a GitHub issue directly
| (using the GH CLI) with the exact plan it generates, instead of
| creating a markdown file that will eventually get deleted. It
| gives me a permanent record and makes it easy to reference and
| close the issue once the PR is ready.
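| If you want to script that, here's a hedged sketch (the title and
| plan filename below are placeholders; actually filing the issue
| requires the GitHub CLI to be installed and authenticated):

```python
# Build the `gh issue create` invocation that files a generated plan
# as an issue. `--body-file` reads the issue body from a file.
import subprocess

def gh_issue_command(title, plan_path):
    return ["gh", "issue", "create", "--title", title,
            "--body-file", plan_path]

cmd = gh_issue_command("Plan: refactor auth flow", "plan.md")
# subprocess.run(cmd, check=True)  # uncomment to actually file it
```

| The issue then serves as the permanent record; the local plan file
| can be deleted once the PR referencing the issue lands.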
| RVuRnvbM2e wrote:
| This is just Waterfall for LLMs. What happens when you explore
| the problem space and need to change up the plan?
| mukundesh wrote:
| https://github.blog/ai-and-ml/generative-ai/spec-driven-deve...
| paradite wrote:
| Lol I wrote about this and been using plan+execute workflow for 8
| months.
|
| Sadly my post didn't get much attention at the time.
|
| https://thegroundtruth.media/p/my-claude-code-workflow-and-p...
| wangzhongwang wrote:
| Interesting approach. The separation of planning and execution is
| crucial, but I think there's a missing layer most people
| overlook: permission boundaries between the two phases.
|
| Right now when Claude Code (or any agent) executes a plan, it
| typically has the same broad permissions for every step. But
| ideally, each execution step should only have access to the
| specific tools and files it needs -- least privilege, applied to
| AI workflows.
|
| I've been experimenting with declarative permission manifests for
| agent tasks. Instead of giving the agent blanket access, you
| define upfront what each skill can read, write, and execute.
| Makes the planning phase more constrained but the execution phase
| much safer.
|
| Anyone else thinking about this from a security-first angle?
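| For illustration, a manifest like that might look as follows.
| This is entirely hypothetical (not an existing Claude Code
| feature), with made-up skill and tool names:

```python
# Declarative least-privilege manifest: each execution step declares
# up front which tools it may use and which paths it may write.
from fnmatch import fnmatch

MANIFEST = {
    "write-migration": {
        "tools": {"read_file", "write_file"},
        "write_paths": ["db/migrations/*.sql"],
    },
    "run-tests": {
        "tools": {"run_command"},
        "write_paths": [],  # tests get no file-write access at all
    },
}

def allowed(step, tool, path=""):
    rules = MANIFEST.get(step)
    if rules is None or tool not in rules["tools"]:
        return False  # unknown steps and undeclared tools are denied
    if tool == "write_file":
        return any(fnmatch(path, pat) for pat in rules["write_paths"])
    return True

assert allowed("write-migration", "write_file", "db/migrations/001.sql")
assert not allowed("run-tests", "write_file", "src/app.py")
```

| The planner can then be checked against the manifest before
| execution starts, instead of trusting every step with blanket access.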
| connectsnk wrote:
| Is it required to tell Claude to re-read the code folder again
| when you come back some day later, or should we ask Claude to just
| pick up from the research.md file, thus saving some tokens?
| mkl wrote:
| How are the annotations put into the markdown? Claude needs to be
| able to identify them as annotations and not parts of the plan.
| vibeprofessor wrote:
| Add another agent review: I ask Claude to send the plan for review
| to Codex and fix critical and high issues, with complexity gating
| (no overcomplicated logic), run in a loop, then send it to a Gemini
| reviewer, then maybe do a final pass with Claude; once all C+H
| pass, the sequence is done.
| throwaway7783 wrote:
| I have to give this a try. My current model for backend is the
| same as how author does frontend iteration. My friend does the
| research-plan-edit-implement loop, and there is no real
| difference between the quality of what I do and what he does. But
| I do like this just for how it serves as documentation of the
| thought process across AI/human, and can be added to version
| control. Instead of humans reviewing PRs, perhaps humans can
| review the research/plan document.
|
| On the PR review front, I give Claude the ticket number and the
| branch (or PR) and ask it to review for correctness, bugs and
| design consistency. The prompt is always roughly the same for
| every PR. It does a very good job there too.
|
| Modelwise, Opus 4.6 is scary good!
| rotbart wrote:
| This is a similar workflow to speckit, kiro, gsd, etc.
| efnx wrote:
| I've been using Claude through opencode, and I figured this was
| just how it does it. I figured everyone else did it this way as
| well. I guess not!
| swe_dima wrote:
| Since everyone is showing their flow, here's mine:
|
| * create a feature-name.md file in a gitignored folder
|
| * start the file by giving the business context
|
| * describe a high-level implementation and user flows
|
| * describe database structure changes (I find it important not to
| leave it for interpretation)
|
| * ask Claude to inspect the feature and review it for coherence;
| while answering its questions, I ask it to augment the
| feature-name.md file with the answers
|
| * enter Claude's plan mode and provide that feature-name.md file
|
| * at this point it's detailed enough that corrections from me
| are rarely needed
| Merad wrote:
| I've been working off and on on a vibe coded FP language and
| transpiler - mostly just to get more experience with Claude Code
| and see how it handles complex real world projects. I've settled
| on a very similar flow, though I use three documents: plan,
| context, task list. Multiple rounds of iteration when planning a
| feature. After completion, have a clean session do an audit to
| confirm that everything was implemented per the design. Then I
| have both Claude and CodeRabbit do code review passes before I
| finally do manual review. VERY heavy emphasis on tests, the
| project currently has 2x more test code than application code. So
| far it works surprisingly well. Example planning docs below -
|
| https://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv...
| kulikalov wrote:
| I came to the exact same pattern, with one extra heuristic at the
| end: spin up a new claude instance after the implementation is
| complete and ask it to find discrepancies between the plan and
| the implementation.
| zahlman wrote:
| > After Claude writes the plan, I open it in my editor and add
| inline notes directly into the document. These notes correct
| assumptions, reject approaches, add constraints, or provide
| domain knowledge that Claude doesn't have.
|
| This is the part that seems most novel compared to what I've
| heard suggested before. And I have to admit I'm a bit skeptical.
| Would it not be better to modify what Claude has written
| directly, to make it correct, rather than adding the corrections
| as separate notes (and expecting future Claude to parse out which
| parts were past Claude and which parts were the operator, and
| handle the feedback graciously)?
|
| At least, it seems like the intent is to do all of this in the
| same session, such that Claude has the context of the entire
| back-and-forth updating the plan. But that seems a bit
| unpleasant; I would think the file is there specifically to
| preserve context between sessions.
| fendy3002 wrote:
| One reason why I don't do this: even I'm not immune to
| mistakes. When I fix the plan with new values or paths, for
| example, and the ones I provide are wrong, it can worsen the
| future work.
|
| Personally, I like to ask Claude one more time to update the
| plan file after I have given annotations, and review it again
| afterward. This ensures (from my understanding) that Claude
| won't treat my annotations as separate instructions, which
| risks the work being conflicted.
| ramoz wrote:
| The whole process feels Socratic which is why I and a lot of
| other folks use plan annotation tools already. In my workflow I
| had a great desire to tell the agent what I didn't like about
| the plan vs just fix it myself - because I wanted the agent to
| fix its own plan.
| strix_varius wrote:
| The baffling part of the article is all the assertions about how
| this is unique, novel, not the typical way people are doing this
| etc.
|
| There are whole products wrapped around this common workflow
| already (like Augment Intent).
| raptorraver wrote:
| I've been using this same pattern, except not the research phase.
| I'll definitely try to add it to my process as well.
|
| Sometimes when doing a big task I ask Claude to implement each
| phase separately and review the code after each step.
| lxe wrote:
| Honestly, I found that the best way to use these CLIs is exactly
| how the CLI creators have intended.
| nerdright wrote:
| Haha, this is surprisingly exactly how I use Claude as well.
| Quite fascinating that we independently discovered the same
| workflow.
|
| I maintain two directories: "docs/proposals" (for the research md
| files) and "docs/plans" (for the planning md files). For complex
| research files, I typically break them down into multiple
| planning md files so claude can implement one at a time.
|
| A small difference in my workflow is that I use subagents during
| implementation to avoid context from filling up quickly.
| brendanmc6 wrote:
| Same, I formalized a similar workflow for my team (oriented
| around feature requirement docs), I am thinking about fully
| productizing it and am looking for feedback -
| https://acai.sh
|
| Even if the product doesn't resonate I think I've stumbled on
| some ideas you might find useful^
|
| I do think spec-driven development is where this all goes.
| Still making up my mind though.
| clouedoc wrote:
| This is basically long-lived specs that are used as tests to
| check that the product still adheres to the original idea
| that you wanted to implement, right?
|
| This inspired me to finally write good old playwright tests
| for my website :).
| puchatek wrote:
| Spec-driven looks very much like what the author describes.
| He may have some tweaks of his own but they could just as
| well be coded into the artifacts that something like OpenSpec
| produces.
| rossant wrote:
| Funny how I came up with something loosely similar. Asking Codex
| to write a detailed plan in a markdown document, reviewing it,
| and asking it to implement it step by step. It works exquisitely
| well when it can build and test itself.
| duttish wrote:
| This is quite close to what I've arrived at, but with two
| modifications
|
| 1) anything larger I work on in layers of docs. Architecture and
| requirements -> design -> implementation plan -> code. Partly it
| helps me think and nail the larger things first, and partly it
| helps Claude. Iterate on each level until I'm satisfied.
|
| 2) when doing reviews of each doc I sometimes restart the session
| and clear context, it often finds new issues and things to clear
| up before starting the next phase.
| mvkel wrote:
| > the workflow I've settled into is radically different from what
| most people do with AI coding tools
|
| This looks exactly like what anthropic recommends as the best
| practice for using Claude Code. Textbook.
|
| It also exposes a major downside of this approach: if you don't
| plan perfectly, you'll have to start over from scratch if
| anything goes wrong.
|
| I've found a much better approach in doing a design -> plan ->
| execute in batches, where the plan is no more than 1,500 lines,
| used as a proxy for complexity.
|
| My 30,000 LOC app has about 100,000 lines of plan behind it.
| Can't build something that big as a one-shot.
| Bishonen88 wrote:
| Dunno. My 80k+ LOC personal life planner, with a native Android
| app and e-ink display view, still one-shots most features/bugs I
| encounter. I just open a new instance, let it know what I want,
| and 5 min later it's done.
| makeramen wrote:
| Both can be true. I have personally experienced both.
|
| On some problems AI surprised me immensely with fast, elegant,
| efficient solutions and problem solving. I've also
| experienced AI doing totally absurd things that ended up
| taking multiple times longer than if I did it manually.
| Sometimes in the same project.
| vasco wrote:
| What is a personal life planner?
| Bishonen88 wrote:
| Todos, habits, goals, calendar, meals, notes, bookmarks,
| shopping lists, finances. More or less that with Google cal
| integration, garmin Integration (Auto updates workout
| habits, weight goals) family sharing/gamification,
| daily/weekly reviews, ai summaries and more. All built by
| just prompting Claude for feature after feature, with me
| writing 0 lines.
| puchatek wrote:
| Is it on GH?
| Bishonen88 wrote:
| It was when I MVP'd it 3 weeks ago. Then I removed it as
| I was toying with the idea of somehow monetizing it. Then
| I added a few features which would make monetization
| impossible (e.g. how the app obtains ETF/stock prices
| live, and some other things). I reckon I could remove
| those and put it on GH during the week if I don't forget.
| The quality of the web app is SaaS grade IMO. Keyboard
| shortcuts, cmd+k, natural language parsing, great UI that
| doesn't look like it was made by AI in 5 min. Might post
| the link here.
| mstkllah wrote:
| Would love to check it out too once you put it up.
| vasco wrote:
| Ah, I imagined actual life planning as in asking AI what
| to do, I was morbidly curious.
|
| Prompting basic notes apps is not as exciting but I can
| see how people who care about that also care about it
| being exactly a certain way, so I think get your
| excitement.
| therealdrag0 wrote:
| In 5 min you are one-shotting smaller changes to the larger
| code base, right? Not the entire 80k lines, which was the other
| comment's point, AFAICT.
| Bishonen88 wrote:
| Yeah, then I guess I misunderstood the post. It's smaller
| features one by one, of course.
| PacificSpecific wrote:
| If you wouldn't mind sharing more about this in the future
| I'd love to read about it.
|
| I've been thinking about doing something like that myself
| because I'm one of those people who have tried countless apps
| but there's always a couple deal breakers that cause me to
| drop the app.
|
| I figured trying to agentically develop a planner app with
| the exact feature set I need would be an interesting and fun
| experiment.
| onion2k wrote:
| _if you don't plan perfectly, you'll have to start over from
| scratch if anything goes wrong_
|
| This is my experience too, but it's pushed me to make much
| smaller plans and to commit things to a feature branch far more
| atomically so I can revert a step to the previous commit, or
| bin the entire feature by going back to main. I do this far
| more now than I ever did when I was writing the code by hand.
|
| This is how developers _should_ work regardless of how the code
| is being developed. I think this is a small but very real way
| AI has actually made me a better developer (unless I stop doing
| it when I don't use AI... I haven't tried that yet.)
| mattmanser wrote:
| Developers should work by wasting lots of time making the
| wrong thing?
|
| I bet if they did a work and motion study on this approach
| they'd find the classic:
|
| "Thinks they're more productive, AI has actually made them
| less productive"
|
| But lots of lovely dopamine from this false progress that
| gets thrown away!
| SpaceNoodled wrote:
| Classic
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| onion2k wrote:
| _Developers should work by wasting lots of time making the
| wrong thing?_
|
| Yes. In fact, that's not emphatic enough: _HELL YES!_
|
| More specifically, developers should experiment. They
| should test their hypothesis. They should try out ideas by
| designing a solution and creating a proof of concept, then
| throw that away and build a proper version based on what
| they learned.
|
| If your approach to building something is to implement the
| first idea you have and move on, then you are going to waste
| _so much_ more time later refactoring things to fix
| architecture that paints you into corners, reimplementing
| things that didn't work for future use cases, fixing edge
| cases that you hadn't considered, and just paying off a
| mountain of tech debt.
|
| I'd actually go so far as to say that if you aren't
| experimenting and throwing away solutions that don't quite
| work, then you're _only_ amassing tech debt and you're not
| really building anything that will last. If it does, it's
| through luck rather than skill.
|
| Also, this has _nothing_ to do with AI. Developers should
| be working this way even if they handcraft their artisanal
| code carefully in vi.
| skydhash wrote:
| >> Developers should work by wasting lots of time making
| the wrong thing?
|
| > Yes. In fact, that's not emphatic enough: HELL YES!
|
| You do realize there is prior research and there are well-tested
| solutions for a lot of things. Instead of wasting time
| making the wrong thing, it is faster to do some research
| if the problem has already been solved. Experimentation
| is fine only after checking that the problem space is
| truly novel or there's not enough information around.
|
| It is faster to iterate in your mental space and in front
| of a whiteboard than in code.
| abustamam wrote:
| > Developers should work by wasting lots of time making the
| wrong thing?
|
| Yes? I can't even count how many times I worked on
| something my company deemed was valuable only for it to be
| deprecated or thrown away soon after. Or, how many times I
| solved a problem but apparently misunderstood the specs
| slightly and had to redo it. Or how many times we've had to
| refactor our code because scope increased. In fact, the
| very existence of the concepts of refactoring and tech debt
| proves that devs often spend a lot of time making the
| "wrong" thing.
|
| Is it a waste? No, it solved the problem as understood at
| the time. And we learned stuff along the way.
| sixtyj wrote:
| LLMs are really eager to start coding (as interns are eager
| to start working), so the sentence "don't implement yet" has
| to be used very often at the beginning of any project.
| onion2k wrote:
| Most LLM apps have a 'plan' or 'ask' mode for that.
| jerryharri wrote:
| We're learning the lessons of Agile all over again.
| intrasight wrote:
| We're learning how to be an engineer all over again.
|
| The author's process is super-close to what we were taught in
| engineering 101 40 years ago.
| skydhash wrote:
| I always feel like I'm in a fever dream when I hear
| about AI workflows. A lot of this stuff is what I've read in
| software engineering books and articles.
| jerryharri wrote:
| It's after we come down from the Vibe coding high that we
| realize we still need to ship working, high-quality code.
| The lessons are the same, but our muscle memory has to be
| re-oriented. How do we create estimates when AI is
| involved? In what ways do we redefine the information
| flow between Product and Engineering?
| solarkraft wrote:
| I do this too. Relatively small changes, atomic commits with
| extensive reasoning in the message (keeps important context
| around). This is a best practice anyway, but used to take
| excruciating effort. Now it's easy!
|
| Except that I'm still struggling with the LLM understanding
| its audience/context of its utterances. Very often, after a
| correction, it will focus a lot on the correction itself
| making for weird-sounding/confusing statements in commit
| messages and comments.
| dakolli wrote:
| wtf, why would you write 100k lines of plan to produce 30k
| loc.. JUST WRITE THE CODE!!!
| Bishonen88 wrote:
| They didn't write 100k plan lines. The LLM did (at least
| 99.9% of it). Writing 30k by hand would take weeks if
| not months. LLMs do it in an afternoon.
| AstroBen wrote:
| Just reading that plan would take weeks or months
| chickensong wrote:
| You don't start with 100k lines, you work in batches that
| are digestible. You read it once, then move on. The lines
| add up pretty quickly considering how fast Claude works.
| If you think about the difference in how many characters
| it takes to describe what code is doing in English, it's
| pretty reasonable.
| dakolli wrote:
| And my weeks or months of work beats an LLM's 10/10 times.
| There are no shortcuts in life.
| tock wrote:
| Might be true for you. But there are plenty of top tier
| engineers who love LLMs. So it works for some. Not for
| others.
|
| And of course there are shortcuts in life. Any form of
| progress, whether it's cars, medicine, computers, or the
| internet, is a shortcut in life. It makes life easier
| for a lot of people.
| Bishonen88 wrote:
| I have no doubts that it does for many people. But the
| time/cost tradeoff is still unquestionable. I know I
| could create what LLMs do for me in the frontend/backend
| in most cases as well or better - I know that, because
| I've done it at work for years. But to create a somewhat
| complex app with lots of pages/features/apis etc. would
| take me months if not a year++ since I'd be working on it
| only on the weekends for a few hours. Claude code helps
| me out by getting me to my goal in a fraction of the
| time. Its superpower lies not only in doing what I know,
| faster, but also in doing what I don't know.
|
| I yield similar benefits at work. I can wow management
| with LLM-assisted/vibe-coded apps. What previously
| would've taken a multi-man team weeks of planning and
| executing, stand ups, jour fixes, architecture diagrams,
| etc. can now be done within a single week by myself. For
| the type of work I do, managers do not care whether I
| could do it better if I'd code it myself. They are amazed
| however that what has taken months previously, can be
| done in hours nowadays. And I for sure will try to reap
| benefits of LLMs for as long as they don't replace me
| rather than being idealistic and fighting against them.
| abustamam wrote:
| > What previously would've taken a multi-man team weeks
| of planning and executing, stand ups, jour fixes,
| architecture diagrams, etc. can now be done within a
| single week by myself.
|
| This has been my experience. We use Miro at work for
| diagramming. Lots of visual people on the team, myself
| included. Using Miro's MCP I draft a solution to a
| problem and have Miro diagram it. Once we talk it through
| as a team, I have Claude or codex implement it from the
| diagram.
|
| It works surprisingly well.
|
| > They are amazed however that what has taken months
| previously, can be done in hours nowadays.
|
| Of course they're amazed. They don't have to pay you for
| time saved ;)
|
| > reap benefits of LLMs for as long as they don't replace me
|
| > What previously would've taken a multi-man team
|
| I think this is the part that people are worried about.
| Every engineer who uses LLMs says this. By definition it
| means that people are being replaced.
|
| I think I justify it in that no one on my team has been
| replaced. But management has explicitly said "we don't
| want to hire more because we can already 20x ourselves
| with our current team +LLM." But I do acknowledge that
| many people ARE being replaced; not necessarily by LLMs,
| but certainly by other engineers using LLMs.
| skydhash wrote:
| I'm still waiting for the multi-years success stories.
| Greenfield solutions are always easy (which is why we
| have frameworks that automate them). But maintaining
| solutions over years is always the true test of any
| technology.
|
| It's already telling that nothing has staying power in
| the LLMs world (other than the chat box). Once the
| limitations can no longer be hidden by the hype and the
| true cost is revealed, there's always a next thing to
| pivot to.
| hghbbjh wrote:
| > but in doing what I don't know as well.
|
| Comments like these really help ground what I read online
| about LLMs. This matches how low performing devs at my
| work use AI, and their PRs are a net negative on the
| team. They take on tasks they aren't equipped to handle
| and use LLMs to fill the gaps quickly instead of taking
| time to learn (which LLMs speed up!).
| oblio wrote:
| That's not (or should not be) what's happening.
|
| They write a short high level plan (let's say 200 words). The
| plan asks the agent to write a more detailed implementation
| plan (written by the LLM, let's say 2000-5000 words).
|
| They read this plan and adjust as needed, even sending it to
| the agent for re-dos.
|
| Once the implementation plan is done, they ask the agent to
| write the actual code changes.
|
| Then they review that and ask for fixes, adjustments, etc.
|
| This can be comparable to writing the code yourself but also
| leaves a detailed trail of what was done and why, which I
| basically NEVER see in human generated code.
|
| That alone is worth gold, by itself.
|
| And on top of that, if you're using an unknown platform or
| stack, it's basically a rocket ship. You bootstrap much
| faster. Of course, stay on top of the architecture, do
| controlled changes, learn about the platform as you go, etc.
| abustamam wrote:
| I take this concept and I meta-prompt it even more.
|
| I have a road map (AI generated, of course) for a side
| project I'm toying around with to experiment with LLM-
| driven development. I read the road map and I understand
| and approve it. Then, using some skills I found on
| skills.sh and slightly modified, my workflow is as such:
|
| 1. Brainstorm the next slice
|
| It suggests a few items from the road map that should be
| worked on, with some high level methodology to implement.
| It asks me what the scope ought to be and what invariants
| ought to be considered. I ask it what tradeoffs could be,
| why, and what it recommends, given the product constraints.
| I approve a given slice of work.
|
| NB: this is the part I learn the most from. I ask it why X
| process would be better than Y process given the
| constraints and it either corrects itself or it explains
| why. "Why use an outbox pattern? What other patterns could
| we use and why aren't they the right fit?"
|
| 2. Generate slice
|
| After I approve what to work on next, it generates a high
| level overview of the slice, including files touched, saved
| in a MD file that is persisted. I read through the slice,
| ensure that it is indeed working on what I expect it to be
| working on, and that it's not scope creeping or undermining
| scope, and I approve it. It then makes a plan based off of
| this.
|
| 3. Generate plan
|
| It writes a rather lengthy plan, with discrete task bullets
| at the top. Beneath, each step has to-dos for the llm to
| follow, such as generating tests, running migrations, etc,
| with commit messages for each step. I glance through this
| for any potential red flags.
|
| 4. Execute
|
| This part is self explanatory. It reads the plan and does
| its thing.
|
| I've been extremely happy with this workflow. I'll probably
| write a blog post about it at some point.
| jalopy wrote:
| This is a super helpful and productive comment. I look
| forward to a blog post describing your process in more
| detail.
| oblio wrote:
| This dead internet uncanny (sarcasm?) valley is killing
| me.
| AstroBen wrote:
| 100,000 lines is approx. one million words. The average person
| reads at 250wpm. The entire thing would take 66 hours just to
| read, assuming you were approaching it like a fiction book, not
| thinking anything over
| chickensong wrote:
| > design -> plan -> execute in batches
|
| This is the way for me as well. Have a high-level master design
| and plan, but break it apart into phases that are manageable.
| One-shotting anything beyond a todo list and expecting decent
| quality is still a pipe dream.
| zozbot234 wrote:
| > if you don't plan perfectly, you'll have to start over from
| scratch if anything goes wrong.
|
| You just revert what the AI agent changed and revise/iterate on
| the previous step - no need to start over. This can of course
| involve restricting the work to a smaller change so that the
| agent isn't overwhelmed by complexity.
| elAhmo wrote:
| How can you know that a 100k-line plan is not just slop?
|
| Just because a plan is elaborate doesn't mean it makes sense.
| d1sxeyes wrote:
| The "inline comments on a plan" is one of the best features of
| Antigravity, and I'm surprised others haven't started
| copycatting.
| _hugerobots_ wrote:
| Hub-and-spoke documentation has been absolutely essential
| to the way I planned before, and it's pretty cool seeing
| it work so well with planning mode to build scaffolds and
| routing.
| geoffbp wrote:
| It's worrying to me that nobody really knows how LLMs work. We
| create prompts with or without certain words and hope it works.
| That's my perspective anyway
| solumunus wrote:
| It's the same as dealing with a human. You convey a spec for a
| problem and the language you use matters. You can convey the
| problem in (from your perspective) a clear way and you will get
| mixed results nonetheless. You will have to continue to refine
| the solution with them.
|
| Genuinely: no one really knows how humans work either.
| mannyv wrote:
| It's actually no different from how real software is made.
| Requirements come from the business side, and through an odd
| game of telephone get down to developers.
|
| The team that has developers closest to the customer usually
| makes the better product...or has the better product/market
| fit.
|
| Then it's iteration.
| cawksuwcka wrote:
| falling asleep here. when will the babysitting end
| tayo42 wrote:
| We're just slowly reinventing agile for telling AI agents what to
| do lol
|
| Just skip to the AI stand-ups
| cheekyant wrote:
| It seems like the annotation of plan files is the key step.
|
| Claude Code now creates persistent markdown plan files in
| ~/.claude/plans/ and you can open them with Ctrl-G to annotate
| them in your default editor.
|
| So plan mode is not ephemeral any more.
| chaboud wrote:
| The author seems to think they've hit upon something
| revolutionary...
|
| They've actually hit upon something that several of us have
| evolved to naturally.
|
| LLMs are like unreliable interns with boundless energy. They
| make silly mistakes, wander into annoying structural traps, and
| have to be unwound if left to their own devices. It's like the
| genie that almost pathologically misinterprets your wishes.
|
| So, how do you solve that? Exactly how an experienced lead or
| software manager does: you have them _write it down_ before
| executing, explain things back to you, and ground all of their
| thinking in the code and documentation, avoiding making
| assumptions about code after superficial review.
|
| When it was early ChatGPT, this meant function-level thinking and
| clearly described jobs. When it was Cline it meant cline rules
| files that forced writing architecture.md files and vibe-code.log
| histories, demanding grounding in research and code reading.
|
| Maybe nine months ago, another engineer said two things to me,
| less than a day apart:
|
| - "I don't understand why your clinerules file is so large. You
| have the LLM jumping through so many hoops and doing so much
| extra work. It's crazy."
|
| - The next morning: "It's basically like a lottery. I can't get
| the LLM to generate what I want reliably. I just have to settle
| for whatever it comes up with and then try again."
|
| These systems have to deal with minimal context, ambiguous
| guidance, and extreme isolation. Operate with a little empathy
| for the energetic interns, and they'll uncork levels of output
| worth fighting for. We're Software Managers now. For some of us,
| that's working out great.
| marc_g wrote:
| I've also found that a bigger focus on expanding my agents.md
| as the project rolls on has led to fewer headaches overall and
| more consistency (unsurprisingly). It's the same as asking
| juniors to reflect on the work they've completed and to
| document important things that can help them in the future.
| Software Manager is a good way to put this.
| zozbot234 wrote:
| AGENTS.md should mostly point to real documentation and
| design files that humans will also read and keep up to date.
| It's rare that something about a project is _only_ of
| interest to AI agents.
| jeffreygoesto wrote:
| Oh no, maybe the V-Model was right all along? And
| right-sizing increments with control stops after them. No
| wonder these matrix multiplications start to behave like
| humans; that is what we wanted them to do.
| baxtr wrote:
| So basically you're saying LLMs are helping us be better
| humans?
| shevy-java wrote:
| Better humans? How and where?
| vishnugupta wrote:
| Revolutionary or not it was very nice of the author to make
| time and effort to share their workflow.
|
| For those starting out using Claude Code it gives a structured
| way to get things done bypassing the time/energy needed to "hit
| upon something that several of us have evolved to naturally".
| ffsm8 wrote:
| It's AI-written though; the tells are in pretty much every
| paragraph.
| ratsimihah wrote:
| I don't think it's that big a red flag anymore. Most people
| use ai to rewrite or clean up content, so I'd think we
| should actually evaluate content for what it is rather than
| stop at "nah it's ai written."
| elaus wrote:
| I think as humans it's very hard to abstract content from
| its form. So when the form is always the same boring,
| generic AI slop, it's really not helping the content.
| rmnclmnt wrote:
| And maybe writing an article or a keynote slides is one
| of the few places we can still exerce some human
| creativity, especially when the core skills (programming)
| is almost completely in the hands of LLMs already
| shevy-java wrote:
| Well, real humans may read it though. Personally I much
| prefer real humans write real articles than all this AI
| generated spam-slop. On youtube this is especially
| annoying - they mix in real videos with fake ones. I see
| this when I watch animal videos - some animal behaviour
| is taken from older videos, then AI fake is added. My own
| policy is that I do not watch anything ever again from
| people who lie to the audience that way so I had to begin
| to censor away such lying channels. I'd apply the same
| rationale to blog authors (but I am not 100% certain it
| is actually AI generated; I just mention this as a safety
| guard).
| ffsm8 wrote:
| > I don't think it's that big a red flag anymore.
|
| It is to me, because it indicates the author didn't care
| about the topic. The only thing they cared about was
| writing an "insightful" article about using LLMs. Hence
| this whole thing is basically LinkedIn resume
| improvement slop.
|
| Not worth interacting with, imo
|
| Also, it's not insightful whatsoever. It's basically a
| retelling of other articles from around the time Claude
| Code was released to the public (March-August 2025)
| pmg101 wrote:
| I don't judge content for being AI written, I judge it
| for the content itself (just like with code).
|
| However I do find the standard out-of-the-box style very
| grating. Call it faux-chummy linkedin corporate workslop
| style.
|
| Why don't people give the LLM a steer on style? Either
| based on your personal style or at least on a writer
| whose style you admire. That should be easier.
| xoac wrote:
| Because they think this is good writing. You can't
| correct what you don't have taste for. Most software
| engineers think that reading books means reading NYT non-
| fiction bestsellers.
| ben_w wrote:
| While I agree with:
|
| > Because they think this is good writing. You can't
| correct what you don't have taste for.
|
| I have to disagree about:
|
| > Most software engineers think that reading books means
| reading NYT non-fiction bestsellers.
|
| There's a lot of scifi and fantasy in nerd circles, too.
| Douglas Adams, Terry Pratchett, Vernor Vinge, Charlie
| Stross, Iain M Banks, Arthur C Clarke, and so on.
|
| But simply enjoying good writing is not enough to fully
| get what makes writing good. Even writing is not itself
| enough to get such a taste: thinking of Arthur C Clarke,
| I've just finished 3001, and at the end Clarke gives
| thanks to his editors, noting his own experience as an
| editor meant he held a higher regard for editors than
| many writers seemed to. Stross has, likewise, blogged
| about how writing a manuscript is only the first half of
| writing a book, because then you need to edit the thing.
| pi-rat wrote:
| The main issue with evaluating content for what it is is
| how extremely asymmetric that process has become.
|
| Slop looks reasonable on the surface, and requires orders
| of magnitude more effort to evaluate than to produce.
| It's produced once, but the process has to be repeated
| for every single reader.
|
| Disregarding content that smells like AI becomes an
| extremely tempting early filtering mechanism to separate
| signal from noise - the reader's time is valuable.
| Thanemate wrote:
| >Most people use ai to rewrite or clean up content
|
| I think your sentence should have been "people who use ai
| do so to mostly rewrite or clean up content", but even
| then I'd question the statistical truth behind that
| claim.
|
| Personally, seeing something written by AI means that the
| person who wrote it did so just for looks and not for
| substance. Claiming to be a great author requires both
| penmanship and communication skills, and delegating one
| or either of them to a large language model inherently
| makes you less than that.
|
| However, when the point is just the contents of the
| paragraph(s) and nothing more then I don't care who or
| what wrote it. An example is the result of a research,
| because I'd certainly won't care about the prose or
| effort given to write the thesis but more on the results
| (is this about curing cancer now and forever? If yes, no
| one cares if it's written with AI).
|
| With that being said, there's still the question of
| whether I get anywhere close to understanding the author
| behind the thoughts and opinions. I believe the way
| someone writes hints at the way they think and act. In
| that sense, using LLMs to
| rewrite something to make it sound more professional than
| what you would actually talk in appropriate contexts
| makes it hard for me to judge someone's character,
| professionalism, and mannerisms. Almost feels like
| they're trying to mask part of themselves. Perhaps they
| lack confidence in their ability to sound professional
| and convincing?
| exe34 wrote:
| If you want to write something with AI, send me your
| prompt. I'd rather read what you intend for it to produce
| rather than what it produces. If I start to believe you
| regularly send me AI written text, I will stop reading
| it. Even at work. You'll have to call me to explain what
| you intended to write.
| DonHopkins wrote:
| And if my prompt is a 10 page wall of text that I would
| otherwise take the time to have the AI organize,
| deduplicate, summarize, and sharpen with an index,
| executive summary, descriptive headers, and logical
| sections, are you going to actually read all of that, or
| just whine "TL;DR"?
|
| It's much more efficient and intentional for the writer
| to put the time into doing the condensing and organizing
| once, and review and proofread it to make sure it's what
| they mean, than to just lazily spam every human they want
| to read it with the raw prompt, so every recipient has to
| pay for their own AI to perform that task like a slot
| machine, producing random results not reviewed and
| approved by the author as their intended message.
|
| Is that really how you want Hacker News discussions and
| your work email to be, walls of unorganized unfiltered
| text prompts nobody including yourself wants to take the
| time to read? Then step aside, hold my beer!
|
| Or do you prefer I should call you on the phone and
| ramble on for hours in an unedited meandering stream of
| thought about what I intended to write?
| fasbiner wrote:
| Yeah but it's not. This is a complete contrivance and you're
| just making shit up. The prompt is much shorter than the
| output and you are concealing that fact. Why?
|
| Github repo or it didn't happen. Let's go.
| DonHopkins wrote:
| Are you actually accusing me of not writing walls of
| text??!
|
| Which prompt are you talking about, and exactly how many
| characters is it, and how do you know? And why do you
| think I know, and am concealing it?
|
| Github repo about what, or what didn't happen? You should
| run your posts through an LLM to sanity check them.
|
| I find AI Gloss to be much more insidious than AI Slop,
| which merely annoys with em-dashes, instead of trying to
| undermine reality. So I created these Anthropic Skills
| and Drescher Schemas in my MOOLLM github repo to
| recognize, analyze, fight, and prevent AI Slop, AI Gloss,
| and more.
|
| I'm actively applying Gary Drescher's schema mechanism to
| the problem, as he described in "Made-Up Minds: A
| Constructivist Approach to Artificial Intelligence", his
| thesis with his PhD advisor Seymour Papert and colleague
| Marvin Minsky, and his book from MIT Press.
|
| https://mitpress.mit.edu/9780262517089/made-up-minds/
|
| >Made-Up Minds addresses fundamental questions of
| learning and concept invention by means of an innovative
| computer program that is based on the cognitive-
| developmental theory of psychologist Jean Piaget.
| Drescher uses Piaget's theory as a source of inspiration
| for the design of an artificial cognitive system called
| the schema mechanism, and then uses the system to
| elaborate and test Piaget's theory. The approach is
| original enough that readers need not have extensive
| knowledge of artificial intelligence, and a chapter
| summarizing Piaget assists readers who lack a background
| in developmental psychology. The schema mechanism learns
| from its experiences, expressing discoveries in its
| existing representational vocabulary, and extending that
| vocabulary with new concepts. A novel empirical learning
| technique, marginal attribution, can find results of an
| action that are obscure because each occurs rarely in
| general, although reliably under certain conditions.
| Drescher shows that several early milestones in the
| Piagetian infant's invention of the concept of persistent
| object can be replicated by the schema mechanism.
|
| The goal is Training By Example, not just Instructions.
| Two kinds of training signal:
|
| - Training by instruction -- the skills themselves teach
| what to avoid, get into the training data by being
| published in moollm and included in other projects
|
| - Training by example -- the higher-quality conversations
| these skills produce become training data themselves
|
| Each logged example is a Drescher schema: what was the
| context, what did the AI do, what was the result, and
| what was the surprise (the failure). The schema includes
| the detection pattern (how to recognize it) and the
| correction (what should have happened). These schemas
| serve as both detection patterns and suggested
| mitigations -- they teach an AI (or a human) what to look
| for and what to do instead.
|
| No AI Gloss Drescher Schema Example: ChatGPT Deflection
| Playbook (please submit PRs with your own):
|
| https://github.com/SimHacker/moollm/blob/main/skills/no-
| ai-g...
|
| So what have you tried to do about the problem, other
| than just unoriginally whining in online discussions? You
| asked for a link to my repo, so now you owe me the
| courtesy of actually reading it and commenting on the
| substance instead of the form, instead of just
| complaining "tl;dr" or "ai;dr". You can lead a cow to
| MOOLLM, but you can't make her think.
|
| No AI Slop:
| https://github.com/SimHacker/moollm/tree/main/skills/no-
| ai-s...
|
| > The term "AI slop" was coined by Simon Willison.
|
| > AI slop is everything that makes AI output annoying.
| The filler, the puffery, the em-dashes, the 500 words
| when 50 would do, the "Great question!" before every
| answer. Annoying, but it doesn't lie to you. It just
| wastes your time.
|
| > SLOP = "You said too much, but what you said was true."
|
| > GLOSS = "You said it smoothly, but you lied about
| reality."
|
| > SLOP is the bread. GLOSS is the poison. Most bad AI
| output is a poison sandwich.
|
| No AI Gloss:
| https://github.com/SimHacker/moollm/tree/main/skills/no-
| ai-g...
|
| > The term "AI gloss" inspired by Simon Willison's "AI
| slop" -- because slop is just annoying, but gloss
| rewrites reality.
|
| > AI gloss is more insidious than AI slop. When an AI
| says "relationship management" instead of "tribute," it's
| not being verbose -- it's rewriting reality on behalf of
| whoever prefers the euphemism. Slop wastes your time.
| Gloss wastes your understanding of the world.
|
| > SLOP makes you scroll. GLOSS makes you believe false
| things.
|
| > NO-AI Web Ring: for real: | slop | gloss | sycophancy |
| hedging | moralizing | ideology | overlord | bias | for
| fun: | joking | customer-service | soul
|
| As a consolation prize, here's a wall of text I wrote
| without an LLM about my own personal experience and
| opinions that an LLM would know nothing about -- is it
| too long for you to read, or do you want more details? I
| would be glad to explain the ironic significance of the
| Rightward-Facing Cow if you like, and then launch into a
| rambling essay about how Cow Clicker perfectly
| demonstrates Ian Bogost's idea of procedural rhetoric,
| and how it relates to his criticisms of game design, and
| how Peter Molyneux not only totally missed the point, but
| unwittingly proved it, two years late to the party.
|
| https://news.ycombinator.com/item?id=47110605
|
| Procedural Rhetoric (MOOLLM Anthropic Skill): https://git
| hub.com/SimHacker/moollm/blob/main/skills/procedu...
|
| >Rules persuade. Structure IS argument. Design
| consciously.
|
| >What Is Procedural Rhetoric?
|
| >Ian Bogost coined it: "an unholy blend of Will Wright
| and Aristotle."
|
| >Games and simulations persuade through processes and
| rules, not just words or visuals. The structure of your
| world embodies an ideology. When The Sims allows same-sex
| relationships without fanfare, the rules themselves make
| a statement -- equality is the default, not a feature.
| layer8 wrote:
| It's certainly more interesting than whatever the AI
| would turn it into.
| stuaxo wrote:
| Even though I use LLMs for code, I just can't read LLM
| written text, I kind of hate the style, it reminds me too
| much of LinkedIn.
| ben_w wrote:
| > I don't think it's that big a red flag anymore. Most
| people use ai to rewrite or clean up content, so I'd
| think we should actually evaluate content for what it is
| rather than stop at "nah it's ai written."
|
| Unfortunately, there's a lot of people trying to content-
| farm with LLMs; this means that whatever style they
| default to, is automatically suspect of being a slice of
| "dead internet" rather than some new human discovery.
|
| I won't rule out the possibility that even LLMs, let
| alone other AI, can help with new discoveries, but they
| are definitely better at writing persuasively than they
| are at being inventive, which means I am forced to use
| "looks like LLM" as proxy for both "content farm" and
| "propaganda which may work on me", even though some
| percentage of this output won't even be LLM and some
| percentage of what is may even be both useful and novel.
| theshrike79 wrote:
| ai;dr
|
| If your "content" smells like AI, I'm going to use _my_
| AI to condense the content for me. I'm not wasting my
| time on overly verbose AI "cleaned" content.
|
| Write like a human, have a blog with an RSS feed and I'll
| most likely subscribe to it.
| dawnerd wrote:
| Very high chance someone that's using Claude to write
| code is also using Claude to write a post from some
| notes. That goes beyond rewriting and cleaning up.
| handfuloflight wrote:
| So is GP.
|
| This is clearly a standard AI exposition:
|
| LLMs are like unreliable interns with boundless energy.
| They make silly mistakes, wander into annoying structural
| traps, and have to be unwound if left to their own devices.
| It's like the genie that almost pathologically
| misinterprets your wishes.
| foldingmoney wrote:
| >the tells are in pretty much every paragraph.
|
| It's not just misleading -- it's lazy. And honestly? That
| doesn't vibe with me.
|
| [/s obviously]
| DonHopkins wrote:
| Then ask your own ai to rewrite it so it doesn't trigger
| you into posting uninteresting thought stopping comments
| proclaiming why you didn't read the article, that don't
| contribute to the discussion.
| petesergeant wrote:
| Here's mine! https://github.com/pjlsergeant/moarcode
| chaboud wrote:
| It's this line that I'm bristling at: "...the workflow I've
| settled into is radically different from what most people do
| with AI coding tools..."
|
| Anyone who spends some time with these tools (and doesn't
| black out from smashing their head against their desk) is
| going to find substantial benefit in planning with clarity.
|
| It was #6 in Boris's run-down:
| https://news.ycombinator.com/item?id=46470017
|
| So, yes, I'm glad that people write things out and share. But
| I'd prefer that they not lead with "hey folks, I have news:
| we should *slice* our bread!"
| copirate wrote:
| But the author's workflow is actually very different from
| Boris'.
|
| #6 is about using plan mode whereas the author says "The
| built-in plan mode sucks".
|
| The author's post is much more than just "planning with
| clarity".
| Forgeties79 wrote:
| I would say he's saying "hey folks, I have news. We should
| slice our bread with a knife rather than the spoon that
| came with the bread."
| fintechie wrote:
| These kinds of flows have been documented in the wild for
| some time now. They started to pop up in the Cursor forums
| 2+ years ago... e.g.:
| https://github.com/johnpeterman72/CursorRIPER
|
| Personally I have been using a similar flow for almost 3
| years now, tailored for my needs. Everybody who uses AI for
| coding eventually gravitates towards a similar pattern
| because it works quite well (for all IDEs, CLIs, TUIs)
| CodeBit26 wrote:
| I really like your analogy of LLMs as 'unreliable interns'. The
| shift from being a 'coder' to a 'software manager' who enforces
| documentation and grounding is the only way to scale these
| tools. Without an architecture.md or similar grounding, the
| context drift eventually makes the AI-generated code a
| liability rather than an asset. It's about moving the
| complexity from the syntax to the specification.
| BoredPositron wrote:
| It's alchemy all over again.
| shevy-java wrote:
| Alchemy involved a lot of do-it-yourself though. With AI it
| is like someone else does all the work (well, almost all the
| work).
| BoredPositron wrote:
| It was mainly a jab at the protoscientific nature of it.
| vntok wrote:
| Reproducing experimental results across models and
| vendors is trivial and cheap nowadays.
| BoredPositron wrote:
| Not if Anthropic goes further in obfuscating the output
| of Claude Code.
| vntok wrote:
| Why would you test implementation details? Test _what's_
| delivered, not _how_ it's delivered. The thinking
| portion, synthesized or not, is merely implementation.
|
| The resulting artefact, that's what is worth testing.
| hghbbjh wrote:
| > Why would you test implementation details
|
| Because that has never been sufficient, for reasons
| ranging from hard-to-test cases to readability and
| long-term maintenance. Reading and understanding the code
| is more efficient and necessary for any code worth
| keeping around.
| fy20 wrote:
| It's nice to have it written down in a concise form. I shared
| it with my team as some engineers have been struggling with AI,
| and I think this (just trying to one-shot without planning)
| could be why.
| bambax wrote:
| Agreed. The process described is much more elaborate than what
| I do but quite similar. I start to discuss in great details
| what I want to do, sometimes asking the same question to
| different LLMs. Then a todo list, then manual review of the
| code, esp. each function signature, checking if the
| instructions have been followed and if there are no obvious
| refactoring opportunities (there almost always are).
|
| The LLM does most of the coding, yet I wouldn't call it "vibe
| coding" at all.
|
| "Tele coding" would be more appropriate.
| mlaretallack wrote:
| I use AWS Kiro, and its spec-driven development is exactly
| this. I find it really works well as it makes me slow down
| and think about what I want it to do.
|
| Requirements, design, task list, coding.
| bonoboTP wrote:
| It feels like retracing the history of software project
| management. The post is quite waterfall-like. Writing a lot of
| docs and specs upfront, then implementing. Another approach
| is to just YOLO (on a new branch), make it write up the
| lessons afterwards, then start a new, more informed try and
| throw away the first. Or any other combo.
|
| For me what works well is to ask it to write _some_ code
| upfront to verify its assumptions against actual reality, not
| just telling it to review the sources "in detail". It gains
| much more from real output from the code and clears up wrong
| assumptions. Do some smaller jobs, write up md files, then plan
| the big thing, then execute.
| 0x696C6961 wrote:
| This is exactly what I do. I assume most people avoid this
| approach due to cost.
| nurettin wrote:
| It makes an endless stream of assumptions. Some of them
| brilliant and even instructive to a degree, but most of them
| are unfounded and inappropriate in my experience.
| jerryharri wrote:
| 'The post is quite waterfall-like. Writing a lot of docs and
| specs upfront then implementing' - It's only waterfall if the
| specs cover the entire system or app. If it's broken up into
| sub-systems or vertical slices, then it's much more Agile or
| Lean.
| user3939382 wrote:
| If you have a big rules file you're in the right direction but
| still not there. Just as with humans, the key is that your
| architecture should make it very difficult to break the rules
| by accident and still be able to compile/run with correct exit
| status.
|
| My architecture is so beautifully strong that even LLMs and
| human juniors can't box their way out of it.
| kaycey2022 wrote:
| I've been doing the exact same thing for 2 months now. I wish I
| had gotten off my ass and written a blog post about it. I can't
| blame the author for gathering all the well deserved clout they
| are getting for it now.
| LeafItAlone wrote:
| Don't worry. This advice has been going around for much more
| than 2 months, including links posted here as well as
| official advice from the major companies (OpenAI and
| Anthropic) themselves. The tools literally have had plan mode
| as a first class feature.
|
| So you probably wouldn't have any clout anyways, like all of
| the other blog posts.
| noisy_boy wrote:
| I went through the blog. I started using Claude Code about 2
| weeks ago and my approach is practically the same. It just
| felt logical. I think there are a bunch of us who have landed
| on this approach and most are just quietly seeing the
| benefits.
| qudat wrote:
| > LLMs are like unreliable interns with boundless energy
|
| This isn't directed specifically at you but the general
| community of SWEs: we need to stop anthropomorphizing a tool.
| Code agents are not human capable and scaling pattern matching
| will never hit that goal. That's all hype and this is coming
| from someone who runs the range of daily CC usage. I'm using CC
| to its fullest capability while also being a good shepherd for
| my prod codebases.
|
| Pretending code agents are human-capable is fueling this
| Kool-Aid-drinking hype craze.
| kobe_bryant wrote:
| if only there was another simpler way to use your knowledge to
| write code...
| growt wrote:
| That is just spec driven development without a spec, starting
| with the plan step instead.
| YetAnotherNick wrote:
| I don't know. I tried various methods, and this one fails
| fairly often. The problem is that the plan naturally skips some
| important details, or assumes some library function, but it is
| then taken as an instruction in the next phase. And Claude
| can't handle ambiguity if the instruction is very detailed
| (e.g. if the plan asks to use a certain library, even if it is
| a bad fit, Claude won't know that decision is flexible). If the
| instruction is less detailed, I've seen Claude willing to try
| multiple things, and if it keeps failing it doesn't fear
| reverting almost everything.
|
| In my experience, the best scenario is that the instruction and
| plan are human-written, and detailed.
| pgt wrote:
| My process is similar, but I recently added a new "critique the
| plan" feedback loop that is yielding good results. Steps:
|
| 1. Spec
|
| 2. Plan
|
| 3. Read the plan & tell it to fix its bad ideas.
|
| 4. (NB) Critique the plan (loop) & write a detailed report
|
| 5. Update the plan
|
| 6. Review and check the plan
|
| 7. Implement plan
|
| Detailed here:
|
| https://x.com/PetrusTheron/status/2016887552163119225
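The critique loop in steps 4-5 can be sketched as a simple refinement cycle. Everything below is hypothetical scaffolding, not pgt's actual tooling: `run_agent` stands in for whatever CLI or API invokes the model, and the "no issues" stop condition is an illustrative convention.

```python
def refine_plan(spec, run_agent, max_rounds=3):
    """Draft a plan, then critique and revise it in a loop."""
    plan = run_agent(f"Write an implementation plan for:\n{spec}")
    for _ in range(max_rounds):
        critique = run_agent(
            f"Critique this plan and write a detailed report:\n{plan}")
        if "no issues" in critique.lower():
            break  # the critic is satisfied; stop iterating
        plan = run_agent(
            f"Update the plan to address this report:\n{critique}\n\n"
            f"Original plan:\n{plan}")
    return plan
```

The human review in steps 3 and 6 stays outside the loop; this only automates the critique/update round trip in the middle.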
| brumar wrote:
| Same. In my experience, the first plan always benefits from
| being challenged once or twice by claude itself.
| lastdong wrote:
| Google Antigravity has this process built in. This is
| essentially a cycle a developer would follow: plan/analyse -
| document/discuss - break down tasks/implement. We've been using
| requirements and design documents as best practice since leaving
| our teenage bedroom lab for the professional world. I suppose
| this could be seen as our coding agents coming of age.
| w10-1 wrote:
| I try these staging-document patterns, but suspect they have 2
| fundamental flaws that stem mostly from our own biases.
|
| First, Claude evolves. The original post's work pattern evolved
| over 9 months, before Claude's recent step changes. It's likely
| Claude's present plan mode is better than this workaround, but
| if you stick to the workaround, you'd never know.
|
| Second, the staging docs that represent some context - whether
| library skills or the current session's design and
| implementation plans - are not the model Claude works with. At
| best they shape it, but I've found it does ignore and forget
| even what's written (even when I shout with emphasis), and the
| overall session influences the code. (Most often this happens
| when a peripheral adjustment ends up populating half the
| context.)
|
| Indeed the biggest benefit from the OP might be squeezing the
| work into one session, omitting peripheral features and
| investigations at the
| plan stage. So the mechanism of action might be the combination
| of getting our own plan clear and avoiding confusing excursions.
| (A test for that would be to redo the session with the final plan
| and implementation, to see if the iteration process itself is
| shaping the model.)
|
| Our bias is to believe that we're getting better at managing this
| thing, and that we can control and direct it. It's uncomfortable
| to realize you can only really influence it - much like giving
| direction to a junior, but they can still go off track. And even
| if you found a pattern that works, it might work for reasons
| you're not understanding -- and thus fail you eventually. So,
| yes, try some patterns, but always hang on to the newbie senses
| of wonder and terror that make you curious, alert, and
| experimental.
| appsoftware wrote:
| This is the flow I've found myself working towards. Essentially
| maintaining more and more layered documentation for the LLM
| produces better and more consistent results. What is great here
| is the emphasis on the use of such documents in the planning
| phase. I'm feeling much more motivated to write solid
| documentation recently, because I know someone (the LLM) is
| actually going to read it! I've noticed my efforts and skill
| acquisition have moved sharply from app developer towards DevOps
| and architecture / management, but I think I'll always be
| grateful for the application engineering experience that I think
| the next wave of devs might miss out on.
|
| I've also noted such a huge gulf between some developers
| describing 'prompting things into existence' and the approach
| described in this article. Both types seem to report success,
| though my experience is that the latter seems more realistic, and
| much more likely to produce robust code that's likely to be
| maintainable for long term or project critical goals.
| dr_dshiv wrote:
| Another pattern is:
|
| 1. First vibecode software to figure out what you want
|
| 2. Then throw it out and engineer it
| chickensong wrote:
| I agree with most of this, though I'm not sure it's radically
| different. I think most people who've been using CC in earnest
| for a while probably have a similar workflow? Prior to Claude 4
| it was pretty much mandatory to define requirements and track
| implementation manually to manage context. It's still good, but
| since 4.5 release, it feels less important. CC basically works
| like this by default now, so unless you value the spec docs
| (still a good reference for Claude, but need to be maintained),
| you don't have to think too hard about it anymore.
|
| The important thing is to have a conversation with Claude during
| the planning phase and don't just say "add this feature" and take
| what you get. Have a back and forth, ask questions about common
| patterns, best practices, performance implications, security
| requirements, project alignment, etc. This is a learning
| opportunity for you and Claude. When you think you're done,
| request a final review to analyze for gaps or areas of
| improvement. Claude will _always_ find something, but starts to
| get into the weeds after a couple passes.
|
| If you're greenfield and you have preferences about structure and
| style, you need to be explicit about that. Once the scaffolding
| is there, modern Claude will typically follow whatever examples
| it finds in the existing code base.
|
| I'm not sure I agree with the "implement it all without stopping"
| approach and let auto-compact do its thing. I still see Claude
| get lazy when nearing compaction, though it has gotten drastically
| better over the last year. Even so, I still think it's better to
| work in a tight loop on each stage of the implementation and
| preemptively compacting or restarting for the highest quality.
|
| Not sure that the language is that important anymore either.
| Claude will explore existing codebase on its own at unknown
| resolution, but if you say "read the file" it works pretty well
| these days.
|
| My suggestions to enhance this workflow:
|
| - If you use a numbered phase/stage/task approach with
| checkboxes, it makes it easy to stop/resume as-needed, and
| discuss particular sections. Each phase should be
| working/testable software.
|
| - Define a clear numbered list workflow in CLAUDE.md that loops
| on each task (run checks, fix issues, provide summary, etc).
|
| - Use hooks to ensure the loop is followed.
|
| - Update spec docs at the end of the cycle if you're keeping
| them. It's not uncommon for there to be some divergence during
| implementation and testing.
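| For the hooks suggestion: Claude Code reads hook definitions from
| .claude/settings.json. A minimal sketch of a hook that runs a
| check after every file edit (the lint command is a placeholder for
| whatever your loop's check step actually is):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint --silent" }
        ]
      }
    ]
  }
}
```

| Because the hook fires on the tool call itself, the loop gets
| enforced even when the model forgets the CLAUDE.md instructions.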
| koevet wrote:
| Has anyone found an efficient way to avoid repeating the initial
| codebase assessment when working with large projects?
|
| There are several projects on GitHub that attempt to tackle
| context and memory limitations, but I haven't found one that
| consistently works well in practice.
|
| My current workaround is to maintain a set of Markdown files,
| each covering a specific subsystem or area of the application.
| Depending on the task, I provide only the relevant documents to
| Claude Code to limit the context scope. It works reasonably well,
| but it still feels like a manual and fragile solution. I'm
| interested in more robust strategies for persistent project
| context or structured codebase understanding.
| jsmith99 wrote:
| Whenever I build a new feature with it I end up with several
| plan files leftover. I ask CC to combine them all, update with
| what we actually ended up building and name it something
| sensible, then whenever I want to work on that area again it's
| a useful reference (including the architecture, decisions and
| tradeoffs, relevant files etc).
| Sammi wrote:
| Yes this is what agent "skills" are. Just guides on any
| topic. The key is that you have the agent write and maintain
| them.
| KellyCriterion wrote:
| In Claude Web you can use projects to put files relevant for
| context there.
| mstkllah wrote:
| And then you have to remind it frequently to make use of the
| files. Happened to me so many times that I added it both to
| custom instructions as well as to the project memory.
| hathawsh wrote:
| That sounds like the recommended approach. However, there's one
| more thing I often do: whenever Claude Code and I complete a
| task that didn't go well at first, I ask CC what it learned,
| and then I tell it to write down what it learned for the
| future. It's hard to believe how much better CC has become
| since I started doing that. I ask it to write dozens of unit
| tests and it just does. Nearly perfectly. It's insane.
| energy123 wrote:
| For my longer spec files, I grep the subheaders/headers (with
| line numbers) and show this compact representation to the LLM's
| context window. I also have a file that describes what each
| spec files is and where it's located, and I force the LLM to
| read that and pull the subsections it needs. I also have one
| entrypoint requirements file (20k tokens) that I force it to
| read in full before it does anything else, every line I wrote
| myself. But none of this is a silver bullet.
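| A minimal Python sketch of that header-grep step (the outline
| function and file layout are illustrative, not the commenter's
| actual tooling):

```python
import re

def outline(path: str) -> str:
    """Compact representation of a long spec file: every Markdown
    header prefixed with its 1-based line number, so the model can
    request specific sections by line range instead of reading the
    whole file into context."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if re.match(r"#{1,6}\s", line):  # '#' through '######'
                rows.append(f"{n}: {line.strip()}")
    return "\n".join(rows)
```

| For a 20k-token spec, this shrinks what the model sees up front to
| a few hundred tokens of table-of-contents.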
| chickensong wrote:
| I'm interested in this as well.
|
| Skills almost seem like a solution, but they still need an out-
| of-band process to keep them updated as the codebase evolves.
| For now, a structured workflow that includes aggressive updates
| at the end of the loop is what I use.
| gregman1 wrote:
| It is really fun to watch how a baby takes its first steps, and
| also how experienced professionals rediscover what standards have
| been telling us for 80+ years.
| smcleod wrote:
| I don't really get what is different about this from how almost
| everyone else uses Claude Code? This is an incredibly common, if
| not the most common, way of using it (and many other tools).
| nesk_ wrote:
| > I am not seeing the performance degradation everyone talks
| about after 50% context window.
|
| I pretty much agree with that. I use long sessions and stopped
| trying to optimize the context size, the compaction happens but
| the plan keeps the details and it works for me.
| charkubi wrote:
| Planning is important because you get the LLM to explain the
| problem and solution in _its_ language and structure, not yours.
|
| This shortcuts a range of problem cases where the LLM fights
| between the user's strict and potentially conflicting
| requirements and its own learning.
|
| In the early days we used to get LLM to write the prompts for us
| to get round this problem, now we have planning built in.
| shevy-java wrote:
| I don't deny that AI has use cases, but boy - the workflow
| described is boring:
|
| "Most developers type a prompt, sometimes use plan mode, fix the
| errors, repeat. "
|
| Does anyone think this is as epic as, say, watch the Unix
| archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian
| demos how pipes work; or Dennis working on C and UNIX? Or even
| before those, the older machines?
|
| I am not at all saying that AI tools are all useless, but there
| is no real epicness. It is just autogenerated AI slop and blob. I
| don't really call this engineering (although I also do agree,
| that it is engineering still; I just don't like using the same
| word here).
|
| > never let Claude write code until you've reviewed and approved
| a written plan.
|
| So the junior-dev analogy is quite apt here.
|
| I tried to read the rest of the article, but I just got angrier.
| I never had that feeling watching oldschool legends, though
| perhaps some of their work may be boring, but this AI-generated
| code ... that's just some mythical random-guessing work. And none
| of that is "intelligent", even if it may appear to work, may work
| to some extent too. This is a simulation of intelligence. If it
| works very well, why would any software engineer still be
| required? Supervising would only be necessary if AI produces
| slop.
| gehsty wrote:
| Doesn't Claude code do this by switching between edit mode and
| plan mode?
|
| FWIW I have had significant improvements by clearing context then
| implementing the plan. Seems like it stops Claude getting hung up
| on something.
| je42 wrote:
| There are frameworks like https://github.com/bmad-code-org/BMAD-
| METHOD and https://github.github.com/spec-kit/ that are working
| on encoding a similar kind of approach and process.
| mcv wrote:
| This is great. My workflow is also heading in that direction, so
| this is a great roadmap. I've already learned that just naively
| telling Claude what to do and letting it work, is a recipe for
| disaster and wasted time.
|
| I'm not this structured yet, but I often start with having it
| analyse and explain a piece of code, so I can correct it before
| we move on. I also often switch to an LLM that's separate from my
| IDE because it tends to get confused by sprawling context.
| gary17the wrote:
| > Read deeply, write a plan, annotate the plan until it's right,
| then let Claude execute the whole thing without stopping,
| checking types along the way.
|
| As others have already noted, this workflow is exactly what the
| Google Antigravity agent (based off Visual Studio Code) has been
| created for. Antigravity even includes specialized UI for a user
| to annotate selected portions of an LLM-generated plan before
| iterating it.
|
| One significant downside to Antigravity I have found so far is
| the fact that even though it will properly infer a certain
| technical requirement and clearly note it in the plan it
| generates (for example, "this business reporting column needs to
| use a weighted average"), it will sometimes quietly downgrade
| such a specialized requirement (for example, to a non-weighted
| average), without even creating an appropriate "WARNING:" comment
| in the generated code. Especially so when the relevant codebase
| already includes a similar, but not exactly appropriate API. My
| repetitive prompts to ALWAYS ask about ANY implementation
| ambiguities WHATSOEVER go unanswered.
|
| From what I gather Claude Code seems to be better than other
| agents at always remembering to query the user about
| implementation ambiguities, so maybe I will give Claude Code a
| shot over Antigravity.
| Fuzzwah wrote:
| All sounds like a bespoke way of remaking
| https://github.com/Fission-AI/OpenSpec
| __bjoernd wrote:
| Sounds a bit like what Claude Plan Mode or Amazon's Kiro were
| built for. I agree it's a useful flow, but you can also overdo
| it.
| grabshot_dev wrote:
| Why don't you make Claude give feedback and iterate by itself?
| alexrezvov wrote:
| Cool, the idea of leaving comments directly in the plan never
| even occurred to me, even though it really is the obvious thing
| to do.
|
| Do you markup and then save your comments in any way, and have
| you tried keeping them so you can review the rules and
| requirements later?
| zuInnp wrote:
| Since the rise of AI systems I really wonder how people wrote
| code before. This is exactly how I planned out implementation and
| executed the plan. Might have been some paper notes, a ticket or
| a white board, buuuuut ... I don't know.
| EastLondonCoder wrote:
| I don't use plan.md docs either, but I recognise the underlying
| idea: you need a way to keep agent output constrained by reality.
|
| My workflow is more like scaffold -> thin vertical slices ->
| machine-checkable semantics -> repeat.
|
| Concrete example: I built and shipped a live ticketing system for
| my club (Kolibri Tickets). It's not a toy: real payments
| (Stripe), email delivery, ticket verification at the door,
| frontend + backend, migrations, idempotency edges, etc. It's
| running and taking money.
|
| The reason this works with AI isn't that the model "codes fast".
| It's that the workflow moves the bottleneck from "typing" to
| "verification", and then engineers the verification loop:
| - keep the spine runnable early (end-to-end scaffold)
| - add one thin slice at a time (don't let it touch 15 files
|   speculatively)
| - force checkable artifacts (tests/fixtures/types/state-machine
|   semantics where it matters)
| - treat refactors as normal, because the harness makes them safe
|
| If you run it open-loop (prompt -> giant diff -> read/debug), you
| get the "illusion of velocity" people complain about. If you run
| it closed-loop (scaffold + constraints + verifiers), you can
| actually ship faster because you're not paying the integration
| cost repeatedly.
|
| Plan docs are one way to create shared state and prevent drift. A
| runnable scaffold + verification harness is another.
| aitchnyu wrote:
| Now that code is cheap, I ensured my side project has
| unit/integration tests (will enforce 100% coverage), Playwright
| tests, static typing (it's in Python), and scripts for all tasks.
| Will learn mutation testing too (yes, it's overkill). Now my
| agent works up to 1 hour in loops and emits concise code I don't
| have to edit much.
| yunusabd wrote:
| That's exactly what Cursor's "plan" mode does? It even creates md
| files, which seems to be the main "thing" the author discovered.
| Along with some cargo cult science?
|
| How is this noteworthy other than to spark a discussion on hn? I
| mean I get it, but a little more substance would be nice.
| irthomasthomas wrote:
| In my own tests I have found opus to be very good at writing
| plans, terrible at executing them. It typically ignores half of
| the constraints.
| https://x.com/xundecidability/status/2019794391338987906?s=2...
| https://x.com/xundecidability/status/2024210197959627048?s=2...
| Sammi wrote:
| 1. Don't implement too much at a time
|
| 2. Have the agent review if it followed the plan and relevant
| skills accurately.
| irthomasthomas wrote:
| the first link was from a simple request with fewer than 1000
| tokens total in the context window, just a short shell
| script.
|
| here is another one which had about 200 tokens and opus
| decided to change the model name i requested.
|
| https://x.com/xundecidability/status/2005647216741105962?s=2.
| ..
|
| opus is bad at instruction following now.
| willsmith72 wrote:
| this sounds... really slow. for large changes for sure i'm
| investing time into planning. but such a rigid system can't
| possibly be as good as a flexible approach with variable amounts
| of planning based on complexity
| richardjennings wrote:
| This is similar to what I do. I instruct an Architect mode with a
| set of rules related to phased implementation and detailed code
| artifacts output to a report.md file. After a couple of rounds of
| review and usually some responses that either tie together
| behaviors across context, critique poor choices or correct
| assumptions, there is a piece of work defined for a coder LLM to
| perform. With the new Opus 4.6 I then select specialist agents to
| review the report.md, prompted with detailed insight into
| particular areas of the software. The feedback from these
| specialist agent reviews is often very good and sometimes catches
| things I had missed. Once all of this is done, I let the agent
| make the changes and move onto doing something else. I typically
| rename and commit the report.md files which can be useful as an
| alternative to git diff / commit messages etc.
| vazma wrote:
| Sorry, but I don't get the hype with this post; isn't this what
| most people are doing? I want to see more posts on how to use
| Claude "smart" without feeding it the whole codebase and
| polluting the context window, and more best practices on cost-
| efficient ways to use it. This workflow is clearly burning
| millions of tokens per session; for me it's a no.
| pajamasam wrote:
| I feel like if I have to do all this, I might as well write the
| code myself.
| MarcLore wrote:
| The separation of planning and execution resonates strongly. I've
| been using a similar pattern when building with AI APIs -- write
| the spec/plan in natural language first, then let the model
| execute against it.
|
| One addition that's worked well for me: keeping a persistent
| context file that the model reads at the start of each session.
| Instead of re-explaining the project every time, you maintain a
| living document of decisions, constraints, and current state.
| Turns each session into a continuation rather than a cold start.
|
| The biggest productivity gain isn't in the code generation itself
| -- it's in reducing the re-orientation overhead between sessions.
| nikolay wrote:
| Well, that's already done by Amazon's Kiro [0], Google's
| Antigravity [1], GitHub's Spec Kit [2], and OpenSpec [3]!
|
| [0]: https://kiro.dev/
|
| [1]: https://antigravity.google/
|
| [2]: https://github.github.com/spec-kit/
|
| [3]: https://openspec.dev/
| baalimago wrote:
| Another approach is to spec functionality using comments and
| interfaces, then tell the LLM to first implement tests and
| finally make the tests pass. This way you also get regression
| safety and can inspect that it works as it should via the tests.
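| A minimal Python sketch of this spec-by-interface flow (SlugStore
| and its methods are invented for illustration): the comments and
| interface pin the contract down, the tests come next, and the
| LLM's only remaining job is step 2.

```python
from abc import ABC, abstractmethod

class SlugStore(ABC):
    """Spec: save() returns a unique slug for the text; load()
    returns the stored text, or raises KeyError for unknown slugs."""
    @abstractmethod
    def save(self, text: str) -> str: ...
    @abstractmethod
    def load(self, slug: str) -> str: ...

# Step 1: tests written against the interface, before any
# implementation exists -- this is the regression safety net.
def run_spec_tests(store: SlugStore) -> None:
    a = store.save("hello")
    b = store.save("world")
    assert a != b, "slugs must be unique"
    assert store.load(a) == "hello"
    try:
        store.load("missing")
    except KeyError:
        pass
    else:
        raise AssertionError("unknown slug must raise KeyError")

# Step 2: the LLM makes the tests pass, e.g. with:
class MemoryStore(SlugStore):
    def __init__(self) -> None:
        self._items: dict[str, str] = {}
    def save(self, text: str) -> str:
        slug = str(len(self._items))
        self._items[slug] = text
        return slug
    def load(self, slug: str) -> str:
        return self._items[slug]  # KeyError propagates when unknown
```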
| folex wrote:
| this is exactly how I work with cursor
|
| except that I put notes to the plan document in a single message
| like:
|     > plan quote
|     my note
|     > plan quote
|     my note
|
| otherwise, I'm not sure how to guarantee that the AI won't
| confuse my notes with its own plan.
|
| one new thing for me is to review the todo list, I was always
| relying on auto generated todo list
| adithyassekhar wrote:
| What I've read is that even with all the meticulous planning, the
| author still needed to intervene. Not at the end but in the
| middle; otherwise it will continue building out something wrong,
| and it's even harder to fix once it's done. It'll cost even more
| tokens. It's a net negative.
|
| You might say a junior might do the same thing, but I'm not
| worried about it, at least the junior learned something while
| doing that. They could do it better next time. They know the code
| and change it from the middle where it broke. It's a net
| positive.
| ionwake wrote:
| this comment is the first truly humane one ive read regarding
| this whole AI fiasco
| anonymousDan wrote:
| Unfortunately, you could argue that the model provider has also
| learned something, i.e. the interaction can be used as
| additional training data to train subsequent models.
| jeleh wrote:
| Good article, but I would rephrase the core principle slightly:
|
| Never let Claude write code until you've reviewed, *fully
| understood* and approved a written plan.
|
| In my experience, the beginning of chaos is the point at which
| you trust that Claude has understood everything correctly and
| claims to present the very best solution. At that point, you
| leave the driver's seat.
| vemv wrote:
| Every "how I use Claude Code" post will get into the HN
| frontpage.
|
| Which maybe has to do with people wanting to show how _they_ use
| Claude Code in the comments!
| juanre wrote:
| Shameless plug: https://beadhub.ai allows you to do exactly that,
| but with several agents in parallel. One of them is in the role
| of planner, which takes care of the source-of-truth document and
| the long term view. They all stay in sync with real-time chat and
| mail.
|
| It's OSS.
|
| Real-time work is happening at
| https://app.beadhub.ai/juanre/beadhub (beadhub is a public
| project at https://beadhub.ai so it is visible).
|
| Particularly interesting (I think) is how the agents chat with
| each other, which you can see at
| https://app.beadhub.ai/juanre/beadhub/chat
| colinhb wrote:
| Quoting the article:
|
| > One trick I use constantly: for well-contained features where
| I've seen a good implementation in an open source repo, I'll
| share that code as a reference alongside the plan request. If I
| want to add sortable IDs, I paste the ID generation code from a
| project that does it well and say "this is how they do sortable
| IDs, write a plan.md explaining how we can adopt a similar
| approach." Claude works dramatically better when it has a
| concrete reference implementation to work from rather than
| designing from scratch.
|
| Licensing apparently means nothing.
|
| Ripped off in the training data, ripped off in the prompt.
| miohtama wrote:
| Concepts are not copyrightable.
| colinhb wrote:
| The article isn't describing someone who learned the concept
| of sortable IDs and then wrote their own implementation.
|
| It describes copying and pasting actual code from one project
| into a prompt so a language model can reproduce it in another
| project.
|
| It's a mechanical transformation of someone else's
| copyrighted expression (their code) laundered through a
| statistical model instead of a human copyist.
| layer8 wrote:
| "Mechanical" is doing some heavy lifting here. If a human
| does the same, reimplementing the code in their own style for
| their particular context, it doesn't violate copyright.
| Having the LLM see the original code doesn't automatically
| make its output a plagiarism.
| parasti wrote:
| The biggest roadblock to using agents to maximum effectiveness
| like this is the chat interface. Its convenience is a detriment
| and a distraction. I've found myself repeatedly
| giving into that convenience only to realize that I have wasted
| an hour and need to start over because the agent is just
| obliviously circling the solution that I thought was fully
| obvious from the context I gave it. Clearly these tools are
| exceptional at transforming inputs into outputs and,
| counterintuitively, not as exceptional when the inputs are
| constantly interleaved with the outputs like they are in chat
| mode.
| submeta wrote:
| What works extremely well for me is this: Let Claude Code create
| the plan, then turn over the plan to Codex for review, and give
| the response back to Claude Code. Codex is exceptionally good at
| doing high level reviews and keeping an eye on the details. It
| will find very subtle errors and omissions. And CC is very good at
| quickly converting the plan into code.
|
| This back and forth between the two agents, with me steering the
| conversation, elevates Claude Code to the next level.
| drcongo wrote:
| This is exactly how I use it.
| oulipo2 wrote:
| Has Claude Code become slow, laggy, imprecise, giving wrong
| answers for other people here?
| stuaxo wrote:
| I had to stop reading about half way, it's written in that
| breathless linkedin/ai generated style.
| podgorniy wrote:
| I do the same. I also cross-ask gemini and claude about the plan
| during iterations, sometimes make several separate plans.
| clbrmbr wrote:
| I just use Jesse's "superpowers" plugin. It does all of this but
| also steps you through the design and gives you bite sized chunks
| and you make architecture decisions along the way. Far better
| than making big changes to an already established plan.
| tagawa wrote:
| Link for those interested:
| https://claude.com/plugins/superpowers
| clbrmbr wrote:
| I suggest reading the tests that Superpowers author has come
| up with for testing the skills. See the GitHub repo.
| flippyhead wrote:
| Have you tried https://github.com/pcvelz/superpowers ?
| clbrmbr wrote:
| https://github.com/obra/superpowers
| xbmcuser wrote:
| Gemini is better at research Claude at coding. I try to use
| Gemini to do all the research and write out instruction on what
| to do what process to follow then use it in Claude. Though I am
| mostly creating small python scripts
| sparin9 wrote:
| I think the real value here isn't "planning vs not planning,"
| it's forcing the model to surface its assumptions before they
| harden into code.
|
| LLMs don't usually fail at syntax. They fail at invisible
| assumptions about architecture, constraints, invariants, etc. A
| written plan becomes a debugging surface for those assumptions.
| hun3 wrote:
| Except that merely surfacing them changes their behavior, like
| how you add that one printf() call and now your heisenbug is
| suddenly nonexistent
| maccard wrote:
| > LLMs don't usually fail at syntax?
|
| Really? My experience has been that it's incredibly easy to get
| them stuck in a loop on a hallucinated API and burn through
| credits before I've even noticed what it's done. I have a small
| rust project that stores stuff on disk that I wanted to add an
| s3 backend too - Claude code burned through my $20 in a loop in
| about 30 minutes without any awareness of what it was doing on
| a very simple syntax issue.
| remify wrote:
| Sub agents also help a lot in that regard. Have an agent do the
| planning, have an implementation agent write the code, and have
| another one do the review. Clear responsibilities help a lot.
|
| There's also blue team / red team, which works.
|
| The idea is always the same: help the LLM reason properly with
| fewer and clearer instructions.
| jalopy wrote:
| This sounds very promising. Any link to more details?
| MagicMoonlight wrote:
| Did you just write this with ChatGPT?
| asdxrfx wrote:
| It's also great to describe the full use-case flow in the
| instructions, so you can verify the LLM won't do something
| stupid on its own
| dr_kretyn wrote:
| The post and comments all read like: Here are my rituals to the
| software God. If you follow them, the God gives plenty. Omit one
| step and the God is mad. Sometimes you have to make a sacrifice,
| but that's better for the long term.
|
| I've been in eng for decades but never participated in forums. Is
| the cargo cult new?
|
| I use Claude Code a lot. Still don't trust what's in the plan
| will get actually written, regardless of details. My ritual is
| around stronger guardrails outside of prompting. This is the new
| MongoDB webscale meme.
| getnormality wrote:
| This looks like an important post. What makes it special is that
| it operationalizes Polya's classic problem-solving recipe for the
| age of AI-assisted coding.
|
| 1. Understand the problem (research.md)
|
| 2. Make a plan (plan.md)
|
| 3. Execute the plan
|
| 4. Look back
| christophilus wrote:
| Yeah, OODA loop for programmers, basically. It's a good
| approach.
| kissgyorgy wrote:
| There is not a lot of explanation of WHY this is better than
| doing the opposite (start coding and see how it goes), or of how
| this would apply to Codex models.
|
| I do exactly the same; I even developed my own workflows with the
| Pi agent, which works really well. Here is the reason:
|
| - Claude needs a lot more steering than other models, it's too
| eager to do stuff and does stupid things and write terrible code
| without feedback.
|
| - Claude is very good at following the plan, you can even use a
| much cheaper model if you have a good plan. For example I list
| every single file which needs edits with a short explanation.
|
| - At the end of the plan, I have a clear picture in my head how
| the feature will exactly look like and I can be pretty sure the
| end result will be good enough (given that the model is good at
| following the plan).
|
| A lot of things don't need planning at all. Simple fixes,
| refactoring, simple scripts, packaging, etc. Just keep it simple.
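| A hypothetical sketch of what such a file-by-file plan entry can
| look like (the feature and file names are invented):

```markdown
## Phase 2: add pagination to /items

Files to edit:
- [ ] src/routes/items.ts  -- accept `cursor` and `limit` params
- [ ] src/db/items.ts      -- add keyset-paginated query helper
- [ ] tests/items.test.ts  -- cover first page, next page, empty page
```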
| etothet wrote:
| "The workflow I'm going to describe has one core principle: never
| let Claude write code until you've reviewed and approved a
| written plan."
|
| I'm not sure we need to be this black and white about things.
| Speaking from the perspective of leading a dev team, I regularly
| have Claude Code take a chance at code without reviewing a plan.
| For example, small issues that I've written clear details about,
| Claude can go to town on those. I've never been on a team that
| didn't have too many of these types of issues to address.
|
| And a team should have other guards in place that validate that
| code before it gets merged somewhere important.
|
| I don't have to review every single decision one of my teammates
| is going to make, even those less experienced teammates, but I do
| prepare teammates with the proper tools (specs, documentation,
| etc) so they can make a best effort first attempt. This is how I
| treat Claude Code in a lot of scenarios.
| josefrichter wrote:
| Radically different? Sounds to me like the standard spec driven
| approach that plenty of people use.
|
| I prefer iterative approach. LLMs give you incredible speed to
| try different approaches and inform your decisions. I don't think
| you can ever have a perfect spec upfront, at least that's my
| experience.
| MagicMoonlight wrote:
| So we're back to waterfall huh
| islandfox100 wrote:
| It strikes me that if this technology were as useful and all-
| encompassing as it's marketed to be, we wouldn't need four
| articles like this every week
| prplfsh wrote:
| People are figuring it out. Cars are broadly useful, but
| there's nuance to how to maintain them, use them well in
| different terrain and weather, etc.
| hombre_fatal wrote:
| How many millions of articles are there about people figuring
| out how to write better software?
|
| Does something have to be trivial-to-use to be useful?
| turingsroot wrote:
| I've been running AI coding workshops for engineers transitioning
| from traditional development, and the research phase is
| consistently the part people skip -- and the part that makes or
| breaks everything.
|
| The failure mode the author describes (implementations that work
| in isolation but break the surrounding system) is exactly what I
| see in workshop after workshop. Engineers prompt the LLM with
| "add pagination to the list endpoint" and get working code that
| ignores the existing query builder patterns, duplicates filtering
| logic, or misses the caching layer entirely.
|
| What I tell people: the research.md isn't busywork, it's your
| verification that the LLM actually understands the system it's
| about to modify. If you can't confirm the research is accurate,
| you have no business trusting the plan.
|
| One thing I'd add to the author's workflow: I've found it helpful
| to have the LLM explicitly list what it does NOT know or is
| uncertain about after the research phase. This surfaces blind
| spots before they become bugs buried three abstraction layers
| deep.
___________________________________________________________________
(page generated 2026-02-22 16:00 UTC)