COMMENT PAGE FOR:
LLM=True
jakub_g wrote 22 hours 58 min ago:
A potential problem I see with "LLM=true" being set by Claude and
friends is that some tools whose authors don't like LLMs might be
tempted to, e.g., not output anything at all (and not do what the CLI
is supposed to do), on principle, when they detect an LLM is running
things.
Just the (small) probability of this happening might be enough for the
big players to not consider creating that var. (Although, if it's easy
enough to unset, then maybe it's not an issue.)
dirk94018 wrote 1 day ago:
This is exactly right. We hit the same wall. Our solution was to
re-imagine Unix at [1], and either pipe through jq etc. or just start
rewriting tools that do that. A good tool shouldn't be verbose out of
laziness; it should be conscious of the information that might be
needed by the next step in the pipeline. If deeper information is
needed, the user should ask for it with a command-line flag.
HTML [1]: https://linuxtoaster.com
nextzck wrote 1 day ago:
This is why I built claude-warden: [1] I think it's much simpler &
easier to just build this into agents than trying to modify every tool
ever created to be less verbose. Just guard agents from it user-side.
Let users control what they want to see and pass into context.
HTML [1]: https://github.com/johnzfitch/claude-warden
irawen wrote 1 day ago:
the scroll breaks upon zoom in, a bit nauseating
burkaman wrote 1 day ago:
Why can't the agent harness dynamically decide whether outputs should
be put into the context or not? It could check with an LLM to determine
if the verbatim output seems important, and if not, store the full
output locally but replace it in the prompt with a brief summary and
unique ID. Then make a tool available so the full output can be
retrieved later if necessary. That's roughly how humans do it, you
scroll through your terminal and make quick decisions about what parts
you can ignore, and then maybe come back later when you realize "oh I
should probably read that whole stack trace".
It wouldn't even need to send the full output to make a decision, it
could just send "npm run build output 500 lines and succeeded, do we
need to read the output?" and based on the rest of the conversation the
LLM can respond yes or no.
zwarag wrote 1 day ago:
Isn't that what subagents do to a certain degree?
burkaman wrote 1 day ago:
Sort of, but you also want to keep the sub-agent context small for
as long as possible, and if you're paying per token there's no
reason to be sending thousands of tokens that are probably useless.
rel_ic wrote 1 day ago:
> The environment wins (less tokens burned = less energy consumed)
This is understandable logic, but at a systemic level it's not how
things always go. Increasing efficiency can lead to increased
consumption overall. You might save 50% in energy for your workload,
but maybe now you can run it 3 times as much, or maybe 3 times more
people will use it, because it's cheaper. The result might be a 50%
INCREASE in energy consumed.
HTML [1]: https://en.wikipedia.org/wiki/Jevons_paradox
SubiculumCode wrote 1 day ago:
This is the standing reason that is always given for why we must all
sit in freeway traffic clogs, and I think it's B.S., because it
assumes that there are viable alternatives available in the
near-to-medium term, but that isn't always the case. The alternative
to freeways that is supposed to compensate is a joint combination of
denser housing and mass transit, which in California is not happening
at all... zoning laws, the slow pace of building mass transit due to
regulatory slow-downs, and the need to service urban sprawl prevent
that solution from relieving traffic pressure. Don't speak of buses,
because taking two hours to get to work is not better than one hour.
So..the freeways stay the same number of lanes and my commute time
continues to grow, and I am tired of hearing it is for the best.
So yes, lower LLM costs would probably lead to even more LLM usage and
greater energy expenditure, but then again, so does having a moving
economy, and all that comes with that.
skybrian wrote 1 day ago:
Yeah, probably. I wonder where speed-running all the low-hanging-fruit
fixes for AI-related efficiency will leave us. It still seems worth
doing. Maybe combined with a carbon tax.
Gertig wrote 1 day ago:
I've been using CODING_AGENT=true
TobTobXX wrote 1 day ago:
Many unix tools already print less logging when used in a script, i.e.
non-interactively. (I don't know how they detect that.) For example,
`ls` has formatting/coloring and `ls | cat` does not. This solution
seems like it would fit the problem from the article?
zahlman wrote 1 day ago:
> I don't know how they detect that.
The OS knows (it has to, because it set up the pipeline), and the
process can find out through a system call, exposed in C as `isatty` [1].
> This solution seems like it would fit the problem from the article?
Might not be a great idea. The world is probably already full of
build-tool pipelines that expect to process the normal terminal
output (maybe with colours stripped). Environment variables like `CI`
are a thing for a reason.
HTML [1]: https://www.man7.org/linux/man-pages/man3/isatty.3.html
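In a shell script, the same check is available without calling isatty
directly; test's -t flag asks whether a file descriptor is open on a
terminal. A minimal sketch (the messages are just for illustration):

    #!/bin/bash
    # -t 1 tests whether stdout (fd 1) is a terminal
    if [ -t 1 ]; then
        echo "interactive: enable colors and progress bars"
    else
        echo "piped or redirected: plain, parseable output"
    fi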
skydhash wrote 1 day ago:
There's a function isatty that detects whether a file descriptor
(stdout is one) is associated with a terminal [1]. I believe most
standard libraries have a version.
HTML [1]: https://man.openbsd.org/man3/ttyname.3
sdsd wrote 1 day ago:
I was about to comment the same thing. Usually I don't call the
function directly, but via the tty command in my shell scripts:
    if tty -s; then
        echo "Standard input is a TTY (interactive mode)."
    else
        echo "Standard input is not a TTY (e.g., piped or redirected)."
    fi
Now I wonder how _isatty_ itself detects whether a file descriptor
is associated with a terminal!
skydhash wrote 1 day ago:
In OpenBSD, with the fcntl system call [1] [2] [3] [4]
HTML [1]: https://github.com/openbsd/src/blob/master/lib/libc/gen/...
HTML [2]: https://man.openbsd.org/fcntl
HTML [3]: https://github.com/openbsd/src/blob/master/sys/sys/fcntl...
HTML [4]: https://github.com/openbsd/src/blob/ba496e5267528b649ec8...
HTML [5]: https://github.com/openbsd/src/blob/ba496e5267528b649ec8...
mark_l_watson wrote 1 day ago:
This seems like a really solid idea: using an environment variable in
command line tools and small apps to control output for AI vs. human
digestion. Even given efficient attention mechanisms, slop tokens in
the context window are bad.
I also like a discussion in this thread: using custom tools to reduce
the frequency of tool calls in general, that is, write tool wrappers
specific for your applications or agents.
philipwhiuk wrote 1 day ago:
MCP as an env-var ;)
googlielmo wrote 1 day ago:
I like the gist of this; however, LLM may not be the best name for it:
what if a new tech (e.g., SLM) takes over? AGENT may be a more faithful
name until something better is standardized.
caerwy wrote 1 day ago:
The UNIX philosophy of tools that handle text streams, stay "quiet"
unless something goes wrong, do one thing well, etc. is still so
well suited to the modern age of AI coding agents.
exabrial wrote 1 day ago:
We've reinvented exit codes...
bearjaws wrote 1 day ago:
This is basically what RTK "Rust Token Killer" does.
Removes all the fluff around commands that agents use frequently.
HTML [1]: https://github.com/rtk-ai/rtk
user3939382 wrote 1 day ago:
Actually what we have is an entire stack - starting with the Von
Neumann arch, the kernel, the browser, auth - that is optimized for the
intuition of neither humans nor agents. All the legacy cruft that we
glibly told people to RTFM on is now choking your agent and burning
your tokens.
I have a solution to all this of course but why should I tell anyone.
yoz-y wrote 1 day ago:
All of this because we only have stdout and stderr and nothing in
between. I wish there was a stdlog or stddebug or something
we_have_options wrote 1 day ago:
yes, if only we had a more fine-grained log level hierarchy that we
could get every piece of software to agree to...
titzer wrote 1 day ago:
Seeing a JSON configuration file that stores environment variables
makes me want to cry. Just to think that somewhere, somehow, it's going
to launch an entire JavaScript VM (tens of megabytes) just to parse a
file with 12 lines in it, then extract the fields from a JavaScript
object, munge them, and eventually turn them into more or less an array
of VAR=val C strings which get passed to a forked shell...
spankalee wrote 1 day ago:
Why do you presume it needs a JavaScript VM to parse JSON?
sgarland wrote 1 day ago:
Granted, I have no idea how Claude Code operates internally, but if
it's already running in a JS VM, can't it read the file itself?
tacone wrote 1 day ago:
For Claude the most pollution usually comes from Claude itself.
It's worth noting that just by setting the right tone of voice,
choosing the right words, and instructing it to be concise and surgical
in what it says and writes, things change drastically - like night and
day.
It then starts obeying, CRITICALs are barely needed anymore and the
docs it produces are tidy and pretty.
Lerc wrote 1 day ago:
I think the concept has value, but I think targeting today's LLMs like
this is short sighted.
It's making what is likely to be a permanent change to fix a temporary
problem.
I think the thing that would have value in the long term is an option
to be concise, accurate, and unambiguous.
This isn't something that should be considered to be only for LLMs.
Sometimes humans want readability to understand something quickly, and
adding context helps a great deal here; but sometimes accuracy and
unambiguity are paramount (like when doing an audit), and if you're
dealing with a batch of similar things, the same repeated context adds
nothing and limits how much you can see at once.
So there can be a benefit when a human can request output like this for
them to read directly. On top of this is the broad range of output
processing tools that we have (some people still awk).
So yes, this is needed, but LLMs will probably not need it in a few
years. The other uses will remain.
hrpnk wrote 1 day ago:
Looks like the blog could use a HN=True. Hope the author won't get
banned...
> Error: API rate limit exceeded for app ID 7cc6c241b6e6762bf384. If
you reach out to GitHub Support for help, please include the request ID
E9FC:7BEBA:6CDB3B4:6485458:699EE247 and timestamp 2026-02-25 11:51:35
UTC. For more on scraping GitHub and how it may affect your rights,
please review our Terms of Service ( [1] ).
HTML [1]: https://docs.github.com/en/site-policy/github-terms/github-ter...
avh3 wrote 1 day ago:
Author here. Thanks for flagging. Let me look into it
gormen wrote 1 day ago:
Most of what helps LLMs here is exactly what helps humans: less noise,
clearer signals, predictable output.
sirk390 wrote 1 day ago:
I would use this as a human. That npm output is crazy. Maybe a better
variable would be "CONCISE=1". For LLMs, there are a few easier
solutions, like outputting to a file (and then tail), or running a
subagent.
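A minimal sketch of the file-plus-tail approach (paths are arbitrary):

    npm run build > /tmp/build.log 2>&1
    tail -n 20 /tmp/build.log    # grep the full file only when needed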
mohsen1 wrote 1 day ago:
Funny! I built an entire cli and ecosystem around this:
HTML [1]: https://github.com/bodo-run/stop-nagging
rustybolt wrote 1 day ago:
Surprisingly often people refuse to document their architecture or
workflow for new hires. However, when it's for an LLM some of these
same people are suddenly willing to spend a lot of time and effort
detailing architecture, process, workflows.
I've seen projects with an empty README and a very extensive CLAUDE.md
(or equivalent).
bool3max wrote 1 day ago:
That could be because Claude offers a dedicated /init command to
generate a CLAUDE.md if it doesn't exist.
moritonal wrote 1 day ago:
It feels wild to have to keep reminding people, but AI changes very
little. Tools have always had a variety of output, and ways to control
it, and bad tools output a lot by default, whilst good tools hide it
behind some version of "-v" or easy greps. Don't add a --LLM or
whatever; do add cleaner and consistent verbosity controls.
eptcyka wrote 1 day ago:
If the output of your build tool is too verbose for a mechanical brain
to keep on top of, did the meat brain ever stand a chance?
Why was the output so verbose in the first place then?
xyzsparetimexyz wrote 1 day ago:
So you can debug it without having to do a second build with extra
flags, and in order to have a sense of what the build is doing at any
particular time.
cubefox wrote 2 days ago:
I wonder whether attention-free architectures like Mamba or Gated
DeltaNet are distracted less by irrelevant context, because they don't
recall every detail inside their context window in the first place.
Theoretically it should be fairly easy to test this via a dedicated
"context rot benchmark" (standard benchmarks but with/without
irrelevant context).
fergie wrote 2 days ago:
Given that most of the utility of Typescript is to make VSCode play
nice for its human operator, _should_ we be using Typescript for
systems that are written by machines?
deafpolygon wrote 2 days ago:
Or, stop outputting crap and use a logger. Make an LLM-only logger for
output LLMs need and use stdout for HUMAN things.
subhajeet2107 wrote 2 days ago:
Can the TOON format help with this? With "LLM=true" we could reduce the
noise which pollutes context.
skerit wrote 2 days ago:
> Then a brick hits you in the face when it dawns on you that all of
our tools are dumping crazy amounts of non-relevant context into stdout
thereby polluting your context windows.
I've found that letting the agent write its own optimized script for
dealing with some things can really help with this. Claude is now
forbidden from using `gradlew` directly, and can only use a helper
script we made. It clears, recompiles, publishes locally, tests, ...
all with a few extra flags. And when a test fails, the stack trace is
printed.
Before this, Claude had to do A TON of different calls, all messing up
the context. And when tests failed, it started to read gradle's
generated HTML/XML files, which damaged the context immensely, since
they contain a bunch of inline javascript.
And I've also been implementing this "LLM=true"-like behaviour in most
of my applications. When an LLM is using it, logging is less verbose,
it's also deduplicated so it doesn't show the same line a hundred
times, ...
> He sees something goes wrong, but now he cut off the stacktraces by
using tail, so he tries again using a bigger tail. Not satisfied with
what he sees HE TRIES AGAIN with a bigger tail, and ... you see the
problem. It's like a dog chasing its own tail.
I've had the same issue. Claude was running the 5+ minute test suite
MULTIPLE TIMES in succession, just with a different `| grep something`
tacked at the end.
Now, the scripts I made always log the entire (simplified) output and
just print the path to the temporary file. This works so much better.
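A minimal sketch of such a helper (hypothetical, not skerit's actual
script; the grep patterns assume gradle-style output):

    #!/bin/bash
    # run the full gradle test suite; keep raw output out of the context
    log=$(mktemp /tmp/tests.XXXXXX)
    ./gradlew test > "$log" 2>&1
    ec=$?
    # print a deduplicated summary of the interesting lines only
    grep -E "Tests run|FAILED|Exception" "$log" | awk '!seen[$0]++'
    echo "full output: $log"
    exit $ec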
esafak wrote 1 day ago:
How is it forbidden? I tell agents to use my wrappers in AGENTS but
they ignore it half the time and use the naked tool.
Squid_Tamer wrote 1 day ago:
If you get desperate, I've given my agent a custom $PATH that
replaces the forbidden tools with shims that either call the
correct tool, or at least tell it what to do differently.
~/agent-shims/mvn:
    #!/bin/bash
    echo "Usage of 'mvn' is forbidden. Use build.sh or run-tests.sh" >&2
    exit 1   # nonzero exit so the agent registers a failure and self-corrects
That way it is prevented from using the wrong tools, and can
self-correct when it tries.
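For the shims to win, the directory has to come first on the agent's
PATH, e.g.:

    chmod +x ~/agent-shims/mvn
    export PATH="$HOME/agent-shims:$PATH"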
simsla wrote 1 day ago:
Permissions scoping
esafak wrote 1 day ago:
Then they attempt to download the missing tool or write a
substitute from scratch. Am I the only one who runs into this??
majewsky wrote 1 day ago:
> Claude is now forbidden from using `gradlew` directly, and can only
use a helper script we made. It clears, recompiles, publishes
locally, tests, ... all with a few extra flags. And when a test
fails, the stack trace is printed.
I think my question at this point is what about this is specific to
LLMs. Humans should not be forced to wade through reams of garbage
output either.
rstuart4133 wrote 1 day ago:
> I think my question at this point is what about this is specific
to LLMs. Humans should not be forced to wade through reams of
garbage output either.
Beware I'm a complete AI layman. All this is from background
reading of popular articles. It may well be wrong. It's
definitely out of date.
It has to do with how the attention heads work. The attention
heads (the idea originated from the "Attention is all you need"
paper, arguably the single most important AI paper to date) direct
the LLM to work on the most relevant parts of the conversation. If
you want a human analogue, it's your attention heads that are
tracking the interesting points in a conversation.
The original attention heads output a relevance score for every
pair of words in the context window. Thus in "Time flies like an
arrow", it's the attention heads that spot the word "Time" is very
relevant to "arrow", but not "flies". The implication of this is
that an attention head does O(N*N) work. It does not scale well to
large context windows.
Nonetheless, you see claims of "large" context windows in the LLM
marketing. (Large is in quotes, because even a 1M context window
begins to feel very cramped in a write / test / fix loop.) But a
1M context window would require an attention head with a
1-trillion-element matrix. That isn't feasible. The industry even
has a name for the size of the window they give in their marketing:
the Effective Context Window. Internally they have another metric
that measures the real amount of compute they throw at attention:
the Physical Context Window. The bridge between the two is some
proprietary magic that discards tokens in the context window that
are likely to be irrelevant. In my experience, that bridge is
pretty good at doing that, where "pretty good" is up to human
standards.
But eventually (actually quickly in my experience), you fill up
even the marketed size of the context window because it is
remembering every word said, in the order they were said. If it
reads code it's written to debug it, it appears twice in the
context window. All compiler and test output also ends up there.
Once the context window fills up, they take drastic action, because
it's like letting malloc fail. Even reporting a malloc failure is
hard because it usually needs more malloc to do the reporting.
Anthropic calls it compacting. It throws away 90% of your tokens.
It turns your helpful LLM into a goldfish with dementia. It is
nowhere near as good as a human is at remembering what happened. Not
even close.
keeda wrote 1 day ago:
In my experience, it's the old time-invested vs time-saved trade
off. If you're not looking at these reams of output often enough,
the incentive to figure out all the flags and configs for verbosity
and write these scripts is lower [1]. And because these issues are
often sporadic, doing all this would be an unwanted side quest, so
humans grit their teeth and wade through the garbage manually each
time.
With LLMs, the cost is effectively 0 compared to a human, so it
doesn't matter. Have them write the script. In fact, because it
benefits the LLM by reducing context pollution, which increases
their accuracy, such measures should be actively identified and put
in place.
HTML [1]: https://xkcd.com/1205/
kimixa wrote 1 day ago:
Humans have the ability to ignore and generally not remember things
after a short scan, prioritize what's actually important etc. But
to an LLM a token is a token.
There's attempts at effectively doing something similar with
analysis passes of the context - kinda what things like
auto-compaction is doing - but I'm sure anyone who has used the
current generation of those tools will tell you they're very much
imperfect.
dcrazy wrote 1 day ago:
Isn't the purpose of self-attention exactly to recognize the
relevance of some tokens over others?
kimixa wrote 1 day ago:
That may help with tokens being "ignored" while still being in
the context window, but not context window size costs and
limitations in the first place.
pennomi wrote 1 day ago:
The "a token is a token" effect makes LLMs really bad at some
things humans are great at, and really good at some things humans
are terrible at.
For example, I quickly get bored looking through long logfiles
for anomalies but an LLM can highlight those super quickly.
adammarples wrote 1 day ago:
Lots of tools have a --quiet or --output json type option, which is
usually helpful
petedoyle wrote 1 day ago:
Wow, I'd love to do this. Any tips on how to build this (or how to
help an LLM build this), specifically for ./gradlew?
ViktorEE wrote 2 days ago:
The way I've solved this issue with a long-running build script is to
have a logging script which redirects all output into a file and can
be included with
```
# Redirect all output to a log file (re-execs the script with redirection)
source "$(dirname "$0")/common/logging.sh"
```
at the start of a script.
Then when the script runs the output is put into a file, and the LLM
can search that. Works like a charm.
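A minimal sketch of what such a logging.sh could contain (the guard
variable and log path are assumptions, not the commenter's actual
script):
```
# common/logging.sh (hypothetical): on first run, re-exec the calling
# script with stdout and stderr redirected to a log file
if [ -z "$_LOG_REDIRECTED" ]; then
    export _LOG_REDIRECTED=1
    exec "$0" "$@" > "/tmp/$(basename "$0").log" 2>&1
fi
```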
quintu5 wrote 2 days ago:
This has been my exact experience with agents using gradle and it's
beyond frustrating to watch. I've been meaning to set up my own
low-noise wrapper script.
This post just inspired me to tackle this once and for all today.
haarlemist wrote 2 days ago:
Can we just instruct the agents to redirect output streams to files,
and then use grep to retrieve the necessary lines?
exitb wrote 2 days ago:
Also an acceptable solution - create a "runner" subagent on a cheap
model, that's tasked with running a command and relaying the important
parts to the main agent.
mromanuk wrote 2 days ago:
Yes, this is the solution. An agent that can clean up the output of
irrelevant stuff
robkop wrote 2 days ago:
We've got a long way to go in optimising our environments for these
models. Our perception of a terminal is much closer to feeding a video
into Gemini than reading a textbook of logs. But we don't make that AX
affordance at the moment.
I wrote a small game for my dev team to experience what it's like
interacting through these painful interfaces over the summer:
www.youareanagent.app
Jump to the agentic coding level or the MCP level to experience true
frustration (call it empathy). I also wrote up a lot more thinking here:
www.robkopel.me/field-notes/ax-agent-experience/
hirako2000 wrote 1 day ago:
Beautiful simulation.
m0rde wrote 2 days ago:
I think about what I do in these verbose situations; I learn to ignore
most of the output and only take forward the important piece. That may
be a success message or error. I've removed most of the output from my
context window / memory.
I see some good research being done on how to allow LLMs to manage
their own context. Most importantly, to remove things from their
context but still allow subsequent search/retrieval.
DoctorOetker wrote 2 days ago:
Beginners on the Linux command line frequently complain about the
irregularity and redundancy of command-line tool conventions (sometimes
the actual command parameters: -h, --help, or /h?; other times man vs.
info; etc.).
When the first transformers that did more than poetry or rough
translation appeared, everybody noticed their flaws, but I observed
that a dumb enough (or smart enough to be dangerous?) LLM could be
useful in regularizing parameter conventions. I would ask an LLM how to
do this or that, and it would "helpfully" generate non-functional
command invocations that otherwise appeared very 'conformant', to the
point that sometimes my opinion was that - even though the invocation
was wrong given the current calling convention for a specific tool - it
would actually improve the tool if it accepted that human-machine ABI
or calling convention.
Now take the example of man vs. info. I am not proposing to let AI
decide we should all settle on man, nor that we should all use info
instead; but with AI we could have the documentation made whole in the
missing half, and then it's up to the user whether they prefer man or
info to fetch the documentation of that tool.
Similarly for calling conventions: we could ask LLMs to assemble
parameter styles, analyze command calling conventions / parameters, and
then find one or more canonical ways to communicate these, perhaps
consulting an environment variable to figure out which calling
convention the user declares to use.
mikkupikku wrote 1 day ago:
It has long been a pet peeve of mine that the *nix world has no
standard, reliable convention for how to interrogate a program for
its available flags. Instead there are at least a dozen ways it can
be done and you can't rely on any one of them.
speed_spread wrote 1 day ago:
I've been thinking about using an OpenAPI schema to describe CLI
tools. It would probably need extensions to describe stdin/stdout
and other things that don't happen in REST.
tomeraberbach wrote 1 day ago:
Have you seen Stainless's CLI generator?
HTML [1]: https://www.stainless.com/blog/stainless-cli-generator-y...
lelanthran wrote 1 day ago:
That's not specific to unix though.
mikkupikku wrote 1 day ago:
I didn't say it was, but I simply don't care what problems any
other kind of system has because they aren't my problems. Last
time I had windows on any of my computers it was windows 98.
wizzwizz4 wrote 22 hours 24 min ago:
With the right DOS flags, Windows 98 had pretty well
standardised on /?. Of course, when you changed those flags,
some programs would stick with their hardcoded /?, others would
change to -?, and yet others would just fall over.
ralfd wrote 1 day ago:
Similarly, law professor Rob Anderson joked on X that LLM-hallucinated
cases are good law [1]:
> Indeed hallucinated cases are "better law."
Drawing on Ronald Dworkin's theory of law as integrity, which posits
that ideal legal decisions must "fit" existing precedents while
advancing principled justice, this article argues that these
hallucinations represent emergent normative ideals. AI models,
trained on vast corpora of real case law, synthesize patterns to
produce rulings that optimally align with underlying legal
principles, filling gaps in the doctrinal landscape. Rather than
errors, they embody the "cases that should exist," reflecting a
Hercules-like judge's holistic interpretation.
HTML [1]: https://x.com/ProfRobAnderson/status/2019078989348774129
mikkupikku wrote 1 day ago:
Seems naive. You can get an LLM to agree with almost anything if
you say the right things to it, and it will hallucinate citations
to back you up without skipping a beat. You can probably get it to
hallucinate case law to legalize murder on Mondays.
herewego wrote 1 day ago:
You're talking about manipulated/malicious/intentionally steered
hallucination, but the parent is referring to trained emergent
hallucination (even if sycophantic). These are two different
things and both can occur, but the latter is what's being
tongue-in-cheek referred to by the professor.
fragmede wrote 2 days ago:
Ah yes, the vaunted ffmpeg-llm --"take these jpegs and turn them into
an mp4 and use music.mp3 as the soundtrack" command.
iugtmkbdfil834 wrote 2 days ago:
Ngl.. I can see the merit and simultaneously recoil in horror, as I
am starting to understand what Linux greybeards hate about the
Windows-ification of Linux (and now the proposed LLM-ification of it :D).
isoprophlex wrote 2 days ago:
Huh. I've noticed CC running build or test steps piped into greps, to
cull useless chatter. It did this all by itself, without my explicit
instructions.
Also, I just restart when the context window starts filling up. Small
focused changes work better anyway IMO than single god-prompts that try
to do everything but eventually exceed context and capability...
tovej wrote 1 day ago:
cc is the C compiler.
Please don't overload that term with trendy LLM products. You can use
the full name.
isoprophlex wrote 1 day ago:
Surely any distinguished connoisseur of terminology gatekeeping
such as yourself is able to distinguish between 'cc' and 'CC'. My
terminal is able to spot the difference, you should be able to as
well.
tovej wrote 1 day ago:
CC is an environment variable / internal variable used by most
build tools to identify the current C compiler. cc is the
standardized name of the executable, now usually an alias for
gcc.
Both CC and cc refer to the C compiler, in slightly different
ways.
lucumo wrote 2 days ago:
> Then a brick hits you in the face when it dawns on you that all of
our tools are dumping crazy amounts of non-relevant context into stdout
thereby polluting your context windows.
Not just context windows. Lots of that crap is completely useless for
humans too. It's not a rare occurrence for warnings to be hidden in so
much irrelevant output that they're there for years before someone
notices.
kubanczyk wrote 1 day ago:
Yeah. Maybe we only need:
    BATCH=yes (default is no)
    --batch (default is --no-batch)
for the unusual case when you do want the `route print` on a BGP
router to actually dump 8 gigabytes of text over the next 2 minutes.
Maybe it's fine if the default output for anything generously applies
summarization, such as "X, Y, Z ...and 9 thousand+ similar entries".
Having two separate command names (one for human/LLM, one for batch)
sucks.
Having `-h` for human, like ls or df do, sucks slightly less, but it
is still a backward-compatibility hack which leads to `alias`
proliferation and makes human lives worse.
jaggederest wrote 1 day ago:
The old unix philosophy of "print nothing on success" looks crazy
until you start trying to build pipes and shell scripts that use
multiple tools internally. Also very quickly makes it clear why
stdout and stderr are separate
titzer wrote 1 day ago:
It never felt crazy to me, with the exception that there are many
situations where having progress and diagnostic information
(usually opt-in) makes sense.
I guess it comes down to a choice of printing out only relevant
information. I hate noisy crap, like LaTeX.
sgarland wrote 1 day ago:
Also becomes rapidly apparent that most modern tooling takes an
extremely liberal view of logging levels. The fact that you've
successfully processed a file is not INFO, that's DEBUG.
adwn wrote 1 day ago:
"Finished conversion of xyz.mp3 to xyz.ogg" is valuable progress
information to a regular user, not just to developers, so it
belongs in INFO, not DEBUG.
sgarland wrote 1 day ago:
I suppose this is subjective, but I disagree. If I want to know
the status of each item, I'd pass -v to the command. A simple
summary at the end is sufficient; if I pass -q, I expect it to
print nothing, only issuing a return code.
adwn wrote 21 hours 7 min ago:
> If I want to know the status of each item, I'd pass -v to
the command.
I don't disagree. In my opinion, the default log level for
CLI applications should be WARN, showing errors and warnings.
-q should turn this OFF (alternatively, -q for ERROR, and -qq
for OFF), -v means INFO, -vv DEBUG, -vvv TRACE.
For servers and daemons, the default should probably be INFO,
but that's debatable.
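A minimal sketch of that flag-to-level mapping in a shell script (the
numeric levels are arbitrary):

    #!/bin/bash
    # 0=OFF 1=ERROR 2=WARN 3=INFO 4=DEBUG 5=TRACE; default is WARN
    level=2
    while getopts "qv" opt; do
        case $opt in
            q) level=$(( level > 0 ? level - 1 : 0 ));;
            v) level=$(( level + 1 ));;
        esac
    done
    log() {                        # usage: log <level> <message...>
        local lvl=$1; shift
        [ "$lvl" -le "$level" ] && echo "$@" >&2
    }
    log 3 "finished conversion of xyz.mp3 to xyz.ogg"   # INFO: shown with -v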
scotty79 wrote 2 days ago:
I think I noticed LLMs doing >/dev/null on routine operations.
vidarh wrote 2 days ago:
Rather than an LLM=true, this is better handled by standardizing
quiet/verbose settings, as this is a question of verbosity; an LLM is
one instance where you usually want things quieter, but not always.
Secondly, a helper to capture output and cache it; frankly, a tool
or just options for the regular shell/bash tools to cache output and
allow filtered retrieval of the cached output. More so than context
and tokens, the frustration I have with the patterns shown is that the
agent will often re-execute time-consuming tasks just to retrieve a
different set of lines from the output.
A lot of the time it might even be best to run the tool with verbose
output, but it'd be nice if tools had a more uniform way of giving
output that was easier to systematically filter to essentials on first
run (while caching the rest).
MITSardine wrote 1 day ago:
Yes, what's preventing the LLM from running myCommand >
/tmp/out_someHash.txt ; tail /tmp/out_someHash.txt and then grepping
or tailing around /tmp/out_someHash.txt on failure?
vidarh wrote 1 day ago:
There isn't really anything other than training, but they generally
don't. You probably can get them to do that with some extra
instructions, but part of the problem - at least with Claude - is
that it's really trigger-happy about re-running the commands if it
doesn't get the results it likes, assuming the results reflects
stale results. Even with very expensive (in time) scripts I often
see it start a run, pipe it to a file, put it in the background,
then loop on sleep statements, occasionally get "frustrated" and
check, only to throw the results away 30 seconds after they are
done because it's made an unrelated change.
A lot of the time this behaviour is probably right. But it's
annoyingly hard to steer it to handle this correctly. I've had it
do this even with make targets where the makefile itself makes
clear the dependencies mean it could trust the cached (in a file)
results if it just ran make. Instead I regularly find it reading
the Makefile and running the commands manually to work around the
dependency management.
iainmerrick wrote 2 days ago:
Yes! After seeing a lot of discussions like this, I came up with a
rule of thumb:
Any special accommodations you make for LLMs are either a) also good
for humans, or b) more trouble than they're worth.
It would be nice for both LLMs and humans to have a tool that hides
verbose tool output, but still lets you go back and inspect it if
there's a problem. Although in practice as a human I just minimise
the terminal and ignore the spam until it finishes. Maybe LLMs just
need their own equivalent of that, rather than always being hooked up
directly to the stdout firehose.
Peritract wrote 2 days ago:
This all seems like a lot of effort so that an agent can run `npm run
build` for you.
I get the article's overall point, but if we're looking to optimise
processing and reduce costs, then 'only using agents for things that
benefit from using agents' seems like an immediate win.
You don't need an agent for simple, well-understood commands. Use them
for things where the complexity/cost is worth it.
m0rde wrote 2 days ago:
Feedback loops are important to agents. In the article, the agent
runs this build command and notices an error. With that feedback
loop, it can iterate a solution without requiring human intervention.
But the fact that the build command pollutes the context in this case
is a double-edged sword.
tovej wrote 1 day ago:
If you really need that, the easy solution here is to get a list of
errors using an LSP (or any other way of getting a list of errors,
even grep "Error:"), and only giving that list of errors to the LLM
if the build fails. Otherwise just tell the LLM "build succeeded".
That's an extremely simple solution. I don't see the point in this
LLM=true bullshit.
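A minimal sketch of the grep fallback (assuming an npm build):

    if npm run build > /tmp/build.log 2>&1; then
        echo "build succeeded"
    else
        grep "Error:" /tmp/build.log   # only the error lines reach the LLM
        exit 1
    fi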
teekert wrote 2 days ago:
But those simple and well understood commands can be part of a huge
workflow the LLM embarks on.
Peritract wrote 1 day ago:
Again, if your priority is to minimise costs, then not forcing
every part of that huge workflow through the agent is a good start.
pelasaco wrote 2 days ago:
`Humans=True`
The best friend isn't a dog, but the family that you build.
Wife/Husband/kids. Those are going to be your best friends for life.
jascha_eng wrote 2 days ago:
I noticed this on my spring boot side project. Successful test runs
produce thousands of log lines in default mode because I like to e.g.
log every executed SQL statement during development. It gives me
visibility into what my orm is actually doing (yeh yeh I know I should
just write SQL myself). For me it's just a bit of scrolling and cmd+f
if I need to find something specific but Claude actually struggles a
lot with this massive output.
Especially when it then tries to debug things finding the actual error
message in the haystack of logs is suddenly very hard for the LLM. So I
spent some time cleaning up my logs locally to improve the "agentic
ergonomics" so to say.
In general I think good DevEx needs to be dialed to 11 for successful
agentic coding. Clean software architecture and interfaces, good docs,
etc. are all extremely valuable for LLMs because any bit of confusion,
weird patterns or inconsistency can be learned by a human over time as
a "quirk" of the code base. But for LLMs that don't have memory they
are utterly confusing and will lead the agent down the wrong path
eventually.
troethe wrote 2 days ago:
On a lot of linux distros there is the `moreutils` package, which
contains a command called `chronic`. Originally intended to be used in
crontabs, it executes a command and shows its output only if it
fails.
I think this could find another use case here.
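For example:

    chronic npm run build   # silent on success, full output on failure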
markus1189 wrote 1 day ago:
This is great, I like this. Wrote a 'chronic-file' variant that just
dumps everything into a tmpfile and outputs the filepath for the
agent in case of error and otherwise nothing
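A minimal sketch of that idea (not the commenter's actual script):

    #!/bin/bash
    # chronic-file: run a command silently; on failure print only the
    # path to the full log so the agent can grep it later
    tmp=$(mktemp)
    if "$@" > "$tmp" 2>&1; then
        rm -f "$tmp"
    else
        ec=$?
        echo "command failed (exit $ec); full output in $tmp" >&2
        exit $ec
    fi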
kubanczyk wrote 1 day ago:
Useful enough to justify registering on HN. Thank you!
block_dagger wrote 2 days ago:
Interesting idea but a bad spec. A better approach would be a single
env var (DEV_MODE perhaps) with "agent" and "human" as values (and
maybe "ci").
thrdbndndn wrote 2 days ago:
Something related to this article, but not related to AI:
As someone who loves coding pet projects but is not a software engineer
by profession, I find the paradigm of maintaining all these config
files and environment variables exhausting, and there seem to be more
and more of them for any non-trivial projects.
Not only do I find it hard to remember which is which or to locate any
specific setting, their mechanisms often feel mysterious too: I often
have to manually test them to see if they actually work or how exactly.
This is not the case for actual code, where I can understand the logic
just by reading it, since it has a clearer flow.
And I just can't make myself blindly copy other people's config/env
files without knowing what each switch is doing. This makes building
projects, and especially copying or imitating other people's projects,
a frustrating experience.
How do you deal with this better, my fellow professionals?
latexr wrote 2 days ago:
> As someone who loves coding pet projects but is not a software
engineer by profession, I find the paradigm of maintaining all these
config files and environment variables exhausting
Then don't.
> How do you deal with this better, my fellow professionals?
By not doing it.
Look, it's your project. Why are you frustrating yourself? What you
do is you set up your environment, your configuration, what you
need/understand/prefer and that's it. You'll find out what those
are as you go along. If you need, document each line as you add it.
Don't complicate it.
nananana9 wrote 2 days ago:
Don't fall for the "JS ecosystem" trap and use sane tools. If a
floobergloob requires you to add a floobergloob.config.js to your
project root that's a very good indicator floobergloob is not worth
your time.
The only boilerplate files you need in a JS repo root are gitignore,
package.json, package-lock.json and optionally tsconfig if you're
using TS.
A node.js project shouldn't require a build step, and most websites
can get away with a single build.js that calls your bundler (esbuild)
and copies some static files to dist/.
dlt713705 wrote 2 days ago:
First of all, I read the documentation for the tools I'm trying to
configure.
I know this is very 20th century, but it helps a lot to understand
how everything fits together and to remember what each tool does in a
complex stack.
Documentation is not always perfect or complete, but it makes it much
easier to find parameters in config files and know which ones to
tweak.
And when the documentation falls short, the old adage applies: "Use
the source, Luke."
blauditore wrote 2 days ago:
Software folks love over-engineering things. If you look at the web
coding craze of a few years ago, people started piling up tooling on
top of tooling (frameworks, build pipelines, linting, generators
etc.) for something that could also be zero-config, and just a
handful of files for simple projects.
I guess this happens when you're too deep in a topic and forget that
eventually the overhead of maintaining the tooling outweighs the
benefits. It's a curse of our profession. We build and automate
things, so we naturally want to build and automate tooling for doing
the things we do.
bonesss wrote 2 days ago:
I don't think those web tooling piles are over-engineered per se,
they address huge challenges at Google and Facebook, but the
profession is way too driven by hype and fashion and the result is
a lot of cargo culting of stuff from Big Dogs unquestioningly.
Wrong tooling for the job creates that bubble of over complicated
app development.
Inventing GraphQL and React and making your own PHP compiler are
absolutely insane and obviously wrong decisions - for everyone
who isn't Facebook. With Facebook revenue and Facebook's army of
resume obsessed PHP monkeys they strike me as elegant technological
solutions to otherwise intractable organizational issues. Insane,
but highly profitable and fast moving. Outside of that context
using React should be addressing clear pain points, not a dogmatic
default.
We're seeing some active pushback on it now online, but so much
damage has been done. Embracing progressive complexity of web
apps/sites should leave the majority as barebones with minimal if
any JavaScript.
Facebook solutions for Facebook problems. Most of us can be deeply
happy our 99 problems don't include theirs, and live a simpler
easier life.
CuriouslyC wrote 1 day ago:
Not sure why you lumped React in there. Hack is loopy, and
GraphQL was overhyped but conditionally useful, but React was
legitimately useful and a real improvement over other ways of
doing things at the time. Compare React to contemporary stuff
like jQuery, Backbone, Knockout, Angular 1.x, etc.
rokob wrote 1 day ago:
I agree with you very much, if what you are building actually
benefits from that much client side interactivity. I think the
counterpoint is that most products could be server rendered
html templates with a tiny amount of plain js rather than
complex frontend applications.
tekacs wrote 2 days ago:
Honestly... ask an AI agent to update them for you.
They do an excellent job of reading documentation and searching to
pick and choose and filter config that you might care about.
After decades of maintaining them myself, this was a huge breath of
fresh air for me.
ehnto wrote 2 days ago:
Simplify your tools and build processes to as few as possible, and
pick tools with fewer (or no) config files.
It could depend on what you're doing, but if it's not for work the
config hell is probably optional.
syhol wrote 2 days ago:
You start with the cleanest most minimal config you can get away
with, but over the years you keep adding small additions and tweaks
until it becomes a massive behemoth that only you will ever
understand the reasoning behind.
iainmerrick wrote 2 days ago:
Right, and then when you don't work on it for 6 or 12 months, you
come back and find that now you don't understand it either.
latexr wrote 2 days ago:
Part of doing it well is adding comments as you add options. When
I used vim, every line or block in the config had an accompanying
comment explaining what it did, except if the config's name was
so obvious that a comment would just repeat it.
iainmerrick wrote 2 days ago:
That's a good call. It's a big problem for JSON configs given
pure JSON's strict no-comments policy. I like tools that let
you use .js or better yet .ts files for config.
dgacmu wrote 1 day ago:
Or consider jsonc (json with comments) [1], or jwcc (json with
comments and trailing commas) [2], to make life a little easier.
There are a lot of implementations of all of these, such as [3].
HTML [1]: https://jsonc.org/
HTML [2]: https://nigeltao.github.io/blog/2021/json-with-comma...
HTML [3]: https://github.com/tailscale/hujson
iainmerrick wrote 1 day ago:
I like this idea a lot, and pushed for json5 at a previous
job, but I think there are a few snags:
- it's weird and unfamiliar, most people prefer plain JSON
- there are too many competing standards to choose from
- most existing tools just use plain JSON (sometimes with
support for non-standard features, like tsconfig allowing
trailing commas, but usually poorly documented and
unreliable)
Much easier just to make the leap to .ts files, which are
ergonomically better in almost every way anyway.
mikkupikku wrote 1 day ago:
A lot of json parsers will permit comments even though it
isn't meant to be valid. Worth trying it, see if a comment
breaks the config, and if not then use comments and don't
worry about it.
maleldil wrote 1 day ago:
For reference, jq and python don't allow comments.
canto wrote 2 days ago:
This is merely scratching the surface.
LLMs (Claude Code in particular) will explicitly create token-intensive
steps, plans and responses - "just to be sure", "need to check",
"verify no leftovers" - will do git diff even though not asked to,
create python scripts for simple tasks, etc.
Absolutely no cache (except the memory, which is meh) nor indexing
whatsoever.
The Pro plan for 20 bucks per month is essentially worthless and,
because of this, we are entering a new era - the era of the $100+
monthly single subscription being something normal and natural.
zarzavat wrote 1 day ago:
I assume you're using Opus.
I'm on the Pro plan. If I run out of tokens, which has only happened
2 or 3 times in months of use, I just work on something else that
Claude can't do, or ...write the code myself.
You do have to keep a close eye on it, but I would be doing that
anyway given that if it goes haywire it's wasting my time as well as
tokens. I'd rather spend an extra minute writing a clearer prompt
telling it exactly what I want it to do, than waste time on a slot
machine.
jen729w wrote 2 days ago:
And yet when my $100 CC Pro renewed last month my instinctive thought
was wow is that all?
mrweasel wrote 2 days ago:
That can be read in two ways:
1) It's only $100, well worth the money.
2) Surprisingly little value was provided for all that money.
jen729w wrote 1 day ago:
I'm not sure how you'd read it the second way.
mrweasel wrote 1 day ago:
$100 per month is a lot of money, in that case "wow is that
all?" must refer back to how little you got from it.
outime wrote 2 days ago:
>LLMs (Claude Code in particular) will explicitly create token
intensive steps, plans and responses - "just to be sure" - "need to
check" - "verify no leftovers", will do git diff even tho not asked
for, create python scripts for simple tasks, etc. Absolutely no cache
(except the memory which is meh) nor indexing whatsoever.
Most of these things can be avoided with a customized CLAUDE.md.
canto wrote 2 days ago:
Not in my projects it seems. Perhaps you can share your best
practices?
Moreover, avoiding these should be the default behaviour. Currently
the default is to drain your pockets.
P.S. CLAUDE.md is sometimes useful, but it's yet another token drain,
especially since it can grow exponentially.
outime wrote 2 days ago:
I agree it should be the default, but it isn't. I also tend to
think it's mainly to make money by default.
Another thing that helps is using plan mode first, since you can
more or less see how it's going to proceed and steer it
beforehand.
slopinthebag wrote 2 days ago:
codex seems chill
keybored wrote 2 days ago:
Speaking of obvious questions: why are you counting pennies instead of
getting the LLM to do it? (Unless the idea was from an LLM and the
executive decision was left to the operator, as well as posting the
article.)
So much content about furnishing the Markdown and the whatnot for your
bots. But content is content?
vorticalbox wrote 2 days ago:
Could we not instruct the LLM to run build commands in a sub-agent,
which could then just return a summary of what happened?
This avoids having to update everything to support LLM=true and keeps
your current context window free of noise.
vidarh wrote 2 days ago:
Make (or whatever) targets that direct output to a file and return a
subset have helped me quite a bit. Then wrap that in an agent that
also knows how and when to return cached and filtered data from the
output vs. rerunning. Fewer tokens spent reading output details that
usually won't matter, coupled with less context pollution in the main
agent from figuring out what to do.
canto wrote 2 days ago:
    q() {
        local output
        output=$("$@" 2>&1)        # run the command, capturing stdout+stderr
        local ec=$?                # exit status of the captured command
        echo "$output" | tail -5   # surface only the last five lines
        return $ec                 # preserve the original exit status
    }
There :)
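(Usage would be, e.g., q npm run build - the agent sees the last five
lines plus the real exit code.)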
dizzy3gg wrote 2 days ago:
That would achieve 1 of the 3 wins.
wongarsu wrote 2 days ago:
If you use a smaller model for the sub agent you get all three
Of course you can combine both approaches for even greater gains.
But Claude Code and like five alternatives gaining an efficient
tool-calling paradigm where console output is interpreted by Haiku
instead of Opus seems like a much quicker win than adding an LLM
env flag to every cli tool under the sun
noname120 wrote 2 days ago:
Probably the main one, people mostly complain about context window
management rather than token usage
Bishonen88 wrote 2 days ago:
Dunno about that. Having used the $20 claude plan, I ran out of
tokens within 30 minutes when running 3-4 agents at the same time.
Often times, all 3-4 will run a build command at the end to
confirm that the changes are successful. Thus the loss of tokens
quickly gets out of hand.
Edit: Just remembered that sometimes, I see claude running the
build step in two terminals, side-by-side at nearly the same time
:D
Bishonen88 wrote 2 days ago:
great idea. thought about the waste of tokens dozens of times when I
saw claude code increase the token count in the CLI after a build. I
was wondering if there's a way to stop that, but not enough to actually
look into it. I'd love for popular build tools to implement something
along those lines!
bigblind wrote 2 days ago:
I never considered the volume of output tokens from dev tools, but
yeah, I like this idea a lot.