codevoid.de/1/hn/comments_47139902.gph

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
       
       
       COMMENT PAGE FOR:
         Hugging Face Skills
       
       
        neya wrote 6 hours 55 min ago:
        I'm actually on the fence about skills. Vercel shared a study [1] in
        which they claimed skills actually performed worse than just
        injecting the context directly via AGENTS.md. Similarly, a recent
        paper [2] suggested the same. Of course, the classic response to
        these, even WITH the evidence, is often "yOu'Re dOiNg iT wRonG". Does
        anyone actually have proof that using SKILL.md is better than not?
        
        Edit: Fixed company name, added link to Vercel's claim
        
        [1]: https://vercel.com/blog/agents-md-outperforms-skills-in-our-ag...
        [2]: https://arxiv.org/abs/2602.11988
       
          evalstate wrote 6 hours 27 min ago:
          I think the paper is saying specifically that it's redundant to
          include information about your coding repository when that
          information is otherwise available to the agent in higher fidelity
          forms (e.g. package.json). This makes sense - but not sure it's about
          Skills directly.
          
          As for the Vercel result, I'd be interested in learning more. From
          a harness perspective, the difference would be the inclusion of the
          skill description in the system prompt and an additional tool call
          to return the skill. While that's certainly less efficient than
          adding the context directly, I'd be surprised if it degraded task
          performance significantly.
          
          I tend to be quite focussed in my Skill/Tool usage in general,
          though, inviting them into context when needed rather than
          increasing the potential for model confusion.
       
            neya wrote 5 hours 13 min ago:
            Here you go:
            
            Sorry, I misquoted the company: it was Vercel, not Cursor.
            
            "A compressed 8KB docs index embedded directly in AGENTS.md
            achieved a 100% pass rate, while skills maxed out at 79% even with
            explicit instructions telling the agent to use them. Without those
            instructions, skills performed no better than having no
            documentation at all."
            
            [1]: https://vercel.com/blog/agents-md-outperforms-skills-in-ou...
       
              evalstate wrote 4 hours 42 min ago:
              Gotcha - yeah, it removes the tool calling step so their content
              is always in context (noting they took action to try and reduce
              the size of that). The framing seems a little simplistic --
              thanks for the link.
       
        bandrami wrote 8 hours 3 min ago:
        At what point does it become computationally cheaper to just generate
        random ELF binaries, test them against constraints, and iterate until
        they work as specified?
       
          KineticLensman wrote 6 hours 7 min ago:
          See 'genetic programming' for techniques that are sort of based on
          this idea. Typical approach is to have a problem representation (gene
          analogues) that can be used to create a population of different
          individual solutions. Test them all against a fitness function and
          retain those that are 'best' according to some metric. Then create
          (breed) some new individuals who have some of the characteristics of
          the winners, perhaps mutated somewhat, insert these into the
          population. Repeat until you have solved the problem or have a good
          enough solution.
          
          Challenges (apart from the time taken) are coming up with a good
          enough gene representation that captures the essence of the problem,
          building an efficient fitness function, and avoiding local maxima,
          i.e. a solution that is almost but not quite good enough, but from
          which you can't breed a better one.
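          A minimal sketch of that loop, assuming a toy problem (evolving a
          bit string toward a hypothetical TARGET) where the "genes" are bits
          and fitness is the number of matching positions. Real genetic
          programming usually evolves program trees rather than fixed-length
          strings, so this is closer to a plain genetic algorithm.

```python
import random

# Toy problem (an assumption for illustration): evolve a bit string that matches TARGET.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]


def fitness(individual: list[int]) -> int:
    """Number of positions that agree with TARGET (higher is better)."""
    return sum(1 for a, b in zip(individual, TARGET) if a == b)


def crossover(a: list[int], b: list[int], rng: random.Random) -> list[int]:
    """Single-point crossover: prefix from one parent, suffix from the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(individual: list[int], rate: float, rng: random.Random) -> list[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]


def evolve(pop_size: int = 30, generations: int = 200, seed: int = 0) -> tuple[list[int], int]:
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for gen in range(generations):
        # Rank by fitness and stop early once a perfect solution exists.
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(TARGET):
            return population[0], gen
        # Keep the best half (the "winners") and breed the rest from them.
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rate=0.05, rng=rng))
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], generations


best, generation = evolve()
print(f"best={best} fitness={fitness(best)}/{len(TARGET)} generation={generation}")
```

          Keeping half the population plus a small mutation rate is one
          simple way to preserve diversity; it does not guarantee escaping a
          local maximum.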
       
        Ross00781 wrote 11 hours 14 min ago:
        The tension between discoverability and flexibility is real. I wonder
        if there's room for a hybrid approach - structured skill metadata
        (think OpenAPI-style specs for inputs/outputs) that can be compiled
        down to markdown context when needed. This would let agents validate
        tool calls before making them, while still keeping the LLM-friendly
        text format for reasoning about when to use them.
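        One way to picture that hybrid, purely as a sketch: a hypothetical
        `SkillSpec` holds typed inputs (a tiny stand-in for an
        OpenAPI/JSON Schema spec), `validate` checks a proposed tool call
        before it runs, and `to_markdown` compiles the same spec down to
        prompt text. None of these names come from an existing skills
        format, and the `jira-search` skill is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    kind: type
    description: str
    required: bool = True


@dataclass
class SkillSpec:
    """Hypothetical structured skill metadata (not an existing standard)."""
    name: str
    when_to_use: str
    params: list[Param] = field(default_factory=list)

    def validate(self, args: dict) -> list[str]:
        """Return the problems with a proposed call; an empty list means valid."""
        errors = []
        known = {p.name for p in self.params}
        for p in self.params:
            if p.name not in args:
                if p.required:
                    errors.append(f"missing required argument '{p.name}'")
            elif not isinstance(args[p.name], p.kind):
                errors.append(
                    f"'{p.name}' should be {p.kind.__name__}, got {type(args[p.name]).__name__}"
                )
        for extra in sorted(set(args) - known):
            errors.append(f"unknown argument '{extra}'")
        return errors

    def to_markdown(self) -> str:
        """Compile the spec down to LLM-friendly markdown context."""
        lines = [f"## {self.name}", "", f"Use when: {self.when_to_use}", "", "Arguments:"]
        for p in self.params:
            req = "required" if p.required else "optional"
            lines.append(f"- `{p.name}` ({p.kind.__name__}, {req}): {p.description}")
        return "\n".join(lines)


jira = SkillSpec(
    name="jira-search",
    when_to_use="the user mentions Jira or an issue key",
    params=[
        Param("query", str, "JQL query string"),
        Param("limit", int, "maximum number of issues", required=False),
    ],
)
print(jira.to_markdown())
print(jira.validate({"query": "project = ABC", "limit": "10"}))
```

        The harness would reject a call with a non-empty error list before
        executing it, while the model only ever sees the markdown rendering.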
       
        rukuu001 wrote 18 hours 32 min ago:
        Say it fast out loud - "Hugging Face Skills" - probably not the message
        Hugging Face wants to send.
       
        firemelt wrote 20 hours 9 min ago:
        I really don't get skills at all. Is it just CLAUDE.md but for a
        specific use case?
       
          neurostimulant wrote 17 hours 39 min ago:
          Skills are only loaded when you need them, so you'll probably use
          fewer tokens overall compared to MCP servers or including them
          manually in your main AGENTS.md/CLAUDE.md file, which are always
          loaded in the system prompt.
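          A minimal sketch of where the savings come from, assuming a
          hypothetical harness: only each skill's name and one-line
          description go into the system prompt, and the full body is read
          only when the model calls a `load_skill` tool. The skill contents
          are placeholders, and character counts stand in for tokens; they
          are not measurements.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short, always in context
    body: str         # long, loaded on demand


SKILLS = {
    s.name: s
    for s in [
        Skill("jira", "Use when the user mentions Jira or an issue key.", "Full Jira CLI docs... " * 200),
        Skill("deploy", "Use when asked to deploy a service.", "Deployment runbook... " * 300),
    ]
}


def system_prompt_with_index() -> str:
    """Skills style: only the index is always present."""
    lines = ["Available skills (call load_skill(name) to read one):"]
    lines += [f"- {s.name}: {s.description}" for s in SKILLS.values()]
    return "\n".join(lines)


def system_prompt_with_everything() -> str:
    """AGENTS.md/CLAUDE.md style: every body is always present."""
    return "\n\n".join(s.body for s in SKILLS.values())


def load_skill(name: str) -> str:
    """Tool the model calls when it decides a skill is relevant."""
    skill = SKILLS.get(name)
    return skill.body if skill else f"No skill named '{name}'. Known: {', '.join(SKILLS)}"


print(len(system_prompt_with_index()), "chars always loaded (index only)")
print(len(system_prompt_with_everything()), "chars always loaded (everything)")
```

          The catch neya raised at the top of the thread shows up here too:
          the index is cheap, but the body only arrives if the model
          actually decides to call `load_skill`.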
       
        sothatsit wrote 20 hours 33 min ago:
        I've had a great experience with CLI-related skills at work. We have
        written CLIs for systems like Jira, along with skills that document the
        CLIs and describe the organisation of Jira at our company. Claude Code
        loads these reliably whenever you mention Jira or an issue number.
        
        By contrast, I've had less luck with purely documentation skills.
        They seem to be loaded less reliably when they're not linked to
        actions the agent wants to take, and it is frustrating to watch the
        agent try to figure something out when the docs are one skill load
        away.
       
          jedisct1 wrote 9 hours 5 min ago:
          Same experience here.
          
          Documentation-based skills don't really work in practice. They tend
          to waste tokens instead of adding value.
          
          CLI skills are also redundant when the CLI already provides clear
          built-in help messages. Those help messages are usually up to date,
          unlike separate skills that need to be maintained independently.
          
          If the CLI itself is confusing (and would likely be confusing for
          humans as well) then targeted skills can serve as a temporary
          workaround, a kind of band-aid.
          
          Where skills truly shine is when agents need to understand
          non-generic terms and concepts: unique product names, brand-specific
          terminology, custom function names, and other domain-specific
          language.
       
            sothatsit wrote 7 hours 52 min ago:
            I strongly disagree about CLI help being a good enough solution.
            Skills with CLIs backing them are the gold standard right now for
            a reason.
            
            1. Skills let the agent know the CLI is available because they get
            an entry in the context window.
            
            2. They let you provide a ton of organisational knowledge and
            processes that the agent would have a hard time figuring out from
            the CLI alone.
            
            3. It is just more efficient to provide quick information in a
            skill than it is to require an agent to figure out every detail
            from CLI help messages alone every single time.
       
        mccoyb wrote 21 hours 23 min ago:
        Skills feel analogous to behavioral programs. If you give an agent
        access to a programmable substrate (e.g. bash + CLI tools), you write
        these Markdown programs which are triggered and read when the agent
        thinks certain behaviors will be beneficial.
        
        It's a great idea: a really neat take on programmability, and skills
        can be reloaded while the agent is running without tweaking the
        harness, etc. -- lots of benefits.
        
        `pi` has a great skills implementation too.
        
        I think skills might really shine if you take a minimal approach to the
        system prompt (like `pi`) -- a lot of the times, if I want to
        orchestrate the agent in some complex behavior, I want to start fresh,
        and having it walk through a bunch of skills ... possibly the smaller
        the system prompt, the more likely the agent is to follow the skills
        without issue.
       
          evalstate wrote 21 hours 1 min ago:
          Yes -- skills live in a special gap between "should have been a
          deterministic program" and "model already had the ability to figure
          this out". My personal experience leaves me in agreement that minimal
          system prompts are definitely the way to go.
       
        RyanShook wrote 21 hours 41 min ago:
        So far my experience with skills is that they slow down or confuse
        agents unless you as the user understand what the skill actually
        contains and how it works. In general I would rather install a CLI tool
        and explain to the agent how I want it used vs. trying to get the agent
        to use a folder of instructions that I don't really understand what's
        inside.
       
          selridge wrote 20 hours 32 min ago:
          I mean, yes. You should do exactly that: instruct an agent on how to
          do something you understand in terms you can explain.
          
          Putting that in a `.md` file just means you don't need to do it
          twice.
       
          giancarlostoro wrote 21 hours 0 min ago:
          > So far my experience with skills is that they slow down or confuse
          agents unless you as the user understand what the skill actually
          contains and how it works. In general I would rather install a CLI
          tool and explain to the agent how I want it used vs. trying to get
          the agent to use a folder of instructions that I don't really
          understand what's inside.
          
          For Claude Code I add the tooling into either CLAUDE.md or
          .claude/INSTRUCTIONS.md which Claude reads when you start a new
          instance. If you update it, you MUST ask Claude to reread the file so
          it knows the full instructions.
       
          airstrike wrote 21 hours 35 min ago:
          Most LLM "harnessing" seems very lazy and bolted on. You can build
          much more robustly by leveraging a more complex application layer
          where you can manage state, but I guess people struggle to build that.
       
            TeMPOraL wrote 17 hours 10 min ago:
            Common failure mode I've observed is people building a stateful
            harness for the LLM and then forgetting to tell the LLM about it.
            Leads to funny/disturbing results whenever the two "desync" in some
            way.
            
            Example: a plan/act division, with the harness keeping state of
            which mode is active, and while in "plan mode", removing/disabling
            tools that can write data. Cue a mishandled timeout or a UI bug
            that prevents switching to "act mode", and suddenly the agent is
            spinning for 10 minutes questioning the nature of their reality, as
            the basic tools it needs to write code inexplicably ceased to
            exist, then opting for empirical experimentation and eventually
            figuring out a way to reimplement "search/replace" using shell
            calls or Python or whatever alternative wasn't properly sandboxed
            by the harness writers...
            
            Part of this is just bugs in code, but what irks me is watching the
            LLM getting gaslighted or plain confused by rules of reality
            changing underneath it, all because the harness state wasn't made
            observable to the agent, or someone couldn't be arsed to have their
            error messages and security policies provide feedback to the LLM
            and not just the user.
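            A minimal sketch of the alternative, assuming a hypothetical
            harness with a plan/act mode: instead of silently removing write
            tools in plan mode, the harness keeps them registered, exposes
            its mode in a status line, and refuses write calls with an error
            that tells the model why and what to do next. The class and tool
            names are illustrative only.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    ACT = "act"


# Hypothetical tool names; a real harness would have its own registry.
WRITE_TOOLS = {"write_file", "search_replace"}


class Harness:
    def __init__(self) -> None:
        self.mode = Mode.PLAN

    def status_line(self) -> str:
        """Injected into every turn so the model can see the harness state."""
        state = "enabled" if self.mode is Mode.ACT else "disabled"
        return f"[harness] mode={self.mode.value}; write tools {state}"

    def call_tool(self, tool: str, **kwargs) -> str:
        if tool in WRITE_TOOLS and self.mode is Mode.PLAN:
            # Explain the refusal to the model instead of pretending the tool doesn't exist.
            return (
                f"ERROR: '{tool}' is unavailable because the session is in plan mode. "
                "Do not try to work around this; finish the plan and ask the user to switch to act mode."
            )
        return f"ok: ran {tool} with {kwargs}"


h = Harness()
print(h.status_line())
print(h.call_tool("write_file", path="main.py"))
h.mode = Mode.ACT
print(h.call_tool("write_file", path="main.py"))
```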
       
        daturkel wrote 22 hours 2 min ago:
        Skills in CC have been a bit frustrating for me. They don't trigger
        reliably and the emphasis on "it's just markdown" makes it harder to
        have them reliably call certain tools with the correct arguments.
        
        The idea that agent harnesses should primarily have their functionality
        dictated by plaintext commands feels like a copout around programming
        in some actually useful, semi-opinionated functionality (not to mention
        that it makes capability-discoverability basically impossible). For
        example, Claude Code has three modes: plan, ask about edits, and
        auto-accept edits. I always start with a plan and then I end up with
        multiple tasks. I'd like to auto-accept edits for a step at a time and
        the only way to do that is to ask CC to do it, but that's not
        reliable; sometimes it just continues on to the next step. If this
        were programmed explicitly into CC rather than relying on agent
        obedience, we could ditch the nondeterminism and just have a hook on
        task completion that toggles auto-accept back to "off."
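        What that could look like as deterministic harness code, as a
        sketch only: this is a hypothetical harness, not Claude Code's
        actual hook API. The permission mode is plain state, and an
        `on_task_complete`-style hook (here `reset_auto_accept`) turns
        auto-accept off after every step, so stopping between steps no
        longer depends on the model's obedience.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    auto_accept: bool = False
    log: list[str] = field(default_factory=list)
    task_complete_hooks: list[Callable[["Session", str], None]] = field(default_factory=list)

    def run_task(self, task: str) -> None:
        mode = "auto-accept" if self.auto_accept else "ask"
        self.log.append(f"ran '{task}' with edits in {mode} mode")
        # Hooks fire after every task, regardless of what the model "wants".
        for hook in self.task_complete_hooks:
            hook(self, task)


def reset_auto_accept(session: Session, task: str) -> None:
    """Hypothetical hook: after any task finishes, turn auto-accept back off."""
    if session.auto_accept:
        session.auto_accept = False
        session.log.append(f"auto-accept turned off after '{task}'")


s = Session(task_complete_hooks=[reset_auto_accept])
s.auto_accept = True   # user opts in for one step
s.run_task("step 1")   # runs with auto-accept
s.run_task("step 2")   # the hook already reset it, so this one asks
print("\n".join(s.log))
```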
       
          apwheele wrote 3 hours 43 min ago:
          I view them as more idiosyncratic docs, but focused on how to write
          code (there is so much huggingface code floating around the internet,
          the models do quite well with it already).
          
          I have not had much success with skills that have tree-based logic
          (if A, do X; else do Y): they tend to do everything in the skill
          (so they will do both X and Y).
          
          But as a simple "hey, follow this outline of steps A, B, C", it
          works quite well in my experience.
       
          ctoth wrote 17 hours 21 min ago:
          Behavior trees. They are precisely what we need. Somebody just needs
          to go build the damn thing.
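          For reference, a minimal behavior-tree sketch: leaves are actions
          or conditions returning SUCCESS/FAILURE, a `Sequence` runs its
          children until one fails, and a `Selector` tries children until
          one succeeds. This is one way the "if A do X, else do Y" branching
          mentioned earlier in this thread could be enforced by code instead
          of left to the model. The node classes and the example tree are
          hypothetical, not an existing agent library.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Node:
    def tick(self, bb: dict) -> Status:
        raise NotImplementedError


class Leaf(Node):
    """Wraps a function that reads/writes the shared blackboard dict."""
    def __init__(self, name: str, fn: Callable[[dict], bool]) -> None:
        self.name, self.fn = name, fn

    def tick(self, bb: dict) -> Status:
        return Status.SUCCESS if self.fn(bb) else Status.FAILURE


class Sequence(Node):
    """Succeeds only if every child succeeds, in order; stops at the first failure."""
    def __init__(self, *children: Node) -> None:
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Selector(Node):
    """Tries children in order and succeeds at the first success (an if/else)."""
    def __init__(self, *children: Node) -> None:
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def do(step: str) -> Callable[[dict], bool]:
    """Illustrative action: records the step on the blackboard and succeeds."""
    def action(bb: dict) -> bool:
        bb.setdefault("done", []).append(step)
        return True
    return action


# "If tests exist, run them; otherwise write them" (illustrative branches).
tree = Selector(
    Sequence(Leaf("has tests?", lambda bb: bb["has_tests"]), Leaf("run tests", do("run tests"))),
    Leaf("write tests", do("write tests")),
)

bb = {"has_tests": False}
tree.tick(bb)
print(bb["done"])
```

          Only one branch ever executes, which is the guarantee a
          markdown-only skill can't give.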
       
          conception wrote 20 hours 44 min ago:
           [1] works very well
          
          [1]: https://scottspence.com/posts/measuring-claude-code-skill-ac...
       
          btown wrote 20 hours 49 min ago:
          The saving grace of Claude Code skills is that when writing them
          yourself, you can give them frontmatter like "use when mentioning X"
          that makes them become relevant for very specific "shibboleths" -
          which you can then use when prompting.
          
          Are we at an ideal balance where Claude Code is pulling things in
          proactively enough... without bringing in irrelevant skills just
          because the "vibes" might match in frontmatter? Arguably not. But
          it's still a powerful system.
       
            winwang wrote 11 hours 45 min ago:
            For manual prompting, I use a "macro"-like system where I can just
            add `[@mymacro]` in the prompt itself and Claude will know to
            `./lookup.sh mymacro` to load its definition. Can easily chain
            multiple together. `[@code-review:3][@pycode]` -> 3x parallel code
            review, initialize subagents with python-code-guide.md or
            something. ...Also wrote a parser so it gets reminded by
            additionalContext in hooks.
            
            Interestingly, I've seen Claude do `./lookup.sh relevant-macro`
            without any prompting by me, probably due to it being mentioned in
            the compaction summary.
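            The `[@name]` / `[@name:N]` syntax described above is easy to pick
            out deterministically. A minimal parser sketch, assuming the
            syntax is exactly `[@` + name + optional `:count` + `]`; the regex
            is an assumption about the format, not winwang's actual parser,
            and expanding each macro (e.g. via `./lookup.sh`) is left out.

```python
import re
from typing import NamedTuple

# Assumed syntax: [@name] or [@name:count], where name is letters, digits, '-' or '_'.
MACRO_RE = re.compile(r"\[@([A-Za-z0-9_-]+)(?::(\d+))?\]")


class Macro(NamedTuple):
    name: str
    count: int


def parse_macros(prompt: str) -> list[Macro]:
    """Return macros in the order they appear; count defaults to 1."""
    return [
        Macro(m.group(1), int(m.group(2)) if m.group(2) else 1)
        for m in MACRO_RE.finditer(prompt)
    ]


def strip_macros(prompt: str) -> str:
    """Prompt text with the macro tokens removed."""
    return MACRO_RE.sub("", prompt).strip()


prompt = "[@code-review:3][@pycode] please review the parser changes"
print(parse_macros(prompt))
print(strip_macros(prompt))
```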
       
          giancarlostoro wrote 21 hours 3 min ago:
          Are you using either CLAUDE.md or .claude/INSTRUCTIONS.md to direct
          Claude about the different agents?
          
          Also, be aware that when you add new instructions, if you don't tell
          Claude to reread these files, it will NOT have them in its context
          window until you tell it to read them OR you start a new CC session.
          This was a bit frustrating for me because it was not immediately
          obvious.
       
          siquick wrote 21 hours 5 min ago:
          > Skills in CC have been a bit frustrating for me. They don't trigger
          reliably
          
          Referencing them in AGENTS/CLAUDE.md has increased their usage for
          me.
       
          btbuildem wrote 21 hours 7 min ago:
          > idea that agent harnesses should primarily have their functionality
          dictated by plaintext commands feels like a copout
          
          I think it's more along the lines of acknowledging the fast-paced
          changes in the field, and refusing to cast into code something that's
          likely to rapidly evolve in the near future.
          
          Once things settle down into tested practices, we'll see more
          "permanent" instrumentation arise.
       
            daturkel wrote 21 hours 3 min ago:
            Surely this logic doesn't apply if we're to believe that "code is
            cheap" now :p
       
              btbuildem wrote 3 hours 50 min ago:
              "Code is cheap" has two interpretations here. One is that code
              is no longer seen as an artisanally crafted fine product; now
              it's "manufactured". The other is that it's cheaper in ops: once
              the criteria are fully discovered and there are no new paths
              left for the agents to roam, things that have been cast into
              code consume minimal resources (on the AI scale of things), are
              doggedly deterministic, and are free of heavy dependencies.
              
              So yeah, I believe "it's a phase", but in the sense that it's a
              development phase, just like planning or prototyping.
       
          chickensong wrote 21 hours 7 min ago:
          > sometimes it just continues to go into the next step
          
          Use a structured workflow that loops on every task and includes a
          pause for user confirmation at the end. Enforce it with a hook. I'm
          not sure if you can toggle auto-accept this way, but I think the end
          result is what you're asking for.
          
          I use this with great success, sometimes toggling auto-accept on when
          confidence is high that Claude can complete a step without guidance,
          and toggling off when confidence is low and you want to slow down and
          steer, with Claude stopping between the steps. Now that prompt
          suggestions are a thing, you can just hit Enter to accept the
          suggested prompt and continue.
       
          DarmokJalad1701 wrote 21 hours 22 min ago:
          You can write skills that have an associated js/python/whatever
          script.
       
          Frannky wrote 21 hours 29 min ago:
          I think unless you're doing simple tasks, skills are unreliable. For
          better reliability, I have the agent trigger APIs that handle the
          complex logic (and their own LLM calls) internally. Has anyone found
          a solid strategy for making complex 'skills' more dependable?
       
            triage8004 wrote 13 hours 38 min ago:
            I found interrupting and insisting that it use the skill to be the
            easiest way... there have got to be better ways than this.
       
            Rebelgecko wrote 16 hours 16 min ago:
            Having the skill be "call this script with these args" seems to
            reduce the amount of stuff that goes wrong
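            A minimal sketch of that pattern: the skill's markdown would only
            say "run this script with these arguments", and the script does
            strict validation itself, so a malformed call fails fast with a
            message the agent can act on. The `release.py` script, its
            `--env` values, and the release action are all hypothetical.

```python
import argparse
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="release.py",
        description="Hypothetical release helper that a skill tells the agent to call.",
    )
    # Constrained choices turn a fuzzy instruction into a hard check (argparse exits with status 2).
    parser.add_argument("--env", required=True, choices=["staging", "production"])
    parser.add_argument("--version", required=True, help="semantic version, e.g. 1.4.2")
    parser.add_argument("--dry-run", action="store_true", help="print the plan without acting")
    return parser


def main(argv: Optional[list[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    parts = args.version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        # Error text is written for the agent: say what was wrong and what is expected.
        print(f"error: --version must look like MAJOR.MINOR.PATCH, got '{args.version}'", file=sys.stderr)
        return 2
    action = "would release" if args.dry_run else "releasing"
    print(f"{action} {args.version} to {args.env}")
    return 0


# Example invocation; as a standalone script you would end with sys.exit(main()).
main(["--env", "staging", "--version", "1.4.2", "--dry-run"])
```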
       
            selridge wrote 20 hours 5 min ago:
            In my experience, all text "instruction" to the agent should be
            taken on a prayer. If you write compact agent guidance that is not
            contradictory and is local and useful to your project, the agent
            will follow it most of the time. There is nothing that you can
            write that will force the agent to follow it all of the time.
            
            If one can accept failure to follow instructions, then the world is
            open. That condition does not really comport with how we think
            about machines. Nevertheless, it is the case.
            
            Right now, a productive split is to place things that you need to
            happen into tooling and harnessing, and place things that would be
            nice for the agent to conceptualize into skills.
       
              Frannky wrote 18 hours 22 min ago:
              Yeah, that's my experience too
       
            chickensong wrote 21 hours 0 min ago:
            Is it that the skills aren't being triggered reliably, or that they
            get triggered but the skill itself is complex and doesn't work as
            expected?
       
              Frannky wrote 20 hours 48 min ago:
              both
       
                chickensong wrote 20 hours 12 min ago:
                I haven't done a lot with skills yet, but maybe try and
                leverage hooks to enforce skill usage, and move most of the
                skill's logic and complexity into a script so the agent only
                needs to reason about how to call the script.
       
                  Frannky wrote 16 hours 44 min ago:
                  I think I'll wait until they are more reliable. For now, I
                  use skills, but they just specify which endpoint to call. It
                  should also be safer: a different VPS, with no access to
                  credentials other than the bearer token.
       
            plufz wrote 21 hours 12 min ago:
            My only strategy is what used to be called slash commands, which
            are also skills now, i.e. I call them explicitly. I think that
            actually works quite well, and you can allow specific tools and
            tell it to use specific hooks for security or validation in the
            frontmatter properties.
       
          PantaloonFlames wrote 21 hours 48 min ago:
          You can publish scripts with skills you author, right? With
          carefully constructed markdown, that should allow the agent to call
          tools the right way.
       
       