_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
COMMENT PAGE FOR:
HTML Hugging Face Skills
neya wrote 6 hours 55 min ago:
I'm actually on the fence with skills. Vercel shared a study where they
claimed skills actually performed worse [0] than just injecting into
the context directly via agents.md. Similarly, there was a paper
recently that suggested the same [1]. Of course, the classic response to
these, even WITH the evidence, is often "yOu'Re dOiNg iT wRonG". Does
anyone actually have proof that using skill.md is arguably better
than not?
Edit: Fixed company name, added link to Vercel's claim
[0] [1]
HTML [1]: https://vercel.com/blog/agents-md-outperforms-skills-in-our-ag...
HTML [2]: https://arxiv.org/abs/2602.11988
evalstate wrote 6 hours 27 min ago:
I think the paper is saying specifically that it's redundant to
include information about your coding repository when that
information is otherwise available to the agent in higher fidelity
forms (e.g. package.json). This makes sense - but not sure it's about
Skills directly.
For the former I'd be interested in learning more about that. From a
harness perspective the difference would be the inclusion of the
description in the system prompt, and an additional tool call to
return the skill. While that's certainly less efficient than adding
the context directly I'd be surprised if it degraded task performance
significantly.
I tend to be quite focussed with my Skill/Tool usage in general
though, inviting them in to context when needed rather than
increasing the potential for model confusion.
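A minimal sketch of the harness difference described above, assuming a toy in-memory registry. The class names, the `jira` skill and the `jira-cli` command are hypothetical and do not show how any real agent harness is implemented. Only the one-line descriptions sit in the system prompt; the body comes back from a separate tool call.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # one line, always present in the system prompt
    body: str         # full instructions, loaded only on demand


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}

    def system_prompt_section(self) -> str:
        """Cheap part: names and descriptions, present on every turn."""
        return "\n".join(
            f"- {s.name}: {s.description}" for s in self._skills.values()
        )

    def load_skill(self, name: str) -> str:
        """The extra tool call: return the full body, or a readable error."""
        skill = self._skills.get(name)
        if skill is None:
            available = ", ".join(sorted(self._skills))
            return f"error: no skill named {name!r}; available: {available}"
        return skill.body


# Hypothetical skill; `jira-cli` is an invented command name.
registry = SkillRegistry([
    Skill(
        name="jira",
        description="Use when the user mentions Jira or an issue number.",
        body="Run `jira-cli show <ISSUE>` to fetch an issue...",
    ),
])

print(registry.system_prompt_section())
print(registry.load_skill("jira"))
```

The trade-off is visible in the two methods: the always-in-context approach would inline `body` into the prompt, while this one pays an extra round trip and depends on the model deciding to make the call.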
neya wrote 5 hours 13 min ago:
Here you go:
Sorry, I misquoted the company, it was Vercel, not Cursor.
"A compressed 8KB docs index embedded directly in AGENTS.md
achieved a 100% pass rate, while skills maxed out at 79% even with
explicit instructions telling the agent to use them. Without those
instructions, skills performed no better than having no
documentation at all."
HTML [1]: https://vercel.com/blog/agents-md-outperforms-skills-in-ou...
evalstate wrote 4 hours 42 min ago:
Gotcha - yeah, it removes the tool calling step so their content
is always in context (noting they took action to try and reduce
the size of that). The framing seems a little simplistic --
thanks for the link.
bandrami wrote 8 hours 3 min ago:
At what point does it become computationally cheaper to just generate
random elf binaries, test them against constraints, and iterate until
they work as specified?
KineticLensman wrote 6 hours 7 min ago:
See 'genetic programming' for techniques that are sort of based on
this idea. Typical approach is to have a problem representation (gene
analogues) that can be used to create a population of different
individual solutions. Test them all against a fitness function and
retain those that are 'best' according to some metric. Then create
(breed) some new individuals who have some of the characteristics of
the winners, perhaps mutated somewhat, insert these into the
population. Repeat until you have solved the problem or have a good
enough solution.
Challenges (apart from the time taken) are coming up with a good
enough gene representation that captures the essence of the problem,
building an efficient fitness function, and avoiding local maxima -
i.e. a solution that is almost but not quite good enough, but from
where you can't breed a better solution.
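To make the loop above concrete, here is a minimal sketch. Strictly speaking it is a plain genetic algorithm over strings, not genetic programming (which evolves programs). The toy target, the population size, the 20% survival cut and the mutation rate are all assumptions chosen for illustration.

```python
import random

TARGET = "hello world"                 # assumed toy problem: evolve this string
GENES = "abcdefghijklmnopqrstuvwxyz "  # the gene alphabet


def fitness(individual: str) -> int:
    """Count positions that already match the target (higher is better)."""
    return sum(a == b for a, b in zip(individual, TARGET))


def random_individual(rng: random.Random) -> str:
    return "".join(rng.choice(GENES) for _ in TARGET)


def breed(mother: str, father: str, rng: random.Random,
          mutation_rate: float = 0.05) -> str:
    """Single-point crossover followed by per-gene mutation."""
    cut = rng.randrange(1, len(TARGET))
    child = mother[:cut] + father[cut:]
    return "".join(
        rng.choice(GENES) if rng.random() < mutation_rate else gene
        for gene in child
    )


def evolve(seed: int = 0, population_size: int = 200,
           max_generations: int = 500) -> tuple[str, int]:
    """Return the best individual found and the generation it was found in."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        winners = population[: population_size // 5]  # retain the best 20%
        children = [
            breed(rng.choice(winners), rng.choice(winners), rng)
            for _ in range(population_size - len(winners))
        ]
        population = winners + children
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations


if __name__ == "__main__":
    best, generations = evolve()
    print(f"best={best!r} after {generations} generations")
```

The fitness function here is trivially cheap and smooth. For random ELF binaries it would be neither: almost every mutation crashes, so there is no gradient to climb.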
Ross00781 wrote 11 hours 14 min ago:
The tension between discoverability and flexibility is real. I wonder
if there's room for a hybrid approach - structured skill metadata
(think OpenAPI-style specs for inputs/outputs) that can be compiled
down to markdown context when needed. This would let agents validate
tool calls before making them, while still keeping the LLM-friendly
text format for reasoning about when to use them.
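A rough sketch of what that hybrid could look like, assuming a made-up `SkillSpec` structure. The field names and the `create-issue` skill are hypothetical and not part of any existing skills format. The same spec renders to markdown for the model and validates a proposed call before it runs.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    type: type
    description: str
    required: bool = True


@dataclass
class SkillSpec:
    name: str
    description: str
    params: list[Param] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Compile the structured spec down to LLM-friendly text."""
        lines = [f"# {self.name}", "", self.description, "", "## Inputs"]
        for p in self.params:
            flag = "required" if p.required else "optional"
            lines.append(
                f"- `{p.name}` ({p.type.__name__}, {flag}): {p.description}"
            )
        return "\n".join(lines)

    def validate(self, args: dict) -> list[str]:
        """Return a list of problems; an empty list means the call is valid."""
        errors = []
        known = {p.name for p in self.params}
        for name in args:
            if name not in known:
                errors.append(f"unknown argument: {name}")
        for p in self.params:
            if p.name not in args:
                if p.required:
                    errors.append(f"missing required argument: {p.name}")
            elif not isinstance(args[p.name], p.type):
                errors.append(f"{p.name} should be {p.type.__name__}")
        return errors


# Hypothetical skill used only to exercise the sketch.
spec = SkillSpec(
    name="create-issue",
    description="Create a ticket in the issue tracker.",
    params=[
        Param("title", str, "Short summary of the issue"),
        Param("priority", int, "1 (high) to 5 (low)", required=False),
    ],
)

print(spec.to_markdown())
print(spec.validate({"title": "Crash on login", "priority": "high"}))
```

The validation errors are plain strings on purpose, so they can be fed straight back to the model as tool output.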
rukuu001 wrote 18 hours 32 min ago:
Say it fast out loud - "Hugging Face Skills" - probably not the message
Hugging Face wants to send.
firemelt wrote 20 hours 9 min ago:
I really don't get skills at all. Is it just claude.md but for a
specific use case?
neurostimulant wrote 17 hours 39 min ago:
Skills are only loaded when you need them, so you'll probably use
fewer tokens overall compared to MCP servers or including them
manually in your main AGENTS.md/CLAUDE.md file, which are always
loaded in the system prompt.
sothatsit wrote 20 hours 33 min ago:
I've had a great experience with CLI-related skills at work. We have
written CLIs for systems like Jira, along with skills that document the
CLIs and describe the organisation of Jira at our company. Claude Code
loads these reliably whenever you mention Jira or an issue number.
Alternatively, I've had less luck with purely documentation skills.
They seem to be loaded less reliably when they're not linked to
actions the agent wants to take, and it is frustrating to watch the
agent try to figure something out when the docs are one skill load
away.
jedisct1 wrote 9 hours 5 min ago:
Same experience here.
Documentation-based skills don't really work in practice. They tend
to waste tokens instead of adding value.
CLI skills are also redundant when the CLI already provides clear
built-in help messages. Those help messages are usually up to date,
unlike separate skills that need to be maintained independently.
If the CLI itself is confusing (and would likely be confusing for
humans as well) then targeted skills can serve as a temporary
workaround, a kind of band-aid.
Where skills truly shine is when agents need to understand
non-generic terms and concepts: unique product names, brand-specific
terminology, custom function names, and other domain-specific
language.
sothatsit wrote 7 hours 52 min ago:
I strongly disagree about CLI help being a good enough solution.
Skills with CLIs backing them is the gold standard right now for a
reason.
1. Skills let the agent know the CLI is available because they get
an entry in the context window.
2. They let you provide a ton of organisational knowledge and
processes that the agent would have a hard time figuring out from
the CLI alone.
3. It is just more efficient to provide quick information in a
skill than it is to require an agent to figure out every detail
from CLI help messages alone every single time.
mccoyb wrote 21 hours 23 min ago:
Skills feel analogous to behavioral programs. If you give an agent
access to a programmable substrate (e.g. bash + CLI tools), you write
these Markdown programs which are triggered and read when the agent
thinks certain behaviors will be beneficial.
It's a great idea: really neat take on programmability, and can be
reloaded while the agent is running without tweaking the harness, etc
-- lots of benefits.
`pi` has a great skills implementation too.
I think skills might really shine if you take a minimal approach to the
system prompt (like `pi`) -- a lot of the time, if I want to
orchestrate the agent in some complex behavior, I want to start fresh,
and having it walk through a bunch of skills ... possibly the smaller
the system prompt, the more likely the agent is to follow the skills
without issue.
evalstate wrote 21 hours 1 min ago:
Yes -- skills live in a special gap between "should have been a
deterministic program" and "model already had the ability to figure
this out". My personal experience leaves me in agreement that minimal
system prompts are definitely the way to go.
RyanShook wrote 21 hours 41 min ago:
So far my experience with skills is that they slow down or confuse
agents unless you as the user understand what the skill actually
contains and how it works. In general I would rather install a CLI tool
and explain to the agent how I want it used vs. trying to get the agent
to use a folder of instructions that I don't really understand what's
inside.
selridge wrote 20 hours 32 min ago:
I mean, yes. You should do exactly that: instruct an agent on how to
do something you understand in terms you can explain.
Putting that in a `.md` file just means you don't need to do it
twice.
giancarlostoro wrote 21 hours 0 min ago:
> So far my experience with skills is that they slow down or confuse
agents unless you as the user understand what the skill actually
contains and how it works. In general I would rather install a CLI
tool and explain to the agent how I want it used vs. trying to get
the agent to use a folder of instructions that I don't really
understand what's inside.
For Claude Code I add the tooling into either CLAUDE.md or
.claude/INSTRUCTIONS.md which Claude reads when you start a new
instance. If you update it, you MUST ask Claude to reread the file so
it knows the full instructions.
airstrike wrote 21 hours 35 min ago:
Most LLM "harnessing" seems very lazy and bolted on. You can build
much more robustly by leveraging a more complex application layer
where you can manage state, but I guess people struggle building that
TeMPOraL wrote 17 hours 10 min ago:
Common failure mode I've observed is people building a stateful
harness for the LLM and then forgetting to tell the LLM about it.
Leads to funny/disturbing results whenever the two "desync" in some
way.
Example: a plan/act division, with the harness keeping state of
which mode is active, and while in "plan mode", removing/disabling
tools that can write data. Cue a mishandled timeout or a UI bug
that prevents switching to "act mode", and suddenly the agent is
spinning for 10 minutes questioning the nature of their reality, as
the basic tools it needs to write code inexplicably ceased to
exist, then opting for empirical experimentation and eventually
figuring out a way to reimplement "search/replace" using shell
calls or Python or whatever alternative wasn't properly sandboxed
by the harness writers...
Part of this is just bugs in code, but what irks me is watching the
LLM getting gaslighted or plain confused by rules of reality
changing underneath it, all because the harness state wasn't made
observable to the agent, or someone couldn't be arsed to have their
error messages and security policies provide feedback to the LLM
and not just the user.
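A minimal sketch of the alternative, assuming a toy harness. The tool names and message wording are hypothetical. Instead of silently removing write tools in plan mode, the harness keeps them callable and returns a refusal the model can read, plus a status line that can be injected into context.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    ACT = "act"


WRITE_TOOLS = {"write_file", "search_replace"}  # hypothetical tool names


class Harness:
    def __init__(self) -> None:
        self.mode = Mode.PLAN

    def status_for_model(self) -> str:
        """State the harness should surface to the model, not only the UI."""
        if self.mode is Mode.ACT:
            return "harness mode: act; write tools enabled"
        return ("harness mode: plan; write tools disabled until the user "
                "switches to act mode")

    def call_tool(self, name: str, **kwargs) -> dict:
        if name in WRITE_TOOLS and self.mode is Mode.PLAN:
            # Explain the refusal instead of making the tool vanish.
            return {
                "ok": False,
                "error": (f"{name} is disabled in plan mode. Do not work "
                          "around this; ask the user to switch to act mode."),
            }
        # Placeholder for real tool dispatch.
        return {"ok": True, "result": f"ran {name}"}


harness = Harness()
print(harness.status_for_model())
print(harness.call_tool("write_file", path="main.py", content="..."))
```

With this shape, a stuck mode switch still produces an agent that stops and asks, not one that reinvents search/replace through the shell.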
daturkel wrote 22 hours 2 min ago:
Skills in CC have been a bit frustrating for me. They don't trigger
reliably and the emphasis on "it's just markdown" makes it harder to
have them reliably call certain tools with the correct arguments.
The idea that agent harnesses should primarily have their functionality
dictated by plaintext commands feels like a copout around programming
in some actually useful, semi-opinionated functionality (not to mention
that it makes capability-discoverability basically impossible). For
example, Claude Code has three modes: plan, ask about edits, and
auto-accept edits. I always start with a plan and then I end up with
multiple tasks. I'd like to auto-accept edits for a step at a time and
the only way to do that reliably is to ask CC to do that, but it's not
reliable - sometimes it just continues to go into the next step. If
this were programmed explicitly into CC rather than relying on agent
obedience, we could ditch the nondeterminism and just have a hook on
task completion that toggles auto-accept back to "off."
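Roughly what I mean, as a sketch against an imaginary session object. This is not Claude Code's actual hook API; `Session`, `on_task_complete` and `auto_accept` are invented names. The point is only that the toggle is plain code, so it fires every time.

```python
from typing import Callable


class Session:
    """Hypothetical harness session holding a task list and an edit mode."""

    def __init__(self, tasks: list[str]) -> None:
        self.pending = list(tasks)
        self.auto_accept = False
        self.on_task_complete: list[Callable[["Session", str], None]] = []

    def complete_current_task(self) -> str:
        task = self.pending.pop(0)
        # Hooks run deterministically; no reliance on agent obedience.
        for hook in self.on_task_complete:
            hook(self, task)
        return task


def disable_auto_accept(session: Session, task: str) -> None:
    """Hook: drop back to asking about edits once a step is done."""
    session.auto_accept = False


session = Session(["write migration", "update handlers"])
session.on_task_complete.append(disable_auto_accept)

session.auto_accept = True           # user opts in for one step
session.complete_current_task()      # step finishes, hook fires
print(session.auto_accept)           # False
```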
apwheele wrote 3 hours 43 min ago:
I view them as more idiosyncratic docs, but focused on how to write
code (there is so much huggingface code floating around the internet,
the models do quite well with it already).
I have not had much success with skills that have tree based logic
(if a do x, else do y), they just tend to do everything in the skill
(so will do both x and y).
But just as "hey follow this outline of steps a,b,c" it works quite
well in my experience.
ctoth wrote 17 hours 21 min ago:
Behavior trees. They are precisely what we need. Somebody just needs
to go build the damn thing.
conception wrote 20 hours 44 min ago:
[1] works very well
HTML [1]: https://scottspence.com/posts/measuring-claude-code-skill-ac...
btown wrote 20 hours 49 min ago:
The saving grace of Claude Code skills is that when writing them
yourself, you can give them frontmatter like "use when mentioning X"
that makes them become relevant for very specific "shibboleths" -
which you can then use when prompting.
Are we at an ideal balance where Claude Code is pulling things in
proactively enough... without bringing in irrelevant skills just
because the "vibes" might match in frontmatter? Arguably not. But
it's still a powerful system.
winwang wrote 11 hours 45 min ago:
For manual prompting, I use a "macro"-like system where I can just
add `[@mymacro]` in the prompt itself and Claude will know to
`./lookup.sh mymacro` to load its definition. Can easily chain
multiple together. `[@code-review:3][@pycode]` -> 3x parallel code
review, initialize subagents with python-code-guide.md or
something. ...Also wrote a parser so it gets reminded by
additionalContext in hooks.
Interestingly, I've seen Claude do `./lookup.sh relevant-macro`
without any prompting by me. Probably due to it being mentioned in the
compaction summary.
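For anyone curious, a parser for that syntax can be very small. This is a sketch, not my actual parser: it assumes macro names are letters, digits, `_` or `-`, and that the optional `:N` suffix is a repeat count defaulting to 1.

```python
import re

# [@name] or [@name:3]; the character set for names is an assumption.
MACRO_RE = re.compile(r"\[@([A-Za-z0-9_-]+)(?::(\d+))?\]")


def parse_macros(prompt: str) -> list[tuple[str, int]]:
    """Return (macro_name, repeat_count) pairs in order of appearance."""
    return [
        (match.group(1), int(match.group(2) or 1))
        for match in MACRO_RE.finditer(prompt)
    ]


def lookup_commands(prompt: str) -> list[str]:
    """Commands the agent would run to load each macro definition."""
    return [f"./lookup.sh {name}" for name, _ in parse_macros(prompt)]


prompt = "[@code-review:3][@pycode] please check the new parser"
print(parse_macros(prompt))     # [('code-review', 3), ('pycode', 1)]
print(lookup_commands(prompt))  # ['./lookup.sh code-review', './lookup.sh pycode']
```

A hook can run `parse_macros` on each user prompt and put the result into additionalContext as a reminder.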
giancarlostoro wrote 21 hours 3 min ago:
Are you using either CLAUDE.md or .claude/INSTRUCTIONS.md to direct
Claude about the different agents?
Also, be aware that when you add new instructions if you don't tell
claude to reread these files, it will NOT have it in its context
window until you tell it to read them OR you make a new CC session.
This was a bit frustrating for me because it was not immediately
obvious.
siquick wrote 21 hours 5 min ago:
> Skills in CC have been a bit frustrating for me. They don't trigger
reliably
Referencing them in AGENTS/CLAUDE.md has increased their usage for
me.
btbuildem wrote 21 hours 7 min ago:
> idea that agent harnesses should primarily have their functionality
dictated by plaintext commands feels like a copout
I think it's more along the lines of acknowledging the fast-paced
changes in the field, and refusing to cast into code something that's
likely to rapidly evolve in the near future.
Once things settle down into tested practices, we'll see more
"permanent" instrumentation arise.
daturkel wrote 21 hours 3 min ago:
Surely this logic doesn't apply if we're to believe that "code is
cheap" now :p
btbuildem wrote 3 hours 50 min ago:
"Code is cheap" has two interpretations here: one, that it's no
longer seen as the artisanally-crafted fine product, now it's
"manufactured". Two, though, is that it's cheaper in ops -- once
the criteria are fully discovered, once there are no more new paths
for the agents to roam, things that have been cast into code consume
minimal resources (in AI scale of things), they're doggedly
deterministic, and are free of heavy dependencies.
So yeah, I believe "it's a phase" but in a sense that it's a
development phase, just like planning or prototyping.
chickensong wrote 21 hours 7 min ago:
> sometimes it just continues to go into the next step
Use a structured workflow that loops on every task and includes a
pause for user confirmation at the end. Enforce it with a hook. I'm
not sure if you can toggle auto-accept this way, but I think the end
result is what you're asking for.
I use this with great success, sometimes toggling auto-accept on when
confidence is high that Claude can complete a step without guidance,
and toggling off when confidence is low and you want to slow down and
steer, with Claude stopping between the steps. Now that prompt
suggestions are a thing, you can just hit enter to continue with the
suggested prompt.
DarmokJalad1701 wrote 21 hours 22 min ago:
You can write skills that have an associated js/python/whatever
script.
Frannky wrote 21 hours 29 min ago:
I think unless you're doing simple tasks, skills are unreliable. For
better reliability, I have the agent trigger APIs that handle the
complex logic (and its own LLM calls) internally. Has anyone found a
solid strategy for making complex 'skills' more dependable?
triage8004 wrote 13 hours 38 min ago:
I found interrupting and insisting on the skill use the easiest
way... there have got to be better ways than this
Rebelgecko wrote 16 hours 16 min ago:
Having the skill be "call this script with these args" seems to
reduce the amount of stuff that goes wrong
selridge wrote 20 hours 5 min ago:
In my experience, all text "instruction" to the agent should be
taken on a prayer. If you write compact agent guidance that is not
contradictory and is local and useful to your project, the agent
will follow it most of the time. There is nothing that you can
write that will force the agent to follow it all of the time.
If one can accept failure to follow instructions, then the world is
open. That condition does not really comport with how we think
about machines. Nevertheless, it is the case.
Right now, a productive split is to place things that you need to
happen into tooling and harnessing, and place things that would be
nice for the agent to conceptualize into skills.
Frannky wrote 18 hours 22 min ago:
Yeah, that's my experience too
chickensong wrote 21 hours 0 min ago:
Is it that the skills aren't being triggered reliably, or that they
get triggered but the skill itself is complex and doesn't work as
expected?
Frannky wrote 20 hours 48 min ago:
both
chickensong wrote 20 hours 12 min ago:
I haven't done a lot with skills yet, but maybe try and
leverage hooks to enforce skill usage, and move most of the
skill's logic and complexity into a script so the agent only
needs to reason about how to call the script.
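Something like this is what I have in mind, as a sketch only. The script name `deploy_preview`, its flags and the returned plan are all made up. The skill's markdown would then shrink to "run this script with these arguments", and the argument parser rejects malformed calls before any real work happens.

```python
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="deploy_preview",  # hypothetical script name
        description="Deploy a service to a preview environment.",
    )
    parser.add_argument("--service", required=True, help="service to deploy")
    parser.add_argument("--env", choices=["staging", "preview"],
                        default="preview", help="target environment")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the plan without executing it")
    return parser


def run(argv: list[str]) -> dict:
    """Parse arguments and return the plan; complex logic belongs here."""
    args = build_parser().parse_args(argv)
    plan = {"service": args.service, "env": args.env, "dry_run": args.dry_run}
    # Real work (API calls, retries, validation) would go here, out of
    # reach of the model's improvisation.
    return plan


if __name__ == "__main__":
    # With no arguments, fall back to a demo invocation instead of failing.
    print(json.dumps(run(sys.argv[1:] or ["--service", "demo", "--dry-run"])))
```

The agent then only has to get one command line right, and the `choices` list catches a wrong environment name with a usage error it can read.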
Frannky wrote 16 hours 44 min ago:
I think I'll wait until they are more reliable. For now, I
use skills, but they just specify which endpoint to call. It
should also be safer: a different VPS, and no access to
credentials other than the bearer token.
plufz wrote 21 hours 12 min ago:
My only strategy is what used to be called slash-commands but are
also skills now, i.e. I call them explicitly. I think that actually
works quite well and you can allow specific tools and tell it to
use specific hooks for security or validation in the frontmatter
properties.
PantaloonFlames wrote 21 hours 48 min ago:
You can publish scripts with skills you author, right? With
carefully constructed markdown that should allow the agent to call
tools the right way.