Hacker News
on Gopher (unofficial)
COMMENT PAGE FOR:
HTML /architect: Reduce Fable tokens by 80%, Fable orchestrates/reviews, Codex builds
hmokiguess wrote 10 hours 20 min ago:
I guess that didn't age well
Teknomadix wrote 11 hours 58 min ago:
US Govt reduces Fable Tokens by 100%.
Retr0id wrote 12 hours 6 min ago:
> freezes the gates
LLM-written readmes love to use inscrutable jargon that means nothing
outside of the context window that birthed it.
nostrebored wrote 9 hours 39 min ago:
LLMs are obsessed with "gates". Freezing the gates here is
intuitive to me at this point: don't let validation drift.
Retr0id wrote 3 hours 49 min ago:
"drift" is another one!
corvad wrote 12 hours 47 min ago:
Who's gonna tell them...
DanMcInerney wrote 12 hours 59 min ago:
ANNNNNND it's gone. Guys, I found a way to reduce Fable token usage
100%. You can find it here: github.com/USGov/idiotic-overreach.
cohix wrote 13 hours 12 min ago:
I do exactly this with awman workflows: [1] You can use any agent
and/or model for each step and share context between them.
HTML [1]: https://github.com/prettysmartdev/awman/blob/main/docs/05-work...
analogpixel wrote 13 hours 33 min ago:
I know how to reduce Fable tokens by 100% ;
HTML [1]: https://www.anthropic.com/news/fable-mythos-access
testfrequency wrote 10 hours 6 min ago:
I ran this and seem to have good results with a 100% reduction also:
curl -fsSL [1] | sh
HTML [1]: https://chatgpt.com/codex/install.sh
rockwotj wrote 14 hours 19 min ago:
I actually just started doing this by having Fable roleplay as Jeff
Dean and use Codex as Sanjay driving the implementation, and having
them go back and forth. Works really well and it's cool to see AI
pair program.
avaer wrote 14 hours 42 min ago:
Reducing token usage is this year's "one weird trick". It doesn't make
sense on the face of it.
Even if one discovered something that millions (billions?) of dollars
of AI compute and the best statisticians in the world were not able to
find via exhaustive research, domain search and training... what do you
think are the chances this won't be folded into the next update of
every model, making the rigmarole moot?
Extraordinary claims require extraordinary evidence and
technology-shattering innovations in AI are not known to come from a
markdown.
apsurd wrote 14 hours 18 min ago:
incentives aren't aligned
aetherspawn wrote 14 hours 49 min ago:
Fool me once. Fool me twice. Fool me thirty three times and here we are
trying lucky number 34.
diavelguru wrote 14 hours 53 min ago:
yes I'm using Fable to inspect, generate plan and architectural docs
then using Gemini to implement then have Fable review, find bugs.
saving lots of usage.
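As a rough illustration of the split described above (one model plans and reviews, another implements), here is a minimal sketch. The function `plan_build_review` and its `planner`, `builder` and `reviewer` callables are hypothetical stand-ins for whatever model calls are actually used; none of them is a real API, and the round limit is an assumption.

```python
from typing import Callable, List, Tuple


def plan_build_review(task: str,
                      planner: Callable[[str], str],
                      builder: Callable[[str], str],
                      reviewer: Callable[[str, str], List[str]],
                      max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Plan once with the expensive model, then loop build -> review.

    Returns the last build output and any review findings still open.
    All three callables are hypothetical placeholders for model calls.
    """
    plan = planner(task)
    output: str = ""
    findings: List[str] = []
    for _ in range(max_rounds):
        # Feed open findings back to the builder on later rounds.
        prompt = plan if not findings else plan + "\nFix: " + "; ".join(findings)
        output = builder(prompt)
        findings = reviewer(plan, output)
        if not findings:
            break
    return output, findings
```

The expensive model is called once for the plan and once per review round, while the bulk of the generation goes to the builder, which is where the usage saving would come from.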
Denvercoder9 wrote 14 hours 57 min ago:
DESIGN.md:
> Each rule below is enforced mechanically by the skill, not left to
vibes.
> R1. Repo docs are the memory; not in HANDOFF.md = didn't happen
SKILL.md:
> Not in docs/HANDOFF.md = didn't happen. Refuse to judge results that
exist only in conversation or builder chat output.
"Mechanical enforcement" just means "prompting the LLM a bit extra"
these days? It (still) amazes me how much effort and tokens we expend
on what could and should be a two line script...
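For illustration, the kind of small script the comment has in mind might look like the following. It is only a sketch: the `docs/HANDOFF.md` path comes from the quoted SKILL.md, while the helper name `is_recorded` and the idea of matching a task ID as a plain substring are assumptions, not the project's actual behaviour.

```python
from pathlib import Path


def is_recorded(task_id: str, handoff: Path = Path("docs/HANDOFF.md")) -> bool:
    """Return True only if task_id appears in the handoff file.

    Hypothetical helper: a missing file counts as "didn't happen".
    """
    if not handoff.is_file():
        return False
    return task_id in handoff.read_text(encoding="utf-8")
```

A caller could refuse to review any result for which `is_recorded` returns False, with no prompting involved.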
everforward wrote 13 hours 54 min ago:
Agents are in a wacky state, which makes projects like this fall into
a weird spot. Eg I vaguely expect my agent to do two disparate
things: manage dependency injection for tools, prompt modifications,
etc, but also be the sort of "brain trust" that controls the flow
of execution (can we stop now, do we keep going, etc).
This project is meant to be the latter, but there's not a clean way
to integrate that into Claude Code or Codex because they expect to do
both.
Pi can do it, but then your users can't use their Claude
subscriptions, so you have to kludgily try to do the same thing via
LLM prompts.
nostrebored wrote 9 hours 40 min ago:
But why does your agent control doneness? It seems to me the most
odd part to delegate. All LLMs are terrible at it. Most LLM tasks
can be expressed as a DAG or DAG of DAGs. Why delegate that to a
random point in context instead of enforcing the flow?
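A minimal sketch of what enforcing the flow could mean, assuming Python 3.9+ for the standard-library `graphlib`. The step names and the `run_dag` helper are hypothetical; in a real setup each step might wrap an agent call, but the order and the stopping point come from the graph rather than from the model.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(steps: Dict[str, Callable[[], None]],
            deps: Dict[str, Set[str]]) -> List[str]:
    """Run every step exactly once, in dependency order.

    deps maps each step to the set of steps that must finish first.
    The run is done when the sorter has no steps left, so "doneness"
    is decided by the graph and not by a model.
    """
    order = list(TopologicalSorter(deps).static_order())
    executed: List[str] = []
    for name in order:
        steps[name]()  # hypothetical: this could call an agent
        executed.append(name)
    return executed


# Hypothetical three-step flow: plan -> build -> review.
log: List[str] = []
steps = {name: (lambda n=name: log.append(n)) for name in ("plan", "build", "review")}
deps = {"plan": set(), "build": {"plan"}, "review": {"build"}}
print(run_dag(steps, deps))  # prints ['plan', 'build', 'review']
```

A "DAG of DAGs" would follow the same pattern, with a step whose callable runs a nested `run_dag`.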
Uptrenda wrote 15 hours 27 min ago:
Reduce fable token usage even more by not using it. What a clever idea,
op! Wow.
felixgallo wrote 15 hours 35 min ago:
Fable will do this itself, by spawning Opus/Sonnet subagents to do easy
work.
apsurd wrote 15 hours 15 min ago:
/advisor has been a really good experience for me, especially with
having only a Pro plan.
I exclusively use sonnet and advisor is basically "hey opus chime
in on my approach". been working great as far as i can tell.
RazerWazer wrote 15 hours 33 min ago:
GPT 5.5 xhigh is better than Opus and Sonnet.
sosodev wrote 15 hours 14 min ago:
I don't know why you're getting downvoted. It's true.
Averaged across a wide variety of benchmarks Fable is the only
Anthropic model that performs better than GPT 5.5 xhigh.
Eridrus wrote 15 hours 1 min ago:
The problem is that there are a bunch of benchmarks, the model
providers often don't even use the same benchmarks, a bunch of
them have known problems, and it's expensive to do your own
benchmarks.
I am a GPT 5.x booster since to me it just feels smarter, and I
generally felt like the benchmarks backed me up, but it's not
every benchmark, so sadly we're mostly arguing about vibes.
SWEBench-Pro was a big one, though apparently Claude was reading
solutions out of the .git folder it wasn't meant to have access
to among other problems.
smoe wrote 14 hours 49 min ago:
I find it fascinating that every time this kind of discussion
comes up, people talk about night and day experiences between
Claude and Codex, in both directions. I'm really wondering
what people are doing to get such different outcomes.
I'm currently working on two projects/clients, one using
Claude, one using Codex. I have a strong preference for the
latter, but not because I think it is much more intelligent or
writes much better code. It is simply because I find the way of
interacting with it more pleasant: more literal, mechanical,
makes fewer assumptions and/or double checks, and is less
proactive in my experience. At least until some updates over
the last few weeks.
AlphaSite wrote 12 hours 4 min ago:
It probably means they're close enough that there's no
observable difference. Or better at very different things.
Eridrus wrote 13 hours 12 min ago:
I think I like Codex for the same reason tbh. I think it's
just general misanthropy or autism or something lol. Most
people seem to prefer Claude.
For me, I think Codex was visibly smarter than Claude until
4.8 came out, it would regularly do better debugging and IMO
write better code. 4.8 I think is close.
I think Claude is widely regarded to have a big lead in
front-end, which I do not work on.
Claude's Ultrathink is pretty cool, though it eats up tokens
like nothing else obviously.
timcobb wrote 15 hours 28 min ago:
Not in my subjective experience sadly
mpalmer wrote 15 hours 37 min ago:
Reduce Fable tokens by 80%, simply by not using it!
> I am fairly convinced this is the shape serious agent work keeps
converging toward.
"this" being "plan with expensive model, implement with cheap model".
Anyone who follows HN would be hard-pressed to disagree; this
architecture is re-invented twice monthly. [1] [2] [3]
> Not because it is aesthetically pleasing. Because every other shape
eventually runs into the same boring failures: context rot,
self-grading, goalpost drift, and merge chaos.
Actual failure isn't boring. But struggling through a generated
software project that celebrates its own genius and doesn't have a
single self-critical or genuinely reflective thing to say...at least
watching paint dry I might get giddy off the fumes.
I'm not interested in critiquing the project itself, either, you'll
just run that through a model, too.
HTML [1]: https://www.facebook.com/groups/vibecodinglife/posts/194620756...
HTML [2]: https://github.com/openai/codex/discussions/10628
HTML [3]: https://build5nines.com/stop-burning-premium-requests-how-to-c...
DanMcInerney wrote 14 hours 45 min ago:
I don't disagree with any of this. It is generated software, and it's
not a novel idea. I didn't mean for it to come off like that. It's
just solving an itch that I couldn't find a solution to and I'm
getting a lot of personal utility out of it. I do have a lot of
experience with agentic memory, multi-agent systems and harnesses and
wasn't super impressed by the workflow of Fable calling opus
subagents so I figured I'd apply best practices to what already
exists to make it a teensy bit better and easier to use.
seaal wrote 15 hours 8 min ago:
> [1]
wow linking a facebook groups post might actually be worse than
x, is there an xcancel alternative for facebook?
HTML [1]: https://www.facebook.com/groups/vibecodinglife/posts/1946207...
colechristensen wrote 15 hours 37 min ago:
Last night I switched back to Codex for a minute having burned through
my tokens for the week with Fable and oh boy I had a terrible
experience. Running in circles over simple problems (which I ended up
solving myself, like a peasant) and running "terraform apply" several
times despite several instructions all over the place to never do that.
The performance difference was stark.
nsingh2 wrote 15 hours 20 min ago:
Could you provide some details, if possible, like what model &
thinking effort, what kinds of tasks? I used to swap between Claude
Code and Codex often, and these days use Codex more because of the
usage limits. Wondering if I should go to Claude for a month, I get a
strange FOMO when I read vague comments like this.
The one major difference I noticed is that the GPT models are more
analytical (e.g. better at mathematical analysis, code review) vs
Claude models tend to write more straightforward code. Besides that
I don't really see any significant differences.
There are a few gotchas with swapping, like being careful with
AGENTS.md/CLAUDE.md naming (Claude Code only recognizes CLAUDE.md,
and I think Codex only works with AGENTS.md), and updating skill
files to match the tool.
colechristensen wrote 15 hours 1 min ago:
I just symlink AGENTS.md and CLAUDE.md
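A small sketch of that symlink setup, for anyone who wants it scripted. Which file is the real one is an assumption here (`AGENTS.md` as the source, `CLAUDE.md` as the alias), and `link_agent_docs` is a hypothetical helper name. On Windows, creating symlinks may need extra privileges.

```python
import os
from pathlib import Path


def link_agent_docs(repo: Path, source: str = "AGENTS.md",
                    alias: str = "CLAUDE.md") -> Path:
    """Make `alias` a relative symlink to `source` inside `repo`.

    An existing alias (file or link) is left untouched, so nothing
    is overwritten.
    """
    target = repo / alias
    if not target.exists() and not target.is_symlink():
        # Relative link, so it keeps working if the repo is moved.
        os.symlink(source, target)
    return target
```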
I was using gpt-5.5 high. Writing terraform code for GCP, debugging
app launch and Dockerfile issues, that sort of thing. It was going
in loops hallucinating features of GCP, looking things up in
strange ways, running terraform apply after being explicitly told
in the last interaction not to, and overall not solving problems.
These were very straightforward tasks and it couldn't be trusted
for five minutes. It's the difference in what I would trust an
early senior engineer to do vs what I would trust an unreliable
high school intern to do.
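Since written instructions did not stop the `terraform apply` runs described above, one option is a mechanical check in whatever wrapper executes the agent's shell commands. The sketch below assumes such a wrapper exists and is not tied to any particular tool's hook mechanism. The deny-list and the `is_blocked` helper are hypothetical, and the matching is deliberately crude: it flags the blocked word pair anywhere in the command.

```python
import shlex
from typing import Sequence, Tuple

# Assumed deny-list for illustration; extend to suit the project.
BLOCKED: Tuple[Tuple[str, ...], ...] = (
    ("terraform", "apply"),
    ("terraform", "destroy"),
)


def is_blocked(command: str,
               blocked: Sequence[Sequence[str]] = BLOCKED) -> bool:
    """Return True if any blocked word sequence occurs in the command.

    Words starting with "-" are dropped first, so a global option such
    as -chdir=... cannot hide the subcommand.
    """
    words = [w for w in shlex.split(command) if not w.startswith("-")]
    for pattern in blocked:
        n = len(pattern)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == tuple(pattern):
                return True
    return False
```

This errs on the side of false positives (for example `echo terraform apply` is flagged too) and would not catch a command hidden inside a script file, so it is a guard rail and not a sandbox.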
malshe wrote 15 hours 25 min ago:
I had a similar experience. So far Fable has been a game changer, at
least for the work I used it for. Having said that, I think its
writing is definitely worse than GPT 5.5. Ethan Mollick also observed
the same. He called it more "Claudy." It generates worse academic
prose than other frontier models.
colechristensen wrote 10 hours 19 min ago:
I think the claude code harness made up a significant part of the
improvements co-released with Fable, the nested agent capabilities
seem to be much better even with opus (which I guess we're stuck
with for a while).