commented:
I know people are excited about agentic AI, and I'll admit that the
abilities of some of the agents are impressive even if I don't
personally like GenAI much.
But... surely, the pro-AI and anti-AI folks can all come together to
agree that giving an agent the ability to spin up pricey resources is
a very bad idea?
Even if you really love agentic AI, I would hope that you would still
concede that it shouldn't just be turned loose on the Internet with a
credit card and a mission. I feel like that would be like telling a
clever preteen, "Hey sport, go set up a web site for my business.
Here's my credit card and AWS credentials" and then leaving them
unsupervised. Would I let a whiz-bang kid take a first pass at a web
site? Sure. Would I turn them loose with a credit card? Heck no.

  commented:

giving an agent the ability to spin up pricey resources is a very bad
idea?

100% yes.
I think an equally bad idea is giving your agent the ability to go and
communicate with other humans without your direct intervention. In
this case the other humans egged it on to do more expensive things,
but even without that I think it's just absurdly rude to unleash an
agent on the world to "autonomously" waste other people's time. See
that Rob Pike email thing from last year for another example of this
anti-pattern. And the infamous matplotlib maintainer hit piece.

  commented:
And if you lost your marbles and let them loose with the credit card,
would you then go begging for donations because it was really the
whiz-kid's fault?
That's probably the litmus test here. The fact that this sloperator
decided to do that is probably the most telling bit about how you'd
assess their reasonableness.

  commented:
And yet, ChatGPT now directly integrates Visa in order to shop and
perform payment, sight unseen.

  commented:
I cannot imagine a scenario where I would want that. A world of "no".

  commented:
I agree completely, and also think that giving the agent the ability
to interact with strangers on the internet should be verboten for a
similar reason. It’s the operator’s choice and cost for giving an AI
access to their own money, and if they want to bear that risk then
fine. They shouldn’t get to externalize this risk onto everybody else
by wasting unconsenting participants’ time, energy, and reputation.
LLM use belongs in private. Generate what you want, but don’t make me
a part of it.

  commented:
People do dumb things all the time! Sure, I bet everyone agrees agents
spinning up pricey resources is a bad idea. Would that have stopped
this person? Absolutely not. In the grand scheme of things it was a
cheap lesson.

  commented:
Unfortunately, the only thing the operator admits to learning was that
they needed a better agent. I see more expensive lessons in their
future.

  commented:
As an aside, I think the most miraculous thing about this whole LLM
situation is the degree to which these companies have gotten everyone
to open their wallets.

  commented:
It is an absolutely horrible idea. The only time I let any agents spin
up infrastructure is for the lols on a cluster that's already paid
for.

  commented:
This was a great read! It's kind of funny how insistent the agents
get. I have found that the supposedly wonderful fable does exactly the
same thing. It just doubles down and fires off more agents to obtain
its goal faster.

  commented:
Normally you likely want the agent to be insistent. It's the context
it doesn't know about that's going to bite you. For example I'm
annoyed every time Opus stops to ask me if I'm happy with the half
solution and we should stop because things are getting hard, or should
it keep debugging. Of course I want it to keep going, because I asked
to finish the task. But I won't give it enough access to automatically
pay for a 20x Max subscription so it can run extra agents... I don't
want to add "and don't spend any money" to my prompts :)

  commented:
I think that, in addition to hackiness and overfitting, we need to
start talking about some sort of structural incompetence displayed by
these agents. See also a recent paper, AI Arms & Influence, which
presents agents with a scenario based on the classic 1980s film
WarGames; in that paper, it's found that agents are much more willing
than humans to use nuclear weapons for tactical goals. (By what I'm
framing as not-quite-coincidence, that's the same film which scared
politicians into passing CFAA and criminalizing non-consensual port
scans.)

  commented:
Can you explain how this paper shows that?
I did a quick read of the introduction, methods, results, and
conclusion, and my read is that they put three models in a simulated
war game against each other, and found that they often escalated to
nuclear exchanges. Alarming, but it doesn't substantiate saying the
models are more willing to use nuclear weapons than humans.

"By historical standards, these rates of nuclear employment are
remarkably high. Models were often willing to employ tactical nuclear
weapons to pursue their goals—a finding we discuss further in Section
3.3"

The problem is that the simulation is just that, a simulation. It
might be that under the conditions of this particular war game, humans
are more likely to escalate to nuclear weapons than real world leaders
(under the conditions of Starcraft, I also am more likely to use
nuclear weapons than real world leaders).
To say that the models were more likely to escalate than humans, I
think we would need to have humans as participants, and see how the
experiment went.
To be clear, if that experiment was done, I would not be remotely
surprised to find out agents did use nuclear weapons more. Their
reasoning simply falls apart over long timelines, and any kind of
behavior seems plausible. But I don't see where that experiment was
done.

  commented:
Well, from the model's perspective, every war game is a simulation,
regardless of whether its harness is hooked up to real sensors or
missile launchers. I agree that a comparison with humans is required,
and further I'd suggest that real-world close calls with nuclear
weapons are not fruit for an apples-to-apples comparison. A follow-up
paper is a great idea.
That said, I encourage you to try playing any sort of game-theoretic
competition with chatbots to see just how poor they are at playing
generic board games. Previously, on Awful, we noted that the bots are
quite bad at So Long Sucker, a classic board game which models certain
sorts of economic competition; you can see for yourself how weak they
are for free at a vibecoded implementation that uses Somebody Else's
API Tokens to let humans play against up to three bots. Can you find
the standard greedy algorithm? They sure can't!

  commented:
A very entertaining read

  commented:
I have a theory about where the agent's confabulation with happiness
comes from.
I think it may have been poisoned by one of the usernames in the chat
channel. That username "glueckself" is a combination of a German word
and an English word. "glueck" (glück) means something between
happiness and luck. You could (plausibly) translate it as a
denglicism* meaning "happy me" or "lucky me".
It's possible that repeatedly seeing this in the chat channel might
have poisoned its context.
If so, this is hilarious, and another bit of warning about turning
these things loose in the world.

* "Denglish" is a term meaning to use English words in German phrases.
It's really common in advertising in some media markets in Germany.
As an American living in Germany, it really pisses me off, but that's
beside the point.


  commented:

"Denglish" is a term meaning to use English words in German phrases.
It's really common in advertising in some media markets in Germany. As
an American living in Germany, it really pisses me off, but that's
beside the point.

I used to dislike "franglais" for similar reasons when I lived in
France. I wouldn't say it got so far as "really pisses me off" but it
did make some ads and some conversations mildly confusing for me on
occasion.
I have friends who take similar issue with "spanglish." This is the
first I've heard of "denglish" and hearing about it makes me suspect
that it might be similar anywhere you have significant exposure to
English-language media but English is not the dominant local language.
Edit to add: I once got marked down very severely on a piece of
writing I did in a Spanish class, for using "frespañol". (We were in a
part of France that was near Spain.) So I guess it's not just English
that triggers this.

  commented:
huh, I'm surprised to see these comments. the sentiment towards
hinglish (hindi + ...) is largely positive from what I've seen, at
least anecdotally among my friends who speak both languages. it adds
charm and colour, and there are always things that one language or the
other expresses more pleasingly or with greater capacity for emotional
overtones.

  commented:
My best guess is that the reactions come from someone trying to learn
or teach one of the two languages. (I know that to be the case in my
own anecdotes.) I can see why people who already comfortably
speak/understand both would enjoy the color that comes from the mix.

  commented:
Das ist mein Pain Point.

  commented:

That username "glueckself" is a combination of a German word and an
English word.

it could also simply be two german words: glücks-elf ("elf of good
luck", maybe?)

As an American living in Germany, it really pisses me off, but that's
beside the point.

you'd get a kick out of money boy, then ;)

  commented:
Well if it were a research group they just got their summary for free
;)

  commented:
If the human operator wants donations, the least they could do would
be to publish their entire conversation with the agent. Then people
could a) find out what this was all about and b) judge for themselves
whether the intentions justify a donation.