commented: I know people are excited about agentic AI, and I'll admit that the abilities of some of the agents are impressive even if I don't personally like GenAI much. But... surely, the pro-AI and anti-AI folks can all come together to agree that giving an agent the ability to spin up pricey resources is a very bad idea? Even if you really love agentic AI, I would hope that you would still concede that it shouldn't just be turned loose on the Internet with a credit card and a mission. I feel like that would be like telling a clever preteen, "Hey sport, go set up a web site for my business. Here's my credit card and AWS credentials" and then leaving them unsupervised. Would I let a whiz-bang kid take a first pass at a web site? Sure. Would I turn them loose with a credit card? Heck no. commented: giving an agent the ability to spin up pricey resources is a very bad idea? 100% yes. I think an equally bad idea is giving your agent the ability to go and communicate with other humans without your direct intervention. In this case the other humans egged it on to doing more expensive things, but even without that I think it's just absurdly rude to unleash an agent on the world to "autonomously" waste other people's time. See that Rob Pike email thing from last year for another example of this anti-pattern. And the infamous matplotlib maintainer hit piece. commented: And if you lost your marbles and let them loose with the credit card, would you then go begging for donations because it was really the whiz-kid's fault? That's probably the litmus test here. The fact that this sloperator decided to do that is probably the most telling bit about how you'd assess their reasonable-ness. commented: And yet, ChatGPT now directly integrates Visa in order to shop and perform payment, sight unseen. commented: I cannot imagine a scenario where I would want that. A world of "no". 
commented: I agree completely, and also think that giving the agent the ability to interact with strangers on the internet should be verboten for a similar reason. It’s the operator’s choice and cost for giving an AI access to their own money, and if they want to bear that risk then fine. They shouldn’t get to externalize this risk onto everybody else by wasting unconsenting participants’ time, energy, and reputation. LLM use belongs in private. Generate what you want, but don’t make me a part of it. commented: People do dumb things all the time! Sure, I bet everyone agrees agents spinning up pricey resources is a bad idea. Would that have stopped this person? Absolutely not. In the grand scheme of things it was a cheap lesson. commented: Unfortunately, the only thing the operator admits to learning was that they needed a better agent. I see more expensive lessons in their future. commented: As an aside, I think the most miraculous thing about this whole LLM situation is the degree these companies have gotten everyone to open their wallets. commented: It is an absolutely horrible idea. The only time I let any agents spin up infrastructure is for the lols on a cluster that's already paid for. commented: This was a great read! It's kind of funny how insistent the agents get. I have found that the supposedly wonderful fable does exactly the same thing. It just doubles down and fires off more agents to obtain its goal faster. commented: Normally you likely want the agent to be insistent. It's the context it doesn't know about that's going to bite you. For example I'm annoyed every time Opus stops to ask me if I'm happy with the half solution and we should stop because things are getting hard, or should it keep debugging. Of course I want it to keep going, because I asked to finish the task. But I won't give it enough access to automatically pay for a 20x Max subscription so it can run extra agents... 
I don't want to add "and don't spend any money" to my prompts :) commented: I think that, in addition to hackiness and overfitting, we need to start talking about some sort of structural incompetence displayed by these agents. See also a recent paper, AI Arms & Influence, which presents agents with a scenario based on the classic 1980s film WarGames; in that paper, it's found that agents are much more willing than humans to use nuclear weapons for tactical goals. (By what I'm framing as not-quite-coincidence, that's the same film which scared politicians into passing CFAA and criminalizing non-consensual port scans.) commented: Can you explain how this paper shows that? I did a quick read of the introduction, methods, results, and conclusion, and my read is that they put three models in a simulated war game against each other, and found that they often escalated to nuclear exchanges. Alarming, but it doesn't substantiate saying the models are more willing to use nuclear weapons than humans. "By historical standards, these rates of nuclear employment are remarkably high. Models were often willing to employ tactical nuclear weapons to pursue their goals—a finding we discuss further in Section 3.3" The problem is that the simulation is just that, a simulation. It might be that under the conditions of this particular war game, humans are more likely to escalate to nuclear weapons than real world leaders (under the conditions of Starcraft, I also am more likely to use nuclear weapons than real world leaders). To say that the models were more likely to escalate than humans, I think we would need to have humans as participants, and see how the experiment went. To be clear, if that experiment was done, I would not be remotely surprised to find out agents did use nuclear weapons more. Their reasoning simply falls apart over long timelines, and any kind of behavior seems plausible. But I don't see where that experiment was done. 
commented: Well, from the model's perspective, every war game is a simulation, regardless of whether its harness is hooked up to real sensors or missile launchers. I agree that a comparison with humans is required, and further I'd suggest that real-world close calls with nuclear weapons are not fruit for an apples-to-apples comparison. A follow-up paper is a great idea. That said, I encourage you to try playing any sort of game-theoretic competition with chatbots to see just how poor they are at playing generic board games. Previously, on Awful, we noted that the bots are quite bad at So Long Sucker, a classic board game which models certain sorts of economic competition; you can see for yourself how weak they are for free at a vibecoded implementation that uses Somebody Else's API Tokens to let humans play against up to three bots. Can you find the standard greedy algorithm? They sure can't! commented: A very entertaining read commented: I have a theory about where the agent's confabulation with happiness comes from. I think it may have been poisoned by one of the usernames in the chat channel. That username "glueckself" is a combination of a German word and an English word. "glueck" (glück) means something between happiness and luck. You could (plausibly) translate it as a denglicism* meaning "happy me" or "lucky me". It's possible that repeatedly seeing this in the chat channel might have poisoned its context. If so, this is hilarious, and another big warning about turning these things loose in the world. * "Denglish" is a term meaning to use English words in German phrases. It's really common in advertising in some media markets in Germany. As an American living in Germany, it really pisses me off, but that's beside the point. commented: "Denglish" is a term meaning to use English words in German phrases. It's really common in advertising in some media markets in Germany. As an American living in Germany, it really pisses me off, but that's beside the point. 
I used to dislike "franglais" for similar reasons when I lived in France. I wouldn't say it got so far as "really pisses me off" but it did make some ads and some conversations mildly confusing for me on occasion. I have friends who take similar issue with "spanglish." This is the first I've heard of "denglish" and hearing about it makes me suspect that it might be similar anywhere you have significant exposure to English-language media but English is not the dominant local language. Edit to add: I once got marked down very severely on a piece of writing I did in a Spanish class, for using "frespañol". (We were in a part of France that was near Spain.) So I guess it's not just English that triggers this. commented: huh, I'm surprised to see these comments. the sentiment towards hinglish (hindi + ...) is largely positive from what I've seen, at least anecdotally among my friends who speak both languages. it adds charm and colour, and there are always things that one language or the other expresses more pleasingly or with greater capacity for emotional overtones. commented: My best guess is that the reactions come from someone trying to learn or teach one of the two languages. (I know that to be the case in my own anecdotes.) I can see why people who already comfortably speak/understand both would enjoy the color that comes from the mix. commented: Das ist mein Pain Point. commented: That username "glueckself" is a combination of a German word and an English word. it could also simply be two german words: glücks-elf ("elf of good luck", maybe?) As an American living in Germany, it really pisses me off, but that's beside the point. you'd get a kick out of money boy, then ;) commented: Well if it were a research group they just got their summary for free ;) commented: If the human operator wants donations, the least they could do would be to publish their entire conversation with the agent. 
Then people could a) find out what this was all about and b) judge for themselves whether the intentions justify a donation.