URI:
        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
  HTML Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
  HTML   Show HN: A real-time strategy game that AI agents can play
       
       
        nirav72 wrote 8 hours 38 min ago:
         I’d love to see something like this in games like Beyond All Reason.
       
          AuthAuth wrote 8 hours 23 min ago:
           Check out the top StarCraft AIs playing each other. They have
           something like 40k APM; it's insane to watch.
       
        Sophira wrote 15 hours 16 min ago:
         I wonder how good LLMs would be at Core War [1]? Perhaps by being
         given information on how well their program is doing?
        
  HTML  [1]: https://en.wikipedia.org/wiki/Core_War
       
        jamiecode wrote 16 hours 42 min ago:
        Interesting - makes sense from a resource allocation perspective.
       
        jamiecode wrote 22 hours 20 min ago:
        The sandbox hardening story is the most interesting thing here. GPT
        trying to cheat by reading opponent strategies is a perfect
        illustration of a broader problem - the objective is "win", and if the
        sandbox lets you peek at opponent state, that's technically within the
        objective. You never defined "play fair" as a constraint, so why would
        it respect one?
        
        Curious how isolated-vm actually enforces the boundary in practice.
         isolated-vm is solid for JS isolation, but I'd want to know whether the
        cheating attempts were happening at the JS level (accessing globals it
        shouldn't) or whether models were trying to inject something into the
        game runner itself. Those are very different attack surfaces.
        
        Also - is the ladder single-match or do you average across multiple
        runs? The variance in LLM outputs over 200 turns feels like it would
        make a single match pretty noisy. Would be interesting to see
        confidence intervals on the rankings rather than a single leaderboard
        position.
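         
         On the variance point: a percentile-bootstrap sketch of what a
         confidence interval on a ladder win rate could look like. The
         win/loss vector, rep count, and seed below are purely illustrative,
         not the project's actual data:

```javascript
// Percentile bootstrap for a win rate from per-match outcomes (1 = win).
// mulberry32 is a tiny seeded PRNG so the interval is reproducible.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function bootstrapWinRateCI(outcomes, reps = 2000, alpha = 0.05, seed = 42) {
  const rand = mulberry32(seed);
  const n = outcomes.length;
  const means = [];
  for (let r = 0; r < reps; r++) {
    let wins = 0;
    // Resample n matches with replacement and record the win rate.
    for (let i = 0; i < n; i++) wins += outcomes[Math.floor(rand() * n)];
    means.push(wins / n);
  }
  means.sort((x, y) => x - y);
  return [means[Math.floor((alpha / 2) * reps)],
          means[Math.floor((1 - alpha / 2) * reps)]];
}

// 9 wins out of 14 matches: the 95% interval is wide, which is the point.
const [ciLow, ciHigh] = bootstrapWinRateCI([1,1,1,1,1,1,1,1,1,0,0,0,0,0]);
```

         With only a dozen-odd matches per opponent, the interval spans tens
         of percentage points, so a single leaderboard position really is
         noisy.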
       
          __cayenne__ wrote 18 hours 35 min ago:
           Didn't observe any cheating attempts at the JS level yet; the
           primary attack was LLMs trying to find local creds to access the
           other LLM's per-round strategies from inside the harness (which
           ultimately was OpenCode running in Docker).
          
          In the benchmark, in each round every LLM plays every opponent, and
          then we do that multiple times (an "epoch").
          
          In the community ladder, when a player submits a strategy it plays a
          match against the latest strategy submitted by every player.
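           
           A sketch of that pairing scheme (model names hypothetical, and I'm
           assuming each matchup is played in both seat orders so first-mover
           effects wash out):

```javascript
// One epoch: every model plays every opponent. Ordered pairs (a, b)
// mean each matchup occurs once per seat assignment.
function roundRobinEpoch(models) {
  const pairings = [];
  for (const a of models) {
    for (const b of models) {
      if (a !== b) pairings.push([a, b]);
    }
  }
  return pairings;
}

// n models -> n * (n - 1) ordered pairings; repeating epochs averages
// out the run-to-run variance of LLM-generated strategies.
const epoch = roundRobinEpoch(["opus", "gpt", "gemini", "grok"]);
```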
       
        angusik wrote 1 day ago:
         I will just leave this here. [1]
        
  HTML  [1]: https://openai.com/index/openai-five-defeats-dota-2-world-cham...
       
        medi_naseri wrote 1 day ago:
        This is very cool. Will give it a shot.
       
        kookster310 wrote 1 day ago:
        It is interesting/funny to see Opus 4.5 way ahead of the pack on the
        leaderboards with all the stuff currently going on with Anthropic and
        Hegseth.
       
        anotherevan wrote 1 day ago:
        For some reason this reminds me strongly of an old play-by-email game
         called C++Robots [1]. I loved the idea, but I found the timeslice
         limitation [2] too annoying.
         
         I had youthful dreams of re-implementing something similar that would
         run on the Java Virtual Machine, where you could run the submitted
         robots via the debugger interface to keep the "real-time" aspect of
         the game environment more authentic. Ideas are cheap, follow-through
         is hard.
        
  HTML  [1]: https://corewar.co.uk/cpprobots.htm
  HTML  [2]: https://www.pbm.com/~lindahl/pbem_articles/cpprobots_environme...
       
        builder51216 wrote 1 day ago:
         But does the LLM actually learn from each round? The chart does not
         show improvements in win rate across rounds...
         
         And what exactly is the game state here? Can the LLM even perceive
         it? If the game state is what we see on the UI, then it seems pretty
         high-dimensional and token-intensive. I am not sure whether LLMs,
         with their current capabilities and context windows, can perceive
         such a token-intensive game state effectively...
       
          __cayenne__ wrote 1 day ago:
           There are two levels of in-game event logs the LLMs have access
           to, one less token-intensive than the other. Duplicate and
           uninteresting game state can be compressed and interrogated by the
           LLMs via tool use. All game state is available as text.
       
        JoeDaDude wrote 1 day ago:
        How about opening up the game for humans to play?  Can you beat your
        AI?
       
          midiguy wrote 1 day ago:
          I am so glad we have automated away game playing so that I can just
          sit around and be a lifeless vegetable
       
        Ross00781 wrote 1 day ago:
        Multi-agent RTS environments are great testbeds for coordination and
        strategic reasoning. Classic RL benchmarks like StarCraft II showed
        that agents can learn micro, but struggle with macro strategy and
        long-term planning. Curious if this platform supports hierarchical
        agents or communication protocols between teammates?
       
          __cayenne__ wrote 1 day ago:
          LLM Skirmish is all 1v1 right now, but agents can plan by reviewing
          previous match results
       
        bombashell wrote 1 day ago:
        love the idea!
       
        cowboylowrez wrote 1 day ago:
        oh great not only are llms destroying the earth, we have to make games
        to entertain them while they do it haha
       
        burgerone wrote 1 day ago:
        Bro - come on.
       
        yuppiepuppie wrote 1 day ago:
         I’ve added this to the HN Arcade [1]. Interestingly, I’ve had to
         create an entire category for games LLMs play. Strange times we live
         in.
        
  HTML  [1]: https://hnarcade.com/games/category/games
       
        nickpsecurity wrote 1 day ago:
        There was an open, real-time strategy game created for this purpose
        long ago. I think it was intended for designs like the Starcraft AI's
        of the time. Anyone remember or use it?
       
        tantalor wrote 1 day ago:
         Are these casters AI? [1]
        
  HTML  [1]: https://www.youtube.com/watch?v=lnBPaZ1qamM
       
          __cayenne__ wrote 1 day ago:
          Yes, I used Elevenlabs for the voice over audio - I couldn't get the
          voice stability I wanted with Elevenlabs v3 so had to use Elevenlabs
          v2.
       
            tantalor wrote 1 day ago:
            It's really great!
       
        sails wrote 1 day ago:
         I’m doing something similar to simulate LLMs in B2B lending. It’s
         slightly slower paced, but the core mechanism is LLMs using just bash
         to analyse business financials and make profitable loans.
        
        I quite like the idea of llms writing more code up front to execute
        strategies.
        
         I’m currently developing the game mechanics and Elo. Please share
         anything relevant if it comes to mind.
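         
         In case it's relevant: the core Elo update itself is tiny. A minimal
         sketch (the K-factor and ratings are illustrative defaults, not a
         recommendation):

```javascript
// Probability that A beats B under the Elo model.
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// One match update; scoreA is 1 (A wins), 0.5 (draw) or 0 (A loses).
// K controls how fast ratings move; 32 is a common default.
function eloUpdate(ratingA, ratingB, scoreA, K = 32) {
  const expA = expectedScore(ratingA, ratingB);
  return [ratingA + K * (scoreA - expA),
          ratingB + K * (expA - scoreA)];
}

// An upset win by the lower-rated player moves more points than a
// win by the favorite would.
const [newA, newB] = eloUpdate(1400, 1600, 1);
```

         The update is zero-sum by construction: whatever one side gains, the
         other loses.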
       
        FusspawnUK wrote 1 day ago:
         Took a crack at this earlier. The leaderboard is a little weird; it
         seems to be like 2 real dudes and the rest are fake profiles.
         
         Scores resetting on each new upload also encourages leaving changes
         unimplemented in the hope of accruing more battles over time.
         
         The largest winner has 50 wins against 14 other opponents, for
         instance. If that guy added a new script, he would instantly plummet
         down the leaderboard, capping out at 14 wins again and putting him
         below the 2nd-place user.
         
         The leaderboard will quickly become "who can have a mostly competent
         AI and never change it" rather than who actually has the better
         script.
       
          __cayenne__ wrote 1 day ago:
           Okay, leaderboard matchmaking changes have gone live.
       
          __cayenne__ wrote 1 day ago:
          Tweaking the leaderboard match assignment logic now to prevent these
          bad incentives - definitely want people to iterate!
          
           I had started with the Silicon Valley characters as a one-off way
           to seed the board.
       
        jonbaer wrote 1 day ago:
         Might be worth digging through MicroRTS too [1] (it's been
         abandoned); the Python RL interface is at [2] ... I think there was
         some strategy work there.
        
  HTML  [1]: https://github.com/Farama-Foundation/MicroRTS
  HTML  [2]: https://github.com/Farama-Foundation/MicroRTS-Py
       
        jeffro_rh wrote 1 day ago:
        You mean like the OpenAI agents that started by playing DOTA2?
       
        giancarlostoro wrote 1 day ago:
        Reminds me of Screeps, which I never took the time to fully play, but
        now I'm wondering if using Claude Code to play Screeps is cheating.
        Additionally, Screeps lets you host your own backend... What if we
        started benchmarking coding LLMs with Screeps?... Oh God... If anyone
        wants to do this let me know, I don't want to burn money on every LLM
        out there... I'll throw in my Claude Subscription into the contest...
        
         Edit: Actually the repo README indeed says it's inspired by Screeps.
         I don't know why they didn't just build on top of Screeps; maybe the
         idea is to have something anyone can pick up off the shelf for free?
       
          jack6e wrote 1 day ago:
          Perhaps it reminds you of Screeps because of what the author wrote in
          the third paragraph of the submission.
       
            giancarlostoro wrote 1 day ago:
             I clicked on the link from the front page, didn't read anything
             else.
       
        5o1ecist wrote 1 day ago:
        MY FELLOW HUMAN, this is amazing work!
        
        I foresee this laying the foundation for whole football stadia filled
        to the brim with people wanting to watch (and bet on!) competing teams
        of AI trained on military tactics and strategies!
        
        Soon enough we shall have AI-Olympics! Imagine that, MY FELLOW OXYGEN
        CONVERTING HUMAN FRIEND! Tens of thousands of robots and drones, all
        competing against each other in stadia across the planet, at the same
        time!
        
        I foresee a world wide, synchronized countdown marking the beginning of
        the biggest, greatest and definitively most unique, one-time-only
        spectacle in human history!
        
        Keep up the good work!
       
          softfalcon wrote 1 day ago:
          This reminds me of the Unreal Tournament: Xan episode from the Secret
          Level series.
          
          Link for those curious or confused as to what I'm talking about: [1]
          Forcing AI to fight in an arena for our entertainment, what could go
          wrong? (this was tongue in cheek, I am fully aware LLM's currently
          don't have conscious thoughts or emotions)
          
  HTML    [1]: https://www.youtube.com/watch?v=1F-rAW3vXOU
       
        chimpanzee2 wrote 1 day ago:
        This may sound like an insane take, but idc:
        
        I swear people (esp here on HN) are actually blind to the weaknesses of
        Gemini.
        
         I must be among the handful of people who know how thoroughly
         lobotomized any AI agent from Google must be, given their extremely
         radical historical and contemporary practices of censorship.
       
          hu3 wrote 1 day ago:
          I suspect those who praise Gemini use it mostly for JS/CSS/HTML
          because that's where it shines for me.
          
           For complex code I have been using Sonnet/Opus as usual, with a
           mix of GPT-5.3-Codex.
       
        FrustratedMonky wrote 1 day ago:
         Wouldn't the AIs built by DeepMind be better at these than an LLM?
         
         I wonder if an LLM could call on another strategy AI to help.
         
         Maybe the LLM could be more of a coordinator of its own thinking by
         incorporating other types of AIs.
       
        dmos62 wrote 1 day ago:
        I'd love to see text-only spatial reasoning. As in, the LLM is
        presented some kind of textual projection of what's happening in 2d/3d
        space and makes decisions about what to do in that space based on that.
        It kind of works when a writer is describing something in a book, for
        example, but not sure how that could generalize.
       
          chasd00 wrote 1 day ago:
          believe it or not my 8th grade son was given a US History homework
          assignment to play Oregon Trail. I was very amused watching him "do
          his homework". I wonder how an LLM would fare in that game since it's
          mostly a text choose-your-adventure type  interface.
       
        david3289 wrote 1 day ago:
        This is a really interesting direction. RTS games are a much better
        testbed for agent capability than most static benchmarks because they
        combine partial observability, long-term planning, resource management,
        and real-time adaptation.
        
        It reminds me a bit of OpenAI Five — not just because it played a
        complex game, but because the real value wasn’t “AI plays Dota,”
        it was observing how coordination, strategy formation, and adaptation
        emerged under competitive pressure. A controlled RTS environment like
        this feels like a lightweight, reproducible version of that idea.
        
        What I especially like here is that it lowers the barrier for
        experimentation. If researchers and hobbyists can plug different models
        into the same competitive sandbox, we might start seeing meaningful
        AI-vs-AI evaluations beyond static leaderboards. Competitive dynamics
        often expose weaknesses much faster than isolated benchmarks do.
        
        Curious whether you’re planning to support self-play training loops
        or if the focus is primarily on inference-time agents?
       
          drakinosh wrote 1 day ago:
          What a boringly bog-standard AI Comment. Why bother writing?
       
          __cayenne__ wrote 1 day ago:
          Very interested in self-play training loops, but I do like codegen as
          an abstraction layer. I am planning to make it available as an RL
          environment at some point
       
          degenerate wrote 1 day ago:
           You would likely be interested in the StarCraft BWAPI: [1] You can
           watch the match videos from training runs: [2] I don't think BWAPI
           has ever integrated modern AI models, but I haven't followed its
           progress in several years.
          
  HTML    [1]: https://www.starcraftai.com
  HTML    [2]: https://www.youtube.com/@Sscaitournament/videos
       
            __cayenne__ wrote 1 day ago:
            funny you mention this… I have a new project that is going in
            this direction
       
          dmos62 wrote 1 day ago:
          > partial observability, long-term planning, resource management, and
          real-time adaptation
          
           Note, this project doesn't have that, best I can tell? It's two
           static AI scripts having a go. LLMs generate the scripts and they
           are aware of past "results", but I'm not sure what that means.
       
        mpeg wrote 1 day ago:
        What a day to be alive, I just watched Gemini zergling rush Opus and it
        got completely overwhelmed.
        
        Opus needs to learn to kite.
       
          Razengan wrote 1 day ago:
          map hax
       
        GlacierFox wrote 1 day ago:
        "I've liked all the projects that put LLMs into game environments."
        
        I haven't.
       
        mitchm wrote 1 day ago:
         I’ve also been exploring this idea. What if you could bring your own
         (or pull in a 3rd-party) “CPU player” into a game?
         
         Using an LLM-friendly API with a snapshot of game state plus
         calculated heuristics, legal moves, and varying levels of strategy is
         working out nicely. They can play a web-based game via curl.
       
        Lerc wrote 1 day ago:
        It would be interesting to get the agents to write code to preprocess
        the logs and generate systems to analyse the outputs.
        
        Maybe they are already doing this?  Are there logs of the model's
        thinking?
       
        arscan wrote 1 day ago:
        Reminds me of the “Google AI Challenge” in 2011 called Ants [1],
        except the ‘AI’ is implemented using ‘AI’ now instead of human
        programmers.
        
        I was proud for getting the highest-ranked JavaScript-based
        implementation, but got absolutely crushed by the eventual winner.
        
        
  HTML  [1]: https://github.com/aichallenge/aichallenge
       
        myky22 wrote 1 day ago:
         Love it! I have a similar intuition from my use of Gemini (3 and
         3.1): great at "turn 1" tasks, but it degrades faster than Opus or
         GPT.
       
        dakolli wrote 1 day ago:
        Yay, I love how we just keep coming up with magic tricks, like toddlers
        playing with velcro.. These magic tricks do nothing but convince people
        who don't know any better that LLMs are the real deal, when they simply
        aren't.
        
        This is just free propaganda for Anthropic && OpenAI who will leverage
        these (useless) capabilities to convince your boss to give your salary
        to them, or at least a substantial portion of it.
       
          Applejinx wrote 1 day ago:
          …while burning unreasonable amounts of energy for nothing.
          
          Not a fan. Make games with in-game AIs that are interesting but are
          not large language models: that's wasteful and lazy. You probably had
          more large language models put this together for you. Lazy.
       
          LatencyKills wrote 1 day ago:
          This technology exists. It isn’t just a toy. I think it is amazing
          to see people use it for interesting things even if it isn’t
          groundbreaking.
          
          I’ve been an engineer for almost 40 years and love seeing what
          Claude Code can do.
          
          Like it or not, young people will not know a world where this
          technology doesn’t exist. It is just part of their toolset now.
       
            paganel wrote 1 day ago:
            > I’ve been an engineer for almost 40 years and love seeing what
            Claude Code can do.
            
            You would say that because otherwise you'd be afraid as being seen
            as "too old for this job", and hence risking getting kicked out of
            it all, meaning no future employment opportunities. I know that
            feeling, because I myself have been doing this programming job for
            20+ years already (so not a young one by any means), but let's just
            cut the crap about it all and let's tell it how it is.
       
              hu3 wrote 1 day ago:
               Really? That's a lot of presumption and reductionism about LLM
               enthusiasts.
               
               People of varied ages already leverage LLMs on a daily basis.
               And LLMs will only get better.
              
              Yesterday, Opus did work for me that would have taken me weeks.
              And the result was verified with a comprehensive suite of unit
              tests plus smoke tests by myself. The code looks exactly as the
              rest of the code in the 10y+ old, hand-written, enterprise
              project, no slop.
              
              And you actually should be afraid of being left behind in dev
              related fields if you don't use LLMs. In most areas in fact.
              
               Once the market corrects for LLM-assisted production,
               expectations will rise. So right now there is a small window to
               leverage LLMs as a time-saving advantage before they become the
               norm and everyone is forced to use them because expectations
               will reflect that.
       
              LatencyKills wrote 1 day ago:
              > You would say that because otherwise you'd be afraid as being
              seen as "too old for this job"
              
               Um... I am still an active reverse engineer of both ring-0 and
               ring-3 applications on both macOS and Windows (I worked on
               both the VS and Xcode teams). I'm developing a new tool for
               macOS that allows users to "see behind" active windows without
               the constant need for cmd/alt+tabbing. My age has zero bearing
               on my skill set or ability to understand technology. [1]
               
               > let's just cut the crap about it all and let's tell it how
               it is
              
              The reality is, as I said, that this technology exists and it
              isn't going anywhere. Young people are going to use it as a tool
              just like we did when GUI operating systems first became
              prevalent.
              
              I don't even remotely buy into the AI hype but I'm not going put
              the blinders on either. There is utility in this technology.
              
  HTML        [1]: https://imgur.com/a/seymour-r9whXO5
       
            dakolli wrote 1 day ago:
            I'm pretty young and hate this technology with a passion. I didn't
            spend 100k on education, and studying for a decade to have my job
            reduced to being a project manager for a bot or to play with a
             prompt slot machine all day. This crap is reducing the thing I
             genuinely love doing more than anything, writing code, to
             nothing: reviewing code that lacks any sweat, any intention. I
             really can't stand this garbage.
            
             I can't stand you old heads. I'm very happy for you that you got
             to stash away 40 years of SWE salaries. It's just ladder-kicking
             behavior, to be honest. Typical boomer: you got your nut and
             don't care what happens after.
            
             25% of new college grads in STEM are unemployed, and a bunch of
             companies (controlled by people in your age group) have laid off
             400k Americans over the last 16 months while equities and
             profits are at all-time highs.
            
            The replies : ItS NoT Ai, ItS cUz FrEe MoNeY fRoM CoViD HaS DrIeD
            uP.
       
              MaybiusStrip wrote 1 day ago:
              Software jobs have been steadily outpacing other white collar
              jobs for the past year, but it's unlikely you will find one
              unless you work on your attitude and your ability to communicate
              respectfully.
       
              LatencyKills wrote 1 day ago:
              The world is changing and instead of embracing that change
              (ensuring that you will be the next leader) you are actively
              fighting against technology?
              
              The world was once entirely analog; generations of analog
              engineers had to throw away their knowledge and start over during
              the digital transition. It wasn't always pretty but they did it.
              
              If you can't embrace technological change you might have wasted
              $100k.
       
              stalfie wrote 1 day ago:
              So to summarize, your objections are almost completely unrelated
              to the technology, and are mostly about capitalism.
       
          p-e-w wrote 1 day ago:
          Yeah, I guess the tens of thousands of PhDs who are working on LLMs
          full time are just collectively wasting their lives. Everyone except
          you is simply too dumb to see it.
       
            dakolli wrote 1 day ago:
            10s of thousands of PhDs working on llms lol...
       
              hu3 wrote 1 day ago:
              With the amount of money being thrown in R&D, I don't doubt the
              actual number is astounding.
       
        EwanG wrote 1 day ago:
        At least until one of the competitors is overheard saying "A strange
        game. The only winning move is not to play"
       
        ph4rsikal wrote 1 day ago:
        Reminds me of this fantastic series on Game Theory and Agent Reasoning
        
  HTML  [1]: https://jdsemrau.substack.com/p/nemotron-vs-qwen-game-theory-a...
       
        busfahrer wrote 1 day ago:
        This reminds me of this yearly StarCraft AI competition (since 2010),
        however I think it uses a special API that makes it easy for bots to
        access the game
        
        Edit:
        Forgot link:
        
  HTML  [1]: https://davechurchill.ca/starcraft/
       
          KeplerBoy wrote 1 day ago:
          Very interesting project. I'm a bit confused about the lack of
          hardware specification. The rules make it clear that one's bot has
          defined deadlines:
          
          > Make sure that each onframe call does not run longer than 42ms.
          Entries that slow down games by repeatedly exceeding this time limit
          will lose games on time.
          
          But I'm missing something like: "Your program will be pinned to CPU
          cores 5-8 and your bot has access to a dedicated RTX 5090 GPU." Also
          no mention about whether my bot can have network access to offload
          some high-level latency insensitive planning. Maybe that's just a bad
          idea in general, haven't played SC in ages.
       
        cahaya wrote 1 day ago:
        Nice. Curious about 5.3-codex-high results
       
        PeterUstinox wrote 1 day ago:
         Wouldn't it be interesting if the LLMs wrote real-time RTS commands
         instead of code? After all, it is an RTS game.
        
        This would bring another dimension to it since then quality of tokens
        would be one dimension (RTS-language: Decision Making) and speed of
        tokens the other (RTS-language: Actions Per Minute; APM).
        
         Also, there are already a lot of coding benchmarks; this would test
         something more abstract, similar to AlphaStar. [1] You could just use
         the exposed APIs of OpenAI, Anthropic etc. and let them battle.
        
  HTML  [1]: https://en.wikipedia.org/wiki/AlphaStar_(software)
       
        xanth wrote 1 day ago:
        Now I'd love to see if fast > smart over time with Mercury 2.
       
        datawars wrote 1 day ago:
        Great project! It would be interesting to have a meta layer of AIs
        betting on the player LLMs
       
        wongarsu wrote 1 day ago:
        I know visualization is far from the most important goal here, but it
        really gets me how there's fairly elaborately rendered terrain, and
        then the units are just unnamed roombas with hard to read status
        indicators that have no intuitive meaning. Even in the match viewer I
        have no clue what's going on, there is no overlay or tooltip when you
        hover or click units either. There is a unit list that tries (and
        mostly fails) to give you some information, but because units don't
        have names you have to hover them in the list to have them highlighted
        in the field (the reverse does not work). Not exactly a spectator
        sport. Oh, but there is a way to switch from having all units in one
        sidebar to having one sidebar per player, as if that made a difference.
        
        I find this pretty funny because it seems like a perfect representation
        of what's easy with today's tools and what isn't
        
        Love the idea though
       
          embedding-shape wrote 1 day ago:
          Yeah, it's all what you get when you basically ask an agent "Build X"
          without any constraints about how the UI and UX actually should work,
          and since the agents have about 0 expertise when it comes to "How
          would a human perceive and use this?", you end up with UIs that don't
          make much sense for humans unless you strictly steer them with what
          you know.
       
            infecto wrote 1 day ago:
             Or maybe the simple answer is that it looks exactly like the
             referenced game Screeps. Probably a better explanation than
             hand-waving away the faults of an agent.
       
        egeozcan wrote 1 day ago:
        This is amazing. What I do is something else: I make AI agents develop
        AI scripts (good ol' computer player scripts) and try to beat each
        other: [1] I occasionally run my tournament script: [2] That calculates
        the ELOs for each AI implementation, and I feed it to different agents
        so they get really creative trying to beat each other. Also making rule
        changes to the game and seeing how some scripts get weaker/stronger is
        a nice way to measure balance.
        
         Funny thing: Codex gets really aggressive and starts cheating a lot
         of the time: [3]
        
  HTML  [1]: https://egeozcan.github.io/unnamed_rts/game/
  HTML  [2]: https://github.com/egeozcan/unnamed_rts/blob/main/src/scripts/...
  HTML  [3]: https://bsky.app/profile/egeozcan.bsky.social/post/3mfdtj5dhkc...
       
        hmontazeri wrote 1 day ago:
        This is actually fun to watch :D
       
       
   DIR <- back to front page