codevoid.de/1/hn/comments_47111440.gph

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
  HTML   We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them
       
       
        7777332215 wrote 23 hours 12 min ago:
        I know they said they didn't obfuscate anything, but if you hide
        imports/symbols and obfuscate strings, which is the bare minimum for
        any competent attacker, the success rate will immediately drop to zero.
        
        This is detecting the pattern of an anomaly in language associated with
        malicious activity, which is not impressive for an LLM.
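        
        A minimal sketch of the string-obfuscation point, assuming a
        single-byte XOR key (a hypothetical choice; real attackers use many
        different schemes). The helper names and the sample blob are
        illustrative only and are not taken from the benchmark.

```python
import re


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def printable_strings(blob: bytes, min_len: int = 4) -> list[bytes]:
    """Rough stand-in for the `strings` utility: runs of printable ASCII."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)


KEY = 0xA5  # hypothetical key; the high bit pushes ASCII out of the printable range
secret = b"/bin/sh -c"

# The same payload embedded in plain form and in XOR-obfuscated form.
plain_blob = b"\x00\x01" + secret + b"\x00"
obf_blob = b"\x00\x01" + xor_bytes(secret, KEY) + b"\x00"

plain_hits = printable_strings(plain_blob)  # the suspicious string is visible
obf_hits = printable_strings(obf_blob)      # nothing readable is left to match
```

        A scanner keyed on suspicious strings sees the payload in the first
        blob and nothing in the second, although one XOR pass recovers it.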
       
          stared wrote 21 hours 28 min ago:
          One of the authors here.
          
          The tasks here are entry level. So we are impressed that some AI
          models are able to detect some patterns, while looking just at binary
          code. We didn't take it for granted.
          
          For example, only a few models understand Ghidra and Radare2 tooling
          (Opus 4.5 and 4.6, Gemini 3 Pro, GLM 5) [1]. We consider it a starting
          point for AI agents being able to work with binaries. Other people
          have found the same; see [2] and [3].
          
          There is a long way ahead from "OMG, AI can do that!" to an
          end-to-end solution.
          
  HTML    [1]: https://quesma.com/benchmarks/binaryaudit/#models-tooling
  HTML    [2]: https://x.com/ccccjjjjeeee/status/2021160492039811300
  HTML    [3]: https://news.ycombinator.com/item?id=46846101
       
        folex wrote 1 day ago:
        > The executables in our benchmark often have hundreds or thousands of
        functions - while the backdoors are tiny, often just a dozen lines
        buried deep within. Finding them requires strategic thinking:
        identifying critical paths like network parsers or user input handlers
        and ignoring the noise.
        
        Perhaps it would make sense to provide LLMs with some strategy guides
        written in .md files.
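        
        One such strategy, "start from the functions that touch external
        input", can be sketched as below. This is only an illustration: the
        call graph, the function names, and the list of input-handling imports
        are all hypothetical, and in practice the graph would have to be
        exported from a disassembler.

```python
from collections import deque

# Assumed list of imports that bring external input into the program.
INPUT_IMPORTS = {"recv", "read", "getenv"}


def functions_touching_input(call_graph: dict[str, set[str]],
                             sources: set[str]) -> set[str]:
    """Return every function that calls an input import directly or transitively."""
    # Invert the graph so we can walk from callee back to its callers.
    callers: dict[str, set[str]] = {}
    for fn, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(fn)

    seen: set[str] = set()
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical call graph: function name -> names it calls.
call_graph = {
    "main": {"parse_request", "init_logging"},
    "parse_request": {"recv", "check_auth"},
    "check_auth": {"strcmp"},
    "init_logging": {"fopen"},
}

priority = functions_touching_input(call_graph, INPUT_IMPORTS)
```

        Here `priority` contains `main` and `parse_request`. Note that
        `check_auth` is missed because it only receives the data as an
        argument, so a real guide would also need to follow data flow.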
       
        Bender wrote 1 day ago:
        Along these lines, can AIs find backdoors spread across multiple pieces
        of code and/or services? That is, pieces that are not backdoors by
        themselves, so even advanced penetration testers would not suspect
        anything is afoot, but that provide access when used together.
        
        e.g. an intentional weakness in systemd + udev + binfmt magic when used
        together == authentication and mandatory access control bypass.  Each
        weakness reviewed individually just looks like benign sub-optimal code.
       
          cluckindan wrote 1 day ago:
          Start with trying to find the xz vulnerability and other software
          possibly tying into that.
          
          Is there code that does something completely different than its
          comments claim?
       
        jakozaur wrote 1 day ago:
        See direct benchmark link: [1]. Open-source GitHub: [2].
        
  HTML  [1]: https://quesma.com/benchmarks/binaryaudit/
  HTML  [2]: https://github.com/QuesmaOrg/BinaryAudit
       
       