codevoid.de/1/hn/comments_48528371.gph

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
  HTML Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
  HTML   Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model
       
       
        nicman23 wrote 4 hours 57 min ago:
        is it any good?
       
        RandyOrion wrote 5 hours 2 min ago:
        Please do not claim you trained a new model, only to get caught
        red-handed by others. Several people or groups have already done
        that, got caught, and vanished in no time.
        
        Check how the "authors" of "this model" react to this problem [1]. See
        how they deal with this problem by first changing their affiliation
        from [1] to [2], then saying that they are sorry for being caught [3],
        then just removing all their affiliations once and for all [4].
        
        I think the "authors" of "this model" [5] should be held accountable
        until they upload new checkpoints, and the performance of the new model
        is verified by third-parties.
        
        P.S. To people who downvoted me, show me why you're doing this. [1] [3]
        [2] [3] [4] [5]
        
  HTML  [1]: https://iplanrio.rio.rj.gov.br
  HTML  [2]: https://iplanrio.prefeitura.rio
  HTML  [3]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
  HTML  [4]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
  HTML  [5]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
  HTML  [6]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
  HTML  [7]: https://huggingface.co/prefeitura-rio
       
        blitzar wrote 6 hours 14 min ago:
        It's stupid and hilarious when someone in Rio does it; when a techbro in
        Silicon Valley does it they get VC funding, a Maserati and an entry on
        the 30 under 30 list.
       
          rgbrth wrote 5 hours 22 min ago:
          I don't think people are saying it's stupid. It's just funny that
          potentially some random municipality worker is going well beyond
          their work scope and making contributions in the AI world.
          
          Could be from Rio, could be from any municipality anywhere in the
          world. The fact that the account is actually from the town hall
          rather than a personal account also makes it funnier.
       
        jkwang wrote 9 hours 57 min ago:
        This is a concerning pattern. Rebranding merged models as "homegrown"
        without disclosure undermines trust in open-source AI development. The
        community needs better provenance tracking and transparency standards
        for model releases.
       
        FooBarWidget wrote 10 hours 36 min ago:
        Can anyone explain to me what a merge is and why that works? It seems
        utterly bizarre to me that you can just merge weights. You can't make a
        working program by just merging machine instruction pages. Aren't
        weights tightly coupled to a specific architecture?
       
          antonvs wrote 10 hours 13 min ago:
          In this case both sets of weights ultimately came from the same
          model. The Nex model they used is a fine-tune of Qwen, which was the
          other model they used.
          
          I'm not an expert in this area, but it's not too hard to see how a
          merge like that could turn out ok.
       
        thelonelyborg wrote 15 hours 51 min ago:
        this is probably occurring all over the world including in startups.
       
        aaronbrethorst wrote 16 hours 51 min ago:
        They really missed out by not calling it Neuromancer.
       
        pelasaco wrote 20 hours 8 min ago:
        an eternal 7x1.. and I am not talking about Curaçao..
       
        rafaquintanilha wrote 20 hours 19 min ago:
        I have no affiliation with them but here's what I think happened:
        
        1. They claim the official model is based on Qwen 397B. It's likely
        they didn't disclose Nex Pro at all because Nex itself is based on the
        same base model (not saying they shouldn't).
        
        2. The improvement would come from merging the weights PLUS on-policy
        distillation. The confusion is that the uploaded model didn't have the
        distillation at all.
        
        3. It's important to notice they didn't advertise the model besides
        posting it on Reddit 2 days ago. It became viral organically, over the
        weekend, and during Brazil's World Cup debut (Brazilians will
        understand). Of course the mayor of Rio took the opportunity to
        capitalize on the free coverage, but that wasn't done in conjunction
        with the researchers.
        
        4. I don't see why they would disclose Qwen 397B as base and mention
        the SwiReasoning paper but not mention Nex if all they did was to merge
        both models.
        
        5. In any case, what they are claiming is easily verifiable once (if)
        they upload the right model.
       
          motbus3 wrote 53 min ago:
          It seems to me this is clearly a mistake. They would not even have
          the resources for it as far as I know, and I think they are not even
          in a position to make such bold claims.
       
          s1artibartfast wrote 14 hours 26 min ago:
          My understanding is that they didn't do any distillation. Every weight
          is a 60/40 element-wise average of Qwen and Nex. Is this possible if
          the Rio contractor did their own post-training as claimed?
          
  HTML    [1]: https://x.com/tenobrus/status/2066243352211996728/photo/1
       
          smus wrote 14 hours 46 min ago:
          What do you mean World Cup debut? haven't they won 5?
       
            alxndresp wrote 14 hours 5 min ago:
            They meant their first, opening game of this current World Cup
            tournament
       
          Aurornis wrote 16 hours 16 min ago:
          > 2. The improvement would come from merging the weights PLUS
          on-policy distillation. The confusion is that the uploaded model
          didn't have the distillation at all.
          
          They merged the base model with another lab's fine tuned model. The
          improvements could have come from getting some of the fine tuned
          weights from the other model.
          
          If they really had a better performing model that they
          "accidentally" forgot to upload, they could have uploaded the
          correct file by now.
       
            croes wrote 10 hours 42 min ago:
            Seems they did
            
  HTML      [1]: https://news.ycombinator.com/item?id=48529544
       
              ipieter wrote 10 hours 18 min ago:
              I only see an edit to the readme (13h ago) and removal of the
              weights, so the repo is now empty.
              
              I am willing to give them the benefit of the doubt, but we've
              seen this before: a model gets released that is supposedly
              state-of-the-art, yet seems to be another repackaged model
              without any training. Reflection 70B was the most similar
              example, all they now need is an api that rewrites "Claude" to
              "Rio".
       
          matheusmoreira wrote 19 hours 21 min ago:
          I'm honestly impressed that this even happened at all. "Rio de
          Janeiro's homegrown LLM" is probably the last headline I ever
          expected to read on HN.
       
            airstrike wrote 15 hours 56 min ago:
            Worth reminding everyone that Lua was also created in Rio, though
            admittedly at PUC rather than by the government.
            
            Rio has a strong engineering talent pool, along with many other
            major capitals in Brazil
       
              mathattack wrote 12 hours 52 min ago:
              Yes.  Though even more than the US, their engineering talent from
              top schools heads into consulting and finance.
       
              matheusmoreira wrote 15 hours 38 min ago:
              Brazil does have talent. Mauro Carvalho Chehab is a Linux kernel
              maintainer. Elixir was created by José Valim, a Brazilian. I
              have also created my own programming language.
              
              What Brazil doesn't have is a history of properly rewarding
              talent, which often causes it to migrate elsewhere. So it's
              definitely surprising when any sort of technological development
              happens in Brazil: it implies someone who stayed managed to get
              something done, most likely for much less than what that
              something is actually worth, while also being crushed by
              extremely high taxes that essentially double the cost of
              computer hardware.
       
                red-iron-pine wrote 1 hour 58 min ago:
                > extremely high taxes that essentially double the cost of
                computer hardware.
                
                I think people are missing the last few words -- cost of
                computing hardware
                
                when I used to do ISP work I did a lot for LATAM.  The joke was
                that you'd get better bandwidth for Brazil routing out of the
                country and through Miami than going across the country.  The
                reason?  crazy high tariffs on hardware.
                
                No reason to base anything locally, and if you're not basing it
                locally then there isn't really much reason to stick around,
                either.  Go to other hot markets like Zona America, Austin,
                CDMX, Miami, Los Angeles, etc.    and make the big $$$.
                
                I worked with 2 Brazilian engineers who were in country (and
                currently work with a 3rd now, based in Montreal) and they were
                very good but all said they had to get out of country to lock
                in the serious engineering roles.
       
                jdahlin wrote 6 hours 50 min ago:
                Brazil has the opposite of high taxes, especially for company
                owners. I remember paying 6% on income, compared to up to 70%
                in Sweden.
       
                rbanffy wrote 8 hours 42 min ago:
                > extremely high taxes
                
                I always find this funny. Brazilian taxes are nowhere near what
                I would say "high". I pay about twice as much out of my
                compensation as I would pay in Brazil, and that would be as if
                I did zero tax optimisation back then.
       
                  persedes wrote 33 min ago:
                  Parent was referring to the cost of hardware. I've had
                  colleagues from Brazil visit the US and go absolutely crazy
                  at Best Buy to grab as much hardware as they could (laptops,
                  Nintendo Switch, etc), because it's prohibitively expensive
                  for them to buy that at home.
       
                  rglullis wrote 7 hours 4 min ago:
                  As an employee: your taxes are not that high, but public
                  services are terrible so most of middle-class ends up paying
                  for the private alternative as well.
                  
                  As a business owner: not so bad if you are freelancing or
                  just a few business partners providing some type of service,
                  but terrible the moment you start considering employing other
                  people.
       
                    rbanffy wrote 6 hours 48 min ago:
                    > but public services are terrible
                    
                    Have you seen the public services of countries with lower
                    taxes? Their public hospitals?
                    
                    > but terrible the moment you start considering employing
                    other people.
                    
                    Employing people isn't cheap anywhere (except, perhaps, in
                    the US, where labour rights are kind of nonexistent)
       
                      rglullis wrote 6 hours 24 min ago:
                      I live in Germany. No such thing as public hospitals. And
                      I pay close to 1200€/month in health insurance to the
                      public insurance company.
                      
                      A quick visit to the dermatologist to check for some tiny
                      bumps that showed up on my forehead: 60€, out of
                      pocket, because the insurer doesn't cover it.
       
                        rbanffy wrote 1 hour 36 min ago:
                        Sad to hear about that. Ireland is much better in that
                        regard - you can pay for private healthcare and it'll
                        provide you a broader network, but you might as well go
                        for public health, where you'll be prioritized based on
                        how life-threatening your condition is.
       
                          rglullis wrote 1 hour 11 min ago:
                          Yeah, I make it sound worse than it seems. The
                          problem of the public insurance is that you pay based
                          on your revenue instead of your actuarial risk, so in
                          the end it should be treated as an extra form of
                          revenue tax. I could go for the private insurance if
                          I wanted to pay less, but then I'd have to switch my
                          kids to the private insurer as well.
                          
                          All in all, my point was only that the amount of
                          taxes that people pay and quality of services are not
                          necessarily related. Germany has high taxes and
                          expensive-but-adequate healthcare. Greece has high
                          taxes and expensive-and-inadequate healthcare.
                          Switzerland has low taxes and universal/cheap
                          healthcare (max. $5000/year deductible, max charge
                          per hospitalization of $700).
       
                  fabioz wrote 7 hours 37 min ago:
                  I can second this.
                  
                  Compared to many countries Brazil doesn't have such high
                  taxes (I'd say that if you work remotely for a company
                  outside of Brazil, you'll probably have much lower taxes
                  compared to almost any other country -- working locally the
                  difference isn't as big, but you have higher taxes in many
                  other places).
                  
                  What it really lacks is access to capital (which is the real
                  "mojo" of the US compared to the rest of the world).
       
                    iterateoften wrote 2 hours 27 min ago:
                    Also the bureaucracy, employee rights, etc.
                    
                    Incorporating and getting a functional business entity in
                    Brazil is harder. In the USA I literally do it in 5 min online
                    including bank account. In Brazil they are taking out
                    microscopes to verify your signature on the paperwork
                    matches.
                    
                    And in the USA if you have one bad employee, just fire them
                    any time. In Brazil for better or for worse nowhere near as
                    easy. Obviously better for employees but businesses don't
                    like it because you can get stuck with an employee dragging
                    down everyone unless you pay them a year's salary etc.
       
            cscheid wrote 18 hours 22 min ago:
            Yes! That "prefeitura do Rio" huggingface URL is definitely
            shocking to read to this Brazilian as well (I'm assuming you and
            parent also are from your usernames).
       
          throwa356262 wrote 19 hours 55 min ago:
          Regarding #2
          
  HTML    [1]: https://news.ycombinator.com/item?id=48529544
       
            xiphias2 wrote 12 hours 9 min ago:
            This should be at the top: they uploaded the wrong model, they
            fixed it
       
              jwitthuhn wrote 9 hours 22 min ago:
              They did upload the wrong model but as of the time of writing
              they have not fixed it. Right now, 12 hours after they took the
              old one down, there is simply no model present in their
              huggingface repo.
       
                xiphias2 wrote 7 hours 44 min ago:
                I guess they will upload it later, it seems like an honest
                mistake to me.
                
                Anyways SwiTransformer paper looks interesting and doing a post
                training to optimize for it looks interesting as well.
       
        delusional wrote 20 hours 40 min ago:
        It's absolutely insane to me that we are now at a point where the top
        of the front page of hacker news is a random GitHub issue about
        attribution to some random LLM merge, written in just the most
        disgusting AI slop style.
        
        I would like to downvote this please.
       
          vor_ wrote 15 hours 24 min ago:
          There's been a noticeable drop in quality. It's often a blend of AI
          culture war posts and arbitrary Github links.
       
        Havoc wrote 21 hours 34 min ago:
        Nex in turn is also based on Qwen so don't think they're too far
        off
       
        diego_moita wrote 21 hours 49 min ago:
        WHAT!? There are thieves in Rio de Janeiro?
        
        Oh, I am so SHOCKED, so SHOCKED! /s
        
        Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de
        bandido" (Gangster's Land).
        
        Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
       
        jordz wrote 21 hours 57 min ago:
        Can someone please explain or link to some information about how models
        are merged? Is this genuinely merging weights mathematically or some
        kind of distillation (presumably not if they've done zero training as
        the post suggests).
       
          jxmorris12 wrote 10 hours 19 min ago:
          There's nothing to read.
          
          Model A: A_1, …, A_n
          Model B: B_1, …, B_n
          
          C_i = A_i * p + B_i * (1 - p)
          
          In other words, it's just a linear combination of the other
          models' weights, per position.
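
A minimal sketch of the formula above in plain Python, assuming each model is stored as a dictionary of flat float lists. The tensor names and values are made up for illustration; real checkpoints hold large multi-dimensional tensors, but the per-position arithmetic is the same.

```python
def merge_weights(a, b, p):
    """Element-wise linear combination: c[i] = a[i] * p + b[i] * (1 - p)."""
    if len(a) != len(b):
        raise ValueError("both tensors must have the same number of weights")
    return [x * p + y * (1.0 - p) for x, y in zip(a, b)]


def merge_models(model_a, model_b, p):
    """Merge two models given as {tensor_name: flat list of floats}."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have identical tensor names")
    return {name: merge_weights(model_a[name], model_b[name], p)
            for name in model_a}


# Hypothetical two-tensor "models", purely for illustration.
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

merged = merge_models(model_a, model_b, p=0.5)
print(merged)
```

With p = 0.6 and model A as the fine-tune, this would be the kind of 0.6/0.4 blend discussed elsewhere in the thread.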
       
            joe_the_user wrote 9 hours 56 min ago:
            It's been a while since I looked at neural networks in detail. Do
            all the large models have a close enough architecture that this
            makes sense? Do they have the same number of layers and width? I
            had thought that each model had its own "secret sauce" of normal and
            special layers (convolution, max-pooling, something-something)
            stacked together. Genuinely curious.
       
          calebkaiser wrote 21 hours 51 min ago:
          This is a good starting point: [1] But yes, in general, merging
          refers to techniques that directly blend the weights of different
          models mathematically. It had a big moment of popularity ~2 years
          ago, with many so-called "Frankenmodels" popping up on leaderboards.
          
          I tend to think of merging as belonging to the same general umbrella
          as things like "abliteration", or other techniques that surgically
          modify the weights of a model without a traditional training/tuning
          loop. Maxime Labonne is a great person to follow if you're interested
          in this general area.
          
  HTML    [1]: https://huggingface.co/docs/peft/developer_guides/model_merg...
       
        hintymad wrote 22 hours 7 min ago:
        > Every weight tensor in Rio is, to thousands of standard deviations,
        the same 0.6/0.4 blend of Nex and Qwen - across all 60 layers and
        every component of the network. Other finetunes cannot be explained as
        interpolations.
        
        I find it amazing how robust the current deep learning models are. A
        simple linear combination of every weight did not degrade the
        performance of the model, but enhanced it.
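
One way a claim like the quoted one could be checked, sketched under assumptions: if a suspect tensor really is alpha * A + (1 - alpha) * B, a least-squares fit recovers alpha and leaves almost no residual. This is not the analysis from the linked issue; the function name and toy vectors are hypothetical.

```python
def estimate_blend(candidate, model_a, model_b):
    """Fit candidate ~= alpha * model_a + (1 - alpha) * model_b.

    Returns (alpha, largest absolute error of the fitted blend).
    """
    # Rearranged: candidate - b = alpha * (a - b), a one-parameter least squares.
    num = sum((c - b) * (a - b) for c, a, b in zip(candidate, model_a, model_b))
    den = sum((a - b) ** 2 for a, b in zip(model_a, model_b))
    if den == 0.0:
        raise ValueError("model_a and model_b are identical; alpha is undefined")
    alpha = num / den
    residual = max(abs(c - (alpha * a + (1.0 - alpha) * b))
                   for c, a, b in zip(candidate, model_a, model_b))
    return alpha, residual


# Hypothetical toy "checkpoints"; real ones have billions of entries.
model_a = [1.0, -2.0, 0.5, 4.0]
model_b = [0.0, 1.0, -1.5, 2.0]
suspect = [0.6 * a + 0.4 * b for a, b in zip(model_a, model_b)]
unrelated = [5.0, 5.0, 5.0, 5.0]

alpha, residual = estimate_blend(suspect, model_a, model_b)
_, unrelated_residual = estimate_blend(unrelated, model_a, model_b)
print(alpha, residual, unrelated_residual)
```

A true blend gives a near-zero residual with the same alpha for every tensor; an independently trained tensor generally does not.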
       
          Davidzheng wrote 10 hours 22 min ago:
          it's interesting that this was even guessed at
       
            Davidzheng wrote 10 hours 19 min ago:
            ok I guess they had other clues then. If you do any sort of
            comparison vs Nex & Qwen, a lot of weird coincidences will probably
            show up if the three sets of weights are somehow not linearly
            independent lol
       
          itkovian_ wrote 11 hours 31 min ago:
          This is called linear mode connectivity and seems to work for almost
          every large model. So well that in most cases it's an explicit part
          of the training process; do many training "branches" then merge
          then continue.
          
          It is not understood why it works so well.
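
A toy illustration of what "connected" means here, assuming a one-dimensional double-well loss that has nothing to do with a real LLM. The barrier is taken as the highest loss on the straight line between two solutions minus the worse endpoint loss; all names are hypothetical.

```python
def interpolate(a, b, t):
    """Point at fraction t on the straight line from a to b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def loss_barrier(loss_fn, a, b, steps=11):
    """Highest loss along the path minus the worse of the two endpoint losses."""
    path = [loss_fn(interpolate(a, b, i / (steps - 1))) for i in range(steps)]
    return max(path) - max(path[0], path[-1])


def toy_loss(w):
    """Double-well loss with minima at w = -1 and w = +1."""
    return (w[0] ** 2 - 1.0) ** 2


# Two solutions in the same basin (think: a base model and its fine-tune).
same_basin = loss_barrier(toy_loss, [0.9], [1.1])
# Two solutions in different basins (think: unrelated training runs).
across_basins = loss_barrier(toy_loss, [-1.0], [1.0])
print(same_basin, across_basins)
```

No barrier means every blend along the line is about as good as the endpoints, which is the situation a weight merge relies on.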
       
            teravor wrote 9 hours 36 min ago:
            is that actually how they train them in the datacenter? the
            trillion sized weight vector gets cloned and sent off to groups of
            GPUs and averaged after?
       
          tarruda wrote 17 hours 41 min ago:
          What I find fascinating is the idea that there might be a set of
          "secret" tweaks that when applied to those weights (or even smaller
          models) could result in an intelligence simulation that could vastly
          surpass even something like Fable.
       
          moritzwarhier wrote 19 hours 40 min ago:
          If this is true, it really would be impressive.
       
          themafia wrote 20 hours 30 min ago:
          > A simple linear combination of every weight did not degrade the
          performance of the model, but enhanced it.
          
          Which could be a signal that your "performance" was so abysmal in the
          first place that even randomly applied training methods can't make it
          _worse_.
       
          kristjansson wrote 20 hours 38 min ago:
          
          
  HTML    [1]: https://thickets.mit.edu
       
          meindnoch wrote 21 hours 0 min ago:
          It shows that LLMs are an extremely wasteful approach to
          intelligence.
       
            antonvs wrote 13 hours 6 min ago:
            Compared to what?
       
            kristjansson wrote 20 hours 36 min ago:
            or that intelligence is merely the composition of many redundant,
            lossy, ~random components
       
          Aurornis wrote 21 hours 9 min ago:
          > A simple linear combination of every weight did not degrade the
          performance of the model, but enhanced it.
          
          Enhanced it on a couple benchmarks, supposedly.
          
          The game is to turn knobs until you get a benchmark run that shows an
          improvement, then ship it. There are a lot of fine tunes and chimera
          models on HuggingFace that are supposedly better at some specific
          test, but when you use them for anything else they're usually worse.
          
          This happens with a lot of the models that are modified to remove
          censorship. They succeed in getting the model to emit previously
          censored outputs, but the overall output quality decreases.
       
            monster_truck wrote 17 hours 51 min ago:
            I don't think your last point is correct. Ablation, when done
            correctly, seems to increase the quality and typically also the
            performance too.
       
              antonvs wrote 10 hours 32 min ago:
              I'm curious about where you got that idea from. Neither the
              theory nor the available examples support it. If it did, everyone
              knowledgeable would be using abliterated models.
       
              tredre3 wrote 13 hours 13 min ago:
              That is something often claimed by heretics. My experience
              couldn't diverge more, however. All heretic (and abliterix)
              models I've tried are worse than the original. It's not
              immediately obvious if all you do is ask 2-3 questions and marvel
              at how it didn't refuse, but try using them for real over longer
              8k+ contexts and it falls apart real fast.
              
              They're more prone to getting stuck in loops, becoming
              unresponsive, and hallucinating more (presumably because of the
              reduced desire to not answer).
              
              I've tried all the popular heretic peddlers, but if you have one
              that you can vouch for maybe I've simply missed it.
       
              Aurornis wrote 16 hours 46 min ago:
              Abliteration is a brute force technique that removes or silences
              parts of the model. It reduces performance because the
              abliterated elements aren't perfectly isolated to censorship so
              other aspects suffer.
              
              Many of the "uncensored" model providers also do some fine
              tuning on the models. Some of them target better benchmarks or
              other measures, but outside of the benchmarks and metrics
              they're fine tuned for they are generally noticeably worse than
              the original model.
       
                yowlingcat wrote 15 hours 12 min ago:
                The kind of abliteration you are mentioning is no longer state
                of the art or the most common form of removing the refusal
                layer in most models. Your understanding was up to date
                about a year and a half ago, but has been out of date since
                after that.
       
                  avadodin wrote 7 hours 13 min ago:
                  What OP is describing wasn't called abliteration at all.
                  
                  Abliteration, whilst a neologism, implies a surgical ablation
                  of refusal.
                  
                  Earlier approaches post-trained the model to refuse less
                  and, much like other kinds of fine-tuning, it degraded
                  performance. They were "uncensored".
                  
                  Abliteration has seen some improvement to this day but it
                  always was close to equivalent performance to the original
                  when compared to those earlier techniques.
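
As commonly described, the "surgical" step amounts to projecting one direction out of the model's vectors. A minimal sketch of that projection alone, assuming the direction is already known; finding a refusal direction in a real model is a separate step not shown, and the numbers below are made up.

```python
import math


def ablate_direction(vector, direction):
    """Remove the component of `vector` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    dot = sum(v * u for v, u in zip(vector, unit))
    # Subtract the projection; everything orthogonal to it is untouched.
    return [v - dot * u for v, u in zip(vector, unit)]


# Hypothetical 2-d example: remove everything along the second axis.
activation = [3.0, 4.0]
refusal_direction = [0.0, 2.0]
ablated = ablate_direction(activation, refusal_direction)
print(ablated)
```

Whether quality survives depends on how cleanly that one direction isolates the behaviour, which is the point being argued in this subthread.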
       
                  weitendorf wrote 11 hours 24 min ago:
                  Unrelated but I've been putting off learning about
                  post-abliteration technique and want to use it for an
                  upcoming open source "retraining" project I have on my
                  backlog. I'm not interested in the refusal layers though,
                  more like deep fine tuning but in a way that might let me
                  prune out or consolidate layers, if that makes sense? Do you
                  have any pointers or links to the current SOTA in this area?
                  
                  I guess I'm looking for a kind of bulk/sticky dropout
                  (which was in fashion way back when I studied DNN in school).
       
                  ls612 wrote 14 hours 15 min ago:
                  Nowadays it is that Heretic tool, is it not? I've seen Gemma
                  models uncensored with it.
       
            manquer wrote 18 hours 23 min ago:
            >  game is to turn knobs until you get a benchmark run that shows
            an improvement, then ship it
            
            i.e. reinforcement learning against a weak reward function: the
            benchmark is insufficiently complex and not sufficiently
            representative of the real world.
            
            The "game", i.e. the decision tree, can be modeled as a multi-armed
            bandit problem: deploying finite resources (compute) toward
            exploitation/exploration.
            
            The main issue is that each training / fine-tune run is very
            expensive, so the number of chances at the slot machine, so to
            speak, is pretty limited today.
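
A minimal epsilon-greedy sketch of the bandit framing above, assuming each "arm" is a candidate configuration and each pull is one expensive run that returns a benchmark score. The scores, budget, and function names are hypothetical.

```python
import random


def epsilon_greedy(pull, n_arms, budget, epsilon=0.1, seed=0):
    """Spend `budget` pulls across `n_arms`; return (pull counts, mean rewards)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(budget):
        untried = [i for i in range(n_arms) if counts[i] == 0]
        if untried:
            arm = untried[0]                      # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)           # explore
        else:
            arm = max(range(n_arms), key=lambda i: means[i])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
    return counts, means


# Hypothetical fixed benchmark scores for three merge/fine-tune configs.
scores = [0.2, 0.5, 0.8]
counts, means = epsilon_greedy(lambda arm: scores[arm], n_arms=3, budget=10)
print(counts, means)
```

With a budget this small, most pulls go to whichever arm looked best early, which is exactly the risk when the reward is a narrow benchmark.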
       
            andai wrote 20 hours 50 min ago:
            They seem to have deleted most of the README now, but the archived
            version has benchmarks. [1] And the Nex benchmarks for comparison
            [2] Rio seems to be about halfway between Qwen 3.5 and Nex, as
            you'd expect?
            
  HTML      [1]: https://web.archive.org/web/20260614082641/https://hugging...
  HTML      [2]: https://huggingface.co/nex-agi/Nex-N2-Pro
       
          x312 wrote 21 hours 10 min ago:
          This works because Nex itself is a finetune of Qwen3.5 ( [1] ). It's
          merging Qwen3.5 with a Qwen3.5 finetune.
          
          I don't believe this would work on two LLMs that have different
          pretraining. Even if it did you would need two LLMs that have exact
          same internal activation shapes, dimensions, expert counts, token
          vocabulary, realistically it would never happen outside of finetunes
          or academic experiments.
          
  HTML    [1]: https://huggingface.co/nex-agi/Nex-N2-Pro
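
A small sketch of the structural precondition described above, assuming each checkpoint is summarized as a mapping from tensor name to shape. The names and shapes are invented. Matching shapes are necessary for an element-wise merge but not sufficient: two models with identical shapes and different pretraining would pass this check and still not merge meaningfully.

```python
def merge_incompatibilities(shapes_a, shapes_b):
    """List reasons two checkpoints cannot be merged element-wise."""
    problems = []
    for name in sorted(shapes_a.keys() - shapes_b.keys()):
        problems.append(f"{name}: only in model A")
    for name in sorted(shapes_b.keys() - shapes_a.keys()):
        problems.append(f"{name}: only in model B")
    for name in sorted(shapes_a.keys() & shapes_b.keys()):
        if shapes_a[name] != shapes_b[name]:
            problems.append(
                f"{name}: shape {shapes_a[name]} vs {shapes_b[name]}")
    return problems


# Hypothetical shape tables: a base, its fine-tune, and an unrelated model.
base = {"embed.weight": (1000, 64), "layers.0.mlp.weight": (64, 256)}
finetune = {"embed.weight": (1000, 64), "layers.0.mlp.weight": (64, 256)}
other = {"embed.weight": (1200, 64), "layers.0.attn.weight": (64, 64)}

print(merge_incompatibilities(base, finetune))
print(merge_incompatibilities(base, other))
```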
       
            hashmap wrote 19 hours 32 min ago:
            not this exact thing, no, because the functional circuits dont
            appear in the same places across models. but if you find where they
            are you can do something like branch between some of the middle
            functional circuits between models and it kinda just works, or even
            do one after the other. you cant just like swap any two layers
            cause a bunch of em bend hyperbolic curvature to do hierarchical
            stuff deep in the poincare ball and the geometries get all bonkers,
            but before and after they do that things are relatively flat, and
            the geometries are more or less transferrable up to rigid rotation
            if they're each trained on large enough data.
       
            oofbey wrote 19 hours 53 min ago:
            Correct.  We used to think that because NN optimization is
            non-convex there are all these local minima.  Now we know that once
            you get past the very early parts of training from random init, the
            loss surface is fairly smooth, and not really convex, but close
            enough in a bunch of ways - linear combinations of trained models
            are pretty much always valid combinations.  You can think of fine
            tunings as deltas on the original model which can be summed
            together successfully.    I think this paper first showed that to me:
            [1] which was 8 years ago now.
            
  HTML      [1]: https://arxiv.org/pdf/1802.10026
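
The "deltas that can be summed" idea from the comment above, sketched in plain Python with made-up numbers. The deltas are sometimes called task vectors; the helper names are hypothetical and this is not taken from the linked paper.

```python
def task_vector(finetuned, base):
    """Delta a fine-tune adds on top of the base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def apply_task_vectors(base, vectors, scale=1.0):
    """Add (optionally scaled) deltas from several fine-tunes to the base."""
    merged = list(base)
    for vec in vectors:
        merged = [m + scale * v for m, v in zip(merged, vec)]
    return merged


# Hypothetical base and two fine-tunes that each changed a different weight.
base = [1.0, 1.0, 1.0]
finetune_a = [1.5, 1.0, 1.0]
finetune_b = [1.0, 1.0, 0.75]

deltas = [task_vector(finetune_a, base), task_vector(finetune_b, base)]
combined = apply_task_vectors(base, deltas)
print(combined)
```

Seen this way, a 0.6/0.4 blend of a fine-tune with its own base is just the base plus 0.6 times the fine-tune's delta.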
       
          woadwarrior01 wrote 21 hours 46 min ago:
          It's is a well known idea[1], although it's still surprising that
          something as simple, even works.
          
  HTML    [1]: https://arxiv.org/abs/2203.05482
       
            kolanos wrote 21 hours 21 min ago:
            This team could have stopped here and still had something
            interesting (albeit not novel) to show. But the hype cycle was too
            tempting.
       
        jrm4 wrote 22 hours 36 min ago:
        "Well, Steve (Jobs), I think it's more like we both had this rich
        neighbor named Xerox, and I broke into his house to steal the TV set,
        but I found out that you had already stolen it."
        
        -- Bill Gates
       
          ckcheng wrote 22 hours 2 min ago:
          What's more funny to me is the set up to that quote:
          
          > Bill Gates had somehow manifested, alone, surrounded by ten Apple
          employees. ... Steve started yelling at Bill, asking him why he
          violated their agreement.
          
          And what's more interesting is the conclusion:
          
          > Apple filed a monumental copyright lawsuit against Microsoft in
          1988, but they eventually lost on a technicality (the judge ruled
          that Apple inadvertently gave Microsoft a perpetual license to the
          Mac user interface in November 1985).
          
          Microsoft didn't steal Apple's GUI ... Apple gave it to them.
       
            themafia wrote 20 hours 26 min ago:
            Two spoiled rich kids arguing over whose morality is the least
            worst.
            
            That this moment is held up as some great exchange in business is
            annoying.  That our regulatory agencies are perennially asleep at
            the switch and allow this nonsense to keep happening is extremely
            frustrating.
       
              ChrisClark wrote 19 hours 45 min ago:
              Held up as some great exchange?  No it's two assholes arguing
              with each other.  Just like most Jobs documentaries show him as a
              terrible person.
       
            alexgoodhart wrote 20 hours 36 min ago:
            That isn't fully true is it?
            
            Microsoft claimed that its software's use of various
            visualizations related to window state was covered by the 1985
            agreement, and Apple claimed that this was not true; those window
            states were produced by Macintosh while Microsoft's software was
            being rendered in the Mac environment.
            
            > In his March 20, 1989 Order, Judge Schwarzer declined to consider
            whether the visual displays in issue were generated by the
            Microsoft application programs or by the Macintosh system software.
            The point arose in connection with Microsoft's argument that the
            1985 Agreement licensed to Microsoft all visual displays that could
            possibly be called up by running the five Microsoft application
            programs on the Macintosh system software then or in the future.
            709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's
            contention would "defy common sense." Id.
       
          wunderlotus wrote 22 hours 19 min ago:
          lmao i really hope this is a real quote cuz it's a banger
       
            ckcheng wrote 22 hours 9 min ago:
            Apparently:
            
  HTML      [1]: https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html
       
        yieldcrv wrote 23 hours 0 min ago:
        Didn't the last thread about this have someone from the lab or an
        enthusiast in Rio saying exactly that?
        
        It's a fine-tune of Qwen
        
        Not a conspiracy
       
          daemonologist wrote 22 hours 41 min ago:
          The allegation here is that it's not actually a fine-tune of Qwen,
          but instead an undisclosed mashup (merge) of someone else's fine-tune
          of Qwen and the original model.  Rio subsequently said that the model
          was in fact a merge, that they did additional fine-tuning after the
          merge, and that they accidentally uploaded the base merge instead of
          the version with additional fine-tuning.  But this seems like quite
          an oversight...
       
            yieldcrv wrote 21 hours 4 min ago:
            > But this seems like quite an oversight...
            
            Not to me, what would people like to happen? Who are those people?
            And why do they care?
       
              antonvs wrote 10 hours 7 min ago:
              They made a public claim to having produced a useful model, which
              they published. Turns out they did nothing of the sort.
              
              > why do they care?
              
              Why does anyone ever care about having their time wasted by
              fraudulent claims?
       
                yieldcrv wrote 5 hours 19 min ago:
                Continue to explain like I'm 5 instead of the rhetoricals
       
        fkozlowski wrote 23 hours 3 min ago:
        I'm honestly surprised that they even had the inclination to attempt
        creating a model. I guess it's bullish that a municipal IT department
        had the guts to try this?
       
          axus wrote 21 hours 18 min ago:
          I like the [dead] comment theory that they proposed a huge LLM
          training budget to the government, kept most of the money, and
          released a cheap merge to justify the grift.
       
            fkozlowski wrote 15 hours 52 min ago:
            Ah that makes sense
       
            dormento wrote 17 hours 43 min ago:
            This would be so very brazilian of them.
            
            Source: am Huelander.
       
            seba_dos1 wrote 19 hours 1 min ago:
            It's kinda weird to claim extraordinary results in such case
            though, as that brings a lot of eyes to it.
       
              mgambati wrote 18 hours 3 min ago:
              Nothing weird. The mayor wanted something to brag about. That's Rio,
              my friend.
       
            matheusmoreira wrote 19 hours 11 min ago:
            That's essentially Brazil's standard operating procedure. Wouldn't
            be surprising if that turned out to be the case.
            
            Still, I'm actually impressed that this even happened at all. "Rio
            de Janeiro's homegrown LLM" is the last headline I expected to read
            on HN.
       
          Havoc wrote 21 hours 33 min ago:
          Merges and fine tunes are within reach of individuals with some money
          to burn so I'm sure a muni can do it
       
        MadrasTh0rn wrote 23 hours 7 min ago:
        Not surprised
       
          nom wrote 21 hours 54 min ago:
          why not?
       
            diego_moita wrote 21 hours 46 min ago:
            It is a recurrent Brazilian meme: Rio is known in Brazil as "terra
            de bandido" (gangster's land).
            
            The majority of their politicians have ties to organized crime.
            There is a virtual revolving door between police and crime, where
            people migrate from one to the other.
            
            It is like Chicago in the 20s, Naples and Medellín in the 80s or
            Moscow and Culiacan (Sinaloa, Mexico) today.
       
              dormento wrote 17 hours 40 min ago:
              Rio is kinda funny as a litmus test - federal government creates
              laws to try and curb some of the corruption, and Rio produces
              better and better corrupts - so far Rio is winning.
              
              BTW wasn't it a few months ago the current governor wanted to
              leave to be able to run as a candidate, so he asked a supreme
              justice to step in as governor, since there wasn't anyone else
              that technically could?
       
                brunoarueira wrote 13 hours 17 min ago:
                No, he left to be a Senate candidate and their vice governor
                left in 2025 to another role, then the next in line is the
                Legislative Assembly of the State of Rio de Janeiro president,
                but he was jailed and away from the role. So the next is a
                judge from the Justice Tribunal.
       
              alexgoodhart wrote 20 hours 35 min ago:
              Somehow I doubt that political affiliations with crime syndicates
              are affecting heavily the dispositions of LLM developers. The
              industry itself though is one of incest.
       
                sebastianconcpt wrote 18 hours 3 min ago:
                Politicians don't come from outer space, they emerge locally
                and were raised swimming in an imaginary that has normalized
                the morals that eventually end up expressed at the top.
       
                afh1 wrote 18 hours 38 min ago:
                He is putting into question the character of the public workers
                involved in the project, not that it has anything to do with
                organized crime. Rio has relapsed into crime in the last
                decades and government workers in general have a reputation for
                corruption in Brazil. It's a low trust society specially north
                of Parana hence the lack of surprise.
       
        ekjhgkejhgk wrote 23 hours 7 min ago:
        One funny thing about incompetence is that they don't have the
        competence to know that their incompetence is straightforward to verify
        by a competent person.
       
          thimabi wrote 22 hours 37 min ago:
          I wouldn't describe what happened here as incompetence. As a
          "carioca", I am pleasantly surprised to know that the
          government's IT department is involved in AI work - even without
          the budget to create its own models from scratch.
       
            antonvs wrote 10 hours 11 min ago:
            They could do AI work without trying to lie to the entire rest of
            the world.
       
            reese_john wrote 20 hours 23 min ago:
            It is a testament to the bloat and overreach of the Brazilian state
            in the economy. Such endeavors should be left to the private sector
       
              thimabi wrote 17 hours 31 min ago:
              I disagree. I'd prefer if my government invested more in AI
              solutions, so as not to depend so much on foreign technology.
              
              In an ideal world, Brazil would have a thriving private sector,
              capable of competing even in the AI sector. Unfortunately,
              that's not the case, and I believe that without government
              action such endeavors won't really succeed.
       
            arcticfox wrote 22 hours 30 min ago:
            This seems kind of insane though, every time I go to Rio I think of
            the potential of AI/technology to solve some problems and leave it
            even more paradisiacal... But working on their own model? Wtf?
            There are a million applications of existing ones there that should
            be followed up on instead.
       
          carlosjobim wrote 22 hours 50 min ago:
          Why would they care? They get their salaries and pensions and
          bonuses, and the tax payer is footing the bill.
       
          root-parent wrote 23 hours 2 min ago:
          You just described every single vibe coder...
       
            vvpan wrote 19 hours 46 min ago:
            I think that's unfair to "vibe coding". If anybody explicitly
            claims to be vibe coding something then they are admitting to low
            supervision of the code. And on the contrary you can also
            AI-produce code that you have supervised highly. I suppose there
            are people who both AI their code and push it as bespoke but I, for
            one, have not met such a person at or outside of work.
       
              root-parent wrote 19 hours 3 min ago:
              >> but I, for one, have not met such a person at or outside of
              work.
              
  HTML        [1]: https://news.ycombinator.com/item?id=48516679
       
        alfiedotwtf wrote 23 hours 14 min ago:
        Wasn't it already obvious given the awfully familiar parameter
        numbers?
       
          intoXbox wrote 21 hours 29 min ago:
          That only tells what base architecture they used, but fine tuning
          does not increase the number of weights, it just adapts the weights
          to perform better on a fine-tuning dataset, something they claimed
          they had done
       
        zinodaur wrote 23 hours 20 min ago:
        Oh no, someone is profiting off of their work without proper
        attribution!?!?
       
          s1artibartfast wrote 14 hours 23 min ago:
          How do you feel about the government or government contractors saying
          they did a bunch of work when they did nothing instead?
       
          Aurornis wrote 22 hours 27 min ago:
          This is an open weights model based on other open weights models.
          
          The dispute is that they released it with claims about having done
          some post training that improved the outputs. It was discovered that
          the model was not post trained like they claimed.
          
          The HF page now says it's a merge of models, which wasn't there
          before. They're trying to claim they accidentally uploaded the
          wrong model to HF and that they'll upload the real one soon.
          
          Basically, they thought they could splice two open weights models
          together and claim their team had accomplished some amazing post
          training, but they weren't smart enough to realize that other
          researchers would discover that there wasn't any post training.
       
            iknowstuff wrote 22 hours 10 min ago:
            How do they just splice two models together?
       
              ninja3925 wrote 21 hours 56 min ago:
              Out of curiosity, how was it discovered? You would have to look
              for it to find this linear combination.
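One hypothetical way to test for such a linear combination, once two candidate parent models are suspected, is a least-squares fit of a single mixing coefficient. This is only a sketch on toy weights and is not claimed to be the procedure used in the linked issue. The function names `flatten` and `fit_mix` and the dict-of-lists model format are assumptions for illustration.

```python
# Toy illustration only: a "model" is a dict mapping a parameter name
# to a flat list of floats. All names and values here are hypothetical.

def flatten(model):
    """Concatenate all parameters into one list, in a fixed (sorted) name order."""
    return [x for name in sorted(model) for x in model[name]]


def fit_mix(candidate, model_a, model_b):
    """Fit candidate ~= alpha * A + (1 - alpha) * B by least squares.

    Returns (alpha, residual_norm). A residual near zero means the candidate
    is explained almost entirely by mixing A and B.
    """
    c, a, b = flatten(candidate), flatten(model_a), flatten(model_b)
    d = [ai - bi for ai, bi in zip(a, b)]  # direction A - B
    r = [ci - bi for ci, bi in zip(c, b)]  # offset of the candidate from B
    denom = sum(x * x for x in d)
    if denom == 0.0:
        raise ValueError("models A and B are identical; alpha is undefined")
    alpha = sum(x * y for x, y in zip(r, d)) / denom
    residual = sum((y - alpha * x) ** 2 for x, y in zip(d, r)) ** 0.5
    return alpha, residual


model_a = {"w": [2.0, 0.0, 4.0]}
model_b = {"w": [0.0, 2.0, 0.0]}
# Build a suspect checkpoint that really is a 0.6 / 0.4 mix of A and B.
suspect = {"w": [0.6 * x + 0.4 * y for x, y in zip(model_a["w"], model_b["w"])]}

alpha, residual = fit_mix(suspect, model_a, model_b)
print(round(alpha, 6), round(residual, 6))
```

A checkpoint that had received real additional training would be expected to leave a clearly non-zero residual under this kind of fit.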
       
                jdiff wrote 20 hours 52 min ago:
                Without the system prompt, asking its name results in it
                responding with the name of the model they're ripping from.
                That would certainly draw your eyes to the right places.
       
                  valleyer wrote 20 hours 43 min ago:
                  Why is this?  Do labs reinforce the model name during
                  training?  I was under the impression that this sort of
                  "self-knowledge" always came from the system prompt, but I
                  guess not...
       
                    jdiff wrote 19 hours 48 min ago:
                    Yes. In this case, during fine tuning. Other blurbs are
                    also baked in during fine tuning that are perfectly
                    reproducible from the Nex model. The details inside the
                    linked issue are quite accessible.
       
                Aurornis wrote 21 hours 38 min ago:
                Check the linked GitHub issue. They explain their process.
                
                Scroll past the first issue to find it. It's further down.
       
              Aurornis wrote 22 hours 4 min ago:
              The Nex N2 model they merged is based on Qwen 3.5, so you can
              swap pieces of one into the other. They found a combination of
              the two that did well on some benchmarks and shipped it.
              
              In the early days of Llama there were a lot of experiments like
              this. There were even some interesting combinations of models
              where they stacked layers of different models together or even
              added more layers with interesting results.
              
              But announcing that you spliced two models together isn't very
              impressive in 2026, so they announced that they had done their
              own post training and outdid the big labs. They thought nobody
              would look close enough to notice.
       
            moritzwarhier wrote 22 hours 22 min ago:
            Thanks for the factual clarification. This is so important when
            everyone already has their trigger finger on politics. Not meaning
            that politics are irrelevant here, see sister comment by jobim.
            
            But it's impossible to form a nuanced opinion when political
            association has a higher priority than the facts; which, again,
            don't look flattering for the implementers.
       
          carlosjobim wrote 22 hours 53 min ago:
          This is a pure scam on tax payer money. But what else would be
          expected?
       
            hootz wrote 20 hours 59 min ago:
            Apparently no public money was involved.
       
              jdiff wrote 20 hours 50 min ago:
              This is contrary to the mayor's words on Twitter.
              
              > An open AI model trained in Rio with public funding over the
              last year by @Prefeitura_Rio surpassing all other models.
              
  HTML        [1]: https://x.com/CavaliereRio/status/2065984620626129026
       
            jrm4 wrote 22 hours 38 min ago:
            Unlike the big companies who do this, which often are merely impure
            scams on tax payer money a little more downstream.
       
              philipallstar wrote 21 hours 57 min ago:
              Companies that generate loads of corporation tax, income tax, and
              VAT revenue are the exact opposite of wastes of public money.
       
                jrm4 wrote 20 hours 36 min ago:
                Yes, when they do so proportional to what they take, especially
                as compared to individuals and their tax liabilities.
                
                You'll have to let me know when that finally happens, because
                that ain't now.
       
                  philipallstar wrote 11 hours 26 min ago:
                  Sorry, I've no idea how to read your first sentence.
                  
                  Your second one - that's how everything public is paid for.
                  Private individuals pay tax, either through their
                  corporations paying corporation tax or the tax bill on top of
                  their wage bills, which a) drives up prices of the goods and
                  services they offer, or depresses wages, and b) funds all the
                  public sector employees and orgs that don't pay tax (orgs) or
                  don't pay net tax (employees).
       
              carlosjobim wrote 22 hours 27 min ago:
              Great, now we're defending embezzlement and fraud with public
              funds on HN, because we really really hate big business.
              
              A child caught doing something bad will cry "but my friends also
              did it!", is that the level of reasoning hackers want to be at?
       
                lostlogin wrote 21 hours 56 min ago:
                > Great, now we're defending embezzlement
                
                I might be missing something, but I don't see anyone
                defending the scams.
       
                sdevonoes wrote 22 hours 2 min ago:
                There are no hackers around here anymore. HN is mainly about
                business  nowadays
       
                  dmix wrote 21 hours 28 min ago:
                  HN has always discussed business
       
                blanched wrote 22 hours 20 min ago:
                That seems like a bad faith read to me. Nobody is defending it,
                just pointing out the irony / hypocrisy. Two things can be bad,
                and they can be related.
       
                  carlosjobim wrote 16 hours 41 min ago:
                  You'd be surprised to hear then that I'm not the owner of any
                  big company which embezzles tax payer money, and have never
                  been involved in such.
       
                    blanched wrote 16 hours 20 min ago:
                    I don't follow how that makes sense as a response to what
                    I said?
       
                      carlosjobim wrote 16 hours 8 min ago:
                      Why would I be a hypocrite for pointing out public fund
                      embezzlement?
       
                        blanched wrote 16 hours 6 min ago:
                        You're not. The originally mentioned "big
                        companies" are.
       
                jrm4 wrote 22 hours 23 min ago:
                What part of that said "defense?"
                
                They can both be bad.
       
          bachmeier wrote 22 hours 58 min ago:
          "Their work"? First you had the original content creators that did
          99.99% of the work. Then you had the US companies bundle it up into a
          frontier LLM. Then "they" did the "work" of using the US model as a
          foundation for their own. So in the sense of doing 0.00001% of the
          actual work that went into their product, sure.
          
          I'd say it's more like someone forking a Linux distro, adding a few
          themes and fonts, and then complaining when someone else forks their
          distro and adds another theme.
       
            idiotsecant wrote 22 hours 44 min ago:
            Oof this is delete your post level I think. Sorry bud, I been
            there.
       
            JoshStrobl wrote 22 hours 48 min ago:
            That joke really went over your head, huh...
       
            bwilliams18 wrote 22 hours 51 min ago:
            That was the joke of the parent comment.
       
            harikb wrote 22 hours 51 min ago:
            It is only a problem if you claim it to be an independently
            developed OS with no attribution to base
       
            dghlsakjg wrote 22 hours 52 min ago:
            That's the joke.
       
              bachmeier wrote 21 hours 13 min ago:
              It isn't. The entirety of the comment I responded to is "Oh no,
              someone is profiting off of their work without proper
              attribution!?!?" It's a valid point, but references someone using
              content created by others for profit. I'm objecting to equating
              this project with the work done by the original content creators.
              They're not remotely the same thing.
              
              I understand how the internet works and how people respond to
              others in this type of setting, but the comment I replied to did
              not in any way make the point I was making about the
              disproportionate nature of relative contributions.
       
                vasco wrote 10 hours 29 min ago:
                > I understand how the internet works and how people respond to
                others in this type of setting,
                
                You should frame this as a reminder to be more charitable in
                your positions because sometimes you can be wrong. This
                subthread ended being one of the funniest I've read recently.
       
                dghlsakjg wrote 13 hours 18 min ago:
                > It isn't
                
                It is.
                
                > I understand how the internet works and how people respond to
                others in this type of setting, but the comment I replied to
                did not in any way make the point I was making about the
                disproportionate nature of relative contributions.
                
                Do you understand?
                
                Jokes aren't that funny when you have to dig into an
                explanation on the nuance of why the hidden meaning doesn't
                match the surface meaning in exact degree and proportions. That
                turns a joke into a pedantic comment. And paradoxically muddies
                the point by explaining it.
                
                We aren't morons. We understand that Picasso is doing
                something on a different level than someone feeding bulk
                scraped JPGs of paintings into a python script. You really
                don't have to explain.
       
                  bachmeier wrote 3 hours 32 min ago:
                  Have a nice day.
       
                idiotsecant wrote 18 hours 8 min ago:
                It's time to stop digging
       
          internet2000 wrote 23 hours 18 min ago:
          Attribution isn't the relevant part. Lying about your lab's
          capabilities is.
       
            themafia wrote 20 hours 29 min ago:
            It seems to me like the lies are both for the same reason.  To
            capture attention and profits that are not deserved.
       
            vips7L wrote 21 hours 0 min ago:
            Sounds like the whole AI movement.
       
            outside2344 wrote 22 hours 28 min ago:
            But the whole game is lying and stealing isn't it?
       
            adrian_b wrote 22 hours 58 min ago:
            I do not see anyone lying.
            
            The model card says:
            
            > Post-trained from Qwen 3.5 397B
            
            The model card also says that they use an inference framework based
            on "SwiReasoning: Switch-Thinking in Latent and Explicit for
            Pareto-Superior Reasoning LLMs" by Shi et al.: [1] So the sources
            seem properly attributed.
            
            They only claim that what they did to "Qwen 3.5 397B" has improved
            the LLM, including, as expected, with "strong performance in
            Portuguese".
            
  HTML      [1]: https://arxiv.org/abs/2510.05069
       
              petu wrote 22 hours 27 min ago:
              That's attribution to Qwen team.
              
              There (is/was) no attribution to Nex team (they've released a
              model based on Qwen 3.5 397B as well).
              
              As per OP link Nex claims that what Rio team released (so far) is
              just linear interpolation of weights between Nex and OG Qwen
              model. With no attribution to Nex and zero signs of Rio doing any
              training of their own.
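For readers unfamiliar with the term, linear interpolation of weights can be sketched as below on toy data. The function name `lerp_merge`, the parameter name `layer.weight`, and the dict-of-lists model format are assumptions for illustration. This is not the tooling either team used.

```python
# Toy illustration only: a "model" is a dict mapping a parameter name
# to a flat list of floats. All names and values here are hypothetical.

def lerp_merge(model_a, model_b, alpha):
    """Return alpha * A + (1 - alpha) * B, computed parameter by parameter."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name in model_a:
        wa, wb = model_a[name], model_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


finetune = {"layer.weight": [2.0, 4.0]}  # stands in for a fine-tuned checkpoint
original = {"layer.weight": [0.0, 8.0]}  # stands in for the original checkpoint

merged = lerp_merge(finetune, original, alpha=0.5)
print(merged)
```

The name and shape checks reflect the constraint raised elsewhere in the thread: this only works when both checkpoints share the same architecture.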
       
              00index wrote 22 hours 31 min ago:
              Are you talking about the credit that was just updated an hour
              ago? lol
       
            functionmouse wrote 22 hours 58 min ago:
            leopards ate my face
       
            Planktonne wrote 23 hours 9 min ago:
            That's also something all the AI companies have been doing.
       
              low_tech_love wrote 20 hours 44 min ago:
              They're using public money to "train" this.
       
              dofm wrote 22 hours 52 min ago:
              Lying about model capability is right now the lingua franca of
              the cloud AI business model, almost; they yes-and each other's
              lies because they are in a position of needing to generate
              interest, including going as far as needing to trigger regulatory
              capture.
              
              (It's not news to anyone who has worked in sales-led businesses
              that salespeople are prone to believing the claims of other
              salespeople, I guess).
       
                selcuka wrote 16 hours 51 min ago:
                > Lying about model capability is right now the lingua franca
                of the cloud AI business model
                
                Lying about your lab's capabilities != Lying about model
                capability
                
                Exaggerating the capabilities of a new model that you've
                actually trained in press bulletins can be called marketing.
                Merging two models and claiming that you trained a new model is
                plain lazy.
       
        AlienRobot wrote 23 hours 20 min ago:
        The model's webpage at [1] says it's a merge now. It previously didn't
        contain this paragraph:
        
        >The model is built via a merge of [2] and [3] , proceeded by On-Policy
        Distillation from a stronger model. We detected an incorrect upload in
        the previous version, where the base merged version was upload instead
        of the final distilled model. We are sorry for the confusion and
        apologize profusely.
        
        Incidentally are people using Github issues as blogs now?
        
  HTML  [1]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B
  HTML  [2]: https://huggingface.co/nex-agi/Nex-N2-Pro
  HTML  [3]: https://huggingface.co/Qwen/Qwen3.5-397B-A17B
       
          jonchurch_ wrote 22 hours 13 min ago:
          Edit: I didnt even notice until someone pointed out this was on the
          Nex-n2 repo not the rio one, now I understand the OP's confusion!
          
          It wasnt framed as an issue which is the norm breakage I think
          you're reacting to, as in they didnt ask that the readme be updated
          etc, but it is common now for folks to use a project's issue
          tracker to name and shame them in a place they cant easily ignore.
          
          Whether that's right, prosocial, or professional is up for debate
          (as well as if any single definition of etiquette can be expected in
          2026 on an issue tracker).
          
          But surely you can see the optics reason why someone would take their
          complaint to the repo directly? It pressures the maintainers to
          respond, it allows for a pile on from the internet, and makes any
          decision to lock down a hostile thread into its own kind of
          statement.
          
          The maintainers should absolutely post an official response and lock
          the thread though, it will likely get ugly in there.
       
            ChoosesBarbecue wrote 21 hours 38 min ago:
            But this is posted on Nex's GitHub, not on "Rio de Janeiro's"
            GitHub.
            
            i.e. this is the maintainer posting on their own GitHub Issues.
       
        AnotherGoodName wrote 23 hours 27 min ago:
        This is fascinating that it worked though. Can we just merge all the
        open weight models and get something better?
       
          vor_ wrote 15 hours 27 min ago:
          Merging related models has been a very common practice for years. See
          the Stable Diffusion community.
       
          nylonstrung wrote 22 hours 1 min ago:
          If you go to Civitai this is pretty much how it works in that corner of
          the image generation world
          
          Everything is using Stable Diffusion as underlying model, then most
          of the usage is merges of checkpoints
       
          avereveard wrote 22 hours 54 min ago:
          most merges improve a small subset of "feeling" benchmarks (too small,
          too specific, or out of distribution) and tend to show degradation on
          actual benchmarks, with especially punishing results on long-chain
          benchmarks.
          
          also only work on matching architectures (i.e. finetunes/loras of the
          same model)
       
          dindunuf wrote 23 hours 0 min ago:
          that kinda worked in llama 1/2 era, not between different models but
          between finetunes of the same model. the briefly legendary Mythomax
          was IIRC a merge of 5+ tunes, some of which were merges themselves.
       
          wds wrote 23 hours 16 min ago:
          I imagine it'd work the same as merging all the good-tasting foods to
          get an even tastier one
       
          _3u10 wrote 23 hours 21 min ago:
          No, they need the same arch, but you can distill them into a single
          model. And yes, if you use the API directly Claude will often say
          it's an open weight model (likely the ones it was distilled from)
       
        unrvl22 wrote 1 day ago:
        The municipality of Rio de Janeiro (via its IT company IplanRIO)
        released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune
        that beats comparable open models on benchmarks. The linked issue
        argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40%
        Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.
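        
        For illustration only: a weighted merge of this kind amounts to a
        per-parameter linear interpolation between two checkpoints that share
        the same parameter names and shapes. A minimal sketch, assuming toy
        checkpoints stored as dicts of float lists; weighted_merge, the
        parameter names, and the values are hypothetical stand-ins, not the
        procedure described in the linked issue.

```python
def weighted_merge(state_a, state_b, weight_a):
    """Linearly interpolate two toy checkpoints (dict: name -> list of floats).

    Hypothetical helper for illustration; real checkpoints hold tensors.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    # Merging only makes sense when both checkpoints share an architecture.
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged = {}
    for name, values_a in state_a.items():
        values_b = state_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(values_a, values_b)
        ]
    return merged


# Toy stand-ins for a fine-tune and the base model it derives from.
finetune = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
base = {"layer.weight": [0.0, 4.0], "layer.bias": [1.0]}

# 60% of the fine-tune, 40% of the base.
merged = weighted_merge(finetune, base, 0.6)
```

        Two unrelated architectures would fail the name or shape checks,
        which is why a merge like this needs models derived from the same
        base.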
       
          vasco wrote 10 hours 36 min ago:
          Rio better have the best IT infrastructure and software in the world
          if they are spending time on LLMs. What a waste of taxpayer money.
       
            vitorgrs wrote 9 hours 36 min ago:
            Piaui state is also doing an LLM, it seems. But indeed it would
            make more sense if it was a national thing rather than local...
       
          DonsDiscountGas wrote 22 hours 25 min ago:
          I didn't know model merging like that was possible. (Obviously
          possible from a pure software standpoint but I'm surprised it's
          effective)
       
            baobabKoodaa wrote 2 hours 52 min ago:
            A few years back these used to be called "Frankenstein models"
       
            hypercube33 wrote 12 hours 28 min ago:
            Even merging models with themselves works, as shown in the post
            about how they got to the top of Hugging Face with two GPUs
       
            bwhitty wrote 21 hours 22 min ago:
            As another poster above linked, it's been shown to be effective
            since 2022:
            
  HTML      [1]: https://arxiv.org/abs/2203.05482
       
              nightpool wrote 18 hours 32 min ago:
              it works because Nex N2 is also a derivative of the original base
              Qwen model. If it was two completely unrelated models it wouldn't
              work.
       
          Lucasoato wrote 22 hours 45 min ago:
          So the problem isn't in the missing attribution to Qwen, but with
          the fact that they didn't mention Nex-N2 Pro, right?
       
            Aurornis wrote 22 hours 25 min ago:
            The problem is that they claimed to have made a big achievement
            with their home grown post training, and they expected to receive a
            lot of praise for it.
            
            Then researchers looked at the weights and there is no post
            training at all.
            
            They are now attributing both models they merged, but their excuse
            for the lack of post training is to claim they accidentally
            uploaded the wrong files.
       
              serial_dev wrote 21 hours 12 min ago:
              I'd believe they accidentally uploaded the wrong files if they
              uploaded the correct ones. To state that they accidentally
              uploaded something else and then not upload the correct version
              means they probably do not have anything and either hope people
              forget about this or they are scrambling to have something that
              is at least close to their original claim.
       
                evilduck wrote 16 hours 10 min ago:
                "Oops, we uploaded the wrong files" is the standard deflection
                every time people like this get caught.
                
                Look up the "Reflection 70B" drama.
       
       