_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model
nicman23 wrote 4 hours 57 min ago:
is it any good?
RandyOrion wrote 5 hours 2 min ago:
Please do not claim you trained a new model, only to get caught
red-handed by others. Several people or groups have already done
that, got caught, and vanished in no time.
Check how the "authors" of "this model" react to this problem [1]. See
how they deal with this problem by first changing their affiliation
from [1] to [2], then saying that they are sorry for being caught [3],
then just removing all their affiliations once and for all [4].
I think the "authors" of "this model" [5] should be held accountable
until they upload new checkpoints, and the performance of the new model
is verified by third-parties.
P.S. To people who downvoted me, show me why you're doing this. [1] [3]
[2] [3] [4] [5]
HTML [1]: https://iplanrio.rio.rj.gov.br
HTML [2]: https://iplanrio.prefeitura.rio
HTML [3]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
HTML [4]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
HTML [5]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
HTML [6]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit...
HTML [7]: https://huggingface.co/prefeitura-rio
blitzar wrote 6 hours 14 min ago:
It's stupid and hilarious when someone in Rio does it; when a techbro in
Silicon Valley does it they get VC funding, a Maserati and an entry on
the 30 under 30 list.
rgbrth wrote 5 hours 22 min ago:
I don't think people are saying it's stupid. It's just funny that
potentially some random municipality worker is going well beyond
their work scope and making contributions in the AI world.
Could be from Rio, could be from any municipality anywhere in the
world. The fact that the account is actually from the town hall
rather than a personal account also makes it funnier.
jkwang wrote 9 hours 57 min ago:
This is a concerning pattern. Rebranding merged models as "homegrown"
without disclosure undermines trust in open-source AI development. The
community needs better provenance tracking and transparency standards
for model releases.
FooBarWidget wrote 10 hours 36 min ago:
Can anyone explain to me what a merge is and why that works? It seems
utterly bizarre to me that you can just merge weights. You can't make a
working program by just merging machine instruction pages. Aren't
weights tightly coupled to a specific architecture?
antonvs wrote 10 hours 13 min ago:
In this case both sets of weights ultimately came from the same
model. The Nex model they used is a fine-tune of Qwen, which was the
other model they used.
I'm not an expert in this area, but it's not too hard to see how a
merge like that could turn out ok.
thelonelyborg wrote 15 hours 51 min ago:
this is probably occurring all over the world including in startups.
aaronbrethorst wrote 16 hours 51 min ago:
They really missed out by not calling it Neuromancer.
pelasaco wrote 20 hours 8 min ago:
an eternal 7x1.. and I am not talking about Curaçao..
rafaquintanilha wrote 20 hours 19 min ago:
I have no affiliation with them but here's what I think happened:
1. They claim the official model is based on Qwen 397B. It's likely
they didn't disclose Nex Pro at all because Nex itself is based on the
same base model (not saying they shouldn't).
2. The improvement would come from merging the weights PLUS on-policy
distillation. The confusion is that the uploaded model didn't have the
distillation at all.
3. It's important to notice they didn't advertise the model besides
posting it on Reddit 2 days ago. It became viral organically, over the
weekend, and during Brazil's World Cup debut (Brazilians will
understand). Of course the mayor of Rio took the opportunity to
capitalize over the free coverage, but that wasn't done in conjunction
with the researchers.
4. I don't see why they would disclose Qwen 397B as base and mention
the SwiReasoning paper but not mention Nex if all they did was to merge
both models.
5. In any case, what they are claiming is easily verifiable once (if)
they upload the right model.
motbus3 wrote 53 min ago:
It seems to me this is clearly a mistake. They would not even have
the resources for it as far as I know, and I think they are not even
in a position to make such bold claims.
s1artibartfast wrote 14 hours 26 min ago:
My understanding is that they didn't do any distillation. Every weight
is a 60/40 element-wise average of Qwen and Nex. Is this possible if
the Rio contractor did their own post-training as claimed?
HTML [1]: https://x.com/tenobrus/status/2066243352211996728/photo/1
smus wrote 14 hours 46 min ago:
What do you mean World Cup debut? haven't they won 5?
alxndresp wrote 14 hours 5 min ago:
They meant their first, opening game of this current World Cup
tournament
Aurornis wrote 16 hours 16 min ago:
> 2. The improvement would come from merging the weights PLUS
on-policy distillation. The confusion is that the uploaded model
didn't have the distillation at all.
They merged the base model with another lab's fine-tuned model. The
improvements could have come from getting some of the fine-tuned
weights from the other model.
If they really had a better performing model that they
"accidentally" forgot to upload, they could have uploaded the
correct file by now.
croes wrote 10 hours 42 min ago:
Seems they did
HTML [1]: https://news.ycombinator.com/item?id=48529544
ipieter wrote 10 hours 18 min ago:
I only see an edit to the readme (13h ago) and removal of the
weights, so the repo is now empty.
I am willing to give them the benefit of the doubt, but we've
seen this before: a model gets released that is supposedly
state-of-the-art, yet seems to be another repackaged model
without any training. Reflection 70B was the most similar
example, all they now need is an api that rewrites "Claude" to
"Rio".
matheusmoreira wrote 19 hours 21 min ago:
I'm honestly impressed that this even happened at all. "Rio de
Janeiro's homegrown LLM" is probably the last headline I ever
expected to read on HN.
airstrike wrote 15 hours 56 min ago:
Worth reminding everyone that Lua was also created in Rio, though
admittedly at PUC rather than by the government.
Rio has a strong engineering talent pool, along with many other
major capitals in Brazil
mathattack wrote 12 hours 52 min ago:
Yes. Though even more than the US, their engineering talent from
top schools heads into consulting and finance.
matheusmoreira wrote 15 hours 38 min ago:
Brazil does have talent. Mauro Carvalho Chehab is a Linux kernel
maintainer. Elixir was created by José Valim, a Brazilian. I
have also created my own programming language.
What Brazil doesn't have is a history of properly rewarding
talent, which often causes it to migrate elsewhere. So it's
definitely surprising when any sort of technological development
happens in Brazil: it implies someone who stayed managed to get
something done, most likely for much less than what that
something is actually worth, while also being crushed by
extremely high taxes that essentially double the cost of
computer hardware.
red-iron-pine wrote 1 hour 58 min ago:
> extremely high taxes that essentially double the cost of
computer hardware.
I think people are missing the last few words -- cost of
computing hardware
when I used to do ISP work I did a lot for LATAM. The joke was
that you'd get better bandwidth for Brazil routing out of the
country and through Miami than going across the country. The
reason? crazy high tariffs on hardware.
No reason to base anything locally, and if you're not basing it
locally then there isn't really much reason to stick around,
either. Go to other hot markets like Zona America, Austin,
CDMX, Miami, Los Angeles, etc. and make the big $$$.
I worked with 2 Brazilian engineers who were in country (and
currently work with a 3rd now, based in Montreal) and they were
very good but all said they had to get out of country to lock
in the serious engineering roles.
jdahlin wrote 6 hours 50 min ago:
Brazil has the opposite of high taxes, especially for company
owners. I remember paying 6% on income, compared to up to 70%
in Sweden.
rbanffy wrote 8 hours 42 min ago:
> extremely high taxes
I always find this funny. Brazilian taxes are nowhere near what
I would call "high". I pay about twice as much out of my
compensation as I would pay in Brazil, and that would be as if
I did zero tax optimisation back then.
persedes wrote 33 min ago:
Parent was referring to the cost of hardware. I've had
colleagues from Brazil visit the US and go absolutely crazy
at Best Buy to grab as much hardware as they could (laptops,
Nintendo Switch, etc.), because it's prohibitively expensive
for them to buy that at home.
rglullis wrote 7 hours 4 min ago:
As an employee: your taxes are not that high, but public
services are terrible so most of middle-class ends up paying
for the private alternative as well.
As a business owner: not so bad if you are freelancing or
just a few business partners providing some type of service,
but terrible the moment you start considering employing other
people.
rbanffy wrote 6 hours 48 min ago:
> but public services are terrible
Have you seen the public services of countries with lower
taxes? Their public hospitals?
> but terrible the moment you start considering employing
other people.
Employing people isn't cheap anywhere (except, perhaps, in
the US, where labour rights are kind of nonexistent)
rglullis wrote 6 hours 24 min ago:
I live in Germany. No such thing as public hospitals. And
I pay close to 1200€/month in health insurance to the
public insurance company.
A quick visit to the dermatologist to check for some tiny
bumps that showed up on my forehead: 60€, out of
pocket, because the insurer doesn't cover it.
rbanffy wrote 1 hour 36 min ago:
Sad to hear about that. Ireland is much better in that
regard - you can pay for private healthcare and it'll
provide you a broader network, but you might as well go
for public health, where you'll be prioritized based on
how life-threatening your condition is.
rglullis wrote 1 hour 11 min ago:
Yeah, I make it sound worse than it seems. The
problem of the public insurance is that you pay based
on your revenue instead of your actuarial risk, so in
the end it should be treated as an extra form of
revenue tax. I could go for the private insurance if
I wanted to pay less, but then I'd have to switch my
kids to the private insurer as well.
All in all, my point was only that the amount of
taxes that people pay and quality of services are not
necessarily related. Germany has high taxes and
expensive-but-adequate healthcare. Greece has high
taxes and expensive-and-inadequate healthcare.
Switzerland has low taxes and universal/cheap
healthcare (max. $5000/year deductible, max charge
per hospitalization of $700).
fabioz wrote 7 hours 37 min ago:
I can second this.
Compared to many countries Brazil doesn't have such high
taxes (I'd say that if you work remotely for a company
outside of Brazil, you'll probably have much lower taxes
compared to almost any other country -- working locally the
difference isn't as big, but you have higher taxes in many
other places).
What it really lacks is access to capital (which is the real
"mojo" of the US compared to the rest of the world).
iterateoften wrote 2 hours 27 min ago:
Also the bureaucracy, employee rights, etc.
Incorporating and getting a functional business entity in
Brazil is harder. In the USA I literally do it in 5 minutes online,
including the bank account. In Brazil they are taking out
microscopes to verify that your signature on the paperwork
matches.
And in the USA if you have one bad employee, you can just fire them
any time. In Brazil, for better or for worse, it's nowhere near as
easy. Obviously better for employees, but businesses don't
like it because you can get stuck with an employee dragging
down everyone unless you pay them a year's salary etc.
cscheid wrote 18 hours 22 min ago:
Yes! That "prefeitura do Rio" huggingface URL is definitely
shocking to read to this Brazilian as well (I'm assuming you and
parent also are from your usernames).
throwa356262 wrote 19 hours 55 min ago:
Regarding #2
HTML [1]: https://news.ycombinator.com/item?id=48529544
xiphias2 wrote 12 hours 9 min ago:
This should be at the top: they uploaded the wrong model, they
fixed it
jwitthuhn wrote 9 hours 22 min ago:
They did upload the wrong model but as of the time of writing
they have not fixed it. Right now, 12 hours after they took the
old one down, there is simply no model present in their
huggingface repo.
xiphias2 wrote 7 hours 44 min ago:
I guess they will upload it later, it seems like an honest
mistake to me.
Anyways SwiTransformer paper looks interesting and doing a post
training to optimize for it looks interesting as well.
delusional wrote 20 hours 40 min ago:
It's absolutely insane to me that we are now at a point where the top
of the front page of hacker news is a random GitHub issue about
attribution to some random LLM merge, written in just the most
disgusting AI slop style.
I would like to downvote this please.
vor_ wrote 15 hours 24 min ago:
There's been a noticeable drop in quality. It's often a blend of AI
culture war posts and arbitrary Github links.
Havoc wrote 21 hours 34 min ago:
Nex in turn is also based on Qwen so don't think they're too far
off
diego_moita wrote 21 hours 49 min ago:
WHAT!? There are thieves in Rio de Janeiro?
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de
bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
jordz wrote 21 hours 57 min ago:
Can someone please explain or link to some information about how models
are merged? Is this genuinely merging weights mathematically or some
kind of distillation (presumably not if they've done zero training as
the post suggests).
jxmorris12 wrote 10 hours 19 min ago:
There's nothing to read.
Model A: A_1, ..., A_n
Model B: B_1, ..., B_n
C_i = A_i * p + B_i * (1 - p)
In other words, it's just a linear combination of the other
models' weights, per position.
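A minimal sketch of the formula above in plain Python, assuming each model is a dict that maps a tensor name to a flat list of floats. The names `merge_linear`, `model_a`, `model_b` and the toy values are hypothetical; real checkpoints hold multi-dimensional tensors, but the per-position arithmetic is the same.

```python
def merge_linear(model_a, model_b, p):
    """Return C where C[name][i] = p * A[name][i] + (1 - p) * B[name][i]."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have the same tensor names")
    merged = {}
    for name, a in model_a.items():
        b = model_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch in {name}")
        # Per-position linear combination of the two weight lists.
        merged[name] = [p * x + (1 - p) * y for x, y in zip(a, b)]
    return merged


# Toy "models" (hypothetical values): tensor name -> flat list of weights.
model_a = {"layer0.w": [1.0, 2.0], "layer0.b": [0.0]}
model_b = {"layer0.w": [3.0, 4.0], "layer0.b": [1.0]}
merged = merge_linear(model_a, model_b, p=0.5)
```

With `p=0.5` this is a plain average; the 0.6/0.4 blend discussed in this thread would correspond to `p=0.6`.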
joe_the_user wrote 9 hours 56 min ago:
It's been a while since I looked at neural networks in detail. Do
all the large models have a close enough architecture that this
makes sense? Do they have the same number of layers and width? I
had thought that each model had its own "secret sauce" of normal and
special layers (convolution, max-pooling, something-something)
stacked together. Genuinely curious.
calebkaiser wrote 21 hours 51 min ago:
This is a good starting point: [1] But yes, in general, merging
refers to techniques that directly blend the weights of different
models mathematically. It had a big moment of popularity ~2 years
ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella
as things like "abliteration", or other techniques that surgically
modify the weights of a model without a traditional training/tuning
loop. Maxime Labonne is a great person to follow if you're interested
in this general area.
HTML [1]: https://huggingface.co/docs/peft/developer_guides/model_merg...
hintymad wrote 22 hours 7 min ago:
> Every weight tensor in Rio is, to thousands of standard deviations,
the same 0.6/0.4 blend of Nex and Qwen - across all 60 layers and
every component of the network. Other finetunes cannot be explained as
interpolations.
I find it amazing how robust the current deep learning models are. A
simple linear combination of every weight did not degrade the
performance of the model, but enhanced it.
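One way such a blend could be detected, sketched under assumptions: if a tensor C really is p*A + (1 - p)*B, then C - B = p*(A - B), so a one-parameter least-squares fit recovers p and leaves a residual near zero. This is not the analysis from the linked issue; `fit_blend` and the toy vectors are hypothetical, and which of the two models carries the 0.6 weight is assumed here.

```python
def fit_blend(a, b, c):
    """Least-squares p for c ~ p*a + (1 - p)*b, plus the largest residual."""
    d = [x - y for x, y in zip(a, b)]  # A - B
    t = [z - y for z, y in zip(c, b)]  # C - B
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("a and b are identical; p is undetermined")
    p = sum(x * y for x, y in zip(d, t)) / denom
    # How far each position is from the fitted straight-line blend.
    residual = max(abs(y - p * x) for x, y in zip(d, t))
    return p, residual


# Toy tensors (hypothetical values); "rio" is built as an exact 0.6/0.4 blend.
nex = [0.5, -1.0, 2.0]
qwen = [0.0, 1.0, 1.0]
rio = [0.6 * x + 0.4 * y for x, y in zip(nex, qwen)]
p, residual = fit_blend(nex, qwen, rio)
```

A tensor that had gone through further training would generally leave a residual well above floating-point noise, and the fitted p would not be identical across tensors.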
Davidzheng wrote 10 hours 22 min ago:
it's interesting that this was even guessed at
Davidzheng wrote 10 hours 19 min ago:
ok I guess they had other clues then if you do any sort of
comparison vs Nex & Qwen probably a lot of weird coincidences will
show up if somehow the three weights are not linearly independent
lol
itkovian_ wrote 11 hours 31 min ago:
This is called linear mode connectivity and seems to work for almost
every large model. So well that in most cases it's an explicit part
of the training process; do many training "branches" then merge
then continue.
It is not understood why it works so well.
teravor wrote 9 hours 36 min ago:
is that actually how they train them in the datacenter? the
trillion sized weight vector gets cloned and sent off to groups of
GPUs and averaged after?
tarruda wrote 17 hours 41 min ago:
What I find fascinating is the idea that there might be a set of
"secret" tweaks that when applied to those weights (or even smaller
models) could result in an intelligence simulation that could vastly
surpass even something like Fable.
moritzwarhier wrote 19 hours 40 min ago:
If this is true, it really would be impressive.
themafia wrote 20 hours 30 min ago:
> A simple linear combination of every weight did not degrade the
performance of the model, but enhanced it.
Which could be a signal that your "performance" was so abysmal in the
first place that even randomly applied training methods can't make it
_worse_.
kristjansson wrote 20 hours 38 min ago:
HTML [1]: https://thickets.mit.edu
meindnoch wrote 21 hours 0 min ago:
It shows that LLMs are an extremely wasteful approach to
intelligence.
antonvs wrote 13 hours 6 min ago:
Compared to what?
kristjansson wrote 20 hours 36 min ago:
or that intelligence is merely the composition of many redundant,
lossy, ~random components
Aurornis wrote 21 hours 9 min ago:
> A simple linear combination of every weight did not degrade the
performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an
improvement, then ship it. There are a lot of fine tunes and chimera
models on HuggingFace that are supposedly better at some specific
test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove
censorship. They succeed in getting the model to emit previously
censored outputs, but the overall output quality decreases.
monster_truck wrote 17 hours 51 min ago:
I don't think your last point is correct. Ablation, when done
correctly, seems to increase the quality and typically also the
performance too.
antonvs wrote 10 hours 32 min ago:
I'm curious about where you got that idea from. Neither the
theory nor the available examples support it. If it did, everyone
knowledgeable would be using abliterated models.
tredre3 wrote 13 hours 13 min ago:
That is something often claimed by heretics. My experience
couldn't diverge more, however. All heretic (and abliterix)
models I've tried are worse than the original. It's not
immediately obvious if all you do is ask 2-3 questions and marvel
at how it didn't refuse, but try using them for real over longer
8k+ contexts and it falls apart real fast.
They're more prone to getting stuck in loops, becoming
unresponsive, and hallucinating more (presumably because of the
reduced desire to not answer).
I've tried all the popular heretic peddlers, but if you have one
that you can vouch for maybe I've simply missed it.
Aurornis wrote 16 hours 46 min ago:
Abliteration is a brute force technique that removes or silences
parts of the model. It reduces performance because the
abliterated elements aren't perfectly isolated to censorship so
other aspects suffer.
Many of the "uncensored" model providers also do some fine
tuning on the models. Some of them target better benchmarks or
other measures, but outside of the benchmarks and metrics
they're fine tuned for they are generally noticeably worse than
the original model.
yowlingcat wrote 15 hours 12 min ago:
The kind of abliteration you are mentioning is no longer state
of the art or the most common form of removing the refusal
layer in most models. Your understanding was up to date
about a year and a half ago, but has been out of date since
then.
avadodin wrote 7 hours 13 min ago:
What OP is describing wasn't called abliteration at all.
Abliteration, whilst a neologism, implies a surgical ablation
of refusal.
Earlier approaches post-trained the model to refuse less
and, much like other kinds of fine-tuning, it degraded
performance. They were "uncensored".
Abliteration has seen some improvement to this day but it
always was close to equivalent performance to the original
when compared to those earlier techniques.
weitendorf wrote 11 hours 24 min ago:
Unrelated but I've been putting off learning about
post-abliteration technique and want to use it for an
upcoming open source "retraining" project I have on my
backlog. I'm not interested in the refusal layers though,
more like deep fine tuning but in a way that might let me
prune out or consolidate layers, if that makes sense? Do you
have any pointers or links to the current SOTA in this area?
I guess I'm looking for a kind of bulk/sticky dropout
(which was in fashion way back when I studied DNN in school).
ls612 wrote 14 hours 15 min ago:
Nowadays it is that Heretic tool, is it not? I've seen Gemma
models uncensored with it.
manquer wrote 18 hours 23 min ago:
> game is to turn knobs until you get a benchmark run that shows
an improvement, then ship it
i.e. reinforcement learning against a weak reward function: the
benchmark is insufficiently complex and not sufficiently
representative of the real world.
The "game", i.e. the decision tree, can be modeled as a multi-armed
bandit problem: deploying finite resources (compute) toward
exploitation/exploration.
The main issue is that each training/fine-tune run is very expensive,
so the number of pulls at the slot machine, so to speak, is pretty
limited today.
andai wrote 20 hours 50 min ago:
They seem to have deleted most of the README now, but the archived
version has benchmarks. [1] And the Nex benchmarks for comparison
[2] Rio seems to be about halfway between Qwen 3.5 and Nex, as
you'd expect?
HTML [1]: https://web.archive.org/web/20260614082641/https://hugging...
HTML [2]: https://huggingface.co/nex-agi/Nex-N2-Pro
x312 wrote 21 hours 10 min ago:
This works because Nex itself is a finetune of Qwen3.5 ( [1] ). It's
merging Qwen3.5 with a Qwen3.5 finetune.
I don't believe this would work on two LLMs that have different
pretraining. Even if it did, you would need two LLMs that have the
exact same internal activation shapes, dimensions, expert counts, and
token vocabulary; realistically that would never happen outside of
finetunes or academic experiments.
HTML [1]: https://huggingface.co/nex-agi/Nex-N2-Pro
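The structural precondition described above can be sketched as a simple check, assuming each checkpoint is summarized as a dict from tensor name to shape tuple. `merge_incompatibilities` and the example shapes are hypothetical. Matching shapes are necessary for a position-wise merge, not sufficient: models with different pretraining can still merge into nonsense.

```python
def merge_incompatibilities(shapes_a, shapes_b):
    """List reasons two checkpoints cannot be merged position by position."""
    problems = []
    for name in sorted(set(shapes_a) | set(shapes_b)):
        if name not in shapes_a or name not in shapes_b:
            problems.append(f"{name}: present in only one model")
        elif shapes_a[name] != shapes_b[name]:
            problems.append(f"{name}: {shapes_a[name]} vs {shapes_b[name]}")
    return problems


# Hypothetical shapes: different vocabulary size and one extra layer.
shapes_a = {"embed": (1000, 64), "layer0.w": (64, 64)}
shapes_b = {"embed": (1200, 64), "layer0.w": (64, 64), "layer1.w": (64, 64)}
problems = merge_incompatibilities(shapes_a, shapes_b)
```

A fine-tune and its base model pass this check trivially, which is why merges in practice are almost always between relatives of the same base.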
hashmap wrote 19 hours 32 min ago:
not this exact thing, no, because the functional circuits don't
appear in the same places across models. but if you find where they
are you can do something like branch between some of the middle
functional circuits between models and it kinda just works, or even
do one after the other. you can't just like swap any two layers
cause a bunch of em bend hyperbolic curvature to do hierarchical
stuff deep in the poincare ball and the geometries get all bonkers,
but before and after they do that things are relatively flat, and
the geometries are more or less transferable up to rigid rotation
if they're each trained on large enough data.
oofbey wrote 19 hours 53 min ago:
Correct. We used to think that because NN optimization is
non-convex there are all these local minima. Now we know that once
you get past the very early parts of training from random init, the
loss surface is fairly smooth, and not really convex, but close
enough in a bunch of ways - linear combinations of trained models
are pretty much always valid combinations. You can think of fine
tunings as deltas on the original model which can be summed
together successfully. I think this paper first showed that to me:
[1] which was 8 years ago now.
HTML [1]: https://arxiv.org/pdf/1802.10026
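The "deltas" view in the comment above can be sketched as follows, assuming flat weight lists and hypothetical names (`delta`, `apply_deltas`, `ft_a`, `ft_b`). It also shows that a 0.6/0.4 blend of a fine-tune with its own base is the same thing as adding 0.6 of the fine-tune's delta to the base.

```python
def delta(finetuned, base):
    """Difference between a fine-tuned model and the base it started from."""
    return [f - b for f, b in zip(finetuned, base)]


def apply_deltas(base, deltas, scale=1.0):
    """Add each (scaled) delta on top of the base weights."""
    out = list(base)
    for d in deltas:
        out = [w + scale * x for w, x in zip(out, d)]
    return out


# Toy weights (hypothetical values): two fine-tunes of the same base.
base = [1.0, 1.0, 1.0]
ft_a = [1.5, 1.0, 1.0]
ft_b = [1.0, 1.0, 0.5]

# Summing both deltas onto the base.
combined = apply_deltas(base, [delta(ft_a, base), delta(ft_b, base)])

# base + 0.6 * (ft_a - base) equals 0.6 * ft_a + 0.4 * base.
scaled = apply_deltas(base, [delta(ft_a, base)], scale=0.6)
blend = [0.6 * f + 0.4 * b for f, b in zip(ft_a, base)]
```

Whether summed deltas from unrelated fine-tunes stay useful is an empirical question; the sketch only shows the arithmetic.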
woadwarrior01 wrote 21 hours 46 min ago:
It's a well-known idea [1], although it's still surprising that
something so simple even works.
HTML [1]: https://arxiv.org/abs/2203.05482
kolanos wrote 21 hours 21 min ago:
This team could have stopped here and still had something
interesting (albeit not novel) to show. But the hype cycle was too
tempting.
jrm4 wrote 22 hours 36 min ago:
"Well, Steve (Jobs), I think it's more like we both had this rich
neighbor named Xerox, and I broke into his house to steal the TV set,
but I found out that you had already stolen it."
-- Bill Gates
ckcheng wrote 22 hours 2 min ago:
What's more funny to me is the setup to that quote:
> Bill Gates had somehow manifested, alone, surrounded by ten Apple
employees. ... Steve started yelling at Bill, asking him why he
violated their agreement.
And what's more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in
1988, but they eventually lost on a technicality (the judge ruled
that Apple inadvertently gave Microsoft a perpetual license to the
Mac user interface in November 1985).
Microsoft didn't steal Apple's GUI ... Apple gave it to them.
themafia wrote 20 hours 26 min ago:
Two spoiled rich kids arguing over whose morality is the least
worst.
That this moment is held up as some great exchange in business is
annoying. That our regulatory agencies are perennially asleep at
the switch and allow this nonsense to keep happening is extremely
frustrating.
ChrisClark wrote 19 hours 45 min ago:
Held up as some great exchange? No it's two assholes arguing
with each other. Just like most Jobs documentaries show him as a
terrible person.
alexgoodhart wrote 20 hours 36 min ago:
That isn't fully true, is it?
Microsoft claimed that its software's use of various
visualizations related to window state was covered by the 1985
agreement, and Apple claimed that this was not true; those window
states were produced by Macintosh while Microsoft's software was
being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider
whether the visual displays in issue were generated by the
Microsoft application programs or by the Macintosh system software.
The point arose in connection with Microsoft's argument that the
1985 Agreement licensed to Microsoft all visual displays that could
possibly be called up by running the five Microsoft application
programs on the Macintosh system software then or in the future.
709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's
contention would "defy common sense." Id.
wunderlotus wrote 22 hours 19 min ago:
lmao i really hope this is a real quote cuz it's a banger
ckcheng wrote 22 hours 9 min ago:
Apparently:
HTML [1]: https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html
yieldcrv wrote 23 hours 0 min ago:
Didn't the last thread about this have someone from the lab or an
enthusiast in Rio saying exactly that?
It's a fine-tune of Qwen
Not a conspiracy
daemonologist wrote 22 hours 41 min ago:
The allegation here is that it's not actually a fine-tune of Qwen,
but instead an undisclosed mashup (merge) of someone else's fine-tune
of Qwen and the original model. Rio subsequently said that the model
was in fact a merge, that they did additional fine-tuning after the
merge, and that they accidentally uploaded the base merge instead of
the version with additional fine-tuning. But this seems like quite
an oversight...
yieldcrv wrote 21 hours 4 min ago:
> But this seems like quite an oversight...
Not to me, what would people like to happen? Who are those people?
And why do they care?
antonvs wrote 10 hours 7 min ago:
They made a public claim to having produced a useful model, which
they published. Turns out they did nothing of the sort.
> why do they care?
Why does anyone ever care about having their time wasted by
fraudulent claims?
yieldcrv wrote 5 hours 19 min ago:
Continue to explain like I'm 5 instead of the rhetoricals
fkozlowski wrote 23 hours 3 min ago:
I'm honestly surprised that they even had the inclination to attempt
creating a model. I guess it's bullish that a municipal IT department
had the guts to try this?
axus wrote 21 hours 18 min ago:
I like the [dead] comment theory that they proposed a huge LLM
training budget to the government, kept most of the money, and
released a cheap merge to justify the grift.
fkozlowski wrote 15 hours 52 min ago:
Ah that makes sense
dormento wrote 17 hours 43 min ago:
This would be so very Brazilian of them.
Source: am Huelander.
seba_dos1 wrote 19 hours 1 min ago:
It's kinda weird to claim extraordinary results in such case
though, as that brings a lot of eyes to it.
mgambati wrote 18 hours 3 min ago:
Nothing weird. The mayor wanted something to brag about. That's Rio,
my friend.
matheusmoreira wrote 19 hours 11 min ago:
That's essentially Brazil's standard operating procedure. Wouldn't
be surprising if that turned out to be the case.
Still, I'm actually impressed that this even happened at all. "Rio
de Janeiro's homegrown LLM" is the last headline I expected to read
on HN.
Havoc wrote 21 hours 33 min ago:
Merges and fine tunes are within reach of individuals with some money
to burn so I'm sure a muni can do it
MadrasTh0rn wrote 23 hours 7 min ago:
Not surprised
nom wrote 21 hours 54 min ago:
why not?
diego_moita wrote 21 hours 46 min ago:
It is a recurrent Brazilian meme: Rio is known in Brazil as "terra
de bandido" (gangster's land).
The majority of their politicians have ties to organized crime.
There is a virtual revolving door between police and crime, where
people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or
Moscow and Culiacán (Sinaloa, Mexico) today.
dormento wrote 17 hours 40 min ago:
Rio is kinda funny as a litmus test - federal government creates
laws to try and curb some of the corruption, and Rio produces
better and better corrupt officials - so far Rio is winning.
BTW wasn't it a few months ago the current governor wanted to
leave to be able to run as a candidate, so he asked a supreme
justice to step in as governor, since there wasn't anyone else
that technically could?
brunoarueira wrote 13 hours 17 min ago:
No, he left to be a Senate candidate, and the vice governor
left in 2025 for another role. The next in line is the
president of the Legislative Assembly of the State of Rio de
Janeiro, but he was jailed and kept away from the role. So the
next is a judge from the Justice Tribunal.
alexgoodhart wrote 20 hours 35 min ago:
Somehow I doubt that political affiliations with crime syndicates
are affecting heavily the dispositions of LLM developers. The
industry itself though is one of incest.
sebastianconcpt wrote 18 hours 3 min ago:
Politicians don't come from outer space, they emerge locally
and were raised swimming in an imaginary that has normalized
the morals that eventually end up expressed at the top.
afh1 wrote 18 hours 38 min ago:
He is putting into question the character of the public workers
involved in the project, not that it has anything to do with
organized crime. Rio has relapsed into crime in the last
decades and government workers in general have a reputation for
corruption in Brazil. It's a low-trust society, especially north
of Parana, hence the lack of surprise.
ekjhgkejhgk wrote 23 hours 7 min ago:
One funny thing about incompetence is that they don't have the
competence to know that their incompetence is straightforward to verify
by a competent person.
thimabi wrote 22 hours 37 min ago:
I wouldn't describe what happened here as incompetence. As a
"carioca", I am pleasantly surprised to know that the
government's IT department is involved in AI work - even without
the budget to create its own models from scratch.
antonvs wrote 10 hours 11 min ago:
They could do AI work without trying to lie to the entire rest of
the world.
reese_john wrote 20 hours 23 min ago:
It is a testament to the bloat and overreach of the Brazilian state
in the economy. Such endeavors should be left to the private sector
thimabi wrote 17 hours 31 min ago:
I disagree. I'd prefer if my government invested more in AI
solutions, so as not to depend so much on foreign technology.
In an ideal world, Brazil would have a thriving private sector,
capable of competing even in the AI sector. Unfortunately,
that's not the case, and I believe that without government
action such endeavors won't really succeed.
arcticfox wrote 22 hours 30 min ago:
This seems kind of insane though, every time I go to Rio I think of
the potential of AI/technology to solve some problems and leave it
even more paradisiacal... But working on their own model? Wtf?
There are a million applications of existing ones there that should
be followed up on instead.
carlosjobim wrote 22 hours 50 min ago:
Why would they care? They get their salaries and pensions and
bonuses, and the tax payer is footing the bill.
root-parent wrote 23 hours 2 min ago:
You just described every single vibe coder...
vvpan wrote 19 hours 46 min ago:
I think that's unfair to "vibe coding". If anybody explicitly
claims to be vibe coding something, then they are admitting to low
supervision of the code. And on the contrary you can also
AI-produce code that you have supervised highly. I suppose there
are people who both AI their code and push it as bespoke but I, for
one, have not met such a person at or outside of work.
root-parent wrote 19 hours 3 min ago:
>> but I, for one, have not met such a person at or outside of
work.
HTML [1]: https://news.ycombinator.com/item?id=48516679
alfiedotwtf wrote 23 hours 14 min ago:
Wasn't it already obvious given the awfully familiar parameter
numbers?
intoXbox wrote 21 hours 29 min ago:
That only tells you what base architecture they used. Fine-tuning
does not increase the number of weights; it just adapts the weights
to perform better on a fine-tuning dataset, something they claimed
they had done.
zinodaur wrote 23 hours 20 min ago:
Oh no, someone is profiting off of their work without proper
attribution!?!?
s1artibartfast wrote 14 hours 23 min ago:
How do you feel about the government or government contractors saying
they did a bunch of work when they did nothing instead?
Aurornis wrote 22 hours 27 min ago:
This is an open weights model based on other open weights models.
The dispute is that they released it with claims about having done
some post training that improved the outputs. It was discovered that
the model was not post trained like they claimed.
The HF page now says it's a merge of models, which wasn't there
before. They're trying to claim they accidentally uploaded the
wrong model to HF and that they'll upload the real one soon.
Basically, they thought they could splice two open weights models
together and claim their team had accomplished some amazing post
training, but they weren't smart enough to realize that other
researchers would discover that there wasn't any post training.
iknowstuff wrote 22 hours 10 min ago:
How do they just splice two models together?
ninja3925 wrote 21 hours 56 min ago:
Out of curiosity, how was it discovered? You would have to look
for it to find this linear combination.
jdiff wrote 20 hours 52 min ago:
Without the system prompt, asking its name results in it
responding with the name of the model they're ripping from.
That would certainly draw your eyes to the right places.
valleyer wrote 20 hours 43 min ago:
Why is this? Do labs reinforce the model name during
training? I was under the impression that this sort of
"self-knowledge" always came from the system prompt, but I
guess not...
jdiff wrote 19 hours 48 min ago:
Yes. In this case, during fine tuning. Other blurbs are
also baked in during fine tuning that are perfectly
reproducible from the Nex model. The details inside the
linked issue are quite accessible.
Aurornis wrote 21 hours 38 min ago:
Check the linked GitHub issue. They explain their process.
Scroll past the first issue to find it. It's further down.
Aurornis wrote 22 hours 4 min ago:
The Nex N2 model they merged is based on Qwen 3.5, so you can
swap pieces of one into the other. They found a combination of
the two that did well on some benchmarks and shipped it.
In the early days of Llama there were a lot of experiments like
this. There were even some interesting combinations of models
where they stacked layers of different models together or even
added more layers with interesting results.
But announcing that you spliced two models together isn't very
impressive in 2026, so they announced that they had done their
own post training and outdid the big labs. They thought nobody
would look close enough to notice.
moritzwarhier wrote 22 hours 22 min ago:
Thanks for the factual clarification. This is so important when
everyone already has their trigger finger on politics. Not meaning
that politics are irrelevant here, see sister comment by jobim.
But it's impossible to form a nuanced opinion when political
association has a higher priority than the facts; which, again,
don't look flattering for the implementers.
carlosjobim wrote 22 hours 53 min ago:
This is a pure scam on tax payer money. But what else would be
expected?
hootz wrote 20 hours 59 min ago:
Apparently no public money was involved.
jdiff wrote 20 hours 50 min ago:
This is contrary to the mayor's words on Twitter.
> An open AI model trained in Rio with public funding over the
last year by @Prefeitura_Rio surpassing all other models.
HTML [1]: https://x.com/CavaliereRio/status/2065984620626129026
jrm4 wrote 22 hours 38 min ago:
Unlike the big companies who do this, which often are merely impure
scams on tax payer money a little more downstream.
philipallstar wrote 21 hours 57 min ago:
Companies that generate loads of corporation tax, income tax, and
VAT revenue are the exact opposite of wastes of public money.
jrm4 wrote 20 hours 36 min ago:
Yes, when they do so proportional to what they take, especially
as compared to individuals and their tax liabilities.
You'll have to let me know when that finally happens, because
that ain't now.
philipallstar wrote 11 hours 26 min ago:
Sorry, I've no idea how to read your first sentence.
Your second one - that's how everything public is paid for.
Private individuals pay tax, either through their
corporations paying corporation tax or the tax bill on top of
their wage bills, which a) drives up prices of the goods and
services they offer, or depresses wages, and b) funds all the
public sector employees and orgs that don't pay tax (orgs) or
don't pay net tax (employees).
carlosjobim wrote 22 hours 27 min ago:
Great, now we're defending embezzlement and fraud with public
funds on HN, because we really really hate big business.
A child caught doing something bad will cry "but my friends also
did it!", is that the level of reasoning hackers want to be at?
lostlogin wrote 21 hours 56 min ago:
> Great, now we're defending embezzlement
I might be missing something, but I don't see anyone
defending the scams.
sdevonoes wrote 22 hours 2 min ago:
There are no hackers around here anymore. HN is mainly about
business nowadays
dmix wrote 21 hours 28 min ago:
HN has always discussed business
blanched wrote 22 hours 20 min ago:
That seems like a bad faith read to me. Nobody is defending it,
just pointing out the irony / hypocrisy. Two things can be bad,
and they can be related.
carlosjobim wrote 16 hours 41 min ago:
You'd be surprised to hear then that I'm not the owner of any
big company which embezzles tax payer money, and have never
been involved in such.
blanched wrote 16 hours 20 min ago:
I don't follow how that makes sense as a response to what
I said?
carlosjobim wrote 16 hours 8 min ago:
Why would I be a hypocrite for pointing out public fund
embezzlement?
blanched wrote 16 hours 6 min ago:
You're not. The originally mentioned "big
companies" are.
jrm4 wrote 22 hours 23 min ago:
What part of that said "defense?"
They can both be bad.
bachmeier wrote 22 hours 58 min ago:
"Their work"? First you had the original content creators that did
99.99% of the work. Then you had the US companies bundle it up into a
frontier LLM. Then "they" did the "work" of using the US model as a
foundation for their own. So in the sense of doing 0.00001% of the
actual work that went into their product, sure.
I'd say it's more like someone forking a Linux distro, adding a few
themes and fonts, and then complaining when someone else forks their
distro and adds another theme.
idiotsecant wrote 22 hours 44 min ago:
Oof this is delete your post level I think. Sorry bud, I been
there.
JoshStrobl wrote 22 hours 48 min ago:
That joke really went over your head, huh...
bwilliams18 wrote 22 hours 51 min ago:
That was the joke of the parent comment.
harikb wrote 22 hours 51 min ago:
It is only a problem if you claim it to be an independently
developed OS with no attribution to base
dghlsakjg wrote 22 hours 52 min ago:
That's the joke.
bachmeier wrote 21 hours 13 min ago:
It isn't. The entirety of the comment I responded to is "Oh no,
someone is profiting off of their work without proper
attribution!?!?" It's a valid point, but references someone using
content created by others for profit. I'm objecting to equating
this project with the work done by the original content creators.
They're not remotely the same thing.
I understand how the internet works and how people respond to
others in this type of setting, but the comment I replied to did
not in any way make the point I was making about the
disproportionate nature of relative contributions.
vasco wrote 10 hours 29 min ago:
> I understand how the internet works and how people respond to
others in this type of setting,
You should frame this as a reminder to be more charitable in
your positions because sometimes you can be wrong. This
subthread ended up being one of the funniest I've read recently.
dghlsakjg wrote 13 hours 18 min ago:
> It isn't
It is.
> I understand how the internet works and how people respond to
others in this type of setting, but the comment I replied to
did not in any way make the point I was making about the
disproportionate nature of relative contributions.
Do you understand?
Jokes aren't that funny when you have to dig into an
explanation on the nuance of why the hidden meaning doesn't
match the surface meaning in exact degree and proportions. That
turns a joke into a pedantic comment. And paradoxically muddies
the point by explaining it.
We aren't morons. We understand that Picasso is doing
something on a different level than someone feeding bulk
scraped JPGs of paintings into a python script. You really
don't have to explain.
bachmeier wrote 3 hours 32 min ago:
Have a nice day.
idiotsecant wrote 18 hours 8 min ago:
It's time to stop digging
internet2000 wrote 23 hours 18 min ago:
Attribution isn't the relevant part. Lying about your lab's
capabilities is.
themafia wrote 20 hours 29 min ago:
It seems to me like the lies are both for the same reason. To
capture attention and profits that are not deserved.
vips7L wrote 21 hours 0 min ago:
Sounds like the whole AI movement.
outside2344 wrote 22 hours 28 min ago:
But the whole game is lying and stealing isn't it?
adrian_b wrote 22 hours 58 min ago:
I do not see anyone lying.
The model card says:
> Post-trained from Qwen 3.5 397B
The model card also says that they use an inference framework based
on "SwiReasoning: Switch-Thinking in Latent and Explicit for
Pareto-Superior Reasoning LLMs" by Shi et al.: [1] So the sources
seem properly attributed.
They only claim that what they did to "Qwen 3.5 397B" has improved
the LLM, including, as expected, with "strong performance in
Portuguese".
HTML [1]: https://arxiv.org/abs/2510.05069
petu wrote 22 hours 27 min ago:
That's attribution to Qwen team.
There (is/was) no attribution to Nex team (they've released a
model based on Qwen 3.5 397B as well).
As per OP link Nex claims that what Rio team released (so far) is
just linear interpolation of weights between Nex and OG Qwen
model. With no attribution to Nex and zero signs of Rio doing any
training of their own.
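For readers unfamiliar with the term, "linear interpolation of weights" can be sketched in a few lines. This is only a toy illustration in plain Python, not the code anyone in this story used: the function name `linear_merge`, the parameter names, and the tiny two-tensor "models" are all hypothetical, and real checkpoints would be loaded as tensors rather than lists.

```python
from typing import Dict, List

# A "model" here is just a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch in {name}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Hypothetical toy checkpoints that share one architecture.
finetune = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
base = {"layer0.weight": [0.0, 4.0], "layer0.bias": [1.5]}

merged = linear_merge(finetune, base, alpha=0.5)
print(merged)
```

Note that nothing here involves training: the merged checkpoint is fully determined by the two inputs and the single number `alpha`.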
00index wrote 22 hours 31 min ago:
Are you talking about the credit that was just updated an hour
ago? lol
functionmouse wrote 22 hours 58 min ago:
leopards ate my face
Planktonne wrote 23 hours 9 min ago:
That's also something all the AI companies have been doing.
low_tech_love wrote 20 hours 44 min ago:
They're using public money to "train" this.
dofm wrote 22 hours 52 min ago:
Lying about model capability is right now the lingua franca of
the cloud AI business model, almost; they yes-and each other's
lies because they are in a position of needing to generate
interest, including going as far as needing to trigger regulatory
capture.
(It's not news to anyone who has worked in sales-led businesses
that salespeople are prone to believing the claims of other
salespeople, I guess).
selcuka wrote 16 hours 51 min ago:
> Lying about model capability is right now the lingua franca
of the cloud AI business model
Lying about your lab's capabilities != Lying about model
capability
Exaggerating the capabilities of a new model that you've
actually trained in press bulletins can be called marketing.
Merging two models and claiming that you trained a new model is
plain lazy.
AlienRobot wrote 23 hours 20 min ago:
The model's webpage at [1] says it's a merge now. It previously didn't
contain this paragraph:
>The model is built via a merge of [2] and [3] , proceeded by On-Policy
Distillation from a stronger model. We detected an incorrect upload in
the previous version, where the base merged version was upload instead
of the final distilled model. We are sorry for the confusion and
apologize profusely.
Incidentally are people using Github issues as blogs now?
HTML [1]: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B
HTML [2]: https://huggingface.co/nex-agi/Nex-N2-Pro
HTML [3]: https://huggingface.co/Qwen/Qwen3.5-397B-A17B
jonchurch_ wrote 22 hours 13 min ago:
Edit: I didn't even notice until someone pointed out this was on the
Nex-n2 repo not the rio one, now I understand the OP's confusion!
It wasn't framed as an issue, which is the norm breakage I think
you're reacting to, as in they didn't ask that the readme be updated
etc, but it is common now for folks to use a project's issue
tracker to name and shame them in a place they can't easily ignore.
Whether that's right, prosocial, or professional is up for debate
(as well as if any single definition of etiquette can be expected in
2026 on an issue tracker).
But surely you can see the optics reason why someone would take their
complaint to the repo directly? It pressures the maintainers to
respond, it allows for a pile on from the internet, and makes any
decision to lock down a hostile thread into its own kind of
statement.
The maintainers should absolutely post an official response and lock
the thread though, it will likely get ugly in there.
ChoosesBarbecue wrote 21 hours 38 min ago:
But this is posted on Nex's GitHub, not on "Rio de Janeiro's"
GitHub.
i.e. this is the maintainer posting on their own GitHub Issues.
AnotherGoodName wrote 23 hours 27 min ago:
This is fascinating that it worked though. Can we just merge all the
open weight models and get something better?
vor_ wrote 15 hours 27 min ago:
Merging related models has been a very common practice for years. See
the Stable Diffusion community.
nylonstrung wrote 22 hours 1 min ago:
If you go to Civitai this is pretty much how it works in that corner
of the image generation world
Everything is using Stable Diffusion as underlying model, then most
of the usage is merges of checkpoints
avereveard wrote 22 hours 54 min ago:
most merges improve a small subset of "feeling" benchmarks (too small,
too specific, or out of distribution) and tend to show degradation on
actual benchmarks, with especially punishing results on long chain
benchmarks.
they also only work on matching architectures (i.e. finetunes/loras of
the same model)
dindunuf wrote 23 hours 0 min ago:
that kinda worked in llama 1/2 era, not between different models but
between finetunes of the same model. the briefly legendary Mythomax
was IIRC a merge of 5+ tunes, some of which were merges themselves.
wds wrote 23 hours 16 min ago:
I imagine it'd work the same as merging all the good-tasting foods to
get an even tastier one
_3u10 wrote 23 hours 21 min ago:
No, they need the same arch, but you can distill them into a single
model. And yes, if you use the API directly Claude will often say
it's an open weight model (likely the ones it was distilled from)
unrvl22 wrote 1 day ago:
The municipality of Rio de Janeiro (via its IT company IplanRIO)
released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune
that beats comparable open models on benchmarks. The linked issue
argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40%
Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.
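A rough idea of how a mixing ratio like ~60/40 could be recovered from published weights: if a candidate checkpoint really is `alpha * A + (1 - alpha) * B`, a one-variable least-squares fit recovers `alpha` and leaves almost no residual. The sketch below is an assumption about the general approach, not the procedure from the linked issue; `estimate_mix` and the toy weights are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def estimate_mix(candidate: Weights, a: Weights, b: Weights) -> Tuple[float, float]:
    """Least-squares fit of candidate ~= alpha * a + (1 - alpha) * b.

    Returns (alpha, relative_residual). A residual near zero means the
    candidate is explained by the two source models alone.
    """
    num = 0.0
    den = 0.0
    for name in candidate:
        for c, x, y in zip(candidate[name], a[name], b[name]):
            d = x - y
            num += (c - y) * d
            den += d * d
    if den == 0.0:
        raise ValueError("the two source models are identical")
    alpha = num / den

    res = 0.0
    norm = 0.0
    for name in candidate:
        for c, x, y in zip(candidate[name], a[name], b[name]):
            fit = alpha * x + (1.0 - alpha) * y
            res += (c - fit) ** 2
            norm += c * c
    rel = (res / norm) ** 0.5 if norm > 0.0 else 0.0
    return alpha, rel


# Hypothetical toy data: build a 60/40 blend, then try to recover the ratio.
model_a = {"w": [1.0, 2.0, -1.0]}
model_b = {"w": [0.0, 4.0, 1.0]}
blend = {"w": [0.6 * x + 0.4 * y for x, y in zip(model_a["w"], model_b["w"])]}

alpha, rel = estimate_mix(blend, model_a, model_b)
print(f"alpha ~ {alpha:.3f}, relative residual ~ {rel:.2e}")
```

Any genuine further training would move the weights off the line between the two source models, which would show up as a clearly nonzero residual.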
vasco wrote 10 hours 36 min ago:
Rio better have the best IT infrastructure and software in the world
if they are spending time on LLMs. What a waste of tax payer money.
vitorgrs wrote 9 hours 36 min ago:
Piaui state is also doing an LLM, it seems. But indeed it would
make more sense if it was a national thing rather than local...
DonsDiscountGas wrote 22 hours 25 min ago:
I didn't know model merging like that was possible. (Obviously
possible from a pure software standpoint but I'm surprised it's
effective)
baobabKoodaa wrote 2 hours 52 min ago:
A few years back these used to be called "Frankenstein models"
hypercube33 wrote 12 hours 28 min ago:
Even merging models with themselves works, as shown in the post on
how they got to the top of hugging face with two gpus
bwhitty wrote 21 hours 22 min ago:
As another poster above linked, it's been shown to be effective
since 2022:
HTML [1]: https://arxiv.org/abs/2203.05482
nightpool wrote 18 hours 32 min ago:
it works because Nex N2 is also a derivative of the original base
Qwen model. If it was two completely unrelated models it wouldn't
work.
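One commonly cited reason averaging fails for unrelated models is that hidden units have no canonical order: two networks can compute the same function with their units shuffled, and averaging them mixes mismatched units. The toy below is an illustrative assumption about that mechanism, not something taken from the thread's sources; the two-unit network and all names are hypothetical.

```python
from typing import Dict, List

Model = Dict[str, List[float]]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def forward(model: Model, x: float) -> float:
    """Scalar input -> two ReLU hidden units -> scalar output."""
    hidden = [relu(w * x + b) for w, b in zip(model["w1"], model["b1"])]
    return sum(w * h for w, h in zip(model["w2"], hidden))


def average(m1: Model, m2: Model) -> Model:
    """Plain 50/50 weight average of two models with identical shapes."""
    return {k: [(p + q) / 2.0 for p, q in zip(m1[k], m2[k])] for k in m1}


# Model A, and model B which is A with its two hidden units swapped.
model_a = {"w1": [1.0, -1.0], "b1": [0.0, 0.0], "w2": [1.0, 2.0]}
model_b = {"w1": [-1.0, 1.0], "b1": [0.0, 0.0], "w2": [2.0, 1.0]}
avg = average(model_a, model_b)

for x in (1.0, -1.0):
    print(x, forward(model_a, x), forward(model_b, x), forward(avg, x))
```

A and B give identical outputs, yet their average outputs zero for every input. Fine-tunes of one base start from the same weights, so their units stay roughly aligned and this failure mode is much less likely.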
Lucasoato wrote 22 hours 45 min ago:
So the problem isn't in the missing attribution to Qwen, but with
the fact that they didn't mention Nex-N2 Pro right?
Aurornis wrote 22 hours 25 min ago:
The problem is that they claimed to have made a big achievement
with their home grown post training, and they expected to receive a
lot of praise for it.
Then researchers looked at the weights and there is no post
training at all.
They are now attributing both models they merged, but their excuse
for the lack of post training is to claim they accidentally
uploaded the wrong files.
serial_dev wrote 21 hours 12 min ago:
I'd believe they accidentally uploaded the wrong files if they
uploaded the correct ones. To state that they accidentally
uploaded something else and then not upload the correct version
means they probably do not have anything and either hope people
forget about this or they are scrambling to have something that
is at least close to their original claim.
evilduck wrote 16 hours 10 min ago:
"Oops, we uploaded the wrong files" is the standard deflection
every time people like this get caught.
Look up "Reflection 70B" drama.