_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML Parse, Don't Validate and Type-Driven Design in Rust
ubixar wrote 10 hours 12 min ago:
C# gets close to this with records + pattern matching; F# discriminated
unions are even better, with algebraic data types built right in. A
Result<'T,'Error> makes invalid states unrepresentable without any
ceremony. C# records/matching work for now, but native DUs will make it
even nicer.
barnacs wrote 11 hours 30 min ago:
Every time you introduce a type for a "value invariant" you lose
compatibility and force others to make cumbersome type conversions.
To me, invalid values are best expressed with optional error returns,
returned alongside the value as part of the function signature. Types are
best used to only encode information about the hierarchy of structures
composed of primitive types. They help define and navigate the
representation of composite things as opposed to just having dynamic
nested maps of arbitrary strings.
mrkeen wrote 10 hours 13 min ago:
> They help define and navigate the representation of composite
things as opposed to just having dynamic nested maps of arbitrary
strings.
What would you say to someone who thinks that nested maps of
arbitrary strings have maximum compatibility, and using types forces
others to make cumbersome type conversions?
barnacs wrote 10 hours 8 min ago:
If the fields of a structure or the string keys of an untyped map
don't match then you don't have compatibility either way. The same
is not true for restricting the set of valid values.
edit: To put it differently: To possibly be compatible with the
nested "Circle" map, you need to know it is supposed to have a
"Radius" key that is supposed to be a float. Type definitions just
make this explicit. But just because your "Radius" can't be 0, you
shouldn't make it incompatible with everything else operating on
floats in general.
MarcLore wrote 11 hours 36 min ago:
This pattern maps beautifully to API design too. Instead of validating
a raw JSON request body and hoping you checked everything, you parse it
into a well-typed struct upfront. Every downstream function then gets
guaranteed-valid data without redundant checks. The Rust ecosystem
makes this almost painless with serde + custom deserializers. I've seen
codebases cut their error-handling surface area by 60% just by moving
from validate-then-use to parse-at-the-boundary.
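A dependency-free sketch of parse-at-the-boundary (serde with a custom
deserializer would express the same idea; the `Username` type and its
rule here are made up for illustration):
```rust
use std::str::FromStr;

/// A username guaranteed non-empty and ASCII-alphanumeric.
/// (Hypothetical domain type; the constraint is illustrative.)
pub struct Username(String);

impl FromStr for Username {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            return Err("username must not be empty".into());
        }
        if !s.chars().all(|c| c.is_ascii_alphanumeric()) {
            return Err(format!("invalid username: {s:?}"));
        }
        Ok(Username(s.to_string()))
    }
}

// Downstream code takes Username, not &str: it cannot receive an
// unvalidated value, so it never needs to re-check.
fn greet(user: &Username) -> String {
    format!("hello, {}", user.0)
}

fn main() {
    let user: Username = "alice42".parse().expect("valid");
    println!("{}", greet(&user));
    assert!("".parse::<Username>().is_err());
}
```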
qsera wrote 16 hours 37 min ago:
>"Parse, Don't Validate,"
Ah, that pretentious little blog post that was taken up by many as
gospel, for some mysterious reason...
This tells me that any idea, even a stupid (or, in this case, obvious)
one, can go mainstream if you rhyme it right.
That blog post is like a "modern art" painting by some famous artist
that looks like a child flung some paint at a wall, but every person
who looks at it tries so hard to find some meaning that everyone ends
up finding "something"...
In fact, the top comment right now expresses an interpretation of the
blog post that differs from the author's...
Amusing!
mrkeen wrote 10 hours 9 min ago:
Was the original blog post wrong?
throw310822 wrote 11 hours 32 min ago:
I only half understand this stuff, but all this encapsulation of
values so that they are guaranteed to remain valid across
manipulations... isn't it called Object Oriented Programming?
AxiomLab wrote 17 hours 7 min ago:
This exact philosophy is why I started treating UI design systems like
compilers.
Instead of validating visual outputs after the fact (like linting CSS
or manual design reviews), you parse the constraints upfront. If a
layout component is strictly typed to only accept discrete grid
multiples, an arbitrary 13px margin becomes a compile error, not a
subjective design debate. It forces determinism.
Garlef wrote 13 hours 13 min ago:
curious: what kind of tooling would you use here?
unixpickle wrote 1 day ago:
The `try_roots` example here is actually a _counterexample_ to the
author's main argument. They explicitly ignore the "negative
discriminant" case. What happens if we consider it?
If we take their "parse" approach, then the types of the arguments a,
b, and c have to somehow encode the constraint `b^2 - 4ac >= 0`. This
would be a total mess--I can't think of any clean way to do this in
Rust. It makes _much_ more sense to simply return an Option and do the
validation within the function.
In general, I think validation is often the best way to solve the
problem. The only counterexample, which the author fixates on in the
post, is when one particular value is constrained in a clean,
statically verifiable way. Most of the time, validation is used to
check (possibly complex) interactions between multiple values, and
"parsing" isn't at all convenient.
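The validate-inside approach argued for here can be sketched like this
(assuming, as the comment does, that `try_roots` takes plain floats):
```rust
/// Real roots of ax^2 + bx + c, or None when there are none.
/// Validation happens inside the function; the caller just
/// handles the Option.
fn try_roots(a: f64, b: f64, c: f64) -> Option<(f64, f64)> {
    if a == 0.0 {
        return None; // not a quadratic
    }
    let disc = b * b - 4.0 * a * c;
    if disc < 0.0 {
        return None; // negative discriminant: no real roots
    }
    let sqrt = disc.sqrt();
    Some(((-b + sqrt) / (2.0 * a), (-b - sqrt) / (2.0 * a)))
}

fn main() {
    assert_eq!(try_roots(1.0, -3.0, 2.0), Some((2.0, 1.0)));
    assert_eq!(try_roots(1.0, 0.0, 1.0), None); // x^2 + 1 = 0
}
```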
sbszllr wrote 10 hours 42 min ago:
I was thinking a similar thing when reading the article. Often, the
validity of the input depends on the interaction between some of
them.
Sure, we can follow the advice of creating types that represent only
valid states, but then we end up with `fn(a: A, b: B, c: C)`
transformed into `fn(abc: ValidABC)`.
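That `ValidABC` shape can be sketched as follows (the name comes from
the comment above; the discriminant invariant and method names are
illustrative):
```rust
/// Coefficients whose discriminant is known non-negative.
/// Constructed only through `new`, so b^2 - 4ac >= 0 (and a != 0)
/// holds for every value of this type.
pub struct ValidAbc {
    a: f64,
    b: f64,
    c: f64,
}

impl ValidAbc {
    pub fn new(a: f64, b: f64, c: f64) -> Option<Self> {
        (a != 0.0 && b * b - 4.0 * a * c >= 0.0)
            .then_some(ValidAbc { a, b, c })
    }

    /// No Option needed: the invariant was checked at the boundary.
    pub fn roots(&self) -> (f64, f64) {
        let sqrt = (self.b * self.b - 4.0 * self.a * self.c).sqrt();
        (
            (-self.b + sqrt) / (2.0 * self.a),
            (-self.b - sqrt) / (2.0 * self.a),
        )
    }
}

fn main() {
    let abc = ValidAbc::new(1.0, -3.0, 2.0).expect("valid coefficients");
    assert_eq!(abc.roots(), (2.0, 1.0));
    assert!(ValidAbc::new(1.0, 0.0, 1.0).is_none());
}
```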
the__alchemist wrote 1 day ago:
The examples in question propagate complexity throughout related code.
I think this is a case I see frequently in Rust of using too many
abstractions, and its associated complexities.
I would just (as a default; the situation varies)... validate prior to
the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g.
checking if the index is out of bounds. Similar idea; check; print or
display an error, then fail that computation without crashing the
program. Usually an indication of some bug, which can be tracked down.
Or, if it's an array frequently indexed, use the `get` method
(canonical in Rust's core) on whatever struct owns the array. It
returns an Option.
I do think either the article's approach, or validating is better than
runtime crashes! There are many patterns in programming. Using Types in
this way is something I see a lot of in OSS rust, but it is not my cup
of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
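The `get` pattern mentioned above, using std's `slice::get`:
```rust
fn main() {
    let xs = [10, 20, 30];

    // Indexing with [] panics on out-of-bounds; `get` returns an
    // Option instead, so the failure is handled, not a crash.
    match xs.get(5) {
        Some(v) => println!("value: {v}"),
        None => println!("index 5 out of bounds, skipping"),
    }

    assert_eq!(xs.get(1), Some(&20));
    assert_eq!(xs.get(5), None);
}
```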
fph wrote 1 day ago:
The article quickly mentions implementing addition:
```
impl Add<NonZeroF32> for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
WJW wrote 12 hours 15 min ago:
I imagine it would be something like Option<NonZeroF32>, since -2.0 +
2.0 would violate the constraint at runtime. This gets us the
Option-handling problem back.
I think the article would have been better with NonZeroPositiveF32 as
the example type, since then addition would be safe.
alfons_foobar wrote 1 day ago:
Would have to be F32, no?
I cannot think of any way to enforce "non-zero-ness" of the result
without making it return an optional Result, and at that point we are
basically back to square one...
MaulingMonkey wrote 1 day ago:
> Would have to be F32, no?
Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is
able to return `NonZeroU32` though! ( [1] )
> I cannot think of any way to enforce "non-zero-ness" of the
result without making it return an optional Result, and at that
point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this,
although I'll note it returns an `Option` instead of a `Result` (
[2] ), leaving you to `.ok_or(...)` or otherwise handle the edge
case to your heart's content. Niche, but occasionally what you
want.
HTML [1]: https://doc.rust-lang.org/std/num/type.NonZeroU32.html#met...
HTML [2]: https://doc.rust-lang.org/std/num/type.NonZeroU32.html#met...
alfons_foobar wrote 1 day ago:
> `NonZeroU32::saturating_add(self, other: u32)` is able to
return `NonZeroU32` though!
I was confused at first how that could work, but then I realized
that of course, with _unsigned_ integers this works fine because
you cannot add a negative number...
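Both methods discussed in this subthread exist on std's `NonZeroU32`;
a quick sketch of the behavior:
```rust
use std::num::NonZeroU32;

fn main() {
    let n = NonZeroU32::new(5).unwrap();

    // saturating_add can return NonZeroU32 directly: adding an
    // unsigned value to a nonzero one can never reach zero.
    let m: NonZeroU32 = n.saturating_add(10);
    assert_eq!(m.get(), 15);

    // checked_add returns Option<NonZeroU32>: None on overflow.
    assert_eq!(n.checked_add(1).map(NonZeroU32::get), Some(6));
    assert_eq!(n.checked_add(u32::MAX), None);
}
```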
hutao wrote 1 day ago:
Note that the division-by-zero example used in this article is not the
best example to demonstrate "Parse, Don't Validate," because it relies
on encapsulation. The principle of "Parse, Don't Validate" is best
embodied by functions that transform untrusted data into some data type
which is correct by construction.
Alexis King, the author of the original "Parse, Don't Validate"
article, also published a follow-up, "Names are not type safety" [0]
clarifying that the "newtype" pattern (such as hiding a nonzero integer
in a wrapper type) provides weaker guarantees than correctness by
construction. Her original "Parse, Don't Validate" article also
includes the following caveat:
> Use abstract datatypes to make validators "look like" parsers.
Sometimes, making an illegal state truly unrepresentable is just plain
impractical given the tools Haskell provides, such as ensuring an
integer is in a particular range. In that case, use an abstract newtype
with a smart constructor to "fake" a parser from a validator.
So, an abstract data type that protects its inner data is really a
"validator" that tries to resemble a "parser" in cases where the type
system itself cannot encode the invariant.
The article's second example, the non-empty vec, is a better example,
because it encodes within the type system the invariant that one
element must exist. The crux of Alexis King's article is that programs
should be structured so that functions return data types designed to be
correct by construction, akin to a parser transforming less-structured
data into more-structured data.
[0]
HTML [1]: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-ty...
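A minimal sketch of the correct-by-construction non-empty vec described
above (field and method names assumed):
```rust
/// A non-empty list: the first element is stored separately, so
/// "empty" is not representable and `first` needs no Option.
pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmptyVec<T> {
    pub fn new(head: T) -> Self {
        NonEmptyVec { head, tail: Vec::new() }
    }

    /// Correct by construction: always succeeds, no unwrap.
    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn push(&mut self, value: T) {
        self.tail.push(value);
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}

fn main() {
    let mut xs = NonEmptyVec::new(1);
    xs.push(2);
    xs.push(3);
    assert_eq!(*xs.first(), 1);
    assert_eq!(xs.len(), 3);
}
```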
CodeBit26 wrote 16 hours 38 min ago:
I agree, 'correct by construction' is the ultimate goal here. Using
types like NonZeroU32 is a great simple example, but the real power
comes when you design your entire domain logic so that the compiler
acts as your gatekeeper. It shifts the mental load from run-time
debugging to design-time thinking.
rapnie wrote 1 day ago:
You can also search for "make invalid states
impossible/unrepresentable" [0] to find more info on related
practices. See "domain modeling made functional" [0] as a nice
example
[0] [1]
HTML [1]: https://geeklaunch.io/blog/make-invalid-states-unrepresentab...
HTML [2]: https://www.youtube.com/watch?v=2JB1_e5wZmU
hutao wrote 1 day ago:
The phrasing that I hear more often is "make illegal states
unrepresentable"; both the submitted article and Alexis King's
original article use this phrase. At least according to [1] , it
originates from Yaron Minsky (a programmer at Jane Street who is
prominent in the OCaml community).
EDIT: Parent comment was edited to amend the
"impossible/unrepresentable" wording
HTML [1]: https://fsharpforfunandprofit.com/posts/designing-with-typ...
rapnie wrote 1 day ago:
Yes, sorry. I thought to add some resources to it, or it would be
a too vague comment and found the better phrasing.
Sharlin wrote 1 day ago:
Even the newtype-based "parse, don't validate" is tremendously useful
in practice, though. The big thing is that if you have a bare string,
you don't know "where it's been". It doesn't carry with it
information whether it's already been validated. Even if a newtype
can't provide you full correctness by construction, it's vastly
easier to be convinced of the validity of an encapsulated value
compared to a naked one.
For full-on parse-don't-validate, you essentially need a dependent
type system. As a more light-weight partial solution, Rust has been
prototyping pattern types, which are types constrained by patterns.
For instance a range-restricted integer type could be simply spelled
`i8 is 0..100`, or a nonempty slice as `[T] is [_, ..]`. Such a
feature would certainly make correctness-by-construction easier in
many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example
of the clash between practicality and theoretical purity. It can't
offer you a slice (consecutive view) of its elements without storing
the first element twice (which requires immutability and that T:
Clone, unlike normal Vec<T>), which makes it fairly useless as a vector.
It's okay if you consider it just an abstract list with a more
restricted interface.
spockz wrote 14 hours 13 min ago:
Coming from Haskell, I loved Agda 2 as a dependent type language.
Is there any newer or more mainstream language that has added
dependent types?
throw_await wrote 11 hours 31 min ago:
Typescript has something that can be used as dependent types, but
it wasn't intended as a language feature, so the Syntax is not as
ergonomic as Agda:
HTML [1]: https://www.hacklewayne.com/dependent-types-in-typescrip...
doctor_phil wrote 13 hours 27 min ago:
Idris is slightly more mainstream I would say, but not wildly so.
If you like the Haskell interop then I'd recommend staying with
Agda.
Scala 3 is much more mainstream and has path dependent types.
I've only used Scala 2, and there the boilerplate for dependent
types was frustrating imo, but I've heard it's better in 3.
rhdunn wrote 14 hours 14 min ago:
It's also useful to wrap/tag IDs in structured types. That makes it
easier to avoid errors when there are multiple type parameters such
as in the Microsoft graph API.
humkieufj wrote 13 hours 16 min ago:
Tragic how Rusties and HN apparently like to try to murder
critics through swatting. Rusties and HN cannot be said to have
either souls or conscience.
sam0x17 wrote 1 day ago:
btw the "quoth" crate makes it really, really easy to implement
scannerless parsing in Rust for arbitrary syntax; I use it on many of
my projects
IshKebab wrote 1 day ago:
Interesting looking crate. You don't seem to have any examples at all
though so I wouldn't say it makes it easy!
strawhatguy wrote 1 day ago:
The alternative is one type, with many functions that can operate on
that type.
Like how clojure basically uses maps everywhere and the whole standard
library allows you to manipulate them in various ways.
The main problem with the many-type approach is ending up with several
similar (or worse, near-identical) types, all incompatible.
slopinthebag wrote 1 day ago:
I find a balance is important. You can do nominal typing in a
structural type system with branding, and you can kinda do structural
typing in a nominal type system, but it's not as ergonomic. But you
should probably end up doing a mix of both.
Kinrany wrote 1 day ago:
It's not an alternative.
Start with a more dynamic type, do stuff that doesn't care about the
shape, parse into a more precise type, do stuff that relies on the
additional invariants, drop back into the more dynamic type again.
Rygian wrote 1 day ago:
This sounds like the "stringly typed language" mockery of some
languages. How is it actually different?
marcosdumay wrote 1 day ago:
There are more than two alternatives, since functions can operate in
more than one type.
fiddlerwoaroof wrote 1 day ago:
Yeah, there's something of a tension between the Perlis quote "It is
better to have 100 functions operate on one data structure than 10
functions on 10 data structures" and Parse, don't validate.
The way I've thought about it, though, is that it's possible to
design a program well either by encoding your important invariants in
your types or in your functions (especially simple functions). In
dynamically typed languages like Clojure, my experience is that
there's a set of design practices that have a lot of the same effects
as "Parse, Don't Validate" without statically enforced types. And,
ultimately, it's a question of mindset which style you prefer.
strawhatguy wrote 1 day ago:
There's probably a case for both. Core logic might benefit from
hard types deep in the bowels of unchanging engine.
The real world often changes though, and more often than not the
code has to adapt, regardless of how elegantly our systems are
designed.
fiddlerwoaroof wrote 23 hours 53 min ago:
Coalton ( [1] ) is the sort of thing I like: a Haskell-style
language hosted inside a very dynamic one with good interop.
HTML [1]: https://coalton-lang.github.io
strawhatguy wrote 21 hours 16 min ago:
Yes it's quite the blend!
packetlost wrote 1 day ago:
I don't really get why this is getting flagged. I've found this to be
true, but more of a trade-off than a pure benefit. It also is sort of
beside the point: you always need to parse inputs from external,
usually untrusted, sources.
doublesocket wrote 1 day ago:
Agree with this. Mismatching types are generally an indicator of an
underlying issue with the code, not the language itself. These are
areas AI can be helpful flagging potential problems.
noitpmeder wrote 1 day ago:
This reminds me a bit of a recent publication by Stroustrup about using
concepts... in C++ to validate integer conversions automatically where
necessary. [1] {
Number ii = 0;
Number cc = '0';
ii = 2; // OK
ii = -2; // throws
cc = i; // OK if i is within cc's range
cc = -17; // OK if char is signed; otherwise throws
cc = 1234; // throws if a char is 8 bits
}
HTML [1]: https://www.stroustrup.com/Concept-based-GP.pdf
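A rough Rust analog of the quoted checked conversions, using `TryFrom`
with a hypothetical bounded newtype in place of the C++ `Number`:
```rust
use std::convert::TryFrom;

/// A value constrained to a char-like (i8) range; conversions that
/// would lose information fail instead of wrapping.
#[derive(Debug, PartialEq)]
struct SmallNumber(i8);

impl TryFrom<i32> for SmallNumber {
    type Error = String;

    fn try_from(v: i32) -> Result<Self, Self::Error> {
        // i8::try_from fails for values outside -128..=127.
        i8::try_from(v)
            .map(SmallNumber)
            .map_err(|_| format!("{v} out of range for SmallNumber"))
    }
}

fn main() {
    assert_eq!(SmallNumber::try_from(2), Ok(SmallNumber(2)));
    assert_eq!(SmallNumber::try_from(-17), Ok(SmallNumber(-17)));
    assert!(SmallNumber::try_from(1234).is_err()); // like `cc = 1234`
}
```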
cmovq wrote 1 day ago:
Dividing a float by zero is usually perfectly valid. It has predictable
outputs, and for some algorithms like collision detection this property
is used to remove branches.
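The IEEE 754 behavior being relied on, as a quick sketch:
```rust
fn main() {
    // Float division by zero does not panic in Rust; it follows
    // IEEE 754 and produces infinities or NaN deterministically.
    let x = 1.0_f32 / 0.0;  // +inf
    let y = -1.0_f32 / 0.0; // -inf
    let z = 0.0_f32 / 0.0;  // NaN

    assert!(x.is_infinite() && x.is_sign_positive());
    assert!(y.is_infinite() && y.is_sign_negative());
    assert!(z.is_nan());
}
```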
woodruffw wrote 1 day ago:
I think "has predictable outputs" is less valuable than "has
expected outputs" for most workloads. Dividing by zero almost
always reflects an unintended state, so proceeding with the operation
means compounding the error state.
(This isn't to say it's always wrong, but that having it be an
error state by default seems very reasonable to me.)
jaggederest wrote 1 day ago:
You can go even further with this in other languages, with things like
dependent typing - which can assert (among other interesting
properties) that, for example, something like
get_elem_at_index(array, index)
cannot ever have index outside the bounds of the array, but checked
statically at compilation time - and this is the key, without knowing a
priori what the length of array is.
"In Idris, a length-indexed vector is Vect n a (length n is in the
type), and a valid index into length n is Fin n ('a natural number
strictly less than n')."
Similar tricks work with division that might result in inf/-inf, to
prevent them from typechecking, and more subtle implications in e.g.
higher order types and functions
satvikpendem wrote 1 day ago:
Rust has some libraries that can do dependent typing too, based on
macros. For example: [1] Which refers to
HTML [1]: https://youtube.com/watch?v=JtYyhXs4t6w
HTML [2]: https://docs.rs/anodized/latest/anodized/
hmry wrote 15 hours 57 min ago:
Very cool and practical, but specs aren't dependent typing. (I
actually think specs are probably more useful than dependent types
for most people)
Dependent typing requires:
- Generic types that can take runtime values as parameters, e.g.
[u8; user_input]
- Functions where the type of one parameter depends on the runtime
value of another parameter, e.g. fn f(len: usize, data: [u8; len])
- Structs/tuples where the type of one field depends on the runtime
value of another field, e.g. struct Vec { len: usize, data: [u8;
self.len] }
esafak wrote 1 day ago:
I wish dependent types were more common :(
VorpalWay wrote 1 day ago:
How does that work? If the length of the array is read from stdin for
example, it would be impossible to know it at compile time.
Presumably this is limited somehow?
rq1 wrote 1 day ago:
Imagine you read a value from stdin and parse it as:
Maybe Int
So your program splits into two branches:
1. Nothing branch: you failed to obtain an Int.
There is no integer to use as an index, so you can't even attempt
a safe lookup into something like Vect n a.
2. Just i branch: you do have an Int called i.
But an Int is not automatically a valid index for Vect n a, because
vectors are indexed by Fin n (a proof-carrying "bounded
natural").
So inside the Just i branch, you refine further:
3. Try to turn the runtime integer i into a value of type Fin n.
There are two typical shapes of this step:
* Checked conversion returning Maybe (Fin n)
If the integer is in range, you get Just (fin : Fin n). Otherwise
Nothing.
* Checked conversion returning evidence (proof) that it's in range
For example: produce k : Nat plus a proof like k < n (or LTE (S k)
n), and then you can construct Fin n from that evidence.
(But it's basically the same; you end up with a "Maybe LTE ...")
Now if you also have a vector:
xs : Vect n a
... the n in Fin n and the n in Vect n a are the same n (that's
what "unifies" means here: the types line up), so you can do:
index fin xs : a
And crucially:
there is no branch in which you can call index without having
constructed the Fin n first,
so out-of-bounds access is unrepresentable (it's not "checked
later", it's "cannot be expressed").
And within _that_ branch of the program, you have a proof of Fin n.
Said differently: you don't get "compile-time knowledge of i";
you get a compile-time guarantee that whatever value you ended up
with satisfies a predicate.
Concretely: you run a runtime check i < n _ONCE_.
If it fails, you're in a branch where you do not have Fin n.
If it succeeds, you construct fin : Fin n at runtime (it's a
value, you can't get around that), but its type encodes the
invariant "in bounds"/"check was done somewhere in this branch".
dernett wrote 1 day ago:
Not sure about Idris, but in Lean `Fin n` is a struct that contains
a value `i` and a proof that `i < n`. You can read in the value `n`
from stdin and then you can do `if h : i < n` to have a
compile-time proof `h` that you can use to construct a `Fin n`
instance.
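A sketch of that `if h : i < n` pattern in Lean 4 (using `Array`
rather than a length-indexed vector):
```
-- Reading i at runtime, then indexing only in the branch where
-- the bound is proved.
def lookup (xs : Array Nat) (i : Nat) : Option Nat :=
  if h : i < xs.size then
    -- `h : i < xs.size` is a proof in scope here, so `xs[i]`
    -- needs no runtime bounds check and cannot panic.
    some xs[i]
  else
    none
```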
smj-edison wrote 22 hours 42 min ago:
I've heard this can be a bit of a pain in practice, is that true?
I can imagine it could slow me down to construct a proof of an
invariant every time I call a function (if I understand
correctly).
jaggederest wrote 18 hours 45 min ago:
I haven't worked significantly with lean but I'm toying with my
own dependently typed language. generally you only have to
construct a proof once, much like a type or function, and then
you reuse it. Also, depending on the language there are
rewriting semantics ("elaboration") that let you do
mathematical transformations to make two statements equivalent
and then reuse the standardized proof.
marcosdumay wrote 1 day ago:
If you check that the value is inside the range, and execute some
different code if it's not, then congratulations, you now know at
compile time that the number you will read from stdin is in the
right range.
mdm12 wrote 1 day ago:
One option is dependent pairs, where one value of the pair (in this
example) would be the length of the array and the other value is a
type which depends on that same value (such as Vector n T instead
of List T).
Type-Driven Development with Idris[1] is a great introduction for
dependently typed languages and covers methods such as these if
you're interested (and Edwin Brady is a great teacher).
HTML [1]: https://www.manning.com/books/type-driven-development-with...
ratorx wrote 1 day ago:
It doesnât have to be a compile time constant. An alternative is
to prove that when you are calling the function the index is always
less than the size of the vector (a dynamic constraint). You may be
able to assert this by having a separate function on the vector
that returns a constrained value (eg. n < v.len()).
jaggederest wrote 1 day ago:
If the length is read from outside the program it's an IO
operation, not a static variable, but there are generally runtime
checks in addition to the type system. Usually you solve this as in
the article, with a constructor that checks it - so you'd have
something like "Invalid option: length = 5 must be within 0-4" when
you tried to create the Fin n from the passed in value
dang wrote 1 day ago:
Recent and related: Parse, Don't Validate (2019) - [1] - Feb 2026 (172
comments)
also:
Parse, Don't Validate - Some C Safety Tips - [2] - July 2025 (73
comments)
Parse, Don't Validate (2019) - [3] - July 2024 (102 comments)
Parse, don't validate (2019) - [4] - March 2023 (219 comments)
Parse, Don't Validate (2019) - [5] - June 2021 (270 comments)
Parsix: Parse Don't Validate - [6] - May 2021 (107 comments)
Parse, Don't Validate - [7] - Nov 2019 (230 comments)
Parse, Don't Validate - [8] - Nov 2019 (4 comments)
(p.s. these links are just to satisfy extra-curious readers - no
criticism is intended! I add this because people sometimes assume
otherwise)
HTML [1]: https://news.ycombinator.com/item?id=46960392
HTML [2]: https://news.ycombinator.com/item?id=44507405
HTML [3]: https://news.ycombinator.com/item?id=41031585
HTML [4]: https://news.ycombinator.com/item?id=35053118
HTML [5]: https://news.ycombinator.com/item?id=27639890
HTML [6]: https://news.ycombinator.com/item?id=27166162
HTML [7]: https://news.ycombinator.com/item?id=21476261
HTML [8]: https://news.ycombinator.com/item?id=21471753
DIR <- back to front page