_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML Parse, Don't Validate and Type-Driven Design in Rust
ubixar wrote 10 hours 12 min ago:
C# gets close to this with records + pattern matching; F# discriminated
unions are even better, with algebraic data types built right in. A
Result<'T,'Error> makes invalid states unrepresentable without any
ceremony. C# records/matching work for now, but native DUs will make it
even nicer.
barnacs wrote 11 hours 30 min ago:
Every time you introduce a type for a "value invariant" you lose
compatibility and force others to make cumbersome type conversions.
To me, invalid values are best expressed with optional error returns,
returned alongside the value as part of the function signature. Types are
best used to only encode information about the hierarchy of structures
composed of primitive types. They help define and navigate the
representation of composite things as opposed to just having dynamic
nested maps of arbitrary strings.
mrkeen wrote 10 hours 13 min ago:
> They help define and navigate the representation of composite
things as opposed to just having dynamic nested maps of arbitrary
strings.
What would you say to someone who thinks that nested maps of
arbitrary strings have maximum compatibility, and using types forces
others to make cumbersome type conversions?
barnacs wrote 10 hours 8 min ago:
If the fields of a structure or the string keys of an untyped map
don't match then you don't have compatibility either way. The same
is not true for restricting the set of valid values.
edit: To put it differently: To possibly be compatible with the
nested "Circle" map, you need to know it is supposed to have a
"Radius" key that is supposed to be a float. Type definitions just
make this explicit. But just because your "Radius" can't be 0, you
shouldn't make it incompatible with everything else operating on
floats in general.
MarcLore wrote 11 hours 36 min ago:
This pattern maps beautifully to API design too. Instead of validating
a raw JSON request body and hoping you checked everything, you parse it
into a well-typed struct upfront. Every downstream function then gets
guaranteed-valid data without redundant checks. The Rust ecosystem
makes this almost painless with serde + custom deserializers. I've seen
codebases cut their error-handling surface area by 60% just by moving
from validate-then-use to parse-at-the-boundary.
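A dependency-free sketch of parse-at-the-boundary (serde with a custom
deserializer would express the same idea; the `Username` type and its
rule here are made up for illustration):
```rust
use std::str::FromStr;

/// A username guaranteed non-empty and ASCII-alphanumeric.
/// (Hypothetical domain type; the constraint is illustrative.)
pub struct Username(String);

impl FromStr for Username {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            return Err("username must not be empty".into());
        }
        if !s.chars().all(|c| c.is_ascii_alphanumeric()) {
            return Err(format!("invalid username: {s:?}"));
        }
        Ok(Username(s.to_string()))
    }
}

// Downstream code takes Username, not &str: it cannot receive an
// unvalidated value, so it never needs to re-check.
fn greet(user: &Username) -> String {
    format!("hello, {}", user.0)
}

fn main() {
    let user: Username = "alice42".parse().expect("valid");
    println!("{}", greet(&user));
    assert!("".parse::<Username>().is_err());
}
```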
qsera wrote 16 hours 37 min ago:
>"Parse, Don't Validate,"
Ah, that pretentious little blog post that was taken up by many as
gospel, for some mysterious reason...
This tells me that any idea, even a stupid (or, in this case, obvious)
one, can go mainstream if you rhyme it right.
That blog post is like a "modern art" painting by some famous artist
that looks like a child flung some paint at a wall, but every person
who looks at it tries so hard to find some meaning that everyone ends
up finding "something"...
In fact, the top comment right now expresses an interpretation of the
blog post that differs from the author's...
Amusing!
mrkeen wrote 10 hours 9 min ago:
Was the original blog post wrong?
throw310822 wrote 11 hours 32 min ago:
I only half understand this stuff, but all this encapsulation of
values so that they are guaranteed to remain valid across
manipulations... isn't it called Object Oriented Programming?
AxiomLab wrote 17 hours 7 min ago:
This exact philosophy is why I started treating UI design systems like
compilers.
Instead of validating visual outputs after the fact (like linting CSS
or manual design reviews), you parse the constraints upfront. If a
layout component is strictly typed to only accept discrete grid
multiples, an arbitrary 13px margin becomes a compile error, not a
subjective design debate. It forces determinism.
Garlef wrote 13 hours 13 min ago:
curious: what kind of tooling would you use here?
unixpickle wrote 1 day ago:
The `try_roots` example here is actually a _counterexample_ to the
author's main argument. They explicitly ignore the "negative
discriminant" case. What happens if we consider it?
If we take their "parse" approach, then the types of the arguments a,
b, and c have to somehow encode the constraint `b^2 - 4ac >= 0`. This
would be a total mess--I can't think of any clean way to do this in
Rust. It makes _much_ more sense to simply return an Option and do the
validation within the function.
In general, I think validation is often the best way to solve the
problem. The only counterexample, which the author fixates on in the
post, is when one particular value is constrained in a clean,
statically verifiable way. Most of the time, validation is used to
check (possibly complex) interactions between multiple values, and
"parsing" isn't at all convenient.
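The validate-inside approach argued for here can be sketched like this
(assuming, as the comment does, that `try_roots` takes plain floats):
```rust
/// Real roots of ax^2 + bx + c, or None when there are none.
/// Validation happens inside the function; the caller just
/// handles the Option.
fn try_roots(a: f64, b: f64, c: f64) -> Option<(f64, f64)> {
    if a == 0.0 {
        return None; // not a quadratic
    }
    let disc = b * b - 4.0 * a * c;
    if disc < 0.0 {
        return None; // negative discriminant: no real roots
    }
    let sqrt = disc.sqrt();
    Some(((-b + sqrt) / (2.0 * a), (-b - sqrt) / (2.0 * a)))
}

fn main() {
    assert_eq!(try_roots(1.0, -3.0, 2.0), Some((2.0, 1.0)));
    assert_eq!(try_roots(1.0, 0.0, 1.0), None); // x^2 + 1 = 0
}
```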
sbszllr wrote 10 hours 42 min ago:
I was thinking a similar thing when reading the article. Often, the
validity of the input depends on the interaction between some of
them.
Sure, we can follow the advice of creating types that represent only
valid states, but then we end up with `fn(a: A, b: B, c: C)`
transformed into `fn(abc: ValidABC)`.
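That `ValidABC` shape can be sketched as follows (the name comes from
the comment above; the discriminant invariant and method names are
illustrative):
```rust
/// Coefficients whose discriminant is known non-negative.
/// Constructed only through `new`, so b^2 - 4ac >= 0 (and a != 0)
/// holds for every value of this type.
pub struct ValidAbc {
    a: f64,
    b: f64,
    c: f64,
}

impl ValidAbc {
    pub fn new(a: f64, b: f64, c: f64) -> Option<Self> {
        (a != 0.0 && b * b - 4.0 * a * c >= 0.0)
            .then_some(ValidAbc { a, b, c })
    }

    /// No Option needed: the invariant was checked at the boundary.
    pub fn roots(&self) -> (f64, f64) {
        let sqrt = (self.b * self.b - 4.0 * self.a * self.c).sqrt();
        (
            (-self.b + sqrt) / (2.0 * self.a),
            (-self.b - sqrt) / (2.0 * self.a),
        )
    }
}

fn main() {
    let abc = ValidAbc::new(1.0, -3.0, 2.0).expect("valid coefficients");
    assert_eq!(abc.roots(), (2.0, 1.0));
    assert!(ValidAbc::new(1.0, 0.0, 1.0).is_none());
}
```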
the__alchemist wrote 1 day ago:
The examples in question propagate complexity throughout related code.
I think this is a case I see frequently in Rust of using too many
abstractions, and its associated complexities.
I would just (as a default; the situation varies)... validate prior to
the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g.
checking if the index is out of bounds. Similar idea; check; print or
display an error, then fail that computation without crashing the
program. Usually an indication of some bug, which can be tracked down.
Or, if it's an array frequently indexed, use the `get` method
(canonical in Rust's core) on whatever struct owns the array. It
returns an Option.
I do think either the article's approach, or validating is better than
runtime crashes! There are many patterns in programming. Using Types in
this way is something I see a lot of in OSS rust, but it is not my cup
of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
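The `get` pattern mentioned above, using std's `slice::get`:
```rust
fn main() {
    let xs = [10, 20, 30];

    // Indexing with [] panics on out-of-bounds; `get` returns an
    // Option instead, so the failure is handled, not a crash.
    match xs.get(5) {
        Some(v) => println!("value: {v}"),
        None => println!("index 5 out of bounds, skipping"),
    }

    assert_eq!(xs.get(1), Some(&20));
    assert_eq!(xs.get(5), None);
}
```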
fph wrote 1 day ago:
The article quickly mentions implementing addition:
```
impl Add<NonZeroF32> for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
WJW wrote 12 hours 15 min ago:
I imagine it would be something like Option<NonZeroF32>, since -2.0 +
2.0 would violate the constraint at runtime. This gets us the
Option-handling problem back.
I think the article would have been better with NonZeroPositiveF32 as
the example type, since then addition would be safe.
alfons_foobar wrote 1 day ago:
Would have to be F32, no?
I cannot think of any way to enforce "non-zero-ness" of the result
without making it return an optional Result, and at that point we are
basically back to square one...
MaulingMonkey wrote 1 day ago:
> Would have to be F32, no?
Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is
able to return `NonZeroU32` though! ( [1] )
> I cannot think of any way to enforce "non-zero-ness" of the
result without making it return an optional Result, and at that
point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this,
although I'll note it returns an `Option` instead of a `Result` (
[2] ), leaving you to `.ok_or(...)` or otherwise handle the edge
case to your heart's content. Niche, but occasionally what you
want.
HTML [1]: https://doc.rust-lang.org/std/num/type.NonZeroU32.html#met...
HTML [2]: https://doc.rust-lang.org/std/num/type.NonZeroU32.html#met...
alfons_foobar wrote 1 day ago:
> `NonZeroU32::saturating_add(self, other: u32)` is able to
return `NonZeroU32` though!
I was confused at first how that could work, but then I realized
that of course, with _unsigned_ integers this works fine because
you cannot add a negative number...
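Both methods discussed in this subthread exist on std's `NonZeroU32`;
a quick sketch of the behavior:
```rust
use std::num::NonZeroU32;

fn main() {
    let n = NonZeroU32::new(5).unwrap();

    // saturating_add can return NonZeroU32 directly: adding an
    // unsigned value to a nonzero one can never reach zero.
    let m: NonZeroU32 = n.saturating_add(10);
    assert_eq!(m.get(), 15);

    // checked_add returns Option<NonZeroU32>: None on overflow.
    assert_eq!(n.checked_add(1).map(NonZeroU32::get), Some(6));
    assert_eq!(n.checked_add(u32::MAX), None);
}
```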
hutao wrote 1 day ago:
Note that the division-by-zero example used in this article is not the
best example to demonstrate "Parse, Don't Validate," because it relies
on encapsulation. The principle of "Parse, Don't Validate" is best
embodied by functions that transform untrusted data into some data type
which is correct by construction.
Alexis King, the author of the original "Parse, Don't Validate"
article, also published a follow-up, "Names are not type safety" [0]
clarifying that the "newtype" pattern (such as hiding a nonzero integer
in a wrapper type) provides weaker guarantees than correctness by
construction. Her original "Parse, Don't Validate" article also
includes the following caveat:
> Use abstract datatypes to make validators "look like" parsers.
Sometimes, making an illegal state truly unrepresentable is just plain
impractical given the tools Haskell provides, such as ensuring an
integer is in a particular range. In that case, use an abstract newtype
with a smart constructor to "fake" a parser from a validator.
So, an abstract data type that protects its inner data is really a
"validator" that tries to resemble a "parser" in cases where the type
system itself cannot encode the invariant.
The article's second example, the non-empty vec, is a better example,
because it encodes within the type system the invariant that one
element must exist. The crux of Alexis King's article is that programs
should be structured so that functions return data types designed to be
correct by construction, akin to a parser transforming less-structured
data into more-structured data.
[0]
HTML [1]: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-ty...
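A minimal sketch of the correct-by-construction non-empty vec described
above (field and method names assumed):
```rust
/// A non-empty list: the first element is stored separately, so
/// "empty" is not representable and `first` needs no Option.
pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmptyVec<T> {
    pub fn new(head: T) -> Self {
        NonEmptyVec { head, tail: Vec::new() }
    }

    /// Correct by construction: always succeeds, no unwrap.
    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn push(&mut self, value: T) {
        self.tail.push(value);
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}

fn main() {
    let mut xs = NonEmptyVec::new(1);
    xs.push(2);
    xs.push(3);
    assert_eq!(*xs.first(), 1);
    assert_eq!(xs.len(), 3);
}
```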
CodeBit26 wrote 16 hours 38 min ago:
I agree, 'correct by construction' is the ultimate goal here. Using
types like NonZeroU32 is a great simple example, but the real power
comes when you design your entire domain logic so that the compiler
acts as your gatekeeper. It shifts the mental load from run-time
debugging to design-time thinking.
rapnie wrote 1 day ago:
You can also search for "make invalid states
impossible/unrepresentable" [0] to find more info on related
practices. See "domain modeling made functional" [0] as a nice
example
[0] [1]
HTML [1]: https://geeklaunch.io/blog/make-invalid-states-unrepresentab...
HTML [2]: https://www.youtube.com/watch?v=2JB1_e5wZmU
hutao wrote 1 day ago:
The phrasing that I hear more often is "make illegal states
unrepresentable"; both the submitted article and Alexis King's
original article use this phrase. At least according to [1] , it
originates from Yaron Minsky (a programmer at Jane Street who is
prominent in the OCaml community).
EDIT: Parent comment was edited to amend the
"impossible/unrepresentable" wording
HTML [1]: https://fsharpforfunandprofit.com/posts/designing-with-typ...
rapnie wrote 1 day ago:
Yes, sorry. I thought to add some resources to it, or it would be
a too vague comment and found the better phrasing.
Sharlin wrote 1 day ago:
Even the newtype-based "parse, don't validate" is tremendously useful
in practice, though. The big thing is that if you have a bare string,
you don't know "where it's been". It doesn't carry with it
information whether it's already been validated. Even if a newtype
can't provide you full correctness by construction, it's vastly
easier to be convinced of the validity of an encapsulated value
compared to a naked one.
For full-on parse-don't-validate, you essentially need a dependent
type system. As a more light-weight partial solution, Rust has been
prototyping pattern types, which are types constrained by patterns.
For instance a range-restricted integer type could be simply spelled
`i8 is 0..100`, or a nonempty slice as `[T] is [_, ..]`. Such a
feature would certainly make correctness-by-construction easier in
many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example
of the clash between practicality and theoretical purity. It can't
offer you a slice (consecutive view) of its elements without storing
the first element twice (which requires immutability and that T:
Clone, unlike normal Vec<T>), which makes it fairly useless as a vector.
It's okay if you consider it just an abstract list with a more
restricted interface.
spockz wrote 14 hours 13 min ago:
Coming from Haskell, I loved Agda 2 as a dependent type language.
Is there any newer or more mainstream language that has added
dependent types?
throw_await wrote 11 hours 31 min ago:
Typescript has something that can be used as dependent types, but
it wasn't intended as a language feature, so the Syntax is not as
ergonomic as Agda:
HTML [1]: https://www.hacklewayne.com/dependent-types-in-typescrip...
doctor_phil wrote 13 hours 27 min ago:
Idris is slightly more mainstream I would say, but not wildly so.
If you like the Haskell interop then I'd recommend staying with
Agda.
Scala 3 is much more mainstream and has path dependent types.
I've only used Scala 2, and there the boilerplate for dependent
types was frustrating imo, but I've heard it's better in 3.
rhdunn wrote 14 hours 14 min ago:
It's also useful to wrap/tag IDs in structured types. That makes it
easier to avoid errors when there are multiple type parameters such
as in the Microsoft graph API.
humkieufj wrote 13 hours 16 min ago:
Tragic how Rusties and HN apparently like to try to murder
critics through swatting. Rusties and HN cannot be said to have
either souls or conscience.
sam0x17 wrote 1 day ago:
btw the "quoth" crate makes it really, really easy to implement
scannerless parsing in Rust for arbitrary syntax; I use it on many of
my projects
IshKebab wrote 1 day ago:
Interesting looking crate. You don't seem to have any examples at all
though so I wouldn't say it makes it easy!
strawhatguy wrote 1 day ago:
The alternative is one type, with many functions that can operate on
that type.
Like how clojure basically uses maps everywhere and the whole standard
library allows you to manipulate them in various ways.
The main problem with the many-type approach is ending up with several
similar (or worse, near-identical) types, all incompatible.
slopinthebag wrote 1 day ago:
I find a balance is important. You can do nominal typing in a
structural type system with branding, and you can kinda do structural
typing in a nominal type system, but it's not as ergonomic. But you
should probably end up doing a mix of both.
Kinrany wrote 1 day ago:
It's not an alternative.
Start with a more dynamic type, do stuff that doesn't care about the
shape, parse into a more precise type, do stuff that relies on the
additional invariants, drop back into the more dynamic type again.
Rygian wrote 1 day ago:
This sounds like the "stringly typed language" mockery of some
languages. How is it actually different?
marcosdumay wrote 1 day ago:
There are more than two alternatives, since functions can operate in
more than one type.
fiddlerwoaroof wrote 1 day ago:
Yeah, there's something of a tension between the Perlis quote "It is
better to have 100 functions operate on one data structure than 10
functions on 10 data structures" and Parse, don't validate.
The way I've thought about it, though, is that it's possible to
design a program well either by encoding your important invariants in
your types or in your functions (especially simple functions). In
dynamically typed languages like Clojure, my experience is that
there's a set of design practices that have a lot of the same effects
as "Parse, Don't Validate" without statically enforced types. And,
ultimately, it's a question of mindset which style you prefer.
strawhatguy wrote 1 day ago:
There's probably a case for both. Core logic might benefit from
hard types deep in the bowels of unchanging engine.
The real world often changes though, and more often than not the
code has to adapt, regardless of how elegantly our systems are
designed.
fiddlerwoaroof wrote 23 hours 53 min ago:
Coalton ( [1] ) is the sort of thing I like: a Haskell-style
language hosted inside a very dynamic one with good interop.
HTML [1]: https://coalton-lang.github.io
strawhatguy wrote 21 hours 16 min ago:
Yes it's quite the blend!
packetlost wrote 1 day ago:
I don't really get why this is getting flagged. I've found this to be
true, but more of a trade-off than a pure benefit. It also is sort of
beside the point: you always need to parse inputs from external,
usually untrusted, sources.
doublesocket wrote 1 day ago:
Agree with this. Mismatching types are generally an indicator of an
underlying issue with the code, not the language itself. These are
areas AI can be helpful flagging potential problems.
noitpmeder wrote 1 day ago:
This reminds me a bit of a recent publication by Stroustrup about using
concepts... in C++ to validate integer conversions automatically where
necessary. [1] {
Number ii = 0;
Number cc = '0';
ii = 2; // OK
ii = -2; // throws
cc = i; // OK if i is within cc's range
cc = -17; // OK if char is signed; otherwise throws
cc = 1234; // throws if a char is 8 bits
}
HTML [1]: https://www.stroustrup.com/Concept-based-GP.pdf
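A rough Rust analog of the quoted checked conversions, using `TryFrom`
with a hypothetical bounded newtype in place of the C++ `Number`:
```rust
use std::convert::TryFrom;

/// A value constrained to a char-like (i8) range; conversions that
/// would lose information fail instead of wrapping.
#[derive(Debug, PartialEq)]
struct SmallNumber(i8);

impl TryFrom<i32> for SmallNumber {
    type Error = String;

    fn try_from(v: i32) -> Result<Self, Self::Error> {
        // i8::try_from fails for values outside -128..=127.
        i8::try_from(v)
            .map(SmallNumber)
            .map_err(|_| format!("{v} out of range for SmallNumber"))
    }
}

fn main() {
    assert_eq!(SmallNumber::try_from(2), Ok(SmallNumber(2)));
    assert_eq!(SmallNumber::try_from(-17), Ok(SmallNumber(-17)));
    assert!(SmallNumber::try_from(1234).is_err()); // like `cc = 1234`
}
```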
cmovq wrote 1 day ago:
Dividing a float by zero is usually perfectly valid. It has predictable
outputs, and for some algorithms like collision detection this property
is used to remove branches.
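The IEEE 754 behavior being relied on, as a quick sketch:
```rust
fn main() {
    // Float division by zero does not panic in Rust; it follows
    // IEEE 754 and produces infinities or NaN deterministically.
    let x = 1.0_f32 / 0.0;  // +inf
    let y = -1.0_f32 / 0.0; // -inf
    let z = 0.0_f32 / 0.0;  // NaN

    assert!(x.is_infinite() && x.is_sign_positive());
    assert!(y.is_infinite() && y.is_sign_negative());
    assert!(z.is_nan());
}
```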
woodruffw wrote 1 day ago:
I think "has predictable outputs" is less valuable than "has
expected outputs" for most workloads. Dividing by zero almost
always reflects an unintended state, so proceeding with the operation
means compounding the error state.
(This isn't to say it's always wrong, but that having it be an
error state by default seems very reasonable to me.)
jaggederest wrote 1 day ago:
You can go even further with this in other languages, with things like
dependent typing - which can assert (among other interesting
properties) that, for example, something like
get_elem_at_index(array, index)
cannot ever have index outside the bounds of the array, but checked
statically at compilation time - and this is the key, without knowing a
priori what the length of array is.
"In Idris, a length-indexed vector is Vect n a (length n is in the
type), and a valid index into length n is Fin n ('a natural number
strictly less than n')."
Similar tricks work with division that might result in inf/-inf, to
prevent them from typechecking, and more subtle implications in e.g.
higher order types and functions
satvikpendem wrote 1 day ago:
Rust has some libraries that can do dependent typing too, based on
macros. For example: [1] Which refers to
HTML [1]: https://youtube.com/watch?v=JtYyhXs4t6w
HTML [2]: https://docs.rs/anodized/latest/anodized/
hmry wrote 15 hours 57 min ago:
Very cool and practical, but specs aren't dependent typing. (I
actually think specs are probably more useful than dependent types
for most people)
Dependent typing requires:
- Generic types that can take runtime values as parameters, e.g.
[u8; user_input]
- Functions where the type of one parameter depends on the runtime
value of another parameter, e.g. fn f(len: usize, data: [u8; len])
- Structs/tuples where the type of one field depends on the runtime
value of another field, e.g. struct Vec { len: usize, data: [u8;
self.len] }
esafak wrote 1 day ago:
I wish dependent types were more common :(
VorpalWay wrote 1 day ago:
How does that work? If the length of the array is read from stdin for
example, it would be impossible to know it at compile time.
Presumably this is limited somehow?
rq1 wrote 1 day ago:
Imagine you read a value from stdin and parse it as:
Maybe Int
So your program splits into two branches:
1. Nothing branch: you failed to obtain an Int.
There is no integer to use as an index, so you can't even attempt
a safe lookup into something like Vect n a.
2. Just i branch: you do have an Int called i.
But an Int is not automatically a valid index for Vect n a, because
vectors are indexed by Fin n (a proof-carrying "bounded
natural").
So inside the Just i branch, you refine further:
3. Try to turn the runtime integer i into a value of type Fin n.
There are two typical shapes of this step:
* Checked conversion returning Maybe (Fin n)
If the integer is in range, you get Just (fin : Fin n). Otherwise
Nothing.
* Checked conversion returning evidence (proof) that it's in range
For example: produce k : Nat plus a proof like k < n (or LTE (S k)
n), and then you can construct Fin n from that evidence.
(But it's basically the same; you end up with a "Maybe LTE ...")
Now if you also have a vector:
xs : Vect n a
... the n in Fin n and the n in Vect n a are the same n (that's
what "unifies" means here: the types line up), so you can do:
index fin xs : a
And crucially:
there is no branch in which you can call index without having
constructed the Fin n first,
so out-of-bounds access is unrepresentable (it's not "checked
later", it's "cannot be expressed").
And within _that_ branch of the program, you have a proof of Fin n.
Said differently: you don't get "compile-time knowledge of i";
you get a compile-time guarantee that whatever value you ended up
with satisfies a predicate.
Concretely: you run a runtime check i < n _ONCE_.
If it fails, you're in a branch where you do not have Fin n.
If it succeeds, you construct fin : Fin n at runtime (it's a
value, you can't get around that), but its type encodes the
invariant "in bounds"/"check was done somewhere in this branch".
dernett wrote 1 day ago:
Not sure about Idris, but in Lean `Fin n` is a struct that contains
a value `i` and a proof that `i < n`. You can read in the value `n`
from stdin and then you can do `if h : i < n` to have a
compile-time proof `h` that you can use to construct a `Fin n`
instance.
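A sketch of that `if h : i < n` pattern in Lean 4 (using `Array`
rather than a length-indexed vector):
```
-- Reading i at runtime, then indexing only in the branch where
-- the bound is proved.
def lookup (xs : Array Nat) (i : Nat) : Option Nat :=
  if h : i < xs.size then
    -- `h : i < xs.size` is a proof in scope here, so `xs[i]`
    -- needs no runtime bounds check and cannot panic.
    some xs[i]
  else
    none
```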
smj-edison wrote 22 hours 42 min ago:
I've heard this can be a bit of a pain in practice, is that true?
I can imagine it could slow me down to construct a proof of an
invariant every time I call a function (if I understand
correctly).
jaggederest wrote 18 hours 45 min ago:
I haven't worked significantly with lean but I'm toying with my
own dependently typed language. generally you only have to
construct a proof once, much like a type or function, and then
you reuse it. Also, depending on the language there are
rewriting semantics ("elaboration") that let you do
mathematical transformations to make two statements equivalent
and then reuse the standardized proof.
marcosdumay wrote 1 day ago:
If you check that the value is inside the range, and execute some
different code if it's not, then congratulations, you now know at
compile time that the number you will read from stdin is in the
right range.
mdm12 wrote 1 day ago:
One option is dependent pairs, where one value of the pair (in this
example) would be the length of the array and the other value is a
type which depends on that same value (such as Vector n T instead
of List T).
Type-Driven Development with Idris[1] is a great introduction for
dependently typed languages and covers methods such as these if
you're interested (and Edwin Brady is a great teacher).
HTML [1]: https://www.manning.com/books/type-driven-development-with...
ratorx wrote 1 day ago:
It doesnât have to be a compile time constant. An alternative is
to prove that when you are calling the function the index is always
less than the size of the vector (a dynamic constraint). You may be
able to assert this by having a separate function on the vector
that returns a constrained value (eg. n < v.len()).
jaggederest wrote 1 day ago:
If the length is read from outside the program it's an IO
operation, not a static variable, but there are generally runtime
checks in addition to the type system. Usually you solve this as in
the article, with a constructor that checks it - so you'd have
something like "Invalid option: length = 5 must be within 0-4" when
you tried to create the Fin n from the passed in value
dang wrote 1 day ago:
Recent and related: Parse, Don't Validate (2019) - [1] - Feb 2026 (172
comments)
also:
Parse, Don't Validate - Some C Safety Tips - [2] - July 2025 (73
comments)
Parse, Don't Validate (2019) - [3] - July 2024 (102 comments)
Parse, don't validate (2019) - [4] - March 2023 (219 comments)
Parse, Don't Validate (2019) - [5] - June 2021 (270 comments)
Parsix: Parse Don't Validate - [6] - May 2021 (107 comments)
Parse, Don't Validate - [7] - Nov 2019 (230 comments)
Parse, Don't Validate - [8] - Nov 2019 (4 comments)
(p.s. these links are just to satisfy extra-curious readers - no
criticism is intended! I add this because people sometimes assume
otherwise)
HTML [1]: https://news.ycombinator.com/item?id=46960392
HTML [2]: https://news.ycombinator.com/item?id=44507405
HTML [3]: https://news.ycombinator.com/item?id=41031585
HTML [4]: https://news.ycombinator.com/item?id=35053118
HTML [5]: https://news.ycombinator.com/item?id=27639890
HTML [6]: https://news.ycombinator.com/item?id=27166162
HTML [7]: https://news.ycombinator.com/item?id=21476261
HTML [8]: https://news.ycombinator.com/item?id=21471753
DIR <- back to front page