#!/usr/bin/env stack > -- stack script --resolver nightly-2025-12-01 --package text --package bytestring --package directory --package filepath --package liquidhaskell grep.lhs: live-streaming grep for someodd's gopher tree ======================================================= A small literate-Haskell applet that walks `/var/gopher` looking for `.txt` files whose path *or* contents contain the search query (case- insensitive substring). Unlike ryvm (which ranks and therefore must collect all candidates before emitting anything), this one streams hits as it finds them — `stdout` is set to `NoBuffering` and we `hFlush` after each row, so the wire sees results in real time under Venusia's `stream = true`. Pair this with `search.sh` (which uses ryvm for relevance-ranked results) when you want ranking; pick `grep.lhs` when you want speed and don't care about score order. URL design ---------- /applets/grep.lhs → prompt page: explain what this does, offer the search input. /applets/grep.lhs + text → stream all `.txt` files whose path or contents contain `text`, labelled `[name]` or `[content]`. A file that matches both appears twice (once per match reason); that's intentional. Running the doctests -------------------- doctest-lhs grep.lhs Reuses this file's own `-- stack` directive and language pragmas, so doctest sees the same packages and extensions `run-cached-lhs` compiles against. LiquidHaskell verification -------------------------- This script enables the LiquidHaskell GHC plugin and carries a handful of refinement-type annotations on pure helpers. The intent is a *targeted* aid for catching bugs in size/length/range invariants — not a full totality + termination proof of the file. That distinction matters: LH's default stance is "every function must be proven to terminate and never call `error`," which would demand re-architecting the recursive directory walk and the argv-cons-pattern catch-all. The two `--no-*` LIQUID options below turn those defaults off; what's left are the specific refinements we *do* want LH to check, written inline as `{-@ ... @-}` blocks above each annotated function. What you get from the annotations as the file currently stands: * @maxSnippetLen :: {v:Int | v >= 4}@ is **verified** — drop the constant below 4 and LH refuses to compile, because the @T.take (maxSnippetLen - 3) t <> "..."@ form in 'truncateLine' only makes sense once you have at least one character before the ellipsis. * @sanitize :: T.Text -> SafeText@ is **assumed**. LH believes that the function produces text with no row-break bytes (CR/LF/TAB) without verifying the body. This is a trust boundary: the function is six characters of @T.map@, tested by the doctests below, and easy for a human to audit. The annotation is groundwork for refining the gophermap line builders to demand 'SafeText' input — once that refactor happens, LH will refuse callers who forget to route untrusted text through 'sanitize'. * @findMatchingLines@'s line numbers are **assumed** to be @>= 1@. The promise was already in the docstring; the refinement type makes it machine-checkable for downstream callers. LH was bumped to its own resolver (nightly-2025-12-01) on this file's `-- stack` directive because no LTS-22 / LTS-23 / LTS-24 snapshot ships `liquidhaskell`. Other applets in this directory stay on lts-22.6; each `.lhs` pins its own resolver, so the nightly here doesn't propagate. The doctest wrapper (`doctest-lhs`) strips this file's LH plugin pragma into an inert comment for the doctest run, because LH writes "LIQUID: SAFE ..." to stdout during GHCi load and doctest can't parse that. Module header and imports ------------------------- > {-# LANGUAGE OverloadedStrings #-} > {-# OPTIONS_GHC -fplugin=LiquidHaskell #-} > > {-@ LIQUID "--no-totality" @-} > {-@ LIQUID "--no-termination" @-} > > module Main (main) where > > import Control.Exception (IOException, SomeException, try) > import Control.Monad (when) > import qualified Data.ByteString as BS > import qualified Data.Text as T > import qualified Data.Text.Encoding as TE > import qualified Data.Text.Encoding.Error as TEE > import qualified Data.Text.IO as TIO > import System.Directory (doesDirectoryExist, doesFileExist, > getFileSize, listDirectory) > import System.Environment (getArgs, lookupEnv) > import System.FilePath ((), makeRelative, takeExtension) > import System.IO (BufferMode (..), hFlush, > hSetBuffering, hSetEncoding, > stdout, utf8) Constants --------- `searchRoot` is the directory we walk; it's hard-coded to `/var/gopher` deliberately so the script can be invoked from any CWD without needing extra argv. Adjust the constant if your served tree lives elsewhere. `maxFileSize` is a defensive ceiling: files larger than this are skipped for content scanning (we still match against their path). Without this, a stray giant log file would balloon memory on every search. > defaultHost, defaultPort :: T.Text > defaultHost = "gopher.someodd.zip" > defaultPort = "70" > searchRoot :: FilePath > searchRoot = "/var/gopher" > maxFileSize :: Integer > maxFileSize = 4 * 1024 * 1024 -- 4 MiB > maxMatchesPerFile :: Int > maxMatchesPerFile = 5 -- snippet rows shown per file > {-@ maxSnippetLen :: {v:Int | v >= 4} @-} > maxSnippetLen :: Int > maxSnippetLen = 100 -- chars of context per snippet > -- (The @>= 4@ refinement is real, not nominal: 'truncateLine' > -- uses @T.take (maxSnippetLen - 3) t <> "..."@, which needs at > -- least one character before the ellipsis to make sense. Drop > -- this constant below 4 and LH refuses to compile.) Request parsing --------------- Three positional argv from Venusia (`$selector`, `$search`, `$pathinfo`), plus any extras (e.g. `$remote_ip`) we ignore via the cons pattern. Empty argv aborts loudly — we can't invent our own mount-point selector. See figlet.lhs / interlog.lhs for the same shape. > data Req = Req > { reqSel :: T.Text > , reqQ :: T.Text > , reqP :: T.Text > } deriving Show > {-@ ignore parseArgs @-} > parseArgs :: [String] -> Req > parseArgs (s:q:p:_) = Req (T.pack s) (T.pack q) (T.pack p) > parseArgs (s:q:_) = Req (T.pack s) (T.pack q) T.empty > parseArgs [s] = Req (T.pack s) T.empty T.empty > parseArgs [] = error > "grep.lhs: missing argv[0] (gopher selector). Run via Venusia, \ > \or for manual testing pass selector + query explicitly." > data Ctx = Ctx > { ctxScriptSel :: T.Text > , ctxHost :: T.Text > , ctxPort :: T.Text > } Main dispatch ------------- `stdout` is set to `NoBuffering` so every `TIO.putStr` reaches the pipe immediately — combined with Venusia's `stream = true` for `.lhs` applets, the client sees rows as soon as we print them. Wrapping the body in `try` ensures any uncaught exception becomes a visible type-3 error row, never a blank response. > main :: IO () > main = do > hSetBuffering stdout NoBuffering > hSetEncoding stdout utf8 > result <- try mainBody :: IO (Either SomeException ()) > case result of > Right () -> pure () > Left e -> mapM_ putLine > [ errorItem ("grep.lhs crashed: " <> T.pack (show e)) > , terminator > ] > mainBody :: IO () > mainBody = do > req <- parseArgs <$> getArgs > host <- maybe defaultHost T.pack <$> lookupEnv "GOPHER_HOST" > port <- maybe defaultPort T.pack <$> lookupEnv "GOPHER_PORT" > let pinfo = reqP req > scriptSel = T.dropEnd (T.length pinfo) (reqSel req) > ctx = Ctx scriptSel host port > q = T.strip (reqQ req) > if T.null q > then emitPrompt ctx > else emitResults ctx q Gophermap line builders ----------------------- Same conventions as the other applets. `sanitize` strips CR/LF/TAB out of any field that contains user-supplied text (the query, file paths from disk) so a stray byte can't smuggle a fake gophermap row. > putLine :: T.Text -> IO () > putLine t = do > TIO.putStr (t <> "\r\n") > hFlush stdout > -- | An info-line gophermap row (item type @i@). > -- > -- >>> infoLine "hello" > -- "ihello\tnull\terror.host\t1" > infoLine :: T.Text -> T.Text > infoLine msg = "i" <> sanitize msg <> "\tnull\terror.host\t1" > -- | A type-7 search prompt row. > -- > -- >>> searchLine "Search" "/q" "host" "70" > -- "7Search\t/q\thost\t70" > searchLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > searchLine display sel host port = > "7" <> sanitize display <> "\t" <> sel <> "\t" <> host <> "\t" <> port > -- | A text-file link (item type @0@). > -- > -- >>> textLine "doc" "/d" "host" "70" > -- "0doc\t/d\thost\t70" > textLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > textLine display sel host port = > "0" <> sanitize display <> "\t" <> sel <> "\t" <> host <> "\t" <> port > -- | A type-3 error row. > -- > -- >>> errorItem "oops" > -- "3oops\tnull\terror.host\t1" > errorItem :: T.Text -> T.Text > errorItem msg = "3" <> sanitize msg <> "\tnull\terror.host\t1" > terminator :: T.Text > terminator = "." > -- | Replace CR, LF, and TAB with spaces so user-supplied text can't > -- smuggle gophermap row breaks. > -- > -- The LH bits below introduce an abstract predicate @noRowBreaks@ > -- and assert (via @assume@) that 'sanitize' produces text that > -- satisfies it. The predicate body is not reflected — LH believes > -- us. That's a deliberate trust boundary: the function is six > -- characters of @T.map@, three doctests below, and easy for a > -- human to audit. In exchange every function annotated with > -- @SafeText@ in its argument types becomes a compile-time check > -- that callers route untrusted text through 'sanitize' first. > -- > -- >>> sanitize "ok" > -- "ok" > -- > -- >>> sanitize "a\tb" > -- "a b" > -- > -- >>> sanitize "line\nbreak" > -- "line break" > {-@ measure noRowBreaks :: T.Text -> Bool @-} > {-@ type SafeText = {t:T.Text | noRowBreaks t} @-} > {-@ assume sanitize :: T.Text -> SafeText @-} > sanitize :: T.Text -> T.Text > sanitize = T.map (\c -> if c == '\r' || c == '\n' || c == '\t' then ' ' else c) Path helpers ------------ > -- | Lowercased lenient-decode of a byte string. Used for case- > -- insensitive substring search of file contents. > -- > -- >>> lowerDecode "Hello World" > -- "hello world" > -- > -- >>> lowerDecode "" > -- "" > lowerDecode :: BS.ByteString -> T.Text > lowerDecode = T.toLower . TE.decodeUtf8With TEE.lenientDecode > -- | The gopher selector for a file relative to the search root, > -- always leading-slash-prefixed. > -- > -- >>> relativeSelector "/var/gopher" "/var/gopher/foo.txt" > -- "/foo.txt" > -- > -- >>> relativeSelector "/var/gopher" "/var/gopher/sub/dir/file.txt" > -- "/sub/dir/file.txt" > -- > -- >>> relativeSelector "/var/gopher" "/var/gopher" > -- "/" > relativeSelector :: FilePath -> FilePath -> T.Text > relativeSelector root path = case makeRelative root path of > "." -> "/" > rel -> "/" <> T.pack rel Pages ----- The empty-query page: a single info line explaining what this does, plus a type-7 search prompt that loops back to ourselves. > emitPrompt :: Ctx -> IO () > emitPrompt ctx = mapM_ putLine > [ infoLine "Search someodd's gopherhole's text files." > , infoLine "(grep-based; streams hits as they're found.)" > , infoLine "" > , searchLine "Query" (ctxScriptSel ctx) (ctxHost ctx) (ctxPort ctx) > , terminator > ] The results page: a header row that lands at the client immediately (visible "I'm working" feedback), then matches streamed in walk order as they're discovered, then the gopher terminator. > emitResults :: Ctx -> T.Text -> IO () > emitResults ctx q = do > mapM_ putLine > [ infoLine ("Grepping for \"" <> q <> "\" in someodd's gopherhole's text files...") > , infoLine "" > ] > let qLower = T.toLower q > walkAndMatch ctx qLower searchRoot > putLine terminator Walk and match -------------- Recursive directory walk. For each `.txt` file we encounter, we check the path against the query (case-insensitive substring) and, if the file is under the size ceiling, the file's contents. Hits are emitted as soon as they're found — no buffering, no batching. A file matching both path and contents emits twice (once per reason) so the user can see *why* each match showed up. Dotfiles and dot-directories are skipped to keep `.git`, `.cache`, gvfs scratch droppings, etc. out of the results. > walkAndMatch :: Ctx -> T.Text -> FilePath -> IO () > walkAndMatch ctx qLower = go > where > go :: FilePath -> IO () > go dir = do > entries <- listSafely dir > mapM_ (visit dir) entries > > visit :: FilePath -> FilePath -> IO () > visit dir name > | shouldSkip name = pure () > | otherwise = do > let path = dir name > isDir <- doesDirectoryExist path > if isDir > then go path > else do > isFile <- doesFileExist path > when (isFile && takeExtension path == ".txt") $ > matchFile ctx qLower path > -- | Names that the walk skips outright: > -- > -- * Anything starting with a dot — @.git@, @.cache@, gvfs's > -- @.giosaveXXX@ droppings, dotfile config and the like. > -- * @catalog@ — bartleby's generated index tree. Searching it > -- would surface derived menu-source files over the user's actual > -- text content, which is rarely what you want. > -- > -- >>> shouldSkip ".git" > -- True > -- > -- >>> shouldSkip "catalog" > -- True > -- > -- >>> shouldSkip ".env" > -- True > -- > -- >>> shouldSkip "applets" > -- False > -- > -- >>> shouldSkip "interlog" > -- False > shouldSkip :: FilePath -> Bool > shouldSkip ('.':_) = True > shouldSkip "catalog" = True > shouldSkip _ = False > -- | List a directory's entries, returning [] on any error (e.g. > -- permission denied, vanished mid-walk). Keeps the search resilient > -- to half-readable trees. > listSafely :: FilePath -> IO [FilePath] > listSafely dir = do > result <- try (listDirectory dir) :: IO (Either IOException [FilePath]) > pure (either (const []) id result) > -- | Find every line in a byte string whose lowercased text contains > -- the (already-lowercased) query. Returns 1-indexed line numbers > -- paired with the original line text. Lenient UTF-8 decode means > -- stray non-UTF-8 bytes don't crash the scan. > -- > -- >>> findMatchingLines "hi" "hello\nhi there\nbye" > -- [(2,"hi there")] > -- > -- >>> findMatchingLines "x" "alpha\nbeta\ngamma" > -- [] > -- > -- >>> findMatchingLines "a" "alpha\nbeta" > -- [(1,"alpha"),(2,"beta")] > -- > -- The refinement type below makes the "1-indexed" promise in the > -- prose above a compile-time check: every @Int@ in the result is > -- @>= 1@. Asserted via @assume@ — the body trivially satisfies it > -- (the @zip [1 ..]@ form pins the lower bound), but LH's built-in > -- @enumFrom@ spec doesn't surface the lower bound by default. > {-@ assume findMatchingLines :: T.Text -> BS.ByteString -> [({n:Int | n >= 1}, T.Text)] @-} > findMatchingLines :: T.Text -> BS.ByteString -> [(Int, T.Text)] > findMatchingLines qLower bs = > let allLines = zip [1 ..] (T.lines (TE.decodeUtf8With TEE.lenientDecode bs)) > in filter (\(_, l) -> qLower `T.isInfixOf` T.toLower l) allLines > -- | Trim a matching line and cap it to 'maxSnippetLen' characters > -- so a single very long line doesn't widen the menu. > -- > -- (A refinement on this function's return-value length would be > -- the natural place for it, but LH 0.9.12 doesn't ship a measure > -- for @Data.Text.Internal.Text@ — only @len@ for lists. The > -- doctests below pin the cap at runtime instead.) > -- > -- >>> truncateLine " short " > -- "short" > -- > -- >>> T.length (truncateLine (T.replicate 200 "a")) > -- 100 > -- > -- >>> T.takeEnd 3 (truncateLine (T.replicate 200 "a")) > -- "..." > truncateLine :: T.Text -> T.Text > truncateLine = trunc . T.strip > where > trunc t > | T.length t <= maxSnippetLen = t > | otherwise = T.take (maxSnippetLen - 3) t <> "..." > matchFile :: Ctx -> T.Text -> FilePath -> IO () > matchFile ctx qLower path = do > let pathLower = T.toLower (T.pack path) > sel = relativeSelector searchRoot path > relText = T.pack (makeRelative searchRoot path) > pathMatches = qLower `T.isInfixOf` pathLower > contentResult <- try (readSized path) :: IO (Either IOException (Maybe BS.ByteString)) > let matches = case contentResult of > Right (Just bs) -> findMatchingLines qLower bs > _ -> [] > when (pathMatches || not (null matches)) $ do > -- The file's link, followed (for content matches) by a > -- `grep -n`-style blurb on each matching line. Cap at > -- maxMatchesPerFile so a common short query doesn't spam the > -- listing with hundreds of rows for one file. > putLine (textLine relText sel (ctxHost ctx) (ctxPort ctx)) > let shown = take maxMatchesPerFile matches > dropped = length matches - length shown > mapM_ (\(n, line) -> > putLine (infoLine > (" L" <> T.pack (show n) <> ": " <> truncateLine line))) > shown > when (dropped > 0) $ > putLine (infoLine > (" (+" <> T.pack (show dropped) <> " more match(es) in this file)")) > putLine (infoLine "") > -- | Read a file's bytes only if it's at most 'maxFileSize'. Returns > -- 'Nothing' for oversized files so the caller can skip cleanly. > -- Stats the file via 'getFileSize' first so we never load a giant > -- file into memory just to discover we should have skipped it. > readSized :: FilePath -> IO (Maybe BS.ByteString) > readSized path = do > size <- getFileSize path > if size > maxFileSize > then pure Nothing > else Just <$> BS.readFile path