#!/usr/bin/env stack > -- stack script --resolver lts-22.6 --package text --package process --package directory --package time --package aeson --package bytestring --package network-uri ask.lhs: ask a local LLM a question over gopher =============================================== A small literate-Haskell applet that gateways a local Ollama model into gopherspace. You ask a short question; it answers in a couple of plain-text lines. No account, no app, no API key --- the entire client is `curl`: curl gopher://gopher.someodd.zip/1/applets/ask.lhs%09what+is+gopher The model is hard-coded (`ask`, a brevity-tuned model built from a Modelfile) and both the question and the answer are kept short on purpose --- this runs on a small home server with a modest GPU, so a single-flight lock serialises requests and a second caller is told to come back in a moment rather than piling onto the card. Output is **streamed token by token**. We POST to Ollama's HTTP API (`/api/generate` with `"stream": true`) which returns NDJSON --- one JSON object per token, each terminated by `\n`. That gives a clean per-line boundary on the read side: `hGetLine` returns once per token instead of blocking until generation finishes. Tokens accumulate into a line buffer that is flushed as a gophermap info-line each time it reaches `wrapWidth` characters (split at the last space) or when a token contains a newline. Combined with `hSetBuffering stdout NoBuffering` and Venusia's `stream = true`, the client sees rows appear at the model's token rate. The earlier version of this script shelled out to `ollama run`. That emits all tokens on a single line in TTY-formatted output (ANSI cursor escapes around each token, `\n` only at end of generation), so a line-buffered reader blocks until the model is done and the user sees the whole answer at once. The HTTP API sidesteps that. (This file is markdown-flavoured literate Haskell. Headings use setext underlines rather than ATX-style `#` because GHC's literate parser interprets a `#` at column 1 of a non-code line as the start of a pragma --- setext sidesteps that.) URL design ---------- `$SCRIPT` stands for whatever `routes.toml` mounts this file at (`/applets/curl/ask.lhs` on this server); the script computes its own mount-point from `$selector - $pathinfo`, so substitute freely. $SCRIPT → prompt page: a type-7 search item. $SCRIPT + type-7 query → the query (clamped to 200 chars) is answered by the `ask` model; the reply is streamed as info-lines followed by a footer link back to the gopherhole root. $SCRIPT/ → same answer, but the question rides in path-info (percent-decoded; its own slashes are preserved). Lets a plain menu link or a bookmarked URL ask without a type-7 prompt. Path-info wins over the search query if both are somehow present. $SCRIPT/txt/ → same, but answered as a plain-text document (gopher item type 0) rather than a gophermap -- link it as a `0` item. The applet can't see the URL's item type, so the `txt/` prefix selects the text rendering explicitly. Same `$pathinfo` mechanism atomize.lhs (`/applets/curl/atomize.lhs`) uses to take a URL in its path; here it carries the question. From the command line --------------------- The gopher item-type segment (`/0/`, `/1/`) is consumed by curl's URL parser and never reaches the server. `%09` is the tab that separates selector from search query; `%20` is a space. Plain text out -- the `txt/` path, best for a shell or a pipe: curl 'gopher://gopher.someodd.zip/0/applets/curl/ask.lhs/txt/when%20was%20gopher%20invented' Gophermap (menu) out, via a type-7 tab query: curl 'gopher://gopher.someodd.zip/1/applets/curl/ask.lhs%09when+was+gopher+invented' Or bake the question into the path (gophermap out, no tab needed): curl 'gopher://gopher.someodd.zip/1/applets/curl/ask.lhs/when%20was%20gopher%20invented' Running the doctests -------------------- doctest-lhs ask.lhs Reuses this file's own `-- stack` directive and language pragmas, so doctest sees the same packages and extensions `run-cached-lhs` compiles against. Module header and imports ------------------------- > {-# LANGUAGE OverloadedStrings #-} > module Main (main) where > > import Control.Exception (IOException, SomeException, finally, try) > import Control.Monad (when) > import qualified Data.Aeson as A > import Data.Aeson ((.!=), (.:?)) > import qualified Data.ByteString.Lazy as BL > import qualified Data.Text as T > import qualified Data.Text.Encoding as TE > import qualified Data.Text.IO as TIO > import Data.Time.Clock (NominalDiffTime, diffUTCTime, > getCurrentTime) > import Network.URI (unEscapeString) > import System.Directory (createDirectory, getModificationTime, > removeDirectoryRecursive) > import System.Environment (getArgs, lookupEnv) > import System.IO (BufferMode (..), Handle, hClose, hIsEOF, > hSetBuffering, hSetEncoding, stdout, utf8) > import System.Process (CreateProcess (..), ProcessHandle, > StdStream (..), createProcess, proc, > terminateProcess, waitForProcess) Constants --------- `askModel` names the Ollama model. We send our own `systemPrompt` and `num_predict` (`answerTokens`) with every request, which override the model's Modelfile SYSTEM / parameters --- so answer length and the plain-text style are controlled here in the script, and `askModel` can even point straight at a base model like `llama3:latest`. `ollamaUrl` is the local Ollama HTTP endpoint; the default daemon listens on port 11434. Swap to a remote host:port if your Ollama lives elsewhere, but note that this changes the trust boundary --- the script POSTs the user's query straight to whatever URL this is. `maxQueryLen` clamps user input; `wrapWidth` is the column the answer is word-wrapped to; `answerTokens` is the `num_predict` we request (sent in `options`, overriding the model's terse Modelfile default, so answers run a few paragraphs and visibly stream); `maxAnswerLines` is a hard backstop on emitted rows; `lockDir` is the single-flight lock (an atomically-created directory --- `mkdir` is atomic, so the create either wins or loses cleanly). > defaultHost, defaultPort :: T.Text > defaultHost = "gopher.someodd.zip" > defaultPort = "70" > askModel :: T.Text > askModel = "ask" > -- | System prompt sent with each request, overriding the model's terse > -- Modelfile default so answers run long enough to watch stream -- but > -- kept plain-text, since a gopher terminal has no markdown. > systemPrompt :: T.Text > systemPrompt = > "You are answering over a plain-text gopher terminal. Reply in plain \ > \prose only -- no markdown, asterisks, headings, or bullet characters. \ > \Give a thorough, complete answer of several paragraphs." > ollamaUrl :: String > ollamaUrl = "http://localhost:11434/api/generate" > maxQueryLen :: Int > maxQueryLen = 200 > wrapWidth :: Int > wrapWidth = 70 > -- | num_predict we request, overriding the model's terse default so > -- answers run longer and stream visibly (~10x the old brevity). > answerTokens :: Int > answerTokens = 512 > maxAnswerLines :: Int > maxAnswerLines = 140 -- hard backstop; 10x the old cap > lockDir :: FilePath > lockDir = "/tmp/ask-ollama.lock.d" > -- | A lock older than this is treated as abandoned and stolen. A > -- request killed mid-flight (client/server timeout) never runs its > -- `finally`, so without this the applet would brick permanently > -- the moment one request dies. > lockStaleSecs :: NominalDiffTime > lockStaleSecs = 180 Request parsing --------------- Three positional argv from Venusia (`$selector`, `$search`, `$pathinfo`), plus any extras (e.g. `$remote_ip`) we ignore via the cons pattern. Same shape as grep.lhs / figlet.lhs. > data Req = Req > { reqSel :: T.Text > , reqQ :: T.Text > , reqP :: T.Text > } deriving Show > parseArgs :: [String] -> Req > parseArgs (s:q:p:_) = Req (T.pack s) (T.pack q) (T.pack p) > parseArgs (s:q:_) = Req (T.pack s) (T.pack q) T.empty > parseArgs [s] = Req (T.pack s) T.empty T.empty > parseArgs [] = error > "ask.lhs: missing argv[0] (gopher selector). Run via Venusia, \ > \or for manual testing pass selector + query explicitly." > data Ctx = Ctx > { ctxScriptSel :: T.Text > , ctxHost :: T.Text > , ctxPort :: T.Text > } Main dispatch ------------- `stdout` is `NoBuffering` so rows reach the pipe immediately under Venusia's `stream = true`. Wrapping the body in `try` ensures any uncaught exception becomes a visible type-3 error row, never a blank response. > main :: IO () > main = do > hSetBuffering stdout NoBuffering > hSetEncoding stdout utf8 > result <- try mainBody :: IO (Either SomeException ()) > case result of > Right () -> pure () > Left e -> mapM_ putLine > [ errorItem ("ask.lhs crashed: " <> T.pack (show e)) > , terminator > ] > mainBody :: IO () > mainBody = do > req <- parseArgs <$> getArgs > host <- maybe defaultHost T.pack <$> lookupEnv "GOPHER_HOST" > port <- maybe defaultPort T.pack <$> lookupEnv "GOPHER_PORT" > let pinfo = reqP req > scriptSel = T.dropEnd (T.length pinfo) (reqSel req) > ctx = Ctx scriptSel host port > -- The question arrives two ways: as a type-7 search query, or > -- baked into path-info (`$SCRIPT/`) so a plain menu > -- link or a bookmarked URL can ask directly. Path-info is one > -- opaque, percent-decoded string (we never split on '/', so a > -- question's own slashes survive) and takes precedence. > -- A leading `txt/` segment asks for a plain-text (item type 0) > -- answer instead of a gophermap; the rest is the question. > stripped = T.dropWhile (== '/') pinfo > (plain, raw) = case T.stripPrefix "txt/" stripped of > Just r -> (True, r) > Nothing -> (stripped == "txt", if stripped == "txt" then "" else stripped) > pQ = T.pack (unEscapeString (T.unpack raw)) > q = clampQuery (if T.null pQ then reqQ req else pQ) > if T.null q > then if plain then emitUsageText ctx else emitPrompt ctx > else if plain then emitAnswerText ctx q else emitAnswer ctx q Gophermap line builders ----------------------- Same conventions as the other applets. `sanitize` strips CR/LF/TAB out of any field carrying user-supplied or model-generated text so a stray byte can't smuggle a fake gophermap row. > putLine :: T.Text -> IO () > putLine t = TIO.putStr (t <> "\r\n") > -- | An info-line gophermap row (item type @i@). > -- > -- >>> infoLine "hello" > -- "ihello\tnull\terror.host\t1" > infoLine :: T.Text -> T.Text > infoLine msg = "i" <> sanitize msg <> "\tnull\terror.host\t1" > -- | A type-7 search prompt row. > -- > -- >>> searchLine "Ask" "/q" "host" "70" > -- "7Ask\t/q\thost\t70" > searchLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > searchLine display sel host port = > "7" <> sanitize display <> "\t" <> sel <> "\t" <> host <> "\t" <> port > -- | A menu link row (item type @1@). > -- > -- >>> menuLine "home" "" "host" "70" > -- "1home\t\thost\t70" > menuLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > menuLine display sel host port = > "1" <> sanitize display <> "\t" <> sel <> "\t" <> host <> "\t" <> port > -- | A type-3 error row. > -- > -- >>> errorItem "oops" > -- "3oops\tnull\terror.host\t1" > errorItem :: T.Text -> T.Text > errorItem msg = "3" <> sanitize msg <> "\tnull\terror.host\t1" > terminator :: T.Text > terminator = "." > -- | >>> sanitize "ok" > -- "ok" > -- > -- >>> sanitize "a\tb" > -- "a b" > -- > -- >>> sanitize "line\nbreak" > -- "line break" > sanitize :: T.Text -> T.Text > sanitize = T.map (\c -> if c == '\r' || c == '\n' || c == '\t' then ' ' else c) Query handling -------------- `clampQuery` forces short input: strip surrounding whitespace, then cap the length. Short in keeps short out (alongside the model's `num_predict`), which is the whole point on a small server. > -- | >>> clampQuery " hello " > -- "hello" > -- > -- >>> T.length (clampQuery (T.replicate 500 "x")) > -- 200 > -- > -- >>> clampQuery " " > -- "" > clampQuery :: T.Text -> T.Text > clampQuery = T.take maxQueryLen . T.strip `splitAtWrap` is the per-token line-break primitive: take the first `wrapWidth` chars, then back up to the last space so we never break mid-word. If there's no space in that window (one very long token, or a stretch of non-space content), we fall through to a hard split at `wrapWidth` rather than letting the buffer grow unbounded. > -- | >>> splitAtWrap "the quick brown fox jumps" > -- ("the quick brown fox jumps","") > -- > -- >>> splitAtWrap (T.replicate 80 "x") > -- ("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","xxxxxxxxxx") > -- > -- >>> splitAtWrap "the quick brown fox jumps over the lazy dog and then some more" > -- ("the quick brown fox jumps over the lazy dog and then some more","") > -- > -- >>> fst (splitAtWrap (T.replicate 40 "ab " <> "end")) > -- "ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab" > splitAtWrap :: T.Text -> (T.Text, T.Text) > splitAtWrap buf > | T.length buf <= wrapWidth = (buf, T.empty) > | otherwise = > let prefix = T.take wrapWidth buf > in case T.findIndex (== ' ') (T.reverse prefix) of > Just i -> > let k = wrapWidth - 1 - i > in (T.take k buf, T.drop (k + 1) buf) > Nothing -> (prefix, T.drop wrapWidth buf) Pages ----- The curl cribs are built from this request's own mount point and host/port, so they stay correct wherever the file is mounted. > curlTextExample :: Ctx -> T.Text > curlTextExample ctx = > " curl 'gopher://" <> ctxHost ctx <> ":" <> ctxPort ctx > <> "/0" <> ctxScriptSel ctx <> "/txt/your%20question'" > curlMenuExample :: Ctx -> T.Text > curlMenuExample ctx = > " curl 'gopher://" <> ctxHost ctx <> ":" <> ctxPort ctx > <> "/1" <> ctxScriptSel ctx <> "%09your%20question'" The empty-query "front door": the type-7 search item (so a menu client can just type a question) plus curl cribs for the plain-text `txt/` endpoint and the gophermap form. > emitPrompt :: Ctx -> IO () > emitPrompt ctx = mapM_ putLine > [ infoLine "Ask a local LLM a question. Plain text in, plain text out." > , infoLine "No account, no app. Small home server -- keep it short." > , infoLine "" > , searchLine "Your question" (ctxScriptSel ctx) (ctxHost ctx) (ctxPort ctx) > , infoLine "" > , infoLine "Prefer the shell? curl speaks gopher. Plain text back:" > , infoLine (curlTextExample ctx) > , infoLine "Or a gophermap (menu) answer:" > , infoLine (curlMenuExample ctx) > , terminator > ] Shown when the `txt/` endpoint is hit with no question -- a curl user gets a plain-text crib rather than a gophermap. > emitUsageText :: Ctx -> IO () > emitUsageText ctx = mapM_ putLine > [ "ask -- put a question to a local LLM; plain text back." > , "Small home server, so keep it short." > , "" > , "Plain text (this endpoint):" > , curlTextExample ctx > , "" > , "Gophermap / menu answer:" > , curlMenuExample ctx > , terminator > ] The answer page: acquire the single-flight lock, spawn curl pointed at Ollama's streaming HTTP endpoint, stream its NDJSON output. There are two renderings sharing one streamer: 'emitAnswer' wraps each line as a gophermap info-row with a footer menu link ('infoLine'), while 'emitAnswerText' writes lines to the wire raw for a plain-text (item type 0) document. If the lock is already held, say so politely and bail --- exit 0, not an error. > emitAnswer :: Ctx -> T.Text -> IO () > emitAnswer ctx q = runLocked busyGophermap $ do > putLine (infoLine ("Asking the local LLM: \"" <> q <> "\" ...")) > putLine (infoLine "") > emitted <- withAnswerStream infoLine q > when (emitted == 0) $ putLine (infoLine "(no answer)") > mapM_ putLine (footerRows ctx) > putLine terminator > -- | Plain-text answer (gopher item type 0): the same stream, but > -- content lines reach the wire raw -- no @i@ prefix, no tab fields -- > -- so a `txt/` link renders as a clean text document. Ends with the > -- lone-period gopher text terminator. > emitAnswerText :: Ctx -> T.Text -> IO () > emitAnswerText _ q = runLocked busyText $ do > putLine ("Q: " <> sanitize q) > putLine "" > emitted <- withAnswerStream id q > when (emitted == 0) $ putLine "(no answer)" > putLine terminator > busyGophermap, busyText :: [T.Text] > busyGophermap = > [ infoLine "Busy -- someone else is asking right now." > , infoLine "This runs on a small home server. Try again in a moment." > , terminator > ] > busyText = > [ "Busy -- someone else is asking right now." > , "This runs on a small home server. Try again in a moment." > , terminator > ] > -- | Acquire the single-flight lock; on contention print @busy@ and > -- bail (exit 0, not an error), otherwise run @action@ and always > -- release the lock. > runLocked :: [T.Text] -> IO () -> IO () > runLocked busy action = do > acquired <- acquireLock > if not acquired > then mapM_ putLine busy > else action `finally` releaseLock > -- | Spawn curl -> Ollama (JSON body on stdin, so the prompt never > -- lands on an argv or hits length limits), then stream the NDJSON > -- answer through @render@ -- one wire line per content line, so the > -- caller picks gophermap ('infoLine') or raw text ('id'). Returns the > -- number of lines emitted (0 lets the caller print a placeholder). > withAnswerStream :: (T.Text -> T.Text) -> T.Text -> IO Int > withAnswerStream render q = do > let payload = A.encode $ A.object > [ "model" A..= askModel > , "system" A..= systemPrompt > , "prompt" A..= q > , "stream" A..= True > , "options" A..= A.object [ "num_predict" A..= answerTokens ] > ] > (Just hIn, Just hOut, _, ph) <- createProcess > (proc "curl" > [ "-sN", "-X", "POST", ollamaUrl > , "-H", "Content-Type: application/json" > , "--data-binary", "@-" > ]) > { std_in = CreatePipe, std_out = CreatePipe } > BL.hPut hIn payload > hClose hIn > hSetEncoding hOut utf8 > emitted <- streamAnswer render ph hOut > _ <- waitForProcess ph > pure emitted Streaming the answer -------------------- Each NDJSON line is decoded permissively into a `StreamChunk`. The mid-stream shape is @{"response":"","done":false}@; the final summary chunk is @{"response":"","done":true,..}@; an error (e.g. model not loaded) carries an @"error"@ field instead. Missing fields default to safe values so a malformed line is silently skipped rather than crashing the stream. > data StreamChunk = StreamChunk > { scResp :: T.Text > , scDone :: Bool > , scError :: Maybe T.Text > } deriving Show > instance A.FromJSON StreamChunk where > parseJSON = A.withObject "StreamChunk" $ \o -> > StreamChunk > <$> o .:? "response" .!= T.empty > <*> o .:? "done" .!= False > <*> o .:? "error" `streamAnswer` reads NDJSON lines from curl's stdout, accumulates tokens into a buffer, and emits info-lines as the buffer fills. Emission rules: (a) a token containing `\n` forces a flush of the buffer up to the newline, then continues with what follows; (b) a buffer that exceeds `wrapWidth` is split at the last space and the prefix is emitted. Stops on @done=true@, on an @error@ chunk, or after `maxAnswerLines` rows --- at the cap we `terminateProcess` curl so Ollama gets a disconnect on the HTTP side and stops generating tokens nobody will see. Returns the number of info-lines emitted (the caller uses 0 to print a "(no answer)" placeholder). > streamAnswer :: (T.Text -> T.Text) -> ProcessHandle -> Handle -> IO Int > streamAnswer render ph h = go 0 T.empty > where > go n buf = do > eof <- hIsEOF h > if eof > then flushTail n buf > else do > rawLine <- TIO.hGetLine h > case A.decodeStrict (TE.encodeUtf8 rawLine) of > Nothing -> go n buf > Just chunk -> handleChunk n buf chunk > > handleChunk n buf chunk = case scError chunk of > Just err -> do > n1 <- if T.null (T.strip buf) then pure n > else emitLine n (T.stripStart buf) > emitLine n1 ("error: " <> err) > Nothing -> do > (n', buf') <- drainBuf n (buf <> scResp chunk) > if n' >= maxAnswerLines > then terminateProcess ph >> pure n' > else if scDone chunk > then flushTail n' buf' > else go n' buf' > > -- Walk the buffer emitting as many complete lines as we can. > -- One pass per line; stops when the residual is short enough to > -- wait for the next token and contains no newline. > drainBuf n buf > | n >= maxAnswerLines = pure (n, T.empty) > | otherwise = case T.uncons (T.dropWhile (/= '\n') buf) of > Just (_, after) -> do > let before = T.takeWhile (/= '\n') buf > n' <- emitLine n (T.stripStart before) > drainBuf n' after > Nothing > | T.length buf > wrapWidth -> do > let (ln, rest) = splitAtWrap buf > n' <- emitLine n (T.stripStart ln) > drainBuf n' rest > | otherwise -> pure (n, buf) > > emitLine n ln > | n >= maxAnswerLines = pure n > | n == maxAnswerLines - 1 = do > putLine (render "... (truncated)") > pure (n + 1) > | otherwise = do > putLine (render ln) > pure (n + 1) > > flushTail n buf > | T.null (T.strip buf) = pure n > | otherwise = emitLine n (T.stripStart buf) > footerRows :: Ctx -> [T.Text] > footerRows ctx = > [ infoLine "" > , menuLine "-- served over gopher * no js * no ads * no tracking --" > "" (ctxHost ctx) (ctxPort ctx) > ] Single-flight lock ------------------ `createDirectory` is atomic: exactly one concurrent caller succeeds, the rest get an `IOException`. But a request killed mid-flight never runs its `finally`, leaving the lock dir behind forever --- and only the daemon user can remove it. So a failed `createDirectory` isn't the final word: if the existing lock is older than `lockStaleSecs` its holder is presumed dead and the lock is stolen (removed and recreated). The mtime of the dir is its creation time --- we never write into it --- so "age" is "how long this request has been running". `releaseLock` swallows its own errors so a half-cleaned lock can never crash the response. > acquireLock :: IO Bool > acquireLock = do > r <- try (createDirectory lockDir) :: IO (Either IOException ()) > case r of > Right () -> pure True > Left _ -> do > stale <- lockIsStale > if not stale > then pure False > else do > releaseLock > s <- try (createDirectory lockDir) :: IO (Either IOException ()) > pure (either (const False) (const True) s) > -- | True if the lock dir is missing, unreadable, or older than > -- 'lockStaleSecs'. Errors count as stale so a vanished/odd lock is > -- retried rather than treated as a live holder. > lockIsStale :: IO Bool > lockIsStale = do > res <- try $ do > mtime <- getModificationTime lockDir > now <- getCurrentTime > pure (diffUTCTime now mtime > lockStaleSecs) > pure (either (const True) id (res :: Either IOException Bool)) > releaseLock :: IO () > releaseLock = do > _ <- try (removeDirectoryRecursive lockDir) :: IO (Either IOException ()) > pure ()