#!/usr/bin/env stack > -- stack script --resolver lts-22.6 --package text --package bytestring --package process --package directory --package filepath --package random --package time 4_ch.lhs: gopherspace ↔ 4-ch.net (read + reply + new thread) ============================================================ A single literate-Haskell applet that gateways the Kareha text-board at into gopherspace. Pick a board, read threads, reply to one, or start a new one --- all over gopher, no JavaScript, no HTML rendering, no web client. This replaces the older `/gateway/4_ch` script (now living in `.old/`), which was read-only, hardcoded three `[[gateway]]` blocks in `routes.toml`, and parsed HTML with a `curl | sed` pipeline that nobody wanted to touch. This applet self-mounts under any selector (it derives its mount point from `$selector` minus `$pathinfo`, Venusia 0.8.0.0+), parses HTML in native `Text`, and adds the bit the original author waved at but never built: posting. URL design ---------- `$SCRIPT` is whatever `routes.toml` mounts this file at --- `/applets/4_ch.lhs` on this server. Substitute freely; the script re-derives its base from `$selector` at request time. $SCRIPT board picker $SCRIPT/ thread index for board $SCRIPT// view thread on board $SCRIPT///reply open a reply session: fetch a captcha, cache the cookie, land on the reply menu $SCRIPT///reply/ reply menu (captcha + submit) $SCRIPT///reply//captcha.gif raw GIF bytes $SCRIPT///reply//submit type-7 target; query is "CAPTCHA||BODY" $SCRIPT//new open a new-thread session $SCRIPT//new/ new-thread menu $SCRIPT//new//captcha.gif raw GIF bytes $SCRIPT//new//submit type-7 target; query is "CAPTCHA||TITLE||BODY" `` is the board slug (lowercase alphanumeric, eg `tech`, `dqn`); `` is the thread id (all digits); `` is a 16-hex-char session id created when the reply/new flow starts. Anything off this grid is a type-3 error row. The captcha session ------------------- Kareha gates posting on a small image captcha. The image is fetched from `//captcha.pl?selector=.postcaptcha` (or `.threadcaptcha`); the server sets a `captchakey` cookie whose value is the answer it'll accept. The applet runs that handshake when the user clicks "reply" or "new thread", stashes the GIF + cookie jar under `/tmp/4ch-sessions//`, and surfaces both: - a type-`g` link to the GIF (gopher clients that render images show it inline), - the underlying HTTPS URL as an info-line (for the GIF-blind), - a type-7 search prompt to actually submit the post. The cookie expires server-side in two weeks; the local session is GC'd after `sessionTTL` seconds. The directory layout is the entire state machine --- there is no database, no in-memory state, and no locking. The session id is the cookie's only authority over the applet, so it is generated from `random` (not predictable input) and the directory permissions are 0700. Submit encoding --------------- Gopher type-7 takes one search line. We split that line on `||` (double pipe) to pack multiple form fields into a single submission: - reply: `CAPTCHA||BODY` - new thread: `CAPTCHA||TITLE||BODY` Inside `BODY`, the literal two-character sequence `\n` is converted to a real newline before submission, so multi-paragraph posts are possible from a single search field. From the command line --------------------- curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech/1721174972 curl 'gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech/1721174972/reply' (This file is markdown-flavoured literate Haskell. Headings use setext underlines rather than ATX `#` because GHC's literate parser interprets a `#` at column 1 of a non-code line as the start of a pragma --- setext sidesteps that.) Running the doctests -------------------- doctest-lhs 4_ch.lhs Reuses this file's own `-- stack` directive and language pragmas, so doctest sees the same packages and extensions `run-cached-lhs` compiles against. Module header and imports ------------------------- > {-# LANGUAGE OverloadedStrings #-} > module Main (main) where > > import Control.Exception (IOException, SomeException, try) > import Control.Monad (forM_, when) > import qualified Data.ByteString as BS > import Data.Char (isDigit, isAlphaNum, isHexDigit) > import qualified Data.Text as T > import qualified Data.Text.IO as TIO > import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime) > import System.Directory (createDirectoryIfMissing, doesFileExist, > getModificationTime, listDirectory, > removeDirectoryRecursive, setPermissions, > emptyPermissions, readable, writable, searchable) > import System.Environment (getArgs, lookupEnv) > import System.Exit (ExitCode (..)) > import System.FilePath (()) > import System.IO (BufferMode (..), hSetBuffering, hSetEncoding, > stdout, utf8) > import System.Process (readProcessWithExitCode) > import System.Random (randomRIO) Defaults -------- Host/port follow figlet.lhs --- overridable via env so the same file serves staging and production. > defaultHost, defaultPort :: T.Text > defaultHost = "gopher.someodd.zip" > defaultPort = "70" > upstream :: T.Text > upstream = "https://4-ch.net" > sessionRoot :: FilePath > sessionRoot = "/tmp/4ch-sessions" `sessionTTL` is the wall-clock age past which a session directory is considered abandoned and removed. Two captcha cookies live two weeks upstream, but a user who walks away mid-flow has effectively given up; thirty minutes is plenty for a human to view a GIF and type a post, and short enough to keep `/tmp` tidy. > sessionTTL :: Int -- seconds > sessionTTL = 30 * 60 > -- | Cap the per-page row count so a runaway board (a /dqn/ flame > -- war thread with hundreds of replies) does not blow past whatever > -- a gopher client will scroll. > maxRowsPerPage :: Int > maxRowsPerPage = 200 > -- | Hard cap on accepted submit body length. Kareha itself will > -- reject anything truly enormous, but the applet has no reason to > -- shovel half a megabyte through `curl -F`. > maxBodyChars :: Int > maxBodyChars = 8000 Request parsing --------------- Same shape as figlet.lhs / ask.lhs / qdbviewer.lhs --- three positional args from `routes.toml` (`$selector`, `$search`, `$pathinfo`), any extras (`$remote_ip`) ignored via the cons pattern. > data Req = Req > { reqSel :: T.Text -- full selector that resolved here > , reqQ :: T.Text -- search text after the tab; empty when none > , reqP :: T.Text -- selector portion after this script's filename > } deriving (Eq, Show) > -- | Parse the framework's argv into a 'Req'. Extras past the third > -- position are ignored; empty argv is a usage error. > -- > -- >>> parseArgs ["/applets/4_ch.lhs/tech/1721174972", "", "/tech/1721174972"] > -- Req {reqSel = "/applets/4_ch.lhs/tech/1721174972", reqQ = "", reqP = "/tech/1721174972"} > -- > -- >>> parseArgs ["/applets/4_ch.lhs", "", ""] > -- Req {reqSel = "/applets/4_ch.lhs", reqQ = "", reqP = ""} > parseArgs :: [String] -> Req > parseArgs (s:q:p:_) = Req (T.pack s) (T.pack q) (T.pack p) > parseArgs (s:q:_) = Req (T.pack s) (T.pack q) T.empty > parseArgs [s] = Req (T.pack s) T.empty T.empty > parseArgs [] = error > "4_ch.lhs: missing argv[0] (gopher selector). When run by Venusia \ > \this is automatic; for manual testing pass it explicitly, e.g. \ > \`runghc 4_ch.lhs /applets/4_ch.lhs '' '/tech'`." > data Ctx = Ctx > { ctxScriptSel :: T.Text -- selector path to this script, no path-info > , ctxHost :: T.Text > , ctxPort :: T.Text > } Main dispatch ------------- `stdout` is `NoBuffering` so partial output reaches the client under Venusia's `stream = true`. The top-level `try` turns any uncaught exception into a visible type-3 error row instead of a blank response. > main :: IO () > main = do > hSetBuffering stdout NoBuffering > hSetEncoding stdout utf8 > result <- try mainBody :: IO (Either SomeException ()) > case result of > Right () -> pure () > Left e -> mapM_ putLine > [ errorItem ("4_ch.lhs crashed: " <> T.pack (show e)) > , terminator > ] > mainBody :: IO () > mainBody = do > req <- parseArgs <$> getArgs > host <- maybe defaultHost T.pack <$> lookupEnv "GOPHER_HOST" > port <- maybe defaultPort T.pack <$> lookupEnv "GOPHER_PORT" > let pinfo = reqP req > scriptSel = T.dropEnd (T.length pinfo) (reqSel req) > ctx = Ctx scriptSel host port > segs = filter (not . T.null) . T.splitOn "/" $ pinfo > dispatch ctx segs (reqQ req) Path-info segments select the handler. Pattern guards check shape (board slug, thread digits, sid hex) and bounce to the error page otherwise --- this is the only validation between path-info and disk operations under `/tmp/4ch-sessions`, so it has to be exact. > dispatch :: Ctx -> [T.Text] -> T.Text -> IO () > dispatch ctx segs q = case segs of > [] -> emitBoards ctx > [b] | isBoardSlug b -> emitThreads ctx b > [b, t] | isBoardSlug b, isThreadId t -> emitThread ctx b t > [b, t, "reply"] | isBoardSlug b, isThreadId t -> emitReplyInit ctx b t > [b, t, "reply", sid] | isBoardSlug b, isThreadId t, isSid sid > -> emitReplyMenu ctx b t sid > [b, t, "reply", sid, "captcha.gif"] > | isBoardSlug b, isThreadId t, isSid sid -> emitCaptchaGif b t (Just sid) "reply" > [b, t, "reply", sid, "submit"] > | isBoardSlug b, isThreadId t, isSid sid -> doReply ctx b t sid q > [b, "new"] | isBoardSlug b -> emitNewThreadInit ctx b > [b, "new", sid] | isBoardSlug b, isSid sid -> emitNewThreadMenu ctx b sid > [b, "new", sid, "captcha.gif"] > | isBoardSlug b, isSid sid -> emitCaptchaGif b "" (Just sid) "new" > [b, "new", sid, "submit"] > | isBoardSlug b, isSid sid -> doNewThread ctx b sid q > _ -> emitPathError ctx (T.intercalate "/" segs) Validators. Cheap; doctest-checked. > -- | >>> map isBoardSlug ["tech", "dqn", "../etc", "TECH", ""] > -- [True,True,False,False,False] > isBoardSlug :: T.Text -> Bool > isBoardSlug t = not (T.null t) > && T.length t <= 32 > && T.all (\c -> isAlphaNum c && c == toLowerAscii c) t > where toLowerAscii c = if c >= 'A' && c <= 'Z' then toEnum (fromEnum c + 32) else c > -- | >>> map isThreadId ["1721174972", "12", "abc", "", "1234567890123456"] > -- [True,True,False,False,True] > isThreadId :: T.Text -> Bool > isThreadId t = not (T.null t) && T.length t <= 16 && T.all isDigit t > -- | >>> map isSid ["0123456789abcdef", "DEADBEEF12345678", "short", ""] > -- [True,False,False,False] > isSid :: T.Text -> Bool > isSid t = T.length t == 16 && T.all (\c -> isHexDigit c && not (c >= 'A' && c <= 'F')) t Gophermap line builders ----------------------- Same row shape as figlet.lhs; tabs are the only structure on the wire. Anything user-visible passes through `sanitize` first so embedded CR/LF/TAB can never smuggle a fake row. > putLine :: T.Text -> IO () > putLine t = TIO.putStr (t <> "\r\n") > -- | >>> infoLine "hello" > -- "ihello\t\t\t0" > infoLine :: T.Text -> T.Text > infoLine msg = "i" <> sanitize msg <> "\t\t\t0" > -- | >>> menuLine "home" "/" "host" "70" > -- "1home\t/\thost\t70" > menuLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > menuLine d s h p = "1" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p > -- | >>> searchLine "ask" "/q" "host" "70" > -- "7ask\t/q\thost\t70" > searchLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > searchLine d s h p = "7" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p > -- | A type-`g` GIF item. > -- > -- >>> gifLine "captcha" "/c.gif" "host" "70" > -- "gcaptcha\t/c.gif\thost\t70" > gifLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > gifLine d s h p = "g" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p > -- | A type-`h` HTTP link. > -- > -- >>> httpLine "URL" "https://example.com/" "host" "70" > -- "hURL\tURL:https://example.com/\thost\t70" > httpLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text > httpLine d url h p = > "h" <> sanitize d <> "\tURL:" <> url <> "\t" <> h <> "\t" <> p > -- | >>> errorItem "nope" > -- "3nope\t\t\t0" > errorItem :: T.Text -> T.Text > errorItem msg = "3" <> sanitize msg <> "\t\t\t0" > terminator :: T.Text > terminator = "." > -- | Strip CR/LF/TAB out of user/upstream-supplied row text so it > -- cannot smuggle gophermap row breaks. > -- > -- >>> sanitize "ok" > -- "ok" > -- > -- >>> sanitize "with\ttab" > -- "with tab" > -- > -- >>> sanitize "newline\nhere" > -- "newline here" > sanitize :: T.Text -> T.Text > sanitize = T.map (\c -> if c == '\r' || c == '\n' || c == '\t' then ' ' else c) Self-link helpers ----------------- `link` joins path-info onto this script's mount point. `homeMenu` is the back-to-board-picker row reused on every page that isn't the picker. > link :: Ctx -> T.Text -> T.Text > link c sub = ctxScriptSel c <> "/" <> sub > homeMenu :: Ctx -> T.Text > homeMenu c = menuLine "← back to board list" (ctxScriptSel c) (ctxHost c) (ctxPort c) HTML decoding ------------- Kareha pages are XHTML; the bits the applet pulls out (post bodies, thread titles) carry inline tags --- `

`, `
`, `

` --- and HTML entities. `htmlToText` peels the structural tags into whitespace, strips anything else, and runs the entity decoder. > -- | Decode the HTML entities Kareha emits. Single-pass over the > -- text so `&lt;` stays as the literal `<` (the user typed > -- `<` and the server escaped the ampersand). > -- > -- >>> decodeEntities "&lt; > " '" > -- "< > \" '" > decodeEntities :: T.Text -> T.Text > decodeEntities = go > where > go t = let (pre, rest) = T.breakOn "&" t > in if T.null rest > then t > else case decodeOne rest of > Just (d, more) -> pre <> d <> go more > Nothing -> pre <> T.take 1 rest <> go (T.drop 1 rest) > > decodeOne r > | "&" `T.isPrefixOf` r = Just ("&", T.drop 5 r) > | "<" `T.isPrefixOf` r = Just ("<", T.drop 4 r) > | ">" `T.isPrefixOf` r = Just (">", T.drop 4 r) > | """ `T.isPrefixOf` r = Just ("\"", T.drop 6 r) > | "'" `T.isPrefixOf` r = Just ("'", T.drop 5 r) > | "," `T.isPrefixOf` r = Just (",", T.drop 5 r) > | " " `T.isPrefixOf` r = Just (" ", T.drop 6 r) > | "'" `T.isPrefixOf` r = Just ("'", T.drop 6 r) > | otherwise = Nothing > -- | Replace every `<...>` tag with the empty string. > -- > -- >>> stripTags "a bold word" > -- "a bold word" > stripTags :: T.Text -> T.Text > stripTags t = case T.breakOn "<" t of > (pre, rest) > | T.null rest -> pre > | otherwise -> pre <> stripTags (T.drop 1 (T.dropWhile (/= '>') rest)) > -- | Convert a fragment of Kareha post-body XHTML to plain text. > -- Inline tags become whitespace; everything else is stripped; > -- entities are decoded last. > -- > -- >>> htmlToText "

hello
world

" > -- "hello\nworld\n\n" > -- > -- >>> htmlToText ">>1
quoted
" > -- ">>1 > quoted" > htmlToText :: T.Text -> T.Text > htmlToText = > decodeEntities > . stripTags > . T.replace "
" "" > . T.replace "
" "> " > . T.replace "

" "\n\n" > . T.replace "

" "" > . T.replace "
" "\n" > . T.replace "
" "\n" > . T.replace "
" "\n" > -- | First `Text` lying strictly between `open` and the next `close` > -- in the haystack. Returns empty if either marker is missing. > -- > -- >>> textBetween "" "" "x hi y" > -- "hi" > -- > -- >>> textBetween "" "" "no markers here" > -- "" > textBetween :: T.Text -> T.Text -> T.Text -> T.Text > textBetween open close hay = > let (_, a) = T.breakOn open hay > afterOpen = T.drop (T.length open) a > (inside, _) = T.breakOn close afterOpen > in if T.null a then T.empty else inside HTTP helpers (via curl subprocess) ---------------------------------- Curl is invoked as a subprocess to keep the dependency surface flat --- same posture as the rest of the applets. `curlGet` / `curlGetWithCookies` return the body as `Text`; `curlGetBytes` returns raw `ByteString` (for the GIF). `curlPostMultipart` POSTs a list of name/value pairs as multipart/form-data with `--form-string`, which prevents `@file` and ` curlGet :: T.Text -> IO (Either T.Text T.Text) > curlGet url = do > r <- try $ readProcessWithExitCode "curl" > ["-s", "--compressed", "--max-time", "20", T.unpack url] "" > case r of > Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException)))) > Right (ExitSuccess, out, _) -> pure (Right (T.pack out)) > Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err))) > -- | GET binary into `out`, saving cookies to `jar`. The body is > -- written by curl, so we never funnel bytes through `String` (which > -- the system locale would re-decode and corrupt for non-text > -- responses). > curlGetToFileWithJar :: FilePath -> FilePath -> T.Text -> IO (Either T.Text ()) > curlGetToFileWithJar jar out url = do > r <- try $ readProcessWithExitCode "curl" > ["-s", "--compressed", "--max-time", "20", "-c", jar, "-o", out, T.unpack url] "" > case r of > Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException)))) > Right (ExitSuccess, _, _) -> pure (Right ()) > Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err))) `curlPostMultipart` calls curl with `--form-string name=value` for each pair, plus `-b jar` to attach the captcha cookie. We capture the response body and follow no redirects --- a successful Kareha post answers `302` with a `Location` we don't actually need (the thread URL we already know), and an error post answers `200` with an HTML body we want to parse. > curlPostMultipart :: FilePath -> T.Text -> [(T.Text, T.Text)] -> IO (Either T.Text T.Text) > curlPostMultipart jar url fields = do > let formArgs = concatMap (\(k,v) -> ["--form-string", T.unpack k <> "=" <> T.unpack v]) fields > args = ["-s", "-i", "--compressed", "--max-time", "30", > "-b", jar, "-X", "POST", T.unpack url] ++ formArgs > r <- try $ readProcessWithExitCode "curl" args "" > case r of > Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException)))) > Right (ExitSuccess, out, _) -> pure (Right (T.pack out)) > Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err))) Pages ===== Board picker ------------ Fetch `https://4-ch.net/side.html`, harvest every `Name` it lists, render as type-1 menu items pointing back into this applet. > emitBoards :: Ctx -> IO () > emitBoards ctx = do > r <- curlGet (upstream <> "/side.html") > case r of > Left e -> mapM_ putLine [ errorItem e, terminator ] > Right html -> do > let bs = parseBoards html > if null bs > then mapM_ putLine [ errorItem "Could not parse the board list.", terminator ] > else mapM_ putLine $ > [ infoLine "4-ch.net via gopher — pick a board." > , infoLine "" > ] ++ > [ menuLine (name <> " (/" <> slug <> "/)") (link ctx slug) (ctxHost ctx) (ctxPort ctx) > | (slug, name) <- bs > ] ++ > [ infoLine "" > , httpLine "Open 4-ch.net in a web browser" (upstream <> "/") (ctxHost ctx) (ctxPort ctx) > , terminator > ] > -- | Pull every board (slug, display name) out of side.html. > -- Tolerant of inline `style=` and `title=` attributes. > -- > -- >>> parseBoards "Technology" > -- [("tech","Technology")] > -- > -- >>> parseBoards "DQN" > -- [("dqn","DQN")] > parseBoards :: T.Text -> [(T.Text, T.Text)] > parseBoards = uniq . go . T.replace "\r" "" . T.replace "\n" " " > where > go t = case T.breakOn "https://4-ch.net/" t of > (_, rest) | T.null rest -> [] > | otherwise -> > let after = T.drop (T.length "https://4-ch.net/") rest > (slug, after') = T.span (\c -> isAlphaNum c) after > isClose = T.isPrefixOf "/\"" after' > in if not isClose || T.null slug > then go (T.drop 1 rest) > else > let (_, txt0) = T.breakOn ">" after' > txt1 = T.drop 1 txt0 > (label, _) = T.breakOn " cleaned = T.strip (decodeEntities (stripTags label)) > in if T.null cleaned > then go (T.drop 1 rest) > else (slug, cleaned) : go (T.drop (T.length "https://4-ch.net/" + T.length slug) rest) > uniq = go' [] > where > go' _ [] = [] > go' seen ((s,n):xs) > | s `elem` seen = go' seen xs > | otherwise = (s,n) : go' (s:seen) xs Thread index ------------ Per-board page; each `

` containing a `kareha.pl` link is one thread. Title and reply count come straight out of the heading. > emitThreads :: Ctx -> T.Text -> IO () > emitThreads ctx board = do > r <- curlGet (upstream <> "/" <> board <> "/") > case r of > Left e -> mapM_ putLine [ errorItem e, terminator ] > Right html -> do > let ts = take maxRowsPerPage (parseThreads html) > mapM_ putLine $ > [ infoLine ("/" <> board <> "/ — " <> T.pack (show (length ts)) <> " threads (most recent first).") > , infoLine "" > ] ++ > [ menuLine (title <> " (" <> T.pack (show n) <> ")") (link ctx (board <> "/" <> tid)) > (ctxHost ctx) (ctxPort ctx) > | (tid, title, n) <- ts > ] ++ > [ infoLine "" > , menuLine ("[+] start a new thread on /" <> board <> "/") > (link ctx (board <> "/new")) (ctxHost ctx) (ctxPort ctx) > , homeMenu ctx > , terminator > ] > -- | Pull thread (id, title, reply-count) tuples out of a board page. > -- One row per `

...kareha.pl/...

` block. > parseThreads :: T.Text -> [(T.Text, T.Text, Int)] > parseThreads html = > let oneline = T.replace "\r" "" . T.replace "\n" " " $ html > chunks = drop 1 (T.splitOn "

" oneline) > in [ (tid, title, n) > | c <- chunks > , Just (tid, title, n) <- [parseThreadChunk c] > ] > parseThreadChunk :: T.Text -> Maybe (T.Text, T.Text, Int) > parseThreadChunk c = > let (head', _) = T.breakOn "

" c > href = textBetween "kareha.pl/" "\"" head' > tid = T.takeWhile isDigit href > afterAnchor = case T.breakOn ">" (snd (T.breakOn " (_, x) -> T.drop 1 x > beforeClose = fst (T.breakOn " rawTitle = T.strip (decodeEntities (stripTags beforeClose)) > countTxt = textBetween "(" ")" head' > n = readDigits countTxt > in if T.null tid || T.null rawTitle > then Nothing > else Just (tid, stripTrailingCount rawTitle, n) > where > readDigits t = case reads (T.unpack t) :: [(Int, String)] of > ((k,_):_) -> k > _ -> 0 > -- Some boards render "Title (12)" inside the anchor instead of > -- inside a separate ; strip a trailing "(NN)" so we > -- don't show the count twice. > stripTrailingCount t = > let s = T.stripEnd t > in if T.takeEnd 1 s == ")" > then > let dropped = T.dropEnd 1 s > (digits, before) = T.span (/= '(') (T.reverse dropped) > in if not (T.null digits) && T.all isDigit (T.reverse digits) > && T.takeEnd 1 (T.reverse before) == "(" > then T.stripEnd (T.dropEnd 1 (T.reverse before)) > else s > else s Thread view ----------- Per-thread page; each `
emitThread :: Ctx -> T.Text -> T.Text -> IO () > emitThread ctx board tid = do > r <- curlGet (upstream <> "/" <> board <> "/kareha.pl/" <> tid) > case r of > Left e -> mapM_ putLine [ errorItem e, terminator ] > Right html -> do > let posts = parsePosts html > title = parseThreadTitle html > returnLink = menuLine ("← back to /" <> board <> "/") > (link ctx board) (ctxHost ctx) (ctxPort ctx) > replyLink = menuLine "[reply to this thread]" > (link ctx (board <> "/" <> tid <> "/reply")) > (ctxHost ctx) (ctxPort ctx) > mapM_ putLine $ > [ infoLine ("── " <> title <> " ──") > , infoLine ("/" <> board <> "/ thread " <> tid <> " posts: " <> T.pack (show (length posts))) > , infoLine "" > , returnLink > , replyLink > , infoLine "" > ] ++ concatMap renderPost (zip [1..] (take maxRowsPerPage posts)) ++ > [ infoLine "" > , replyLink > , returnLink > , httpLine "Open this thread on the web" > (upstream <> "/" <> board <> "/kareha.pl/" <> tid) > (ctxHost ctx) (ctxPort ctx) > , terminator > ] > -- | Pull the bracketed thread title out of `` ("Foo - Tech & Programming @ 4-ch") > parseThreadTitle :: T.Text -> T.Text > parseThreadTitle html = > let t = textBetween "<title>" "" html > cleaned = T.strip (decodeEntities (stripTags t)) > -- Trim the trailing " - Board @ 4-ch" if present > cut = case T.breakOnEnd " - " cleaned of > ("", _) -> cleaned > (left, _) -> T.stripEnd (T.dropEnd 3 left) > in if T.null cut then "(untitled)" else cut > data Post = Post > { pNum :: T.Text > , pName :: T.Text > , pDate :: T.Text > , pBody :: T.Text > } deriving (Eq, Show) > parsePosts :: T.Text -> [Post] > parsePosts html = > let oneline = T.replace "\r" "" . T.replace "\n" " " $ html > chunks = drop 1 (T.splitOn "
in [ p | c <- chunks, Just p <- [parsePostChunk c] ] > parsePostChunk :: T.Text -> Maybe Post > parsePostChunk c = > let -- The replytext div splits metadata from body. > (meta, body0) = T.breakOn "
" c > bodyHtml = let afterOpen = T.drop (T.length "
") body0 > (inside, _) = T.breakOn "
" afterOpen > in inside > num = T.strip (decodeEntities (stripTags > (textBetween "" "" meta))) > name = T.strip (decodeEntities (stripTags > (textBetween "" "" meta))) > -- Date+ID lives between the close of the postertrip span and > -- the open of the deletebutton span. Spacing around the > -- colon varies (saged posts have an extra space); slicing on > -- the last in the pre-deletebutton region tolerates > -- that variation. > beforeDelete = fst (T.breakOn " afterTripClose = snd (T.breakOnEnd "" beforeDelete) > dateField = T.strip $ decodeEntities $ stripTags $ > T.dropWhile (\ch -> ch == ' ' || ch == ':') > (T.stripStart afterTripClose) > body = T.strip (htmlToText bodyHtml) > in if T.null num && T.null name && T.null body > then Nothing > else Just (Post num name dateField body) > renderPost :: (Int, Post) -> [T.Text] > renderPost (_, Post num name date body) = > let header = ">> " <> num <> " " <> name <> " " <> date > lines' = if T.null body then ["(no body)"] else T.lines body > in [ infoLine header ] ++ map (\l -> infoLine (" " <> l)) lines' ++ [ infoLine "" ] Session management ================== A session is a directory under `/tmp/4ch-sessions/` containing `cookies` (curl cookie jar) and `captcha.gif`. The directory itself is the lock; `createDirectoryIfMissing` is atomic relative to other parallel attempts under the same sid (which won't collide because sids are 64-bit random). > -- | Path to a session's directory; does NOT create it. > sessionDir :: T.Text -> FilePath > sessionDir sid = sessionRoot T.unpack sid > -- | Create a fresh session id (16 lowercase hex chars; ~64 bits). > newSessionId :: IO T.Text > newSessionId = do > n <- randomRIO (0, 0xFFFFFFFFFFFFFFFF :: Integer) > pure (T.pack (pad16 (showHex n))) > where > pad16 s = replicate (16 - length s) '0' ++ take 16 s > showHex 0 = "0" > showHex k = go k "" > where > go 0 acc = acc > go x acc = let (q, r) = x `divMod` 16 > d = "0123456789abcdef" !! fromIntegral r > in go q (d : acc) > -- | Wipe session directories older than `sessionTTL` seconds. > gcStaleSessions :: IO () > gcStaleSessions = do > createDirectoryIfMissing True sessionRoot > now <- getCurrentTime > entries <- listDirectory sessionRoot > forM_ entries $ \e -> do > let p = sessionRoot e > r <- try (getModificationTime p) :: IO (Either IOException UTCTime) > case r of > Right t | round (diffUTCTime now t) > sessionTTL -> do > _ <- try (removeDirectoryRecursive p) :: IO (Either IOException ()) > pure () > _ -> pure () > -- | Ensure a session directory exists with 0700 permissions. > ensureSession :: T.Text -> IO () > ensureSession sid = do > let d = sessionDir sid > createDirectoryIfMissing True d > setPermissions d > (emptyPermissions { readable = True, writable = True, searchable = True }) > -- | Fetch a fresh captcha into the session directory. > -- `selector` is `.postcaptcha` or `.threadcaptcha`. > freshCaptcha :: T.Text -> T.Text -> T.Text -> IO (Either T.Text ()) > freshCaptcha board sid selector = do > ensureSession sid > let jar = sessionDir sid "cookies" > gif = sessionDir sid "captcha.gif" > url = upstream <> "/" <> board <> "/captcha.pl?selector=" <> selector > r <- curlGetToFileWithJar jar gif url > case r of > Left e -> pure (Left e) > Right () -> do > bs <- BS.readFile gif > if BS.length bs < 6 || BS.take 3 bs /= "GIF" > then pure (Left "Captcha endpoint did not return a GIF.") > else pure (Right ()) Reply flow ========== > emitReplyInit :: Ctx -> T.Text -> T.Text -> IO () > emitReplyInit ctx board tid = do > gcStaleSessions > sid <- newSessionId > r <- freshCaptcha board sid ".postcaptcha" > case r of > Left e -> mapM_ putLine [ errorItem e, terminator ] > Right () -> emitReplyMenu ctx board tid sid > emitReplyMenu :: Ctx -> T.Text -> T.Text -> T.Text -> IO () > emitReplyMenu ctx board tid sid = do > let base = board <> "/" <> tid <> "/reply/" <> sid > gifSel = link ctx (base <> "/captcha.gif") > submit = link ctx (base <> "/submit") > mapM_ putLine > [ infoLine ("Reply to /" <> board <> "/ thread " <> tid) > , infoLine "" > , infoLine "1. View the captcha image below (gopher type 'g')." > , infoLine " If your client won't render it inline, fetch it:" > , infoLine (" curl 'gopher://" <> ctxHost ctx <> "/g" <> gifSel <> "' > c.gif") > , infoLine "2. Submit your post using the search prompt below." > , infoLine " Format: CAPTCHA||BODY" > , infoLine " In BODY, the literal two characters '\\n' become a newline." > , infoLine "" > , gifLine "captcha.gif" gifSel (ctxHost ctx) (ctxPort ctx) > , infoLine "" > , searchLine "Submit reply (CAPTCHA||BODY)" submit (ctxHost ctx) (ctxPort ctx) > , infoLine "" > , menuLine "← back to thread" > (link ctx (board <> "/" <> tid)) (ctxHost ctx) (ctxPort ctx) > , homeMenu ctx > , terminator > ] > doReply :: Ctx -> T.Text -> T.Text -> T.Text -> T.Text -> IO () > doReply ctx board tid sid query = do > let jar = sessionDir sid "cookies" > sessionOk <- doesFileExist jar > if not sessionOk > then mapM_ putLine > [ errorItem "Session has expired. Start a new reply." > , menuLine "Start a new reply" (link ctx (board <> "/" <> tid <> "/reply")) > (ctxHost ctx) (ctxPort ctx) > , terminator > ] > else case parseSubmitReply query of > Left e -> failPage ctx e (board <> "/" <> tid <> "/reply") > Right (cap, body) -> do > let fields = [ ("task", "post") > , ("thread", tid) > , ("password", "") > , ("field_a", "") > , ("field_b", "") > , ("name", "") > , ("link", "") > , ("captcha", cap) > , ("markup", "waka") > , ("comment", body) > ] > url = upstream <> "/" <> board <> "/kareha.pl" > resp <- curlPostMultipart jar url fields > emitSubmitResult ctx resp board (Just tid) Posting parser. Splits the search line on `||` and unescapes literal `\n` inside the body. > -- | >>> parseSubmitReply "abcd||hello world" > -- Right ("abcd","hello world") > -- > -- >>> parseSubmitReply "abcd||line1\\nline2" > -- Right ("abcd","line1\nline2") > -- > -- >>> parseSubmitReply "no-separator" > -- Left "Submit format is: CAPTCHA||BODY" > -- > -- >>> parseSubmitReply "||body but empty captcha" > -- Left "Captcha answer is empty." > -- > -- >>> parseSubmitReply "cap||" > -- Left "Body is empty." > parseSubmitReply :: T.Text -> Either T.Text (T.Text, T.Text) > parseSubmitReply q = > case T.splitOn "||" q of > [_] -> Left "Submit format is: CAPTCHA||BODY" > (cap:rest@(_:_)) -> > let body = T.intercalate "||" rest > capS = T.strip cap > bodyS = T.strip (unescapeNewlines body) > in if T.null capS then Left "Captcha answer is empty." > else if T.null bodyS then Left "Body is empty." > else if T.length bodyS > maxBodyChars > then Left ("Body exceeds " <> T.pack (show maxBodyChars) <> " chars.") > else Right (capS, bodyS) > _ -> Left "Submit format is: CAPTCHA||BODY" > -- | Decode literal `\n` to a real newline. Two-pass over the text > -- to keep the doctest readable. > -- > -- >>> unescapeNewlines "a\\nb" > -- "a\nb" > -- > -- >>> unescapeNewlines "no escapes here" > -- "no escapes here" > unescapeNewlines :: T.Text -> T.Text > unescapeNewlines = T.replace "\\n" "\n" New-thread flow =============== > emitNewThreadInit :: Ctx -> T.Text -> IO () > emitNewThreadInit ctx board = do > gcStaleSessions > sid <- newSessionId > r <- freshCaptcha board sid ".threadcaptcha" > case r of > Left e -> mapM_ putLine [ errorItem e, terminator ] > Right () -> emitNewThreadMenu ctx board sid > emitNewThreadMenu :: Ctx -> T.Text -> T.Text -> IO () > emitNewThreadMenu ctx board sid = do > let base = board <> "/new/" <> sid > gifSel = link ctx (base <> "/captcha.gif") > submit = link ctx (base <> "/submit") > mapM_ putLine > [ infoLine ("New thread on /" <> board <> "/") > , infoLine "" > , infoLine "1. View the captcha image below." > , infoLine (" Or fetch it: curl 'gopher://" <> ctxHost ctx <> "/g" <> gifSel <> "' > c.gif") > , infoLine "2. Submit using the search prompt below." > , infoLine " Format: CAPTCHA||TITLE||BODY" > , infoLine " In BODY, the literal two characters '\\n' become a newline." > , infoLine "" > , gifLine "captcha.gif" gifSel (ctxHost ctx) (ctxPort ctx) > , infoLine "" > , searchLine "Submit new thread (CAPTCHA||TITLE||BODY)" submit (ctxHost ctx) (ctxPort ctx) > , infoLine "" > , menuLine ("← back to /" <> board <> "/") > (link ctx board) (ctxHost ctx) (ctxPort ctx) > , homeMenu ctx > , terminator > ] > doNewThread :: Ctx -> T.Text -> T.Text -> T.Text -> IO () > doNewThread ctx board sid query = do > let jar = sessionDir sid "cookies" > sessionOk <- doesFileExist jar > if not sessionOk > then mapM_ putLine > [ errorItem "Session has expired. Start a new thread again." > , menuLine "Start a new thread" (link ctx (board <> "/new")) > (ctxHost ctx) (ctxPort ctx) > , terminator > ] > else case parseSubmitNewThread query of > Left e -> failPage ctx e (board <> "/new") > Right (cap, title, body) -> do > let fields = [ ("task", "post") > , ("title", title) > , ("password", "") > , ("field_a", "") > , ("field_b", "") > , ("name", "") > , ("link", "") > , ("captcha", cap) > , ("markup", "waka") > , ("comment", body) > ] > url = upstream <> "/" <> board <> "/kareha.pl" > resp <- curlPostMultipart jar url fields > emitSubmitResult ctx resp board Nothing > -- | >>> parseSubmitNewThread "cap||Title Here||body" > -- Right ("cap","Title Here","body") > -- > -- >>> parseSubmitNewThread "cap||only title" > -- Left "Submit format is: CAPTCHA||TITLE||BODY" > -- > -- >>> parseSubmitNewThread "cap||||body" > -- Left "Title is empty." > parseSubmitNewThread :: T.Text -> Either T.Text (T.Text, T.Text, T.Text) > parseSubmitNewThread q = > case T.splitOn "||" q of > (cap:title:rest@(_:_)) -> > let body = T.intercalate "||" rest > capS = T.strip cap > titleS = T.strip title > bodyS = T.strip (unescapeNewlines body) > in if T.null capS then Left "Captcha answer is empty." > else if T.null titleS then Left "Title is empty." > else if T.null bodyS then Left "Body is empty." > else if T.length bodyS > maxBodyChars > then Left ("Body exceeds " <> T.pack (show maxBodyChars) <> " chars.") > else Right (capS, titleS, bodyS) > _ -> Left "Submit format is: CAPTCHA||TITLE||BODY" Submit-result rendering ----------------------- Kareha's success response is a 302 redirect (Location: back to the thread). Its failure response is a 200 HTML page whose `

` is the human-readable error ("Wrong verification code entered.", "No text entered.", etc.). We parse both shapes off the `-i` (include headers) curl output. > emitSubmitResult :: Ctx -> Either T.Text T.Text -> T.Text -> Maybe T.Text -> IO () > emitSubmitResult ctx (Left e) _ _ = > mapM_ putLine [ errorItem e, homeMenu ctx, terminator ] > emitSubmitResult ctx (Right resp) board mTid = > case classifyResponse resp of > Success _ -> > let onSuccess = case mTid of > Just tid -> menuLine "→ view thread" > (link ctx (board <> "/" <> tid)) > (ctxHost ctx) (ctxPort ctx) > Nothing -> menuLine ("→ /" <> board <> "/ (find your new thread)") > (link ctx board) (ctxHost ctx) (ctxPort ctx) > in mapM_ putLine > [ infoLine "Posted." > , infoLine "" > , onSuccess > , homeMenu ctx > , terminator > ] > Failure msg -> > mapM_ putLine > [ errorItem ("Kareha rejected the post: " <> msg) > , infoLine "" > , menuLine ("← back to /" <> board <> "/") > (link ctx board) (ctxHost ctx) (ctxPort ctx) > , homeMenu ctx > , terminator > ] > data PostResult = Success T.Text | Failure T.Text deriving (Eq, Show) > -- | Classify a curl `-i` response: a `Location:` header means > -- success (Kareha redirects on accepted post), otherwise pull the > -- error message out of the first `

` in the body. > -- > -- >>> classifyResponse "HTTP/1.1 302 Found\r\nLocation: /tech/kareha.pl/1\r\n\r\n" > -- Success "/tech/kareha.pl/1" > -- > -- >>> classifyResponse "HTTP/1.1 200 OK\r\n\r\n

Wrong verification code entered.

" > -- Failure "Wrong verification code entered." > -- > -- >>> classifyResponse "HTTP/1.1 200 OK\r\n\r\nplain body" > -- Failure "Unknown error (no

in response)." > classifyResponse :: T.Text -> PostResult > classifyResponse r = > let lower = T.toLower r > loc = textBetween "location:" "\r" lower > in if not (T.null loc) > then > -- Recover original-case Location value. > let locOrig = textBetween "Location:" "\r" r > v = T.strip (if T.null locOrig then loc else locOrig) > in Success v > else > let h1 = T.strip (decodeEntities (stripTags (textBetween "

" "

" r))) > in if T.null h1 > then Failure "Unknown error (no

in response)." > else Failure h1 Captcha asset ------------- Writes the raw GIF to stdout. No gophermap framing --- Venusia streams whatever we emit; the parent menu's row already says this is type-`g`. > emitCaptchaGif :: T.Text -> T.Text -> Maybe T.Text -> T.Text -> IO () > emitCaptchaGif _board _tid Nothing _kind = pure () > emitCaptchaGif _board _tid (Just sid) _kind = do > let p = sessionDir sid "captcha.gif" > ok <- doesFileExist p > if not ok > then mapM_ putLine [ errorItem "Captcha session is gone.", terminator ] > else BS.readFile p >>= BS.hPut stdout Error pages ----------- > emitPathError :: Ctx -> T.Text -> IO () > emitPathError ctx p = mapM_ putLine > [ errorItem ("Unrecognised path-info: /" <> p) > , homeMenu ctx > , terminator > ] > failPage :: Ctx -> T.Text -> T.Text -> IO () > failPage ctx msg backSubsel = mapM_ putLine > [ errorItem msg > , menuLine "Try again" (link ctx backSubsel) (ctxHost ctx) (ctxPort ctx) > , homeMenu ctx > , terminator > ] End of file. The next blank line ensures GHC's literate parser doesn't choke on a trailing bird-track without content.