#!/usr/bin/env stack
> -- stack script --resolver lts-22.6 --package text --package bytestring --package process --package directory --package filepath --package random --package time
4_ch.lhs: gopherspace ↔ 4-ch.net (read + reply + new thread)
============================================================
A single literate-Haskell applet that gateways the Kareha text-board
at into gopherspace. Pick a board, read threads,
reply to one, or start a new one --- all over gopher, no JavaScript,
no HTML rendering, no web client.
This replaces the older `/gateway/4_ch` script (now living in
`.old/`), which was read-only, hardcoded three `[[gateway]]` blocks
in `routes.toml`, and parsed HTML with a `curl | sed` pipeline that
nobody wanted to touch. This applet self-mounts under any selector
(it derives its mount point from `$selector` minus `$pathinfo`,
Venusia 0.8.0.0+), parses HTML in native `Text`, and adds the bit
the original author waved at but never built: posting.
URL design
----------
`$SCRIPT` is whatever `routes.toml` mounts this file at ---
`/applets/4_ch.lhs` on this server. Substitute freely; the script
re-derives its base from `$selector` at request time.
$SCRIPT board picker
$SCRIPT/ thread index for board
$SCRIPT// view thread on board
$SCRIPT///reply open a reply session: fetch a
captcha, cache the cookie,
land on the reply menu
$SCRIPT///reply/ reply menu (captcha + submit)
$SCRIPT///reply//captcha.gif raw GIF bytes
$SCRIPT///reply//submit type-7 target; query is
"CAPTCHA||BODY"
$SCRIPT//new open a new-thread session
$SCRIPT//new/ new-thread menu
$SCRIPT//new//captcha.gif raw GIF bytes
$SCRIPT//new//submit type-7 target; query is
"CAPTCHA||TITLE||BODY"
`` is the board slug (lowercase alphanumeric, eg `tech`, `dqn`);
`` is the thread id (all digits); `` is a 16-hex-char session
id created when the reply/new flow starts. Anything off this grid is
a type-3 error row.
The captcha session
-------------------
Kareha gates posting on a small image captcha. The image is fetched
from `//captcha.pl?selector=.postcaptcha` (or
`.threadcaptcha`); the server sets a `captchakey` cookie whose value
is the answer it'll accept. The applet runs that handshake when the
user clicks "reply" or "new thread", stashes the GIF + cookie jar
under `/tmp/4ch-sessions//`, and surfaces both:
- a type-`g` link to the GIF (gopher clients that render images
show it inline),
- the underlying HTTPS URL as an info-line (for the GIF-blind),
- a type-7 search prompt to actually submit the post.
The cookie expires server-side in two weeks; the local session is
GC'd after `sessionTTL` seconds. The directory layout is the entire
state machine --- there is no database, no in-memory state, and no
locking. The session id is the cookie's only authority over the
applet, so it is generated from `random` (not predictable input) and
the directory permissions are 0700.
Submit encoding
---------------
Gopher type-7 takes one search line. We split that line on `||`
(double pipe) to pack multiple form fields into a single submission:
- reply: `CAPTCHA||BODY`
- new thread: `CAPTCHA||TITLE||BODY`
Inside `BODY`, the literal two-character sequence `\n` is converted
to a real newline before submission, so multi-paragraph posts are
possible from a single search field.
From the command line
---------------------
curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs
curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech
curl gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech/1721174972
curl 'gopher://gopher.someodd.zip/1/applets/4_ch.lhs/tech/1721174972/reply'
(This file is markdown-flavoured literate Haskell. Headings use
setext underlines rather than ATX `#` because GHC's literate
parser interprets a `#` at column 1 of a non-code line as the start
of a pragma --- setext sidesteps that.)
Running the doctests
--------------------
doctest-lhs 4_ch.lhs
Reuses this file's own `-- stack` directive and language
pragmas, so doctest sees the same packages and extensions
`run-cached-lhs` compiles against.
Module header and imports
-------------------------
> {-# LANGUAGE OverloadedStrings #-}
> module Main (main) where
>
> import Control.Exception (IOException, SomeException, try)
> import Control.Monad (forM_, when)
> import qualified Data.ByteString as BS
> import Data.Char (isDigit, isAlphaNum, isHexDigit)
> import qualified Data.Text as T
> import qualified Data.Text.IO as TIO
> import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime)
> import System.Directory (createDirectoryIfMissing, doesFileExist,
> getModificationTime, listDirectory,
> removeDirectoryRecursive, setPermissions,
> emptyPermissions, readable, writable, searchable)
> import System.Environment (getArgs, lookupEnv)
> import System.Exit (ExitCode (..))
> import System.FilePath ((>))
> import System.IO (BufferMode (..), hSetBuffering, hSetEncoding,
> stdout, utf8)
> import System.Process (readProcessWithExitCode)
> import System.Random (randomRIO)
Defaults
--------
Host/port follow figlet.lhs --- overridable via env so the same file
serves staging and production.
> defaultHost, defaultPort :: T.Text
> defaultHost = "gopher.someodd.zip"
> defaultPort = "70"
> upstream :: T.Text
> upstream = "https://4-ch.net"
> sessionRoot :: FilePath
> sessionRoot = "/tmp/4ch-sessions"
`sessionTTL` is the wall-clock age past which a session directory is
considered abandoned and removed. Two captcha cookies live two weeks
upstream, but a user who walks away mid-flow has effectively given
up; thirty minutes is plenty for a human to view a GIF and type a
post, and short enough to keep `/tmp` tidy.
> sessionTTL :: Int -- seconds
> sessionTTL = 30 * 60
> -- | Cap the per-page row count so a runaway board (a /dqn/ flame
> -- war thread with hundreds of replies) does not blow past whatever
> -- a gopher client will scroll.
> maxRowsPerPage :: Int
> maxRowsPerPage = 200
> -- | Hard cap on accepted submit body length. Kareha itself will
> -- reject anything truly enormous, but the applet has no reason to
> -- shovel half a megabyte through `curl -F`.
> maxBodyChars :: Int
> maxBodyChars = 8000
Request parsing
---------------
Same shape as figlet.lhs / ask.lhs / qdbviewer.lhs --- three
positional args from `routes.toml` (`$selector`, `$search`,
`$pathinfo`), any extras (`$remote_ip`) ignored via the cons
pattern.
> data Req = Req
> { reqSel :: T.Text -- full selector that resolved here
> , reqQ :: T.Text -- search text after the tab; empty when none
> , reqP :: T.Text -- selector portion after this script's filename
> } deriving (Eq, Show)
> -- | Parse the framework's argv into a 'Req'. Extras past the third
> -- position are ignored; empty argv is a usage error.
> --
> -- >>> parseArgs ["/applets/4_ch.lhs/tech/1721174972", "", "/tech/1721174972"]
> -- Req {reqSel = "/applets/4_ch.lhs/tech/1721174972", reqQ = "", reqP = "/tech/1721174972"}
> --
> -- >>> parseArgs ["/applets/4_ch.lhs", "", ""]
> -- Req {reqSel = "/applets/4_ch.lhs", reqQ = "", reqP = ""}
> parseArgs :: [String] -> Req
> parseArgs (s:q:p:_) = Req (T.pack s) (T.pack q) (T.pack p)
> parseArgs (s:q:_) = Req (T.pack s) (T.pack q) T.empty
> parseArgs [s] = Req (T.pack s) T.empty T.empty
> parseArgs [] = error
> "4_ch.lhs: missing argv[0] (gopher selector). When run by Venusia \
> \this is automatic; for manual testing pass it explicitly, e.g. \
> \`runghc 4_ch.lhs /applets/4_ch.lhs '' '/tech'`."
> data Ctx = Ctx
> { ctxScriptSel :: T.Text -- selector path to this script, no path-info
> , ctxHost :: T.Text
> , ctxPort :: T.Text
> }
Main dispatch
-------------
`stdout` is `NoBuffering` so partial output reaches the client under
Venusia's `stream = true`. The top-level `try` turns any uncaught
exception into a visible type-3 error row instead of a blank
response.
> main :: IO ()
> main = do
> hSetBuffering stdout NoBuffering
> hSetEncoding stdout utf8
> result <- try mainBody :: IO (Either SomeException ())
> case result of
> Right () -> pure ()
> Left e -> mapM_ putLine
> [ errorItem ("4_ch.lhs crashed: " <> T.pack (show e))
> , terminator
> ]
> mainBody :: IO ()
> mainBody = do
> req <- parseArgs <$> getArgs
> host <- maybe defaultHost T.pack <$> lookupEnv "GOPHER_HOST"
> port <- maybe defaultPort T.pack <$> lookupEnv "GOPHER_PORT"
> let pinfo = reqP req
> scriptSel = T.dropEnd (T.length pinfo) (reqSel req)
> ctx = Ctx scriptSel host port
> segs = filter (not . T.null) . T.splitOn "/" $ pinfo
> dispatch ctx segs (reqQ req)
Path-info segments select the handler. Pattern guards check shape
(board slug, thread digits, sid hex) and bounce to the error page
otherwise --- this is the only validation between path-info and disk
operations under `/tmp/4ch-sessions`, so it has to be exact.
> dispatch :: Ctx -> [T.Text] -> T.Text -> IO ()
> dispatch ctx segs q = case segs of
> [] -> emitBoards ctx
> [b] | isBoardSlug b -> emitThreads ctx b
> [b, t] | isBoardSlug b, isThreadId t -> emitThread ctx b t
> [b, t, "reply"] | isBoardSlug b, isThreadId t -> emitReplyInit ctx b t
> [b, t, "reply", sid] | isBoardSlug b, isThreadId t, isSid sid
> -> emitReplyMenu ctx b t sid
> [b, t, "reply", sid, "captcha.gif"]
> | isBoardSlug b, isThreadId t, isSid sid -> emitCaptchaGif b t (Just sid) "reply"
> [b, t, "reply", sid, "submit"]
> | isBoardSlug b, isThreadId t, isSid sid -> doReply ctx b t sid q
> [b, "new"] | isBoardSlug b -> emitNewThreadInit ctx b
> [b, "new", sid] | isBoardSlug b, isSid sid -> emitNewThreadMenu ctx b sid
> [b, "new", sid, "captcha.gif"]
> | isBoardSlug b, isSid sid -> emitCaptchaGif b "" (Just sid) "new"
> [b, "new", sid, "submit"]
> | isBoardSlug b, isSid sid -> doNewThread ctx b sid q
> _ -> emitPathError ctx (T.intercalate "/" segs)
Validators. Cheap; doctest-checked.
> -- | >>> map isBoardSlug ["tech", "dqn", "../etc", "TECH", ""]
> -- [True,True,False,False,False]
> isBoardSlug :: T.Text -> Bool
> isBoardSlug t = not (T.null t)
> && T.length t <= 32
> && T.all (\c -> isAlphaNum c && c == toLowerAscii c) t
> where toLowerAscii c = if c >= 'A' && c <= 'Z' then toEnum (fromEnum c + 32) else c
> -- | >>> map isThreadId ["1721174972", "12", "abc", "", "1234567890123456"]
> -- [True,True,False,False,True]
> isThreadId :: T.Text -> Bool
> isThreadId t = not (T.null t) && T.length t <= 16 && T.all isDigit t
> -- | >>> map isSid ["0123456789abcdef", "DEADBEEF12345678", "short", ""]
> -- [True,False,False,False]
> isSid :: T.Text -> Bool
> isSid t = T.length t == 16 && T.all (\c -> isHexDigit c && not (c >= 'A' && c <= 'F')) t
Gophermap line builders
-----------------------
Same row shape as figlet.lhs; tabs are the only structure on the
wire. Anything user-visible passes through `sanitize` first so
embedded CR/LF/TAB can never smuggle a fake row.
> putLine :: T.Text -> IO ()
> putLine t = TIO.putStr (t <> "\r\n")
> -- | >>> infoLine "hello"
> -- "ihello\t\t\t0"
> infoLine :: T.Text -> T.Text
> infoLine msg = "i" <> sanitize msg <> "\t\t\t0"
> -- | >>> menuLine "home" "/" "host" "70"
> -- "1home\t/\thost\t70"
> menuLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text
> menuLine d s h p = "1" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p
> -- | >>> searchLine "ask" "/q" "host" "70"
> -- "7ask\t/q\thost\t70"
> searchLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text
> searchLine d s h p = "7" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p
> -- | A type-`g` GIF item.
> --
> -- >>> gifLine "captcha" "/c.gif" "host" "70"
> -- "gcaptcha\t/c.gif\thost\t70"
> gifLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text
> gifLine d s h p = "g" <> sanitize d <> "\t" <> s <> "\t" <> h <> "\t" <> p
> -- | A type-`h` HTTP link.
> --
> -- >>> httpLine "URL" "https://example.com/" "host" "70"
> -- "hURL\tURL:https://example.com/\thost\t70"
> httpLine :: T.Text -> T.Text -> T.Text -> T.Text -> T.Text
> httpLine d url h p =
> "h" <> sanitize d <> "\tURL:" <> url <> "\t" <> h <> "\t" <> p
> -- | >>> errorItem "nope"
> -- "3nope\t\t\t0"
> errorItem :: T.Text -> T.Text
> errorItem msg = "3" <> sanitize msg <> "\t\t\t0"
> terminator :: T.Text
> terminator = "."
> -- | Strip CR/LF/TAB out of user/upstream-supplied row text so it
> -- cannot smuggle gophermap row breaks.
> --
> -- >>> sanitize "ok"
> -- "ok"
> --
> -- >>> sanitize "with\ttab"
> -- "with tab"
> --
> -- >>> sanitize "newline\nhere"
> -- "newline here"
> sanitize :: T.Text -> T.Text
> sanitize = T.map (\c -> if c == '\r' || c == '\n' || c == '\t' then ' ' else c)
Self-link helpers
-----------------
`link` joins path-info onto this script's mount point. `homeMenu` is
the back-to-board-picker row reused on every page that isn't the
picker.
> link :: Ctx -> T.Text -> T.Text
> link c sub = ctxScriptSel c <> "/" <> sub
> homeMenu :: Ctx -> T.Text
> homeMenu c = menuLine "← back to board list" (ctxScriptSel c) (ctxHost c) (ctxPort c)
HTML decoding
-------------
Kareha pages are XHTML; the bits the applet pulls out (post bodies,
thread titles) carry inline tags --- ``, `
`, `
`
--- and HTML entities. `htmlToText` peels the structural tags into
whitespace, strips anything else, and runs the entity decoder.
> -- | Decode the HTML entities Kareha emits. Single-pass over the
> -- text so `<` stays as the literal `<` (the user typed
> -- `<` and the server escaped the ampersand).
> --
> -- >>> decodeEntities "< > " '"
> -- "< > \" '"
> decodeEntities :: T.Text -> T.Text
> decodeEntities = go
> where
> go t = let (pre, rest) = T.breakOn "&" t
> in if T.null rest
> then t
> else case decodeOne rest of
> Just (d, more) -> pre <> d <> go more
> Nothing -> pre <> T.take 1 rest <> go (T.drop 1 rest)
>
> decodeOne r
> | "&" `T.isPrefixOf` r = Just ("&", T.drop 5 r)
> | "<" `T.isPrefixOf` r = Just ("<", T.drop 4 r)
> | ">" `T.isPrefixOf` r = Just (">", T.drop 4 r)
> | """ `T.isPrefixOf` r = Just ("\"", T.drop 6 r)
> | "'" `T.isPrefixOf` r = Just ("'", T.drop 5 r)
> | "," `T.isPrefixOf` r = Just (",", T.drop 5 r)
> | " " `T.isPrefixOf` r = Just (" ", T.drop 6 r)
> | "'" `T.isPrefixOf` r = Just ("'", T.drop 6 r)
> | otherwise = Nothing
> -- | Replace every `<...>` tag with the empty string.
> --
> -- >>> stripTags "a bold word"
> -- "a bold word"
> stripTags :: T.Text -> T.Text
> stripTags t = case T.breakOn "<" t of
> (pre, rest)
> | T.null rest -> pre
> | otherwise -> pre <> stripTags (T.drop 1 (T.dropWhile (/= '>') rest))
> -- | Convert a fragment of Kareha post-body XHTML to plain text.
> -- Inline tags become whitespace; everything else is stripped;
> -- entities are decoded last.
> --
> -- >>> htmlToText "hello
world
"
> -- "hello\nworld\n\n"
> --
> -- >>> htmlToText ">>1 quoted
"
> -- ">>1 > quoted"
> htmlToText :: T.Text -> T.Text
> htmlToText =
> decodeEntities
> . stripTags
> . T.replace "
" ""
> . T.replace "" "> "
> . T.replace "
" "\n\n"
> . T.replace "" ""
> . T.replace "
" "\n"
> . T.replace "
" "\n"
> . T.replace "
" "\n"
> -- | First `Text` lying strictly between `open` and the next `close`
> -- in the haystack. Returns empty if either marker is missing.
> --
> -- >>> textBetween "" "" "x hi y"
> -- "hi"
> --
> -- >>> textBetween "" "" "no markers here"
> -- ""
> textBetween :: T.Text -> T.Text -> T.Text -> T.Text
> textBetween open close hay =
> let (_, a) = T.breakOn open hay
> afterOpen = T.drop (T.length open) a
> (inside, _) = T.breakOn close afterOpen
> in if T.null a then T.empty else inside
HTTP helpers (via curl subprocess)
----------------------------------
Curl is invoked as a subprocess to keep the dependency surface flat
--- same posture as the rest of the applets. `curlGet` / `curlGetWithCookies` return
the body as `Text`; `curlGetBytes` returns raw `ByteString` (for the
GIF). `curlPostMultipart` POSTs a list of name/value pairs as
multipart/form-data with `--form-string`, which prevents `@file` and
` curlGet :: T.Text -> IO (Either T.Text T.Text)
> curlGet url = do
> r <- try $ readProcessWithExitCode "curl"
> ["-s", "--compressed", "--max-time", "20", T.unpack url] ""
> case r of
> Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException))))
> Right (ExitSuccess, out, _) -> pure (Right (T.pack out))
> Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err)))
> -- | GET binary into `out`, saving cookies to `jar`. The body is
> -- written by curl, so we never funnel bytes through `String` (which
> -- the system locale would re-decode and corrupt for non-text
> -- responses).
> curlGetToFileWithJar :: FilePath -> FilePath -> T.Text -> IO (Either T.Text ())
> curlGetToFileWithJar jar out url = do
> r <- try $ readProcessWithExitCode "curl"
> ["-s", "--compressed", "--max-time", "20", "-c", jar, "-o", out, T.unpack url] ""
> case r of
> Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException))))
> Right (ExitSuccess, _, _) -> pure (Right ())
> Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err)))
`curlPostMultipart` calls curl with `--form-string name=value` for
each pair, plus `-b jar` to attach the captcha cookie. We capture
the response body and follow no redirects --- a successful Kareha
post answers `302` with a `Location` we don't actually need (the
thread URL we already know), and an error post answers `200` with an
HTML body we want to parse.
> curlPostMultipart :: FilePath -> T.Text -> [(T.Text, T.Text)] -> IO (Either T.Text T.Text)
> curlPostMultipart jar url fields = do
> let formArgs = concatMap (\(k,v) -> ["--form-string", T.unpack k <> "=" <> T.unpack v]) fields
> args = ["-s", "-i", "--compressed", "--max-time", "30",
> "-b", jar, "-X", "POST", T.unpack url] ++ formArgs
> r <- try $ readProcessWithExitCode "curl" args ""
> case r of
> Left e -> pure (Left ("curl failed: " <> T.pack (show (e :: IOException))))
> Right (ExitSuccess, out, _) -> pure (Right (T.pack out))
> Right (_, _, err) -> pure (Left ("curl error: " <> T.strip (T.pack err)))
Pages
=====
Board picker
------------
Fetch `https://4-ch.net/side.html`, harvest every `Name` it lists, render as type-1
menu items pointing back into this applet.
> emitBoards :: Ctx -> IO ()
> emitBoards ctx = do
> r <- curlGet (upstream <> "/side.html")
> case r of
> Left e -> mapM_ putLine [ errorItem e, terminator ]
> Right html -> do
> let bs = parseBoards html
> if null bs
> then mapM_ putLine [ errorItem "Could not parse the board list.", terminator ]
> else mapM_ putLine $
> [ infoLine "4-ch.net via gopher — pick a board."
> , infoLine ""
> ] ++
> [ menuLine (name <> " (/" <> slug <> "/)") (link ctx slug) (ctxHost ctx) (ctxPort ctx)
> | (slug, name) <- bs
> ] ++
> [ infoLine ""
> , httpLine "Open 4-ch.net in a web browser" (upstream <> "/") (ctxHost ctx) (ctxPort ctx)
> , terminator
> ]
> -- | Pull every board (slug, display name) out of side.html.
> -- Tolerant of inline `style=` and `title=` attributes.
> --
> -- >>> parseBoards "Technology"
> -- [("tech","Technology")]
> --
> -- >>> parseBoards "DQN"
> -- [("dqn","DQN")]
> parseBoards :: T.Text -> [(T.Text, T.Text)]
> parseBoards = uniq . go . T.replace "\r" "" . T.replace "\n" " "
> where
> go t = case T.breakOn "https://4-ch.net/" t of
> (_, rest) | T.null rest -> []
> | otherwise ->
> let after = T.drop (T.length "https://4-ch.net/") rest
> (slug, after') = T.span (\c -> isAlphaNum c) after
> isClose = T.isPrefixOf "/\"" after'
> in if not isClose || T.null slug
> then go (T.drop 1 rest)
> else
> let (_, txt0) = T.breakOn ">" after'
> txt1 = T.drop 1 txt0
> (label, _) = T.breakOn " cleaned = T.strip (decodeEntities (stripTags label))
> in if T.null cleaned
> then go (T.drop 1 rest)
> else (slug, cleaned) : go (T.drop (T.length "https://4-ch.net/" + T.length slug) rest)
> uniq = go' []
> where
> go' _ [] = []
> go' seen ((s,n):xs)
> | s `elem` seen = go' seen xs
> | otherwise = (s,n) : go' (s:seen) xs
Thread index
------------
Per-board page; each `` containing a `kareha.pl` link is one
thread. Title and reply count come straight out of the heading.
> emitThreads :: Ctx -> T.Text -> IO ()
> emitThreads ctx board = do
> r <- curlGet (upstream <> "/" <> board <> "/")
> case r of
> Left e -> mapM_ putLine [ errorItem e, terminator ]
> Right html -> do
> let ts = take maxRowsPerPage (parseThreads html)
> mapM_ putLine $
> [ infoLine ("/" <> board <> "/ — " <> T.pack (show (length ts)) <> " threads (most recent first).")
> , infoLine ""
> ] ++
> [ menuLine (title <> " (" <> T.pack (show n) <> ")") (link ctx (board <> "/" <> tid))
> (ctxHost ctx) (ctxPort ctx)
> | (tid, title, n) <- ts
> ] ++
> [ infoLine ""
> , menuLine ("[+] start a new thread on /" <> board <> "/")
> (link ctx (board <> "/new")) (ctxHost ctx) (ctxPort ctx)
> , homeMenu ctx
> , terminator
> ]
> -- | Pull thread (id, title, reply-count) tuples out of a board page.
> -- One row per `...kareha.pl/...
` block.
> parseThreads :: T.Text -> [(T.Text, T.Text, Int)]
> parseThreads html =
> let oneline = T.replace "\r" "" . T.replace "\n" " " $ html
> chunks = drop 1 (T.splitOn "" oneline)
> in [ (tid, title, n)
> | c <- chunks
> , Just (tid, title, n) <- [parseThreadChunk c]
> ]
> parseThreadChunk :: T.Text -> Maybe (T.Text, T.Text, Int)
> parseThreadChunk c =
> let (head', _) = T.breakOn "
" c
> href = textBetween "kareha.pl/" "\"" head'
> tid = T.takeWhile isDigit href
> afterAnchor = case T.breakOn ">" (snd (T.breakOn " (_, x) -> T.drop 1 x
> beforeClose = fst (T.breakOn " rawTitle = T.strip (decodeEntities (stripTags beforeClose))
> countTxt = textBetween "(" ")" head'
> n = readDigits countTxt
> in if T.null tid || T.null rawTitle
> then Nothing
> else Just (tid, stripTrailingCount rawTitle, n)
> where
> readDigits t = case reads (T.unpack t) :: [(Int, String)] of
> ((k,_):_) -> k
> _ -> 0
> -- Some boards render "Title (12)" inside the anchor instead of
> -- inside a separate ; strip a trailing "(NN)" so we
> -- don't show the count twice.
> stripTrailingCount t =
> let s = T.stripEnd t
> in if T.takeEnd 1 s == ")"
> then
> let dropped = T.dropEnd 1 s
> (digits, before) = T.span (/= '(') (T.reverse dropped)
> in if not (T.null digits) && T.all isDigit (T.reverse digits)
> && T.takeEnd 1 (T.reverse before) == "("
> then T.stripEnd (T.dropEnd 1 (T.reverse before))
> else s
> else s
Thread view
-----------
Per-thread page; each ` emitThread :: Ctx -> T.Text -> T.Text -> IO ()
> emitThread ctx board tid = do
> r <- curlGet (upstream <> "/" <> board <> "/kareha.pl/" <> tid)
> case r of
> Left e -> mapM_ putLine [ errorItem e, terminator ]
> Right html -> do
> let posts = parsePosts html
> title = parseThreadTitle html
> returnLink = menuLine ("← back to /" <> board <> "/")
> (link ctx board) (ctxHost ctx) (ctxPort ctx)
> replyLink = menuLine "[reply to this thread]"
> (link ctx (board <> "/" <> tid <> "/reply"))
> (ctxHost ctx) (ctxPort ctx)
> mapM_ putLine $
> [ infoLine ("── " <> title <> " ──")
> , infoLine ("/" <> board <> "/ thread " <> tid <> " posts: " <> T.pack (show (length posts)))
> , infoLine ""
> , returnLink
> , replyLink
> , infoLine ""
> ] ++ concatMap renderPost (zip [1..] (take maxRowsPerPage posts)) ++
> [ infoLine ""
> , replyLink
> , returnLink
> , httpLine "Open this thread on the web"
> (upstream <> "/" <> board <> "/kareha.pl/" <> tid)
> (ctxHost ctx) (ctxPort ctx)
> , terminator
> ]
> -- | Pull the bracketed thread title out of `
` ("Foo - Tech & Programming @ 4-ch")
> parseThreadTitle :: T.Text -> T.Text
> parseThreadTitle html =
> let t = textBetween "" "" html
> cleaned = T.strip (decodeEntities (stripTags t))
> -- Trim the trailing " - Board @ 4-ch" if present
> cut = case T.breakOnEnd " - " cleaned of
> ("", _) -> cleaned
> (left, _) -> T.stripEnd (T.dropEnd 3 left)
> in if T.null cut then "(untitled)" else cut
> data Post = Post
> { pNum :: T.Text
> , pName :: T.Text
> , pDate :: T.Text
> , pBody :: T.Text
> } deriving (Eq, Show)
> parsePosts :: T.Text -> [Post]
> parsePosts html =
> let oneline = T.replace "\r" "" . T.replace "\n" " " $ html
> chunks = drop 1 (T.splitOn " in [ p | c <- chunks, Just p <- [parsePostChunk c] ]
> parsePostChunk :: T.Text -> Maybe Post
> parsePostChunk c =
> let -- The replytext div splits metadata from body.
> (meta, body0) = T.breakOn "
" c
> bodyHtml = let afterOpen = T.drop (T.length "
") body0
> (inside, _) = T.breakOn "
" afterOpen
> in inside
> num = T.strip (decodeEntities (stripTags
> (textBetween "
" "" meta)))
> name = T.strip (decodeEntities (stripTags
> (textBetween "
" "" meta)))
> -- Date+ID lives between the close of the postertrip span and
> -- the open of the deletebutton span. Spacing around the
> -- colon varies (saged posts have an extra space); slicing on
> -- the last in the pre-deletebutton region tolerates
> -- that variation.
> beforeDelete = fst (T.breakOn "
afterTripClose = snd (T.breakOnEnd "" beforeDelete)
> dateField = T.strip $ decodeEntities $ stripTags $
> T.dropWhile (\ch -> ch == ' ' || ch == ':')
> (T.stripStart afterTripClose)
> body = T.strip (htmlToText bodyHtml)
> in if T.null num && T.null name && T.null body
> then Nothing
> else Just (Post num name dateField body)
> renderPost :: (Int, Post) -> [T.Text]
> renderPost (_, Post num name date body) =
> let header = ">> " <> num <> " " <> name <> " " <> date
> lines' = if T.null body then ["(no body)"] else T.lines body
> in [ infoLine header ] ++ map (\l -> infoLine (" " <> l)) lines' ++ [ infoLine "" ]
Session management
==================
A session is a directory under `/tmp/4ch-sessions/
` containing
`cookies` (curl cookie jar) and `captcha.gif`. The directory itself
is the lock; `createDirectoryIfMissing` is atomic relative to other
parallel attempts under the same sid (which won't collide because
sids are 64-bit random).
> -- | Path to a session's directory; does NOT create it.
> sessionDir :: T.Text -> FilePath
> sessionDir sid = sessionRoot > T.unpack sid
> -- | Create a fresh session id (16 lowercase hex chars; ~64 bits).
> newSessionId :: IO T.Text
> newSessionId = do
> n <- randomRIO (0, 0xFFFFFFFFFFFFFFFF :: Integer)
> pure (T.pack (pad16 (showHex n)))
> where
> pad16 s = replicate (16 - length s) '0' ++ take 16 s
> showHex 0 = "0"
> showHex k = go k ""
> where
> go 0 acc = acc
> go x acc = let (q, r) = x `divMod` 16
> d = "0123456789abcdef" !! fromIntegral r
> in go q (d : acc)
> -- | Wipe session directories older than `sessionTTL` seconds.
> gcStaleSessions :: IO ()
> gcStaleSessions = do
> createDirectoryIfMissing True sessionRoot
> now <- getCurrentTime
> entries <- listDirectory sessionRoot
> forM_ entries $ \e -> do
> let p = sessionRoot > e
> r <- try (getModificationTime p) :: IO (Either IOException UTCTime)
> case r of
> Right t | round (diffUTCTime now t) > sessionTTL -> do
> _ <- try (removeDirectoryRecursive p) :: IO (Either IOException ())
> pure ()
> _ -> pure ()
> -- | Ensure a session directory exists with 0700 permissions.
> ensureSession :: T.Text -> IO ()
> ensureSession sid = do
> let d = sessionDir sid
> createDirectoryIfMissing True d
> setPermissions d
> (emptyPermissions { readable = True, writable = True, searchable = True })
> -- | Fetch a fresh captcha into the session directory.
> -- `selector` is `.postcaptcha` or `.threadcaptcha`.
> freshCaptcha :: T.Text -> T.Text -> T.Text -> IO (Either T.Text ())
> freshCaptcha board sid selector = do
> ensureSession sid
> let jar = sessionDir sid > "cookies"
> gif = sessionDir sid > "captcha.gif"
> url = upstream <> "/" <> board <> "/captcha.pl?selector=" <> selector
> r <- curlGetToFileWithJar jar gif url
> case r of
> Left e -> pure (Left e)
> Right () -> do
> bs <- BS.readFile gif
> if BS.length bs < 6 || BS.take 3 bs /= "GIF"
> then pure (Left "Captcha endpoint did not return a GIF.")
> else pure (Right ())
Reply flow
==========
> emitReplyInit :: Ctx -> T.Text -> T.Text -> IO ()
> emitReplyInit ctx board tid = do
> gcStaleSessions
> sid <- newSessionId
> r <- freshCaptcha board sid ".postcaptcha"
> case r of
> Left e -> mapM_ putLine [ errorItem e, terminator ]
> Right () -> emitReplyMenu ctx board tid sid
> emitReplyMenu :: Ctx -> T.Text -> T.Text -> T.Text -> IO ()
> emitReplyMenu ctx board tid sid = do
> let base = board <> "/" <> tid <> "/reply/" <> sid
> gifSel = link ctx (base <> "/captcha.gif")
> submit = link ctx (base <> "/submit")
> mapM_ putLine
> [ infoLine ("Reply to /" <> board <> "/ thread " <> tid)
> , infoLine ""
> , infoLine "1. View the captcha image below (gopher type 'g')."
> , infoLine " If your client won't render it inline, fetch it:"
> , infoLine (" curl 'gopher://" <> ctxHost ctx <> "/g" <> gifSel <> "' > c.gif")
> , infoLine "2. Submit your post using the search prompt below."
> , infoLine " Format: CAPTCHA||BODY"
> , infoLine " In BODY, the literal two characters '\\n' become a newline."
> , infoLine ""
> , gifLine "captcha.gif" gifSel (ctxHost ctx) (ctxPort ctx)
> , infoLine ""
> , searchLine "Submit reply (CAPTCHA||BODY)" submit (ctxHost ctx) (ctxPort ctx)
> , infoLine ""
> , menuLine "← back to thread"
> (link ctx (board <> "/" <> tid)) (ctxHost ctx) (ctxPort ctx)
> , homeMenu ctx
> , terminator
> ]
> doReply :: Ctx -> T.Text -> T.Text -> T.Text -> T.Text -> IO ()
> doReply ctx board tid sid query = do
> let jar = sessionDir sid > "cookies"
> sessionOk <- doesFileExist jar
> if not sessionOk
> then mapM_ putLine
> [ errorItem "Session has expired. Start a new reply."
> , menuLine "Start a new reply" (link ctx (board <> "/" <> tid <> "/reply"))
> (ctxHost ctx) (ctxPort ctx)
> , terminator
> ]
> else case parseSubmitReply query of
> Left e -> failPage ctx e (board <> "/" <> tid <> "/reply")
> Right (cap, body) -> do
> let fields = [ ("task", "post")
> , ("thread", tid)
> , ("password", "")
> , ("field_a", "")
> , ("field_b", "")
> , ("name", "")
> , ("link", "")
> , ("captcha", cap)
> , ("markup", "waka")
> , ("comment", body)
> ]
> url = upstream <> "/" <> board <> "/kareha.pl"
> resp <- curlPostMultipart jar url fields
> emitSubmitResult ctx resp board (Just tid)
Posting parser. Splits the search line on `||` and unescapes literal
`\n` inside the body.
> -- | >>> parseSubmitReply "abcd||hello world"
> -- Right ("abcd","hello world")
> --
> -- >>> parseSubmitReply "abcd||line1\\nline2"
> -- Right ("abcd","line1\nline2")
> --
> -- >>> parseSubmitReply "no-separator"
> -- Left "Submit format is: CAPTCHA||BODY"
> --
> -- >>> parseSubmitReply "||body but empty captcha"
> -- Left "Captcha answer is empty."
> --
> -- >>> parseSubmitReply "cap||"
> -- Left "Body is empty."
> parseSubmitReply :: T.Text -> Either T.Text (T.Text, T.Text)
> parseSubmitReply q =
> case T.splitOn "||" q of
> [_] -> Left "Submit format is: CAPTCHA||BODY"
> (cap:rest@(_:_)) ->
> let body = T.intercalate "||" rest
> capS = T.strip cap
> bodyS = T.strip (unescapeNewlines body)
> in if T.null capS then Left "Captcha answer is empty."
> else if T.null bodyS then Left "Body is empty."
> else if T.length bodyS > maxBodyChars
> then Left ("Body exceeds " <> T.pack (show maxBodyChars) <> " chars.")
> else Right (capS, bodyS)
> _ -> Left "Submit format is: CAPTCHA||BODY"
> -- | Decode literal `\n` to a real newline. Two-pass over the text
> -- to keep the doctest readable.
> --
> -- >>> unescapeNewlines "a\\nb"
> -- "a\nb"
> --
> -- >>> unescapeNewlines "no escapes here"
> -- "no escapes here"
> unescapeNewlines :: T.Text -> T.Text
> unescapeNewlines = T.replace "\\n" "\n"
New-thread flow
===============
> emitNewThreadInit :: Ctx -> T.Text -> IO ()
> emitNewThreadInit ctx board = do
> gcStaleSessions
> sid <- newSessionId
> r <- freshCaptcha board sid ".threadcaptcha"
> case r of
> Left e -> mapM_ putLine [ errorItem e, terminator ]
> Right () -> emitNewThreadMenu ctx board sid
> emitNewThreadMenu :: Ctx -> T.Text -> T.Text -> IO ()
> emitNewThreadMenu ctx board sid = do
> let base = board <> "/new/" <> sid
> gifSel = link ctx (base <> "/captcha.gif")
> submit = link ctx (base <> "/submit")
> mapM_ putLine
> [ infoLine ("New thread on /" <> board <> "/")
> , infoLine ""
> , infoLine "1. View the captcha image below."
> , infoLine (" Or fetch it: curl 'gopher://" <> ctxHost ctx <> "/g" <> gifSel <> "' > c.gif")
> , infoLine "2. Submit using the search prompt below."
> , infoLine " Format: CAPTCHA||TITLE||BODY"
> , infoLine " In BODY, the literal two characters '\\n' become a newline."
> , infoLine ""
> , gifLine "captcha.gif" gifSel (ctxHost ctx) (ctxPort ctx)
> , infoLine ""
> , searchLine "Submit new thread (CAPTCHA||TITLE||BODY)" submit (ctxHost ctx) (ctxPort ctx)
> , infoLine ""
> , menuLine ("← back to /" <> board <> "/")
> (link ctx board) (ctxHost ctx) (ctxPort ctx)
> , homeMenu ctx
> , terminator
> ]
> doNewThread :: Ctx -> T.Text -> T.Text -> T.Text -> IO ()
> doNewThread ctx board sid query = do
> let jar = sessionDir sid > "cookies"
> sessionOk <- doesFileExist jar
> if not sessionOk
> then mapM_ putLine
> [ errorItem "Session has expired. Start a new thread again."
> , menuLine "Start a new thread" (link ctx (board <> "/new"))
> (ctxHost ctx) (ctxPort ctx)
> , terminator
> ]
> else case parseSubmitNewThread query of
> Left e -> failPage ctx e (board <> "/new")
> Right (cap, title, body) -> do
> let fields = [ ("task", "post")
> , ("title", title)
> , ("password", "")
> , ("field_a", "")
> , ("field_b", "")
> , ("name", "")
> , ("link", "")
> , ("captcha", cap)
> , ("markup", "waka")
> , ("comment", body)
> ]
> url = upstream <> "/" <> board <> "/kareha.pl"
> resp <- curlPostMultipart jar url fields
> emitSubmitResult ctx resp board Nothing
> -- | >>> parseSubmitNewThread "cap||Title Here||body"
> -- Right ("cap","Title Here","body")
> --
> -- >>> parseSubmitNewThread "cap||only title"
> -- Left "Submit format is: CAPTCHA||TITLE||BODY"
> --
> -- >>> parseSubmitNewThread "cap||||body"
> -- Left "Title is empty."
> parseSubmitNewThread :: T.Text -> Either T.Text (T.Text, T.Text, T.Text)
> parseSubmitNewThread q =
> case T.splitOn "||" q of
> (cap:title:rest@(_:_)) ->
> let body = T.intercalate "||" rest
> capS = T.strip cap
> titleS = T.strip title
> bodyS = T.strip (unescapeNewlines body)
> in if T.null capS then Left "Captcha answer is empty."
> else if T.null titleS then Left "Title is empty."
> else if T.null bodyS then Left "Body is empty."
> else if T.length bodyS > maxBodyChars
> then Left ("Body exceeds " <> T.pack (show maxBodyChars) <> " chars.")
> else Right (capS, titleS, bodyS)
> _ -> Left "Submit format is: CAPTCHA||TITLE||BODY"
Submit-result rendering
-----------------------
Kareha's success response is a 302 redirect (Location: back to the
thread). Its failure response is a 200 HTML page whose `` is the
human-readable error ("Wrong verification code entered.", "No text
entered.", etc.). We parse both shapes off the `-i` (include
headers) curl output.
> emitSubmitResult :: Ctx -> Either T.Text T.Text -> T.Text -> Maybe T.Text -> IO ()
> emitSubmitResult ctx (Left e) _ _ =
> mapM_ putLine [ errorItem e, homeMenu ctx, terminator ]
> emitSubmitResult ctx (Right resp) board mTid =
> case classifyResponse resp of
> Success _ ->
> let onSuccess = case mTid of
> Just tid -> menuLine "→ view thread"
> (link ctx (board <> "/" <> tid))
> (ctxHost ctx) (ctxPort ctx)
> Nothing -> menuLine ("→ /" <> board <> "/ (find your new thread)")
> (link ctx board) (ctxHost ctx) (ctxPort ctx)
> in mapM_ putLine
> [ infoLine "Posted."
> , infoLine ""
> , onSuccess
> , homeMenu ctx
> , terminator
> ]
> Failure msg ->
> mapM_ putLine
> [ errorItem ("Kareha rejected the post: " <> msg)
> , infoLine ""
> , menuLine ("← back to /" <> board <> "/")
> (link ctx board) (ctxHost ctx) (ctxPort ctx)
> , homeMenu ctx
> , terminator
> ]
> data PostResult = Success T.Text | Failure T.Text deriving (Eq, Show)
> -- | Classify a curl `-i` response: a `Location:` header means
> -- success (Kareha redirects on accepted post), otherwise pull the
> -- error message out of the first `` in the body.
> --
> -- >>> classifyResponse "HTTP/1.1 302 Found\r\nLocation: /tech/kareha.pl/1\r\n\r\n"
> -- Success "/tech/kareha.pl/1"
> --
> -- >>> classifyResponse "HTTP/1.1 200 OK\r\n\r\nWrong verification code entered.
"
> -- Failure "Wrong verification code entered."
> --
> -- >>> classifyResponse "HTTP/1.1 200 OK\r\n\r\nplain body"
> -- Failure "Unknown error (no in response)."
> classifyResponse :: T.Text -> PostResult
> classifyResponse r =
> let lower = T.toLower r
> loc = textBetween "location:" "\r" lower
> in if not (T.null loc)
> then
> -- Recover original-case Location value.
> let locOrig = textBetween "Location:" "\r" r
> v = T.strip (if T.null locOrig then loc else locOrig)
> in Success v
> else
> let h1 = T.strip (decodeEntities (stripTags (textBetween "" "
" r)))
> in if T.null h1
> then Failure "Unknown error (no in response)."
> else Failure h1
Captcha asset
-------------
Writes the raw GIF to stdout. No gophermap framing --- Venusia
streams whatever we emit; the parent menu's row already says this is
type-`g`.
> emitCaptchaGif :: T.Text -> T.Text -> Maybe T.Text -> T.Text -> IO ()
> emitCaptchaGif _board _tid Nothing _kind = pure ()
> emitCaptchaGif _board _tid (Just sid) _kind = do
> let p = sessionDir sid > "captcha.gif"
> ok <- doesFileExist p
> if not ok
> then mapM_ putLine [ errorItem "Captcha session is gone.", terminator ]
> else BS.readFile p >>= BS.hPut stdout
Error pages
-----------
> emitPathError :: Ctx -> T.Text -> IO ()
> emitPathError ctx p = mapM_ putLine
> [ errorItem ("Unrecognised path-info: /" <> p)
> , homeMenu ctx
> , terminator
> ]
> failPage :: Ctx -> T.Text -> T.Text -> IO ()
> failPage ctx msg backSubsel = mapM_ putLine
> [ errorItem msg
> , menuLine "Try again" (link ctx backSubsel) (ctxHost ctx) (ctxPort ctx)
> , homeMenu ctx
> , terminator
> ]
End of file. The next blank line ensures GHC's literate parser
doesn't choke on a trailing bird-track without content.