2025-11-17 - Ecotopian Dungeon Scientist Word Cloud
===================================================

Word Cloud

I've been seeing word clouds for ages. Today I decided to generate one from my gopher hole. In the process I found and fixed a number of character encoding errors and typos. That alone made it worth the price of admission.

Below I will outline the steps I took to generate the above image. I did this on Slackware64 15.0.

Select the content to scrape words from:

    $ find public_gopher -type f \( -name '*.txt' -o -name '*.gph' \) \
      >lis.txt
    $ wc -l lis.txt
    477

So I have 477 text files.

Parse out individual words:

    $ find public_gopher -type f \( -name '*.txt' -o -name '*.gph' \) \
      -print0 >0lis.txt
    $ xargs -a 0lis.txt -0 cat |\
      tr -s '[[:punct:][:space:]]' '\n' |\
      tr A-Z a-z |\
      sort >words.txt
    $ wc -l words.txt
    1051670 words.txt

So I have a little over a million words.

Count frequency of words:

    $ uniq -c <words.txt |\
      sort -n |\
      awk '$1 > 9 {print $2}' >words2.txt
    $ wc -l words2.txt
    7448 words2.txt
    $ tail -5 words2.txt
    for
    you
    that
    and
    the

Much better, I have a list of 7448 unique words.

Now I want to filter out boring words such as "and" and "the".

    $ cp words2.txt filter.txt
    $ ed filter.txt
    ...

I manually edited filter.txt and deleted the lines with interesting words, leaving behind only the boring words. This took a few minutes. I saved the edited file.
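Before running the tokenize and count steps on the whole gopher hole, it can help to try them on a small sample. The following is a minimal sketch, assuming a POSIX shell with tr, sort, uniq and awk. The function names tokenize and common_words are hypothetical, and the default threshold of 10 mirrors the "fewer than 10 times" cutoff used in this post. Note that tokenizing on punctuation splits contractions, so "It's" becomes "it" and "s". Short fragments like these are one reason to drop words under 3 characters later on.

```shell
#!/bin/sh
# Hypothetical helpers sketching the tokenize and count steps.

# tokenize: turn runs of punctuation and whitespace into single newlines,
# then lowercase everything, giving one word per line.
tokenize() {
    tr -s '[:punct:][:space:]' '\n' | tr 'A-Z' 'a-z'
}

# common_words [MIN]: read one word per line and print each word that
# occurs at least MIN times (default 10), least frequent first, so the
# most common word comes last.
common_words() {
    sort | uniq -c | sort -n |
        awk -v min="${1:-10}" '$1 >= min + 0 { print $2 }'
}

# Usage on a tiny sample: prints "cat" and then "the".
printf "It's the cat, the dog, and THE bird.\nThe cat sat.\n" |
    tokenize | common_words 2
```

Sorting before uniq -c matters, because uniq only counts adjacent duplicate lines.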
Report word count, excluding filtered words. The here-document delimiter is quoted so that the shell does not expand $0 inside the awk program:

    $ cat >filter.awk <<'__EOF__'
    BEGIN {
        file = "filter.txt"
        while ((getline < file) > 0) {
            filter[$0] = 1
        }
        close(file)
    }
    {
        # skip word if it begins or ends with a digit
        if (/^[0-9]/ || /[0-9]$/) {
            next
        }
        # skip word if it's less than 3 characters long
        if (length($0) < 3) {
            next
        }
        # skip word if it's in filter.txt
        if ($0 in filter) {
            next
        }
        words[$0]++
    }
    END {
        for (word in words) {
            count = words[word]
            # skip word if it occurred fewer than 10 times
            if (count < 10) {
                continue
            }
            printf "%d %s\n", count, word
        }
    }
    __EOF__
    $ awk -f filter.awk words.txt | sort -n >words3.txt

I found the Python wordcloud generator on the following web pages:

    Create Fun Word Cloud Images Easily In Linux Terminal
    WordCloud Only Supported For TrueType Fonts

Install the Python wordcloud generator. On Slackware it is necessary to upgrade pip and Pillow first:

    # pip3 install --upgrade pip
    # pip3 install --upgrade Pillow
    # pip3 install wordcloud

Finally, generate a word cloud:

    $ wordcloud_cli --text words3.txt --background white \
      --fontfile CaslonAntique.ttf --imagefile word-cloud.png \
      --width 800 --height 600
    $ pngtopam word-cloud.png |\
      cjpeg -optimize -quality 80 >word-cloud.jpg

That's it!

tags: bencollver,technical,unix

Tags
====

bencollver
technical
unix