From: David Riley
Subject: Re: Detecting ASCII art in emails
Date: 2000/04/09
Newsgroups: alt.ascii-art

Ian Howlett writes:
> I am writing a program to detect ASCII art in email messages.
>
> (The program will be in Perl, but any code or pseudocode will also
> be warmly received.)

After reading some of the suggestions in this thread, I've had a go at
this and have had a fair amount of success detecting which posts to this
group contain ascii-art, and in avoiding false positives with other types
of text such as source code.

I've only used 7 rules so far, but I've tried to make it easy to add new
ones, and to adjust the weighting values.

If anyone wants to see the code, it's at:

http://www.dmriley.demon.co.uk/code/ascii-art-scripts/aagrep.pl

--
(((( .' ) //=====e "Beneath the paving stones, the beach!"
))) \ )_//____________ -wall inscription Paris may '68
(((( J ) David Riley )===========O dave@dmriley.demon.co.uk
))))______)_______________)===========O http://www.dmriley.demon.co.uk/
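
For readers who want a feel for the weighted-rule approach described in the post, here is a minimal sketch. It is written in Python rather than the Perl of aagrep.pl, and every rule, weight, and threshold in it is invented for illustration. None of them are taken from the actual script or its seven rules. The idea is only that each rule scores a line, each score is multiplied by an adjustable weight, and a rule that fires on source code can carry a negative weight to cut down on false positives.

```python
import re

# Hypothetical rules, NOT the ones used in aagrep.pl.
# Each rule takes one line of text and returns a score between 0.0 and 1.0.

def symbol_density(line):
    """Fraction of non-space characters that are neither letters nor digits."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if not c.isalnum()) / len(chars)

def repeated_run(line):
    """1.0 if a non-alphanumeric character repeats four or more times in a row."""
    return 1.0 if re.search(r"([^A-Za-z0-9\s])\1{3,}", line) else 0.0

def wide_gaps(line):
    """1.0 if the line has an interior gap of three or more spaces."""
    return 1.0 if re.search(r"\S {3,}\S", line) else 0.0

def code_like(line):
    """1.0 if the line ends like a C or Perl statement or block."""
    return 1.0 if re.search(r"[;{}]\s*$", line) else 0.0

# (name, weight, rule). Adding a rule or retuning a weight means editing
# only this table. All weights are assumed values.
RULES = [
    ("symbol_density", 3.0, symbol_density),
    ("repeated_run", 2.0, repeated_run),
    ("wide_gaps", 1.0, wide_gaps),
    ("code_like", -2.0, code_like),  # negative: counts against ASCII art
]

THRESHOLD = 1.5  # arbitrary cut-off, chosen for this sketch only

def score_text(text):
    """Average weighted rule score over the non-blank lines of text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    total = sum(weight * rule(line)
                for line in lines
                for _name, weight, rule in RULES)
    return total / len(lines)

def looks_like_ascii_art(text):
    """True if the average weighted score reaches the threshold."""
    return score_text(text) >= THRESHOLD
```

Calling `looks_like_ascii_art(body)` on the body of each post would give a yes/no answer, and `score_text(body)` exposes the raw score if you want to tune the weights against a set of posts you have classified by hand.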