From: parallax@apk.net (Wesley Clifford)
Subject: Re: Detecting ASCII art in emails
Date: 2000/04/04
Message-ID: <38ea4f82.27476028@news.apk.net>
References: <38E8A839.12EE@ukc.ac.uk> <8cdh7d$j6t$2@bigboote.WPI.EDU>
Organization: APK Net
NNTP-Posting-Date: 4 Apr 2000 20:31:03 GMT
Newsgroups: alt.ascii-art

On 4 Apr 2000 19:53:17 GMT, pulp@WPI.EDU (Joshua E Millard) wrote:

>Ian Howlett (ih2@ukc.ac.uk) wrote:
>: I am currently designing the heuristics, or to put it another way,
>: I'm trying to work out how you can identify the presence of ASCII art.
>:
>: My observations so far:
>:
>: Many non-alphanumeric characters are used.
>
>I would bet that you could get fairly solid results if you searched for
>occurrences of /, |, and \ as your primary hooks. Toss in - and _, maybe +
>and some others (. and , and ' and = and others) if you like, or as a
>lower-weighted factor for your heuristic, but in terms of raw efficiency,
>the pipe and slashes are probably more omnipresent in ASCII art and
>lacking in non-ART posts than any other characters. The first 80% or so
>of the search is the easiest part. :)

Just don't search through any emails pertaining to Unix or DOS.

cat //usr/my_file.txt|grep -i "/[../]"

:-D

--Wesley
The only stupid question is the unasked question. And that's a good thing,
because I hate stupid questions!
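For anyone who wants to try the weighted heuristic quoted above (slashes and pipes as primary hooks, other punctuation as lower-weighted factors), here is a minimal Python sketch. It is not Ian's or Joshua's actual code. The weight values, the 1.0 threshold, and the "three consecutive lines" rule are all hypothetical choices made for illustration.

```python
# Hypothetical weights: /, | and \ are the "primary hooks"; -, _, +, ., ,, ' and =
# are the "lower-weighted factors". The numbers themselves are assumptions.
PRIMARY = {"/": 3.0, "|": 3.0, "\\": 3.0}
SECONDARY = {"-": 1.0, "_": 1.0, "+": 1.0, "=": 1.0, ".": 0.5, ",": 0.5, "'": 0.5}
WEIGHTS = {**PRIMARY, **SECONDARY}


def line_score(line: str) -> float:
    """Weighted count of 'art' characters divided by the number of non-space characters."""
    visible = [c for c in line if not c.isspace()]
    if not visible:
        return 0.0
    return sum(WEIGHTS.get(c, 0.0) for c in visible) / len(visible)


def looks_like_ascii_art(text: str, threshold: float = 1.0, min_lines: int = 3) -> bool:
    """Flag text if at least `min_lines` consecutive lines score at or above `threshold`."""
    run = 0
    for line in text.splitlines():
        if line_score(line) >= threshold:
            run += 1
            if run >= min_lines:
                return True
        else:
            run = 0  # a plain-prose line breaks the run
    return False


if __name__ == "__main__":
    art = "\n".join(["  /\\    /\\", " /  \\__/  \\", " \\   ||   /", "  \\__||__/"])
    prose = "I'm trying to work out how you can identify\nthe presence of ASCII art."
    tree = "\n".join(["|-- bin/", "|   |-- sh", "|-- usr/"])  # a Unix directory listing
    print(looks_like_ascii_art(art))    # True
    print(looks_like_ascii_art(prose))  # False
    print(looks_like_ascii_art(tree))   # True: a false positive
```

The last case is exactly the Unix/DOS problem: a directory tree made of `|`, `-` and `/` scores like a drawing. A real filter would probably need extra rules, for example discounting tokens that look like file paths.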