X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: f996b,ef11ec065eec2ab6
X-Google-Attributes: gidf996b,public
From: iltzu@sci.fi (Ilmari Karonen)
Subject: Re: character set study
Date: 1998/01/28
Message-ID: <6anvq5$j8g$1@tron.sci.fi>#1/1
X-Deja-AN: 320166255
References: <34CE7037.4E9@atlantis-bbs.com> <34CE87E3.11E3@mpq.mpg.de>
Distribution: world
Organization: (dis)Order of the Holy Spoon (or whatever)
Newsgroups: alt.ascii-art

Andreas Freise (adf@mpq.mpg.de) wrote:
: gem wrote:
: > This a character count using 31721 characters-worth of
: > Joan Stark's ascii drawings. I was playing around trying
: > to write a BMP to "line" ascii program. To find the best
: > characters to use I wrote a quick program to count the
: > characters and their frequency.

: Funny idea. I wonder if that analysis could be
: a 'fingerprint' of the artist. Maybe not to
: distinguish between hjw and jgs, but you'd
: surely notice an increase of the > and < while scanning
: llizard's art. Not to talk about other styles
: (like by Normand Veilleux etc).

Actually, I've been thinking about that. I have this little program
that tries to identify files using a neural network, and while it's
far from optimal, it does work. In fact, I'm pretty positive I could
get it to tell basic ascii art styles (linedraw, dotty, solid, etc.)
apart if I didn't mess up the search space with unrelated input.

The biggest problem is that the code can't identify clusters on its
own. When training, one has to assign a type to each file beforehand,
and if the categories don't work out, it'll never learn.

Also, I remember getting it to tell the difference between ascii art
from this NG and other text files. Later I realized it was recognizing
the RFC 822 headers in the posts.

Still, if I just found a better statistical method (neural nets feel
like turning screws with a knife - possible but not quite appropriate
here) it might work out.
There's no way I could catch all correlations, but if I picked the
inputs right, finding clusters should be enough. By the way, the inputs
shouldn't only include character counts, but also the proximity of
different characters. For example, in line art the cluster _,-'" is
rather common.

Maybe, some day..

-- 
ii 3D .sigXIlmari KaronenAscii 3D .sigIlmari Karonen_Ascii 3D .sigIlmari Kar
fi.stereo@graphic.iltzu@sci.fi.stereo@graphiciltzu@sci.fin.stereo@graphicilt
ltzu/ba/http://www.sci.fi/~iltzu/bahttp://www.sci.fi/~iltzu/1bahttp://www.sc
is a stereogram!>
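
As a rough illustration of the inputs described above (character counts
plus the proximity of different characters), here is a minimal Python
sketch. It is not the program mentioned in the post: the name
char_features, the decision to skip spaces, and the restriction to
horizontally adjacent pairs are all assumptions made for the example.

```python
from collections import Counter


def char_features(art: str):
    """Return (character frequencies, adjacent-pair frequencies).

    Hypothetical helper for illustration only. Spaces are skipped so
    that blank background does not dominate the counts.
    """
    chars = Counter()
    pairs = Counter()
    for line in art.splitlines():
        chars.update(c for c in line if c != " ")
        # Count horizontally adjacent non-space characters on one line.
        pairs.update(
            a + b for a, b in zip(line, line[1:]) if a != " " and b != " "
        )

    def normalise(counts):
        # Turn raw counts into relative frequencies that sum to 1.
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}

    return normalise(chars), normalise(pairs)


# Made-up sample drawing, not taken from any artist's work.
sample = "_,-'\"-._\n  |  |  "
freqs, pair_freqs = char_features(sample)
```

The two dictionaries could then be handed to whatever clustering or
statistical method one prefers. Vertical neighbours would need a second
pass over columns, which this sketch leaves out.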