00:00:00 --- log: started forth/07.11.25
00:30:42 --- join: saon_ (n=saon@c-66-177-56-33.hsd1.fl.comcast.net) joined #forth
00:35:53 --- quit: saon (Read error: 110 (Connection timed out))
01:06:05 --- quit: ygrek (Remote closed the connection)
01:06:05 --- quit: Off_Namuh (Remote closed the connection)
01:07:41 --- join: ygrek (i=user@gateway/tor/x-b10e51843f45ca7b) joined #forth
01:18:04 --- join: Off_Namuh (i=GPS@gateway/tor/x-b2512210d150f9cf) joined #forth
01:30:47 --- quit: ygrek (Remote closed the connection)
01:31:37 --- join: ygrek (i=user@gateway/tor/x-8535f212e3bd7023) joined #forth
01:37:32 --- join: doublec (n=doublec@203-211-87-234.ue.woosh.co.nz) joined #forth
03:15:56 --- quit: doublec ()
03:33:45 --- join: Crest (n=crest@p5B10666F.dip.t-dialin.net) joined #forth
04:30:25 --- quit: Crest (Read error: 113 (No route to host))
05:03:18 --- quit: Off_Namuh (Remote closed the connection)
05:04:59 --- join: edrx (i=edrx@189.25.141.102) joined #forth
05:47:59 --- quit: saon_ ("leaving")
05:58:31 --- join: nighty^ (n=nighty@sushi.rural-networks.com) joined #forth
06:09:02 --- join: ktne (n=ktne@unaffiliated/ktne) joined #forth
06:09:17 hello
06:09:34 --- part: ktne left #forth
06:41:20 --- join: nighty-- (n=nighty-@66-163-28-100.ip.tor.radiant.net) joined #forth
06:45:40 --- quit: nighty- (Read error: 110 (Connection timed out))
08:09:19 --- quit: ecraven ("bbl")
08:11:02 --- quit: timlarson (Read error: 104 (Connection reset by peer))
08:11:14 --- join: timlarson__ (n=timlarso@user-12l37rb.cable.mindspring.com) joined #forth
09:43:32 --- join: Crest (n=crest@p5B1068A3.dip.t-dialin.net) joined #forth
09:44:31 KragenSitaker, the stack can be managed most easily with very small factoring; keeping the number of working items on the stack to 3 or sometimes 4 at most; keeping items on the stack in the order in which they will be consumed.
09:49:40 --- quit: Crest (Read error: 113 (No route to host))
09:57:33 --- quit: Al2O3_ ("Eggplant & SenseTalk: Driving Success Through Automation")
10:00:23 --- join: Al2O3 (n=Al2O3@229.sub-70-215-170.myvzw.com) joined #forth
10:43:04 --- quit: edrx (Read error: 110 (Connection timed out))
10:49:22 --- quit: ygrek (Remote closed the connection)
10:57:08 --- join: edrx (i=edrx@189.25.131.9) joined #forth
12:54:23 --- join: ygrek (i=user@gateway/tor/x-6bccc44b68a140ba) joined #forth
13:08:04 --- join: doublec (n=doublec@202.180.114.137) joined #forth
13:44:27 --- join: tathi (n=josh@pdpc/supporter/bronze/tathi) joined #forth
13:44:27 --- mode: ChanServ set +o tathi
14:26:22 --- quit: ygrek (Remote closed the connection)
14:36:57 --- join: frunobulax (n=mhx@e243118.upc-e.chello.nl) joined #forth
14:38:03 In a not-yet-finished full-text search engine in ANS Forth (http://www.canonical.org/~kragen/tmp/invertedindex.fs.html) I saw:
14:38:16 : open-input ( c-addr u -- wfileid ) r/o open-file throw ;
14:38:29 On a good Forth, this will open a file but not create it.
14:38:50 Gforth for Linux is somewhat too 'helpful' in this regard.
14:39:05 This has been the subject of a clf discussion for a spreadsheet.
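
[editor's note — a minimal sketch, not from the log: on a Forth whose OPEN-FILE is too 'helpful' and creates the file even when it is opened r/o, OPEN-INPUT can probe FILE-STATUS first. FILE-STATUS is standard ANS Forth; its first result is implementation-defined, so only the ior is consulted, and the abort message is illustrative.]

    \ refuse to open, and therefore possibly create, a missing file
    : open-input ( c-addr u -- wfileid )
        2dup file-status nip              \ ior is nonzero if the file is absent
        abort" open-input: no such file"
        r/o open-file throw ;

[This is only a best-effort guard: the file could still appear or vanish between FILE-STATUS and OPEN-FILE.]
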
14:39:35 Example for the generated code:
14:40:26 Flags: ANSI
14:40:27 $00548C40 : skip-non-word 8BC083ED048F4500
14:40:27 $00548C48 pop ebx 5B
14:40:28 $00548C49 lea eax, [eax*1 0 +] dword 8D040500000000
14:40:29 $00548C50 cmp ebx, [esp 0 +] dword 3B1C64
14:40:29 $00548C53 jne $00548C60 offset NEAR 0F8507000000
14:40:30 $00548C59 push ebx 53
14:40:31 $00548C5A jmp $00548C82 offset NEAR E923000000
14:40:33 $00548C5F pop ebx 5B
14:40:35 $00548C60 movzx eax, [ebx 0 +] byte 0FB603
14:40:37 $00548C63 lea ecx, [ebx 1 +] dword 8D4B01
14:40:39 $00548C66 cmp [eax $005484C0 +] byte, 0 b# 80B8C084540000
14:40:41 $00548C6D mov ebx, ecx 8BD9
14:40:43 $00548C6F je $00548C7F offset NEAR 0F840A000000
14:40:45 $00548C75 lea ebx, [ebx -1 +] dword 8D5BFF
14:40:47 $00548C78 push ebx 53
14:40:49 $00548C79 jmp $00548C82 offset NEAR E904000000
14:40:51 $00548C7E pop ebx 5B
14:40:53 $00548C7F jmp $00548C50 offset SHORT EBCF
14:40:55 $00548C81 push ebx 53
14:40:57 $00548C82 ; 8B450083C504
14:40:59 How to test this program, as compared to its C prototype?
14:41:01 please use a pastebin for this in the future
14:42:21 --- quit: frunobulax ("a quit that really quits")
14:43:30 Impressive.
14:43:48 there goes my daily fix of random x86 assembly.
15:05:10 Oh, but terribly important.
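
[editor's note — the listing above is compiled output for a word named skip-non-word. As a rough idea of what such a word can look like in source form, here is an ANS Forth sketch; the ( c-addr u ) stack effect, the word-char? table, and the character classes are assumptions for illustration, not the definitions from Kragen's program.]

    \ assumed: a 256-byte class table, nonzero for "word" characters
    create word-char? 256 chars allot  word-char? 256 erase
    : mark-range ( lo hi -- )  1+ swap ?do  1 word-char? i chars + c!  loop ;
    char a char z mark-range  char A char Z mark-range  char 0 char 9 mark-range

    : skip-non-word ( c-addr u -- c-addr' u' )   \ drop leading non-word chars
        begin dup 0> while
            over c@ chars word-char? + c@ if exit then
            1 /string                            \ advance one character
        repeat ;

[Testing against a C prototype could then be a matter of running both over the same strings and comparing the resulting offsets.]
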
15:20:51 Quartus____: indeed, i have heard of very small factoring, keeping working items to a minimum, and keeping ordering of consumption; but I do not know if my difficulties are due to failures to do these things, or due to something else such as inexperience.
15:20:56 maybe you can tell from my code
15:23:24 you're mixing abstraction layers, i think.
15:23:30 I think so, too.
15:23:35 your memory management and low-level string manipulation is interspersed with the high-level words.
15:31:02 KragenSitaker: i ported your program to factor
15:31:32 http://pastebin.ca/797402
15:39:55 it's probably even shorter in perl
16:15:53 * JasonWoof crosses his fingers, and starts up a script to convert his mp3s to ogg vorbis
16:16:49 it might not finish until morning, I hope it gets it right
16:17:58 why do it at all?
16:19:57 2 reasons: 1) to fix the broken ones, so I can put proper meta-tags on them 2) so I can run vorbis-gain on them and get uniform loudness from my music collection
16:20:18 oh, and it has a certain coolness factor
16:20:35 my ipod doesn't play oggs :(
16:20:55 yeah, I bought a player that does
16:21:14 cool little thing. about the size of a C battery
16:22:02 You can get uniform loudness from the de facto extra loudness-level dealie in MP3s, too
16:22:20 I looked for a replaygain thing for mp3s and didn't see one
16:22:32 didn't look all that hard though
16:22:37 There is such. Foobar2000 is a player that'll do it for you.
16:23:50 I don't run windoze
16:24:08 Well, I'm sure something similar exists for Linoox.
16:24:18 yeah, most likely
16:24:53 that's linicks btw ;)
16:26:02 it took me a while to write the script, but I get several benefits, including more familiarity with a couple scripting languages
16:26:22 and the file-naming scheme I like
16:29:26 I figured I'd have to recode to fix the broken old mp3 files anyway
16:29:38 I wouldn't think so.
16:30:49 how else could you fix it?
16:30:54 is there some mp3fixer utility?
16:31:05 Damaged headers? There are various utilities to fix those.
16:31:15 oh
16:32:16 mp3s still have patent issues, right?
16:32:21 iirc they expire somewhat soon
16:33:20 I think so.
16:33:35 factor has an incomplete mp3 player library
16:33:41 the author never finished it :/
16:33:42 Re the expiry. And whatever issues pertain apply to the codecs; they haven't proved any obstacle thus far in any practical sense.
16:34:08 yeah they have
16:34:30 some of the big linux distros (possibly all of them) don't ship with mp3 playing capability
16:34:48 You can add the capability with one update.
16:34:53 yep
16:35:03 but it presents a barrier.
16:35:10 it marginalizes Linux
16:36:07 Ogg Vorbis' name alone marginalizes whatever it appears on.
16:36:21 having a weird name isn't necessarily bad
16:36:29 godaddy.com is a nice example of how it can be good
16:36:47 Ogg Vorbis is a nice example of the opposite.
16:36:51 heh
16:37:03 I like open standards
16:37:07 I make an effort to use them
16:39:11 mp3 support can be easily added to linux. it's a recent development I think... it could easily get worse so that big distros dare not support mp3s at all
16:39:31 that's just an example of the kind of irritating crap that happens with proprietary formats
16:39:55 I realize that this is only marginally related to my music collection
16:40:54 it's just a general policy of mine to try to keep my data in well documented and supported (by software) open file formats
16:41:10 slava: 18:39 <@slava> it's probably even shorter in perl
16:41:34 I figure this makes it most likely that I'll be able to use them far into the future
16:41:43 i mean
16:41:47 slava: thanks! awesome!
16:43:45 also thanks to you and Quartus____ on the design feedback; I'll see if I can figure out how to make the high-level words a little more high-level
16:44:22 it's true that if i don't have to juggle two stack entries every time i touch a string, the stack will get easier to manage...
16:50:47 slava: your rewrite seems to be missing .docids and add: or add-filename, which make it somewhat difficult to tell what else it's missing
16:51:05 i presume .docids prints the documents containing a search term
16:51:15 since the index is stored as a hashtable, you use the 'at' word together with the '.' word.
16:51:30 as far as i can tell, it is functionally equivalent to your version.
16:52:40 ah, i need to remove duplicates. so if one file has the same word more than once, it doesn't add several entries to that word's index.
16:52:55 yeah
16:52:58 i could also fold case to make it case insensitive, and use factor's porter-stemmer library to stem words (so that eat and eating index the same, etc).
16:52:59 very nice!
16:53:23 also you could omit punctuation from the words
16:53:37 yup. instead of using 'split', i could use parser combinators
16:53:49 and also parse the file incrementally instead of loading it all at once with 'contents'
16:54:04 oh, contents doesn't produce a lazy sequence of bytes?
16:54:09 no, it loads the file
16:54:37 support for utf8-encoded files is just a matter of calling a single word on 'contents' to decode utf8
16:55:12 for directory trees, we have some words too.
16:55:12 you don't even need to do that --- you already have support for them
16:55:33 factor strings are unicode
16:55:49 so if your data is stored as utf8, you'd need to decode it for international case conversion to work
16:56:14 international case conversion doesn't work even then
16:56:21 what do you mean?
16:56:29 dotless i
16:56:54 we have a library for unicode-aware and locale sensitive case conversion.
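
[editor's note — a sketch of the case-folding caveat above, in ANS Forth rather than Factor. Naive per-character folding like this handles only ASCII; it is exactly what breaks on Turkish dotless i, where I lowercases to ı and İ lowercases to i, which is why locale-sensitive conversion needs a real Unicode library.]

    \ ASCII-only case folding: fine for English, wrong for Turkish
    : >lower ( c -- c' )  dup [char] A [char] Z 1+ within if 32 + then ;
    : fold-case ( c-addr u -- )   \ fold in place; assumes one address unit per char
        over + swap ?do  i c@ >lower i c!  loop ;
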
16:57:04 but yeah, it's not surprising that it's a lot easier if you start with a reasonably usable library that already implements word splitting, hash tables, and memory allocation, and where you don't have to do buffered file input yourself
16:57:06 not sorting, yet
16:57:18 factor does buffering under the hood for you
16:57:25 so you can read one byte at a time if you want
16:57:30 you said contents reads the whole file, though
16:57:35 so your program isn't doing buffering
16:57:37 yes. but reading a word at a time isn't much harder
16:57:47 right, makes sense
16:57:51 " \t\n\r" read-until
16:57:56 you'd do that in a loop
16:58:05 essentially (there's the EOF case to handle)
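
[editor's note — a minimal ANS Forth analog, not from the log, of the word-at-a-time loop slava sketches with Factor's read-until. The names word-buf, delim?, read-char and next-word are illustrative; the buffer is reused on every call and has no overflow check.]

    create word-buf 80 chars allot

    : delim? ( c -- flag )  dup bl = over 9 = or over 10 = or swap 13 = or ;

    : read-char ( fid -- c true | false )       \ false at end of file
        >r pad 1 r> read-file throw
        if pad c@ true else false then ;

    : next-word ( fid -- c-addr u )             \ u = 0 once the file is exhausted
        0                                       \ fid len
        begin over read-char while              \ fid len c
            dup delim? if
                drop dup if nip word-buf swap exit then   \ end of a word
            else
                word-buf 2 pick chars + c!  1+            \ append to buffer
            then
        repeat
        nip word-buf swap ;

[A caller keeps the fileid in a VALUE, say, and loops until u is zero.]
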
16:58:20 but i also don't want to have N levels of function calls in the inner loop
16:59:20 how much data are you trying to index?
16:59:25 my hard disk
16:59:27 bbl
16:59:29 going to tango show
17:00:16 sounds like it would be an I/O bound task in any case
17:13:18 --- quit: Deformative (Read error: 104 (Connection reset by peer))
17:13:42 --- join: Deformative (n=joe@c-68-61-240-49.hsd1.mi.comcast.net) joined #forth
18:03:42 --- join: LOOP-HOG (n=jasondam@c-76-105-172-75.hsd1.or.comcast.net) joined #forth
18:19:52 --- quit: LOOP-HOG ()
18:56:48 --- quit: tathi ("leaving")
19:00:22 --- quit: Quartus____ ()
19:00:39 --- join: Quartus (n=neal@CPE0001023f6e4f-CM001947482b20.cpe.net.cable.rogers.com) joined #forth
19:00:39 --- mode: ChanServ set +o Quartus
19:15:07 --- quit: neceve ("Konversation terminated!")
20:06:23 --- quit: doublec ()
20:59:08 --- join: doublec (n=doublec@203-211-100-219.ue.woosh.co.nz) joined #forth
21:02:59 --- quit: doublec (Client Quit)
21:18:00 --- quit: Al2O3 ("Eggplant & SenseTalk: Driving Success Through Automation")
21:21:33 --- join: Al2O3 (n=Al2O3@176.sub-70-215-243.myvzw.com) joined #forth
21:23:30 slava: even in C I haven't gotten it to be an I/O bound task, even on a laptop
21:24:27 I think the best speed I got on a 500MHz laptop was 5 megabytes per second, and it's easy to get hard disks these days that do 40
21:25:36 for context, that's about 30 times as fast as Lucene, times or divided by three
21:27:20 one of the things people like about Lucene is that it's so fast
21:28:58 Lucene, of course, is extremely flexible and supports a lot of features my indexer doesn't --- document fields, different input formats, and much etc.
21:30:42 lucene is written in a higher level language
21:31:46 I wonder what Forth frunobulax was pasting from. It's not bigForth for sure.
21:32:48 yes, it is. But the other search engines people compare it with mostly aren't.
21:33:16 i'm porting a syntax highlighting engine i wrote in java to factor.
21:33:23 i wrote it several years ago, the code is fugly
21:33:33 it's more of a rewrite than a port, but it reads the exact same syntax highlighting definition files
21:33:35 that's the idea anyway
21:33:59 is that because you really like the way your syntax highlighting definition files work?
21:34:13 not particularly -- it's a rather minimalist programmable parser
21:34:22 but, the important thing is that there are almost 200 language definitions
21:34:39 some of them are quite elaborate -- html for instance highlights embedded css, javascript
21:34:47 so you'd be better off with something that parsed Vim syntax highlighting definition files instead?
21:34:47 there's also php which highlights embedded php plus html
21:34:54 i don't know how vim syntax files work
21:35:07 i guess i'm not certain that they do the embedded css and javascript
21:35:43 i have a memoization library
21:35:52 if i change : to MEMO: it defines a word which caches output values
21:36:09 i figured out a new use for it today
21:36:12 MEMO: intern-line-context ( context -- context ) ;
21:36:17 that's a word which does nothing to the top of the stack
21:36:21 cool
21:36:25 but because it's defined with MEMO:, it has the effect of 'interning' objects
21:36:31 i'm using it to save memory
21:36:48 how does MEMO: do equality comparisons?
21:36:58 polymorphism
21:36:59 vim 7.0 comes with 481 syntax definition files
21:37:10 vim syntax requires regexps
21:37:11 how does it figure out how many arguments to look at?
21:37:16 stack effect comment
21:37:17 yes, it's all about regexps
21:37:25 oh, the stack effect comment isn't a comment?
21:37:54 no, factor checks them. if it doesn't match your code still loads but you get a message
21:38:16 nice
21:38:32 it does look like vim handles embedded JavaScript the way you'd expect:
21:38:33 syn include @htmlJavaScript syntax/javascript.vim
21:38:41 ... syn region javaScript start=+]*>+ keepend end=++me=s-1 contains=@htmlJavaScript,htmlCssStyleComment,htm
21:38:47 etc.
21:38:51 heh
21:38:56 i'm still going to use jedit syntax as a base :)
21:39:03 heh
21:39:06 the mode definitions are in xml
21:39:18 so i can give the xml lib a good workout
21:39:29 ew
21:39:32 some modes use regexps, but it's not prevalent because simpler matching operations often suffice
21:39:42 so eventually i'll use the regexp lib, which is being worked on by someone else
21:40:00 well, I can understand not wanting to tackle vim syntax files, but if your metric of goodness is how many languages are supported, vim syntax files probably beat jedit
21:40:01 the java lib is 4098 lines of code
21:40:22 i want the factor version to be no bigger than, say, 400 lines
21:40:46 10x code reduction over java is typical in my experience
21:40:56 that sounds plausible --- the Factor sample you showed for the indexing problem makes it look like it's on the same level of abstraction as Python
21:41:12 yup, although it's more flexible because you can extend the parser just as in forth
21:41:25 and there's a heavier emphasis on parse/compile-time meta-programming, as opposed to having too much runtime polymorphism
21:41:32 also it has a terser syntax because it's mostly horizontal and you don't have to name variables
21:41:40 like forth
21:41:44 it's somewhat more static than python, even though it's dynamically typed
21:41:53 the language and also the idioms in the library
21:42:05 but it's more flexible in other ways, because the compile stage is very flexible
21:43:39 it won't ever be as fast as forth, for obvious reasons
21:44:09 but the goal is to get as close as possible without having an overly complex optimizer
21:44:54 well, bigForth is only about as fast as the best Self compiler
21:45:04 so Factor could maybe be as fast as Forth
21:45:04 unlike bigforth, i do register allocation
21:45:23 i also perform specialization so factor's polymorphism can beat hand-rolled forth stuff, with passing XTs and using EXECUTE
21:45:23 if bigForth is Forth
21:45:32 yeah, you said
21:45:34 but generic dispatch is still a big hit
21:45:39 my arithmetic operators are generic
21:45:51 i go to great lengths to optimize out arithmetic dispatch where possible, using interval and type analysis
21:46:10 but in the general case, + * / involve two jumps
21:46:23 plus boxing/unboxing for big integers and floats
21:46:41 floats can be unboxed and stored in registers if the compiler can prove that certain values are indeed always floats
21:46:53 in forth, you just have your fp stack, that's it
21:47:12 much simpler implementation, because there is no polymorphism required
21:47:24 heh
21:47:31 python boxes all small integer values
21:47:34 which is silly
21:47:42 if you have a stack for every data type, you can never write a program that gets a type error
21:47:50 yeah, but you can't write generic code either
21:47:51 that doesn't make it a good idea
21:48:02 for example, try writing a 'greatest common divisor' word in ANS forth which works with either single or double cell numbers
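
[editor's note — taking up the challenge just posed, for the single-cell case only: Euclid's algorithm in ANS Forth. MOD is signed, so the inputs are assumed to fit in a signed cell; and the point stands, since a double-cell variant needs a second, separate definition — nothing below is generic over cell width.]

    \ greatest common divisor, single-cell; Euclid's algorithm
    : gcd ( u1 u2 -- u )  begin dup while tuck mod repeat drop ;

    48 18 gcd .   \ prints 6
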
21:48:04 can you opt out of integer overflow exceptions for performance-critical code in Factor?
21:48:13 you don't get overflow exceptions
21:48:18 small integers are promoted to heap-allocated big integers
21:48:23 that's what I meant
21:48:26 you can, however, opt to use machine arithmetic
21:48:28 can you opt out of that?
21:48:29 ok
21:48:33 and instead of a bignum, you get rollover
21:48:46 but the idea is to let the compiler make this optimization automatically
21:48:49 by using interval analysis
21:49:02 I'm not sure how much it matters to box small integer values in the normal case
21:49:03 for example, if you have a loop counter which is bounded by the length of an array, factor knows this will never be promoted to a bignum
21:49:14 and it will use machine arithmetic on the counter, inside the loop
21:49:15 big win
21:49:29 in Scheme one almost never uses integer values unless one is actually doing problem-domain arithmetic
21:49:40 we have a yuv->rgb decoder
21:49:40 it's one of the things I like about Scheme
21:49:45 which uses generic arithmetic
21:49:49 but because the yuv values come from an array of bytes
21:49:57 all the multiplications, shifts, additions, etc reduce to machine arithmetic
21:50:04 because the compiler proves that at no stage can any value exceed the word size
21:50:08 nice
21:50:19 so the programmer doesn't have to resort to type declarations, at least in this case
21:50:37 The APL approach can handle that kind of thing pretty well too
21:50:46 we have unboxed-float arrays
21:50:55 like OCaml or APL
21:50:55 and vector operations on them
21:51:09 vector operations also eliminate interpretive overhead, which can be nice
21:51:14 : v+ [ + ] 2map ;
21:51:31 HINTS: v+ float-array float-array ;
21:51:35 oh, well, vector operations can eliminate interpretive overhead :)
21:51:44 but not if you implement them in interpreted code!
21:51:46 the HINTS: tells the compiler to compile a specialized version
21:51:53 which assumes the inputs are float arrays
21:51:58 this eliminates all boxing/unboxing
21:52:05 except for one initial allocation of the new array, to hold the resulting values
21:52:09 does it hoist the type check out of the loop?
21:52:17 for the + dynamic dispatch I mean
21:52:17 yes
21:52:19 yes
21:52:25 cool!
21:52:33 it becomes a tight loop which reads floats into the FPU, adds them, stores them back to memory
21:52:52 but if you pass object arrays, it dispatches on every iteration
21:53:05 the HINTS: doesn't enforce those types, it just makes it faster if they happen to agree
21:53:37 right
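
[editor's note — for contrast with the V+ and HINTS: example above, the same elementwise float addition as an ANS Forth sketch. There is no dispatch to hoist because nothing is generic: the operands are bare addresses and the FP stack is the only float type, which is the simpler-implementation point made earlier. The flip side, as slava notes, is that this definition cannot be reused for other element types.]

    \ elementwise float addition: fa3[i] = fa1[i] + fa2[i]
    : fv+ ( fa1 fa2 fa3 u -- )
        0 ?do
            2 pick i floats + f@      \ fetch fa1[i]
            over   i floats + f@ f+   \ add fa2[i]
            dup    i floats + f!      \ store into fa3[i]
        loop drop 2drop ;
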
21:54:42 we do this for many sequence operations
21:54:51 'append' is written in a generic way, it can append any two (even user-defined) sequence types together
21:54:52 it seems like it might be better to write a yuv->rgb decoder with SSE vector primitives
21:54:55 but it has hints for string/string appends
21:55:08 KragenSitaker: yup, the compiler could do auto-vectorization after removing the dispatch :)
21:55:37 it could. or you could just write the vector primitives in assembly instead of teaching the compiler to do so.
21:55:44 yup
21:55:48 i haven't thought much about vectorization yet
21:55:49 it depends on how many different pieces of code you have to vectorize.
21:55:51 i do want v+ and such to use sse eventually
21:55:58 as little as possible
21:56:22 i want to have vectorized loops on x86 and ppc with as little platform-specific asm as possible
21:56:30 sure
21:56:31 which suggests some kind of auto-vectorization stage in the compiler
21:56:36 the analysis behind it is actually not terribly difficult
21:56:57 but given a particular quantity of platform-specific assembly, it's probably easier to put it in library routines than in the compiler backend
21:56:58 because i use 'map', 'reduce', '2map' and so on a lot, the compiler's task is simplified by a more regular structure than a typical C or Fortran loop index clusterfuck
21:58:56 heh
21:58:57 sure
21:59:23 my knowledge of compilers is fairly thin, btw, so i probably won't have much intelligent to say. but you knew that already.
21:59:30 well, so is mine
21:59:34 this is my first compiler project
21:59:56 i've learned a lot by studying forths, lisps and to a lesser extent self and the java vm
22:00:53 yeah, I've been following a pretty similar path
22:01:00 I've been trying to port Self to g++ 4.x
22:01:04 heh
22:02:53 well, I tried for a few days, then got distracted. I did finally get it to compile, but now I need to figure out how to align its gc and object-memory expectations with the new C++ ABI
22:04:15 and I don't really understand its gc
22:04:52 I tried just turning it off, hoping that would get things up and minimally running, but that wasn't sufficient.
22:06:44 Sadly I don't remember what the problem was now.
22:08:45 So I'll probably have to rediscover it from scratch when I finally get back to that.
22:09:00 In the mean time, I've been reading Forths in order to get more of a clue about compilers in general :)
22:09:54 Forth compilers are very simple, as a rule.
22:10:40 yes; that's why I've been reading them.
22:15:32 --- part: edrx left #forth
22:22:05 I'm not expecting them to tell me anything about parsing or crazy optimization schemes or anything like that. Just the basics.
22:30:59 384 lines and i haven't got to reading the xml yet.
22:31:00 hmm
23:23:28 --- quit: proteusguy (Read error: 104 (Connection reset by peer))
23:27:14 --- join: doublec (n=doublec@203-211-100-219.ue.woosh.co.nz) joined #forth
23:32:57 --- join: Off_Namuh (i=GPS@gateway/tor/x-fa2f9f4dfe2ae369) joined #forth
23:41:29 --- join: proteusguy (n=proteusg@ppp-124.120.225.88.revip2.asianet.co.th) joined #forth
23:59:59 --- log: ended forth/07.11.25
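
[editor's endnote — on "Forth compilers are very simple, as a rule" (22:09:54): a toy ANS Forth illustration of why. The compiling half of a Forth text interpreter is essentially this one word applied in a loop; number handling and error recovery are elided, and -13 is the standard THROW code for an undefined word.]

    \ read one blank-delimited token and compile (or execute) it
    : compile-token ( "name" -- )
        bl word find ?dup if
            0> if execute else compile, then   \ immediate words execute now
        else
            drop -13 throw                     \ number handling elided
        then ;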