00:00:00 --- log: started retro/06.08.30
00:00:46 Quartus: well, I'm sure crc would prefer doing that general solution itself, but I still could make the basic functionality this way.
00:03:22 Quartus: you said you have these things coded in windows already?
00:07:12 Cheery: http://retroforth.net/paste/?id=168
00:07:23 Cheery, yes I have coded AT-XY and GET-XY for Windows.
00:07:34 But they're built on top of the ANS layer.
00:08:16 ok, I'm trying to find out the cursor position from this thing now.
00:08:36 The paste I just put up does simple word-wrap, works for WORDS, etc.
00:10:17 It could easily be enhanced to add 'word-wrap' as a variable, so you could do word-wrap on or word-wrap off
00:11:02 seems a neat solution.
00:11:36 but I'm still trying to find that cursor position -thing.
00:19:17 Here it is with an on/off variable as above. http://retroforth.net/paste/?id=169
00:19:37 You could plug your screen-width code into that, replace the screen-width constant.
00:20:43 yep, but I'm still looking for that parameter which gives me the cursor position. :)
00:21:18 Ok. I don't think it's needed, but it's one way to go.
00:23:35 gah, I give up, can't find it. :(
00:26:46 Though, you made quite smart work, I can use it.
00:26:59 I will use it too, it meets the immediate need. :)
00:39:15 I wonder, why does your word wrap throw some 'extra' crs along the words list?
00:39:28 It doesn't for me. Maybe you need to reduce screen-width by 1.
00:40:25 with that ioctl command, this is AWESOME. :)
00:40:57 though, I still wonder about those excess newlines it throws out.
00:40:58 If you notice my code, screen-width is 79, not 80. You should decrement it.
00:41:50 that makes it even worse.
00:42:19 You've got something broken somewhere, then. Making it a lower number should make it better. Try fixing it at 40, see what happens.
00:43:07 still those spaces which aren't necessary.
00:43:28 If you run the code I pasted in a window >=80 chars wide, there are no extra lines.
00:43:37 I don't know what you've added/changed since then.
00:49:01 they seem to be software-made crs, not crs caused by my window.
00:49:11 Try the code I pasted.
00:49:16 I tried.
00:49:57 I can vouch for it working in the rf-windows. Maybe Linux implements type and emit and cr differently. Let me check.
00:50:09 plus, there is still an incredible amount of space at the end.
00:50:35 Cheery, you can see the code I wrote, it's really, really simple. I can't debug your system from here, you'll have to do it. :)
00:50:50 yeh. I think I know what the problem is.
00:51:22 or then not...
00:55:50 On the first line of output, it sometimes is short, because the lf that happened wasn't using CR, but some internal one. So those cases would be cleaned-up if you used cursor-positioning code.
00:57:20 damn, I think what I'd really need is a nonlinear terminal, I realised there are many limitations which prevent me from doing a good enough pretty printer for now. :/
00:58:20 I've just built it into my facility.fs; I'll post an update shortly. There, it uses the cursor position.
00:59:31 first, I can't define properly how to control the pretty-printer, I can't define what code makes just a long nonbreakable, I can't properly make it align the code, I can't set font, I can't do much at all for the way it is represented. :/
01:00:02 I rule out the pretty printer -goal, for now.
01:01:28 Updated FACILITY wordset for windows: http://quartus.net/retro/facility.fs
01:03:15 I'd need an alternate terminal design for doing this.
01:03:16 But...
01:03:23 Now I know how I would do it. :))
01:03:39 What do you think is missing?
01:04:40 the structure, and even the smallest hack for fixing it turns out to be inconvenient, as we noticed.
01:04:46 oh, wait.
01:04:52 I think I know what's wrong now. :)
01:05:27 It's not inconvenient when it's working :) The windows one runs very nicely out of my facility.fs, acting on get-xy. Displays right up to the far edge of the screen.
01:05:28 I can fix it, then it works, but still it doesn't support everything.
01:05:34 What do you think is missing?
01:06:06 I think the handling for numbers is missing... AND the cr -thing is not working properly.
01:06:09 true variable: word-wrap
01:06:09 :: word-wrap @ if dup get-xy drop + window-size drop > if cr then then d: type ; is type
01:06:09 :: word-wrap @ if get-xy drop window-size drop 1- > if cr then then d: emit ; is emit
01:06:30 That's with get-xy ( x y ) and window-size ( cols rows ). Works beautifully for any output string.
01:06:33 Numbers included.
01:06:40 Just a guess though.
01:06:42 : foo 1000 for r . next ; foo wraps nicely.
01:07:02 hmm.
01:09:30 argh! miserable. :(
01:10:04 What's the problem?
01:11:05 http://retroforth.net/paste/?id=170
01:11:27 and screen-width gives 78 (window has 79 width)
01:12:22 you see the lines clearly aren't full, and there are erroneous crs.
01:12:31 Is that a constant screen-width, or one you're getting from a system call?
01:12:49 one I'm getting from a system call, and it is correct.
01:12:57 Fix it with a constant and see if the problem is still there.
01:13:17 I've tried.
01:13:37 same problem
01:13:50 When you say you've tried my code, have you just tried rf-whatever -f wrap.fs (or whatever you called my snippet)?
01:13:57 No extra code of your own?
01:14:01 nop
01:14:26 It's really simple code. You can see what it does.
01:14:31 yep
01:14:52 I can assure you that it works (with minor issues, but not the ones you're seeing) in its original form just fine.
01:15:34 But anyway, now I see the problems with current terminal system linux advocates.
01:15:35 I had a look, and the definitions of emit/type/cr appear to be the same in all rf builds, so I don't think it's a difference in the builds.
01:15:51 And it is the reason why some smartass made html.
01:16:00 The only *nix I can test on right now is freebsd, and I don't believe there's an rf for that.
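For comparison with the get-xy based hooks pasted at 01:06:09, here is a minimal ANS Forth sketch of the constant-width approach, which tracks the output column itself instead of asking the console. Every name in it (screen-width, column, word-wrap, wrap-cr, wrap-type, wrap-emit) is an assumption for illustration; it is not the code from the pastes or from facility.fs.

```forth
\ Sketch only: hypothetical names, plain ANS Forth, no console queries.
79 constant screen-width            \ assumed fixed width, 79 rather than 80 as in the log
variable column     0 column !      \ characters written so far on the current line
variable word-wrap  true word-wrap !  \ on/off switch

: wrap-cr ( -- )                    \ start a new line and reset the counter
  cr  0 column ! ;

: wrap-type ( c-addr u -- )         \ break first if the string would not fit
  word-wrap @ if
    dup column @ + screen-width > if wrap-cr then
  then
  dup column +!  type ;

: wrap-emit ( char -- )             \ break first if the line is already full
  word-wrap @ if
    column @ screen-width 1- > if wrap-cr then
  then
  1 column +!  emit ;
```

Because the counter only sees output that goes through these words, a line break produced anywhere else leaves column stale and the next line comes out short, which matches the symptom described at 00:55:50. Reading the real cursor position avoids that.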
01:16:03 (which is even worse)
01:16:47 I conclude that alternate terminal design is required if one wants any kind of readability into output.
01:19:12 I think I even have such an alternative design on my back burner currently. :)
01:19:33 It'll be typography quality when I'm done thinking with it.
01:22:10 Er, there is a rf-freebsd, but it won't run on the host I have access to.
01:34:27 --- join: virl (n=virl@chello062178085149.1.12.vie.surfer.at) joined #retro
05:40:08 --- join: timlarson_ (n=timlarso@65.116.199.19) joined #retro
06:24:50 --- join: Ray_work (n=Raystm2@199.227.227.26) joined #retro
07:08:37 --- join: nighty (n=nighty@66-163-28-100.ip.tor.radiant.net) joined #retro
07:09:15 --- quit: nighty_ (Read error: 113 (No route to host))
08:09:40 --- join: rabbitwhite (n=roger@136.160.196.114) joined #retro
10:08:22 crc: today I'm going to need your neat class system. :)
10:11:25 I need to make a set of words which runs a search&execution for containing constant in normal conditions but when (node) is true, it gives out that constant.
10:47:10 --- part: rabbitwhite left #retro
11:03:07 http://retroforth.net/paste/?id=173
11:03:14 It'd be completely working code, if I had the synergetic parts which I am lacking.
11:03:20 including: quicksort, oversimplified hashing function, node-stored functions, memory allocation, little help from crc...
11:03:26 node class & node deletion code.
11:03:57 It's an extendable node class.
11:04:29 oh, error handling is also lacking,
11:04:35 in other words, lots of work to do yet. :)
11:05:13 You can define new methods and give them to new nodeclasses ( crc's help needed in word 'method' )
11:05:48 I wrote it in retro because I'm not yet comfortable with the rx you are doing, crc.
11:13:57 query method may fail there, so that's what needs the error handling.
11:29:04 --- quit: timlarson_ ("Leaving")
11:30:08 --- join: timlarson_ (n=timlarso@65.116.199.19) joined #retro
11:30:24 --- join: snoopy_1711 (i=snoopy_1@dslb-084-058-104-118.pools.arcor-ip.net) joined #retro
11:31:13 --- quit: timlarson_ (Client Quit)
11:31:18 --- join: timlarson_ (n=timlarso@65.116.199.19) joined #retro
11:31:58 --- quit: Snoopy42 (Nick collision from services.)
11:32:37 --- nick: snoopy_1711 -> Snoopy42
11:34:02 solved the thing where I needed crc's help.
11:37:23 :)
11:39:33 I wonder and wander, I love retro but I hate doing everything myself...
11:41:06 Of course it makes me superman compared to others because I'm in a kind of 'oldschooler' mode when I'm doing stuff. But somehow it isn't always nice.
11:41:55 Though, I don't yet know how one would write code libraries with retro which could be just 'plugged in'
11:43:04 retro would be a perfect code for a kind of patterned system. Doing literally what C advocates say non-literally: "building programs from building blocks"
11:43:22 in C 'building blocks' are monolithic pieces of shit.
11:43:40 in forth they could be awesome and sick at the same time...
11:46:08 Think about the code Quartus made today. He could've relocated the input&output -fields and given me a small piece of x86 code in relocatable form, then I could've punched a hole into the '79' constant console-width and made it call my function-version from that constant.
11:48:11 In a normal C program that is clearly simply not possible without modifying/including the code.
12:14:13 --- quit: nighty (Remote closed the connection)
12:17:42 --- join: nighty (n=nighty@66-163-28-100.ip.tor.radiant.net) joined #retro
12:36:44 --- quit: Cheery ("Download Gaim: http://gaim.sourceforge.net/")
13:55:04 --- join: Ray-work (n=Raystm2@199.227.227.26) joined #retro
14:11:18 --- quit: Ray_work (Read error: 110 (Connection timed out))
14:21:34 --- quit: timlarson_ ("Leaving")
14:30:06 --- join: Ray_work (n=Raystm2@199.227.227.26) joined #retro
14:45:46 --- quit: Ray-work (Read error: 110 (Connection timed out))
15:14:05 --- quit: nighty (Read error: 113 (No route to host))
15:23:07 --- join: Quartus_ (n=Quartus_@209.167.5.1) joined #retro
15:40:16 --- quit: virl (Remote closed the connection)
15:41:44 --- nick: nanstm -> Raystm2
15:43:26 good evening
15:46:32 Hi crc, hoping you are well.
16:00:07 --- join: nighty (n=nighty@CPE00119576a9c5-CM0012c90d36fc.cpe.net.cable.rogers.com) joined #retro
16:00:51 --- quit: Quartus_ ("used jmIrc")
16:02:46 Dinner, brb.
16:29:34 back
16:32:31 New ans stuff at the quartus link, Ray.
16:32:39 retro-ans.fs and facility.fs
16:32:46 excellent.
16:32:50 thanks Quartus.
16:33:08 Sure. I've added some significant stuff since and will be bundling it into one package in the next while.
16:33:26 I may add locals.
16:33:32 Right now, it might just be you me and crc interested in this, but ANS may just open retro up to a whole new crowd.
16:34:04 It may. As I say, I've tried to stay out of the way of retro's features, so most of the extra stuff is still available once the layer is loaded.
16:34:19 That's great as well.
16:34:30 Non-numeric prefixes aren't there. I could possibly add them back in.
16:34:58 I'm not sure there is a mechanical advantage.
16:36:04 Maybe not. I think non-numeric prefixes open the door to possible naming confusion.
16:36:34 I've written @foo and !foo words before -- maybe poor choices on my part.
16:37:53 I do it too, mainly when in the early stages of coding an interface. Pre-factor might be the term. Some kind of scaffolding to hang your ideas on before you factor all that out.
16:39:33 Right.
16:40:29 I know the - and + prefixes would cause confusion with other names, like -trailing.
16:40:53 Thinking about it, it's probably best that they're not available after the standard layer is loaded.
16:41:10 sure. I have to agree.
16:41:39 I discovered the hard way yesterday that there are a series of rf words that are also hex values, like b f and eb. I moved them to an rf vocabulary, so they're not in conflict but they're still there if anybody needs'em.
16:41:53 There are a few of those "gotcha's" in --- I was about to say . :)
16:42:42 I've added a struct mechanism and ported over some of the windows .h files, so far just the ones I use in facility.fs.
16:43:01 So instead of ... screen-info 4 cells + you can have screen-info srWindow Right @
16:43:10 oh, cool. I'm about to read that.
16:43:22 The version up there doesn't have that stuff in it yet. You might want to hold off.
16:43:54 cool.
16:43:56 I will.
16:44:05 * Raystm2 goes to ANS instead.
16:44:27 It has 'needs' now.
16:44:30 I've been holding off making any changes to RxChess as I become more familiar with ANS.
16:44:51 needs is an import mechanism?
16:47:00 It's include, but it doesn't include a file twice.
16:47:13 oh excellent.
16:47:33 python has such a word, might even be include.
16:47:35 So you can put needs structs.fs in any file that needs structs, and it'll only get loaded the first time. Like .h files with #ifdef stuff.
16:47:48 'idempotency' is the technical term.
16:48:00 I understand. ah thanks for the term.
16:51:50 I'm so glad you did this ANS. I imagine that crc is working on bringing the stray terms in to compliment.
16:52:19 Just about everything is there, now, save for the way it reads and parses source.
16:53:41 Now, Retroforth and the Rx-core make up a nice, tight, modular, modern forth.
16:53:48 Or will very soon.
16:54:20 Yes, not too bad. There's room to grow on the optimization side, as it doesn't fare well in contrast to Gforth, for instance.
16:54:47 I see. Was not aware. Beyond my scope of experience, so far. :)
16:56:09 Yes, 4x slower than gforth-fast, 3x slower than gforth normal.
16:56:17 I suppose, starting as a general forth, that would be the case. But retro seems to be tending to be more competitive, even daily.
16:58:06 2x and 3x with some of the optimizations I've built. But I think there may be issues with the proximity of code and data that are making it slow on the pentium. Not my area of expertise but I'm reading about it.
16:58:47 Sorry, that's confusing. 3x slower than gforth-fast, 2x slower than gforth normal, when my optimizations are in.
16:59:00 Hi everyone
16:59:05 Hi Snoopy42.
16:59:18 since you are talking about performance,
16:59:41 hi Snoopy42 :)
16:59:47 yes yes?
16:59:54 last time I checked crc still used "xchange" in swap
17:00:08 Yes, it's inlined. Is there a faster way?
17:00:37 afaik xchange with memory is an absolute "no,no" since the Pentium
17:00:38 I've seen this idea somewhere.
17:00:43 (memory lock)
17:00:46 So what would you recommend?
17:00:57 third register.
17:01:11 xor them thru each other :)
17:01:17 You mean three mov's are faster than one xchg?
17:01:18 some time ago I rewrote it with a temp register,
17:01:29 was a lot faster!
17:01:33 Testing now.
17:01:40 yep they are!
17:01:41 oh it _is_ third register. cool :)
17:01:51 * Raystm2 are smart.
17:01:54 hehe
17:02:08 indeed ;-)
17:03:14 so that's one mem fetch for NOS into temp reg then send EAX to nos then temp to EAX ?
17:03:32 yep, like this
17:06:18 while we wait for Snoopy42, I've seen some asm to ( I think it could be or but i'm saying ) xor two registers three times to swap the values in them.
17:06:34 About a 4% improvement calling a 3-move swap. I'll try inlining it.
17:06:41 neat.
17:07:30 on what CPU did you benchmark this?
17:08:08 Nearly 8% improvement inlining that sequence. It's a pentium from 2000.
17:08:31 I didn't bench just swap, I benched it in the context of a larger benchmark.
17:08:51 So that's a keeper, thanks.
17:08:51 ah ok, then 8% are quite nice :)
17:09:11 but IIRC my results were even larger
17:09:16 Now what about vectored words which, by default, start with a jump to the very next instruction?
17:09:29 Yes, it may improve swap by considerably more than 8%.
17:10:12 I never got a simple solution for vectored words
17:11:00 oh they start with a jump now? I think it was NOPs sometime
17:11:02 Wow, that radically improves the fib bench.
17:11:08 Maybe it was. Now it's jmp.
17:11:30 my wild guess is that NOPs are faster
17:11:46 I would think so too.
17:12:11 but who knows how modern CPUs optimize ;-)
17:12:31 14% speedup on the fib bench.
17:12:40 nice!
17:12:55 yikes very cool.
17:12:59 Sorry, wrong figure. 27%.
17:13:19 YIKES
17:13:21 keep talking, it's getting better and better ;-)
17:13:26 does crc know about this?
17:13:30 hehe.
17:13:38 That's significant!
17:13:56 I think so. How does that compare to Gforth now?
17:14:00 yep, but he once said that speed isn't his main goal
17:14:17 Sure, that's fine, but when the fruit hang this low, they should be picked.
17:14:33 exactly what I think
17:14:45 Seems easy enough to implement.
17:15:09 Add it to ANS and let crc decide on his own.
17:15:53 Retro was faster than Gforth on this bench already, Raystm2, so this is just gravy in that instance. The quicksort is 8% faster than it was, but still quite a bit slower than gforth.
17:16:31 okay, I see.
17:16:33 thanks.
17:17:24 rf is nearly 5 times slower than gforth-fast for the quicksort bench. And a big part of that is just initializing the array with random values, that's slower, which is what got me to thinking about caches.
17:17:49 rf is native-code, so it's an executable routine that's updating a 150k array of cells that sits right next to it.
17:18:04 Let me try something.
17:19:52 If I allot a chunk of space between the kernel, the array, and the code, it speeds up a bit. Nothing fantastic, but an improvement.
17:20:04 0.5x or so relative to gforth-fast.
17:20:28 So it comes down to 4.5x instead of 5x. So there's something there.
17:20:38 yes. indeed.
17:21:30 Perhaps the kernel's internal variables live too close to its own code, and so the same problem is inherent in the kernel.
17:21:39 Quicksort will be in the Knuth books you sent me?
17:21:57 Yes, Searching and Sorting.
17:22:13 Thanks. Reason to open them. I've been scared. hehe :)
17:22:33 Scared that I wouldn't understand them.
17:23:02 Knuth is a very nice man who explains things quite well. He's mathematically brilliant, so it can be tough if you're not versed in it, but it's all laid out.
17:23:57 Wikipedia has a fairly good article on quicksort.
17:25:35 I'm not as mathematical as I would like to be. I have Trig as pertains to electronics, but I have no Calculus.
17:25:55 Yes, 4.5x slower than gforth with a 16K gap *before* the array, none after it. So that suggests it's distance from the kernel code that's most important.
17:26:04 Without the gap, it's just over 5x slower.
17:26:26 I wonder what that's about.
17:26:54 Cache misses, I think. I'm only just digging into pentium optimization.
17:27:24 That's quite significant. Suggests that an x86 native-code Forth should definitely segregate data and code.
17:27:35 hmm.
17:28:20 You know, I understand that REVA is an optimized Retroforth.
17:28:29 yep, from all I heard mixing code and data on today's x86 is a very bad idea
17:28:30 crc mentioned that.
17:28:36 about REVA.
17:28:43 ack.
17:29:32 Ok, so that's interesting.
17:31:23 I'm finding that starting rf-windows, doing a 16384 allot, and then whatever after, automatically gives a speedup.
17:32:07 I'll have to try walking the dictionary and stubbing out default vectors with nops.
17:32:18 What's the noop opcode again? ax ax mov ?
17:32:58 $90
17:33:17 according to old rf sources ;-)
17:33:55 don't know the opcode though :-/
17:34:07 I'll let you know in a sec.
17:36:36 hehe wikipedia even tells you what page to find it in AOCP
17:45:08 $90 is xchg eax, eax
17:45:26 It's a tiny bit slower.
17:46:36 No improvement, really.
17:47:17 I'd have to rebuild rx to see if moving data 16K away from code helped.
17:47:38 I suspect it will.
17:48:04 interesting, so an unconditional jump seems not to thrash the pipeline
17:48:20 Not, at least, one with a zero offset.
17:48:43 yep, that was implied ;-)
17:49:07 If moving HERE 16K away from the kernel gives an overall 10% increase in a quicksort benchmark, I'd say it's a sure thing that moving all kernel data 16K away would be a plus, and further, that having all code and data separated can only help further.
17:50:34 Gforth, despite being threaded, is faster at least in part because though it mixes 'codespace' and dataspace, codespace doesn't hold native code.
17:53:13 * Raystm2 learns a new thing.
17:53:46 Just got a huge win out of a SWAP that looks like this:
17:53:55 code swap
17:53:56 nos cx mov
17:53:56 cx tos xchg
17:53:56 cx nos mov
17:53:56 end-code inline
17:54:49 Like, a 26% increase in speed over the last SWAP enhancement.
17:57:08 did you have "cx tos mov" as the 2nd instruction in your last version?
17:57:18 Yes, it did 3 moves instead.
17:57:45 The slow one:
17:57:45 code swap1
17:57:45 nos cx mov
17:57:45 tos nos mov
17:57:45 cx tos mov
17:57:46 end-code inline
17:58:03 interesting, maybe mov is executed by another "execution engine" than xchg
17:58:15 It's a major improvement in the quicksort bench.
17:58:21 so this could make sense
17:58:35 Takes it down to only about 3.3x slower.
17:59:07 I wonder which other primitives can be helped. I've re-written a few.
17:59:32 ROT needs rewriting.
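As a reference point for the ROT remark, ROT, -ROT and OVER can all be expressed with SWAP plus the return stack, which is one way a faster SWAP can carry over to them. The my- names below are hypothetical, and the definitions are a plain ANS Forth sketch rather than rf's kernel-internal versions.

```forth
\ Sketch only: stack shufflers built on SWAP and the return stack.
: my-rot  ( a b c -- b c a )  >r swap r> swap ;   \ park c, swap a b, restore c, swap again
: my--rot ( a b c -- c a b )  swap >r swap r> ;   \ the inverse rotation
: my-over ( a b -- a b a )    >r dup r> swap ;    \ copy a underneath b

\ 1 2 3 my-rot   leaves 2 3 1
\ 1 2 3 my--rot  leaves 3 1 2
\ 1 2 my-over    leaves 1 2 1
```

Definitions like these only benefit from an inlined SWAP if they are compiled after the new SWAP exists; words already compiled against the old one keep calling it.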
17:59:32 I think I did rot -rot and over once,
18:00:22 it helped too, but since they are based on swap the impact was not huge but still noticeable
18:00:31 This one won't be.
18:01:44 I mean the rf versions were based on swap, so if you improve swap rot and over will already be much faster
18:01:56 Except that they're kernel-internal, so don't have inlined sequences.
18:02:11 my versions were assembly too
18:05:39 Nice. Quicksort bench is down to 3.1x with that new ROT.
18:07:30 cool, did you interleave xchg and mov again?
18:07:38 I did.
18:11:06 *definitely* code relative to data issues. Adding/deleting a single unused code definition in my optimization file makes a huge difference in the speed of the bench!
18:13:05 Takes it from 4x to 3x. Very odd.
18:13:11 another thought:
18:13:17 Code alignment issues?
18:13:23 yep!
18:14:06 Looks like that's the ticket. What's the trick, 16-byte alignment?
18:14:25 It would be interesting to have a "colon" that aligns the first byte of code on a configurable alignment
18:14:40 That's what I'm working on.
18:15:08 the "magic number" surely depends on the CPU, but 16 sounds pretty "universal"
18:16:25 That's the ticket.
18:17:04 Hang on, it's not. At least not with 16. Let me keep fiddling with it.
18:19:24 32.
18:19:31 neat.
18:19:58 so you've got a P4 I guess ;-)
18:20:14 Embarrassingly I have no idea. I bought this thing 6 years ago.
18:21:36 hrm, 16 doesn't do much for me, but 4 does. Something is awry.
18:21:49 according to wikipedia the P4 was released in November 2000
18:22:15 Then I don't have a p4.
18:23:27 Wait, I see why it's variable. I'm aligning HERE before calling 'entry', which creates a word header. Unfortunately it creates a word-header of variable length, so I can't guarantee code pointer alignment without going back and patching up the header afterward. I'll do that.
18:25:08 There we go. Just a regular 4-byte alignment does nicely, as long as it's consistent.
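The alignment arithmetic being discussed can be sketched in ANS Forth as follows. The names code-alignment, aligned-to and align-here are assumptions for illustration, and this is not the patch that was made to rf's 'entry' routine.

```forth
\ Sketch only: round the dictionary pointer up to a configurable boundary.
16 constant code-alignment          \ assumed boundary; must be a power of two

: aligned-to ( addr n -- addr' )    \ round addr up to the next multiple of n
  1- tuck + swap invert and ;

: align-here ( -- )                 \ pad dataspace until HERE is on the boundary
  here dup code-alignment aligned-to swap - allot ;

\ 17 16 aligned-to   gives 32
\ 32 16 aligned-to   gives 32 (already aligned, no padding)
```

As noted at 18:23:27, padding HERE before the header is built is not enough in rf, because the name field that follows has variable length. The padding has to be applied so that the first byte of code, not the start of the header, lands on the boundary.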
18:26:07 wow, you're already done with compensating for the variable header length?
18:26:23 The way rf is coded, code starts right after the name field, which is of variable length. So you could shift all subsequent code by one byte just by adding a letter to the name of one word.
18:27:11 Yes, not hard to do. I patched the 'entry' routine to ensure that its codepointer was aligned.
18:29:20 cool
18:31:18 Ok, that reveals why my earlier tests were being thrown off by the whole adding-a-word thing.
18:32:00 hehe, yes x86 is mean!
18:32:45 16 is in fact the friendly value.
18:33:53 so how is quicksort doing now?
18:34:58 3.1x.
18:34:58 Consistently, now, instead of leaping up and down every time I add or remove a word. :)
18:35:41 this is certainly nice, but wasn't it down to 3.1 already after the new ROT?
18:35:54 Yes, but that was luck.
18:36:05 Alignment luck. I
18:36:08 :)
18:36:18 lol
18:39:29 Gah, the smoke got out somewhere. It's up to 4.2 again for some bizarre reason.
18:39:53 oh, sorry. I'm mad.
18:41:32 3.1, sometimes a touch better.
18:42:59 MUCH better. I wondered why some of what I was sure was an optimization in the quicksort turned out sometimes not to be. It was this nonsense.
18:45:43 yeah, that must be frustrating. Doing an optimization and getting the opposite result because of stupid code alignment issues!
18:47:48 Oh yes.
18:48:21 Sometimes a radical difference, as all words subsequent to an apparently completely-unrelated change are shifted by an odd number of bytes.
18:49:58 Ok. *Now*, the 16K allot makes very little additional difference.
18:50:00 Weird.
18:54:59 --- quit: nighty (Read error: 104 (Connection reset by peer))
18:55:09 well the P3 has separate data and code L1 cache, so the code will probably be undisturbed in its L1 cache
18:55:18 I thought the L1 cache was tiny.
18:55:30 2*16k
18:55:59 Hmm, yes, this routine likely fits in it. I wonder why the 16384 made a difference previously? Maybe it did something subtle to subsequent code placement.
18:57:05 Ok, more accurate figures: with all this enhancement, quicksort bench is at 3.4x gforth-fast.
18:57:06 could be
18:58:08 Ooh maybe I should mirror Quartus tests on this p4?
18:59:29 With no enhancement, it's 6x slower than gforth-fast.
19:01:01 With just the code alignment, it's 5.1x slower.
19:04:35 We can try that in time, Raystm2; I'd have to set you up with a bunch of files I haven't tidied for release yet.
19:06:51 It's like Abrash said, with these cpus you have to go by broad rules-of-thumb, and then throw things at the wall and see what sticks.
19:10:13 :)
19:26:30 The longer I run the bench for, the easier it is to cancel out the overhead (I'm timing it with an external app, as I haven't built a timer word into rf yet). Looks like the optimizations all-told cut the running-time of the bench in half.
19:27:11 Code-alignment by itself is good for 15%.
19:29:07 Excluding the overhead of the code that rf loads (ans module, etc.) it's looking like with the optimizations, it clocks in at 3.1x slower than gforth-fast for the quicksort bench.
19:37:36 I am distressed to see you naming your literal 'cliteral', as the c-prefix suggests a character-width operation. Why not call it LITERAL?
19:37:53 oops. Wrong chat.
20:06:59 Not to mention the politically correct connotations of such a name.
20:07:06 Heh.
20:07:20 I think that'd be with an o instead of an e.
20:07:49 ah. passes censorship.
20:08:48 "Word names for $200, Alex."
23:53:05 --- join: Cheery (n=Cheery@a81-197-19-23.elisa-laajakaista.fi) joined #retro
23:59:59 --- log: ended retro/06.08.30