00:00:00 --- log: started retro/12.11.28
00:25:02 --- join: oPless (~oPless@lart.doosh.net) joined #retro
03:30:11 --- join: tangentstorm (~michal@108-218-151-22.lightspeed.rcsntx.sbcglobal.net) joined #retro
08:24:29 --- join: karswell (~coat@93-97-29-243.zone5.bethere.co.uk) joined #retro
10:55:32 --- join: Mat2 (~claude@91-65-144-133-dynip.superkabel.de) joined #retro
10:55:39 hello !
10:57:26 crc: thanks for the link, that is the C version I searched for. To my surprise it is not open-source, so I'm out of luck :(
11:26:55 sorry :(
11:29:54 I now use the Lua VM as a benchmark reference
11:30:29 (and gforth-fast)
11:36:31 * Mat2 looks for a way to do efficient SIMD processing despite Intel SSE4
11:56:28 * Mat2 found a way to do efficient SIMD processing on Intel CPUs
11:59:38 How's the new technique working out, Mat2?
12:11:59 works fine, especially on the 64-bit Intel Atom, because SSE instructions can be paired with the two integer ALUs
12:13:17 that doubles performance to near the theoretical maximum (from ~670 to ~1600 MIPS)
12:14:23 :)
12:15:38 the drawback is: my fan turns on
12:18:57 somehow I have the feeling that these Intel Atom CPUs were not designed for such applications...
12:19:25 probably not!
12:19:36 can you throttle it?
12:19:58 they're probably also not used to people actually using the cpu to its full potential
12:20:22 I don't think so, anyhow: it works
12:21:20 SSE sucks, by the way
12:21:27 can you use the mmx registers alongside sse?
12:22:10 yes, because they are mapped onto the floating-point stack
12:22:45 and I can use all of the 64-bit integer registers, of course
12:23:27 (the second 8 are used as the call stack, for example)
12:26:45 I use this manual: The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers
12:27:06 http://www.agner.org/optimize/
12:28:20 if you are interested in effective assembly programming, I recommend it
12:30:22 it's definitely interesting to me, i just don't have time to follow through on everything i'm interested in :)
12:31:01 understandable :)
12:33:16 I will upload my promised sources tonight for sure, by the way; sorry for the delay, but I wanted to get the maximum performance out of this CPU
12:34:18 no problem. i've been working on a grammar / parsing engine
12:34:58 https://github.com/sabren/b4/blob/master/pre/test_pre.pas
12:36:05 all tests pass, finally, but all it does is match... next up is actually building a syntax tree
12:37:02 looks good
12:39:24 I've also been digging through SWAG - an archive of old public-domain Pascal code from the DOS days.
12:39:24 i think i might set up a website that categorizes it... some of it is completely archaic but much is still relevant
12:39:42 http://kd5col.info/swag/
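A minimal C sketch, assuming GCC or Clang with SSE2 intrinsics, of the pairing idea Mat2 describes above: on the in-order Atom, an independent SSE instruction and an integer-ALU instruction can issue together in one cycle. C gives no direct control over instruction scheduling (Mat2 works in hand-written assembly), so this only illustrates the interleaving pattern; the loop and all names are invented for the example.

    /* interleave independent SIMD and scalar work so the Atom's two
       in-order issue ports can pair them */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        __m128i v   = _mm_set1_epi32(0);   /* SIMD-side accumulator    */
        __m128i one = _mm_set1_epi32(1);
        uint64_t n  = 0;                   /* integer-side accumulator */

        for (int i = 0; i < 1000; i++) {
            v = _mm_add_epi32(v, one);     /* SSE unit: vector add     */
            n += (uint64_t)i;              /* integer ALU: scalar add  */
            /* the two statements above have no data dependency, so
               they can execute in the same cycle on the two ports */
        }
        printf("scalar %llu, vector lane 0 %d\n",
               (unsigned long long)n, _mm_cvtsi128_si32(v));
        return 0;
    }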
12:40:43 also... i know you're not a huge fan of LLVM, but ... http://code.google.com/p/llvm-pascal/
12:42:39 it is always good to have a compiler as an alternative
12:44:24 the problem with LLVM is the limitations of its IL code, which result in inefficient code for some applications
12:44:47 like efficient interpretation
12:54:48 the VGA-related sources on that website are very useful for driver development (they offer nice tricks for a basic framebuffer)
12:56:35 i'm curious how much of this stuff still works :)
12:57:54 i found a small calculator example on there last night that converts expressions to RPN
13:03:30 oh, algorithms don't get older, only better :D
13:04:57 :) yeah. i'm reading "brinch hansen on pascal compilers" ... from like 1985 or so. it's really good
13:13:02 I have here Wirth's book about the Oberon compiler, a nice read
13:14:53 Yeah, he's put all his old books online now in PDF. They're really good.
13:15:19 in fact, i was going to use his RISC machine for my stuff until i found ngaro.
13:15:51 now you are using a virtual MISC machine :)
13:16:45 yep :)
13:20:50 I recommend the site of Jeff Fox; you can find a lot there about optimizing code for MISC-style CPUs
13:21:41 http://www.ultratechnology.com/
13:22:14 most of it is colorForth-related
13:22:43 oh yeah, i've seen this... i completely ignored the "chips" section, but he's got all kinds of fpga stuff on there. nice.
13:23:56 he was very influential
15:08:21 --- quit: saper (Read error: Operation timed out)
15:09:13 --- join: saper (saper@wikipedia/saper) joined #retro
15:51:31 tangentstorm: ok, the interpreter stub is uploaded; I only need to complete the primitive routines. That will be my work tomorrow
15:53:07 :)
15:53:10 * tangentstorm takes a look
15:56:26 that's actually pretty readable for assembly code
15:57:06 what do Li and Ld mean in the opcodes?
15:57:17 int / double ?
15:58:33 Load Immediate and Load Address
15:59:11 LI -> LIT, LD -> LOAD
16:00:17 BEA = Branch if Equal Always
16:08:15 it was an interesting exercise to schedule the instructions so both in-order paths work in parallel
16:11:24 well, i said readable "for assembly code" but I don't really understand assembly code :) i'm not sure what you're talking about
16:12:54 I think Dsp, Rsp, and Qsp are stacks?
16:13:28 the Atom CPU has two separate pipelines, so two instructions can be executed in parallel if there are no resource conflicts
16:13:31 CND doesn't seem to be set to anything but false
16:13:59 oh, so you arranged the instructions to avoid conflicts
16:14:35 i guess DSP is the stack pointer for the D stack
16:14:47 so that would be Data, Return and Q-Stack
16:15:17 oh, that variable is not needed anymore because I now use the native flags register
16:15:19 yes
16:15:36 the Q stack is for closures
16:16:19 you're only using xmm0 right?
16:16:29 right
16:17:57 offsets into the call table are computed with the SIMD unit
16:20:37 and this unit is fully pipelined (a hidden, third pipeline). The official documentation conceals this detail
16:21:40 so with the right scheduling, SIMD processing costs no extra clock cycles
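A minimal sketch in C, again with invented names, of the dispatch scheme Mat2 describes above: the packed opcodes live in xmm0, the call-table offset is derived with SIMD instructions (leaving the integer ALUs free for stack work), and the primitive is reached through an indirect call. Mat2's actual interpreter is hand-scheduled assembly; this only models the data flow.

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdio.h>

    typedef void (*handler)(void);

    static void do_lit(void)  { puts("LIT");  }   /* hypothetical primitives */
    static void do_load(void) { puts("LOAD"); }

    static handler call_table[16] = { do_lit, do_load /* , ... */ };

    int main(void)
    {
        __m128i code = _mm_cvtsi32_si128(0x10);  /* two packed 4-bit opcodes */
        __m128i mask = _mm_cvtsi32_si128(0xF);

        /* derive the call-table index in the SIMD unit... */
        int idx = _mm_cvtsi128_si32(_mm_and_si128(code, mask));
        call_table[idx]();                       /* ...then dispatch */

        code = _mm_srli_epi64(code, 4);          /* advance to the next opcode */
        idx  = _mm_cvtsi128_si32(_mm_and_si128(code, mask));
        call_table[idx]();
        return 0;
    }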
16:21:57 your instructions are 4-bit, right?
16:22:02 yes
16:22:45 so you could store 32 of them in each of the other sse registers
16:23:23 there are instructions to rotate those registers, too
16:23:36 so each one could basically be a loop
16:24:35 so you could have 7 threads running concurrently, with the complete code all stored directly in the registers
16:24:51 if the code fits in 32 instructions i mean
16:24:57 yes
16:26:33 exactly 15, because there are 16 xmm registers
16:27:03 that's the plan for multiprocessing
16:27:26 or if you could fit it in 16 instructions, then you could have a 32-bit TOS and NOS for each one ( just use half of each register )
16:28:48 for caching these virtual registers I will use the mmx register set
16:29:39 so working with register halves is not really needed
16:30:44 my guess is the fp unit is also pipelined (no extra cycles if carefully used)
16:31:32 neat stuff
16:32:14 I need some sleep, see you
16:32:21 ciao
16:32:34 --- quit: Mat2 (Quit: Leaving)
19:28:17 --- quit: tangentstorm (Quit: trying out new tmux.conf)
19:48:21 --- join: tangentstorm (~michal@108-218-151-22.lightspeed.rcsntx.sbcglobal.net) joined #retro
20:08:19 --- quit: tangentstorm (Quit: WeeChat 0.3.2)
20:18:06 --- join: tangentstorm (~michal@108.218.151.22) joined #retro
20:20:46 --- quit: tangentstorm (Client Quit)
20:24:57 --- join: tangentstorm (~michal@108-218-151-22.lightspeed.rcsntx.sbcglobal.net) joined #retro
23:56:49 docl: I saw a giant book that was all about programming MUDs today
23:59:51 pretty sure it was this: http://www.amazon.com/Game-Programming-Premier-Press-Development/dp/1592000908
23:59:59 --- log: ended retro/12.11.28
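An illustration of the register-resident code idea from the 16:22-16:27 exchange above: 32 four-bit opcodes packed into one 128-bit register, with a rotate bringing each next opcode into the low nibble. Plain SSE has no single full 128-bit rotate instruction, so a real version would combine lane shifts; this hedged C sketch instead models the register as two 64-bit halves, and the opcode names are invented.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_NOP = 0, OP_LIT = 1, OP_ADD = 2, OP_HALT = 15 };

    typedef struct { uint64_t lo, hi; } code128;  /* stand-in for an xmm register */

    /* append one 4-bit opcode; slot 0 lands in the low nibble */
    static void pack_op(code128 *c, int slot, unsigned op)
    {
        if (slot < 16) c->lo |= (uint64_t)(op & 0xF) << (4 * slot);
        else           c->hi |= (uint64_t)(op & 0xF) << (4 * (slot - 16));
    }

    /* rotate the 128-bit register right by one 4-bit opcode */
    static void rot4(code128 *c)
    {
        uint64_t lo = c->lo, hi = c->hi;
        c->lo = (lo >> 4) | (hi << 60);   /* hi's low nibble wraps into lo's top */
        c->hi = (hi >> 4) | (lo << 60);   /* lo's low nibble wraps into hi's top */
    }

    int main(void)
    {
        code128 code = { 0, 0 };
        pack_op(&code, 0, OP_LIT);
        pack_op(&code, 1, OP_ADD);
        pack_op(&code, 2, OP_HALT);

        for (;;) {
            unsigned op = (unsigned)(code.lo & 0xF);  /* fetch the low nibble */
            if (op == OP_HALT) break;
            printf("executing opcode %u\n", op);
            rot4(&code);                              /* rotate next opcode in */
        }
        return 0;
    }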