00:00:00 --- log: started forth/20.08.03
01:10:12 --- join: xek joined #forth
03:42:44 --- quit: xek (Ping timeout: 265 seconds)
04:42:36 --- join: xek joined #forth
05:06:14 < tabemann> the people who are like "taxes are theft" to me remind me of the rich people who hide their money in offshore accounts, while the rest of us pay our share
05:06:49 and the people who want to control others in the name of "the greater good" remind me of brutal dictatorships who commit mass genocides
05:09:03 again, I genuinely see your position as evil. maybe take a moment and reflect on why that might be. stop trying to convince me that servitude is better and instead come up with a system that works and where people may participate voluntarily
05:34:27 remexre: have you explored modal type theory?
05:35:02 siraben: nope
06:41:20 --- quit: jsoft (Ping timeout: 240 seconds)
06:52:24 mark4, does MIPS still keep the calculation if you overflow?
06:52:40 no
06:52:51 wth
06:53:12 or, i dont think it does
06:53:40 ill have to double check but i think the result might not actually be stored back on overflow
06:56:21 that is just bizarre
06:56:59 also, I've heard how terrible the other PICs are but none of that matters if you're doing C
06:57:30 its just sad for the poor guys who have to write something in assembly
07:02:13 --- quit: mark4 (Remote host closed the connection)
07:08:21 cmtptr: so you genuinely believe that taxes are evil and analogous to genocide....
07:11:54 individual liberty should be the first consideration, even at the expense of safety or any "common good". if you can't defend a voluntary system, then you must concede that it is at best theft
07:12:29 would you support anarchist communism?
07:14:26 is it voluntary?
07:14:59 yes
07:15:24 then what do i care?
07:18:48 I should note, though, that it operates on the basis of one works what one can and one receives what one needs
07:19:40 as long as you don't force me or others to participate, then i don't really have to support or oppose it, do i?
07:21:55 well, I need to be getting ready for work, so I'll be on later
07:42:32 --- quit: Zarutian_HTC (Ping timeout: 240 seconds)
08:15:45 --- quit: xek (Ping timeout: 256 seconds)
08:28:48 --- join: SysDsnEng joined #forth
08:31:20 --- quit: SysDsnEng (Client Quit)
10:20:58 --- quit: gravicappa (Ping timeout: 264 seconds)
10:39:31 --- quit: pareidolia (Ping timeout: 260 seconds)
10:45:06 --- join: pareidolia joined #forth
11:41:53 --- join: gravicappa joined #forth
11:50:55 --- quit: gravicappa (Ping timeout: 260 seconds)
11:53:07 --- join: gravicappa joined #forth
12:45:28 --- join: mark4 joined #forth
14:12:56 --- quit: gravicappa (Ping timeout: 240 seconds)
14:14:35 so im porting x4 to 64 bits, its not running yet but im thinking i should convert it from direct threaded to sub threaded
14:14:39 any thoughts?
14:18:21 if its faster, why wouldnt you on x64? not like you need to save space
14:21:22 heh
14:21:53 actually making it sub threaded allows some optimizations at compile time without even making a peephole
14:22:06 instead of call + you just inline the asm for +
14:22:08 sort of thing
14:23:19 but tbh im not sure how much faster it can be lol
14:33:28 that is going to make the parameter stack a software stack tho because the processor stack will be the return stack
14:33:36 so stack code will need to change :)
14:34:57 --- join: dave0 joined #forth
14:39:56 --- join: Zarutian_HTC joined #forth
14:49:53 ok so if RSP is the return stack now then RBP has to be the parameter stack
14:50:42 would you do "xchg rbp, rsp push rbx xchg rbp, rsp" to push rbx onto the parameter stack or....
14:50:58 mov [rbp], rbx
14:51:07 add rbp, byte CELL
14:51:48 if i allocate the parameter stack as grows down i think the former is the only way that will work
14:51:49 that looks like i386 code
14:52:05 oh wait rbx.. that's amd64
14:52:09 yes
14:52:36 when you use the PUSH opcode and RSP points below bottom of stack you get a segfault
14:52:47 if the stack is allocated as grows down you get a new page allocated
14:53:06 i can either lock p stack to a specific size or make it grows down
14:53:15 the return stack is RSP and thats already grows down
14:53:30 but only a PUSH will cause the growing
14:53:57 actually a push the other way needs to be " sub rbp, byte CELL mov [rbp], rbx"
14:54:15 but that will only cause a segfault on stack overflow no grows down
14:54:47 anyone think 4k for a parameter stack is too limiting?
14:58:22 i used rsp as the forth instruction pointer, and rsi for the return stack, and rdi for the data stack... and because the direction of return and data stack didn't matter, i made them grow up :-)
14:58:47 how do you use the stack pointer as IP
14:59:04 using the string registers for stack pointers makes some sense
14:59:19 because push and pop are just cld/std lodsq/stosq
14:59:34 i was using rsi for IP
14:59:57 use sp just as the processor stack and use RSI and RDI for p/r stacks
15:00:52 mark4: i found that in protected mode, the stack is not written to (interrupts etc. are on their own kernel stack), so you can point rsp at read-only data and pop and RET all day
15:01:07 you can RET
15:01:49 lol
15:02:08 you lose push and pop for the data stack, but the interpreter's NEXT is unbelievable
15:02:13 thats not really subroutine threaded though
15:02:22 no direct threading
15:02:25 yea
15:02:41 im currently direct threaded but in my 32 bit x4
15:02:50 i had assumed ret would be faster than lods ; jmp rax
15:03:02 very probably
15:03:05 and shorter too
15:03:24 would that be faster than sub thread?
15:03:46 hang on one second
15:04:32 !
that also makes an unconditional branch much easier to implement
15:05:00 i didn't check subroutine threading
15:05:08 conditional branches would just add eip cell to skip branch vector
15:05:20 CAN you add to IP in 64 bit mode?
15:05:50 hmmm i don't think so
15:06:16 and with ret threading how would you nest. you would need to r_push eip
15:06:21 actually
15:06:27 call 1f
15:06:29 1:
15:06:34 nope, you push the stack register
15:06:36 pop rax
15:07:11 oooh right SP is IP
15:07:16 soooo confuzing lol
15:07:21 ehehe
15:07:47 did you invent that or did you copy it from somewhere else?
15:08:08 i wrote a program to benchmark ret vs. lods;jmp rax vs lods;jmp [rax] vs pop rax;jmp rax vs pop rax;jmp [rax]
15:08:14 afaik it's my own invention
15:08:23 at least i didn't copy it from anyone
15:08:28 yea
15:08:37 im pretty sure ive invented things after someone else :)
15:08:44 but that one is not obvious
15:09:07 it only works in protected mode
15:09:20 luckily unix and windows works
15:09:25 omg
15:09:26 but no bare metal
15:09:27 !
15:09:39 you do NOT need to use a software stack
15:09:50 so SP is your IP
15:09:56 SI is p stack
15:09:59 di is r stack
15:10:01 for example
15:10:06 to PUSH to the parameter stack
15:10:12 xchg rsp, rsi
15:10:18 push xxx
15:10:24 xchg rsp, rsi
15:10:46 so pushes and pops to/from both the p and r stacks are 3 opcodes but STILL use push!
15:10:53 i have not benchmarked xchg/push/xchg
15:11:28 i just did mov [rsi],xxx ; add rsi,8
15:11:40 so i think the code is longer
15:11:43 so you always point to next empty?
15:11:47 but i don't know how fast it is
15:12:02 umm hold on
15:12:04 or better yet
15:12:06 std
15:12:09 stosq
15:12:10 cld
15:12:35 thats a push
15:12:41 oh i hadn't thought of that optimization
15:12:42 a pop is simply a lodsq
15:12:47 :)
15:12:51 cool
15:13:01 for BOTH p and r stacks :)
15:13:56 however, thats going to need rax as cached top of stack.
i currently use rbx :)
15:14:44 actually when you do stosq with std set does it decrement before or after?
15:15:19 push and pop have to have store then dec for push and inc then load for pop or some such
15:15:53 depending on if you are a full stack or an empty stack to use arm's thinking
15:16:17 i.e. does your stack pointer point to the top of stack item or the next empty space
15:17:16 im not sure the std stosq cld method will help unless the pointer update happens either first on push, second on pop or second on push, first on pop
15:17:22 if you get what i mean
15:18:56 i looked and the stack pointer points to the top cell
15:19:39 but the question is the order of operations on stosd when std is set
15:19:46 as opposed to when cld
15:20:00 assume stack always points to the most recent item
15:20:18 in order for "std stosq cld" to be a push where "lodsq" can be its pop
15:20:36 the stosq has to decrement rdi first
15:20:47 https://paste.c-net.org/RepentCesspool
15:21:00 that's what i have written for amd64
15:21:08 and actually pushes and pops get more complex because stosd and lodsd use different registers
15:21:40 so a pop would need to do xchg rsi rdi lol
15:22:03 ya i dont think using lods and stos will be cheaper
15:22:06 for the stacks i mean
15:22:44 mark4: what OS are you on?
15:22:59 linux
15:23:03 ah cool
15:23:08 gentoo linux :)
15:23:12 you should have no trouble compiling my code
15:23:42 despite working on it for a long time, it doesn't do much
15:24:57 for x4 i wrote a full memory manager and a terminfo parser and a text window interface to use it
15:25:04 the tui is currently slightly broken
15:25:06 needs work
15:25:22 all in forth?
15:25:36 but i could do multiple overlapping moving windows with text scrolling in any of 4 directions
15:25:39 yes all in forth
15:25:44 cool
15:25:51 i'm a forth newbie
15:26:01 https://github.com/mark4th/x4
15:26:17 https://github.com/mark4th/x4/tree/master/src/ext/terminal thats my terminfo parser
15:26:29 https://github.com/mark4th/x4/tree/master/src/ext/tui thats the text user interface
15:26:38 but menus are the bit that are broken on the latter
15:26:42 pulldown menus i mean
15:29:12 wow you handle signals in forth in twinch.f
15:30:20 twinch (SIGWINCH) is the signal that you get when you change the size of a window
15:30:34 so if the terminal window size changes forth needs to update its COLS and ROWS constants
15:30:47 there's a bug in that implementation too that ive known about but didnt fix
15:30:49 lol
15:30:55 :-)
15:31:36 one of the things i was working on was a debugger
15:31:55 i wanted a segv handler for when someones code tried to write to bad memory
15:32:05 mark4: were you able to run my test.zip ?
15:32:09 instead of crashing the entire system just display a "oops ya screwed up" message
15:32:18 got it downloaded
15:32:28 was working on something, cant multi task lol
15:32:35 okay :-)
15:33:26 one gotcha is that you really shouldn't mix data and code on modern x86 (32 or 64), so no putting string literals inside the compiled code and so on
15:33:30 i thought unix signals would be limited to c code
15:33:51 ...
otherwise it might not even be faster
15:34:08 alexshpilkin, thats one thing i dont really care about - im already breaking the rules by making the entire forth memory +rwx :)
15:34:27 it's not a rule
15:34:36 like, not a style thinf
15:34:41 *thing
15:34:41 if i wanted to enforce harvard architecture i would have to make the forth indirect threaded
15:35:09 it's that the CPU has separate I- and D-caches
15:35:18 my forth compiles over 4 megabytes of source code per second
15:35:33 and im breaking ALL the 'modern cs understanding' bs rules
15:35:42 again - dont care :)
15:35:58 and if you mix code and data in a single cacheline it'll kill your performance bloodily and messily
15:36:09 i find myself fighting with the assembler, but i don't know how to write my own assembler :-/
15:36:20 i disagree - i think the performance hit is negligible
15:36:42 as evidenced by the fact that my compiler compiles 4 megabytes of source per second
15:36:43 or more
15:37:39 alexshpilkin: you sure that applies if you don't write to the memory?
15:44:21 alexshpilkin, when i wrote x4 (my 32 bit forth) i wrote it specifically to be readable by anyone
15:44:42 when i created this channel many moons ago i would sit in here alone for months on end
15:45:01 occasionally someone would come by and chat
15:45:25 a couple of times people told me that they were not asm coders and not forth coders but were very interested and could read my sources
15:45:35 * alexshpilkin is looking for benchmarks of C preprocessors, but apparently Warp didn't have one (wat?)
15:45:58 i chose direct threading and did not care about cache hits or instruction pipelines what have you
15:46:20 and i STILL ended up with what i have called one of the fastest compilers of any non trivial programming language
15:46:41 though there are some new compilers for new languages coming out that have compile speed as a prime directive
15:47:07 i.e.
the V programming language that will always build in under a second
15:54:50 mark4: how did you do your lookup function? FIND i think
15:55:18 my headers and code are in separate sections
15:55:31 a header has the following structure
15:55:40 dd link-to-previous
15:55:47 db len, "name"
15:55:53 dd pointer-to-cfa
15:56:00 my vocabularies are hashed
15:56:12 to search for foo you calculate the hash for foo and that selects the vocabulary thread
15:56:27 ah
15:56:35 i have a context stack... i search that voc thread of each vocab for the target word
15:56:44 is FIND a bottleneck when compiling?
15:56:52 i have (find) which is coded in asm
15:56:56 and find which is forth
15:57:05 yes thats why you do hashed vocabularies
15:57:24 my hashing algorithm is the same as the one used by laxen and perry in F83
15:57:32 * dave0 googles
15:57:51 (count byte * 2) + first char. if there is a second char multiply that by 2 and add the second char
15:57:55 no other chars are looked at
15:58:01 just count byte and chars 1 and 2
15:58:31 (((count * 2) + char 1) * 2 + char 2)
15:58:38 or just count *2 + c1
15:58:56 AND that with 0x3f to give you an index from 0 to 63
15:59:06 a vocabulary is an array of linked lists.
15:59:15 of forth headers :)
15:59:28 actually its just an array of pointers to the most recent word in that thread
16:00:04 when i create a new word i create its header and do NOT add it to the vocabulary till the definition is completed
16:00:12 but i remember its thread index
16:00:29 did you ever try different hash algorithms?
16:00:44 no.
but i was thinking of looking at others
16:00:57 but this hash is ULTRA ULTRA fast and very few opcodes
16:01:07 i think a more complex hash is going to slow down the compile
16:01:27 it might give a better spread across the threads, less hits etc so the threads will maybe be smaller
16:01:38 but im not sure if thats going to help that much
16:01:45 or to a degree that can even be measured
16:02:25 i thought to only "hash" on the length of words
16:02:36 no
16:02:40 that saves comparing lengths
16:02:45 nope
16:02:56 doesn't work?
16:03:05 think of a vocabulary as an array of pointers to word headers
16:03:18 so you hash "foo" and come up with index N
16:03:33 you fetch index N of the voc and traverse that thread to see if "foo" is in there
16:03:52 because "foo" and "bar" have different hash values they go on different threads
16:04:19 if you dont hash all words go on the same thread - one long chain making it potentially slower to find where "foo" is in that chain
16:04:41 if you simply hashed on word lengths you would not spread very much across the table
16:04:43 what if your index N was just the length of the word?
16:05:16 yes there would be a lot of short words on one thread and few long words
16:05:21 most forth word names are quite short - therefore most forth words would thread into a few close indices of the vocabulary
16:05:25 but shorter is faster to compare
16:05:46 by actually hashing the word name you spread words around throughout the voc keeping each chain smaller and faster to search
16:06:37 when searching for "foo" you hash foo. and get thread N of the voc
16:06:46 you collect thread N which is a pointer to a header
16:07:07 you compare the length of the word you are searching for with the header you are pointing at
16:07:16 if they are the same you compare strings
16:07:29 if not you point to the previous header in the thread and compare lengths
16:07:44 when lengths match you do the string compare and if they do not match you again...
link back one
16:08:02 when they do match you can collect the address of the word's CFA from the header
16:08:16 and return "true" or "1" which is also true
16:08:24 based on whether the word is immediate or not
16:08:38 immediate words have a bit set in their header's length byte
16:08:47 if thread 1 only has words of 1 char, and thread 2 only has words of 2 char, and thread 3 only has words of 3 chars, etc... you can save a comparison with the string lengths
16:09:08 and if 95% of your words are 5 6 and 7 chars in length
16:09:27 yeah i'd have to benchmark
16:09:31 you are going to keep populating threads 5 6 and 7 and threads 20, 21, 22, ... 63 will never be populated
16:09:46 calculating the hash is very very very fast
16:10:01 then you know which thread your word would be in if it was defined
16:10:11 so you search only that thread of each vocab in context
16:10:30 the more you scatter words into threads the shorter the threads will stay
16:10:50 searching shorter threads means finding or failing faster
16:11:00 the ONLY benefit this has is on compile time
16:11:02 not run time
16:11:20 run time should not be dependent on forth headers unless run time also does creating
16:14:45 i am starting to hate nasm's crippled macros more and more
16:14:52 creating words at runtime makes my head spin
16:15:03 i TRULY hate to say it but GAS macros are more powerful
16:15:04 i come from c where there's no such thing
16:15:13 what do you think : does lol
16:15:21 you RUN colon to create a new colon definition
16:15:30 : constant create , does> @ ;
16:15:33 0 constant foo
16:15:58 when you execute foo it literally jumps into the bit of code inside its creator following the does>
16:16:09 all constants DO the bit of code after does>
16:16:27 and that fetches the body contents of the constant which is just a self fetching variable :)
16:17:04 mark4: i think of : as compile-time, not run time
16:17:06 create creates a new forth header
16:17:12 no colon is run time
16:17:30
well its fuzzy lol
16:17:41 compile time is when STATE = 1
16:17:47 runtime is when STATE = 0
16:17:58 yeah there's this nice blurring of interpret/compile/run time that you don't have in c
16:18:06 so when you execute colon its in run time but it switches you into compile time
16:18:34 void blah(void) { do useful stuff } blah() <-- run blah at compile time lol
16:19:07 i have a forth coder friend who says that "Developing embedded applications in C is like opening a can... WITH A ROCK!"
16:23:03 i also dig how you can return any number of args from a word... and even different numbers from the same word
16:31:23 yes i often return either a result and true or JUST false
16:31:39 ( --- n1 t | f ) is how i document that
16:31:47 yes!
17:33:23 --- quit: _whitelogger (Remote host closed the connection)
17:36:24 --- join: _whitelogger joined #forth
18:06:27 hey guys
18:31:28 --- quit: Zarutian_HTC (Remote host closed the connection)
18:59:10 --- join: boru` joined #forth
18:59:12 --- quit: boru (Disconnected by services)
18:59:15 --- nick: boru` -> boru
19:22:16 --- join: cox joined #forth
19:23:04 --- quit: mark4 (Read error: Connection reset by peer)
20:23:44 --- quit: dave0 (Quit: dave's not here)
20:27:59 --- quit: kori (Quit: WeeChat 2.8)
21:27:25 --- join: jsoft joined #forth
22:14:31 --- join: betrion[m] joined #forth
22:16:51 --- join: gravicappa joined #forth
23:02:33 --- join: lonjil2 joined #forth
23:19:51 --- join: xek joined #forth
23:54:23 --- quit: _whitelogger (Remote host closed the connection)
23:57:24 --- join: _whitelogger joined #forth
23:59:59 --- log: ended forth/20.08.03