00:00:00 --- log: started forth/21.03.05
00:32:29 --- quit: scoofy (Ping timeout: 276 seconds)
01:03:50 --- join: scoofy joined #forth
01:32:18 inode, felix forth
01:54:05 i guess you'd have to first implement a means of calling signal(2)/sigaction(2) to register a handler for SIGSEGV?
02:20:04 inode, ok, thanks.
02:24:02 at least i didn't see anything at all for handling signals that would be generated by illegal memory access when skimming through that ff repo?
02:46:30 yes, there is none. you are correct. it is a simple code base, easier to understand.
02:51:17 --- quit: Croran (Ping timeout: 240 seconds)
02:54:29 --- join: Croran joined #forth
03:08:48 --- join: hosewiejacke joined #forth
04:58:03 ?dup dup if the top of stack is not zero?
05:01:45 --- join: xek joined #forth
05:10:51 --- quit: Zarutian_HTC1 (Remote host closed the connection)
05:13:55 correct
05:14:38 thanks
05:20:09 --- join: f-a joined #forth
05:36:40 --- quit: hosewiejacke (Ping timeout: 245 seconds)
05:41:55 --- join: hosewiejacke joined #forth
05:56:35 --- join: elioat joined #forth
06:06:58 --- quit: hosewiejacke (Remote host closed the connection)
06:32:01 --- join: hosewiejacke joined #forth
07:16:28 --- join: mark4 joined #forth
07:20:51 gdb is very good.... at C. At Forth, a good Forth has its own debugger, but you can still use it, just with a bit of pain. I was using it to debug mark4's x64 code, better than nothing.
07:21:36 uhhh does gforth have a debugger
07:21:39 * f-a checks
07:21:50 actually traditionally a good forth needs no debugger in the traditional sense. you dont debug forth by single stepping it normally
07:21:51 but....
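Following the 01:54 suggestion, here is a minimal sketch of registering a SIGSEGV handler with sigaction(2), assuming a POSIX system such as Linux. The handler jumps back with `siglongjmp` to a point saved by `sigsetjmp`, which is one common way for a Forth to turn a bad `@` into an error instead of a crash. All names (`recover_point`, `on_segv`, `install_segv_handler`, `safe_fetch`) are hypothetical and not taken from felix forth. Leaving a SIGSEGV handler via `siglongjmp` is widely used in practice but is not guaranteed by ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction, sigsetjmp, siglongjmp */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Where on_segv jumps back to (hypothetical name). */
static sigjmp_buf recover_point;

/* SIGSEGV handler: abandon the faulting access and resume at recover_point. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Register on_segv for SIGSEGV via sigaction(2). Returns 0 on success, -1 on error. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* A Forth-style "@" that reports a bad address instead of killing the process.
   Returns 1 and stores the cell in *out on success, 0 if the read faulted. */
int safe_fetch(const long *addr, long *out)
{
    /* savemask = 1: siglongjmp restores the signal mask saved here, so SIGSEGV
       is unblocked again and a later bad fetch is caught as well. */
    if (sigsetjmp(recover_point, 1) != 0)
        return 0;                       /* came back from on_segv */
    *out = *(const volatile long *)addr;
    return 1;
}
```

In a Forth the `return 0` branch would typically print an error and fall back to the outer interpreter, much like ABORT.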
07:22:10 x4 used to have a fully working on years ago but things changed and i never kept up wity it
07:22:18 its also on the todo list to get working again
07:30:24 the only part of my debugger that I actually use is the disassembler
07:31:28 well
07:31:37 say you have a problem with the logic of your program
07:31:48 or some function leaving crap on the stack etc
07:31:55 where do you start from then, to debug?
07:32:13 I usually sprinkle ." stuff" in my words but that seems not efficient
07:33:08 I normally spend a rather long time thinking through the logic before I start coding for anything that's not trivial
07:34:52 it is a good attitude, shit happens regardless :P
07:35:03 imagine you are looking at a piece of code you did *not* write yourself
07:38:12 That's harder
07:38:38 I print out and review the code, mentally walking through as much as possible
07:39:29 If it has issues, I can run it under the single stepper or with execution tracing, but that's the extent of by debugging tools
07:40:14 Most of my debugging work is done on paper or in my head
08:12:52 --- quit: f-a (Remote host closed the connection)
08:26:05 depth . in various places helps too
08:26:28 crc if its so complificated i can just write it off the top of my head i like to code it on paper first
08:27:00 and i find more bugs just scanning slowly through my code or taking the time to properly comment them
08:27:26 if you have thought about your code long enough to explain it to someone else you have thought about it long enough to code it
08:27:36 tho in practace thats an itterative process :)
08:27:53 write a primitive. test THAT primitve
08:28:04 write another primitve and immediately test that one too
08:28:28 use your already tested primitves to create higher levvel definitions and then test them as soon as you write them
08:28:33 : foo .... ; 1 2 3 foo
08:28:38 : bar .... 1 2 3 bar
08:28:45 : blah foo blah ; 1 2 3 blah
08:29:01 : blah foo bar ; 1 2 3 blah i mean
08:29:28 thats the theory of how to develop forth properly... dont ask me if thats what i always do or not lol
08:29:29 shhh
08:29:43 modern theory calls that TDD
08:30:03 forth calls it the status quo, the norm, just what we do (tm)
08:42:26 --- join: Zarutian_HTC joined #forth
09:20:44 --- mode: ChanServ set +v mark4
09:31:44 I don't know what most people consider a 'debugger' but being able to step through code or print some kind of trace would suffice, gforth can do both of those and any forth can or could with a small amount of work
09:32:35 A forth is flexible enough that its interactive mode is more useful for debugging than what you get with some other langs
09:33:21 My forth doesn't have any debug features other than .S so far, and I still find myself e.g. using ' to locate addresses etc easily while debugging
09:37:50 --- quit: KipIngram (Ping timeout: 276 seconds)
09:38:02 anyone here know how to compile a utf8 string in C because ##c just gives you a cirle jerk
09:38:13 u8"blah" is wrong
09:38:27 L"blah" gives utf32 not utf8
09:38:42 mark4, I havent tried but could you use goldbolt to test?
09:38:47 by default (gcc, linux, utf8 locale) strings should be utf8, no?
09:39:09 i know u8"blah" is wrong because when i use it there is NO string in there that i can find
09:39:32 i dont want BLAH" encoded as 'B' 'L' 'A' 'H' those are the ascii characters
09:39:53 ascii is a subset of utf-8?
09:39:56 i need to be able to encode strings containing ANY characters
09:40:34 https://godbolt.org/z/Wa63eT ?
09:40:35 wouldnt BLAH" be encoded as 'B' 'L' 'A' 'H' but then anything outside of what normally fits would get a preceding escape byte?
09:41:04 ie isnt utf8 identical until it has to encode extended characters?
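The 04:58 `?dup` question and the "write a primitive, test THAT primitive" / "depth . in various places" advice can be modeled in C (the log's other language) with a toy data stack. Everything here (`push`, `pop`, `dup_`, `qdup`, `depth`, `depth_change`) is a hypothetical illustration, not code from any Forth mentioned in the channel; `depth_change` plays the role of wrapping a word in `depth .` calls to catch words that leave junk on the stack.

```c
#include <stddef.h>

/* Toy Forth-style data stack; no overflow/underflow checks in this sketch. */
#define STACK_CELLS 64
static long stack[STACK_CELLS];
static size_t sp = 0;                  /* number of cells currently on the stack */

static void push(long x) { stack[sp++] = x; }
static long pop(void)    { return stack[--sp]; }

/* DEPTH ( -- n ): number of cells on the stack. */
size_t depth(void) { return sp; }

/* DUP ( x -- x x ) */
void dup_(void) { long x = pop(); push(x); push(x); }

/* ?DUP ( x -- 0 | x x ): duplicate the top only when it is non-zero.
   The sp > 0 guard is a safety net for the sketch; a real ?DUP underflows. */
void qdup(void) { if (sp > 0 && stack[sp - 1] != 0) dup_(); }

/* Run one word and report how much it changed DEPTH, so each primitive can be
   checked against its stack comment right after it is written. */
long depth_change(void (*word)(void))
{
    size_t before = depth();
    word();
    return (long)depth() - (long)before;
}
```

Testing each word immediately then looks like: push 5, expect `depth_change(qdup)` to be 1; push 0, expect it to be 0.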
09:41:13 https://dpaste.com/EH8FX2XJ5
09:41:33 thats how you decode the codepoints
09:42:53 i need to be able to store EVERY CHARACTER as a specific width codepoint
09:43:23 don't you want utf-32 then, assuming character=unicode scalar value?
09:43:27 no
09:43:39 i actually dont want utf-32
09:45:49 I am slowly working through this code and trying to figure out what it does as it is leaving 4 values on the stack and I cannot figure out why. It is doing this before the interpret routine call. So, I figure it is reading some input and figuring out if the input should be accepted or not.
09:45:52 http://ix.io/2RNq
09:46:00 LITERALLY all i need right now is to do u8"The cow jumped over the moon" and printf it and then look in the binary for how that string got encoded
09:46:03 guess what. i can printf it
09:46:13 guess what... i cant see that string anywhere in the binary
09:46:27 I understand that it would be hard to figure out from this piece. I want not sure if this is a common task across forths.
09:46:31 its NOT being encoded as 'T', 'h', 'e', ' ', .. . . .
09:47:02 i did not port it down to x4 but if you go into x64 you can literally do $2501 emit
09:47:08 it is on my machine... https://cdn.remexre.xyz/screenshots/91b0cd7710f26db15eac541d3c1d963d70d99b27.png
09:47:21 mark4, could it be 16 bit runes?
09:47:26 $2501 emit ━ ok
09:47:28 wtf is a rune?
09:47:32 NOBODY talks about runes
09:48:04 $2563 emit ╣ ok
09:48:08 they talk about characters
09:48:12 they talk about code points
09:48:38 mark4: are you trying to get https://cdn.remexre.xyz/screenshots/06ba3f3942ec57f04f49b3bba0206ef0106707a2.png ?
09:48:49 instead of whinging about it, why don't you poke around in a debugger to find the string? :)
09:48:51 --- join: KipIngram joined #forth
09:49:06 no
09:49:14 --- nick: KipIngram -> Guest79368
09:49:25 im trying to get a string "xxxxxxxx" where all of those chars are encoded as their utf-8 codepoint
09:49:32 no matter WHAT charactes they are
09:49:47 characters don't have a single codepoint in utf-8, they're a variable-width sequence of bytes
09:49:54 the fucking string is not in there! i have looked with mc i have also looked wtih ida-pro
09:50:11 what sequence of bytes are you expecting to be there
09:50:30 for the case of puts("\u2563");
09:50:38 u8"The cow jumped over the moon" <-- the ones i specified in the u8 string
09:50:46 what is puts ?
09:50:52 does not sound like c ? :)
09:50:55 it is
09:51:00 print a string to stdout
09:51:11 like printf but without formatting ok
09:51:16 i never actually encountered puts ever
09:51:59 can you upload your binary somewhere?
09:52:05 (or the .o file, or .S file)
09:52:59 actually i got it now. i dont understand why adding puts() of the string suddenly makes it visitible in the code but it did
09:53:23 and i was compiling with -O0 so the unusesd string should not have been purged from the binary
09:53:40 possibly the varargs calling convention for your platform makes something funky? dunno
09:53:58 i was not using printf, i was not doing anything with the string till i added the puts
09:53:59 hang on
09:54:34 https://dpaste.com/5VQHFSE56
09:54:39 i just added the puts
09:57:03 https://dpaste.com/8BM5CQ5JM
09:57:25 and while i can see the string in a binary dump. ida-pro still seems to be having issues. i cant see it anwywhere in there
09:57:33 it trying to disassemble the string as opcodes?
09:58:09 counted or null byte terminated?
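The decoding side mentioned at 09:41:33, and the 09:49:47 point that a code point in UTF-8 is a variable-width run of bytes, can be sketched as a function that consumes exactly one code point per call. This is a minimal sketch with a hypothetical name, `utf8_next`, not the code behind the dpaste link. The lead byte's high bits give the sequence length, and each continuation byte contributes 6 payload bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the n bytes at s into *cp.
   Returns the number of bytes consumed (1-4), or 0 if the bytes are
   malformed, truncated, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_next(const unsigned char *s, size_t n, uint32_t *cp)
{
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                       /* 0xxxxxxx: ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }   /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }   /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }   /* 11110xxx */
    else return 0;                  /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;          /* sequence cut off */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* continuation must be 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);           /* append 6 payload bits */
    }
    if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}
```

A caller walks a string with `i += utf8_next(s + i, n - i, &cp)` until it returns 0, placing each `cp` into the next window cell. A reversed sequence such as `81 94 e2` is rejected because `0x81` is a continuation byte, not a lead byte.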
09:58:14 I suspect that's because of the array
09:58:28 you're specifying that it goes into a *mutable, stack-allocated* array
09:58:38 you probably want const char* foo = "asdfasdf";
09:58:48 to make foo a pointer to a .rodata-allocated constant string
09:59:00 so what's happening is, you're stack-allocing the array you're putting the string into
09:59:05 then moving constant chunks of it in
09:59:14 let me try that :)
09:59:43 the murder weapon: https://cdn.remexre.xyz/screenshots/7569a634f6a8432558cc9413f1b501f1545b4264.png
10:00:24 aTheCowJumpedOv db 'The cow jumped over the moon',0
10:00:29 it literally didnt help me lol
10:00:54 i need a chinese person to enter a chinese string in my foo.c :)
10:00:58 and give it back to me
10:01:27 asdf̈u :P
10:01:35 oh wack my irc client displays that wrong...
10:01:40 lol
10:02:02 i saw adsfu was that supposed to be chinese for piss off? :)
10:02:23 nah, asdf + combining diaresis + u
10:03:16 lol
10:03:39 o the F seems to have dots over it unless thats my eyes going fuzzy
10:05:27 yeah
10:06:05 i REALLY REALLY do NOT want to have to includes some 400gig utf8 string library in my 40k binary
10:06:58 im not actually going to be puts'ing strings. i need to implement a puts/printf like function that prints them into one of my TUI windows
10:07:10 and my windows do not use varible width charcters
10:07:49 its fine for the strings to be variable width, i just need to know how to extract each charater from those strings one at a time and to place them into my windows at the current cursor location
10:07:54 mmmmmmm
10:07:57 so
10:08:08 full-width characters
10:08:24 well. they dont need to be and probably should not be in the sources
10:08:33 but they need to be when they are emitted to the window
10:09:01 what i should do is commit my code as it now stands and then mark the gitnhub repo as no longer private
10:09:13 but i might want to sell this :)
10:09:18 I think you probably need the annoying unicode tables for any sort of "figure out how many character cells wide this string is"
10:10:31 just got a call from the "warranty center" lol
10:10:33 --- nick: Guest79368 -> KipIngram
10:10:45 --- mode: ChanServ set +v KipIngram
10:10:59 i didnt hang up i just saind "hey banchod does your mother know that you scam people?" lol
10:11:03 he hung up
10:11:41 i know how to decode the codepoints and decompose them to their character sequences
10:11:59 the first byte tells me how many bytes are in the codepoint
10:12:11 but. erm. i need to be able to compile strings AS CODEPOINTS!!!!
10:12:20 i dont know if u8"blah" does that properly or not
10:12:33 so that's outside the C spec, but what gcc will do is
10:13:06 if you're referring to them by address (i.e. not as an array initializer), outside of constant folding and other optimizations, it'll put strings in the rodata section
10:13:18 though not every string in the source will end up as a distinct string in rodata
10:13:29 e.g. ("foo" == "foo") may or may not be true
10:14:05 really what i need to be able to do is give someone using my code to do
10:14:24 win_puts("any string in any language");
10:14:31 --- quit: Zarutian_HTC (Remote host closed the connection)
10:14:32 i.e. i need to be able to implement that function
10:14:39 also regardless, the length of a unicode scalar value in code points isn't sufficient to determine its length on a terminal
10:14:43 so.. that function needs to be able to parse the given string
10:15:05 the string is not written to the "window" as a secuence of "characters"
10:15:11 its written as an array of codepoints
10:16:01 not every code point is one character cell wide
10:16:06 e.g. U+0308
10:16:06 i.e. given a cell of the window containing $2500 it will decompose that cell into those charactesr at display time
10:17:28 i know that u8"abcd" will be compiled ideitican to "abcd"
10:17:44 i can handle straight ascii.
10:18:29 my win_puts() or win_printf() functions need to be able to read the next item from the string and place that one item in a given cell of the window array
10:18:40 i.e. i need to be able to get string[x] for any index of x
10:18:49 for ANY possible string in any language
10:19:10 that part is trivial
10:19:10 what is x measured in? scalar values? code points? character cells? things-a-human-considers a charater?
10:19:26 in codepoints
10:19:30 is what i want
10:19:32 you need utf-32
10:19:37 mope
10:19:42 or a second array of where things start
10:19:57 utf-8 is inherently variable-width
10:20:08 im not handling the string as an array in that way... ill be parsing through it from the beginning, extracting each 8 bit byte out of it till i have exactly one code point
10:20:11 THAT i can do
10:20:32 what i need to know is how i can specify a string in C that is encoded as CODEPOINTS!!!!
10:20:55 yes. i understand im not looking for the X'th character that was not an exact example
10:21:02 i cant do x++ to get the next character
10:21:22 i need to parse forward of the current index till i get to the next index
10:21:32 i understand that
10:21:49 what i do NOT understand is how to specify a string in C so that it is encoded as a stream of utf8 codepoints
10:22:00 not utf16 not utf32. utf8
10:22:08 SPECIFICALLY a stream of utf8 codepoints
10:22:09 --- join: f-a joined #forth
10:22:23 and i do not know that u8"blah blah" does that
10:22:30 it's kinda a shame that utf-8 is relatively difficult to handle because it leads to so many english-speaking developers building things that don't support international text
10:22:47 yes and thats what im trying to accomplish
10:22:52 but idk either in C, I've mostly worked in languages like go where you're lucky enough to have rune[]
10:22:53 sorry
10:22:54 I'm 99.9% sure that unless you're using a bizzaro compiler, it's doing exactly that, as long as you follow the rules I stated above
10:23:04 > if you're referring to them by address (i.e. not as an array initializer), outside of constant folding and other optimizations, it'll put strings in the rodata section
10:23:07 i KNOW my existing code can display any character in any language just given its CODE POINT of what ever length it is
10:23:49 if a string of utf8 characters is encodes as xx xx yy zz zz zz aa aa bb cc cc then i need to be able to parse the X char, the Y char, the Z char the A char and the B and C chars
10:23:53 i can do that!!!!!!!!!!!!1
10:23:57 thats FUCKING TRIVIAL!!!!!!!!!!!
10:24:03 thers no rocket surgery involved there
10:24:12 how the FUCK do i specify that string in C
10:24:26 char foo= "abc" does not do that
10:24:31 char* foo = "abc";
10:24:39 i dont know if char foo = u8"abc" does that
10:24:56 what if my abc string is not 'a' 'b' 'c'
10:25:03 what if its a chinese word
10:25:06 or japanese
10:25:08 or korean
10:25:12 or indonesian
10:25:18 or .. .. . .
10:25:32 i need to be able to compile strings in ANY LANGUAGE!!!
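One way to answer "how do i specify that string in C" and check the result from inside the program, assuming a C11-or-later compiler: write the literal with the `u8` prefix and `\u` universal character names (so the source file's own encoding does not matter), keep it behind a pointer rather than using it as an array initializer (the 09:58 and 10:13:06 point), and dump its bytes. The helper `dump_bytes` and the sample text are hypothetical. The cast keeps the declaration valid under C23, where `u8` literals have type `char8_t[]`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "━━ 中" (U+2501 U+2501 space U+4E2D) as a UTF-8 literal referenced by
   pointer; with gcc this typically ends up in .rodata. */
static const char *const box_line = (const char *)u8"\u2501\u2501 \u4E2D";

/* Print each byte of a NUL-terminated string in hex, e.g. "e2 94 81 ...",
   so the encoding can be checked without a disassembler. */
void dump_bytes(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        printf("%02x ", (unsigned)(unsigned char)s[i]);
    putchar('\n');
}
```

Calling `dump_bytes(box_line)` starts with `e2 94 81`, the three-byte UTF-8 form of U+2501, and the CJK character contributes `e4 b8 ad`.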
10:25:34 either this works or your compiler is ISO-incompliant (or doesn't support utf8)
10:25:36 as utf8 codepoints
10:25:44 oh
10:25:54 when you specify char f[];, you're not requesting that the string be in .rodata though
10:25:56 show me an example of compiling a string as a stream of utf8 codepoints
10:25:57 that was the previous problem
10:25:59 and PROVE thats what it does?
10:26:13 i dont give a fuck where its compiled to
10:26:18 as long as i can access it
10:26:41 u8"chinese sentence here" <-- does this compile that chinese sentence as a stream of utf8 codepoints?
10:26:56 thats ALL i care about right now
10:27:08 the parsing of that string is MY problem and i alaready know how to handle that
10:27:14 --- quit: dave0 (Quit: dave's not here)
10:27:40 as a stream of variable width utf-8 codepoints of course
10:28:09 what's the widest utf-8 codepoint?
10:28:15 not as 000x 00xx 0xxx xxxx values but as xxxxxxxxx and all mashed up togehter as a stream of codepoints that need to be handled
10:28:19 well 32 bits
10:28:28 mark4: https://cdn.remexre.xyz/screenshots/5fa65e3e6974a92b8ce65b9bdeb96a378f00f425.png ?
10:28:29 the highest utf8 codepoint is something like 0x110000
10:29:07 remexre: you didnt even specify that those were utf8 characters. you can do that?
10:29:12 yeah
10:29:20 no need for the bullshit u8"xxxxx" visual clutter?
10:29:22 !!!!!!!!!!!!!!
10:29:25 i'm using a utf-8 locale with gcc
10:29:33 which is like, eminently reasonable
10:29:50 if you're using some 80s POS compiler with a shift-jis locale, that's when u8"" is useful
10:29:55 because maybe the compiler is dumb
10:29:57 err how do you "use a utf-9 locale with c" ?
10:30:10 well the compiler in this case is the most recent gcc
10:30:15 like my system locale is a utf-8 locale
10:30:18 compiled with the c17 standard
10:30:24 oooh ok
10:30:46 so. really i dont need to worry i can just implement win_puts(win, "blah");
10:31:02 you have to worry about combining characters and full-width characters
10:31:09 if you're doing a TUI
10:31:12 but if you're not, yeah
10:31:35 the first byte in your code is a 0xEn byte
10:31:49 that tells me how many 8 bit bytes there are in that codepoint
10:32:00 e4 bd ad
10:32:04 right, but not how many spaces on a screen the character occupies
10:32:05 e5 9b bd
10:32:10 look i can even do it in my head :)
10:32:18 e.g. both of those characters are two character cells wide
10:32:22 and U+0308 is "zero"
10:32:37 in that it modifies the previous character instead of occupying its own character cell
10:32:47 hang on give me a sec
10:33:37 ok yea my emit in my forth must be expecing 16 bit codepoints only thats a bug
10:33:40 i can fix that
10:34:26 no actually im not sure whats going on there.
10:35:01 let me see if my C code can emit those chinese chars correctly
10:35:19 ooooh i see a problem lol
10:35:29 erm. ok so... how do i tell how many cells each char takes?
10:35:35 --- part: hosewiejacke left #forth
10:36:05 that's where you need a bloated table, sadly
10:36:09 i am 99.99% sure my c code will write the correct sequcence of bytes to display those chars but... while those chars take up one cell of the window array
10:36:14 they take up 2 bytes of the display space
10:36:44 I think, 90% of the time you can tell whether a character is wider than one char-cell from its block
10:36:50 ooooh! lol nope
10:36:59 and afaik you can always tell whether a char is a combining char by block
10:37:00 err yea no scratch that idea
10:37:19 i had the idea of tracking the actual cursor location on the display to see how many cells had been used by each character
10:37:24 thats kind of too late lol
10:37:51 if I said you need a dozen ranges and to check if characters are within those ranges, would that be better
10:37:52 do combining chars always display correctly?
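The "dozen ranges" idea at 10:37:51 could look like this: a hypothetical `cell_width` that returns 0 for combining marks such as U+0308, 2 for common CJK and Hangul ranges, and 1 otherwise. The ranges are a small hand-picked subset and an assumption, not the full Unicode data; POSIX `wcwidth(3)` or tables generated from Unicode's EastAsianWidth.txt are the usual complete sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. */
struct range { uint32_t lo, hi; };

/* Deliberately incomplete: a few blocks of zero-width (combining) code points. */
static const struct range zero_width[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks (U+0308 is here) */
    { 0x200B, 0x200F },   /* zero-width space, ZWNJ, ZWJ, direction marks */
    { 0x20D0, 0x20FF },   /* Combining Diacritical Marks for Symbols */
    { 0xFE20, 0xFE2F },   /* Combining Half Marks */
};

/* Deliberately incomplete: common double-width (East Asian wide) blocks. */
static const struct range double_width[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo leading consonants */
    { 0x2E80, 0x303E },   /* CJK radicals, Kangxi, CJK symbols and punctuation */
    { 0x3041, 0x33FF },   /* Hiragana, Katakana, Bopomofo, CJK compatibility */
    { 0x3400, 0x4DBF },   /* CJK Unified Ideographs Extension A */
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },   /* Hangul syllables */
    { 0xF900, 0xFAFF },   /* CJK Compatibility Ideographs */
    { 0xFF00, 0xFF60 },   /* Fullwidth forms */
    { 0xFFE0, 0xFFE6 },   /* Fullwidth signs */
    { 0x20000, 0x3FFFD }, /* CJK extensions in planes 2 and 3 */
};

static int in_ranges(uint32_t cp, const struct range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return 1;
    return 0;
}

/* Terminal cells a code point occupies: 0 (combining), 1 (narrow) or 2 (wide). */
int cell_width(uint32_t cp)
{
    if (in_ranges(cp, zero_width, sizeof zero_width / sizeof zero_width[0]))
        return 0;
    if (in_ranges(cp, double_width, sizeof double_width / sizeof double_width[0]))
        return 2;
    return 1;
}
```

Box-drawing characters such as U+2501 are "ambiguous width" in Unicode; this sketch treats them as one cell, as most terminals do by default.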
10:38:13 like are there any characters it's illegal to combine with?
10:38:13 for example, with the same font in xterm as i use in gnome terminal my box charsetes do not displayu corectly
10:38:15 for example
10:38:44 I certainly agree that lots of software does this wrong :P
10:38:47 the top line of a window boerder displahysa s ━━━━━
10:38:52 in gnome terminal
10:38:59 but in xterm it displays as ━ ━ ━ ━ ━
10:39:04 with very tiny gaps between
10:39:05 same font
10:39:15 just being rendered differently in different terminals
10:40:00 --- quit: gravicappa (Ping timeout: 245 seconds)
10:40:19 im assuming combining means that one "character" like 'x' in some language might display as 2 physical characters on the display like 'xx'
10:40:24 is that what that means?
10:40:29 other way around
10:40:49 your c code has two chinese characters in it
10:40:51 --- join: gravicappa joined #forth
10:40:56 those woul be displayed in 2 cells
10:41:05 combining char = two unicode scalar values form one character, that fits in one cell
10:41:15 full-width = cjk characters that require 2 char cells
10:41:36 so your example C does not have TWO chinese characters in it but... just one?
10:41:52 is it stored as a single codepoint in the compiled string?
10:41:56 no, it has two
10:42:06 U+0308 is the combining-character example
10:42:19 ok then it has two characters and those two charactes will be displayed in adjacent cells on the display as single characters ?
10:42:21 im lost lol
10:42:40 $0308 emit ̈ ok
10:42:43 oooh i get it
10:42:51 its like you can have an A with dots over it
10:42:57 yeah
10:43:01 or you can display the A and then display the dots over it later!
10:43:12 yep
10:43:40 ok. show me a string in c that uses an A with dots over it but specified with combining characters
10:43:55 and... im not sure how you can do that in a text mode anyway
10:43:59 --- part: f-a left #forth
10:44:00 so i dont think its an issue for me
10:44:21 in a graphical mode you can merge the two before rendering or render one then the other in the same place
10:44:35 https://cdn.remexre.xyz/screenshots/e8cabcf33b176d6939d1ada558f8b82808b6824c.png
10:44:42 ok it's possible my term is fucked
10:44:45 oh wait
10:44:47 this is a bitmap font
10:44:49 sec
10:45:04 https://cdn.remexre.xyz/screenshots/3947f77cc4e3de43a0426a9eb0a9e6d6a49e2efa.png
10:45:06 there we go
10:45:47 hang on
10:46:04 'A' emit $0308 emit Ä ok
10:46:06 it works :P
10:46:11 the terminal handles it
10:46:23 it KNOWS that $308 is a combining char and does it for me
10:46:30 however
10:46:33 :)
10:46:35 drat
10:47:21 i cant store [0000:000a][0000:0308] in consecutive cells of my window array
10:47:44 because those are ONE character
10:47:44 yeah, and there have been nasty terminal bugs about this in the past...
10:48:29 i literally need to be able to take the string containing the combined chars and COMBINE them in some way and store the combined data in my window array
10:48:51 that's normalization
10:48:55 and then when i go to actually output those to the console i need to separate them again and output them individually
10:48:56 again requires big tables
10:49:05 and doesn't always remove all combining characters
10:49:28 yea. maybe if i just say screw combining characters! lol
10:49:34 and be broken like everyone else :/.
10:50:05 yeah... I gave up on TUI instead, and am planning GUI-over-serial-line and "boring" CLI only...
10:50:25 i hate quitting lol
10:50:41 what i can do is make every cell 64 bits!!! lol
10:50:50 cuz everyone has 287456923465 gigs of ram
10:51:11 i hate this idea tho
10:51:23 you can have multiple combining chars on a single char
10:51:41 tho idk if any normal human languages use this
10:51:49 but e.g. your browser supports it (see zalgo text)
10:52:12 lol
10:53:11 i could simplify and say "this supports english utf-8 only" lol
10:53:27 thus obliterating the need for utf-8 in the first place lol
10:53:56 I added comments to this code. I am still debugging to get it working. Just want to check if my comments make sense. http://ix.io/2RNV
10:54:08 well, there are americans who have non-ascii chars in their names
10:54:29 so i ran ida pro erlier and i just got an email from them saying ida tells us you are out of date, here click this link for an update :)
10:55:33 joe9: not following, if the address is false jump to that address
10:55:34 ?
10:55:59 you are jumping to the flag not the address.. shouldnt it be ( addr f --- ) ?
10:56:23 oooh nvm the address is pointed to by esi. my bad
10:56:36 yea the code and comments look good
10:57:00 if you kept top of stack in ebx instead of eax you could do lodsd instead
11:02:43 so next would be lodsd followed by jmp eax
11:04:29 --- quit: xek (Quit: Leaving)
11:16:46 mark4, thanks.
11:18:08 This macro is named (if)
11:18:22 I am not sure if there is a convention on when to put () for names
11:18:49 usually (if) is a primitive for if
11:18:59 if might be an immediate word that compiles (if)
11:19:07 the parens are valid here
11:19:12 ok, thanks.
11:25:31 --- join: Zarutian_HTC joined #forth
11:26:41 --- quit: Zarutian_HTC (Remote host closed the connection)
12:09:41 --- quit: inode (Quit: )
12:15:40 --- join: Zarutian_HTC joined #forth
12:23:55 ok so i just took a look back at your chinese utf8 strings. its compiling the characters not the codepoints
12:24:34 for exampe € is whats displayed. e2 82 ac is what is output to display it, 20ac is the codepoint
12:24:47 your "chinese chars" is being compiled AS CHARACTERS not as codepoints
12:24:50 no good to me :/
12:26:26 $e4 (emit) $b8 (emit) $ad (emit) 中 ok
12:26:37 (emit) writes those characters directly to stdout
12:26:50 so im back to the original problem
12:27:02 how do i compile an array of CODEPOINTS not a stream of characters
12:29:58 e2 82 ac is the codepoint, just in utf-8 encoding while you're probably looking for some other
12:33:43 no its not the codepoint, its the utf8 character
12:33:51 hang on ill give you a non chinese example
12:34:06 ━ is the character
12:34:13 2501 is the codepoint
12:34:37 ━ is the character as displayed
12:34:39 i mean
12:35:03 but the bytes that are output to display that character are different. hang on i need to write code to get it lol
12:35:52 there's no "utf8 character". utf8 is an encoding to map larger numbers onto octets with a few constraints (0..127 are idempotent, it's self synchronizing, there are no 0 bytes except _actual_ NUL)
12:36:38 --- quit: gravicappa (Ping timeout: 256 seconds)
12:39:02 the character in this case is 81 94 e2
12:39:08 the codepoint is 2501
12:39:19 what is displayed is ━
12:39:35 to display it you write the 81 94 e2 to the terminal
12:39:46 but those three bytes are THE CHARACTER not the codepoint
12:43:07 81 94 e2 is backwards - it's not a valid utf8 sequence
12:44:22 nicer to calculate that stuff on a stack though :-)
12:46:39 11100010 10010100 10000001 - first byte starts with 1110 = 3 bytes encoding. first byte gives "0010", second byte gives (stripping leading 10) "010100, third byte gives "000001" = 0010010100000001
12:46:41 2 base ! 0010010100000001 hex . 2501 ok
12:46:59 so yes, that's precisely the code point you're looking for
12:49:40 --- quit: elioat (Quit: elioat)
12:50:28 oh yea my bad
13:26:56 so after another long discussion in a different channel the concensus is that i need to accept "some string in some language" is going to be compiled as utf8 characters and at run time convert that string to utf8 codepoints :/
13:30:10 --- join: f-a joined #forth
13:30:26 --- quit: f-a (Remote host closed the connection)
13:30:42 and i have encode and decode backwards
13:31:36 encoding is going from codepoint to byte sequence and decoding is going from byte sequence to codepoint.
13:31:45 that sounds horribly backwards to me
13:32:53 you can encode everything in UTF32/UCS4, which is just a list/array of 32bit values that contain a codepoint each. but to get those out to a terminal, GUI or any other target you'll have to convert it to whatever that target speaks.
13:33:35 no
13:33:51 or you keep them internally as utf-8 encoded string, where you have a reasonable chance to be able to just dump them byte-by-byte - or still convert them to whatever the target wants
13:33:54 utf32 is not acceptable to me. i dont want my library to be 500 megs in size like libncurses :P
13:34:25 or is it 5 gigs now lol
13:34:40 --- join: f-a joined #forth
13:35:13 if going from utf8 to utf32 means that utf32 is 500megs and utf8 isn't, you're predominantly using ascii characters (the high plane code points are larger in utf8 encoding, at 6 bytes, than they are in utf32, at 4)
13:35:18 ill just take the utf8 decoded bytes in the strings such as win_puts("some string\n") at run time and convert them to the codepoints
13:35:37 also, your utf8 text would still be 125megs if all ascii
13:35:39 lol
13:36:15 my point is is that space efficiency is orders of magnitude more important to me than runtime speed efficiency here
13:36:33 i mean. the executable or .so needs to be as TINY as I can make it
13:37:10 run time will already be using up two buffers of 32 bits per char each for every single window and one buffer of 32 bits for each char per secreen
13:37:30 screens are what are written to the display. windows are written into the screen if the char at X, Y has changed
13:37:34 thus the double buffering
13:38:11 anyway i have to go back to wallymart. went there, got all my stuff and had to leave it there beause my roomie had my card so he could get rent out of my bank lol
13:38:14 my rent not his :P
13:38:16 brb
13:44:00 --- join: elioat joined #forth
13:45:27 --- quit: f-a (Read error: Connection reset by peer)
13:48:45 --- join: f-a joined #forth
14:05:05 --- join: cmtptr joined #forth
14:05:13 omg how long have i not been in this channel!
14:05:51 i wonder what juicy forth gossip i've been missing out on and didn't notice
14:06:14 (rhetorical question btw, the answer is probably 68 days since that's my uptime)
14:06:19 --- quit: elioat (Quit: elioat)
14:37:27 lol
15:54:31 cmtptr: just read the logs :)
16:06:05 sounds like a lot of work
16:09:49 --- quit: f-a (Quit: leaving)
16:19:31 --- join: elioat joined #forth
16:26:36 --- join: f-a joined #forth
16:27:57 --- quit: elioat (Quit: elioat)
16:52:50 --- quit: Zarutian_HTC (Ping timeout: 260 seconds)
16:54:13 --- join: Zarutian_HTC joined #forth
17:03:03 --- quit: f-a (Read error: Connection reset by peer)
17:07:50 --- join: f-a joined #forth
18:20:33 --- join: boru` joined #forth
18:20:36 --- quit: boru (Disconnected by services)
18:20:39 --- nick: boru` -> boru
18:30:17 --- quit: f-a (Quit: leaving)
18:46:48 --- quit: lispmacs[work] (Ping timeout: 240 seconds)
19:08:00 --- join: gravicappa joined #forth
19:16:59 --- quit: Zarutian_HTC (Ping timeout: 276 seconds)
19:42:01 --- join: Zarutian_HTC joined #forth
19:43:46 --- quit: cartwright (Remote host closed the connection)
19:46:12 --- join: cartwright joined #forth
20:04:45 --- join: dave0 joined #forth
20:42:40 --- quit: sts-q (Ping timeout: 260 seconds)
20:47:49 --- quit: proteus-guy (Remote host closed the connection)
20:50:55 --- join: sts-q joined #forth
21:29:23 --- quit: cartwright (Remote host closed the connection)
21:31:39 --- join: cartwright joined #forth
21:42:46 --- quit: dave0 (Quit: dave's not here)
22:01:11 well i have a utf8 decode thats now working. i parse a string of bytes compiled by "俪俨俩俪俭修俯", convert those to codepoints then use my existing codepoint emitter to emit them and i get the right stuff displayed
22:01:28 those were just cut and paste from a dump of utf8 codepoints in my forth :)
22:01:38 they are not as far as i know a valid chinese word/sentence :)
22:02:41 俪俨俩俪俭修俯 4fea 4fe8 4fe9 4fea 4fed 4fee 4fef
22:51:41 --- join: f-a joined #forth
23:09:45 --- quit: f-a (Quit: leaving)
23:59:59 --- log: ended forth/21.03.05
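As settled at 13:31, encoding goes from code point to byte sequence, and that is the job of the "existing codepoint emitter" mentioned at 22:01: each code point has to become UTF-8 bytes before it is written to the terminal. A minimal sketch of that step, using a hypothetical name `utf8_encode` and the same bit layout as the 12:46 walkthrough. It follows RFC 3629, which caps UTF-8 sequences at 4 bytes and code points at U+10FFFF, so 0x110000 and UTF-16 surrogates are rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out[0..3].
   Returns the byte count (1-4), or 0 if cp is not a Unicode scalar value
   (a surrogate U+D800..U+DFFF or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                              /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                             /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                 /* surrogates are not encodable */
    if (cp < 0x10000) {                           /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                         /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                     /* beyond U+10FFFF */
}
```

For U+2501 this yields `e2 94 81`, the same three bytes decoded by hand at 12:46, and for U+4FEA (the first character of the 22:02 test string) it yields `e4 bf aa`.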