00:00:00 --- log: started forth/21.03.05
00:32:29 --- quit: scoofy (Ping timeout: 276 seconds)
01:03:50 --- join: scoofy joined #forth
01:32:18 <joe9> inode, felix forth
01:54:05 <inode> i guess you'd have to first implement a means of calling signal(2)/sigaction(2) to register a handler for SIGSEGV?
02:20:04 <joe9> inode, ok, thanks.
02:24:02 <inode> at least i didn't see anything at all for handling signals that would be generated by illegal memory access when skimming through that ff repo?
02:46:30 <joe9> yes, there is none. you are correct. it is a simple code base, easier to understand.
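A minimal sketch of what inode describes, written in C rather than inside ff: register a SIGSEGV handler with sigaction(2) and jump back to a recovery point instead of dying. The names install_segv_handler, run_guarded, deliberate_fault and harmless are hypothetical and do not come from the ff repo. Jumping out of a SIGSEGV handler is only sensible when the fault is well understood, so this is an illustration, not a recommendation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Recovery point used by the handler (hypothetical name). */
static sigjmp_buf fault_env;

/* Handler: abandon the faulting code and return to the recovery point. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(fault_env, 1);
}

/* Registers on_segv for SIGSEGV; returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Runs word(); returns 1 if it raised SIGSEGV, 0 if it returned normally. */
int run_guarded(void (*word)(void))
{
    assert(word != NULL);
    /* Second argument 1: save the signal mask so SIGSEGV is unblocked again
       after siglongjmp leaves the handler. */
    if (sigsetjmp(fault_env, 1) != 0)
        return 1;
    word();
    return 0;
}

/* Stand-ins for a word that faults and a word that does not. */
void deliberate_fault(void) { raise(SIGSEGV); }
void harmless(void) { }
```

A Forth built on this would call install_segv_handler once at startup and wrap its interpreter loop (or each executed word) in something like run_guarded, printing an error and resetting the stacks when it returns 1.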
02:51:17 --- quit: Croran (Ping timeout: 240 seconds)
02:54:29 --- join: Croran joined #forth
03:08:48 --- join: hosewiejacke joined #forth
04:58:03 <joe9> ?dup dup if the top of stack is not zero?
05:01:45 --- join: xek joined #forth
05:10:51 --- quit: Zarutian_HTC1 (Remote host closed the connection)
05:13:55 <proteusguy> correct
05:14:38 <joe9> thanks
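The behaviour confirmed above, ?dup ( x -- 0 | x x ), can be modelled in a few lines of C. The data_stack type and the push and question_dup names are assumptions made for this sketch; they are not from any particular Forth.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CELLS 16

/* A toy data stack (hypothetical type, for illustration only). */
typedef struct {
    long cell[STACK_CELLS];
    size_t depth;
} data_stack;

void push(data_stack *s, long value)
{
    assert(s != NULL && s->depth < STACK_CELLS);
    s->cell[s->depth++] = value;
}

/* ?dup: duplicate the top of stack only when it is non-zero. */
void question_dup(data_stack *s)
{
    assert(s != NULL && s->depth > 0);
    long top = s->cell[s->depth - 1];
    if (top != 0)
        push(s, top);
}
```

The usual reason for the word is the idiom `?dup if ... then`, where a zero flag is consumed by `if` and a non-zero value is still available inside the branch.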
05:20:09 --- join: f-a joined #forth
05:36:40 --- quit: hosewiejacke (Ping timeout: 245 seconds)
05:41:55 --- join: hosewiejacke joined #forth
05:56:35 --- join: elioat joined #forth
06:06:58 --- quit: hosewiejacke (Remote host closed the connection)
06:32:01 --- join: hosewiejacke joined #forth
07:16:28 --- join: mark4 joined #forth
07:20:51 <veltas> gdb is very good.... at C. At Forth, a good Forth has its own debugger, but you can still use it, just with a bit of pain. I was using it to debug mark4's x64 code, better than nothing.
07:21:36 <f-a> uhhh does gforth have a debugger
07:21:39 * f-a checks
07:21:50 <mark4> actually traditionally a good forth needs no debugger in the traditional sense.  you dont debug forth by single stepping it normally
07:21:51 <mark4> but....
07:22:10 <mark4> x4 used to have a fully working one years ago but things changed and i never kept up with it
07:22:18 <mark4> its also on the todo list to get working again
07:30:24 <crc> the only part of my debugger that I actually use is the disassembler
07:31:28 <f-a> well
07:31:37 <f-a> say you have a problem with the logic of your program
07:31:48 <f-a> or some function leaving crap on the stack etc
07:31:55 <f-a> where do you start from then, to debug?
07:32:13 <f-a> I usually sprinkle ." stuff" in my words but that seems not efficient
07:33:08 <crc> I normally spend a rather long time thinking through the logic before I start coding for anything that's not trivial
07:34:52 <f-a> it is a good attitude, shit happens regardless :P
07:35:03 <f-a> imagine you are looking at a piece of code you did *not* write yourself
07:38:12 <crc> That's harder
07:38:38 <crc> I print out and review the code, mentally walking through as much as possible
07:39:29 <crc> If it has issues, I can run it under the single stepper or with execution tracing, but that's the extent of my debugging tools
07:40:14 <crc> Most of my debugging work is done on paper or in my head
08:12:52 --- quit: f-a (Remote host closed the connection)
08:26:05 <mark4> depth . in various places helps too 
08:26:28 <mark4> crc if its so complificated i cant just write it off the top of my head i like to code it on paper first
08:27:00 <mark4> and i find more bugs just scanning slowly through my code or taking the time to properly comment them
08:27:26 <mark4> if you have thought about your code long enough to explain it to someone else you have thought about it long enough to code it
08:27:36 <mark4> tho in practice thats an iterative process :)
08:27:53 <mark4> write a primitive. test THAT primitive
08:28:04 <mark4> write another primitive and immediately test that one too
08:28:28 <mark4> use your already tested primitives to create higher level definitions and then test them as soon as you write them
08:28:33 <mark4> : foo .... ;  1 2 3 foo
08:28:38 <mark4> : bar .... ;  1 2 3 bar
08:28:45 <mark4> : blah foo blah ;  1 2 3 blah
08:29:01 <mark4> : blah foo bar ; 1 2 3 blah i mean
08:29:28 <mark4> thats the theory of how to develop forth properly... dont ask me if thats what i always do or not lol
08:29:29 <mark4> shhh
08:29:43 <mark4> modern theory calls that TDD
08:30:03 <mark4> forth calls it the status quo, the norm, just what we do (tm)
08:42:26 --- join: Zarutian_HTC joined #forth
09:20:44 --- mode: ChanServ set +v mark4
09:31:44 <veltas> I don't know what most people consider a 'debugger' but being able to step through code or print some kind of trace would suffice, gforth can do both of those and any forth can or could with a small amount of work
09:32:35 <veltas> A forth is flexible enough that its interactive mode is more useful for debugging than what you get with some other langs
09:33:21 <veltas> My forth doesn't have any debug features other than .S so far, and I still find myself e.g. using ' to locate addresses etc easily while debugging
09:37:50 --- quit: KipIngram (Ping timeout: 276 seconds)
09:38:02 <mark4> anyone here know how to compile a utf8 string in C because ##c just gives you a circle jerk
09:38:13 <mark4> u8"blah" is wrong 
09:38:27 <mark4> L"blah" gives utf32 not utf8
09:38:42 <MrMobius> mark4, I havent tried but could you use godbolt to test?
09:38:47 <remexre> by default (gcc, linux, utf8 locale) strings should be utf8, no?
09:39:09 <mark4> i know u8"blah" is wrong because when i use it there is NO string in there that i can find
09:39:32 <mark4> i dont want "BLAH" encoded as 'B' 'L' 'A' 'H', those are the ascii characters
09:39:53 <remexre> ascii is a subset of utf-8?
09:39:56 <mark4> i need to be able to encode strings containing ANY characters
09:40:34 <remexre> https://godbolt.org/z/Wa63eT ?
09:40:35 <MrMobius> wouldnt "BLAH" be encoded as 'B' 'L' 'A' 'H' but then anything outside of what normally fits would get a preceding escape byte?
09:41:04 <MrMobius> ie isnt utf8 identical until it has to encode extended characters?
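The point remexre and MrMobius are making can be checked directly: for text that only uses code points below 0x80, the UTF-8 bytes are the ASCII bytes. The sketch below assumes a compiler whose execution character set is UTF-8 (the gcc default); is_all_ascii and u8_literal_matches_plain are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte is below 0x80, i.e. the string is pure ASCII
   and therefore byte-for-byte identical in UTF-8. */
int is_all_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return 0;
    return 1;
}

/* Compares a u8"" literal with the same plain literal, NUL included.
   Assumption: the plain literal is also encoded as UTF-8 by the compiler. */
int u8_literal_matches_plain(void)
{
    const char plain[] = "The cow jumped over the moon";
    const unsigned char *utf8 =
        (const unsigned char *)u8"The cow jumped over the moon";
    return memcmp(plain, utf8, sizeof plain) == 0;
}
```

So u8"..." and "..." only differ when the compiler's execution character set is something other than UTF-8; multi-byte sequences appear only for code points at or above 0x80.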
09:41:13 <mark4> https://dpaste.com/EH8FX2XJ5
09:41:33 <mark4> thats how you decode the codepoints
09:42:53 <mark4> i need to be able to store EVERY CHARACTER as a specific width codepoint 
09:43:23 <remexre> don't you want utf-32 then, assuming character=unicode scalar value?
09:43:27 <mark4> no
09:43:39 <mark4> i actually dont want utf-32
09:45:49 <joe9> I am slowly working through this code and trying to figure out what it does as it is leaving 4 values on the stack and I cannot figure out why. It is doing this before the interpret routine call. So, I figure it is reading some input and figuring out if the input should be accepted or not.
09:45:52 <joe9> http://ix.io/2RNq
09:46:00 <mark4> LITERALLY all i need right now is to do u8"The cow jumped over the moon" and printf it and then look in the binary for how that string got encoded
09:46:03 <mark4> guess what. i can printf it
09:46:13 <mark4> guess what... i cant see that string anywhere in the binary
09:46:27 <joe9> I understand that it would be hard to figure out from this piece. I was not sure if this is a common task across forths.
09:46:31 <mark4> its NOT being encoded as 'T', 'h', 'e', ' ', .. . . .
09:47:02 <mark4> i did not port it down to x4 but if you go into x64 you can literally do $2501  emit
09:47:08 <remexre> it is on my machine... https://cdn.remexre.xyz/screenshots/91b0cd7710f26db15eac541d3c1d963d70d99b27.png
09:47:21 <joe9> mark4, could it be 16 bit runes?
09:47:26 <mark4> $2501 emit ━ ok
09:47:28 <mark4> wtf is a rune?
09:47:32 <mark4> NOBODY talks about runes
09:48:04 <mark4> $2563 emit ╣ ok
09:48:08 <mark4> they talk about characters
09:48:12 <mark4> they talk about code points
09:48:38 <remexre> mark4: are you trying to get https://cdn.remexre.xyz/screenshots/06ba3f3942ec57f04f49b3bba0206ef0106707a2.png ?
09:48:49 <inode> instead of whinging about it, why don't you poke around in a debugger to find the string? :)
09:48:51 --- join: KipIngram joined #forth
09:49:06 <mark4> no
09:49:14 --- nick: KipIngram -> Guest79368
09:49:25 <mark4> im trying to get a string "xxxxxxxx" where all of those chars are encoded as their utf-8 codepoint
09:49:32 <mark4> no matter WHAT characters they are
09:49:47 <remexre> characters don't have a single codepoint in utf-8, they're a variable-width sequence of bytes
09:49:54 <mark4> the fucking string is not in there! i have looked with mc i have also looked with ida-pro
09:50:11 <remexre> what sequence of bytes are you expecting to be there
09:50:30 <remexre> for the case of puts("\u2563");
09:50:38 <mark4> u8"The cow jumped over the moon"  <-- the ones i specified in the u8 string
09:50:46 <mark4> what is puts ?
09:50:52 <mark4> does not sound like c ? :)
09:50:55 <inode> it is
09:51:00 <inode> print a string to stdout
09:51:11 <mark4> like printf but without formatting ok
09:51:16 <mark4> i never actually encountered puts ever
09:51:59 <remexre> can you upload your binary somewhere?
09:52:05 <remexre> (or the .o file, or .S file)
09:52:59 <mark4> actually i got it now.  i dont understand why adding puts() of the string suddenly makes it visible in the code but it did
09:53:23 <mark4> and i was compiling with -O0 so the unused string should not have been purged from the binary
09:53:40 <remexre> possibly the varargs calling convention for your platform makes something funky? dunno
09:53:58 <mark4> i was not using printf, i was not doing anything with the string till i added the puts
09:53:59 <mark4> hang on
09:54:34 <mark4> https://dpaste.com/5VQHFSE56
09:54:39 <mark4> i just added the puts
09:57:03 <mark4> https://dpaste.com/8BM5CQ5JM
09:57:25 <mark4> and while i can see the string in a binary dump. ida-pro still seems to be having issues. i cant see it anywhere in there
09:57:33 <mark4> it trying to disassemble the string as opcodes?
09:58:09 <Zarutian_HTC> counted or null byte terminated?
09:58:14 <remexre> I suspect that's because of the array
09:58:28 <remexre> you're specifying that it goes into a *mutable, stack-allocated* array
09:58:38 <remexre> you probably want const char* foo = "asdfasdf";
09:58:48 <remexre> to make foo a pointer to a .rodata-allocated constant string
09:59:00 <remexre> so what's happening is, you're stack-allocing the array you're putting the string into
09:59:05 <remexre> then moving constant chunks of it in
09:59:14 <mark4> let me try that :)
09:59:43 <remexre> the murder weapon: https://cdn.remexre.xyz/screenshots/7569a634f6a8432558cc9413f1b501f1545b4264.png
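The distinction remexre draws can be sketched as follows. With the array form the bytes are copied into a local, writable object, so the compiler is free to build it with immediate stores instead of keeping one contiguous string; with the pointer form the variable refers to a single string literal object, which gcc typically places in .rodata. The function names are hypothetical, and the claim about section placement is compiler behaviour, not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array form: msg is a private, mutable copy of the literal's bytes. */
size_t array_form_size(void)
{
    char msg[] = "The cow jumped over the moon";
    msg[0] = 't';            /* legal: we own this storage */
    assert(msg[0] == 't');
    return sizeof msg;       /* 28 characters plus the terminating NUL */
}

/* Pointer form: msg points at the literal itself, which has static
   storage duration and must not be modified. */
const char *pointer_form(void)
{
    const char *msg = "The cow jumped over the moon";
    return msg;              /* still valid after the function returns */
}
```

Searching a binary for the text is therefore much more likely to succeed with the pointer form, which matches what mark4 sees in the next lines.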
10:00:24 <mark4> aTheCowJumpedOv db 'The cow jumped over the moon',0
10:00:29 <mark4> it literally didnt help me lol
10:00:54 <mark4> i need a chinese person to enter a chinese string in my foo.c :)
10:00:58 <mark4> and give it back to me
10:01:27 <remexre> asdf̈u :P
10:01:35 <remexre> oh wack my irc client displays that wrong...
10:01:40 <mark4> lol
10:02:02 <mark4> i saw adsfu was that supposed to be chinese for piss off? :)
10:02:23 <remexre> nah, asdf + combining diaeresis + u
10:03:16 <mark4> lol
10:03:39 <mark4> o the F seems to have dots over it unless thats my eyes going fuzzy
10:05:27 <remexre> yeah
10:06:05 <mark4> i REALLY REALLY do NOT want to have to include some 400gig utf8 string library in my 40k binary
10:06:58 <mark4> im not actually going to be puts'ing strings. i need to implement a puts/printf like function that prints them into one of my TUI windows
10:07:10 <mark4> and my windows do not use variable width characters
10:07:49 <mark4> its fine for the strings to be variable width, i just need to know how to extract each character from those strings one at a time and to place them into my windows at the current cursor location
10:07:54 <remexre> mmmmmmm
10:07:57 <remexre> so
10:08:08 <remexre> full-width characters
10:08:24 <mark4> well. they dont need to be and probably should not be in the sources
10:08:33 <mark4> but they need to be when they are emitted to the window
10:09:01 <mark4> what i should do is commit my code as it now stands and then mark the github repo as no longer private
10:09:13 <mark4> but i might want to sell this :)
10:09:18 <remexre> I think you probably need the annoying unicode tables for any sort of "figure out how many character cells wide this string is"
10:10:31 <mark4> just got a call from the "warranty center" lol
10:10:33 --- nick: Guest79368 -> KipIngram
10:10:45 --- mode: ChanServ set +v KipIngram
10:10:59 <mark4> i didnt hang up i just said "hey banchod does your mother know that you scam people?" lol
10:11:03 <mark4> he hung up
10:11:41 <mark4> i know how to decode the codepoints and decompose them to their character sequences
10:11:59 <mark4> the first byte tells me how many bytes are in the codepoint
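The rule mark4 states here (the lead byte gives the sequence length) can be written down as a small function. This is a simplified sketch: utf8_seq_len is a hypothetical name, and it does not reject the lead bytes 0xC0, 0xC1 and 0xF5..0xF7, which a strict decoder would treat as invalid.

```c
#include <assert.h>

/* Returns the length in bytes (1..4) of the UTF-8 sequence that starts
   with `lead`, or 0 if `lead` is a continuation byte or otherwise cannot
   start a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;   /* 0xxxxxxx: ASCII           */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx                  */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx                  */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx                  */
    return 0;                              /* 10xxxxxx or 11111xxx      */
}

/* Usage with byte values that appear in this log. */
void utf8_seq_len_demo(void)
{
    assert(utf8_seq_len('T') == 1);
    assert(utf8_seq_len(0xE2) == 3);   /* lead byte of e2 94 81 */
    assert(utf8_seq_len(0x94) == 0);   /* continuation byte     */
}
```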
10:12:11 <mark4> but. erm.  i need to be able to compile strings AS CODEPOINTS!!!!
10:12:20 <mark4> i dont know if u8"blah" does that properly or not
10:12:33 <remexre> so that's outside the C spec, but what gcc will do is
10:13:06 <remexre> if you're referring to them by address (i.e. not as an array initializer), outside of constant folding and other optimizations, it'll put strings in the rodata section
10:13:18 <remexre> though not every string in the source will end up as a distinct string in rodata
10:13:29 <remexre> e.g. ("foo" == "foo") may or may not be true
10:14:05 <mark4> really what i need to be able to do is give someone using my code to do
10:14:24 <mark4> win_puts("any string in any language");
10:14:31 --- quit: Zarutian_HTC (Remote host closed the connection)
10:14:32 <mark4> i.e. i need to be able to implement that function
10:14:39 <remexre> also regardless, the length of a unicode scalar value in code points isn't sufficient to determine its length on a terminal
10:14:43 <mark4> so.. that function needs to be able to parse the given string
10:15:05 <mark4> the string is not written to the "window" as a sequence of "characters"
10:15:11 <mark4> its written as an array of codepoints
10:16:01 <remexre> not every code point is one character cell wide
10:16:06 <remexre> e.g. U+0308
10:16:06 <mark4> i.e. given a cell of the window containing $2500 it will decompose that cell into those characters at display time
10:17:28 <mark4> i know that u8"abcd" will be compiled identically to "abcd" 
10:17:44 <mark4> i can handle straight ascii. 
10:18:29 <mark4> my win_puts() or win_printf() functions need to be able to read the next item from the string and place that one item in a given cell of the window array
10:18:40 <mark4> i.e. i need to be able to get string[x] for any index of x
10:18:49 <mark4> for ANY possible string in any language
10:19:10 <mark4> that part is trivial
10:19:10 <remexre> what is x measured in? scalar values? code points? character cells? things-a-human-considers a character?
10:19:26 <mark4> in codepoints
10:19:30 <mark4> is what i want
10:19:32 <remexre> you need utf-32
10:19:37 <mark4> nope
10:19:42 <remexre> or a second array of where things start
10:19:57 <remexre> utf-8 is inherently variable-width
10:20:08 <mark4> im not handling the string as an array in that way... ill be parsing through it from the beginning, extracting each 8 bit byte out of it till i have exactly one code point
10:20:11 <mark4> THAT i can do
10:20:32 <mark4> what i need to know is how i can specify a string in C that is encoded as CODEPOINTS!!!!
10:20:55 <mark4> yes. i understand im not looking for the X'th character that was not an exact example
10:21:02 <mark4> i cant do x++ to get the next character
10:21:22 <mark4> i need to parse forward of the current index till i get to the next index
10:21:32 <mark4> i understand that
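Why string[x] is not a constant-time operation on UTF-8, and what "parse forward" amounts to, can be sketched by skipping continuation bytes (those of the form 10xxxxxx). The functions utf8_count and utf8_offset_of are hypothetical names, and both assume the input is already valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Number of code points in a valid, NUL-terminated UTF-8 string:
   every byte that is not a continuation byte starts a new code point. */
size_t utf8_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (!is_continuation((unsigned char)*s))
            n++;
    return n;
}

/* Byte offset at which code point number `index` (0-based) starts.
   Returns the offset of the terminating NUL if `index` is past the end. */
size_t utf8_offset_of(const char *s, size_t index)
{
    assert(s != NULL);
    size_t off = 0;
    while (s[off] != '\0') {
        if (!is_continuation((unsigned char)s[off])) {
            if (index == 0)
                return off;
            index--;
        }
        off++;
    }
    return off;
}
```

Both functions are linear in the length of the string, which is the cost remexre is pointing at when suggesting UTF-32 or a second array of start offsets.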
10:21:49 <mark4> what i do NOT understand is how to specify a string in C so that it is encoded as a stream of utf8 codepoints
10:22:00 <mark4> not utf16 not utf32. utf8 
10:22:08 <mark4> SPECIFICALLY a stream of utf8 codepoints
10:22:09 --- join: f-a joined #forth
10:22:23 <mark4> and i do not know that u8"blah blah" does that
10:22:30 <nihilazo> it's kinda a shame that utf-8 is relatively difficult to handle because it leads to so many english-speaking developers building things that don't support international text
10:22:47 <mark4> yes and thats what im trying to accomplish
10:22:52 <nihilazo> but idk either in C, I've mostly worked in languages like go where you're lucky enough to have rune[]
10:22:53 <nihilazo> sorry
10:22:54 <remexre> I'm 99.9% sure that unless you're using a bizarro compiler, it's doing exactly that, as long as you follow the rules I stated above
10:23:04 <remexre> > if you're referring to them by address (i.e. not as an array initializer), outside of constant folding and other optimizations, it'll put strings in the rodata section 
10:23:07 <mark4> i KNOW my existing code can display any character in any language just given its CODE POINT of what ever length it is
10:23:49 <mark4> if a string of utf8 characters is encoded as xx xx  yy zz zz zz aa aa bb cc cc  then i need to be able to parse the X char, the Y char, the Z char the A char and the B and C chars
10:23:53 <mark4> i can do that!!!!!!!!!!!!1
10:23:57 <mark4> thats FUCKING TRIVIAL!!!!!!!!!!!
10:24:03 <mark4> theres no rocket surgery involved there
10:24:12 <mark4> how the FUCK do i specify that string in C
10:24:26 <mark4> char foo= "abc" does not do that
10:24:31 <remexre> char* foo = "abc";
10:24:39 <mark4> i dont know if char foo = u8"abc" does that
10:24:56 <mark4> what if my abc string is not 'a' 'b' 'c'
10:25:03 <mark4> what if its a chinese word
10:25:06 <mark4> or japanese
10:25:08 <mark4> or korean
10:25:12 <mark4> or indonesian
10:25:18 <mark4> or .. ..  . . 
10:25:32 <mark4> i need to be able to compile strings in ANY LANGUAGE!!!
10:25:34 <remexre> either this works or your compiler is ISO-incompliant (or doesn't support utf8)
10:25:36 <mark4> as utf8 codepoints
10:25:44 <mark4> oh
10:25:54 <remexre> when you specify char f[];, you're not requesting that the string be in .rodata though
10:25:56 <mark4> show me an example of compiling a string as a stream of utf8 codepoints
10:25:57 <remexre> that was the previous problem
10:25:59 <mark4> and PROVE thats what it does?
10:26:13 <mark4> i dont give a fuck where its compiled to
10:26:18 <mark4> as long as i can access it 
10:26:41 <mark4> u8"chinese sentence here"  <-- does this compile that chinese sentence as a stream of utf8 codepoints?
10:26:56 <mark4> thats ALL i care about right now
10:27:08 <mark4> the parsing of that string is MY problem and i already know how to handle that
10:27:14 --- quit: dave0 (Quit: dave's not here)
10:27:40 <mark4> as a stream of variable width utf-8 codepoints of course
10:28:09 <inode> what's the widest utf-8 codepoint?
10:28:15 <mark4> not as 000x 00xx 0xxx xxxx values but as xxxxxxxxx and all mashed up together as a stream of codepoints that need to be handled
10:28:19 <mark4> well 32 bits
10:28:28 <remexre> mark4: https://cdn.remexre.xyz/screenshots/5fa65e3e6974a92b8ce65b9bdeb96a378f00f425.png ?
10:28:29 <mark4> the highest utf8 codepoint is something like 0x110000
10:29:07 <mark4> remexre: you didnt even specify that those were utf8 characters. you can do that?
10:29:12 <remexre> yeah
10:29:20 <mark4> no need for the bullshit u8"xxxxx" visual clutter?
10:29:22 <mark4> !!!!!!!!!!!!!!
10:29:25 <remexre> i'm using a utf-8 locale with gcc
10:29:33 <remexre> which is like, eminently reasonable
10:29:50 <remexre> if you're using some 80s POS compiler with a shift-jis locale, that's when u8"" is useful
10:29:55 <remexre> because maybe the compiler is dumb
10:29:57 <mark4> err how do you "use a utf-8 locale with c" ?
10:30:10 <mark4> well the compiler in this case is the most recent gcc
10:30:15 <remexre> like my system locale is a utf-8 locale
10:30:18 <mark4> compiled with the c17 standard
10:30:24 <mark4> oooh ok
10:30:46 <mark4> so. really i dont need to worry i can just implement win_puts(win, "blah");
10:31:02 <remexre> you have to worry about combining characters and full-width characters
10:31:09 <remexre> if you're doing a TUI
10:31:12 <remexre> but if you're not, yeah
10:31:35 <mark4> the first byte in your code is a 0xEn byte
10:31:49 <mark4> that tells me how many 8 bit bytes there are in that codepoint
10:32:00 <mark4> e4 bd ad    
10:32:04 <remexre> right, but not how many spaces on a screen the character occupies
10:32:05 <mark4> e5 9b bd
10:32:10 <mark4> look i can even do it in my head :)
10:32:18 <remexre> e.g. both of those characters are two character cells wide
10:32:22 <remexre> and U+0308 is "zero"
10:32:37 <remexre> in that it modifies the previous character instead of occupying its own character cell
10:32:47 <mark4> hang on give me a sec
10:33:37 <mark4> ok yea my emit in my forth must be expecting 16 bit codepoints only thats a bug
10:33:40 <mark4> i can fix that
10:34:26 <mark4> no actually im not sure whats going on there.
10:35:01 <mark4> let me see if my C code can emit those chinese chars correctly
10:35:19 <mark4> ooooh i see a problem lol
10:35:29 <mark4> erm. ok so... how do i tell how many cells each char takes?
10:35:35 --- part: hosewiejacke left #forth
10:36:05 <remexre> that's where you need a bloated table, sadly
10:36:09 <mark4> i am 99.99% sure my c code will write the correct sequence of bytes to display those chars but... while those chars take up one cell of the window array
10:36:14 <mark4> they take up 2 bytes of the display space
10:36:44 <remexre> I think, 90% of the time you can tell whether a character is wider than one char-cell from its block
10:36:50 <mark4> ooooh! lol nope
10:36:59 <remexre> and afaik you can always tell whether a char is a combining char by block
10:37:00 <mark4> err yea no scratch that idea
10:37:19 <mark4> i had the idea of tracking the actual cursor location on the display to see how many cells had been used by each character
10:37:24 <mark4> thats kind of too late lol
10:37:51 <remexre> if I said you need a dozen ranges and to check if characters are within those ranges, would that be better
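The "dozen ranges" idea could look roughly like the sketch below. The table is deliberately incomplete and only illustrative: it covers the combining marks and CJK ranges mentioned in this conversation plus two more wide blocks, whereas a correct implementation would be generated from the Unicode East Asian Width and combining-class data. width_range, width_table and cell_width are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t first;   /* first code point of the range (inclusive) */
    uint32_t last;    /* last code point of the range (inclusive)  */
    int width;        /* character cells occupied                  */
} width_range;

/* Illustrative subset only; NOT a complete width table. */
static const width_range width_table[] = {
    { 0x0300, 0x036F, 0 },   /* Combining Diacritical Marks */
    { 0x4E00, 0x9FFF, 2 },   /* CJK Unified Ideographs      */
    { 0xAC00, 0xD7A3, 2 },   /* Hangul Syllables            */
    { 0xFF01, 0xFF60, 2 },   /* Fullwidth ASCII variants    */
};

/* Returns how many character cells `cp` is assumed to occupy:
   0 for listed combining marks, 2 for listed wide ranges, else 1. */
int cell_width(uint32_t cp)
{
    for (size_t i = 0; i < sizeof width_table / sizeof width_table[0]; i++)
        if (cp >= width_table[i].first && cp <= width_table[i].last)
            return width_table[i].width;
    return 1;
}

/* Usage with code points that appear in this log. */
void cell_width_demo(void)
{
    assert(cell_width('A') == 1);
    assert(cell_width(0x0308) == 0);   /* combining diaeresis        */
    assert(cell_width(0x4E2D) == 2);   /* CJK ideograph (e4 b8 ad)   */
}
```

A linear scan is fine for a handful of ranges; a full table would normally be sorted and searched with a binary search.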
10:37:52 <mark4> do combining chars always display correctly?
10:38:13 <remexre> like are there any characters it's illegal to combine with?
10:38:13 <mark4> for example, with the same font in xterm as i use in gnome terminal my box charsets do not display correctly
10:38:15 <mark4> for example
10:38:44 <remexre> I certainly agree that lots of software does this wrong :P
10:38:47 <mark4> the top line of a window border displays as ━━━━━
10:38:52 <mark4> in gnome terminal
10:38:59 <mark4> but in xterm it displays as ━ ━ ━ ━ ━
10:39:04 <mark4> with very tiny gaps between
10:39:05 <mark4> same font
10:39:15 <mark4> just being rendered differently in different terminals
10:40:00 --- quit: gravicappa (Ping timeout: 245 seconds)
10:40:19 <mark4> im assuming combining means that one "character" like 'x' in some language might display as 2 physical characters on the display like 'xx'
10:40:24 <mark4> is that what that means?
10:40:29 <remexre> other way around
10:40:49 <mark4> your c code has two chinese characters in it
10:40:51 --- join: gravicappa joined #forth
10:40:56 <mark4> those would be displayed in 2 cells
10:41:05 <remexre> combining char = two unicode scalar values form one character, that fits in one cell
10:41:15 <remexre> full-width = cjk characters that require 2 char cells
10:41:36 <mark4> so your example C does not have TWO chinese characters in it but... just one?
10:41:52 <mark4> is it stored as a single codepoint in the compiled string?
10:41:56 <remexre> no, it has two
10:42:06 <remexre> U+0308  is the combining-character example
10:42:19 <mark4> ok then it has two characters and those two characters will be displayed in adjacent cells on the display as single characters ?
10:42:21 <mark4> im lost lol
10:42:40 <mark4> $0308 emit  ̈ ok
10:42:43 <mark4> oooh i get it
10:42:51 <mark4> its like you can have an A with dots over it
10:42:57 <remexre> yeah
10:43:01 <mark4> or you can display the A and then display the dots over it later!
10:43:12 <remexre> yep
10:43:40 <mark4> ok. show me a string in c that uses an A with dots over it but specified with combining characters 
10:43:55 <mark4> and... im not sure how you can do that in a text mode anyway
10:43:59 --- part: f-a left #forth
10:44:00 <mark4> so i dont think its an issue for me
10:44:21 <mark4> in a graphical mode you can merge the two before rendering or render one then the other in the same place
10:44:35 <remexre> https://cdn.remexre.xyz/screenshots/e8cabcf33b176d6939d1ada558f8b82808b6824c.png
10:44:42 <remexre> ok it's possible my term is fucked
10:44:45 <remexre> oh wait
10:44:47 <remexre> this is a bitmap font
10:44:49 <remexre> sec
10:45:04 <remexre> https://cdn.remexre.xyz/screenshots/3947f77cc4e3de43a0426a9eb0a9e6d6a49e2efa.png
10:45:06 <remexre> there we go
10:45:47 <mark4> hang on
10:46:04 <mark4> 'A' emit $0308 emit Ä ok
10:46:06 <mark4> it works :P
10:46:11 <mark4> the terminal handles it
10:46:23 <mark4> it KNOWS that $308 is a combining char and does it for me
10:46:30 <mark4> however
10:46:33 <mark4> :)
10:46:35 <mark4> drat
10:47:21 <mark4> i cant store [0000:0041][0000:0308] in consecutive cells of my window array
10:47:44 <mark4> because those are ONE character 
10:47:44 <remexre> yeah, and there have been nasty terminal bugs about this in the past...
10:48:29 <mark4> i literally need to be able to take the string containing the combined chars and COMBINE them in some way and store the combined data in my window array
10:48:51 <remexre> that's normalization
10:48:55 <mark4> and then when i go to actually output those to the console i need to separate them again and output them individually
10:48:56 <remexre> again requires big tables
10:49:05 <remexre> and doesn't always remove all combining characters
10:49:28 <mark4> yea. maybe if i just say screw combining characters! lol
10:49:34 <mark4> and be broken like everyone else :/.
10:50:05 <remexre> yeah... I gave up on TUI instead, and am planning GUI-over-serial-line and "boring" CLI only...
10:50:25 <mark4> i hate quitting lol
10:50:41 <mark4> what i can do is make every cell 64 bits!!! lol
10:50:50 <mark4> cuz everyone has 287456923465 gigs of ram
10:51:11 <mark4> i hate this idea tho
10:51:23 <remexre> you can have multiple combining chars on a single char
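One way to picture the 64-bit cell idea, and the limitation remexre raises, is a cell that holds a base code point plus room for exactly one combining mark. Everything here is hypothetical (wide_cell, cell_put, and the choice to detect combining marks only in U+0300..U+036F); a second mark on the same base simply does not fit and is reported as dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* code point that occupies the cell        */
    uint32_t mark;   /* one combining mark, or 0 when there is none */
} wide_cell;         /* 64 bits per cell                          */

/* Simplification: only the Combining Diacritical Marks block counts. */
static int is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Appends `cp` to cells[0..cap). *used is the number of cells filled.
   A combining mark is attached to the previous cell instead of taking a
   new one. Returns 1 if stored, 0 if it had to be dropped. */
int cell_put(wide_cell *cells, size_t cap, size_t *used, uint32_t cp)
{
    assert(cells != NULL && used != NULL);
    if (is_combining(cp)) {
        if (*used == 0 || cells[*used - 1].mark != 0)
            return 0;   /* nothing to attach to, or the one slot is taken */
        cells[*used - 1].mark = cp;
        return 1;
    }
    if (*used >= cap)
        return 0;
    cells[*used].base = cp;
    cells[*used].mark = 0;
    (*used)++;
    return 1;
}
```

Supporting an arbitrary number of marks per cell needs either a variable-length side buffer or normalization to precomposed characters where they exist, which is where the big tables come back in.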
10:51:41 <remexre> tho idk if any normal human languages use this
10:51:49 <remexre> but e.g. your browser supports it (see zalgo text)
10:52:12 <mark4> lol
10:53:11 <mark4> i could simplify and say "this supports english utf-8 only" lol
10:53:27 <mark4> thus obliterating the need for utf-8 in the first place lol
10:53:56 <joe9> I added comments to this code. I am still debugging to get it working. Just want to check if my comments make sense. http://ix.io/2RNV
10:54:08 <remexre> well, there are americans who have non-ascii chars in their names
10:54:29 <mark4> so i ran ida pro earlier and i just got an email from them saying ida tells us you are out of date, here click this link for an update :)
10:55:33 <mark4> joe9: not following, if the address is false jump to that address
10:55:34 <mark4> ?
10:55:59 <mark4> you are jumping to the flag not the address..  shouldnt it be ( addr f --- ) ?
10:56:23 <mark4> oooh nvm the address is pointed to by esi.  my bad
10:56:36 <mark4> yea the code and comments look good
10:57:00 <mark4> if you kept top of stack in ebx instead of eax you could do lodsd instead
11:02:43 <mark4> so next would be lodsd followed by jmp eax
11:04:29 --- quit: xek (Quit: Leaving)
11:16:46 <joe9> mark4, thanks.
11:18:08 <joe9> This macro is named (if)
11:18:22 <joe9> I am not sure if there is a convention on when to put () for names
11:18:49 <mark4> usually (if) is a primitive for if
11:18:59 <mark4> if might be an immediate word that compiles (if)
11:19:07 <mark4> the parens are valid here
11:19:12 <joe9> ok, thanks.
11:25:31 --- join: Zarutian_HTC joined #forth
11:26:41 --- quit: Zarutian_HTC (Remote host closed the connection)
12:09:41 --- quit: inode (Quit: )
12:15:40 --- join: Zarutian_HTC joined #forth
12:23:55 <mark4> ok so i just took a look back at your chinese utf8 strings. its compiling the characters not the codepoints
12:24:34 <mark4> for example € is whats displayed. e2 82 ac is what is output to display it, 20ac is the codepoint
12:24:47 <mark4> your "chinese chars" is being compiled AS CHARACTERS not as codepoints
12:24:50 <mark4> no good to me :/
12:26:26 <mark4> $e4 (emit) $b8 (emit) $ad (emit) 中 ok
12:26:37 <mark4> (emit) writes those characters directly to stdout
12:26:50 <mark4> so im back to the original problem
12:27:02 <mark4> how do i compile an array of CODEPOINTS not a stream of characters
12:29:58 <patrickg> e2 82 ac is the codepoint, just in utf-8 encoding while you're probably looking for some other
12:33:43 <mark4> no its not the codepoint, its the utf8 character
12:33:51 <mark4> hang on ill give you a non chinese example
12:34:06 <mark4> ━ is the character
12:34:13 <mark4> 2501 is the codepoint
12:34:37 <mark4> ━ is the character as displayed
12:34:39 <mark4> i mean
12:35:03 <mark4> but the bytes that are output to display that character are different. hang on i need to write code to get it lol
12:35:52 <patrickg> there's no "utf8 character". utf8 is an encoding to map larger numbers onto octets with a few constraints (0..127 are idempotent, it's self synchronizing, there are no 0 bytes except _actual_ NUL)
12:36:38 --- quit: gravicappa (Ping timeout: 256 seconds)
12:39:02 <mark4> the character in this case is 81 94 e2
12:39:08 <mark4> the codepoint is 2501
12:39:19 <mark4> what is displayed is ━
12:39:35 <mark4> to display it you write the 81 94 e2 to the terminal
12:39:46 <mark4> but those three bytes are THE CHARACTER not the codepoint
12:43:07 <patrickg> 81 94 e2 is backwards - it's not a valid utf8 sequence
12:44:22 <patrickg> nicer to calculate that stuff on a stack though :-)
12:46:39 <patrickg> 11100010 10010100 10000001 - first byte starts with 1110 = 3 bytes encoding. first byte gives "0010", second byte gives (stripping leading 10) "010100, third byte gives "000001" = 0010010100000001
12:46:41 <patrickg> 2 base ! 0010010100000001 hex . 2501  ok
12:46:59 <patrickg> so yes, that's precisely the code point you're looking for
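patrickg's bit arithmetic, generalised to sequences of one to four bytes, might look like the sketch below. utf8_decode is a hypothetical name, and the decoder is simplified: it checks that continuation bytes have the 10xxxxxx form but does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point starting at s[0] into *cp.
   Returns the number of bytes consumed (1..4), or 0 on malformed input.
   The input must be NUL-terminated so a truncated sequence is detected. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    size_t len;
    uint32_t value;

    if (s[0] < 0x80) {                       /* 0xxxxxxx */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 110xxxxx */
        len = 2; value = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 1110xxxx */
        len = 3; value = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {      /* 11110xxx */
        len = 4; value = s[0] & 0x07;
    } else {
        return 0;                            /* stray continuation byte */
    }

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)           /* also stops at the NUL */
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

Fed e2 94 81 it yields 0x2501 in three bytes; fed the reversed 81 94 e2 it reports an error, because 0x81 is a continuation byte and cannot start a sequence.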
12:49:40 --- quit: elioat (Quit: elioat)
12:50:28 <mark4> oh yea my bad
13:26:56 <mark4> so after another long discussion in a different channel the consensus is that i need to accept "some string in some language" is going to be compiled as utf8 characters and at run time convert that string to utf8 codepoints :/
13:30:10 --- join: f-a joined #forth
13:30:26 --- quit: f-a (Remote host closed the connection)
13:30:42 <mark4> and i have encode and decode backwards
13:31:36 <mark4> encoding is going from codepoint to byte sequence and decoding is going from byte sequence to codepoint. 
13:31:45 <mark4> that sounds horribly backwards to me
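For the direction mark4 calls encoding (code point to byte sequence), a minimal sketch follows. utf8_encode is a hypothetical name; the function refuses surrogates and values above U+10FFFF, which cannot be encoded in UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes `cp` as UTF-8 into out[0..3].
   Returns the number of bytes written (1..4), or 0 if `cp` is not a
   Unicode scalar value (a surrogate, or above 0x10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                        /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

This is the step a codepoint emitter such as the `$2501 emit` shown earlier has to perform before writing to a UTF-8 terminal.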
13:32:53 <patrickg> you can encode everything in UTF32/UCS4, which is just a list/array of 32bit values that contain a codepoint each. but to get those out to a terminal, GUI or any other target you'll have to convert it to whatever that target speaks.
13:33:35 <mark4> no
13:33:51 <patrickg> or you keep them internally as utf-8 encoded string, where you have a reasonable chance to be able to just dump them byte-by-byte - or still convert them to whatever the target wants
13:33:54 <mark4> utf32 is not acceptable to me. i dont want my library to be 500 megs in size like libncurses :P
13:34:25 <mark4> or is it 5 gigs now lol
13:34:40 --- join: f-a joined #forth
13:35:13 <patrickg> if going from utf8 to utf32 means that utf32 is 500megs and utf8 isn't, you're predominantly using ascii characters (code points above U+FFFF take 4 bytes in utf8, the same as in utf32, and CJK text in the BMP takes 3)
13:35:18 <mark4> ill just take the utf8 encoded bytes in the strings such as win_puts("some string\n") at run time and convert them to the codepoints
13:35:37 <patrickg> also, your utf8 text would still be 125megs if all ascii
13:35:39 <mark4> lol
13:36:15 <mark4> my point is is that space efficiency is orders of magnitude more important to me than runtime speed efficiency here
13:36:33 <mark4> i mean.  the executable or .so needs to be as TINY as I can make it 
13:37:10 <mark4> run time will already be using up two buffers of 32 bits per char each for every single window and one buffer of 32 bits for each char per screen
13:37:30 <mark4> screens are what are written to the display. windows are written into the screen if the char at X, Y has changed
13:37:34 <mark4> thus the double buffering
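The "only write a cell if it changed" step mark4 describes could be sketched as below, with each cell holding one 32-bit code point. This is a guess at the shape of the idea, not mark4's code: flush_changed is a hypothetical name, and a real implementation would also move the cursor and emit the cell where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies cells from `window` into `screen` only where they differ.
   Returns how many cells had to be rewritten. */
size_t flush_changed(uint32_t *screen, const uint32_t *window, size_t cells)
{
    assert(screen != NULL && window != NULL);
    size_t rewritten = 0;
    for (size_t i = 0; i < cells; i++) {
        if (screen[i] != window[i]) {
            screen[i] = window[i];   /* a real TUI would emit the cell here */
            rewritten++;
        }
    }
    return rewritten;
}
```

The trade-off is the one stated in the log: memory for a second buffer is spent so that unchanged cells cost nothing on the terminal.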
13:38:11 <mark4> anyway i have to go back to wallymart. went there, got all my stuff and had to leave it there because my roomie had my card so he could get rent out of my bank lol
13:38:14 <mark4> my rent not his :P
13:38:16 <mark4> brb
13:44:00 --- join: elioat joined #forth
13:45:27 --- quit: f-a (Read error: Connection reset by peer)
13:48:45 --- join: f-a joined #forth
14:05:05 --- join: cmtptr joined #forth
14:05:13 <cmtptr> omg how long have i not been in this channel!
14:05:51 <cmtptr> i wonder what juicy forth gossip i've been missing out on and didn't notice
14:06:14 <cmtptr> (rhetorical question btw, the answer is probably 68 days since that's my uptime)
14:06:19 --- quit: elioat (Quit: elioat)
14:37:27 <mark4> lol
15:54:31 <crc> cmtptr: just read the logs :)
16:06:05 <cmtptr> sounds like a lot of work
16:09:49 --- quit: f-a (Quit: leaving)
16:19:31 --- join: elioat joined #forth
16:26:36 --- join: f-a joined #forth
16:27:57 --- quit: elioat (Quit: elioat)
16:52:50 --- quit: Zarutian_HTC (Ping timeout: 260 seconds)
16:54:13 --- join: Zarutian_HTC joined #forth
17:03:03 --- quit: f-a (Read error: Connection reset by peer)
17:07:50 --- join: f-a joined #forth
18:20:33 --- join: boru` joined #forth
18:20:36 --- quit: boru (Disconnected by services)
18:20:39 --- nick: boru` -> boru
18:30:17 --- quit: f-a (Quit: leaving)
18:46:48 --- quit: lispmacs[work] (Ping timeout: 240 seconds)
19:08:00 --- join: gravicappa joined #forth
19:16:59 --- quit: Zarutian_HTC (Ping timeout: 276 seconds)
19:42:01 --- join: Zarutian_HTC joined #forth
19:43:46 --- quit: cartwright (Remote host closed the connection)
19:46:12 --- join: cartwright joined #forth
20:04:45 --- join: dave0 joined #forth
20:42:40 --- quit: sts-q (Ping timeout: 260 seconds)
20:47:49 --- quit: proteus-guy (Remote host closed the connection)
20:50:55 --- join: sts-q joined #forth
21:29:23 --- quit: cartwright (Remote host closed the connection)
21:31:39 --- join: cartwright joined #forth
21:42:46 --- quit: dave0 (Quit: dave's not here)
22:01:11 <mark4> well i have a utf8 decode thats now working. i parse a string of bytes compiled by "俪俨俩俪俭修俯", convert those to codepoints then use my existing codepoint emitter to emit them and i get the right stuff displayed
22:01:28 <mark4> those were just cut and paste from a dump of utf8 codepoints in my forth :)
22:01:38 <mark4> they are not as far as i know a valid chinese word/sentence :)
22:02:41 <mark4> 俪俨俩俪俭修俯 4fea  4fe8  4fe9  4fea  4fed  4fee  4fef
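The run-time conversion mark4 ends up with (UTF-8 bytes in the compiled string, code points in the window) might be sketched like this. utf8_to_codepoints is a hypothetical name and not mark4's function; the sketch stops at the first malformed sequence and, like the other examples here, does not reject overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the NUL-terminated UTF-8 string `s` into out[0..cap).
   Returns the number of code points written. Decoding stops at the end of
   the string, when `out` is full, or at the first malformed sequence. */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != '\0' && n < cap) {
        uint32_t cp;
        int extra;                            /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else break;                           /* stray continuation byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)          /* also catches the NUL */
                return n;
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }
        out[n++] = cp;
    }
    return n;
}
```

A win_puts-style function would call something like this and then place each resulting code point into the next window cell, subject to the width and combining-mark caveats discussed earlier in the log.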
22:51:41 --- join: f-a joined #forth
23:09:45 --- quit: f-a (Quit: leaving)
23:59:59 --- log: ended forth/21.03.05