00:00:00 --- log: started forth/10.03.02
00:04:32 Good morning.
00:22:29 --- quit: GammaRays (Ping timeout: 240 seconds)
00:24:14 --- join: GammaRays (~user@77.246.230.163) joined #forth
00:29:59 --- quit: ASau``` (Remote host closed the connection)
00:30:38 --- join: ASau``` (~user@77.246.230.163) joined #forth
00:37:33 --- quit: alex4nder (Ping timeout: 268 seconds)
00:39:10 --- join: alex4nder (~alexander@dsl093-145-168.sba1.dsl.speakeasy.net) joined #forth
00:45:02 --- quit: ASau``` (Ping timeout: 240 seconds)
00:45:53 --- quit: ygrek (Ping timeout: 245 seconds)
00:46:12 --- join: ASau``` (~user@77.246.230.163) joined #forth
01:10:03 --- quit: proteusguy (Ping timeout: 276 seconds)
01:13:57 --- quit: alex4nder (Quit: sleep)
01:16:12 --- join: tripFantastic (1000@c-68-56-68-122.hsd1.fl.comcast.net) joined #forth
01:16:19 anyone awake?
01:17:59 No.
01:18:17 There's only a spectre haunting the world.
01:18:24 The spectre of FIG Forth.
01:23:25 --- join: proteusguy (~proteusgu@zeppelin.proteus-tech.com) joined #forth
01:31:57 --- quit: ASau``` (Read error: Connection reset by peer)
01:32:07 hi proteusguy
01:32:39 --- join: ASau``` (~user@77.246.230.163) joined #forth
01:34:41 --- quit: GammaRays (Remote host closed the connection)
01:35:03 --- join: GammaRays (~user@77.246.230.163) joined #forth
03:10:10 --- join: ygrek (debian-tor@gateway/tor-sasl/ygrek) joined #forth
04:10:25 --- nick: gogonkt -> zR2D2
04:11:36 Good evening
04:12:07 R2D2 is already in use. Orz
04:14:38 --- join: TR2N (email@89-180-145-189.net.novis.pt) joined #forth
04:15:20 The +N instruction seems to be a multiplication step. There's a description in something I read on how to use it for a fast multiplication routine.
04:16:24 ASau: You're right, though. I just ran out of opcodes. But if I use the return stack top as the address pointer then I don't need the opcodes that manipulated the A register, so they could be put to other uses. Like what you just suggested.
04:21:33 the arm cpu allows ldr r1, [r3, r0, lsl #1] to load r1 with the data at address r3 + 2*r0
04:21:42 it doesn't allow the same thing with ldrh
04:21:59 ldrh r1, [r3, r0, lsl #1] is not legal, bleh
04:38:21 KipIngram: CM proposed to call it *+
04:38:23 --- quit: proteusguy (Ping timeout: 245 seconds)
04:38:42 But I don't use the original *+
04:50:40 --- join: proteusguy (~proteusgu@zeppelin.proteus-tech.com) joined #forth
04:55:36 --- quit: scj (Read error: Connection reset by peer)
04:57:02 *+ actually appeals to me a bit more than +N.
04:58:29 I experimented with something like this:
04:59:14 : *+ >r d2*' r@ swap 0<> and m+ r> ; ( ud u -- ud' u )
05:02:54 One thing that caught my attention in the MuP21 architecture was the separate stack top: it's regarded as a "T" register rather than just the "top of the stack."
05:03:24 I'd planned on just having a stack, and the top two elements would feed the ALU. Are there any subtle issues I'm missing here? Or is it really just a naming issue?
05:05:28 --- nick: zR2D2 -> gogonkt
05:08:26 It was pointed out by someone that strictly speaking all those top-of-stack "registers" are buses.
05:09:09 And, AFAIR, someone proposed TTA-like Forth processors.
05:09:14 (Tarasov?)
05:10:27 i don't know why ARM doesn't do an arm FDMI :)
05:10:36 with F being forth instead of T = thumb
05:15:07 Forth is like the A-frame of programming. It's economical, structurally sound, etc. etc. just like an A-frame, but it suffers from "too easy syndrome." It doesn't support the egos of its architects the way fancier platforms do.
05:15:31 I don't know many custom builders that brag about how great their A-frames are.
05:16:33 ASau: In my current design the top element of the stack is registered just like all of the other elements.
05:16:58 no. a frames also LOOK easy. forth on the other hand is so different from the norm that most people reject it
05:17:03 I do keep running into the need to treat it differently, though.
05:17:24 because they think it's more difficult.
05:17:34 KipIngram: Forth isn't "too easy."
05:17:36 Right now I have three fundamental stack operations in my emulator: push, pop, and load top.
05:17:37 the perception is wrong however. forth is simple
05:17:53 wouldn't that be push, pop and fetch?
05:18:42 Data stack control is the only part of this thing that's still emulated "C-style", and I am seeing that to get it to boolean logic control I need to have separate control signals for the top element and the "rest of them."
05:18:50 if "load top" first did "push top" you could get rid of push and just call "load top" a push
05:18:55 So in that sense I'm seeing that the top element is "different."
05:19:18 But sometimes I don't need to change elements 2..N.
05:19:20 Forth pushes too much complexity into programming efforts,
05:19:38 and stack machines were shown to have inferior performance.
05:19:43 asau *boggle*
05:19:44 For instance, my approach to literals is to have an operation (either LIT+ or LIT-) which pushes a constant to the stack and sets "literal mode."
05:19:47 where?
05:19:52 The constants being either 0 or -1.
05:20:04 Then each opcode that follows adds a nibble into that literal.
05:20:09 I440r: SFTW, this is a well-known result.
05:20:15 Which means I need to update the top of stack with a new value without touching the rest of the stack.
05:20:25 It affects BPF, hence it is well-known.
05:20:49 bpf?
05:21:02 Berkeley Packet Filter.
05:21:29 wtf is that and how does it show forth is more difficult on the programmer
05:22:16 There are a number of other interpreters that moved away from stack VMs and got performance improvements.
05:22:19 --- join: scj (syljo361@boneym.mtveurope.org) joined #forth
05:22:19 Oh, actually I only need separate *enables* for the top and the rest.
05:22:33 The "load top" would in fact be "push" as far as the top is concerned.
05:22:45 So "push" = enable top, enable the rest, push.
05:22:53 and "load" = enable top, disable the rest, push
05:23:36 Or rather, "push" = enable top, enable the rest, external source for top, above source for the rest, clock.
05:23:49 "load" = enable top, disable the rest, external source for top, clock.
05:24:03 "pop" = enable top, enable the rest, below source for all, clock
05:25:02 Ok, great. My goal for today was to get those stack control signals over into boolean form; this helps.
05:26:19 I really hate to put gasoline on this other fire, but I can't help pointing out that the whole "Forth puts too much work onto the programmer" notion is missing the point entirely. It's languages like C++ that put too much work on the programmer.
05:26:28 There's too much to learn, too many fancy constructs, too much BS.
05:26:43 Forth is like checkers - 30 minutes and you know *all* of the important concepts to start to work.
05:26:50 or more like chess:
05:27:07 30 minutes to learn how *everything* works, but a lifetime of honing the art of using it.
05:27:18 What could possibly be more fun? :-)
05:27:28 Forth is *FUN*.
05:27:32 Isn't that why we're all here?
05:27:36 Well, most of us at least?
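
A hedged software model of the LIT+ / LIT- nibble-literal scheme described above, in ordinary Forth. The names lit+, lit-, and nib are hypothetical stand-ins: on the real machine this is an opcode mode, not a set of words.

    : lit+ ( -- 0 )   0 ;     \ start a literal from the constant 0
    : lit- ( -- -1 )  -1 ;    \ start a sign-extended negative literal
    : nib  ( lit n -- lit' )  swap 4 lshift or ;  \ fold one nibble into the top

    \ lit+ 2 nib 5 nib   leaves 0x25 ( 37 )
    \ lit- 1 nib         leaves -15  ( 0x...F1 )

Note that each nib rewrites only the top element - exactly the "update the top of stack without touching the rest" case called out above.
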
05:28:01 Ok, going to work on these equations now while it's fresh on my mind.
05:28:08 KipIngram: if you want to be taken seriously, you don't use C++ in debates.
05:32:10 It is accepted that C++ has many defects; any comparison to it is regarded as playing in the second division.
05:39:10 :-) God, I love this place...
05:40:41 JFYI, this is the article against stack VMs (note one of the authors):
05:41:05 "Virtual Machine Showdown: Stack Versus Registers"
05:41:26 Yunhe Shi, David Gregg, Andrew Beatty, M. Anton Ertl
05:42:58 A 0.25 to 0.30 speedup on registers.
05:49:59 Actually, all authors except one are or were Forth activists.
05:50:21 ASau: does that include applications that involve a lot of context switching (a lot of need to save and restore registers)?
05:51:07 For the stack machine candidates, do they have hardware stacks, or stacks in memory?
05:51:09 Etc.
05:51:09 KipIngram: no, but this is irrelevant.
05:51:26 Of course it's relevant. You don't have to save and restore registers in Forth.
05:51:36 Modern machines provide register files.
05:52:06 KipIngram: sure, stack machines require more moves between stack and memory; that's why they are slower. See ref. above.
05:53:00 If you've ever written CMT in Forth, you know that you do much juggling to switch contexts.
05:53:37 You have to save and restore all those stack and segment pointers.
05:53:49 Including the instruction pointer.
05:57:35 I don't know, ASau. I wouldn't claim that all Forth implementations are "good," but clearly you could minimize a context switch to include 1) the instruction pointer and 2) the data stack pointer.
05:57:44 Maybe a flags register, depending on your system.
05:58:03 But you don't need to context switch the return stack.
05:58:32 If I wanted a really fast stack machine that had to do context switching I'd just have multiple data stacks in hardware.
05:58:54 Certainly that has a cost, but it all depends on what we want to optimize, doesn't it?
05:59:55 If you don't switch the return stack, then if one coprocess unwinds the stack, another one loses its return point.
06:00:03 And if I were going to build something like that I'd probably just save a context by pushing the IP to the outbound context's hardware data stack, flipping a couple of bits to mux in the new context's data stack, and popping the IP.
06:00:05 Done.
06:00:36 Well, you're right of course. I was thinking in terms of interrupt service routines.
06:00:44 So you'd do the same thing for the return stack; just have more than one.
06:00:51 You have to push several registers into memory anyway.
06:01:04 But if your goal was to make a machine that could scream through context switches that would be the way to do it.
06:01:10 If this isn't memory, then it is a register file window.
06:01:12 You could switch contexts in a single cycle then.
06:01:22 which doesn't make a difference to modern register architectures.
06:01:26 Yes, granted, but do you want speed or compactness?
06:01:56 I do see, though, that I'm arguing unfairly with you; you could invoke multiple register banks and get the same speedup.
06:02:22 A 1.25 factor in code compactness against a 1.30 factor in speed.
06:02:36 Cycles cost more than memory today.
06:03:04 So I think the fair comparison is to compare a "register-heavy" machine with a hardware stack machine.
06:03:20 In either case a context switch will involve moving a bunch of stuff to and from memory.
06:03:41 But comparing a register file machine to a machine that keeps the stacks in memory doesn't seem fair.
06:03:54 You have fast resources and the competition doesn't - of course you will be faster.
06:04:06 Yes, but this is becoming less and less relevant.
06:04:31 MP machines have entered desktops and even cell phones.
06:09:07 I think that if you want to optimize context switches, you provide several shadow register files.
06:09:38 You just say "switch to bank #13," and you're there :D
06:10:59 But this doesn't differ between stack and register architectures. :p
06:15:07 Right - that's what I just said. I agree. So you have to compare these architectures only in cases where they have "equal hardware resources" in some reasonable sense.
06:15:22 One register bank = one hardware stack.
06:15:45 Even having a hardware return stack isn't really fair if the register machine has to push return addresses to memory.
06:16:09 So one would think that the best comparison to a "traditional" register processor would be a stack machine with the data stack in hardware but the return stack in memory.
06:16:13 That's head to head.
06:17:02 The stack machine should get more operations per unit of code, since it doesn't have to specify register targets.
06:17:28 But on the other hand, if the right operands aren't on top of the stack then juggling will be required.
06:17:51 Since you can target any registers you want in a register machine you don't have to worry as much about the optimality of the code.
06:18:19 But for the stack machine you do - you could have a great implementation or a terrible one depending on the competence of the programmer.
06:18:58 And if you pick one of those applications where you need to manipulate the heck out of seven or eight parameters then you're likely to see the register machine win.
06:19:15 So how do you decide what the right mix of test code is in a case like this?
06:19:38 And how do you decide if your stack machine program is "good"?
06:21:47 There's a saying attributed to Paracelsus that I like to recall on such occasions:
06:22:10 everything is determined by number, measure and weight.
06:22:58 There's only one absolute measure that you cannot overcome at all,
06:23:00 it is time.
06:25:39 Sure, I agree. The first part of this discussion was good; I think we see clearly that either architecture allows you to pour in hardware to speed the thing up (multiple register files / multiple stacks etc. etc.), and which things are roughly equivalent (hardware data stack / memory return stack being most equivalent to traditional register CPUs).
06:26:04 Like I said, the register machine lets you write your code more flexibly, since you can always get at all of your data.
06:26:26 On the other hand, you can't execute as many instructions per second, because you can't get them into the processor as fast as I can.
06:26:31 Because my instructions are more compact.
06:26:35 There's no arguing with that.
06:26:42 So it's all a matter of how well I can use my instructions.
06:27:05 If you want to claim greater overall speed, then you have to contend that I will waste enough of my instructions juggling the stack for you to win.
06:27:15 I don't see any other quantitative way for you to make your case.
06:27:47 And whether I do that or not will depend on 1) the task at hand and 2) my ability to write an intelligent program.
06:28:21 Let's take 2 off the table; someone out there can write great Forth code.
06:28:27 So it all comes down to the application.
06:28:53 I'm sure I can pick apps that let the stack machine win, and you can pick apps that let the register machine win.
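
A small Forth illustration of the "juggling" cost at issue (f1 and f2 are hypothetical names): computing a*b + c*d takes five words when the operands arrive in a convenient order, and each misplaced operand costs an extra shuffle word.

    : f1 ( a b c d -- n )  * >r * r> + ;      \ operands in a good order
    : f2 ( a c b d -- n )  rot * >r * r> + ;  \ same result, one ROT of repair

Whether such repairs eat up the stack machine's code-density advantage is exactly the empirical question being argued here.
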
06:29:29 * crcx can just choose to not worry about which is faster and just use the tools he's most comfortable with :)
06:29:41 Yeah.
06:29:47 The ones that are the most fun. :-)
06:30:54 KipIngram: there's a problem with the "wasting" you mention.
06:31:35 No one cares whether you have to waste cycles to move data or not,
06:31:41 just don't :)
06:33:42 Another point here is that both of us will have to decide how much we want to use subroutines. Since the test architectures both have in-memory return stacks, both of us will pay a big penalty for lots of subroutine calls.
06:34:02 Forth encourages lots of subroutines, but doesn't *require* them.
06:34:31 I think this ultimately comes down to code compactness, because the one class of memory operation neither of us can avoid is fetching the program.
06:34:50 Like I said, I will get more opcode density, but I may have to use some of those opcodes to juggle the stack.
06:34:57 This isn't different anywhere; the limit is universal and acknowledged.
06:35:18 Sure - I'm just pointing out that we can drop that factor from the discussion; it affects us both.
06:35:41 I think the key metric is "opcode density" vs. "efficiency of opcode utilization".
06:35:59 There's another dimension:
06:36:07 availability of optimization software.
06:36:10 You don't have to "arrange" your data; once it's in the register file you've got it.
06:38:52 BTW, I have an interesting question for you.
06:39:06 Two of them: ;)
06:40:06 1. How hard is it to provide a custom handheld? E.g., the Palm T series.
06:40:09 So any opcodes I use to "arrange" my data are lost to me, by comparison, in terms of getting the total job done.
06:40:43 2. How hard is it to provide specialized hardware with FPN and preferably SIMD?
06:40:47 You mean the overall design process for such a thing? Hardware and all?
06:41:38 Let's assume that I want to spend $1k on a Palm-comparable device with a Forth architecture inside.
06:42:00 And we have to design it from scratch? Touch screen, case, etc. etc.?
06:42:17 What volume?
06:42:44 I don't think you'll find more than a few dozen customers.
06:42:55 There aren't so many Forth lovers around.
06:44:02 I just ripped out the old data stack stuff and replaced it with the new boolean equations. It compiles now, but I'm scared to run it. :-(
06:44:19 If it worked on this iteration I'd be shocked, so I'll feel like I took a step backward.
06:44:53 Oh.
06:45:09 The specific test code I had in there didn't use the data stack anyway.
06:45:15 Time for a new test.
06:46:49 Oh, it works. At least the LIT+ command properly builds a literal on the stack, and the JMP command pops the stack.
06:46:52 I'll be damned.
06:46:57 :-)
06:47:40 Do you mean that you split JMP into LIT + EXIT??
06:47:41 ASau: Are you leading up to a further point in our discussion, or are you really asking me about making such a machine?
06:48:07 I'm just asking lazily.
06:48:17 I don't have a free $1k.
06:48:23 Except for colon definition nesting (which is encoded into the instruction stream), my control transfer instructions (JMP, JZ, etc.) take their target from the data stack.
06:49:02 Well...
06:49:08 I can't comment on the cost of the mechanical engineering, but the cost of the electronics wouldn't be so bad.
06:49:33 While it may be easier to do because of code/part reuse,
06:49:43 it is definitely _not_ convenient.
06:49:59 It simplified the hardware substantially.
06:50:12 Because in addition to this I build literals on the stack using a series of opcodes.
06:50:42 This looks like an oversimplification to me.
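
In Forth terms, a JMP that pops its target from the data stack behaves like the following return-stack substitution. This is a hedged sketch of the idea only, not the actual opcode, and the usual caveat applies: many systems dislike this kind of return-stack surgery inside a colon definition.

    : jmp ( adr -- )  r> drop  >r ;  \ discard the return point; EXIT lands at adr

Which is essentially the LIT + EXIT split ASau asks about above: build the target as a literal, then transfer with a return.
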
06:50:51 PDA-sized TFT panels (with touchscreen) are sub-$100.
06:51:32 Various types of processors are available inexpensively as well. We'd have to talk about the volume allowed for the battery and what kind of battery life you wanted, etc.
06:51:54 What was the other thing you asked about?
06:52:35 How hard is it to support FPN?
06:52:59 These may be non-IEEE.
06:53:10 I actually think that this approach to literals is good. I discussed this on here the other day. With this approach a small literal (0-15) takes only 10 bits to encode.
06:53:25 Literals 16-255 take only 15 bits.
06:53:29 And so on.
06:54:07 You have real-world data on the frequency of Forth word-pattern usage.
06:54:07 The traditional Forth approach would be to have a LIT primitive and then put the literal in a full cell in the instruction stream. That requires 21 bits, and also requires that the instruction sequencer figure out what it's supposed to do with the cell.
06:55:54 You have real-world data on literal usage too.
06:56:08 Though it isn't about Forth.
06:56:09 The complexity of figuring out when to treat that next cell as a literal, how to handle the remaining instruction slots in the cell that held the LIT primitive (if LIT didn't fall in slot 3), and so on all leads to logic complexity that reduces my clock rate.
06:56:24 These stats relate to the RISC vs. CISC debate.
06:56:46 So it's not just about space required and so on - getting a faster clock rate pays off on *every single bit of code I execute*.
06:57:02 Slowing the whole machine down to get "traditional" literal handling doesn't look like a win to me.
06:58:38 But I would be happy to learn anything you know about literal distributions in typical code. I agree that's pretty language-independent.
06:59:26 That's not a fact.
07:01:19 What's not?
07:02:34 --- join: sunwukong (~vukung@business-80-99-161-225.business.broadband.hu) joined #forth
07:04:09 Some programming languages may significantly distort the distribution.
07:06:21 Some, as in that's the exception, or most?
07:13:14 I'm more inclined to think "most".
07:13:31 Most languages are implemented in interpreters.
07:14:36 Well, then we're hardly going to win any speed races with those.
07:15:07 Bytecode and VM techniques are usual.
07:15:28 But they do distort literal value usage for obvious reasons.
07:44:59 Ugh. SWAP creates a completely new stack manipulation case. It updates the top and second elements, but no others.
07:45:12 --- join: kar8nga (~kar8nga@jol13-1-82-66-176-74.fbx.proxad.net) joined #forth
07:45:51 That means I'll need enables for the top, the second, and "the rest."
07:47:27 If you cache TOS, you can reduce it.
07:47:33 : swap over nip ;
07:50:04 nip? I'm not familiar with that, but it looks like it would do this:
07:50:17 : nip (a b c -- b c) ... ;
07:50:28 Yes
07:50:33 That's no easier for me.
07:50:37 In hardware, I mean.
07:52:07 It would still require that element 2 have a separate enable from the deeper elements, and other words require that the top and second have separate enables. So it ends up the same, just with different control signal values.
07:53:34 : swap over >r drop r> ;
07:53:50 : nip >r drop r> ;
07:54:31 KipIngram: do you recall a C program that was part of a FIG Forth tarball way back that preprocessed a Forth kernel in Forth and output assembler code?
07:54:47 nip takes two cells, returns one
07:54:57 ( a b -- b )
07:55:08 tF: No, sorry - I wasn't that deeply familiar with the tarball.
07:55:15 ASau: I want single-cycle swap.
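
In standard Forth terms the NIP definition above is correct, but both SWAP definitions quoted actually compute ( a b -- a a ): OVER copies the second cell and nothing ever removes the old b. A hedged repair in the same style, parking one cell on the return stack:

    : nip  ( a b -- b )    >r drop r> ;
    : swap ( a b -- b a )  over >r nip r> ;  \ a b -> a b a -> a b -> b -> b a

Of course, on a machine where TOS lives in a separate register the primitive data movement may differ; this is only the stack-picture view.
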
07:55:21 it coulda been a zipfile
07:55:21 It's ok - it's done now.
07:55:26 FIG Forth wasn't distributed in tarballs.
07:55:29 Just one extra signal.
07:56:56 Hey, I like how this is working out. I have this set up now so it's 100% independent of the exact instruction encoding. I generate a signal for each opcode, and use those in the equations.
07:58:00 When I'm all done I will look at how many levels of logic are involved everywhere and use instruction encoding to cut down my worst-case delays.
07:58:35 --- quit: sunwukong (Remote host closed the connection)
08:02:42 --- join: forther (~forther@c-98-210-250-202.hsd1.ca.comcast.net) joined #forth
08:03:04 hi all
08:09:10 hi
08:21:43 --- join: alex4nder (~alexander@wsip-72-215-164-129.sb.sd.cox.net) joined #forth
08:21:44 hey
08:21:55 hi
08:31:49 hmm ... division step. restoring looks a little bit too complex for a single-step hardware implementation
08:33:41 but non-restoring may be the way to go
08:40:10 I wasn't sure about a division step either. But a multiplication step seems easy enough.
08:41:21 So hey, should I have a register in the ALU that accumulates the high-order part of products?
08:42:06 And sums, for that matter?
08:42:17 Then "carry" is just "nonzero in that register".
08:45:20 --- join: Quartus` (~Quartus`@74.198.8.60) joined #forth
08:46:59 are we talking about multiplication or division?
08:47:34 multiplication. I'm not going to worry about division yet.
08:49:15 in the "post-S24" c18, Chuck uses A as the multiplicand and the storage for the lower part of the product
08:49:29 is that what you asked about?
08:51:39 That's the general thing on my mind, yes. I'm just fishing around for a good way to go on this part.
08:52:09 I'd like to be able to set up and then just run through a series of shift / partial-product operations to get my full product.
08:52:19 Hopefully one step per clock.
08:52:45 In the MuP21 the ripple adder slows some of that down, but I wouldn't mind putting in a carry chain if it fixed that.
08:53:06 I'm not really "up against" this part yet, so it's on my mind but not my focus.
08:54:17 This is working out to have a four-input mux feeding into the top of stack, with the options as follows:
08:54:22 00: second element of stack
08:54:28 01: ALU output
08:54:53 10: stack top shifted left four bits with the next literal contribution OR'd in
08:55:01 11: top of return stack
08:55:46 So every other possibility will show up as one of the ALU outputs.
08:56:37 For instance, if I do that HP-style "RUP" command then it will cause the ALU to present the fourth stack element as the top-of-stack input.
09:25:38 Ok, here's a slightly different spin on the whole "jump target on the stack" thing.
09:25:48 I do see some of the inconvenience that ASau alluded to.
09:26:03 Unconditional jumps are no problem; just literal up the target and jump.
09:26:11 Conditional jumps, not so nice.
09:26:32 Take JZ for example. The only thing that makes sense is for the flag to be on the top of the stack when I execute that.
09:26:41 So the address would have to be under that.
09:26:52 --- join: qFox (~C00K13S@5356B263.cable.casema.nl) joined #forth
09:26:57 Which means I'd have needed to put it there before computing the flag.
09:27:23 Or compute the flag, then literal up the target, and then SWAP JZ. What a pain.
09:27:36 But I still want all of the simplifications I got by taking that approach to begin with.
09:28:08 So instead, how about this.
09:28:36 Have a command a lot like LIT+, except it initiates literal construction on the return stack.
09:28:59 Then JMP, JZ, etc. take their target from there.
09:29:08 Actually JMP would just be RET, wouldn't it?
09:29:32 --- quit: tgunr (Read error: Connection reset by peer)
09:29:37 And JZ, JCZ, etc. would become conditional returns.
09:29:46 Which might be useful in their own right.
09:30:09 This seems more in keeping with the "data stack is for data, return stack is for addresses" idea.
09:30:10 --- join: tgunr (~davec@cust-66-249-166-11.static.o1.com) joined #forth
09:30:40 --- quit: TreyB (Ping timeout: 260 seconds)
09:31:00 So I have to have an extra LIT instruction that I'd call TARGET or something like that, but I would no longer need a separate JMP instruction.
09:31:24 --- join: TreyB (~trey@adsl-76-240-63-203.dsl.hstntx.sbcglobal.net) joined #forth
09:31:27 And I could still use my CALL opcode to call that address; I'd just replace the top-of-return-stack element's contents with the return address.
09:31:31 I like it.
09:31:55 ASau: does that eliminate some of the inconvenience?
09:39:36 --- quit: alex4nder (Read error: Connection reset by peer)
09:39:56 --- join: alex4nder (~alexander@wsip-72-215-164-129.sb.sd.cox.net) joined #forth
09:42:41 Hey, this notion of calling an address that's on the return stack: that "swaps" the current execution point with the one on the return stack. That's pretty close to a coroutine implementation, isn't it? Two routines could just pass control back and forth that way.
09:43:16 Finally, when one was done, it would just RET instead of CALL.
09:49:22 --- quit: kar8nga (Remote host closed the connection)
09:49:55 KipIngram, I didn't follow this discussion and may repeat things you've already been through
09:51:11 KipIngram: I lagged out, but yah.. coroutines are stupid-simple with Forth
09:51:12 if you are after minimalistic conditional and unconditional jumps, I'd suggest returns and conditional returns
09:54:17 KipIngram: first, if you want to have a really minimal set of opcodes,
09:54:26 there's no need for conditional jumps.
09:54:47 You have to expose flags on the stack.
09:54:48 at all?
09:54:52 Yes.
09:54:55 At all.
09:55:20 you mean use the flags to calculate the addresses to unconditionally jump to?
09:55:23 Mmm, set jump points multiplied by condition.
09:55:25 Yes.
09:55:36 Where condition is 1 if true, 0 if false.
09:55:48 minimalistic design, leading to really bloated code
09:55:58 You convert it to a WFF (well-formed flag), then just perform a relative jump.
09:56:06 Yeah, let's not do everything with NAND.
09:56:26 This is rather awful, but it can be tolerated.
09:56:49 Or "subtract and branch if less than or equal to zero"
09:56:52 If it is a relative jump, it may fit the short-literal restriction.
09:56:54 A single-instruction computer.
09:57:09 ASau is just commenting to be annoying/purist, not to be helpful. :)
09:57:11 Writing a decompiler will be awful, though.
09:58:20 --- join: Maki (~Maki@dynamic-78-30-167-37.adsl.eunet.rs) joined #forth
09:58:36 Though this isn't the main problem.
09:58:43 The real problem is in backpatching.
09:58:56 Hmmm. I'm not really after "absolute minimal." Just "the right mix."
09:59:33 --- quit: madwork (Ping timeout: 265 seconds)
10:01:10 How important is it to have separate opcodes (however implemented) to jump/return on zero and jump/return on "carry zero"? That's what the MuP21 has, I think.
10:01:29 But the main Forth conditional constructs just use the zero conditional.
10:01:55 I feel tempted to drop the carry-related conditional and leave that opcode free for later use.
10:02:26 If I were writing the compiler, I wouldn't miss it.
10:02:42 Like I said above, the main problem is in backpatching.
10:02:55 This makes the compiler sophisticated.
10:03:54 Basically, you have to either waste space on NOOPs or perform backtracking.
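
A hedged sketch of the flag-arithmetic transfer ASau outlines above ("set jump points multiplied by condition"): with well-formed flags (0 or -1), the target is selected by pure masking and the transfer itself is unconditional. pick-target is a hypothetical name.

    : pick-target ( f adr-true adr-false -- adr )
      >r  over and       \ keep adr-true only when f = -1
      swap 0= r> and     \ keep adr-false only when f = 0
      or ;

The selected address can then feed an unconditional jump (or a return, in the target-on-the-return-stack scheme), so no conditional-branch opcode is strictly needed.
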
10:04:30 I don't really want to mess around with relative transfers. I don't tend to think of Forth as relocatable in any way. Compile it where you want it.
10:05:12 --- quit: alex4nder (Ping timeout: 260 seconds)
10:07:23 like BASIC on the ][+
10:08:15 KipIngram: This means that you have to waste space on loading a full-cell address.
10:09:13 --- join: kar8nga (~kar8nga@jol13-1-82-66-176-74.fbx.proxad.net) joined #forth
10:10:24 Oh, you mean that short jumps would still require a full address?
10:10:37 Yes.
10:11:06 But I'm building a system that will let me put application code (headerless) in low memory. So my absolute addresses will be pretty small anyway.
10:11:12 But yeah, I see your point.
10:11:29 Actually I have another idea in mind for handling small loops anyway.
10:12:10 I don't know if I'll ever implement it, but it would involve a hardware FIFO for opcodes.
10:12:33 I'd have an opcode that said "memorize". Starting then, every opcode that executed would go in the FIFO.
10:12:52 Then I'd have a conditional opcode that either kept going or "replayed" the FIFO.
10:13:03 This idea isn't new.
10:13:11 The idea would be that this would also unwind subroutine calls.
10:13:17 Very few ideas are new.
10:13:22 It was new to me, though. :-)
10:13:34 I've long since given up on stumbling across anything truly unprecedented.
10:13:35 It was proposed (LtU site?) that the return stack contain future commands instead of return addresses.
10:14:13 It relates to the CPS transformation somehow.
10:17:30 A FIFO for that works out nicely hardware-wise, because I already have the three opcode slots in a cell (bits 4:0, 9:5, 14:10) feeding into three inputs of a four-input mux. The FIFO just represents the fourth possible place for opcodes to come from.
10:18:25 I'm particularly fond of how this eliminates all threading overhead. Couple it with a good profiler that tells you where it will pay to use it and it seems like a win.
10:25:54 Also, once I optimize the state durations so that execution-only states are shorter than "fetch / execute" states then the FIFO will really pay off; every state will be a fast execute-only state when I'm running from it. So no threading overhead, no fetching overhead, just burn through code.
10:26:09 --- quit: Quartus` (Ping timeout: 252 seconds)
10:28:46 what's CPS?
10:30:51 --- join: madwork (~madgarden@204.138.110.15) joined #forth
10:33:31 --- part: jeremy_c left #forth
10:36:37 Good morning, it's 2010.
10:37:47 http://en.wikipedia.org/wiki/CPS_conversion
10:39:10 This is very nice - moving the target to the return stack freed up an input to another four-input mux that I use to pick the next instruction memory address. That is definitely where interrupt vectors will feed in.
10:43:56 --- quit: kar8nga (Remote host closed the connection)
10:44:06 I fretted a long time the other day about having all four of those inputs taken up with interrupts still outstanding.
10:48:07 * Deformative yawns.
11:10:54 ASau, wrong, it's 1975
11:13:48 KipIngram, regarding that hardware FIFO exec: did you read about Chuck's micronext instruction?
11:30:39 No, I haven't.
11:32:23 with it you can execute up to 3 ops in a next loop without re-fetching the word
11:32:45 as you know, in the c18 you can have up to 4 ops in a single word
11:33:24 Oh, I see. I thought about having a way of "repeating a cell". Like a cell iterator or something.
11:34:04 I decided this FIFO approach was more general, and also offers the advantage of unwinding threading.
11:34:40 What it won't do is capture unexecuted branches. This is only good for repeating precisely the code you've executed before.
11:35:14 most of the bottlenecks are the loops
11:36:58 This would eliminate the innermost loop for sure. You'd just repeat the FIFO as long as you wanted to loop.
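
A hedged software model of the memorize/replay FIFO idea, using execution tokens; every name here is hypothetical, and this models only the behavior, not the hardware.

    create fifo  16 cells allot          \ room for recorded execution tokens
    variable #fifo                       \ number of recorded entries
    : memorize ( -- )    0 #fifo ! ;     \ begin recording from scratch
    : record   ( xt -- ) fifo #fifo @ cells + !  1 #fifo +! ;
    : replay   ( -- )    #fifo @ 0 ?do  fifo i cells + @ execute  loop ;

    \ memorize  ' 1+ record  ' 2* record   then:  5 replay  leaves 12

In the hardware version the "record" step happens as a side effect of normal execution, and the conditional replay opcode decides on each pass whether to run the FIFO again or fall through.
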
11:37:31 --- join: Quartus` (~Quartus`@74.198.8.57) joined #forth
11:55:29 --- join: alex4nder (~alexander@dsl093-145-168.sba1.dsl.speakeasy.net) joined #forth
12:00:48 hey
12:32:33 --- quit: tgunr (Ping timeout: 276 seconds)
12:44:13 --- quit: proteusguy (Ping timeout: 258 seconds)
12:56:23 --- join: proteusguy (~proteusgu@zeppelin.proteus-tech.com) joined #forth
12:59:40 On the MuP21 the A register was the only way to access memory, whether you incremented it or not. When Chuck dropped the A register in favor of keeping the pointer on the return stack, did that remain the only way to access memory?
12:59:53 In other words, did @ and ! expect the address to be on the return stack?
13:00:17 When ! finds data and address on the data stack then a "double pop" is required.
13:00:26 he didn't drop it. he added @r !r
13:01:01 yes, @r and !r expected the address on top of the return stack
13:01:20 But @ and ! still worked in the usual Forth way?
13:01:24 no
13:02:06 in the mup21, ! and @ worked via register A
13:02:45 --- join: cmeme (~cmeme@boa.b9.com) joined #forth
13:03:24 again, @r !r appeared in the i21 and f21
13:03:49 the mup21 didn't have them
13:05:17 also, the c18 is Chuck's most modern design, so you'd better use it as the reference for Chuck's idea of a proper opcode set
13:10:03 do you need a link to the c18 description?
13:10:49 No, I have that somewhere.
13:11:52 I have a document on the SeaForth 40C18; is that the right one?
13:12:51 yes
13:16:09 --- quit: Quartus` (Ping timeout: 248 seconds)
13:16:43 --- quit: ygrek (Ping timeout: 245 seconds)
13:18:35 From what I can tell so far, and I'm pretty far into this now, the biggest problem with traditional ! is that it adds an otherwise unrequired pathway: load each stack item with the value from two levels below.
13:19:23 But with the Spartan 6 that costs me neither time nor logic, because every flip-flop in the device has a 6-input LUT in front of it.
13:19:38 So that's free functionality in my case.
13:20:01 I think I'll keep it so that I can have ! work in the familiar way.
13:21:41 I use !+ and !-.
13:23:03 Instead of !. So I need one clock cycle.
13:27:39 Maki: On which system? What do those do exactly, on your system?
13:34:27 --- join: Maki_ (~Maki@dynamic-213-198-207-96.adsl.eunet.rs) joined #forth
13:37:06 --- quit: Maki (Ping timeout: 276 seconds)
13:37:18 KipIngram: !+ is store with postincrement. Leaves the address on top, incremented. Real ! is !+ drop.
13:38:44 So !+ drops the *2nd* element of the stack? That's back again to this notion that the top of the stack isn't really part of the stack.
13:40:45 It is a separate register on my arch.
13:45:31 --- mode: ChanServ set +o I440r
13:48:09 --- join: segher (~segher@84-105-60-153.cable.quicknet.nl) joined #forth
13:51:16 --- quit: Maki_ (Quit: Leaving)
14:00:40 --- quit: qFox (Read error: Connection reset by peer)
14:03:21 kip, if TOS is cached in a "register" i still consider it to be part of the stack
14:04:28 lol, i'm loving arm assembler tho!!!
14:04:59 i had a cmp r1, r3 ; moveq pc, lr followed by cmp r2, r3 ; moveq pc, lr
14:05:15 converted that to cmp r1, r3 ; cmpne r2, r3 ; moveq pc, lr
14:16:02 Yeah, the conditional execution bits rock.
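
Taking Maki's description above at face value, !+ in standard Forth terms is a store with postincrement. A hedged sketch (store stands in for the "real !", which can't shadow ! here):

    : !+    ( x adr -- adr' )  tuck !  cell+ ;  \ store x at adr, leave adr + 1 cell
    : store ( x adr -- )       !+ drop ;        \ "real ! is !+ drop"

And the observation above holds: !+ consumes the *second* cell while rewriting the top, which is again the top-of-stack-as-separate-register pattern.
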
14:21:10 i'm not a big fan of the "multiply by constant" code. it's heavily obfuscated but at the same time it's pure genius heh
14:21:24 divide i mean
14:21:46 ldr r1, =92492492
14:21:54 smull r3, r1, r0, r1
14:22:04 mov r3, r0, asr #31
14:22:11 add r0, r1, r0
14:22:21 rsb r0, r1, r0, asr #3
14:22:32 that divides r0 by 14
14:23:10 oh and that 92492492 is 0x92492492
14:28:17 --- quit: alex4nder (Ping timeout: 265 seconds)
14:43:56 --- quit: I440r (Read error: Connection reset by peer)
14:44:09 damn he quit
14:45:19 --- join: I440r (~mark4@c-69-136-171-118.hsd1.in.comcast.net) joined #forth
14:50:50 --- join: Quartus` (~Quartus`@74.198.8.59) joined #forth
15:17:39 The problem with using the return stack for memory-access addresses is that all of those words become unusable from the interpreter.
15:18:15 why?
15:18:52 e.g. : @ >R @R ;
15:18:53 For the same reason I can't say " 1 >R" from the interpreter. System crashes, right?
15:19:09 you can use words that use the return stack just fine
15:19:34 KipIngram: this means that the interpreter should be done another way.
15:19:47 on some systems you have to have the return stack balanced after every line or word you run in the interpreter, sure
15:20:21 Yes, it sounds that way. In my experience that hangs the interpreter up. Can it be done gracefully, without the interpreter having to save and restore the return stack?
15:21:29 I thought that somewhere deep in there it got the next word's address on the stack and called "EXECUTE" or whatever. Then as soon as it tries to return it will find that bogus value your word left on the return stack instead of the proper return address and croak.
15:21:57 Hang on, let me try this in pForth.
15:23:00 (I'm going to change that...)
15:23:14 Segmentation fault.
15:23:33 KipIngram: pForth is too primitive now.
15:23:54 So I'll ask again: can it be fixed *gracefully*, or does it require that the interpreter do backflips to fix things up for you?
15:24:00 Like keeping a "virtual return stack".
15:24:29 My keyboard must be wearing out; lots of typos today.
15:26:41 It all depends on what you mean.
15:26:55 The best and the most straightforward way is
15:27:14 to make it always compile
15:27:32 and interpret only at certain points,
15:27:43 like: a) switching into compilation mode;
15:27:49 I want it to work like legacy Forth.
15:27:57 b) end of line with the return stack balanced.
15:28:00 --- quit: gogonkt (Ping timeout: 256 seconds)
15:28:14 It will work like legacy Forth, only better.
15:29:43 --- join: gogonkt (~info@218.13.60.157) joined #forth
15:31:07 There are other issues for a Spartan 6 implementation as well. The RAM on the Spartan 6 is synchronous. So when I execute !, or whatever, that's cool; the address and data are applied and the RAM gets clocked at the end of that cycle.
15:31:28 When I execute @, though, the address gets clocked into the RAM at the end of that cycle, but that's also what clocks the stack registers.
15:31:59 So on the next cycle the data I expected is *not* in the register yet. It's now available at the output of the RAM, but it will take a second clock to register it onto the stack.
15:32:41 I'd already considered (for slower external memory) splitting @ into two pieces, one of which grabbed the address and one of which stacked the data.
15:35:07 So maybe I do have an A register. When I clock a value into A I'll also clock it into the memory. That way I can have that data on the next instruction.
15:35:10 IMO, you're oversimplifying hardware.
15:35:23 This RAM is dual-ported; none of this conflicts with the instruction fetching.
15:35:32 That's not possible.
15:41:59 the constitution says that the law-makers are congress; there's no auth for the president to make (write) law.
15:42:30 it's not in his job description
15:43:19 i don't care about his health either.
15:44:13 --- part: tripFantastic left #forth
15:56:27 --- join: crc_ (~charlesch@71.23.210.149) joined #forth
15:59:21 --- quit: Quartus` (Ping timeout: 248 seconds)
15:59:43 --- quit: crc (Ping timeout: 258 seconds)
16:32:53 Ok. I decided to implement an A register. Instructions are A!, A@, !, @, !+, and @+.
16:34:12 How important is A@? Can I get away without it?
16:34:48 I imagine it might be handy for referencing structures.
16:35:05 Right now I have 27 opcodes defined, so I guess I'll add it.
16:41:38 --- join: Pusdesris (~joe@c-76-112-68-135.hsd1.mi.comcast.net) joined #forth
16:44:06 --- quit: Deformative (Ping timeout: 265 seconds)
17:05:22 --- quit: forther (Ping timeout: 265 seconds)
17:45:30 --- join: forther (~forther@c-98-210-250-202.hsd1.ca.comcast.net) joined #forth
17:48:22 --- nick: crc_ -> crc
17:50:31 --- mode: ChanServ set +o crc
18:50:38 --- nick: Pusdesris -> Deformative
19:13:50 I've implemented a "carry" register; addition overflows into that. Should left and right shift operations shift the 32-bit carry:top entity?
19:14:22 It seems like they should; I have a feeling that's necessary to get a proper multiplication step.
19:16:29 --- quit: forther (Quit: Leaving)
19:20:16 are there any experts on GNU assembler in here?
19:20:36 * crc stays away from GNU assemblers
19:21:06 so would i if i could afford $12947562478925634789 for the arm development suite
19:21:30 as it is, the ONLY free option is to be constantly tearing my hair out and banging my head against a wall of nails repeatedly
19:21:49 bleh, i can't type even when i'm NOT MADDER THAN HELL!!!!!!!!!!
19:21:51 :/
19:22:07 all i want to do is create a simple table of pointers to data
19:22:12 table1: .db lots of data
19:22:16 table2: .db more data
19:22:17 ...
19:22:27 foo: .word table1, table2, ....
19:22:31 well guess what
19:22:39 foo does not contain a table of pointers
19:22:59 it contains a table of FUDGED offset-ish looking values that i can't interpret
19:23:28 tables 1,2,3,N are all at address 0x0800xxxx
19:23:40 the foo table has values like 0x0000xxxx
19:29:36 odd
19:30:00 not odd
19:30:03 FUCKED UP
19:30:14 like everything that has GNU applied to it
19:31:50 I think I'll just make the top register of the stack 32 bits. If you pop, the high-order half is gone. But + will flow into it, and left and right shift will work properly on it.
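
A hedged illustration of the shift-and-add idea behind a hardware multiplication step, as plain Forth; mul is a hypothetical name, and this is software only, not the carry:top datapath being discussed.

    : mul ( u1 u2 -- u1*u2 )
      0 rot rot                  ( acc u1 u2 )
      begin dup while            \ loop while multiplier bits remain
        dup 1 and if             \ low bit set: add in the multiplicand
          >r tuck + swap r>      ( acc+u1 u1 u2 )
        then
        >r 2* r>                 \ double the multiplicand
        1 rshift                 \ halve the multiplier
      repeat
      2drop ;                    ( acc )

A hardware step does roughly one trip through this loop body per clock, with the carry:top register pair standing in for acc.
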
19:33:14 --- quit: mathrick (Ping timeout: 248 seconds)
19:59:09 --- join: _dinya__ (~Denis@90.150.122.129) joined #forth
20:02:07 --- quit: |dinya_| (Ping timeout: 265 seconds)
20:05:10 --- quit: TR2N (Ping timeout: 245 seconds)
20:06:28 --- join: TR2N (email@89-180-145-189.net.novis.pt) joined #forth
20:30:09 --- quit: maht (Ping timeout: 265 seconds)
20:38:52 --- join: maht (~maht__@85-189-31-174.proweb.managedbroadband.co.uk) joined #forth
20:52:34 --- join: nighty__ (~nighty@210.188.173.245) joined #forth
21:12:38 --- part: TR2N left #forth
21:20:48 --- quit: proteusguy (Ping timeout: 268 seconds)
21:32:20 --- join: proteusguy (~proteusgu@zeppelin.proteus-tech.com) joined #forth
22:54:56 i440r: you want .long
22:55:17 .word is 16-bit
22:55:27 not on an ARM
22:55:38 .word is 32 bits. .hword is 16 bits
22:55:45 it is
22:56:00 .word is 32 bits on ARM :)
22:56:14 some other ARM assembler might have different syntax, but that is irrelevant
22:58:23 oh wait, the size of .word is actually target-dependent
23:00:45 told u :P
23:01:25 Maybe you are looking for .4byte ?
23:01:29 --- join: madwork_ (~madgarden@204.138.110.15) joined #forth
23:01:53 gnu as is such a pain
23:04:44 no shit :)
23:04:49 --- quit: madwork (Ping timeout: 265 seconds)
23:06:18 I gave up on it. ARM's asm is not a bundle of joy.. but at least it is not gas
23:08:22 Hah.
23:13:16 arm asm is way cool
23:14:31 I440r: I meant the assembler ARM produces. You'd better not mean that is way cool :)
23:14:38 oh
23:14:57 yea. that is a pile of trash, it doesn't even know where the end of the source file is unless you tell it
23:15:52 First negative thing I have heard about ARM.
23:16:08 I should have purchased their stock when it was under $3 a share.
23:16:09 :(
23:16:11 ARM make awesome hardware but they SUCK at software
23:16:30 Deformative: Yes. Don't get me wrong. I wuv ARM.. the hardware :)
23:17:07 Mmm, I am going to be working at Qualcomm this summer.
23:17:09 So that's fun.
23:17:15 ARM technology everywhere.
23:19:47 Well, I should sleep.
23:19:51 o/
23:20:53 sleep well
23:21:04 I should too. Only got 14 hrs in before I woke up.
23:32:45 --- join: ygrek (debian-tor@gateway/tor-sasl/ygrek) joined #forth
23:45:38 --- join: kar8nga (~kar8nga@jol13-1-82-66-176-74.fbx.proxad.net) joined #forth
23:54:38 --- quit: ygrek (Ping timeout: 245 seconds)
23:59:59 --- log: ended forth/10.03.02