* * * * *

99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table

> From: Mark Grosberg
> To: Sean Conner
> Subject: Boston: Well, since you're in the land of non-portability …
> Date: Sun, 29 Jan 2012 05:55:00
>
> > static void hexout(char *dest,unsigned long value,size_t size,const int
> > padding)
> > {
> >   assert(dest != NULL);
> >   assert(size > 0);
> >   assert((padding >= ' ') && (padding <= '~'));
> >
> >   dest[size] = padding;
> >   while(size--)
> >   {
> >     dest[size] = (char)((value & 0x0F) + '0');
> >     if (dest[size] > '9') dest[size] += 7;
> >     value >>= 4;
> >   }
> > }
> >
>
> You're also in the land of ASCII (American Standard Code for Information
> Interchange) specificness. Couldn't you make that:
>
>   dest[size] = "0123456789ABCDEF"[value & 0x0f];
>
> And then not be tied to ASCII? You could also then switch out that array
> pointer if you wanted to get a mix of uppercase, lower case depending on
> what you need.
>
> -MYG

I initially rejected the idea of doing this. My reasoning? The code itself is already non-portable, being restricted to a Posix [1]-like system. So what's one more non-portable item on the list? The sequence if (dest[size] > '9') dest[size] += 7 is around six (for a lot of architectures that aren't RISC (Reduced Instruction Set Computer) based) to twelve bytes (RISC systems) in size, and now you want to add an additional 16 bytes (17, counting the terminating NUL of the string literal)? [He asks, working from a system with a few gigabytes of RAM (Random Access Memory) -- Editor] [Shut up! -- Sean].

Also, in my nearly 30 years of working with computers, I've yet to come across a non-ASCII based computer system.

Yes, there are a few. Baudot code [2] is perhaps the oldest and, perhaps, the oddest one. Then there are the 6-bit character encoding schemes [3] and Radix-50 [4], which packed multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system.
And let's not forget EBCDIC (Extended Binary Coded Decimal Interchange Code) [5], one of about six nearly identical, but maddeningly different, encoding schemes developed by IBM [6]. All of these date from the 1960s or earlier, but ASCII won out in the end, being the most widely used and at the core of Unicode [7].

So I asked on a mailing list of classic computer enthusiasts:

> From: Sean Conner
> To: Classic Computer Talk
> Subject: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 11:21:02 -0500
>
> A friend recently raised an issue with some code I wrote (a hex dump
> routine) saying it depended upon ASCII and thus, would break on non-ASCII
> based systems (and proposed a solution, but that's beside the issue here).
> I wrote back, saying the code in question was non-portable to begin with
> (since it depended upon read() and write() -- it was targetted at Posix based
> systems) and besides, I've never encountered a non-ASCII system in the
> nearly 30 years I've been using computers.
>
> So now I'm wondering -- besides Baudot, 6-bit BCD (Binary Coded Decimal) and
> EBCDIC (Extended Binary Coded Decimal Interchange Code), is there any other
> encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any
> systems using those encoding schemes AND have a C compiler available?
>
> -spc (Or can I safely assume ASCII and derivatives these days?)

I figure if anyone knew the answer, these people would (many of them not only use computers like the PDP-10 [8], but use them as heaters during the winter months).

The answers were fascinating.
> From: "Shoppa, Tim"
> To: Classic Computer Talk
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:18:55 -0500
>
> IBM has a very handy page on C compatibility with EBCDIC system services:
>
> http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html [9]

> From: "Dave"
> To: Classic Computer Talk
> Subject: RE: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 19:33:06 -0000
>
> Please consider other character codes. An EBCDIC port of GCC is alive and
> well on several of the "legacy" operating systems (MVS, VM and Music) that
> run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs
> in ASCII (or whatever it uses to get more than 256 points in a code page)
> many zLinux sites also have the zVM hypervisor, which includes an optional
> EBCDIC C compiler. Having ported the BREXX interpreter to this environment
> I was stung by the fact that the original author had made assumptions about
> character ordering that are not true on an EBCDIC platform.

> From: Phil Budne
> To: Classic Computer Talk
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:00:52 -0500
>
> See “IBM libascii functions for z/OS UNIX System Services”
>
> http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html [10]
>
> Overview
> The libascii functions are integrated into the base of the Language
> Environment. They help you port ASCII-based C applications to the
> EBCDIC-based z/OS UNIX environment.

> From: Nemo
> To: Classic Computer Talk
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:32:06 -0500
>
> z/OS is not only POSIX, it is UNIX (see
> http://www.opengroup.org/openbrand/register/brand3470.htm [11]).

Oh. Well then …

I figured I would try Mark's suggestion (several other people on the mailing list suggested the same thing) and at least time the change to see if it's worthwhile for such odd-looking, but legal, C code.
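As an aside on why the expression is legal: a string literal in C is an array of char, so it can be indexed like any other array. The following is only a minimal sketch comparing the two approaches side by side. The names hex_digit_arith, hex_digit_table and hex_format are made up for illustration and appear nowhere in the actual program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only. */

/* Arithmetic version: assumes 'A' comes exactly 8 code points after */
/* '9', which holds in ASCII but not in EBCDIC.                      */
static char hex_digit_arith(unsigned int nibble)
{
  char c;

  assert(nibble < 16);
  c = (char)(nibble + '0');
  if (c > '9') c = (char)(c + 7);
  return c;
}

/* Table version: index straight into the string literal.  The       */
/* result is in whatever character set the compiler targets, and     */
/* there is no comparison or branch on the digit value.              */
static char hex_digit_table(unsigned int nibble)
{
  assert(nibble < 16);
  return "0123456789ABCDEF"[nibble];
}

/* Write `size` hex digits of `value` into dest, most significant    */
/* digit first.  No terminator or padding character is written.      */
static void hex_format(char *dest,unsigned long value,size_t size)
{
  assert(dest != NULL);

  while(size--)
  {
    dest[size] = hex_digit_table((unsigned int)(value & 0x0F));
    value >>= 4;
  }
}
```

On an ASCII system both helpers return the same character for every value from 0 to 15. On an EBCDIC system the letters sort below the digits, so the arithmetic version would produce something other than 'A' through 'F' for 10 through 15, while the table version would still be correct.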
> /*************************************************************************
> *
> * Copyright 2012 by Sean Conner. All Rights Reserved.
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> * as published by the Free Software Foundation; either version 2
> * of the License, or (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> *
> * You should have received a copy of the GNU General Public License
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> *
> * Comments, questions and criticisms can be sent to: sean@conman.org
> *
> *************************************************************************/
>
> /* Style: C89, const correctness, assertive, system calls, full buffering */
> /*        lookup table */
>
> /* Standard C headers: exit(), strlen()/memset(), errno, assert() */
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <assert.h>
>
> /* POSIX headers: ssize_t, open(), read()/write()/close() */
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> #define LINESIZE 16
>
> /********************************************************************/
>
> extern const char *sys_errlist[];
> extern int         sys_nerr;
>
> static void   do_dump   (const int,const int);
> static size_t dump_line (char **const,unsigned char *,size_t,const unsigned long);
> static void   hexout    (char *const,unsigned long,size_t,const int);
> static void   myperror  (const char *const);
> static size_t myread    (const int,char *,size_t);
> static void   mywrite   (const int,const char *const,const size_t);
>
> /********************************************************************/
>
> int main(const int argc,const char *const argv[])
> {
>   if (argc == 1)
>     do_dump(STDIN_FILENO,STDOUT_FILENO);
>   else
>   {
>     int i;
>
>     for (i = 1 ; i < argc ; i++)
>     {
>       int fhin;
>
>       fhin = open(argv[i],O_RDONLY);
>       if (fhin == -1)
>       {
>         myperror(argv[i]);
>         continue;
>       }
>
>       mywrite(STDOUT_FILENO,"-----",5);
>       mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
>       mywrite(STDOUT_FILENO,"-----\n",6);
>
>       do_dump(fhin,STDOUT_FILENO);
>       if (close(fhin) < 0)
>         myperror(argv[i]);
>     }
>   }
>
>   return EXIT_SUCCESS;
> }
>
> /************************************************************************/
>
> static void do_dump(const int fhin,const int fhout)
> {
>   unsigned char  buffer[4096];
>   char           outbuffer[75 * 109];
>   char          *pout;
>   unsigned long  off;
>   size_t         bytes;
>   size_t         count;
>
>   assert(fhin  >= 0);
>   assert(fhout >= 0);
>
>   memset(outbuffer,' ',sizeof(outbuffer));
>   off   = 0;
>   count = 0;
>   pout  = outbuffer;
>
>   while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
>   {
>     unsigned char *p = buffer;
>
>     for (p = buffer ; bytes > 0 ; )
>     {
>       size_t amount;
>
>       amount = dump_line(&pout,p,bytes,off);
>       p     += amount;
>       bytes -= amount;
>       off   += amount;
>       count++;
>
>       if (count == 109)
>       {
>         mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
>         memset(outbuffer,' ',sizeof(outbuffer));
>         count = 0;
>         pout  = outbuffer;
>       }
>     }
>   }
>
>   if ((size_t)(pout - outbuffer) > 0)
>     mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
> }
>
> /********************************************************************/
>
> static size_t dump_line(
>         char                **const pline,
>         unsigned char              *p,
>         size_t                      bytes,
>         const unsigned long         off
> )
> {
>   char   *line;
>   char   *dh;
>   char   *da;
>   size_t  count;
>
>   assert(pline  != NULL);
>   assert(*pline != NULL);
>   assert(p      != NULL);
>   assert(bytes  >  0);
>
>   line = *pline;
>
>   hexout(line,off,8,':');
>   if (bytes > LINESIZE)
>     bytes = LINESIZE;
>
>   p  += bytes;
>   dh  = &line[10 + bytes * 3];
>   da  = &line[58 + bytes];
>
>   for (count = 0 ; count < bytes ; count++)
>   {
>     p  --;
>     da --;
>     dh -= 3;
>
>     if ((*p >= ' ') && (*p <= '~'))
>       *da = *p;
>     else
>       *da = '.';
>
>     hexout(dh,(unsigned long)*p,2,' ');
>   }
>
>   line[58 + count] = '\n';
>   *pline = &line[59 + count];
>   return count;
> }
>
> /**********************************************************************/
>
> static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
> {
>   assert(dest != NULL);
>   assert(size > 0);
>   assert((padding >= ' ') && (padding <= '~'));
>
>   dest[size] = padding;
>   while(size--)
>   {
>     dest[size] = "0123456789ABCDEF"[value & 0x0f];
>     value >>= 4;
>   }
> }
>
> /************************************************************************/
>
> static void myperror(const char *const s)
> {
>   int err = errno;
>
>   assert(s != NULL);
>
>   mywrite(STDERR_FILENO,s,strlen(s));
>   mywrite(STDERR_FILENO,": ",2);
>
>   /* valid indexes into sys_errlist[] run from 0 to sys_nerr - 1 */
>   if (err >= sys_nerr)
>     mywrite(STDERR_FILENO,"(unknown)",9);
>   else
>     mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
>   mywrite(STDERR_FILENO,"\n",1);
> }
>
> /************************************************************************/
>
> static size_t myread(const int fh,char *buf,size_t size)
> {
>   size_t amount = 0;
>
>   assert(fh   >= 0);
>   assert(buf  != NULL);
>   assert(size >  0);
>
>   while(size > 0)
>   {
>     ssize_t bytes;
>
>     bytes = read(fh,buf,size);
>     if (bytes < 0)
>     {
>       myperror("read()");
>       exit(EXIT_FAILURE);
>     }
>     if (bytes == 0)
>       break;
>
>     amount += bytes;
>     size   -= bytes;
>     buf    += bytes;
>   }
>
>   return amount;
> }
>
> /*********************************************************************/
>
> static void mywrite(const int fh,const char *const msg,const size_t size)
> {
>   assert(fh   >= 0);
>   assert(msg  != NULL);
>   assert(size >  0);
>
>   if (write(fh,msg,size) < (ssize_t)size)
>   {
>     if (fh != STDERR_FILENO)
>       myperror("output");
>
>     exit(EXIT_FAILURE);
>   }
> }
>
> /***********************************************************************/

It can't be that much faster, can it?
> [spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null
>
> real    0m0.468s
> user    0m0.450s
> sys     0m0.018s
> [spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null
>
> real    0m0.257s
> user    0m0.245s
> sys     0m0.012s

Almost twice as fast (0.257 seconds versus 0.468 seconds of wall-clock time) as what I thought was already the fastest version. Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall. Yes, that is quite apparent here.

But this does give me an idea for removing one more [DELETED-brick wall-DELETED] branch point …

* Part 22: C89, const correctness, assertive, system calls, full buffering [12]
* Part 24: more lookup tables [13]

[1] http://en.wikipedia.org/wiki/POSIX
[2] http://en.wikipedia.org/wiki/Baudot_code
[3] http://en.wikipedia.org/wiki/BCD_(6-bit)
[4] http://en.wikipedia.org/wiki/DEC_Radix-50
[5] http://en.wikipedia.org/wiki/EBCDIC
[6] http://www.ibm.com/
[7] http://unicode.org/
[8] http://www.columbia.edu/cu/computinghistory/pdp10.html
[9] http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html
[10] http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html
[11] http://www.opengroup.org/openbrand/register/brand3470.htm
[12] gopher://gopher.conman.org/0Phlog:2012/01/30.1
[13] gopher://gopher.conman.org/0Phlog:2012/02/01.3

Email Sean Conner at sean@conman.org.