* * * * *
                                        
  99 ways to program a hex, Part 23: C89, const correctness, assertive, system
                      calls, full buffering, lookup table
                                        
> From: Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
> To: Sean Conner <sean@conman.org>
> Subject: Boston: Well, since you're in the land of non-portability …
> Date: Sun, 29 Jan 2012 05:55:00
> 
> > static void hexout(char *dest,unsigned long value,size_t size,const int
> > padding)
> > {
> >   assert(dest != NULL);
> >   assert(size >  0);
> >   assert((padding >= ' ') && (padding <= '~'));
> >   
> >   dest[size] = padding;
> >   while(size--)
> >   {
> >     dest[size] = (char)((value & 0x0F) + '0');
> >     if (dest[size] > '9') dest[size] += 7;
> >     value >>= 4;
> >   }
> > }
> > 
> 
> You're also in the land of ASCII (American Standard Code for Information
> Interchange) specificness. Couldn't you make that:
> 
> dest[size] = "0123456789ABCDEF"[value & 0x0f];
> 
> And then not be tied to ASCII? You could also then switch out that array
> pointer if you wanted to get a mix of uppercase, lower case depending on
> what you need.
> 
> -MYG
> 

I initially rejected the idea of doing this. My reasoning? The code itself is
already non-portable, being restricted to a Posix [1]-like system. So what's
one more non-portable item on the list? The sequence if (dest[size] > '9')
dest[size] += 7 is around six (for a lot of architectures that aren't RISC
(Reduced Instruction Set Computer) based) to twelve bytes (RISC systems) in
size, and now you want to add an additional 16 bytes? [He asks, working from
a system with a few gigabytes of RAM (Random Access Memory) —Editor] [Shut
up! –Sean]. Also, in my nearly 30 years of working with computers, I've yet
to come across a non-ASCII based computer system.
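
To make the two approaches concrete, here is a minimal sketch that puts them side by side. The arithmetic version is the sequence from the original hexout() and the table version is Mark's one-liner; the function names (hex_digit_arith, hex_digit_table, hex_digit_mismatches) are made up for this illustration and are not part of the program.

```c
#include <assert.h>

/* Arithmetic version, as in the original hexout(). It relies on 'A'
   sitting exactly 7 code points past the character after '9', which
   holds for ASCII. */
static char hex_digit_arith(unsigned int nibble)
{
  char c;

  assert(nibble <= 0x0F);

  c = (char)(nibble + '0');
  if (c > '9') c += 7;
  return c;
}

/* Lookup version, as in Mark's email. The compiler fills the string
   literal with whatever code points the execution character set uses. */
static char hex_digit_table(unsigned int nibble)
{
  assert(nibble <= 0x0F);
  return "0123456789ABCDEF"[nibble];
}

/* Hypothetical helper: counts the nibbles on which the two disagree. */
static int hex_digit_mismatches(void)
{
  unsigned int n;
  int          bad = 0;

  for (n = 0 ; n < 16 ; n++)
    if (hex_digit_arith(n) != hex_digit_table(n))
      bad++;
  return bad;
}
```

On an ASCII system hex_digit_mismatches() should come back as zero, which is the whole argument for not bothering: the two only differ on a machine where the letters do not follow the digits the way ASCII lays them out.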

Yes, there are a few. Baudot code [2] perhaps being the oldest and perhaps,
the oddest one. Then there are the 6-bit character encoding schemes [3] and
Radix-50 [4], which pack multiple 6-bit characters per “word” of storage
(where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied
from system to system. And let's not forget EBCDIC (Extended Binary Coded
Decimal Interchange Code) [5], one of about six nearly identical, but
maddeningly different, encoding schemes developed by IBM [6]. All of these were
developed for machines in the 60s, but ASCII won out in the end, being the
most widely used and at the core of Unicode [7].

So I asked on a mailing list of classic computer enthusiasts:

> From: Sean Conner <spc@conman.org>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 11:21:02 -0500
> 
> A friend recently raised an issue with some code I wrote (a hex dump
> routine) saying it depended upon ASCII and thus, would break on non-ASCII
> based systems (and proposed a solution, but that's beside the issue here).
> I wrote back, saying the code in question was non-portable to begin with
> (since it depended upon read() and write()—it was targetted at Posix based
> systems) and besides, I've never encountered a non-ASCII system in the
> nearly 30 years I've been using computers.
> 
> So now I'm wondering—besides Baudot, 6-bit BCD (Binary Coded Decimal) and
> EBCDIC (Extended Binary Coded Decimal Interchange Code), is there any other
> encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any
> systems using those encoding schemes AND have a C compiler available?
> 
> -spc (Or can I safely assume ASCII and derivatives these days?)
> 

I figured if anyone knew the answer, these people would (many of them not only
use computers like the PDP-10 [8], but use them as heaters during the winter
months).

The answers were fascinating.

> From: "Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:18:55 -0500
> 
> IBM has a very handy page on C compatibility with EBCDIC system services:
> 
> http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html [9]
> 

> From: "Dave" <XXXXXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: RE: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 19:33:06 -0000
> 
> Please consider other character codes. An EBCDIC port of GCC is alive and
> well on several of the "legacy" operating systems (MVS, VM and Music) that
> run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs
> in ASCII (or whatever it uses to get more than 256 points in a code page)
> many zLinux sites also have the zVM hypervisor, which includes an optional
> EBCDIC C compiler. Having ported the BREXX interpreter to this environment
> I was stung by the fact that the original author had made assumptions about
> character ordering that are not true on an EBCDIC platform.
> 
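
The kind of assumption Dave describes can be simulated on an ASCII machine by passing the code point of '0' in as a parameter. This is only a sketch: the names (arith_digit, ebcdic_table_digit and the macros) are invented for the illustration, and the EBCDIC values are the usual ones ('0' at 0xF0, 'A' through 'F' at 0xC1 through 0xC6, 'I' at 0xC9, 'J' at 0xD1).

```c
#include <assert.h>

#define ASCII_ZERO   0x30u  /* '0' in ASCII  */
#define EBCDIC_ZERO  0xF0u  /* '0' in EBCDIC */
#define EBCDIC_I     0xC9u  /* 'I' in EBCDIC */
#define EBCDIC_J     0xD1u  /* 'J' in EBCDIC; note the gap after 'I' */

/* The "+ '0', then + 7" arithmetic from the original hexout(), with the
   code point of '0' passed in so either encoding can be simulated. */
static unsigned int arith_digit(unsigned int nibble,unsigned int zero)
{
  unsigned int c;

  assert(nibble <= 0x0F);

  c = nibble + zero;
  if (c > zero + 9) c += 7;
  return c;
}

/* What "0123456789ABCDEF" would contain if compiled for EBCDIC. */
static const unsigned char ebcdic_hex[16] =
{
  0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,
  0xF8,0xF9,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6
};

static unsigned int ebcdic_table_digit(unsigned int nibble)
{
  assert(nibble <= 0x0F);
  return ebcdic_hex[nibble];
}
```

With ASCII_ZERO, arith_digit(10,...) lands on 0x41, which is 'A'. With EBCDIC_ZERO it lands on 0x101, which is not 'A' (0xC1) and does not even fit in an 8-bit char, while the table gives the right answer in both cases. The gap between EBCDIC_I and EBCDIC_J is the same problem in a different spot: code that walks from 'A' to 'Z' by adding one hits code points that are not letters.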

> From: Phil Budne <XXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:00:52 -0500
> 
> See “IBM libascii functions for z/OS UNIX System Services”
> 
> http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html [10]
> 
> Overview
> 
>         The libascii functions are integrated into the base of the Language
>         Environment. They help you port ASCII-based C applications to the
>         EBCDIC-based z/OS UNIX environment.
> 

> From: Nemo <XXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:32:06 -0500
> 
> z/OS is not only POSIX, it is UNIX (see
> http://www.opengroup.org/openbrand/register/brand3470.htm [11]).
> 

Oh.

Well then … 

I figured I would then try Mark's suggestion (and several other people on the
mailing list suggested the same thing) and at least time the change to see if
it's a worthwhile change for such odd-looking, but legal, C code.

> /*************************************************************************
> *
> * Copyright 2012 by Sean Conner.  All Rights Reserved.
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> * as published by the Free Software Foundation; either version 2
> * of the License, or (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> * GNU General Public License for more details.
> *
> * You should have received a copy of the GNU General Public License
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> *
> * Comments, questions and criticisms can be sent to: sean@conman.org
> *
> *************************************************************************/
> 
> /* Style: C89, const correctness, assertive, system calls, full buffering */
> /*	  lookup table */
> 
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <assert.h>
> 
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> #define LINESIZE	16
> 
> /********************************************************************/
> 
> extern const char *sys_errlist[];
> extern int         sys_nerr;
> 
> static void	do_dump		(const int,const int);
> static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
> static void	hexout		(char *const,unsigned long,size_t,const int);
> static void	myperror	(const char *const);
> static size_t	myread		(const int,char *,size_t);
> static void	mywrite		(const int,const char *const,const size_t);
> 
> /********************************************************************/
> 
> int main(const int argc,const char *const argv[])
> {
>   if (argc == 1)
>     do_dump(STDIN_FILENO,STDOUT_FILENO);
>   else
>   {
>     int i;
>     
>     for (i = 1 ; i < argc ; i++)
>     {
>       int fhin;
>       
>       fhin = open(argv[i],O_RDONLY);
>       if (fhin == -1)
>       {
>         myperror(argv[i]);
>         continue;
>       }
>       
>       mywrite(STDOUT_FILENO,"-----",5);
>       mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
>       mywrite(STDOUT_FILENO,"-----\n",6);
>       
>       do_dump(fhin,STDOUT_FILENO);
>       if (close(fhin) < 0)
>         myperror(argv[i]);
>     }
>   }
>   
>   return EXIT_SUCCESS;
> }
>       
> /************************************************************************/     
> 
> static void do_dump(const int fhin,const int fhout)
> {
>   unsigned char  buffer[4096];
>   char           outbuffer[75 * 109];
>   char          *pout;
>   unsigned long  off;
>   size_t         bytes;
>   size_t         count;
>   
>   assert(fhin  >= 0);
>   assert(fhout >= 0);
> 
>   memset(outbuffer,' ',sizeof(outbuffer));
>   off      = 0;
>   count    = 0;
>   pout     = outbuffer;
>   
>   while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
>   {
>     unsigned char *p = buffer;
>     
>     for (p = buffer ; bytes > 0 ; )
>     {
>       size_t amount;
>       
>       amount    = dump_line(&pout,p,bytes,off);
>       p        += amount;
>       bytes    -= amount;
>       off      += amount;
>       count++;
>       
>       if (count == 109)
>       {
>         mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
>         memset(outbuffer,' ',sizeof(outbuffer));
>         count    = 0;
>         pout     = outbuffer;
>       }      
>     }
>   }
>   
>   if ((size_t)(pout - outbuffer) > 0)
>     mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
> }
> 
> /********************************************************************/
> 
> static size_t dump_line(
> 	char                **const pline,
> 	unsigned char              *p,
> 	size_t                      bytes,
> 	const unsigned long         off
> )
> {
>   char   *line;
>   char   *dh;
>   char   *da;
>   size_t  count;
>   
>   assert(pline  != NULL);
>   assert(*pline != NULL);
>   assert(p      != NULL);
>   assert(bytes  >  0);
>   
>   line = *pline;
>   
>   hexout(line,off,8,':');
>   if (bytes > LINESIZE)
>     bytes = LINESIZE;
>   
>   p  += bytes;
>   dh  = &line[10 + bytes * 3];
>   da  = &line[58 + bytes];
>   
>   for (count = 0 ; count < bytes ; count++)
>   {
>     p  --;
>     da --;
>     dh -= 3;
>     
>     if ((*p >= ' ') && (*p <= '~'))
>       *da = *p;
>     else
>       *da = '.';
>     
>     hexout(dh,(unsigned long)*p,2,' ');
>   }
>   
>   line[58 + count] = '\n';
>   *pline = &line[59 + count];
>   return count;
> }
> 
> /**********************************************************************/  
> 
> static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
> {
>   assert(dest != NULL);
>   assert(size >  0);
>   assert((padding >= ' ') && (padding <= '~'));
>   
>   dest[size] = padding;
>   while(size--)
>   {
>     dest[size] = "0123456789ABCDEF"[value & 0x0f];
>     value >>= 4;
>   }
> }
> 
> /************************************************************************/
> 
> static void myperror(const char *const s)
> {
>   int err = errno;
>   
>   assert(s != NULL);
>   
>   mywrite(STDERR_FILENO,s,strlen(s));
>   mywrite(STDERR_FILENO,": ",2);
>   
>   if ((err < 0) || (err >= sys_nerr))
>     mywrite(STDERR_FILENO,"(unknown)",9);
>   else
>     mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
>   mywrite(STDERR_FILENO,"\n",1);
> }
> 
> /************************************************************************/
> 
> static size_t myread(const int fh,char *buf,size_t size)
> {
>   size_t amount = 0;
>   
>   assert(fh   >= 0);
>   assert(buf  != NULL);
>   assert(size >  0);
>   
>   while(size > 0)
>   {
>     ssize_t bytes;
>     
>     bytes = read(fh,buf,size);
>     if (bytes < 0)
>     {
>       myperror("read()");
>       exit(EXIT_FAILURE);
>     }
>     if (bytes == 0)
>       break;
>     
>     amount += bytes;
>     size   -= bytes;
>     buf    += bytes;
>   }
>   
>   return amount;
> }
> 
> /*********************************************************************/  
>   
> static void mywrite(const int fh,const char *const msg,const size_t size)
> {
>   assert(fh   >= 0);
>   assert(msg  != NULL);
>   assert(size >  0);
>   
>   if (write(fh,msg,size) < (ssize_t)size)
>   {
>     if (fh != STDERR_FILENO)
>       myperror("output");
>       
>     exit(EXIT_FAILURE);
>   }
> }
> 
> /***********************************************************************/
> 
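
Mark's other point, switching the table to get upper or lower case, is not used in the listing above. A minimal sketch of how it might look follows; hexout_with() and the two table names are hypothetical, and the only change from hexout() is the extra digits parameter.

```c
#include <assert.h>
#include <stddef.h>

static const char hex_upper[] = "0123456789ABCDEF";
static const char hex_lower[] = "0123456789abcdef";

/* Hypothetical variant of hexout(): same loop, but the caller picks
   the digit table. dest must have room for size + 1 characters. */
static void hexout_with(
        char *const        dest,
        unsigned long      value,
        size_t             size,
        const int          padding,
        const char *const  digits
)
{
  assert(dest   != NULL);
  assert(size   >  0);
  assert(digits != NULL);

  dest[size] = (char)padding;
  while(size--)
  {
    dest[size] = digits[value & 0x0f];
    value >>= 4;
  }
}
```

Calling hexout_with(buf,0xBEEFUL,8,':',hex_lower) would fill buf with 0000beef followed by the ':' padding character, and passing hex_upper instead gives the same output as the hexout() in the listing.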

It can't be that much faster, can it?

> [spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null
> 
> real    0m0.468s
> user    0m0.450s
> sys     0m0.018s
> [spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null
> 
> real    0m0.257s
> user    0m0.245s
> sys     0m0.012s
> 

Almost twice as fast as what I thought was the fastest version already.

Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch
instruction is like hitting a brick wall.

Yes, it's quite apparent that that is true.
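
For anyone who wants to poke at the branch claim in isolation, here is a rough sketch of a timing harness. Everything in it is an assumption made for illustration: the function names, the use of clock(), and the small linear congruential generator that feeds unpredictable nibbles to both versions. A compiler may well turn the branch into something branch-free, so the numbers it reports may not match the timings above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical pseudo-random source so the branch is hard to predict. */
static unsigned long next_random(unsigned long *state)
{
  *state = *state * 1103515245UL + 12345UL;
  return (*state >> 16) & 0x7FFFUL;
}

/* Sums hex digits produced with the compare-and-add sequence. */
static unsigned long sum_branch(unsigned long iterations)
{
  unsigned long state = 1;
  unsigned long sum   = 0;

  while(iterations--)
  {
    char c = (char)((next_random(&state) & 0x0F) + '0');
    if (c > '9') c += 7;
    sum += (unsigned long)c;
  }
  return sum;
}

/* Sums hex digits produced with the lookup table. */
static unsigned long sum_table(unsigned long iterations)
{
  unsigned long state = 1;
  unsigned long sum   = 0;

  while(iterations--)
    sum += (unsigned long)"0123456789ABCDEF"[next_random(&state) & 0x0F];
  return sum;
}

/* Times both versions; returns non-zero if the sums agree (they
   should on an ASCII system). */
static int compare_timings(
        unsigned long  iterations,
        double        *branch_secs,
        double        *table_secs
)
{
  clock_t       t0,t1,t2;
  unsigned long a,b;

  assert(branch_secs != NULL);
  assert(table_secs  != NULL);

  t0 = clock();
  a  = sum_branch(iterations);
  t1 = clock();
  b  = sum_table(iterations);
  t2 = clock();

  *branch_secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
  *table_secs  = (double)(t2 - t1) / CLOCKS_PER_SEC;
  return a == b;
}
```

Both loops pay for the same random number generation, so any difference between the two reported times comes from how the digit itself is produced. The sums are returned and compared mostly to keep the compiler from throwing the loops away.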

But this does give me an idea for removing one more [DELETED-brick wall-
DELETED] branch point …

* Part 22: C89, const correctness, assertive, system calls, full buffering
  [12]
* Part 24: more lookup tables [13]

[1] http://en.wikipedia.org/wiki/POSIX
[2] http://en.wikipedia.org/wiki/Baudot_code
[3] http://en.wikipedia.org/wiki/BCD_(6-bit)
[4] http://en.wikipedia.org/wiki/DEC_Radix-50
[5] http://en.wikipedia.org/wiki/EBCDIC
[6] http://www.ibm.com/
[7] http://unicode.org/
[8] http://www.columbia.edu/cu/computinghistory/pdp10.html
[9] http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html
[10] http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html
[11] http://www.opengroup.org/openbrand/register/brand3470.htm
[12] gopher://gopher.conman.org/0Phlog:2012/01/30.1
[13] gopher://gopher.conman.org/0Phlog:2012/02/01.3

Email Sean Conner at sean@conman.org
