Adventures of a Programmer: Parser Writing Peril IV

I didn’t know that libtommath has such a convoluted building process; more than half a dozen files need a touch to add a single file to the library!
So for now: to something different.

Mise en Place

It is seen as one of the fundamentals of every good kitchen to have everything right at hand and well prepared when service starts. The very same view holds in programming, too.

The final program, the calculator, needs a place to hold the settings somebody might find worth keeping and, more to the point, might want to change. That’s what the so-called configuration files are made for.
We could use one of the many libraries available but we don’t. Who would have thought. The two simple reasons: I was not able to find a library with a fitting license, and most of these libraries are way too complex, not flexible enough, or lack some of the functions needed.

We don’t need much, just a simple “key = value” list where neither “key” nor “value” exceeds the length of some dozen bytes. A simple task one is tempted to think and a simple task it is. In some way. Or the other.
Reading such a file is simple. The following file shall be our example file.

me@home:~/PARSER/$ cat test.ini 
# Example config
key_one=This is a test #  stringvalue
key_two = 1234 # number value
# key = value

gooood_entry  =  4
notherkey = -----------                 
when_the_value_left_its_key =
# no pairs over multiple lines, should throw an error for this lines
# when commented out

another = 1
another = 2
undetected error = 4
nested_comments = 4#5#6 should give only "4" 
#commented = out
 # commented = out, too

I put two errors in it. The first one is commented out: a multi-line value. We do not need multi-line values. BTW: most famous last words start with “We don’t need no…”. The second one will not get detected by the parser, by design: the white space in the key “undetected error”. Errors of that kind will get their proper treatment by a different function because they are not technical errors.

The delimiter between key and value is an equal sign and the delimiter between the individual pairs is a new-line. That is all the grammar our configuration-file parser has, with the one extra rule that neither keys nor values may start or end with white-space. So the following two entries are equal (with the underbars denoting white-space):

key=value
___key________=_________________value_____________
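
The check for white space inside a key, which is left to a different function, could look roughly like the following. This is only a minimal sketch: the name key_is_valid() and the decision to reject empty keys are assumptions and do not come from the code in this post.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of the listings in this post: returns 1 if
   the key is non-empty and holds no white-space at all, 0 otherwise. */
int key_is_valid(const char *key){
    if(key == NULL || *key == '\0'){
      return 0;
    }
    while(*key){
      /* isspace() wants a value that fits into an unsigned char */
      if(isspace((unsigned char)*key)){
        return 0;
      }
      key++;
    }
    return 1;
}
```

Fed with the keys of the example file it would accept “key_one” and reject “undetected error”.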

The reading part suffers a bit from bit-juggling but should nevertheless be legible.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <errno.h>

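/* This should be enough for everyone */
#define MAX_LINE 256

/* The two helpers are defined further down, so declare them here */
char * trim(char *s);
char * remove_comment(char *s);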

int read_config(FILE* file){
    char line[MAX_LINE];
    char *start,*end, *name, *value;
    int lineno=0,error=0;

    while (fgets(line, MAX_LINE, file) != NULL) {
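       /* "start" walks along the line buffer itself, nothing gets copied */
       start = line;
       /* Line numbering starts at 1 because the first line should be the 1st */
       lineno++;
       /* Early out: a line that fills the whole buffer is too long, the
          expected length is at most a quarter of MAX_LINE */
       if(strlen(line)+1 ==  MAX_LINE){
         return lineno;
       }
       /* Get rid of leading space. The cast keeps isspace() defined for
          bytes outside of the ASCII range */
       while (*start && isspace((unsigned char)(*start)))start++;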
       /* Single line comment: skip the whole line */
       if(*start == '#'){
         continue;
       }
       /* Rubbish? Skip line */
       else if(strlen(line) < 3){
         continue;
       }
       else if(*start){
         /* Set a pointer to the first occurrence of the equal sign if any */
         end = strchr(start,'=');
         if(end != NULL){
           /* Overwrite the equal sign with a zero, marking "end of string" */
           *end = '\0';
           /* Trim white-space from beginning and end */
           name = trim(start);
           /* Step pointer one further to the part behind the equal sign*/
           end++;
           /* value might be empty but that is not our problem. The comment
              goes first, such that the white-space in front of the hash
              gets trimmed, too */
           value = trim(remove_comment(end));
           /* Do whatever needs to be done with the tuple */
           printf("\"%s\" = \"%s\"\n",name,value);
         }
         else{
           /* A lonely key at line lineno. Or a lonely value, who knows */
           return lineno;
         }
       }
       /* something unexpected happened at line lineno */
       else if(!feof(file)){
         error = lineno;
       }   
    } 
    return error;
}
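
The early-out in read_config() treats every line that fills the buffer as too long. A slightly more precise test looks at whether fgets() stopped before it reached the end of the line. The following is a sketch under that assumption; line_is_truncated() is a made-up name and not used anywhere in the listings here.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if buf, as filled by fgets() with a buffer
   of bufsize bytes, is full and does not end in a newline, i.e. the rest of
   the line is still waiting in the stream. Returns 0 otherwise. */
int line_is_truncated(const char *buf, size_t bufsize){
    size_t len = strlen(buf);
    if(len == 0 || len + 1 < bufsize){
      /* buffer not full: a complete line or the last line of the file */
      return 0;
    }
    return buf[len - 1] != '\n';
}
```

One corner case stays: a last line without a newline that fills the buffer exactly is still reported as truncated.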

The two helper functions did not get the attention they deserve, I’m afraid, so here they are:

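char * trim(char *s){
    size_t size;
    char *end;
    size = strlen(s);
    if (!size)return s;
    /* "end" points one past the last character that is kept */
    end = s + size;
    /* The casts keep isspace() defined for bytes outside of the ASCII range */
    while (end > s && isspace((unsigned char)end[-1]))end--;
    *end = '\0';
    while (*s && isspace((unsigned char)*s))s++;
    return s;
}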
char * remove_comment(char *s){
    char *end = strchr(s,'#');
    if(end != NULL) *end = '\0';
    return s;
}

The only interesting thing is that the trim() function trims the end first. It does it that way to be able to replace the first of the trailing white-space characters with a zero, such that the next loop stops there at the latest. Getting rid of the comments is easily done by replacing the hash with a zero. Both functions work on the original.
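
The order in which the two helpers get called makes a difference as soon as a comment follows a value. The sketch below repeats both helpers under the made-up names strip() and cut_comment(), so it can stand alone. It is meant as an illustration, not as a replacement for the listings above.

```c
#include <ctype.h>
#include <string.h>

/* Same idea as trim(): cut the trailing white-space first, then skip the
   leading white-space. Works on the original string. */
char *strip(char *s){
    size_t len = strlen(s);
    while(len > 0 && isspace((unsigned char)s[len - 1])){
      len--;
    }
    s[len] = '\0';
    while(*s && isspace((unsigned char)*s)){
      s++;
    }
    return s;
}

/* Same idea as remove_comment(): end the string at the first hash. */
char *cut_comment(char *s){
    char *hash = strchr(s, '#');
    if(hash != NULL){
      *hash = '\0';
    }
    return s;
}
```

With the input “  4 # five  ”, strip(cut_comment(s)) gives “4”, while cut_comment(strip(s)) keeps the blank in front of the hash and gives “4 ”. A string made of nothing but white-space comes back from strip() as the empty string.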

Writing the configuration file back is a bit more work if we want to keep the original comments. The file will be very small, which makes it possible to work with an in-memory copy.

One little problem with that approach: we need to know the file size to avoid trying to read a very large file. It would work without, we could just reallocate as needed, but I wanted to show what happens when you sail out of the safe haven that is standard ISO-C.
To use the POSIX variant we need to add a bit to the preliminaries.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <errno.h>

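#if (defined _POSIX_C_SOURCE) &&  (_POSIX_C_SOURCE >= 200112L)
#include <sys/types.h>
#include <sys/stat.h>
#include <limits.h>
#endif

/* Head-room for changed and added entries. write_config() uses this name
   but it is defined nowhere else; the value of 1024 is an assumption */
#define SOME_EXTRA_MEM 1024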

And in the code itself:

#if (defined _POSIX_C_SOURCE) &&  (_POSIX_C_SOURCE >= 200112L)
    struct stat st;
    if(fstat(fileno(file), &st) <0 ){
      return errno;
    }
    if(st.st_size > (off_t)LONG_MAX){
      /* Wut? */
      errno = EFBIG;
      return errno;
    }
    file_size = (long) st.st_size;
#else
    if (fseek(file, 0, SEEK_END) < 0){
      return errno;
    }
    file_size = ftell(file);
    if(file_size<0){
      return errno;
    }
#endif
    /* fseek() has moved the file position to the end and fstat() leaves it
       where the last read stopped, so go back to the start in both cases */
    rewind(file);

Both have their advantages and their disadvantages, too. The POSIX way can measure the size of binary files but has trouble doing so with some device files, and vice versa. We have only a small text file and can happily go with the ISO-C way.
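
Wrapped into a function of its own, the ISO-C way might look like the following sketch. The name stream_size() is an assumption. The function also puts the file position back to where it was, which the fragment above does not do. Keep in mind that ISO-C gives only weak guarantees here: for a text stream the value of ftell() is, strictly speaking, only meant to be handed back to fseek(), and a binary stream does not have to support SEEK_END in a meaningful way. On the usual systems it works for a small regular file.

```c
#include <stdio.h>

/* Hypothetical helper: size of an open, seekable stream as reported by
   ftell(), or -1 on error. Restores the former file position. */
long stream_size(FILE *file){
    long pos, size;
    pos = ftell(file);
    if(pos < 0){
      return -1;
    }
    if(fseek(file, 0L, SEEK_END) != 0){
      return -1;
    }
    size = ftell(file);
    /* go back to where we came from */
    if(fseek(file, pos, SEEK_SET) != 0){
      return -1;
    }
    return size;
}
```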

int write_config(FILE* file){
    char line[MAX_LINE];
    char *start,*end, *name, *value;
    char *config, *end_config, temp_config[MAX_LINE];
    int lineno=0,error=0;
    long file_size=0, mem_size, available_mem_size;
    size_t line_length;

    errno = 0;

#if (defined _POSIX_C_SOURCE) &&  (_POSIX_C_SOURCE >= 200112L)
    struct stat st;
    if(fstat(fileno(file), &st) <0 ){
      return errno;
    }
    if(st.st_size > (off_t)LONG_MAX){
      /* Wut? */
      errno = EFBIG;
      return errno;
    }
    file_size = (long) st.st_size;
#else
    if (fseek(file, 0, SEEK_END) < 0){
      return errno;
    }
    file_size = ftell(file);
    if(file_size<0){
      return errno;
    }
#endif
    rewind(file);
    /* allocate memory for the in-memory copy */
    config = malloc(file_size + SOME_EXTRA_MEM);
    if(config == NULL){
      return errno;
    }
    /* strcat() needs a proper string to append to, so start with an empty one */
    config[0] = '\0';
    /* keep the number for further use*/
    available_mem_size = file_size + SOME_EXTRA_MEM;
    /* We need some pointers for the juggling */
    end_config = config;
    /* The memory already used */
    mem_size = 0;

    while (fgets(line, MAX_LINE, file) != NULL) {
       start = line;
       lineno++;
       /* Basically the same as in reading. The white-space is only skipped
          here: comments and rubbish get copied as whole lines further down
          and the entries are written anew */
       while (*start && isspace((unsigned char)(*start))){
          start++;
       }
       if(*start == '#' || *start == '\0'){
         /* Keep the comments and the lines holding nothing but white-space */
         line_length = strlen(line);
         /* memory allocated might not be enough */
         while((mem_size + (long)line_length +1) > available_mem_size){
            /* A temporary pointer keeps the old block reachable if
               realloc() fails */
            char *tmp = realloc(config,available_mem_size*2);
            if(tmp == NULL){
              free(config);
              return errno;
            }
            config = tmp;
            available_mem_size *= 2;
         }
         /* We know that we have enough memory, so strcat will do */
         config = strcat(config,line);
         end_config = config + strlen(config);
         /* Keep ledger up-to-date */
         mem_size += line_length +1;
         continue;
       }
       /* Yes, we keep everything, even the rubbish */
       else if(strlen(line) < 3){
         while((mem_size +(long)strlen(line) +1 ) > available_mem_size){
           /* Do not lose the old block if realloc() fails */
           char *tmp = realloc(config,available_mem_size*2);
           if(tmp == NULL){
             free(config);
             return errno;
           }
           config = tmp;
           available_mem_size *= 2;
         }
         config = strcat(config,line);
         end_config = config + strlen(config);
         mem_size += strlen(line);
         continue;
       }
       else if(*start){
         end = strchr(start,'=');
         if(end != NULL){
           *end = '\0';
           name = trim(start);
           end++;
           /* Comment first, such that the white-space in front of it goes, too */
           value = trim(remove_comment(end));
           /* The same as in reading, so we get both, key and value */
           if(strcmp(name,"gooood_entry") == 0){
             value = "The value for the key \"gooood_entry\" has been changed";
           }
           /* additional three bytes for " = " and two for line-end and zero*/
           /* make that three for Windows! */
           line_length = strlen(name) + 3 + strlen(value) + 2;
           /* A changed value can make the line too long for temp_config */
           if(line_length > sizeof(temp_config)){
             free(config);
             errno = ERANGE;
             return errno;
           }
           while((mem_size + (long)line_length) > available_mem_size){
             /* Do not lose the old block if realloc() fails */
             char *tmp = realloc(config,available_mem_size*2);
             if(tmp == NULL){
               free(config);
               return errno;
             }
             config = tmp;
             available_mem_size *= 2;
           }
           if(snprintf(temp_config,line_length,"%s = %s\n",name,value) < 0){
             /* set errno to a value to be able to detect where it came from */
             free(config);
             return errno;
           }
           config = strcat(config,temp_config);
           end_config = config + strlen(config);
           mem_size += line_length;
         }
       }

       else if(!feof(file)){
         /* 
            Set something externally to make clear that the following is not
            a value of errno.
          */
         error = lineno;
       }   
    }
    /* Let's add one more tuple, because we can */
    name  = "additional_key";
    value = "additional value: 3628800";
    line_length = strlen(name) + 3 + strlen(value) + 2;
    if(snprintf(temp_config,line_length,"%s = %s\n",name,value) < 0){
      /* set errno to a value to be able to detect where it came from */
      free(config);
      return errno;
    }
    while((mem_size + (long)line_length) > available_mem_size){
      /* Do not lose the old block if realloc() fails */
      char *tmp = realloc(config,available_mem_size*2);
      if(tmp == NULL){
        free(config);
        return errno;
      }
      config = tmp;
      available_mem_size *= 2;
    }
    config = strcat(config,temp_config);
    /* Print to a file, here: stdout */
    printf("%s",config);
    /* Give memory back to OS */
    free(config);
    /* Library calls are allowed to set errno even if they succeed, so
       report success directly instead of returning errno */
    return 0;
}
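
The grow-and-append dance shows up four times in write_config(). It could be moved into one small subroutine, roughly like the sketch below. The struct textbuf, the function textbuf_append() and the start size of 64 bytes are all made up for this example. With something like it the buffer could start empty and the file size would not be needed at all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer, not part of the listings above. */
struct textbuf {
    char  *data;   /* NUL-terminated contents, NULL while empty */
    size_t used;   /* bytes in use, not counting the NUL */
    size_t size;   /* bytes allocated */
};

/* Appends s to buf and doubles the allocation as often as needed.
   Returns 0 on success and -1 if memory ran out; buf stays usable then. */
int textbuf_append(struct textbuf *buf, const char *s){
    size_t len = strlen(s);
    size_t need = buf->used + len + 1;
    if(need > buf->size){
      size_t new_size = buf->size ? buf->size : 64;
      char *tmp;
      while(new_size < need){
        new_size *= 2;
      }
      /* keep the old pointer until realloc() has succeeded */
      tmp = realloc(buf->data, new_size);
      if(tmp == NULL){
        return -1;
      }
      buf->data = tmp;
      buf->size = new_size;
    }
    /* copy the string including its terminating NUL */
    memcpy(buf->data + buf->used, s, len + 1);
    buf->used += len;
    return 0;
}
```

A buffer starts its life as struct textbuf b = {NULL, 0, 0}; and gets released with free(b.data) at the end.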

The main function does not check for every error.

int main(int argc, char **argv){
    FILE* file;
    int error;

    if(argc < 2){
      fprintf(stderr,"Usage: %s filename\n",argv[0]);
      exit(EXIT_FAILURE);
    }

    file = fopen(argv[1], "r");
    if (!file){
      fprintf(stderr,"Opening file: \"%s\" failed\n",argv[1]);
      exit(EXIT_FAILURE);
    }
    error = read_config(file);
    printf("\n\tChanging one entry and adding one at the end\n\n");
    if(error){
      fprintf(stderr,"Error in reading at line: %d\n",error);
      exit(EXIT_FAILURE);
    }
    error = write_config(file);   
    fclose(file);
    if(error){
      fprintf(stderr,"Error in writing. errno: %d\n",error);
      exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
}