Wednesday, July 11, 2018

Text Adventure Programming in C, Part 1

TEXT ADVENTURE PROGRAMMING IN C

First post of this new blog. Hello!

INTRODUCTION:

Previously, I ran a blog on programming in C++. It was, as this one will be, a sort of programming diary in which I listed my experiments. It was mostly for my own benefit, as an online self-reference. However, I noticed that several people learning to program used it for tips. One section in particular was quite popular: text adventures. These are entertaining for a beginner to program in a new language, so that is where I will start this blog.

This blog is about the language C. There will be little reference to other languages. I am also going to write it without getting bogged down in jargon and technicalities, as far as that is possible, so that it remains a reasonably light-hearted reference and read-through for general refamiliarization purposes. The next few posts are going to be about text adventure programming in C. I expect everyone knows what text adventures are. So, with that said, straight to it...

It goes without saying, text adventures require text input! Now, where this was easy and reasonably protected in C++ with the standard library's std::string class (often loosely called the STL string, and not to be confused with "string.h"), it becomes a little more problematic in C. Some care is required to avoid potentially dangerous situations for the program, like buffer overflows. Personally, I like it; it teaches you to think about your program with a little more care than simply invoking some function that "does everything for you". C style strings can cause huge headaches for people who have followed a C++ tutorial that only teaches std::string and never covers what is really happening with strings. So, this first section of the Text Adventure in C posts is going to cover C style strings, and how to manipulate them in ways that will not crash your program or produce unexpected characters in your strings.

C STRINGS:

A string is an array of characters that ends with a null terminator. Here is one common way to declare the array...

     char c_str[10];

This reserves a fixed number of characters, ten of them, at positions 0 to 9. Their contents are undefined until you assign something to them. To be a correct (and legal) string, the characters must be NULL TERMINATED. That means this array can hold a string of at most 9 characters, with a null terminator at the end. This is the null terminator "character", as you would write it in your programs...

   '\0'

The null terminator tells the program the string is "finished". It does not have to be in the last position of the character array, but it must be somewhere in that array. The string functions will not read the string beyond it (unless you write code to force them to, which is not a good idea). For example, the array declared above has 10 "places" for characters. Here is a word that could be held in that string...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';

HELLO. A five letter word. That is a perfectly legal (and the simplest) way to assign characters to the string. Note the null terminator at the end. The whole thing is of a "length" less than the 10 places reserved for the string. If you tried to assign more than 10 characters to this string, you would end up with a situation known as a buffer overflow...

   c_str[0] = 'Q';
   c_str[1] = 'U';
   c_str[2] = 'I';
   c_str[3] = 'X';
   c_str[4] = 'O';
   c_str[5] = 'T';
   c_str[6] = 'I';
   c_str[7] = 'C';
   c_str[8] = 'A';
   c_str[9] = 'L';
   c_str[10] = 'L';
   c_str[11] = 'Y';
   c_str[12] = '\0';

This is bad. Very bad. There was no memory reserved for characters 10 to 12 in the original declaration of the character array. They are, essentially, "invading" memory that the program may use for other things, and in C writing past the end of an array is undefined behavior. Even worse, several compilers will accept this without question, leading you to believe that everything is A-okay with your program at compile time, when it is clearly not.

The moral of the story: be mindful of this happening. Make sure that the size of your character array is enough for the number of characters you may ever want to put into it, plus one for the null terminator. And if there is a risk that a string may end up with more characters than it should at runtime, write code that checks and restricts it to the size of the array. Don't forget to include, in that code, a small procedure to add the null terminator at the end. These are things that C does not necessarily protect you from, as a programmer. Granted, some string functions in the C header files do attempt to offer some protection, to an extent, but not all of them. Take care.
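As a rough illustration of that "check and restrict" idea, here is a minimal sketch. The function name copy_bounded and its parameters are my own invention for this example, not something from string.h. It copies at most size - 1 characters and always finishes with the null terminator.

```c
#include <stddef.h>

/* Hypothetical helper (not a string.h function): copies at most
   size - 1 characters from src into dst and always null-terminates dst.
   Returns the number of characters copied, not counting the terminator. */
size_t copy_bounded(char *dst, size_t size, const char *src)
{
    size_t i = 0;

    if (size == 0)
        return 0;               /* no room even for the terminator */

    while (i < size - 1 && src[i] != '\0')
    {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';              /* the small procedure that ends the string */
    return i;
}
```

With the ten-place c_str from earlier, copy_bounded(c_str, sizeof c_str, "QUIXOTICALLY") would leave the truncated but perfectly legal string "QUIXOTICA" in the array, instead of overflowing it.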

Now, note that in the example above, each letter (character) was assigned individually to its respective position in the array. Each character was surrounded by a pair of single quotation marks ('Q'). This notation specifies that what we are assigning is a single character. If double quotation marks are used, the compiler treats the text as a complete string. And indeed, such a complete, "ready made" string can be assigned to a character array at declaration, like this...

   char c_str[10] = "HELLO";

...or even...

   char c_str[] = "HELLO";

The first example will occupy six places of the specified ten-place array (five for HELLO, and another for the null terminator). With this method, again, you have to be careful to avoid a buffer overflow. The second example, which does not specify the length of the array (empty brackets), creates a character array of exactly the right length for the string being assigned to it, that is, six places.
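To see the difference between the two declarations, here is a small sketch that compares the size of each array (sizeof) with the length of the string inside it (strlen). The function name show_sizes and the variable names are just for this example.

```c
#include <stdio.h>
#include <string.h>

/* Prints the storage size and the string length of both declarations. */
void show_sizes(void)
{
    char fixed[10] = "HELLO";   /* ten places reserved, six in use */
    char fitted[]  = "HELLO";   /* compiler counts: five letters + '\0' */

    printf("fixed:  sizeof = %zu, strlen = %zu\n", sizeof fixed, strlen(fixed));
    printf("fitted: sizeof = %zu, strlen = %zu\n", sizeof fitted, strlen(fitted));
}
```

Note that strlen() reports 5 in both cases, because it counts the characters before the null terminator and does not count the terminator itself.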

Assigning a double-quoted string like this only works at declaration time of the character array. It will not work on an array that has already been created, like this...

   char c_str[10];
   c_str = "HELLO";

This is not allowed; the compiler will reject it. Once the array has been created, it cannot be assigned to as a whole with the = sign, so each character will need to be assigned individually to its respective place, as we did further above (and repeat here)...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';
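For comparison, here is a sketch of how the string.h route might look. The wrapper name set_word is made up for this example; strcpy(), strlen() and snprintf() are the standard library functions. strcpy() copies the letters and the null terminator in one call but does not check the size of the destination, so the sketch checks first and falls back to snprintf(), which truncates and still terminates the string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: stores word in an already declared array
   dst of the given size (size must be at least 1). */
void set_word(char *dst, size_t size, const char *word)
{
    if (strlen(word) < size)
        strcpy(dst, word);                /* fits: letters plus '\0' */
    else
        snprintf(dst, size, "%s", word);  /* too long: cut short, still terminated */
}
```

Used as set_word(c_str, sizeof c_str, "HELLO"), this has the same effect as the six individual assignments above.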

Aside from a function or two in string.h that solve this issue, there is, fortunately, another solution: a pointer to a string literal. The string literal is the double-quoted text itself. The compiler stores it as an array of characters, automatically adds a null terminator to the end, and lets you read it through the pointer just as you would an array. It looks confounding in that it can be used without you apparently having reserved memory for it, but it has one limitation. It is typically placed in read-only memory, and modifying it is undefined behavior, so it cannot be changed like a true array can. Here's a pointer to a string literal:

   const char *str_lit = "Hello";
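A short sketch of the difference between a pointer to a string literal and a true array may help here. The function name literal_vs_array and the variable copy are assumptions made for this example only.

```c
#include <stdio.h>

/* Reads a literal through a pointer and modifies an array copy of it.
   Returns the first character of the modified array. */
char literal_vs_array(void)
{
    const char *str_lit = "Hello";  /* pointer to a read-only literal */
    char copy[] = "Hello";          /* array holding its own writable copy */

    copy[0] = 'J';                  /* fine: the array belongs to us */
    /* str_lit[0] = 'J'; */         /* would not compile, thanks to const */

    printf("%s %s\n", str_lit, copy);   /* prints "Hello Jello" */
    printf("%c\n", str_lit[1]);         /* reading by index works: prints "e" */
    return copy[0];
}
```

Declaring the pointer as const char * is what lets the compiler catch an accidental write; without const, the bad assignment would compile and fail (or misbehave) only at runtime.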

A pointer like this is probably best used as a temporary measure, as a local variable inside a function. In fact, by passing a string literal as an argument, it is completely possible to recreate the strcpy() function...

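   // Copies const_str into str, including the null terminator.
   // str must be large enough to hold it; no bounds checking is done here.
   void my_strcpy(char *str, const char *const_str)
   {
       int i = 0;
       do
       {
           *(str + i) = *(const_str + i);
           // str[i] = const_str[i]; // Either way is good.
           i++;
       }while(*(const_str + i - 1) != '\0'); // Stop once the '\0' itself has been copied.
   }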

You can use that function this way...

     my_strcpy(c_str, "Hello");

...like the string.h strcpy() function. In the test program for this, "Hello" is taken as a string literal argument in the function and copied to c_str, and main() then checks that the null terminator is present in the final string output from the function...
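A test along those lines could be sketched as follows. This is not the original program; the helper names find_terminator and copy_and_check are made up for the example, and the my_strcpy() here is a compact variant that copies the null terminator as part of the loop. The array is filled with 'X' first so that a missing terminator would actually be noticed.

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst, including the null terminator.
   dst must be large enough; this sketch does no bounds checking. */
void my_strcpy(char *dst, const char *src)
{
    int i = 0;
    while ((dst[i] = src[i]) != '\0')
        i++;
}

/* Hypothetical helper: returns the index of the first null terminator
   within the first size places of str, or -1 if there is none. */
int find_terminator(const char *str, int size)
{
    for (int i = 0; i < size; i++)
        if (str[i] == '\0')
            return i;
    return -1;
}

/* Copies "Hello" into a ten-place array and reports where the string ends. */
int copy_and_check(void)
{
    char c_str[10];
    int end;

    memset(c_str, 'X', sizeof c_str);   /* no stray '\0' left in the array */
    my_strcpy(c_str, "Hello");
    end = find_terminator(c_str, (int)sizeof c_str);

    if (end >= 0)
        printf("%s (terminator at index %d)\n", c_str, end);
    else
        printf("no terminator found\n");
    return end;
}
```

Calling copy_and_check() from main() should report the terminator at index 5, directly after the five letters of Hello.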



And that is pretty much a fair starter or refresher for C style strings. Next time, text input and output, to start programming the text adventure, plus some more string manipulation.

All the best!
