Sunday, July 15, 2018

Text Adventure Programming in C, Part 2

TEXT ADVENTURE PROGRAMMING IN C

TEXT INPUT:

So, the basics of C style strings were covered last time. Now it is time to look at how to input strings. Again, in C++, it is no big issue with STL, but is fraught with pitfalls to trip up the unwary with C. There are several ways to input text with C, using the stdio.h functions. There are also as many ways to cause buffer overflows, if the programmer is not careful.

The most basic function for text input is the multipurpose scanf function. However, it is not designed specifically and only for text. It accepts data from stdin, which in most cases is the keyboard. This data is read according to the "format" specified when the function is called, and stored in a variable, by address, also specified when calling the function. The data can be floating point, integer, character, or even a string. scanf receives some bad publicity, I have noticed, but it is not that bad, once it is understood, and remains a good option for a string input if handled properly. I will come back to it.

By far the favorite method for text input, I notice, is the fgets function, also in the stdio.h header. This initially may seem a bad option, because fgets is a function to read data from a file that was opened with fopen. However, fgets can be forced to accept input from the keyboard by using stdin (keyboard) as the "file stream" argument. Here is the prototype of fgets...

char *fgets(char *str, int n, FILE *stream)

What is liked about it is that "int n" part. It limits the number of characters that fgets reads into the string: at most n - 1 characters, followed by the null terminator. If n is equal to the size of the character array, fgets cannot overflow it. Also, fgets will read spaces and include them in the string, so sentences can be input (it stops after the newline, which it also stores if there is room). scanf will (normally) only read a string up to one word long.

So, inputting something like "The large lake" with...

scanf("%s", c_str);

...will only accept the word "The" into the string, and will leave the rest, from the first space onward, waiting in the keyboard buffer (note, &c_str is not necessary when passing a string to scanf, as the name of an array already converts to the address of its first character).

fgets, with the same input, on the other hand...

fgets(c_str, 15, stdin);

...will accept the whole string: All 14 characters, including spaces, plus the null terminator. Ideal.
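One detail worth knowing: when there is room left in the array, fgets also stores the newline from the Return key as part of the string. Here is a minimal sketch of one common way to trim it, using strcspn from string.h. The helper name strip_newline is made up for this example; it is not a library function.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at its first newline, if any.
   Returns 1 if a newline was found and removed, 0 otherwise. */
int strip_newline(char *str)
{
    size_t pos = strcspn(str, "\n");   /* index of '\n', or of the terminator */

    if (str[pos] == '\n')
    {
        str[pos] = '\0';
        return 1;
    }
    return 0;
}
```

A return of 0 straight after fgets means the line did not fit in the array (or the input ended), so there may still be characters waiting to be read.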

But there is, indeed, a potential problem. It is not always realized that when the user types in a line of text for input, a queue of characters is actually being made in the keyboard buffer. If you type in 20 characters and hit Return, yes, 14 of them get sucked into the string, but the last six, and the newline from the Return key, stay in the keyboard buffer, waiting to get used the next time a command to accept strings or characters is invoked. This causes some pretty annoying problems, as part of the last input becomes the beginning of the next input, whether you wanted it or not. Try inputting "Barcelona is an historical city" into this program...

#include <stdio.h>

int main(int argc, char *argv[])
{
    char c_str[15];

    puts("Input a string (14 characters maximum, including spaces)");

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    return 0;
}


What happened? You never got to do the second input. It assigned the 14 characters to the string (and invisibly added the terminator as fgets should)...

"Barcelona is a"

...then without waiting for another input from the user for the second fgets, grabbed as much as it could of the rest of the original input still in the keyboard buffer...

"n historical c"

...and even then, still left the characters "ity" in the keyboard buffer. If this is undesired behavior for your program, then clearly, that keyboard buffer needs to be flushed before the next input.

There is often a recommendation to use fflush(stdin)...

#include <stdio.h>

int main(int argc, char *argv[])
{
    char c_str[15];

    puts("Input strings (14 characters maximum, including spaces)");

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fflush(stdin); /* not portable: fflush is only defined for output streams */

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fflush(stdin);

    return 0;
}

This may or may not work, depending on the platform (it has worked with some Windows compilers). However, fflush is not designed for the purpose of flushing input streams: the C standard only defines it for output streams, and calling it on an input stream is undefined behavior. A better, universal solution is required for flushing the keyboard buffer of excess characters not taken by fgets. This is it...

do{}while(getchar() != '\n');

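The loop, with getchar(), will keep reading and discarding characters from the keyboard buffer until it finds the newline at the end, and will discard that too, because getchar() has already removed each character from the buffer by the time it is compared with '\n'. I write it in the do{}while format, but that is only a matter of style. The shorter form...

while(getchar() != '\n');

...that I sometimes see recommended does exactly the same thing, newline included. Two cautions apply to both forms. First, only run the loop when fgets stopped short of the newline. If the whole line fitted, the newline is already in the string, and the loop would sit waiting for the user's next line and then throw it away. Second, if the input ends before a newline arrives, getchar() returns EOF forever and the loop never exits, so a sturdier version keeps the character in an int and also stops on EOF. Here is the program done properly...

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char c_str[15] = "";
    int c;

    puts("Input strings (14 characters maximum, including spaces)");
    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    /* No newline in the string means the line was cut short: flush the rest. */
    if(strchr(c_str, '\n') == NULL)
        do{ c = getchar(); }while(c != '\n' && c != EOF);

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    if(strchr(c_str, '\n') == NULL)
        do{ c = getchar(); }while(c != '\n' && c != EOF);

    return 0;
}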
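For a text adventure, the read-then-flush steps can be wrapped in one reusable function. What follows is only a sketch under my own assumptions: the name read_line and its return convention are invented here, and it takes the stream as a parameter so that it can be tried on a file as well as on stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (size bytes).
   The newline is removed; anything that does not fit is discarded.
   Returns 1 on success, 0 if no input could be read. */
int read_line(char *buf, int size, FILE *in)
{
    char *newline;
    int c;

    if (size < 1 || fgets(buf, size, in) == NULL)
        return 0;

    newline = strchr(buf, '\n');
    if (newline != NULL)
    {
        *newline = '\0';               /* whole line fitted: trim the newline */
    }
    else
    {
        do                             /* line was cut short: flush the rest */
        {
            c = getc(in);
        } while (c != '\n' && c != EOF);
    }
    return 1;
}
```

It would be called as read_line(c_str, sizeof c_str, stdin). With a 15 character array, typing "Barcelona is an historical city" stores "Barcelona is a" and the next call waits for fresh input.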

That solves the keyboard buffer problem, and for all intents and purposes could be the end of this post. It will certainly do for a text adventure input function. However, there are another couple of methods of text input in C that I want to look at, for the sake of broadening our options.

One solution that I like is to read the string one character at a time, inside a loop, using getch(). Now, getch() is a relative of getchar() that returns each key as soon as it is pressed, without waiting for Return. It is contained in the conio.h header, which unfortunately is only available with DOS and Windows compilers, but there is a similar getch() in curses.h, which allows the same method to be used on Linux. Here is a screen shot of the method...


Three things are important with this method:

1). If you try to type in text beyond the size of the string buffer, the loop will stop one short of the end of the buffer, assign a null terminator, and break out of the loop. That is, you were never in any risk of writing beyond the limits of the string and causing a buffer overflow.

2). If you hit Return before the end of the string buffer is reached, the program will assign the null terminator and break out of the loop.

3). Finally, if the user hits Backspace, the iterator (buffer_counter) for the string will move back a place (it is moved back two places because the loop, when it runs again, will move it forward one place). For the visual representation on the console, the putchar(' ') function will over-write the last character with a blank, then move the cursor back one place with putchar('\b'). This allows the user to edit the input effectively.
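The three points above can be sketched in code. This is a reconstruction, not the program from the screen shot: the function name read_keys is invented, getc() on a stream stands in for getch() so that the sketch compiles on any platform, and the counter steps back once inside a while loop rather than twice inside a for loop.

```c
#include <stdio.h>

/* Hypothetical sketch: builds a string one key at a time.
   getc(in) stands in for getch() so that this compiles anywhere. */
size_t read_keys(char *buf, size_t size, FILE *in)
{
    size_t buffer_counter = 0;

    if (size == 0)
        return 0;

    while (buffer_counter < size - 1)          /* (1) always leave room for '\0' */
    {
        int key = getc(in);

        if (key == EOF || key == '\r' || key == '\n')
            break;                             /* (2) Return ends the input */

        if (key == '\b')                       /* (3) Backspace */
        {
            if (buffer_counter > 0)
            {
                buffer_counter--;              /* forget the last character */
                fputs("\b \b", stdout);        /* and blank it on screen */
            }
            continue;
        }

        buf[buffer_counter++] = (char)key;
        putchar(key);                          /* conio.h's getch() does not echo */
    }

    buf[buffer_counter] = '\0';
    return buffer_counter;
}
```

On Windows, the getc(in) call would be swapped for getch() to get the key-by-key behavior described here.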

Nonetheless, for these blog posts on Text Adventures, I will be reverting to good old scanf. It is not as bad as the publicity it gets, so...

Let us hear it for scanf!

So scanf is rubbish? Avoid it at all costs? Well, if you use it incorrectly, maybe so. But it does work, and it works very well for this application, if you get the formatting right. Text adventures normally require a two word input; a verb and a noun. Sometimes, even three words might be acceptable, if you have programmed an option for an adjective in between the verb and the noun.

GO WEST

GET BLUE CARD

...might be examples of commands you would give in a text adventure. An improperly formatted scanf, like this...

scanf("%s", c_str);

...would only pick up GO and GET from each of those inputs, because it stops reading the input at the occurrence of the first white space. And, as we have already seen, it leaves the rest of the user input in the keyboard buffer, ready to ruin your next command input attempt. So, the first thing is to force scanf to read spaces. It is done this way...

scanf("%[^\n]", c_str);

To a beginner, that already looks scary, but it is not. Let us look at another reference to the scanf function...

http://www.cplusplus.com/reference/cstdio/scanf/

The specifier [^characters], from that reference, is a negated scanset. What does this mean? It alters scanf's default behavior so that it does not stop reading the input at the first white space, but at any of the characters the programmer specifies in those brackets. Note that %[...] is a complete specifier in its own right. It is often written with an s after the closing bracket, but that s is not part of it (scanf would treat it as a literal letter s to look for in the input). So, if you wanted scanf to read your input into a string up to the first occurrence of the letter 'p' or 'P' (remember, case sensitive), you would do this as the format specifier...

"%[^pP]"

If you want it to keep collecting characters, including white spaces, until you hit Return, then you would use the "new line" escape character...

"%[^\n]"

This is all good, but it does not (yet) protect us from a buffer overflow. Fortunately, there is a way. Specify the number of characters to collect...

"%14s"

This will only pass 14 characters to a string once you hit Return, no matter how much you type into the console. Combine both methods, like this...

scanf("%14[^\n]", c_str);

...and you can collect white spaces and avoid a buffer overflow. The rest of what was typed will stay in the keyboard buffer, and can be cleared out with the same...

do{}while(getchar() != '\n');

...that we have already looked at above. Here is a screen shot of a working program along these lines...


And that is what I will be using for text input for the rest of this text adventure section of the blog.
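To compare the format strings from this post without typing anything, here is a small sketch that runs them through sscanf, which parses a string in memory with the same rules scanf applies to the keyboard. The helper names and INPUT_SIZE are made up for the illustration.

```c
#include <stdio.h>

#define INPUT_SIZE 15   /* hypothetical size: 14 characters plus '\0' */

/* Each hypothetical helper parses text into out and returns sscanf's result. */

/* Stops at the first white space. */
int scan_word(const char *text, char out[INPUT_SIZE])
{
    return sscanf(text, "%14s", out);
}

/* Stops at the first newline, so spaces are kept. */
int scan_line(const char *text, char out[INPUT_SIZE])
{
    return sscanf(text, "%14[^\n]", out);
}

/* Stops at the first 'p' or 'P'. */
int scan_until_p(const char *text, char out[INPUT_SIZE])
{
    return sscanf(text, "%14[^pP]", out);
}
```

Given "GET BLUE CARD", scan_word stores GET while scan_line stores GET BLUE CARD. Given a line that holds nothing but the newline, scan_line stores nothing and returns 0.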

All the best!

Notes:

Just in case it was not clear in the example above, it is best to use a scanf() width one less than the size of your character buffer (the array size). This ensures that there will be space for the null terminator at the end. Like this...

char my_string[50];
scanf("%49[^\n]", my_string);


Also, make use of scanf()'s return integer. This can help to catch situations where no characters were entered (for example, the user presses return accidentally, without having entered a command). If the string was entered, it will return 1. If nothing was entered, it will return 0, and if the input has ended altogether it will return EOF. In the last two cases nothing is stored in the string, so do not print or parse it.


int scanf_check = scanf("%49[^\n]", my_string);

if(scanf_check != 1)
{
    printf("No input\n");
}
else
{
    printf("%s\n", my_string);
}

Useful, for an error-tight text adventure.
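Putting the width, the return check and the flushing loop together, an input routine might look like the sketch below. The names read_command and COMMAND_SIZE are invented for this example, and fscanf on a stream parameter is used in place of scanf (which is the same thing applied to stdin) so that the routine can also be tried on a file.

```c
#include <stdio.h>

#define COMMAND_SIZE 50   /* hypothetical buffer size, as in the note above */

/* Hypothetical helper: reads one command line from `in` into command
   (COMMAND_SIZE bytes). Returns 1 if text was read, 0 for an empty line,
   EOF if the input has ended. */
int read_command(char command[COMMAND_SIZE], FILE *in)
{
    int c;
    int result = fscanf(in, "%49[^\n]", command);

    if (result != 1)
        command[0] = '\0';             /* nothing stored: make it an empty string */

    do                                 /* discard the rest of the line, newline included */
    {
        c = getc(in);
    } while (c != '\n' && c != EOF);

    return result;
}
```

Called as read_command(my_string, stdin), it leaves the keyboard buffer clean after every line, including an empty one, so the next call always starts with fresh input.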



Wednesday, July 11, 2018

Text Adventure Programming in C, Part 1

TEXT ADVENTURE PROGRAMMING IN C

First post of this new blog. Hello!

INTRODUCTION:

Previously, I had run a blog on programming in C++. It was, as this one will also be, a sort of diary of programming in which I listed my experiments. It was mostly for my benefit, for online self reference. However, I noticed several people learning to program used it for obtaining tips. One particular section, in fact, was quite popular; Text Adventures. These are, indeed, entertaining to program for a beginner in a language. Therefore, it is where I will start this current blog.

This blog is about the language C. There will be little reference to other languages. I am also going to write it without bogging down in jargon and technicalities, inasmuch as that is possible, so that it remains a reasonably light-hearted reference / read through for general refamiliarization purposes. The next few posts are going to be about text adventure programming with the C language. I expect everyone knows what text adventures are. So, with that said, straight to it...

It goes without saying, text adventures require text input! Now, where this was easy and reasonably protected in C++ with the C++ standard library, particularly its std::string class (not to be confused with "string.h"), it becomes a little bit more problematic in C. Some care is required to avoid potentially dangerous situations for the program, like buffer overflows. Personally, I like it; it teaches you to think about your program with a little more care than simply invoking some function that "does everything for you". C style strings can be a cause of huge headaches for people who have followed some C++ tutorial that only teaches the use of std::string, and never covered what is really happening with strings. So, this first section of the Text Adventure in C posts is going to cover C style strings, and how to manipulate them in various ways that will not crash your program or produce unexpected characters in your strings.

C STRINGS:

A string is an array of characters. Here is one common way of how you can declare a string...

     char c_str[10];

This reserves room for a fixed number of characters, ten of them, from positions 0 to 9. Nothing has been stored in it yet, so its contents are not guaranteed to be anything meaningful. To be correct (and legal for the program), whatever text you put in it must be NULL TERMINATED. That means this string can hold a maximum of 9 characters, with a null terminator at the end. This is a null terminator "character", as you would write it in your programs...

   '\0'

The null terminator tells the program the string is "finished". It does not necessarily have to be in the last position holder of the character array (the string). But it should be somewhere in that array. The program will not try to read the string beyond it (unless you write code to try and force it to, which is not a good idea). For example, the string array declared above has 10 "places" for characters. Here is a word that could be held in that string...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';

HELLO. A five letter word. That is a perfectly legal (and the simplest) way to assign characters to the string. Note the null terminator at the end. The whole thing is of a "length" less than the 10 places reserved for the string. If you tried to assign more than 10 characters to this string, you would end up with a situation known as a buffer overflow...

   c_str[0] = 'Q';
   c_str[1] = 'U';
   c_str[2] = 'I';
   c_str[3] = 'X';
   c_str[4] = 'O';
   c_str[5] = 'T';
   c_str[6] = 'I';
   c_str[7] = 'C';
   c_str[8] = 'A';
   c_str[9] = 'L';
   c_str[10] = 'L';
   c_str[11] = 'Y';
   c_str[12] = '\0';

This is bad. Very bad. There was no memory reserved for characters 10 to 12 in the original declaration of the character array. They, essentially, are "invading" memory that the program may try and use for other things later. Even worse, several compilers will accept this situation without question, leading you to believe that everything is A-okay with your program at compile time, when it is clearly not.

The moral of the story; be mindful of this happening. Make sure that the size of your character array is enough for the number of characters you may ever want to put into it. Conversely, if there is a risk that a string may accidentally end up with more characters than it should during runtime, write code that will check and restrict it to the length of the string. And don't forget to include, in that code, a small procedure to add the null terminator at the end. These are things that C does not necessarily protect you from, as a programmer. Granted, some functions in the C header files for string manipulation do attempt to offer some protection, to an extent, but not all of them. Take care.
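As one way of doing that check, here is a minimal sketch of a copy routine that never writes more than the array can hold and always finishes with the null terminator. The name copy_bounded and its return convention are my own assumptions for the example.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst without ever writing past
   dst_size bytes, and always null terminates.
   Returns 1 if everything fitted, 0 if the text had to be cut short. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;

    while (i < dst_size - 1 && src[i] != '\0')
    {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';

    return src[i] == '\0';
}
```

With the ten place c_str from above, copy_bounded(c_str, sizeof c_str, "QUIXOTICALLY") stores QUIXOTICA (nine characters and the terminator) and returns 0 to report that the word was cut short.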

Now, note that in the example above, each letter (character) was assigned individually to its respective position in the array. Each character was surrounded by a pair of single quotation marks ('Q'). This notation specifies that what we were assigning was indeed a single character. If double quotation marks are used, the compiler takes what is inside them as a complete string. And indeed, such a complete, "ready made" string can be assigned to a character array at declaration, like this...

   char c_str[10] = "HELLO";

...or even...

   char c_str[] = "HELLO";

The first example will occupy six places of the specified ten place long array (five for HELLO, and another for the null terminator). With this method, again, you have to be careful that the array is long enough for the text and its terminator. The second example, which does not specify the length of the array (empty brackets), actually creates a character array of the correct and exact length for the string being assigned to it, that is, six places.
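The difference between the two declarations can be checked with sizeof, which gives the size of the whole array, and strlen from string.h, which counts the characters before the terminator. The three small functions below are invented purely to make the numbers visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical demo functions: each returns a size for the reader to compare. */

size_t size_with_length(void)
{
    char c_str[10] = "HELLO";
    return sizeof c_str;               /* the whole array: 10 */
}

size_t size_without_length(void)
{
    char c_str[] = "HELLO";
    return sizeof c_str;               /* five letters plus '\0': 6 */
}

size_t visible_length(void)
{
    char c_str[10] = "HELLO";
    return strlen(c_str);              /* characters before '\0': 5 */
}
```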

This method works at declaration time of the character array. It will not work on a string that has already been created, like this...

   char c_str[10];
   c_str = "HELLO";

This is not allowed. Once the array has been created, each character will need to be assigned individually to its respective place, as we did further above (and repeat here)...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';

Aside from a function or two in string.h that solves this issue, there is, fortunately, another solution; a pointer to a string literal. A string literal is the text in double quotation marks itself. The compiler stores it as an array of characters, automatically adds a null terminator to the end, and lets a pointer to it be used like an array. It looks confounding in that it can be used as such without apparently having had memory reserved for it (the compiler reserves it for you), but it has one limitation. It is typically placed in read only memory, so it must not be modified like a true array can. Here's a pointer to a string literal;

   const char *str_lit = "Hello";

It is probably best used as a temporary measure, as a local variable inside a function (the pointer is local; the literal itself exists for the whole run of the program). In fact, using a string literal, it is completely possible to recreate the strcpy() function...

   void my_strcpy(char *str, const char *const_str)
   {
       int i = 0;
       do
       {
           *(str + i) = *(const_str + i);
           // str[i] = const_str[i]; // Either way is good.
           i++;
       }while(*(const_str + i - 1) != '\0'); // Stop once the terminator itself has been copied.
   }

You can use that function this way...

     my_strcpy(c_str, "Hello");

...like the string.h strcpy() function. Here is a screen shot of that program, in which "Hello" is taken as a string literal argument in the function, copied to c_str, and which also tests, within main(), that the null terminator is present in the final string output from the function...
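One possible way to test for the terminator, not necessarily the one used in the screen shot, is to search the whole array for '\0' with memchr from string.h. The helper name has_terminator is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' exists somewhere inside the
   first size bytes of buf, 0 if the array holds no terminator at all. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

After a copy into c_str, has_terminator(c_str, sizeof c_str) returning 1 means the string ends inside the array. Because it searches with memchr and a size, it never reads past the array, which strlen would do on an unterminated one.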



And that is pretty much a fair starter or refresher for C style strings. Next time, text input and output, to start programming the text adventure, plus some more string manipulation.

All the best!