Sunday, September 2, 2018

Text Adventure Programming in C, Part 3

PARSING TEXT:

Once the processes for handling text strings and managing input with C are resolved and mastered, it is time to proceed to parsing the string. Parsing, here, means breaking a single string containing several words into a number of single-word strings. In a text adventure, these words will be interpreted as commands which, at the most basic level, follow the syntax VERB NOUN. Of course, slightly more advanced command interpretation can include ADJECTIVES, and even PHRASAL VERBS like "switch off" or "talk to". That is for another part of this blog, however. For now, the objective of this post is simply to split the command line into single strings of its component words.

Parsing a command string is relatively simple with the <string.h> function strtok(). This will be covered here. Also, as this blog is about learning and understanding some basic programming in C, in the next post I will develop a simple custom function that has the same effect as a routine that uses strtok().

Here is a very quick example of using strtok() to parse a command string...

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int i = 0;
    char command[256];
    char split_command[3][256];
    int split_command_index = 0;
    int scanf_input = 0;

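    // %255[^\n] reads up to 255 characters, stopping at the newline.
    scanf_input = scanf("%255[^\n]", command);
    if(scanf_input == 1)
    {
        // Discard the rest of the line (stop at end of input, too).
        int c;
        do{ c = getchar(); }while(c != '\n' && c != EOF);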
        char *token;
        token = strtok(command, " ");
        while(token != NULL)
        {
            strcpy(split_command[split_command_index], token);
            token = strtok(NULL, " ");
            split_command_index++;
            if(split_command_index > 2)
                break;
        }
    }
    else
    {
        printf("No input\n");
    }

    for(i = 0; i < split_command_index; i++)
    {
        printf("%s\n", split_command[i]);
    }

    return 0;
}



So, what is happening here?

First thing (in this case), I am establishing that the command string collected by scanf() is going to be split into a maximum of three words. They will be the first three words of the string, or fewer if the original string has fewer than three. The places for these words are reserved in the split_command array...

char split_command[3][256];

strtok() is then used in a pretty standard way on the original command string, with a space " " as the sole delimiter. Every time strtok() successfully collects a word from the command string, the word is copied with strcpy() to the active "slot" of the split_command array, and split_command_index is incremented to the next element. Once split_command_index exceeds 2 (array elements 0, 1 and 2; three words), the program breaks out of the strtok() loop so that it never writes past the end of the split_command array.

That is basically all that is happening here. It succeeds in parsing the command string into individual words that can then be handled by the program to effect VERB/ADJECTIVE/NOUN style commands in the text adventure.

CASE SENSITIVITY:

That could effectively be the end of this post, but there is also another consideration. C is a case sensitive language, and so are its string comparisons. While we are here, we may as well address this issue.

Nothing is stopping the player from using upper and lower case characters during the input of the original command. The player is free to do this. The problem arises when the program has to interpret the command words and compare them to the basic commands established in the game coding. To the program, "Go West" and "GO WEST" would be two totally different and unrelated commands, unless the coder established something like this (pseudocode)...

if("Go West" or "GO WEST") then Move_Player_West;

Okay, it seems solved, but now what if the player inputs "go WEST", or "gO WEst"? Yeah, that if statement is going to get mighty long. The root of it all is that "letters" (alphabetic characters) are just numbers to the program and, for console programs especially, conform to the ASCII encoding. Here is a non-extended ASCII code table for quick reference, though I would recommend looking at the extended table in the link...


ASCII code 65, for example, is "A". The lower case "a" is ASCII code 97. The program sees them as different characters because they are different numbers. When doing a comparison, "A" and "a" may as well be "f" and "w". Anyway, try it...

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("%c%c%c%c%c", 72, 101, 108, 108, 111);
    printf("%c", 255);
    printf("%c%c%c%c%c\n", 87, 111, 114, 108, 100);
    return 0;
}

If all of those perennial introductions to programming were done that way, we might learn something useful to us for the future right off the bat!

So. To simplify this, it is somewhat important to convert the commands to all upper or all lower case, so that the program can interpret the commands with a "standard" case. Solving this, however, is very easy with some functions out of the <ctype.h> header. Specifically, we would want to use isalpha(), islower(), and toupper().

All of these functions take the character as an integer argument: the ASCII code for that character. You can often get away with just passing a char variable straight to the functions, but that is not strictly correct. The functions expect a value in the range of unsigned char (or EOF), and on platforms where char is signed, a character outside plain ASCII becomes a negative number, which they are not required to handle. For the sake of being correct, cast the character to unsigned char and store the result in an int before passing it on.

Also, the functions work at a character level, so if you are seeking to convert a whole string to upper or lower case, you must iterate through each individual character of the string performing the exercise repeatedly until the end of the string.

Here is an example...

#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
    int i = 0;
    int ch;
    char test_string[] = "thEse SHoULd aLl Be upPeR cASe";

    while(test_string[i])
    {
        // Go through unsigned char so the value is never negative.
        ch = (unsigned char)test_string[i];

        if(isalpha(ch) && islower(ch))
        {
            test_string[i] = (char)toupper(ch);
        }

        i++;
    }

    printf("%s\n", test_string);

    return 0;
}

Problem solved.

BRINGING IT TOGETHER:

So, in previous posts, I have covered in some detail strings, input, parsing, and some conversion of alpha characters. It is time to bring it into one small program that will be useful for the text adventure. The parser is going to be a little bit more complex in future posts, as it is going to handle and make sense of the split command words, so in this example (where it only accomplishes the splitting of the command into individual words) it will be known as parser_stage_one.

Also, I am a fan of functions and of checking return values from them (void functions are for very exceptional cases). For a very basic program (that would improve readability and make more sense for beginner programmers), the error checking could be done away with and the separate functions either omitted (all processes in main()), or compressed into one function. Some more explanation of what is happening in the following program might be required, but I think not. I believe it is pretty self explanatory to even novice programmers, and it includes everything seen so far...

#include <stdio.h>
#include <ctype.h>
#include <string.h>

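//CONSTANTS ------------------------------------------------
// Macros are used because, in C, a const int is not a true compile time
// constant, so arrays sized with one become variable length arrays.
#define MAX_TEXT_BUFFER 256
#define MAX_COMMAND_ARRAY 3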

//PROTOTYPES -----------------------------------------------
int command_input(char com[]);
int parser_stage_one(char com[], char split_com[][MAX_TEXT_BUFFER]);

//MAIN ----------------------------------------------------
int main(int argc, char *argv[])
{
    char command[MAX_TEXT_BUFFER];
    char split_command[MAX_COMMAND_ARRAY][MAX_TEXT_BUFFER];
    int number_of_words = 0;
    int i = 0;

    do
    {
        if(command_input(command))
        {
            if((number_of_words = parser_stage_one(command, split_command)))
            {
                printf("Number of words %d\n", number_of_words);
                for(i = 0; i < number_of_words; i++)
                {
                    printf("%s\n", split_command[i]);
                }
            }
            else
            {
                printf("GUESS WHAT? TEXT NOT INPUT\n");
            }

        }
        else
        {
            printf("TEXT NOT INPUT\n");
        }
    }while(strcmp("QUIT", command));

    return 0;
}

//FUNCTIONS ------------------------------------------------
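int command_input(char com[])
{
    // %255[^\n] reads up to 255 characters, stopping at the newline.
    int input = scanf("%255[^\n]", com);
    int c;

    // Discard the rest of the line (stop at end of input, too).
    do{ c = getchar(); }while(c != '\n' && c != EOF);

    if(input == 1)
    {
        int i = 0;
        int ch;
        while(com[i])
        {
            // Go through unsigned char so the value is never negative.
            ch = (unsigned char)com[i];
            if(isalpha(ch) && islower(ch))
            {
                com[i] = (char)toupper(ch);
            }
            i++;
        }
        return 1;
    }

    // Nothing was read, so do not leave old or uninitialized text in com.
    // At end of input, act as if QUIT had been typed so main() can finish.
    strcpy(com, (input == EOF) ? "QUIT" : "");
    return 0;
}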


int parser_stage_one(char com[], char split_com[][MAX_TEXT_BUFFER])
{
    int split_com_index = 0;
    char *token = strtok(com, " ");
    while(token != NULL)
    {
        strcpy(split_com[split_com_index], token);
        token = strtok(NULL, " ");
        split_com_index++;
        if(split_com_index >= MAX_COMMAND_ARRAY)
        {
            break;
        }

    }

    return split_com_index;
}



Okay, so, anyone who may read this and require a more in depth run through of the program so far, post and shout and I'll add some notes.

That would be all for now. Next time, an alternate version of this same program, up to this point.

All the best!

Sunday, July 15, 2018

Text Adventure Programming in C, Part 2

TEXT ADVENTURE PROGRAMMING IN C

TEXT INPUT:

So, the basics of C style strings were covered last time. Now it is time to look at how to input strings. Again, in C++, it is no big issue with STL, but is fraught with pitfalls to trip up the unwary with C. There are several ways to input text with C, using the stdio.h functions. There are also as many ways to cause buffer overflows, if the programmer is not careful.

The most basic function for text input is the multipurpose scanf function. However, it is not designed specifically and only for text. It accepts data from stdin, which in most cases is the keyboard. This data is read according to the "format" specified when the function is called, and stored in a variable, by address, also specified when calling the function. The data can be floating point, integer, character, or even a string. scanf receives some bad publicity, I have noticed, but it is not that bad, once it is understood, and remains a good option for a string input if handled properly. I will come back to it.

By far the favorite method for text input, I notice, is the fgets function, also in the stdio.h header. This initially may seem a bad option, because fgets is a function to read data from a file that was opened with fopen. However, fgets can be forced to accept input from the keyboard by using stdin (keyboard) as the "file stream" argument. Here is the prototype of fgets...

char *fgets(char *str, int n, FILE *stream)

What is liked about it is that "int n" part. It limits the number of characters that fgets reads into the string: at most n - 1 characters are read, and the null terminator is added after them. If n is equal to the size of the character array, fgets cannot cause a buffer overflow in the string. Also, fgets will read spaces and include them in the string, so sentences can be input. scanf will (normally) only read a string up to one word long.

So, inputting something like "The large lake" with...

scanf("%s", c_str);

...will only accept the word "The" into the string, and will ignore the rest after the first space (note, &c_str, is not necessary when assigning to a string with scanf, as arrays are already "addresses").

fgets, with the same input, on the other hand...

fgets(c_str, 15, stdin);

...will accept the whole string: All 14 characters, including spaces, plus the null terminator. Ideal.

But there is, indeed, a potential problem. It is not always realized that when the user types in a line of text for input, a queue of characters is actually being made in the keyboard buffer. If you type in 20 characters and hit Return, yes, 14 of them get sucked into the string, but the last six stay in the keyboard buffer, waiting to get used the next time a command to accept strings or characters is invoked. This causes some pretty annoying problems, as part of the last input becomes the beginning of the next input, whether you wanted it or not. Try inputting "Barcelona is an historical city" into this program...

#include <stdio.h>

int main(int argc, char *argv[])
{
    char c_str[15];

    puts("Input a string (14 characters maximum, including spaces)");

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);


    return 0; 
}


What happened? You never got to do the second input. It assigned the 14 characters to the string (and invisibly added the terminator as fgets should)...

"Barcelona is a"

...then without waiting for another input from the user for the second fgets, grabbed as much as it could of the rest of the original input still in the keyboard buffer...

"n historical c"

...and even then, still left the characters "ity" in the keyboard buffer. If this is undesired behavior for your program, then clearly, that keyboard buffer needs to be flushed before the next input.

There is often a recommendation to use fflush(stdin)...

#include <stdio.h>

int main(int argc, char *argv[])
{
    char c_str[15];

    puts("Input strings (14 characters maximum, including spaces)");

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fflush(stdin); // Not portable: the C standard only defines fflush() for output streams.

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    fflush(stdin);

    return 0;
}

This may or may not work, depending on the platform and compiler (it has traditionally worked on Windows). However, fflush is not designed for the purpose of flushing input streams: the C standard only defines its behavior for output streams. Look it up in any C reference. A better, universal solution is required for flushing the keyboard buffer of excess characters not taken by fgets. This is it...

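int c;
do{ c = getchar(); }while(c != '\n' && c != EOF);

The loop, with getchar(), will keep reading and discarding characters from the keyboard buffer until it finds the newline at the end, and will discard that too. The character is stored in an int so that it can also be compared with EOF; without that test, the loop would never finish if the input ran out before a newline turned up. You will sometimes see this shorter form recommended...

while(getchar() != '\n');

...which discards the newline in just the same way, but has no EOF test. One more precaution: fgets stores the newline in the string when the whole line fits, and in that case there is nothing left to discard, so the loop should only run when the string contains no newline. Here is the program done properly...

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char c_str[15] = "";
    int c;

    puts("Input strings (14 characters maximum, including spaces)");
    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    // If fgets() already took the newline, there is nothing left to discard.
    if(strchr(c_str, '\n') == NULL)
    {
        do{ c = getchar(); }while(c != '\n' && c != EOF);
    }

    fgets(c_str, 15, stdin);
    printf("%s\n", c_str);

    if(strchr(c_str, '\n') == NULL)
    {
        do{ c = getchar(); }while(c != '\n' && c != EOF);
    }

    return 0;
}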

That solves the keyboard buffer problem, and for all intents and purposes could be the end of this post. It will certainly do for a text adventure input function. However, there are another couple of methods of text input in C that I want to look at, for the sake of broadening our options.

One solution that I like is to read the string one character at a time, inside a loop, using getch(). Now, getch() is a version of getchar() that returns each key as soon as it is pressed, without waiting for Return. It is declared in the conio.h header, which unfortunately is only available on the Windows platform, although there is a Linux version of getch() in curses.h, which would also allow this method to be used on that platform. Here is a screen shot of the method...


What is important with this method are three things that must be considered:

1). If you try to type in text beyond the size of the string buffer, the loop will stop one short of the end of the buffer, assign a null terminator, and break out of the loop. That is, you were never in any risk of writing beyond the limits of the string and causing a buffer overflow.

2). If you hit Return before the end of the string buffer is reached, the program will assign the null terminator and break out of the loop.

3). Finally, if the user hits Backspace, the iterator (buffer_counter) for the string will move back a place (it is moved back two places because the loop, when it runs again, will move it forward one place). For the visual representation on the console, the putchar(' ') function will over-write the last character with a blank, then move the cursor back one place with putchar('\b'). This allows the user to edit the input effectively.

Nonetheless, for these blog posts on Text Adventures, I will be reverting to good old scanf. It is not as bad as the publicity it gets, so...

Let us hear it for scanf!

So scanf is rubbish? Avoid it at all costs? Well, if you use it incorrectly, maybe so. But it does work, and it works very well for this application, if you get the formatting right. Text adventures normally require a two word input; a verb and a noun. Sometimes, even three words might be acceptable, if you have programmed an option for an adjective in between the verb and the noun.

GO WEST

GET BLUE CARD

...might be examples of commands you would give in a text adventure. An improperly formatted scanf, like this...

scanf("%s", c_str);

...would only pick up GO and GET from each of those inputs, because it stops reading the input at the occurrence of the first white space. And, as we have already seen, it leaves the rest of the user input in the keyboard buffer, ready to ruin your next command input attempt. So, the first thing is to force scanf to read spaces. It is done this way...

scanf("%[^\n]", c_str);

To a beginner, that already looks scary, but it is not. Let us look at another reference to the scanf function...

http://www.cplusplus.com/reference/cstdio/scanf/

The specifier [^characters], from that reference, is a negated scanset. What does this mean? It replaces scanf's default behavior of stopping at the first white space: scanf keeps reading until it meets one of the characters the programmer specifies in those brackets. Note that %[...] is a complete specifier in its own right, so it is not followed by an s. So, if you wanted scanf to read your input into a string up to the first occurrence of the letter 'p' or 'P' (remember, case sensitive), you would use this as the format specifier...

"%[^pP]"

If you want it to keep collecting characters, including white spaces, until you hit Return, then you would use the "new line" escape character...

"%[^\n]"

This is all good, but it does not (yet) protect us from a buffer overflow. Fortunately, there is a way. Specify the number of characters to collect...

"%14s"

This will only pass 14 characters to a string once you hit Return, no matter how much you type into the console. Combine both methods, like this...

scanf("%14[^\n]", c_str);

...and you can collect white spaces and avoid a buffer overflow. The rest of what was typed will stay in the keyboard buffer, and can be cleared out with the same kind of getchar() loop...

int c;
do{ c = getchar(); }while(c != '\n' && c != EOF);

...that we have already looked at above. Here is a screen shot of a working program along these lines...


And that is what I will be using for text input for the rest of this text adventure section of the blog.

All the best!

Notes:

Just in case it was not clear in the example above, it is best to use the scanf() width parameter one less than the size of your character buffer (string length). This ensures that there will be space for the null terminator at the end. Like this...

char my_string[50];
scanf("%49[^\n]", my_string);


Also, make use of scanf()'s return integer. This can help to catch situations where no characters were entered (for example, the user presses return accidentally, without having entered a command). If the string was entered, it will return 1. If nothing was entered, it will return 0. If the input has ended altogether (end of file), it will return EOF, which is a negative number, so it is safer to test for 1 than for "not 0".


int scanf_check = scanf("%49[^\n]", my_string);

if(scanf_check != 1)
{
    printf("No input\n");
}
else
{
    printf("%s\n", my_string);
}

Useful for an error-tight text adventure.



Wednesday, July 11, 2018

Text Adventure Programming in C, Part 1

TEXT ADVENTURE PROGRAMMING IN C

First post of this new blog. Hello!

INTRODUCTION:

Previously, I had run a blog on programming in C++. It was, as this one will also be, a sort of diary of programming in which I listed my experiments. It was mostly for my benefit, for online self reference. However, I noticed several people learning to program used it for obtaining tips. One particular section, in fact, was quite popular; Text Adventures. These are, indeed, entertaining to program for a beginner in a language. Therefore, it is where I will start this current blog.

This blog is about the language C. There will be little reference to other languages. I am also going to write it without bogging down in jargon and technicalities, inasmuch as that is possible, so that it remains a reasonably light-hearted reference / read through for general re-familiarization purposes. The next few posts are going to be about text adventure programming with the C language. I expect everyone knows what text adventures are. So, with that said, straight to it...

It goes without saying, text adventures require text input! Now, where this was easy and reasonably protected in C++ with the Standard Template Libraries, particularly the STL string functions (not to be confused with "string.h"), it becomes a little bit more problematic in C. Some care is required to avoid potentially dangerous situations for the program, like buffer overflows. Personally, I like it; it teaches you to think about your program with a little more care than simply invoking some function that "does everything for you". C style strings can be a cause of huge headaches for people who have followed some C++ tutorial that only teaches the use of the STL string functions, and never covered what is really happening with strings. So, this first section of the Text Adventure in C posts is going to cover C style strings, and how to manipulate them in various ways that will not crash your program or produce unexpected characters in your strings.

C STRINGS:

A string is an array of characters. Here is one common way of how you can declare a string...

     char c_str[10];

This reserves room for a fixed number of characters, ten of them, in positions 0 to 9 (until something is assigned to them, their contents are undefined). To be a correct (and legal) string, the characters stored there must be followed by a NULL TERMINATOR. That means this array can hold a string of at most 9 characters, with a null terminator at the end. This is a null terminator "character", as you would write it in your programs...

   '\0'

The null terminator tells the program the string is "finished". It does not necessarily have to be in the last position holder of the character array (the string). But it should be somewhere in that array. The program will not try to read the string beyond it (unless you write code to try and force it to, which is not a good idea). For example, the string array declared above has 10 "places" for characters. Here is a word that could be held in that string...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';

HELLO. A five letter word. That is a perfectly legal (and the simplest) way to assign characters to the string. Note the null terminator at the end. The whole thing is of a "length" less than the 10 places reserved for the string. If you tried to assign more than 10 characters to this string, you would end up with a situation known as a buffer overflow...

   c_str[0] = 'Q';
   c_str[1] = 'U';
   c_str[2] = 'I';
   c_str[3] = 'X';
   c_str[4] = 'O';
   c_str[5] = 'T';
   c_str[6] = 'I';
   c_str[7] = 'C';
   c_str[8] = 'A';
   c_str[9] = 'L';
   c_str[10] = 'L';
   c_str[11] = 'Y';
   c_str[12] = '\0';

This is bad. Very bad. There was no memory reserved for characters 10 to 12 in the original declaration of the character array. They, essentially, are "invading" memory that the program may try and use for other things later. Even worse, several compilers will accept this situation without question, leading you to believe that everything is A-okay with your program at compile time, when it is clearly not.

The moral of the story: be mindful of this happening. Make sure that the size of your character array is enough for the number of characters you may ever want to put into it. Conversely, if there is a risk that a string may accidentally end up with more characters than it should during runtime, write code that will check and restrict it to the length of the string. And don't forget to include, in that code, a small procedure to add the null terminator at the end. These are things that C does not necessarily protect you from, as a programmer. Granted, some functions in the C header files for string manipulation do attempt to offer some protection, to an extent, but not all of them. Take care.

Now, note that in the example above, each letter (character) was assigned individually to its respective position in the array. Each character was surrounded by a pair of single quotation marks ('Q'). This notation was used to specify that what we were assigning was indeed a character. If double quotation marks are used, the program assumes that a complete string is being assigned. And indeed, such a complete, "ready made" string can be assigned to a character array at declaration, like this...

   char c_str[10] = "HELLO";

...or even...

   char c_str[] = "HELLO";

The first example will occupy six places of the specified ten place long array (five for HELLO, and another for the null terminator). With this method, again, you have to be careful to avoid a buffer overflow. The second example, which does not specify the length of the array (empty brackets), actually creates a character array of the correct and exact length for the string being assigned to it, that is, six places.

This method works at declaration time of the character array. It will not work on a string that has already been created, like this...

   char c_str[10];
   c_str = "HELLO";

This is not allowed. Once the array has been created, each character will need to be assigned individually to its respective place, as we did further above (and repeat here)...

   c_str[0] = 'H';
   c_str[1] = 'E';
   c_str[2] = 'L';
   c_str[3] = 'L';
   c_str[4] = 'O';
   c_str[5] = '\0';

Aside from a function or two in string.h that solve this issue, there is, fortunately, another solution: the string literal. A string literal is the text in double quotation marks itself. The compiler stores it as an array of characters, automatically adds a null terminator to the end, and lets you reach it through a pointer, which can be indexed like an array. It looks confounding in that it can be used without your apparently having reserved memory for it, but it has one limitation. It may be placed in read only memory, so it must not be modified like a true array can (attempting to do so is undefined behavior). Here's a pointer to a string literal:

   const char *str_lit = "Hello";

It is probably best used as a temporary measure, as a local variable with no persistence, inside a function. In fact, using a string literal, it is completely possible to recreate the strcpy() function...

   void my_strcpy(char *str, const char *const_str)
   {
       int i = 0;
       while(*(const_str + i) != '\0')
       {
           *(str + i) = *(const_str + i);
           // str[i] = const_str[i]; // Either way is good.
           i++;
       }
       *(str + i) = '\0'; // The loop stops before the terminator, so add it here.
   }

You can use that function this way...

     my_strcpy(c_str, "Hello");

...like the string.h strcpy() function. Here is a screen shot of that program, in which "Hello" is taken as a string literal argument in the function, copied to c_str, and which is also testing within main() that the null terminator is present in the final string output from the function...



And that is pretty much a fair starter or refresher for C style strings. Next time, text input and output, to start programming the text adventure, plus some more string manipulation.

All the best!