Sunday, September 2, 2018

Text Adventure Programming in C, Part 3

PARSING TEXT:

Once handling text strings and managing input in C are under control, it is time to move on to parsing the string. Parsing, at this stage, means breaking a single string containing several words into a number of single-word strings. In a text adventure these will be interpreted as commands which, at the most basic level, follow the syntax VERB and NOUN. Slightly more advanced command interpretation can include ADJECTIVES, and even PHRASAL VERBS like "switch off" or "talk to". That is for another part of this blog, however. For now, the objective of this post is simply to split the command line into single strings of its component words.

Parsing a command string is relatively simple with the <string.h> function strtok(), which is covered here. Also, as this blog is about learning and understanding some basic programming in C, the next post will develop a simple custom function that has the same effect as a routine built on strtok().

Here is a very quick example of using strtok() to parse a command string...

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int i = 0;
    char command[256];
    char split_command[3][256];
    int split_command_index = 0;
    int scanf_input = 0;

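    scanf_input = scanf("%255[^\n]", command);   // %[^\n] is a complete conversion; a trailing 's' is not needed
    if(scanf_input == 1)                          // 0 means an empty line, EOF means no input at all
    {
        int c;
        while((c = getchar()) != '\n' && c != EOF){}   // discard the rest of the line, stopping at EOF too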
        char *token;
        token = strtok(command, " ");
        while(token != NULL)
        {
            strcpy(split_command[split_command_index], token);
            token = strtok(NULL, " ");
            split_command_index++;
            if(split_command_index > 2)
                break;
        }
    }
    else
    {
        printf("No input\n");
    }

    for(i = 0; i < split_command_index; i++)
    {
        printf("%s\n", split_command[i]);
    }

    return 0;
}



So, what is happening here?

First thing (in this case), I am establishing that the command string collected by scanf() is going to be split into a maximum of three words. They will be the first three words of the string, or fewer if the original string has fewer than three. The places for these words are reserved in the split_command array...

char split_command[3][256];

strtok() is then used in a pretty standard way on the original command string, with a single space " " as the only delimiter. Every time strtok() finds a word in the command string, the word is copied with strcpy() to the active "slot" of the split_command array, and split_command_index is incremented to the next element. Once split_command_index exceeds 2 (array elements 0, 1, 2 hold the three words), the code breaks out of the strtok() loop so that nothing is written past the end of the split_command array. The strcpy() itself is safe because no word can be longer than the 255 characters scanf() was allowed to read.

That is basically all that is happening here. It succeeds in parsing the command string into individual words that can then be handled by the program to effect VERB/ADJECTIVE/NOUN style commands in the text adventure.
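
Two details of strtok() are worth seeing in action, since the program above relies on them: it writes a '\0' into the original string at the end of each word it returns, and it treats a run of several delimiters as a single gap. The sketch below is only an illustration; count_words is a made-up helper name, not something from <string.h>, and the tab added to the delimiter list is my own assumption about what a player might type.

```c
#include <string.h>

// Hypothetical helper for illustration: splits line in place on spaces
// and tabs, and returns how many words were found
int count_words(char line[])
{
    int count = 0;
    char *token = strtok(line, " \t");

    while(token != NULL)
    {
        count++;
        token = strtok(NULL, " \t");   // NULL means "carry on with the same string"
    }

    return count;
}
```

Passing a writable array holding "take   the  lamp" gives 3, and afterwards the array reads as just "take" when treated as a string, because the first space after that word has been overwritten with '\0'. This is also why strtok() must never be handed a string literal.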

CASE SENSITIVITY:

That could effectively be the end of this post, but there is another consideration. C is a case sensitive language, and so are its string comparisons. While we are here, we may as well address this issue.

Nothing is stopping the player from using upper and lower case characters during the input of the original command. The player is free to do this. The problem arises when the program has to interpret the command words and compare them to the basic commands established in the game coding. To the program "Go West" and "GO WEST" would be two totally different and unrelated commands, unless the coder established something like this (pseudocode)...

if("Go West" or "GO WEST") then Move_Player_West;

Okay, it seems solved, but now what if the player inputs "go WEST", or "gO WEst"? Yeah, that if statement is going to get mighty long. The root of it all is that "letters" (alphabetic characters) are just numbers to the program, and for console programs especially, those numbers follow the ASCII encoding. A non-extended ASCII code table makes a good quick reference here, and the extended table is worth a look as well.


ASCII code 65, for example, is "A". The lower case "a" is ASCII code 97. The program sees them as different characters because they are different numbers. When doing a comparison, "A" and "a" may as well be "f" and "w". Anyway, try it...

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("%c%c%c%c%c", 72, 101, 108, 108, 111);
    printf("%c", 32);   // 32 is the ASCII space; codes above 127 are not ASCII and vary by code page
    printf("%c%c%c%c%c\n", 87, 111, 114, 108, 100);
    return 0;
}

If all of those perennial introductions to programming were done that way, we might learn something useful to us for the future right off the bat!
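
To see the comparison problem itself rather than just the character codes, here is a small sketch built on strcmp() from <string.h>. is_exact_command is a hypothetical helper name used only for this illustration.

```c
#include <string.h>

// Hypothetical helper: returns 1 only when every character code matches
int is_exact_command(const char *typed, const char *known)
{
    return strcmp(typed, known) == 0;
}
```

is_exact_command("GO WEST", "GO WEST") gives 1, while is_exact_command("Go West", "GO WEST") gives 0, since in ASCII 'o' (111) and 'O' (79) are different numbers.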

So, to keep things simple, the commands should be converted to all upper or all lower case, so that the program can interpret them in one "standard" case. This is very easy with some functions from the <ctype.h> header. Specifically, we would want to use isalpha(), islower(), and toupper().

All of these functions take the character as an int argument holding its character code. You can often get away with passing a char variable directly, and for plain ASCII text it works, but strictly the value has to be representable as an unsigned char (or be EOF). On systems where char is signed, a character above 127 becomes a negative number, and passing that is undefined behaviour. The correct habit is to convert the character to unsigned char first and hold the result in an integer variable.

Also, the functions work at a character level, so if you are seeking to convert a whole string to upper or lower case, you must iterate through each individual character of the string performing the exercise repeatedly until the end of the string.

Here is an example...

#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
    int i = 0;
    int ch;
    char test_string[] = "thEse SHoULd aLl Be upPeR cASe";

    while(test_string[i])
    {
        // <ctype.h> functions expect a value in the unsigned char range (or EOF)
        ch = (unsigned char)test_string[i];

        if(isalpha(ch) && islower(ch))
        {
            test_string[i] = (char)toupper(ch);
        }

        i++;
    }

    printf("%s\n", test_string);

    return 0;
}

Problem solved.
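
Since the conversion will be needed for every command, the loop can be tucked into a function of its own. This is just a sketch: string_to_upper is a name I made up, and it assumes the string is a writable, null-terminated array.

```c
#include <ctype.h>

// Hypothetical helper: converts a null-terminated string to upper case in place
// and returns how many characters were changed
int string_to_upper(char text[])
{
    int i = 0;
    int changed = 0;

    while(text[i])
    {
        // The cast keeps the value in the range the <ctype.h> functions expect
        int ch = (unsigned char)text[i];

        if(islower(ch))
        {
            text[i] = (char)toupper(ch);
            changed++;
        }
        i++;
    }

    return changed;
}
```

Digits, spaces and punctuation pass through untouched, as islower() is false for them, so there is no need to test isalpha() separately.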

BRINGING IT TOGETHER:

So, in previous posts, I have covered in some detail strings, input, parsing, and some conversion of alpha characters. It is time to bring it into one small program that will be useful for the text adventure. The parser is going to be a little bit more complex in future posts, as it is going to handle and make sense of the split command words, so in this example (where it only accomplishes the splitting of the command into individual words) it will be known as parser_stage_one.

Also, I am a fan of functions and of checking return values from them (void functions are for very exceptional cases). For a very basic program (that would improve readability and make more sense for beginner programmers), the error checking could be done away with and the separate functions either omitted (all processes in main()), or compressed into one function. Some more explanation of what is happening in the following program might be required, but I think not. I believe it is pretty self explanatory to even novice programmers, and it includes everything seen so far...

#include <stdio.h>
#include <ctype.h>
#include <string.h>

//CONSTANTS ------------------------------------------------
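// #define gives true compile-time constants, so they can size arrays in any C compiler
// (the 255 in the scanf() format below is MAX_TEXT_BUFFER - 1)
#define MAX_TEXT_BUFFER 256
#define MAX_COMMAND_ARRAY 3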

//PROTOTYPES -----------------------------------------------
int command_input(char com[]);
int parser_stage_one(char com[], char split_com[][MAX_TEXT_BUFFER]);

//MAIN ----------------------------------------------------
int main(int argc, char *argv[])
{
    char command[MAX_TEXT_BUFFER];
    char split_command[MAX_COMMAND_ARRAY][MAX_TEXT_BUFFER];
    int number_of_words = 0;
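    int i = 0;

    command[0] = '\0';   // start with an empty string so the QUIT check never reads uninitialised memory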

    do
    {
        if(command_input(command))
        {
            if((number_of_words = parser_stage_one(command, split_command)))
            {
                printf("Number of words %d\n", number_of_words);
                for(i = 0; i < number_of_words; i++)
                {
                    printf("%s\n", split_command[i]);
                }
            }
            else
            {
                printf("GUESS WHAT? TEXT NOT INPUT\n");
            }

        }
        else
        {
            printf("TEXT NOT INPUT\n");
        }
    }while(strcmp("QUIT", command));

    return 0;
}

//FUNCTIONS ------------------------------------------------
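int command_input(char com[])
{
    int c;
    int input = scanf("%255[^\n]", com);   // %[^\n] is a complete conversion; a trailing 's' is not needed

    if(input == EOF)
    {
        // End of input (a redirected file running out, for example):
        // hand back QUIT so that the loop in main() can finish
        strcpy(com, "QUIT");
        return 1;
    }

    // Discard the rest of the line, stopping at EOF as well so this cannot loop forever
    while((c = getchar()) != '\n' && c != EOF){}

    if(input == 1)
    {
        int i = 0;
        int ch;
        while(com[i])
        {
            // <ctype.h> functions expect a value in the unsigned char range (or EOF)
            ch = (unsigned char)com[i];
            if(isalpha(ch) && islower(ch))
            {
                com[i] = (char)toupper(ch);
            }
            i++;
        }
    }
    else
    {
        com[0] = '\0';   // empty line: leave an empty string rather than stale text
    }

    return input == 1;
}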


int parser_stage_one(char com[], char split_com[][MAX_TEXT_BUFFER])
{
    int split_com_index = 0;
    char *token = strtok(com, " ");
    while(token != NULL)
    {
        strcpy(split_com[split_com_index], token);
        token = strtok(NULL, " ");
        split_com_index++;
        if(split_com_index >= MAX_COMMAND_ARRAY)
        {
            break;
        }

    }

    return split_com_index;
}
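
As a rough idea of what the split, upper-cased words make possible, the first word can be checked against a list of verbs the game knows. This sketch is not the parser planned for later posts; find_verb and the verbs in the list are invented here purely for illustration.

```c
#include <string.h>

// Hypothetical verb list, in upper case to match the converted input
static const char *known_verbs[] = { "GO", "TAKE", "LOOK", "QUIT" };

// Returns the index of word in known_verbs, or -1 if it is not a known verb
int find_verb(const char *word)
{
    int count = (int)(sizeof known_verbs / sizeof known_verbs[0]);
    int i;

    for(i = 0; i < count; i++)
    {
        if(strcmp(word, known_verbs[i]) == 0)
        {
            return i;
        }
    }

    return -1;
}
```

With this, find_verb(split_command[0]) would give 0 for GO and -1 for a word the game does not know, and a switch on the result could then pick the action.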



Okay, so, anyone who may read this and require a more in depth run through of the program so far, post and shout and I'll add some notes.

That would be all for now. Next time, an alternate version of this same program, up to this point.

All the best!
