Relearning MSX #18: Structure of an MSX-C program

Posted in MSX, Retro, Technology, Uncategorized | March 20, 2015

In the last post we saw a very brief example of command line parameters, and we learnt about input/output redirection and pipes.

This week we’re going to see the structure of a program written in C. This post is intended as an introduction for users who haven’t programmed in the C language before. If you have coded anything in C before then you can completely skip it.

A few characteristics of C programs

If you’ve been an MSX user for a long time then it’s likely that you’ve seen lots of MSX BASIC programs. If that’s the case, then you’ll immediately notice several differences:

[Figure: C program (left) and MSX BASIC program (right).]

C programs are written mostly in lower case

All keywords and preprocessor directives in C are written in lowercase. The convention is to write variable names in lowercase too. Nothing is stopping you from defining your own functions and variables in uppercase if you want, but that’s rare.

C programs don’t have line numbers

In BASIC we jump to different parts of the program using the instructions GOTO or GOSUB and specifying a line number. In C we use control structures such as while, for and do, so line numbers aren’t necessary.

It’s up to you how much whitespace you want to use

There are recommendations on how source code should be written to make it easier to understand, but in the end how you lay out your code is up to you. You can use indentation and whitespace to make your code cleaner and easier to follow, or you can go the opposite route to make the source more compact (though this won’t affect the final binary) or harder for others to understand. The one rule to remember is that adjacent alphanumeric elements (keywords, function and variable names, constants, etc.) must be separated by at least one whitespace character.

See these two listings for example:

[Figure: putline() function with newlines and tabs for indentation.]

[Figure: The same putline() function in a single line.]

Both of these are exactly the same putline() function from chapter #17. The first version uses a different line for each expression in the program, and also uses indentation and spaces to make the program structure easier to follow. The second version removes all newlines and whitespace except for the spaces in char ch and int n, because these are needed for the program to be correct.
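Since the listings are images, here is a rough sketch of the same idea in text form. It is not the exact putline() from chapter #17: it is written in modern standard C (which may need adapting for MSX-C), the names putline_indented() and putline_compact() are made up for this comparison, and the return value is an addition so the two versions can be checked against each other. The only spaces left in the compact version are the ones the compiler needs (after char, int and return).

```c
#include <stdio.h>

/* Version 1: one statement per line, with indentation. */
int putline_indented(char ch, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        putchar(ch);
    }
    putchar('\n');
    return n;   /* added for illustration: how many copies were printed */
}

/* Version 2: the same logic with every optional space and newline removed. */
int putline_compact(char ch,int n){int i;for(i=0;i<n;i++){putchar(ch);}putchar('\n');return n;}
```

Both functions compile to the same behaviour; only the layout of the source differs.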

Some people take this to the extreme. There’s even a yearly contest, the International Obfuscated C Code Contest (IOCCC), that awards the most obfuscated C programs submitted by programmers from around the world. As an example, the image below shows a completely valid program from the 2013 contest:

[Figure: endoh2.c from the 2013 IOCCC.]

In short, write your programs in whichever way is most comfortable for you, but if you expect other people to be able to read and understand your code then indent properly.

C statements end with a semicolon

In BASIC we can put one statement per line, or we can enter several statements on a single line, separating them with colons.

In C we use semicolons to mark the end of a statement, and we can split expressions and function calls across several lines to make them easier to read. The two code fragments below are equivalent:

[Figures: the two equivalent code fragments.]
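As a text stand-in for the images, here is a small sketch of the same point. The function names are hypothetical and the code is modern standard C, so treat it as an illustration rather than the article’s original fragments. In each case the statement runs until the semicolon, no matter how many lines it covers.

```c
/* The whole statement on one line. */
int total_one_line(int a, int b, int c)
{
    return a + b + c;
}

/* The same expression split across three lines. */
int total_split(int a, int b, int c)
{
    return a
         + b
         + c;
}

/* A function call can be split between its arguments too. */
int total_call_split(void)
{
    return total_one_line(1,
                          2,
                          3);
}
```

All three return the same result for the same inputs, because the compiler treats the line breaks as plain whitespace.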

A C program is a collection of functions

Let’s look again at the TRIANGLE.C program from chapter #17:

[Figure: The TRIANGLE.C program from chapter #17.]

This program defines two functions, putline() and main(). These form the whole program. In C, executable code can’t exist outside of a function.

In this example there are calls to other functions like putchar() and atoi() as well. We’ll see soon where these come from.

Programming in C means defining functions

In C each function is an execution unit. C functions are similar to BASIC subroutines, but they’re called by name and they receive their parameters inside parentheses after the function name.

For example, the putline() function we’ve seen accepts a character ch and a number n as parameters, and it prints the character ch on the screen n times. In BASIC this would be done like this:

CH$="A":N=10:GOSUB 1000

The equivalent call in C would be:

putline('A', 10);

The C version is more compact and easier to understand than the BASIC one. Calling the function by name instead of by line number makes a huge difference, because a well-chosen name tells us immediately what the function does.
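To see what receiving parameters looks like from the function’s side, here is a minimal sketch. fill_line() is a hypothetical cousin of putline(): it stores the characters in a buffer instead of printing them, which makes the result easy to inspect. It is written in modern standard C, so the parameter syntax may differ from what MSX-C expects.

```c
/* Hypothetical variant of putline(): writes n copies of ch into buf
   instead of printing them. buf must have room for n + 1 characters. */
int fill_line(char *buf, char ch, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        buf[i] = ch;
    }
    buf[n] = '\0';   /* mark the end of the string */
    return n;        /* number of characters stored */
}
```

A call such as fill_line(line, 'A', 10), with line declared as char line[11], leaves ten A characters in line. Unlike the BASIC version, where CH$ and N are ordinary variables set before the GOSUB, here ch and n exist only inside the function.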

Library functions are provided by the system

The TRIANGLE.C program defines two functions, putline() and main(). It also uses putchar() and atoi(). The last two are known as library functions, because they are provided with the compiler.

One thing to note is that these library functions aren’t special. They’re just functions defined in some other source file and compiled into a relocatable library file for convenience (CLIB.REL in the case of MSX-C).

Every C program starts in the main() function

Unlike a BASIC program, which always starts at the first line of code, a C program always starts in the main() function, regardless of where it is in the file(s) containing the source code. Let’s see an example. Type in this program and save it as ABB.C:

[Figure: ABB.C.]

This program defines five functions:

  • b_bb()
  • bb()
  • main()
  • b()
  • abb()

It’s a bit hard to follow the flow because almost every function calls or is called by some other function:

  • b_bb() calls b(), putchar() and bb()
  • bb() calls b() and putchar()
  • main() calls abb(), b_bb(), bb() and b_bb()
  • b() calls putchar()
  • abb() calls putchar() and bb()

Here’s a diagram of who calls who in ABB.C:

[Figure: call diagram for ABB.C.]

You can see that main() calls almost every other function, but no other function calls main(). If you compile and run the program you should get the following output:

[Figure: Output of the ABB.COM program.]

If the program started execution in the first function, b_bb(), then we would expect the output to be “b bb…” instead of “abb b…”. We can see that the main() function gets executed first even though it’s not explicitly called anywhere in the program.

This is what makes the main() function special: there is always one (and only one) main() function in every C program, and the code will start executing from it.

By the way, the ABB.C program has a line we hadn’t seen before:

[Figure: Function declarations in the ABB.C program.]

That line contains function declarations.

In C we have to declare symbols (functions, variables, constants) before using them

Take a look at the code for the b_bb() function. It calls the b() and bb() functions, but these are defined further below in the program. When the compiler sees the definition for b_bb() it still doesn’t know about b() and bb().

The line just before the b_bb() definition declares the names of the functions that will be defined later in the file, so the compiler knows that whenever it sees the name of one of these yet undefined functions it shouldn’t return an error.

We could also reorder the functions in the source code so they won’t call any function that hasn’t been defined yet: b() -> bb() -> b_bb(), abb() -> main(). In this particular case we wouldn’t need the declaration line, but sometimes this can’t be avoided, as is the case when we have two functions that call each other.
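The classic case of two functions that call each other can be sketched like this. is_even() and is_odd() are hypothetical examples, not part of ABB.C, and the code is modern standard C. No reordering can help here: whichever function comes first needs to call one that hasn’t been defined yet, so a declaration is required.

```c
/* Declaration: is_odd() is defined further below. */
int is_odd(unsigned int n);

/* Returns 1 if n is even, 0 otherwise. Calls is_odd(). */
int is_even(unsigned int n)
{
    if (n == 0) {
        return 1;
    }
    return is_odd(n - 1);
}

/* Returns 1 if n is odd, 0 otherwise. Calls is_even(). */
int is_odd(unsigned int n)
{
    if (n == 0) {
        return 0;
    }
    return is_even(n - 1);
}
```

This is a deliberately inefficient way to test for evenness; it’s only here to show the mutual dependency.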

One thing to note is that function declarations don’t add any overhead to the program. They just tell the compiler not to generate an error when it encounters calls to these functions. There isn’t even any need to actually use the declared functions.

Consider this small example:

[Figure: FOOBAR.C. Declaring two unused functions.]

This example declares two functions foo() and bar() that are never used in the program. This isn’t an error and it won’t add any overhead to the compiled program.
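In text form, the idea looks roughly like the sketch below. This is a guess at the shape of FOOBAR.C, not a copy of it: the names foo() and bar() come from the article, answer() is a hypothetical function added so there is something to call, and the declarations use modern standard C syntax.

```c
int foo(void);   /* declared, but never defined or called */
int bar(void);   /* same here */

/* The only function that actually exists in the compiled code. */
int answer(void)
{
    return 42;
}
```

Because nothing ever calls foo() or bar(), the linker never goes looking for them and the program builds without complaint.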

stdio.h contains the declarations for all the library functions

At this point you may be wondering how we’ve been able to use the putchar() function without declaring it anywhere. Is there anything special about library functions that makes declaring them unnecessary?

The answer is no. Library functions must be declared as well, but their declarations are already available in the header files provided with the compiler, in this case stdio.h. That’s why we have this line at the very beginning of our programs:

[Figure: The #include line at the beginning of all our programs.]

Go ahead and open the file B:\INCLUDE\STDIO.H in AKID and scroll down a bit. You’ll find the declaration for putchar() in there, together with many other functions available in the MSX-C standard library:

[Figure: Part of the MSX-C STDIO.H.]

If you look at the end of STDIO.H you’ll see that it also #includes a bunch of other header files that contain even more function declarations.

When the compiler (technically, the preprocessor) sees an #include line it replaces this line with the contents of the file specified.
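One way to convince yourself that a header is nothing more than declarations is to write one of them by hand. The sketch below does that for putchar() instead of including stdio.h. The declaration shown is the modern standard C one, and MSX-C’s STDIO.H may spell it differently. show() is a hypothetical helper. In real programs, stick with the #include line.

```c
/* Written by hand instead of #include <stdio.h>.
   This is the kind of line the header supplies for us. */
int putchar(int c);

/* Hypothetical helper: prints one character and passes back
   whatever putchar() returned (the character written, on success). */
int show(int ch)
{
    return putchar(ch);
}
```

The compiler only needs the declaration; the code for putchar() itself still comes from the library at link time.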

Variables must be declared before use too

It’s not just functions that must be declared: we have to declare every variable so the compiler knows what kind of data each variable will hold. This also causes the compiler to reserve enough memory to hold the value. We’ve seen this already in the TRIANGLE.C program:

[Figure: Variable declarations in the TRIANGLE.C program.]

Declaring variables is very important because C is strict about the kind of data each variable holds. We’ll see more about this in future posts.
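Here is a minimal sketch of what variable declarations look like in practice. count_char() is a hypothetical function, not taken from TRIANGLE.C, and it is written in modern standard C. Each declaration gives a type followed by a name, and both variables are declared at the top of the function before any statement uses them.

```c
/* Counts how many times ch appears in the string text. */
int count_char(char *text, char ch)
{
    int i;       /* index into text */
    int count;   /* matches found so far */

    count = 0;
    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == ch) {
            count++;
        }
    }
    return count;
}
```

Removing either declaration would make the compiler stop with an error at the first line that uses the missing variable.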

Comments start with /* and end with */

Lastly, in C programs a comment starts whenever the compiler sees the character pair /* (slash-asterisk) outside of a quoted string, and ends with the pair */ (asterisk-slash).

Comments can be anywhere (even in the middle of a statement or expression) and are treated as whitespace. They can span several lines, and often will, especially when commenting out blocks of code. Some examples:

[Figure: Comment between an if clause and a statement.]

[Figure: Comment spanning a few lines.]

[Figure: Another multi-line comment, this time taking advantage of the asterisks to create a nice text header.]
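The rules above can be sketched in a few lines of text. The functions here are hypothetical and unrelated to the images, and the code is modern standard C. It shows a multi-line comment, a comment in the middle of an expression, and a quoted string in which slash-asterisk is just ordinary text.

```c
#include <string.h>

/*
 * A comment can span
 * several lines.
 */
int add(int a, int b)
{
    return a /* a comment in the middle of an expression */ + b;
}

/* Inside a quoted string, slash-asterisk does not start a comment. */
int marker_length(void)
{
    return (int)strlen("/* not a comment */");
}
```

Since comments are treated as whitespace, add() behaves exactly as if the comment weren’t there.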

In MSX-C comments can also be nested inside other comments:

[Figure: A nested comment in MSX-C.]

In the example above the comment after the monster() call is nested inside another comment. This will happen often when commenting out blocks of code during development. MSX-C supports this by default (though the behaviour can be disabled), but standard C doesn’t allow nested comments, so many compilers reject them. You don’t need to worry about this for now, since we’re going to be working with MSX-C.

In the next post…

We’re going to make things a bit more interesting. In the next post we’ll put MSX-C away for a little while, and we’re going to start playing with assembly language under MSX-DOS. Once we know enough to do basic stuff we’ll see how to integrate functions written in assembly into MSX-C.


This series of articles is supported by your donations. If you’re willing and able to donate, please visit the link below to register a small pledge. Every little amount helps.

Javi Lavandeira’s Patreon page

4 comments on “Relearning MSX #18: Structure of an MSX-C program”

  2. Gregory on said:

    I just finished this part 18 in the series.
    It has been an amazing experience. Thank you for taking your time to publish all this information and explaining it in such a comprehensive way. Hope the next tutorial will be coming soon.

    Greetings,

    Gregory

  3. Francesc on said:

    Wow… for me it’s super weird to see function declarations like:

    b_bb(), bb(), main(), b(), abb();

    I will have to get a copy of K&R first edition (pre ANSI). There’s no declaration type for those functions!
    I would have expected something like:

    int b_bb(), bb(), main(), b(), abb();

    • Yes, it feels weird when we’re used to modern C. The functions’ return types are implied (ints, I think), and the parameters aren’t declared because the compiler doesn’t do parameter checking.

      Those were interesting times. :-)
