lavandeira.net

Relearning MSX #34: Data types in MSX-C

Posted by Javi Lavandeira in How-to, MSX, Retro, Technology | November 13, 2015

In the previous post we learnt how to display numbers in different notations. The output was different for each numeric base (for the convenience of the user of the program), but from the computer’s point of view it didn’t make a difference: all these values were stored and handled internally in exactly the same way.

Today we’re going to look at data types. These have an impact on how numbers (and characters) are stored in the computer’s memory, and how the computer handles the data during arithmetic operations, comparisons and function calls.

What we’re going to see in this post is extremely important for coding error-free programs. Try not to skip it even if it may seem boring (I’ll try and keep things interesting). If anything isn’t clear then ask in the comments below. I’ll be happy to help.

What are data types?

Let’s start with a small experiment. Imagine that we have two numbers written in hexadecimal, 0x9000 and 0x3000. If we print these numbers in decimal we see that 0x9000 is -28672 and 0x3000 is 12288:

The computer sees 0x9000 as a negative number and 0x3000 as positive. Therefore, if we compare them the result will be that 0x3000 is bigger, right?

Wrong:

How can this be? The first program told us that 0x9000 is negative, so 0x3000 being positive must be bigger than 0x9000. What’s going on here?

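What’s happening is that, by default, MSX-C gives hexadecimal constants the unsigned type. The first program printed the values as signed decimal numbers, but the comparison treats them as unsigned values.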

How the compiler uses data types

In MSX-C integers are always 16-bit binary values. The same binary representation can be seen as two different numbers depending on whether it is being interpreted as a signed number (an int) or an unsigned value:

1001000000000000 (0x9000 in binary)

Treated as an int, the binary number above is -28672. Treated as an unsigned, the same binary number represents 36864.

Let’s see how these two types behave, this time using a couple of variables of each type:

TYPES3.C

This program defines two variables of type int (i1 and i2) and two variables of type unsigned (u1 and u2). To each pair we assign the same hexadecimal values (0x9000 and 0x3000).

When you run the program you should see this result:

Now we understand why the example program gave that strange result:

When seen as ints, 0x9000 is smaller than 0x3000
When seen as unsigned, 0x9000 is bigger than 0x3000

Next we’ll look at this from the computer’s point of view.

MSX-C’s three data types

MSX-C understands exactly three data types: int, unsigned and char. This is how they differ:

(Table: data types in MSX-C)

Other compilers support additional data types in order to handle very small or very large real numbers in floating point notation (types float and double) and also very large integers (the long type). MSX-C natively supports only the three types above, although the MSX-C Library package includes functions to work with floats, doubles and longs (both signed and unsigned). We’ll study these in a future post.

The int data type

This is the most common data type in MSX-C. An int value is 16 bits long, so it occupies two bytes in the computer’s memory. The topmost bit indicates whether the value is positive or negative:

(Figure: bit layout of the int type)

By default, the MSX-C compiler handles the following as int:

Decimal numbers between -32768 and 32767
Variables declared as int
Any data that’s preceded by the (int) cast operator

We haven’t seen the cast operator before. It consists of the name of a data type enclosed in parentheses, placed in front of a value, a variable or an expression to force the compiler to convert it into that type.

It’s easier to understand with an example. Look again at the second program in this post. We saw that hexadecimal values in the program are treated as unsigned by default. That’s why comparing 0x9000 and 0x3000 resulted in the program saying that 0x9000 is bigger. Using the cast operator we can force the compiler to treat these values as ints instead:

The unsigned data type

Like int, this type is 16 bits long, but it has no sign bit: all 16 bits store the value, which can range from 0 to 65535. That’s why unsigned variables can’t store negative numbers.

(Figure: bit layout of the unsigned type)

MSX-C handles the following as unsigned by default:

Decimal numbers higher than 32767 (32768 to 65535)
Any data written in hexadecimal
Any data written in octal
Variables declared as unsigned
Any data that’s preceded by the (unsigned) cast operator

When you want to use unsigned data, don’t forget to declare your variables as unsigned! Mixing up int and unsigned values causes bugs that can be very difficult to track down. Look at this example:

This program is simple enough, right? There’s a for() loop that counts from 0 to 60000, printing every number on the screen one after the other. You may think it should run as expected, but it doesn’t. Go ahead and compile it yourself and confirm. Think for a few minutes and see if you can guess why this program doesn’t work. I’ll wait.

…

Were you able to figure it out? Here’s the reason: the variable i was declared as an int.

When the MSX-C compiler compares a variable with a constant of a different type, the variable’s type always takes precedence. The condition in the loop (i <= 60000) contains an int variable (i) and an unsigned constant (60000). Because the variable’s type wins, the compiler automatically casts 60000 into an int, and the resulting code is this:

for (i = 0; i <= (int)60000; i++) {
    ...
}

Which is equivalent to this:

for (i = 0; i <= -5536; i++) {
    ...
}

Where does this -5536 come from? The unsigned value 60000 in binary is 1110 1010 0110 0000 (I added spaces for clarity). MSX-C keeps exactly the same bits but casts them into an int, which is a signed type. Because the topmost bit is set, 1110 1010 0110 0000 viewed as an int is 60000 - 65536 = -5536.

The loop ends before its first iteration because of the way the compiler evaluates the loop condition: it subtracts the variable from the constant and looks at the result. If the result is not negative, the condition is true and the loop continues. If the result is negative, the condition is false and the loop ends.

This is an illustration of what we would expect to happen:

i = 0 -> 60000 - i = 60000 -> 60000 is not negative -> condition is true
[…]
i = 10000 -> 60000 - i = 50000 -> 50000 is not negative -> condition is true
[…]
i = 60000 -> 60000 - i = 0 -> 0 is not negative -> condition is true
i = 60001 -> 60000 - i = -1 -> -1 is negative -> condition is false -> loop ends

However, this is what actually happens:

i = 0 -> -5536 - i = -5536 -> -5536 is negative -> condition is false -> loop ends

Because we mistakenly declared the variable i as an int, MSX-C casts the value 60000 into an int as well, so the loop is over even before it starts.

Fixing the problem is easy: just make sure to declare the variable as unsigned:

The char data type

char is another unsigned type, but it only takes 8 bits (1 byte) in memory, so it can hold values from 0 to 255.

(Figure: bit layout of the char type)

MSX-C handles the following as char by default:

Character constants
Variables declared as char
Any data preceded by the (char) cast operator

The other two numeric types in MSX-C (int and unsigned) both take two bytes in memory, while char data only takes one. Keep that in mind when thinking about the size of data structures (we’ll learn about them in a future post).

Some functions such as putchar() only accept parameters of type char and won’t work properly if we pass them a parameter of another type:

putchar(65);            /* Error: 65 by itself is an int */
putchar('A'+1);         /* Error: the expression 'A'+1 is of type int because 1 is an int */

This problem can also happen the other way around: many functions such as printf() take parameters of type int and won’t work properly if we pass them a char value:

printf("%d", 'A');      /* Error: printf() expects an int, but 'A' is of type char */

In these three cases we can solve the problem by casting the value into the type the function needs:

putchar((char)65);          /* Correct: prints the character A */
putchar((char)('A'+1));     /* Correct: prints the character B */
printf("%d", (int)'A');     /* Correct: prints the number 65 (character A's code) */

Let’s put all this information in one place and summarize the default data types used by the compiler:

(Table: default data types used by MSX-C)

Type precedence rules in MSX-C

We’ve seen that some data types have precedence over the others. Here’s the list of precedences, sorted from highest precedence to lowest:

(Table: data type precedence in MSX-C, from highest to lowest)

In other words, when we mix two or more data types in an expression, the types of the variables always take precedence over the types of the constants. On top of that, unsigned values take precedence over int values, and int values take precedence over char.

Summary

In this post we’ve covered some very important concepts about data types in MSX-C. If anything isn’t clear, read it again and don’t hesitate to ask in the comments below.

We’ve learnt about the unsigned, int and char data types: their value ranges and bit sizes. We’ve seen how MSX-C decides what data type to use during compilation. We have also learnt how to use the cast operator to explicitly convert data of one type into another type. Finally, we have seen what data types take precedence over the others in our programs.

In the next post…

In Relearning MSX #14 we talked about FPC.COM and how it can be used to check that we’re using the right types in our programs. Now that we’ve learnt about types, we’re ready to take a deeper look at this tool, and we’ll see a few examples of how we can use it to debug our programs.

This series of articles is supported by your donations. If you’re willing and able to donate, please visit the link below to register a small pledge. Every little amount helps.

Javi Lavandeira’s Patreon page