It’s been a looong while since the last post. Almost half a year. That’ll teach me to set deadlines for stuff I do in my free time.
Let’s continue where we left off in the last post: we’re going to start learning how to code in assembly language. It will be very handy even when working with C, because we will often need to understand what the computer is doing behind the scenes. Plus, it’s actually very simple to learn (though mastering it and learning the underlying hardware will take some more time.)
In this series of articles we’re going to follow the book MSX-DOSアセンブラプログラミング (MSX-DOS Assembler Programming) by Tetsuya Kageyama, published by ASCII Publications in 1988.
Without further ado:
Assemblers and the assembly language
When reading a book on machine code, we often read sentences like this:
…to create programs in machine code we write an assembly language program and we use a tool called an assembler.
Sometimes the meaning of this isn’t obvious to everybody. What does the assembler do? What kind of programming language is assembly?
Let’s try and answer these questions.
What’s an assembler?
The only language the computer understands is machine code. However, machine code is just numbers, and it doesn’t make much sense to us humans. It would be nice to have a programming language that is easier for us to understand than machine code, but that still relates to the instruction set of the CPU we’re programming for (on MSX machines, the very popular Z80 CPU.) That language is assembly, and the assembler is the tool that translates (assembles) our assembly programs into machine code.
Assembly language instructions hold a very close relationship to the CPU’s machine code instructions. These assembly instructions are called mnemonics, and each of them corresponds to a machine code instruction in the CPU’s instruction set.
Summarizing, when we’re writing an assembly language program what we’re doing is writing mnemonics, and the assembler is translating these into the corresponding machine code instructions for the CPU.
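To make the idea concrete, here’s a minimal sketch of what an assembler does, written in Python rather than taken from the book. The function name assemble and the tiny input syntax are my own assumptions, and it only knows a handful of Z80 mnemonics. A real assembler also handles labels, symbols, and many more instructions.

```python
# Toy assembler: a hypothetical sketch, not a real tool.
# The opcode values are the Z80 encodings for these few instructions.
OPCODES = {
    "NOP": 0x00,     # do nothing
    "INC A": 0x3C,   # A = A + 1
    "DEC A": 0x3D,   # A = A - 1
    "HALT": 0x76,    # stop until the next interrupt
    "RET": 0xC9,     # return to the caller
}

def assemble(source):
    """Translate one mnemonic per line into machine code bytes."""
    machine_code = bytearray()
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Drop the comment, normalize case and spacing
        text = " ".join(line.split(";")[0].upper().split())
        if not text:
            continue
        if text.startswith("LD A,"):
            # LD A,n is encoded as 0x3E followed by the 8-bit value n
            value = int(text[len("LD A,"):], 0)
            machine_code += bytes([0x3E, value & 0xFF])
        elif text in OPCODES:
            machine_code.append(OPCODES[text])
        else:
            # This is the only kind of error an assembler can catch
            raise ValueError(f"line {line_number}: unknown mnemonic {text!r}")
    return bytes(machine_code)

program = """
    LD A,5      ; load 5 into the accumulator
    INC A       ; add one
    RET         ; go back to the caller
"""
print(assemble(program).hex(" "))   # prints: 3e 05 3c c9
```

The four bytes printed at the end are the “just numbers” the CPU actually runs. The three lines of mnemonics are the same program in a form we can read.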
Assembly language versus other programming languages
What makes assembly different from other programming languages? What are its pros and cons with respect to other languages? Let’s compare it with BASIC and C and see in which aspects they differ.
BASIC shouldn’t need any introduction by now. It’s the language you can program in right away as soon as you turn on your MSX. The manuals that came with the computer used to describe MSX-BASIC in detail. Computer magazines from the late 70s to the early 90s published pages and pages of BASIC programs that you had to spend hours entering on the computer in order to run them. This is how many of us became familiar with our machines during our childhood.
MSX wasn’t the only platform that came with BASIC out of the box. Most other 8-bit platforms booted directly into some kind of BASIC interpreter where you could start writing programs immediately.
The C programming language was developed in 1972 at Bell Labs for use on the PDP-11 minicomputer. It became very popular and was ported to many platforms, including MSX. Even today C is one of the most widely used languages, either in its original form or through one of the languages derived from it: Objective-C, C++, C# and others.
Let’s compare the steps involved in running a program written in each of these languages.
When we load a BASIC program into the computer (whether by typing it, or by loading it from tape or disk) and enter the RUN command, another program inside the computer examines one instruction in the BASIC program, processes it to understand its meaning, executes it, and then proceeds to the next instruction to repeat the process, again and again, until the program ends.
In other words, a BASIC program is examined line by line as it runs. This is why BASIC programs are slower than machine code programs. However, BASIC has the advantage that programs are easy to write and test because they can be run immediately. Also, they’re checked for errors as they run, so if an error is found the program will stop and tell us where the error happened.
Languages that run like BASIC (being examined and interpreted during actual run time) are called interpreted languages, and the program in charge of examining and running the code is called the interpreter. Other modern interpreted programming languages are Python, Perl, PHP and Ruby.
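The examine, decode, execute cycle described above can be sketched in a few lines of Python. This is a toy, not how MSX-BASIC is actually implemented: the function name run, the three keywords it accepts, and the error message format are assumptions for illustration only.

```python
def run(program):
    """Interpret a toy BASIC-like program, one line at a time."""
    variables = {}
    output = []
    lines = sorted(program.items())      # execute in line-number order
    index = 0
    while index < len(lines):
        number, statement = lines[index]
        # Every statement is examined and decoded each time it is reached
        keyword, _, rest = statement.partition(" ")
        if keyword == "LET":
            name, _, expression = rest.partition("=")
            variables[name.strip()] = int(expression)
        elif keyword == "PRINT":
            output.append(str(variables[rest.strip()]))
        elif keyword == "END":
            break
        else:
            # Errors only surface when the faulty line is actually reached
            output.append(f"Syntax error in {number}")
            break
        index += 1
    return output

program = {
    10: "LET A=5",
    20: "PRINT A",
    30: "PRNT A",    # misspelt keyword, only noticed at run time
    40: "END",
}
print(run(program))   # prints: ['5', 'Syntax error in 30']
```

Lines 10 and 20 run fine before the interpreter trips over line 30. That is exactly the behaviour described above: the program starts immediately, and the error is reported at the point where it happens.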
As explained earlier, before we can run them, programs written in assembly language have to be converted to machine code by an assembler. The assembler takes care of examining the code during this process, so there’s no need to examine the code every time we run it. This makes programs run extremely fast. The downside is that while the assembler can check that the assembly code is syntactically correct, it can’t check the program logic. It will catch misspelt mnemonics and symbols, but it won’t catch programming errors such as infinite loops. A BASIC program that contains errors will stop execution and let us fix the problem, but a machine code program will most of the time completely hang the computer, requiring a reset (and losing any unsaved work we may have done.)
What about C? Like assembly, C programs also require an intermediate step before we can run them. A compiler examines the C code and outputs a machine code program that we can run directly on the computer.
There’s still the risk that the C program contains logic errors that will make it crash during execution. However, programs written in C run much faster than BASIC programs because they’re not examined and interpreted while they run. The compile process happens only during development. The resulting program is often very close in speed to programs written directly in assembly language.
Another difference between assembly and other languages is the relationship between the language instructions and the CPU. When programming in assembly there’s a one-to-one relationship between a mnemonic and its corresponding machine code representation. We can even say that assembly language is machine language, and we wouldn’t be completely wrong. This is not the case with interpreted or compiled languages. One instruction in C or BASIC usually corresponds to many machine code instructions. We call these high-level languages.
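One way to see the one-to-one relationship is that the translation can be reversed without losing anything. The Python sketch below is a hypothetical mini disassembler limited to five single-byte Z80 instructions. The names MNEMONICS, disassemble and reassemble are my own, and real Z80 code also contains multi-byte instructions with operands that this toy ignores.

```python
# Hypothetical lookup table for a few single-byte Z80 instructions
MNEMONICS = {
    0x00: "NOP",
    0x3C: "INC A",
    0x3D: "DEC A",
    0x76: "HALT",
    0xC9: "RET",
}
# The reverse table exists because each mnemonic has exactly one opcode
OPCODES = {mnemonic: opcode for opcode, mnemonic in MNEMONICS.items()}

def disassemble(machine_code):
    """Turn machine code bytes back into mnemonics."""
    return [MNEMONICS[byte] for byte in machine_code]

def reassemble(mnemonics):
    """Turn mnemonics into machine code bytes again."""
    return bytes(OPCODES[mnemonic] for mnemonic in mnemonics)

code = bytes([0x3C, 0x3C, 0xC9])
listing = disassemble(code)
print(listing)                        # prints: ['INC A', 'INC A', 'RET']
print(reassemble(listing) == code)    # prints: True
```

We can’t do the same with a compiled C program. Many different C statements can produce the same machine code, so the original source can’t be recovered from the bytes.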
Next in the pipeline…
In the next article we’ll examine a small assembly program and its resulting machine code output. We’ll see a few assembly language instructions and pseudoinstructions for the assembler. We’ll also learn about symbols.