An Introduction to MorphOS PPC Assembly
From MorphOS Library
Jump In The Deep End And Compile Something Right Now
You could copy and paste but just entering the two lines below into a suitable text editor is probably quicker.
        .text
        blr
Save as 'shorty.p'.
.text is an assembler directive - more about it later. blr is one of many branch instructions. This particular one means 'Branch to Link Register'. When a program terminates, this is the last instruction to be executed and it causes the sequence of instruction execution to pass back to the calling program or environment (shell, Ambient etc). Subroutines can also terminate with this instruction but, again, more about this later.
This is as good a time as any to ask if you've installed vbcc yet - if not, go and do it. Download vbcc
Once vbcc is installed, open a shell window and change directory to where shorty.p is saved.
vasmppc_std -Felf -o ram:shorty.o shorty.p
This tells vasm to generate an ELF-formatted object file called shorty.o in ram: from the shorty.p source file.
Hello? World's Shortest MorphOS Program - Or Is It?
There are times when a linker is needed to further process an object file to generate an executable - this isn't one of those times.
Go ahead and enter this into the shell window:
ram:shorty.o
It will appear as if nothing happens - but at least it happens very quickly. However, something did happen. shorty.o was identified as an executable program and loaded into memory where the cpu instruction execution sequence passed to it. As mentioned above, the single blr instruction in this program caused the instruction execution sequence to pass directly back to where it was called from - in this case, the shell.
It may be of interest to know that all PPC instructions are four bytes long. In PPC nomenclature, this length is known as a word. But let's take a moment to have a look at the size of shorty.o
Enter this into the shell window:
list ram:shorty.o
2660 bytes?! But this is typical for object files as they contain much additional information that is used in the process of debugging and linking. Linking, particularly with a few key options, will generate an executable that is noticeably smaller.
Try entering this into the shell window:
vlink -b elf32morphos -s -x -o ram:shorty ram:shorty.o
list ram:shorty
The newly generated executable is 328 bytes - much smaller than its object file, but what are those other 324 bytes doing there? They are the ELF 'container' that holds the single, four byte instruction of shorty. ELF stands for Executable and Linkable Format and it is used by MorphOS, AmigaOS4 and AROS - not to mention Linux, UNIX, BSD and even video game consoles. There is abundant information about this file format online, but this topic will be discussed a little further when the use of the objdump program is introduced.
The Old And The New
Generally, in order to make a program that does something useful or to at least produce an observable result, it is necessary to use operating system library functions. Despite the markedly different cpu family that MorphOS runs on compared to the 68k family used by the classic AmigaOS, MorphOS is largely compatible with AmigaOS. This compatibility is reflected in a very similar API shared with AmigaOS and extends to using a MorphOS system structure as if it were the internal data and address registers of an actual 68k processor. This is called the EmulHandle structure and it is always available through the PPC GPR2 register. Also, as with 68k AmigaOS, the address of the Exec library base is always to be found at memory location 4. Many aspects of the AmigaOS API fit closely to the features available in the 68k cpu and the following code comparison should illustrate how it is echoed in the PPC MorphOS API.
A common task performed during the initialisation of many programs - opening the Dos Library. It is a particularly simple example of library usage but most functions follow this form.
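A sketch of how the two sides of this comparison might look. The PPC instructions are lifted from the Hello World listing later in this article; the 68k lines in the comments are an assumption based on conventional AmigaOS usage rather than anything taken from this text.

```asm
# 68k AmigaOS, shown for comparison only (assumed, conventional form):
#       move.l  4.w,a6                  ; ExecBase
#       lea     dosName,a1              ; library name
#       moveq   #0,d0                   ; any version
#       jsr     _LVOOpenLibrary(a6)     ; result in d0

# PPC MorphOS, using the EmulHandle structure pointed to by r2
        .set    _LVOOpenLibrary,-552
        .set    _AbsExecBase,4
        .set    reg_d0,0
        .set    reg_a1,36
        .set    reg_a6,56
        .set    EmulCallDirectOS,100

        .text
        lis     r3,dosName@ha           # high halfword of the name's address
        addi    r3,r3,dosName@l         # add the low halfword
        stw     r3,reg_a1(r2)           # emulated a1 = library name
        li      r3,0
        stw     r3,reg_d0(r2)           # emulated d0 = version 0
        li      r3,_AbsExecBase
        lwz     r3,0(r3)                # ExecBase from memory location 4
        stw     r3,reg_a6(r2)           # emulated a6 = ExecBase
        li      r3,_LVOOpenLibrary      # r3 = library vector offset
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl                           # call; result is returned in r3

        .rodata
dosName:
        .string "dos.library"
```

This is a fragment: it has no stack frame and does not save the Link Register, so it is not meant to be run as it stands.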
@ha & @l are needed to specify the high and low halfwords of 32 bit immediate values because the fixed 32 bit size of PPC instructions does not allow enough space for an instruction opcode and additional 32 bits of data.
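A small worked example may help here. The value 0x12348000 is an arbitrary choice for illustration, and the halfwords are written out by hand to show what @ha and @l would yield for it. Because addi sign-extends its 16 bit immediate, a low halfword of 0x8000 or more counts as negative, and @ha compensates by adding one to the high halfword.

```asm
        .text
# Target value: 0x12348000 (arbitrary, for illustration only)
#   @l  gives 0x8000, which addi treats as -0x8000
#   @ha gives 0x1235, i.e. the plain high halfword 0x1234 plus one
        lis     r3,0x1235               # r3 = 0x12350000
        addi    r3,r3,-0x8000           # r3 = 0x12350000 - 0x8000 = 0x12348000

# The same value built from the unadjusted high halfword:
# ori does not sign-extend, so no adjustment is needed.
        lis     r4,0x1234               # r4 = 0x12340000
        ori     r4,r4,0x8000            # r4 = 0x12348000
        blr
```

The lis/addi pairing is the one used throughout this article because the same adjusted high halfword also works with the displacement of a load or store instruction.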
One could be forgiven for thinking that the PPC code snippet looks a little ungainly compared to the former. Whereas every PPC instruction is four bytes long, 68k instructions can be as little as two bytes but up to as many as ten. A single 68k instruction can load a value from a 32 bit memory address specified by one instruction operand and store it at another 32 bit memory address specified by the second instruction operand. The 68k can also perform other operations beyond simple loads and stores directly on memory. Or at least appear to... In truth, computer memory only sends and receives data - no other data processing (adding, subtracting and so on) occurs in memory. While the 68k instruction add.l #$12345678,(a0) appears to add the immediate value of its first operand to whatever may already be stored at the address pointed to by a0, the contents of that address are actually loaded from memory into a private work register, the addition is performed and then the result is stored back to the same memory location. So this instruction actually performs two memory accesses where it might appear that there was only one.
Contrast this with PPC assembly programming, where memory loads and stores are all done explicitly. Before data can be operated upon it must be loaded from memory into a GPR (General Purpose Register). Zero or more operations can then be performed (zero in the case of a simple memory copy), and then the data may be stored back to memory.
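As a sketch, the 68k add.l #$12345678,(a0) discussed above could be spelled out on the PPC as follows. The choice of r3 to hold the address and r4 as the work register is an assumption made for illustration.

```asm
        .text
# Assumption: r3 holds the address that a0 held in the 68k example
        lwz     r4,0(r3)                # explicit load from memory into a GPR
        addis   r4,r4,0x1234            # add the high halfword (0x12340000)
        addi    r4,r4,0x5678            # add the low halfword
        stw     r4,0(r3)                # explicit store back to memory
        blr
```

The load and the store that the 68k performs behind the scenes are each a visible instruction here, and the 32 bit immediate has to be added in two halves.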
Unless you are brand-spanking-new to the topic of PPC assembly, you will already know that the PPC has 32 GPRs - r0 through r31. But did you know that the desired use of these registers is set out in something called the System V.4 ABI which MorphOS adheres to?
r0 - volatile
r1 - stack pointer
r2 - for system use (with MorphOS, r2 points to the EmulHandle structure)
r3 - initialised with a pointer to a Dos command buffer
r4 - initialised with the length of the Dos command buffer
r5 - initialised with a pointer to an ELF structure
r3 ... r10 - volatile & can be used to pass function arguments. If more arguments are required, the stack is used.
r11 & r12 - volatile
r13 - small data area pointer. If this register is needed by a function or subroutine, it must be saved first and then restored before returning to where it was called from.
r14 ... r31 - No predefined purpose. If these registers are needed by a function or subroutine, they must be saved first and then restored before returning to where it was called from.
The above use of the word 'volatile' means that functions are not expected to preserve the contents of these registers. r1 & r2 are the only registers that must be restored to their initial values by a terminating program. However, while most programs will modify r1 and later restore it, it is best not to modify r2 in the first place.
This System V.4 ABI also sets out a particular way for programs and subroutines to organise their stack frames. When a program is loaded into memory and the instruction execution sequence passes to it, the stack pointer is still pointing to the calling program's stack frame and there is also an important address stored in the Link Register at this time. Unless this program is very simple (like shorty) it must save the Link Register address and, most likely, create its own stack frame. Somewhat conveniently, there is a special position in the caller's stack frame that a callee program can save the contents of the Link Register to. This is a fairly common example of a program or subroutine's very first instructions.
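A sketch of such an opening sequence, modelled on the first three instructions of the Hello World listing later in this article but with a two-word frame. The layout in the comments is my own summary of what those instructions do.

```asm
        .text
        mflr    r0                      # copy the return address out of LR
        stw     r0,4(r1)                # save it in the caller's frame, one word above its base
        stwu    r1,-8(r1)               # store the old r1 at r1-8, then set r1 = r1-8

# Stack after these three instructions (addresses grow upwards):
#
#   12(r1)  caller's frame: our saved return address
#    8(r1)  caller's frame: its own link to the frame above
#    4(r1)  our frame: free for any function we call to save its LR
#    0(r1)  our frame: the old value of r1 (back chain)
```

This is only the entry sequence of a program, so it is a fragment rather than something to assemble and run on its own.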
Uh indeed... Let's try to visualise some memory being used as stack space.
Note that stwu r1,-8(r1) creates a two-word stack frame, which is the smallest stack frame that a program or subroutine can have if it, in turn, calls another program, subroutine or library function. More often, a larger stack frame is created. stwu r1,-4(r1) could be used to create a one-word stack frame, but this would be redundant, and such a program must not call any other program, subroutine or library function. Such a program would not need to save the value in the Link Register (LR) to protect it from being overwritten by subsequent calls, and can terminate and return to its caller with a simple blr instruction - just like shorty does.
A further note about larger stack frames and appropriate stack sizes: For reasons relating to PPC architecture, it is a good idea to choose stack frame sizes that are multiples of 16 bytes.
This is a less common example of a program's first instructions but it may better illustrate how to use stack frames.
Let's jump ahead and look at the last few instructions involved in program termination that would 'undo' the above instructions.
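As a sketch of how entry and exit pair up, here is a skeleton that uses the same sequence as the Hello World listing later in this article, with a 16 byte frame. The symbol name frame_size is hypothetical, chosen for illustration.

```asm
        .set    frame_size,16           # hypothetical name; a multiple of 16 bytes

        .text
# Entry: save LR in the caller's frame, then create our own frame
        mflr    r0
        stw     r0,4(r1)
        stwu    r1,-frame_size(r1)

# The body of the program would go here. 8(r1) and 12(r1) are free for local use.

# Exit: undo the entry sequence in reverse order
        li      r3,0                    # return code
        addi    r1,r1,frame_size        # release our frame; r1 points at the caller's frame again
        lwz     r0,4(r1)                # fetch the saved return address
        mtlr    r0                      # put it back into LR
        blr                             # return to the caller
```

Each exit instruction reverses one entry instruction, so r1 and LR hold their original values again by the time blr executes.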
A Little Less Talk And A Little More Action Please
It's time to compile another program - this one will actually do something.
Not another 'Hello World' program... I'm afraid so. This time, copy and paste is probably quicker. A fully commented version will be made available for download soon.
Note that this example does not use the MorphOS SDK. Instead, some 'quick and dirty' methods are used for the sake of simplicity and readability.
# Various library function offsets
        .set    _LVOOpenLibrary,-552
        .set    _LVOCloseLibrary,-414
        .set    _LVOVPrintf,-954
        .set    _AbsExecBase,4
# EmulHandle structure (always pointed to by r2)
        .set    reg_d0,0
        .set    reg_d1,4
        .set    reg_d2,8
        .set    reg_d3,12
        .set    reg_d4,16
        .set    reg_d5,20
        .set    reg_d6,24
        .set    reg_d7,28
        .set    reg_a0,32
        .set    reg_a1,36
        .set    reg_a2,40
        .set    reg_a3,44
        .set    reg_a4,48
        .set    reg_a5,52
        .set    reg_a6,56
        .set    reg_a7,60
        .set    EmulCallDirectOS,100
# Stack frame offsets
        .set    stack_pos0_caller_stack,0
        .set    stack_pos1_callerLR,4
        .set    stack_pos2_ExecBase,8
        .set    stack_pos3_DosBase,12
        .set    new_4_word_stack,16

        .text
# Save the caller's LR and create a four-word stack frame
        mflr    r0
        stw     r0,stack_pos1_callerLR(r1)
        stwu    r1,-new_4_word_stack(r1)
# OpenLibrary: a1 = library name, d0 = version, a6 = ExecBase
        lis     r3,dosName@ha
        addi    r3,r3,dosName@l
        stw     r3,reg_a1(r2)
        li      r3,0
        stw     r3,reg_d0(r2)
        li      r3,_AbsExecBase
        lwz     r3,0(r3)
        stw     r3,stack_pos2_ExecBase(r1)
        stw     r3,reg_a6(r2)
        li      r3,_LVOOpenLibrary
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl
# A result of zero means the library could not be opened
        cmpwi   r3,0
        beq     exit
        stw     r3,stack_pos3_DosBase(r1)
# VPrintf: d1 = format string, d2 = argument array (none), a6 = DosBase
        lis     r4,string1@ha
        addi    r4,r4,string1@l
        stw     r4,reg_d1(r2)
        li      r4,0
        stw     r4,reg_d2(r2)
        stw     r3,reg_a6(r2)
        li      r3,_LVOVPrintf
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl
# CloseLibrary: a1 = DosBase, a6 = ExecBase
        lwz     r3,stack_pos3_DosBase(r1)
        stw     r3,reg_a1(r2)
        lwz     r3,stack_pos2_ExecBase(r1)
        stw     r3,reg_a6(r2)
        li      r3,_LVOCloseLibrary
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl
        li      r3,0
# Release the stack frame, restore LR and return to the caller
exit:
        addi    r1,r1,new_4_word_stack
        lwz     r0,stack_pos1_callerLR(r1)
        mtlr    r0
        blr

        .rodata
__abox__:
dosName:
        .string "dos.library"
string1:
        .string "Hello World\n"
Save this source file as HelloWorld.p and open a shell window. Change directory to where HelloWorld.p was just saved and enter:
vasmppc_std -Felf -o ram:hw.o HelloWorld.p
Once again, because of the simplicity of this program, linking isn't necessary so just enter the following in the shell window:
ram:hw.o
Are you impressed? Feel free to modify, experiment with and improve this source code. A not-too-difficult challenge might be to change this program so that it prints the arguments given to it in the shell window - a number of small but significant changes would be needed to do this.
When choosing which registers to use in your own programs, be aware that use of r0 in some instruction operands will not always work as you might expect. In these instructions, the actual content of r0 is ignored and the result is based on the constant value zero instead. For example, imagine that r0 contains the value 100 and then this instruction is executed:
addi r0,r0,50
It looks like the result should be r0 = r0 + 50 = 150
The result will actually be r0 = 0 + 50 = 50
This may seem odd but, as long as the programmer is aware of it, it can be useful. A good PPC instruction reference manual will explain this, and many other things, in much greater detail. There are many of these reference documents available on the internet - this is one of them: MPCFPE32B.pdf
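A short sketch of this behaviour, including one way it can be put to use. The choice of r4 and r5 is arbitrary, and the final load assumes, as stated earlier in this article, that the Exec library base is stored at memory location 4.

```asm
        .text
        li      r0,100                  # li rD,value is shorthand for addi rD,0,value
        addi    r0,r0,50                # source operand r0 reads as zero: r0 = 0 + 50 = 50

        li      r4,100
        addi    r4,r4,50                # any other register behaves as expected: r4 = 150

# Put to use: with r0 as the base operand, the effective address is just the displacement
        lwz     r5,4(r0)                # loads the word at address 4, whatever r0 contains

        li      r3,0                    # return code
        blr
```

The rule only applies where r0 appears as the base or first source operand of instructions such as addi, addis and the loads and stores. As a destination, or as the data register of a store, r0 behaves like any other register.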
MorphOS SDK, Objdump, ELFs & Sections
The installer script of the latter seems to expect that there is a pre-existing assign for include: and, if your system already assigns include: to somewhere, please skip over this next part.
What follows is my S:user-startup after the installation of vbcc and its MorphOS compiler target, with an additional assign that I made, preceded by this comment:
;vbcc MorphOS target needs include: to be assigned to somewhere
;
; MorphOS user-startup
;
; This script is executed on system boot by
; startup-sequence. You can make personal
; changes in here.
;
; $VER: user-startup 1.1
;
; Enable the following to mount the inet-handler. Note that TCP: allows
; easy access to internet, and allows scripts to listen for incoming
; connections. Some malware could abuse this.
;Mount TCP:
;BEGIN vbcc
assign >NIL: vbcc: mos26:vbcc
assign >NIL: C: vbcc:bin ADD
setenv VBCC vbcc:
;END vbcc
;vbcc MorphOS target needs include: to be assigned to somewhere
assign include: vbcc:targets/ppc-morphos/include
;BEGIN vbcc-ppc-morphos
assign >NIL: vincludemos: vbcc:targets/ppc-morphos/include
assign >NIL: vincludemos: include: add
assign >NIL: vlibmos: vbcc:targets/ppc-morphos/lib
;END vbcc-ppc-morphos
As mentioned just before the previous source code example, some quick and dirty methods were used for the sake of simplicity and readability. This refers to the way library function offsets were established: each one was set by hand with a .set directive. This approach will quickly become detrimental as more offsets are added. There are a number of solutions to this problem but only one will be presented here. It involves removing the leading underscore character from function names and appending '@l' to the end so that, for example, li r3,_LVOOpenLibrary becomes li r3,LVOOpenLibrary@l. Assemble with vasm as before but note that the resulting object file is no longer executable. To make this object executable, vlink is needed.
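As a sketch, the change to the source might look like this for the OpenLibrary call. It assumes, as described above, that the symbol LVOOpenLibrary is left undefined in the source and resolved at link time.

```asm
# Before: the offset is defined by hand in the source file
#       .set    _LVOOpenLibrary,-552
#       li      r3,_LVOOpenLibrary

# After: no .set line; the symbol is left for the linker to resolve
        .text
        li      r3,LVOOpenLibrary@l
        blr
```

The .set lines for any other function offsets would be removed in the same way, while the EmulHandle and stack frame offsets stay as they are.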
vlink -b elf32morphos -o <desired executable name> <existing object name> -lamiga
In the above shell command, -lamiga refers to the libamiga.a file that vlink knows to look for in vlibmos:. This file contains information used to resolve many library function names to numerical values. Note that any executable generated from the above vlink command will contain a lot of symbol information that, while potentially useful, increases the size of the executable. This can be avoided with a few additions to the above command.
vlink -b elf32morphos -s -x -P__abox__ -o <desired executable name> <existing object name> -lamiga
-s strip symbols from the output file
-x discard local symbols from input file(s)
-P<symbol> preserve this symbol
For more information, please refer to the documentation for vlink and other programs installed with vbcc.
At this point it would be useful to have a number of freshly generated object files and executables to look at with objdump although any ELF can be used for this purpose.
final part yet to come