An Introduction to MorphOS PPC Assembly
From MorphOS Library
Latest revision as of 09:40, 16 February 2011
Jump In The Deep End And Compile Something Right Now
You could copy and paste but just entering the two lines below into a suitable text editor is probably quicker.
.text
blr
Save as shorty.s
.text is an assembler directive - more about it later. blr is one of many branch instructions. This particular one means 'Branch to Link Register'. When a program terminates, this is the last instruction to be executed and it causes the sequence of instruction execution to pass back to the calling program or environment (shell, Ambient etc). Subroutines can also terminate with this instruction but, again, more about this later.
This is as good a time as any to ask if you've installed vbcc yet - if not, download and install it now.
Once vbcc is installed, open a shell window and change directory to where shorty.s has been saved.
Enter:
vasmppc_std -Felf -o ram:shorty.o shorty.s
This tells vasm to generate an ELF formatted object file called shorty.o in ram: from the shorty.s source file.
Hello? World's Shortest MorphOS Program - Or Is It?
There are times when a linker is needed to further process an object file to generate an executable - this isn't one of those times.
Go ahead and enter this into the shell window:
ram:shorty.o
It will appear as if nothing happens - but at least it happens very quickly. However, something did happen. shorty.o was identified as an executable program and loaded into memory where the cpu instruction execution sequence passed to it. As mentioned above, the single blr instruction in this program caused the instruction execution sequence to pass directly back to where it was called from - in this case, the shell.
It may be of interest to know that all PPC instructions are four bytes long. In PPC nomenclature, this length is known as a word. But let's take a moment to have a look at the size of shorty.o.
Enter this into the shell window:
list ram:shorty.o
2664 bytes?! But this is typical for object files as they contain much additional information that is used in the process of debugging and linking. Linking, particularly with the -s 'strip all symbols' option, will generate an executable that is noticeably smaller.
Try entering this into the shell window:
vlink -s -o ram:shorty ram:shorty.o
list ram:shorty
The newly generated executable is 328 bytes - much smaller than its object file, but what are those other 324 bytes doing there? They are the ELF 'container' that holds the single, four byte instruction of shorty. ELF stands for Executable and Linkable Format and it is used by MorphOS, AmigaOS4 and AROS - not to mention Linux, UNIX, BSD and even video game consoles. There is abundant information about this file format online, but this topic will be discussed a little further when the use of the objdump program is introduced.
The Old And The New
Generally, in order to make a program that does something useful, or to at least produce an observable result, it is necessary to use operating system library functions. Although MorphOS runs on a markedly different cpu family from the 68k family used by the classic AmigaOS, MorphOS is largely compatible with AmigaOS. This compatibility is reflected in a very similar API and extends to using a MorphOS system structure as if it were the internal data and address registers of an actual 68k processor. This is called the EmulHandle structure and it is always available through the PPC GPR2 (r2) register. Also, as with 68k AmigaOS, the address of the Exec library base is always to be found at memory location 4. Many aspects of the AmigaOS API fit closely with the features available in the 68k cpu, and the following code comparison should illustrate how this is echoed in the PPC MorphOS API.
A common task performed during the initialisation of many programs - opening the Dos library. It is a particularly simple example of library usage but most functions follow this form.
The @h, @ha and @l attributes are needed to specify the high and low halfwords of 32 bit immediate values. The fixed 32 bit size of PPC instructions does not leave enough space for an instruction opcode plus an additional 32 bits of data. The difference between @h and @ha is due to the common use of the addi (add immediate) instruction to provide the lower 16 bits of a 32 bit constant, such as an address. addi only takes a signed immediate value. Whenever bit 15 of the constant is set, addi treats those lower 16 bits as a negative value, which would leave the result 0x10000 too low. The @ha ('high adjusted') attribute compensates by adding one to the high halfword in that case. For addresses, these details are only resolved just before execution, when the program is loaded. The @h attribute just specifies the upper halfword of a 32 bit constant without regard for what follows. While I'm banging on about such intricacies, the meaning of zero in load word and zero of the lwz instruction is that on 64 bit PPC processors, this instruction will load a 32 bit value from memory into the lower word of a 64 bit GPR and then clear (or zero) its upper word. Something to consider when MorphOS is available on G5 class computers...
One could be forgiven for thinking that the PPC code snippet looks a little ungainly compared to the 68k one. Whereas every PPC instruction is four bytes long, 68k instructions can be as little as two bytes but up to as many as ten on the original 68000. A single 68k instruction can load a value from a 32 bit memory address specified by one instruction operand and store it at another 32 bit memory address specified by the second instruction operand. The 68k can also perform other operations beyond simple loads and stores directly on memory. Or at least appear to... In truth, computer memory only sends and receives data - no other data processing (adding, subtracting, etc.) occurs in memory. The 68k instruction add.l #$12345678,(a0) appears to add the immediate value of its first operand to whatever may already be stored at the address pointed to by a0. In fact, the contents of that address are loaded from memory into a private work register, the addition is performed and then the result is stored back to the same memory location. So this instruction actually performs two memory accesses where it might appear that there was only one. Contrast this with PPC assembly programming, where memory loads and stores are all done explicitly. Before data can be operated upon, it must be loaded from memory into a GPR (General Purpose Register). Zero (in the case of a simple memory copy) or more operations can then be performed, and the result may then be stored back to memory.
Unless you are brand-spanking-new to the topic of PPC assembly, you will already know that the PPC has 32 GPRs - r0 through r31. But did you know that the desired use of these registers is set out in something called the System V.4 ABI which MorphOS adheres to?
r0 - volatile
r1 - stack pointer
r2 - for system use (with MorphOS, r2 points to the EmulHandle structure)
r3 - initialised with a pointer to a Dos command buffer
r4 - initialised with the length of the Dos command buffer
r5 - initialised with a pointer to an ELF structure
r3 ... r10 - volatile & can be used to pass function arguments. If more arguments are required, the stack is used.
r11 & r12 - volatile
r13 - small data area pointer. If this register is needed by a function or subroutine, it must be saved first and then restored before returning to where it was called from.
r14 ... r31 - no predefined purpose. If these registers are needed by a function or subroutine, they must be saved first and then restored before returning to where it was called from.
The word 'volatile' above means that functions are not expected to preserve the contents of these registers. r1 & r2 are the only registers that a terminating program must restore to their initial values. Most programs will modify r1 and restore it later, but it is best not to modify r2 in the first place.
The System V.4 ABI also sets out a particular way for programs and subroutines to organise their stack frames. When a program is loaded into memory and the instruction execution sequence passes to it, the stack pointer (r1) still points to the calling program's stack frame. At this time the Link Register also holds an important address: the return address back into the caller. Unless the program is very simple (like shorty), it must save the Link Register address and, most likely, create its own stack frame. Somewhat conveniently, there is a special position in the caller's stack frame, at 4(r1), where a callee program can save the contents of the Link Register. This is a fairly common example of a program or subroutine's very first instructions.
'Uh?' indeed... Let's try to visualise some memory being used as stack space.
Note that stwu r1,-8(r1) creates a two-word stack frame. The first word holds the back chain, which is the caller's stack pointer, stored there by stwu itself. The second word is the Link Register save slot that any function this program calls will write its own return address into. Two words is therefore the smallest stack frame that a program or subroutine can have if it, in turn, calls another program, subroutine or library function. More often, a larger stack frame is created. A program or subroutine could use stwu r1,-4(r1) to create a one-word stack frame, but this would be redundant, and such a program must not call any other program, subroutine or library function. A program that makes no calls does not need to save the value in the Link Register (LR), because nothing will overwrite it. It can terminate and return to its caller with a simple blr instruction, just like shorty does.
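A further note about larger stack frames and appropriate stack sizes: it is a good idea to choose stack frame sizes that are multiples of 16 bytes. The System V.4 ABI expects the stack pointer to stay 16 byte (quadword) aligned. AltiVec vector loads and stores also ignore the lowest four address bits, so saving vector registers on the stack needs 16 byte aligned slots.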
This is a less common example of a program's first instructions but it may better illustrate how to use stack frames.
Let's jump ahead and look at the last few instructions involved in program termination that would 'undo' the above instructions.
A Little Less Talk And A Little More Action Please
It's time to assemble another program - this one will actually do something.
But wait...
Not another 'Hello World' program... I'm afraid so. This time, copy and paste is probably quicker. Alternatively, download this fully commented source archive.
Note that this example does not use the MorphOS SDK. Instead, some 'quick and dirty' methods are used for the sake of simplicity and readability.
# Various library function offsets
.set _LVOOpenLibrary,-552
.set _LVOCloseLibrary,-414
.set _LVOVPrintf,-954
.set _AbsExecBase,4                             # absolute address holding the ExecBase pointer

# EmulHandle structure (always pointed to by r2)
.set reg_d0,0
.set reg_d1,4
.set reg_d2,8
.set reg_d3,12
.set reg_d4,16
.set reg_d5,20
.set reg_d6,24
.set reg_d7,28
.set reg_a0,32
.set reg_a1,36
.set reg_a2,40
.set reg_a3,44
.set reg_a4,48
.set reg_a5,52
.set reg_a6,56
.set reg_a7,60
.set EmulCallDirectOS,100

# Stack frame offsets
.set stack_pos0_caller_stack,0
.set stack_pos1_callerLR,4
.set stack_pos2_ExecBase,8
.set stack_pos3_DosBase,12
.set new_4_word_stack,16

        .text
        mflr    r0                              # r0 = return address (Link Register)
        stw     r0,stack_pos1_callerLR(r1)      # save it in the caller's LR save slot
        stwu    r1,-new_4_word_stack(r1)        # create a 16 byte frame with back chain

        lis     r3,dosName@ha                   # r3 = address of "dos.library"
        addi    r3,r3,dosName@l
        stw     r3,reg_a1(r2)                   # a1 = library name
        li      r3,0
        stw     r3,reg_d0(r2)                   # d0 = minimum version (0 = any)
        li      r3,_AbsExecBase
        lwz     r3,0(r3)                        # r3 = ExecBase, read from address 4
        stw     r3,stack_pos2_ExecBase(r1)      # keep a copy in our stack frame
        stw     r3,reg_a6(r2)                   # a6 = ExecBase
        li      r3,_LVOOpenLibrary              # r3 = offset of the function to call
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl                                   # call exec OpenLibrary()
        cmpwi   r3,0                            # r3 = DOSBase, or 0 on failure
        beq     exit
        stw     r3,stack_pos3_DosBase(r1)       # keep DOSBase in our stack frame

        lis     r4,string1@ha
        addi    r4,r4,string1@l
        stw     r4,reg_d1(r2)                   # d1 = format string
        li      r4,0
        stw     r4,reg_d2(r2)                   # d2 = argument array (none)
        stw     r3,reg_a6(r2)                   # a6 = DOSBase
        li      r3,_LVOVPrintf
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl                                   # call dos VPrintf()

        lwz     r3,stack_pos3_DosBase(r1)
        stw     r3,reg_a1(r2)                   # a1 = library to close
        lwz     r3,stack_pos2_ExecBase(r1)
        stw     r3,reg_a6(r2)                   # a6 = ExecBase
        li      r3,_LVOCloseLibrary
        lwz     r0,EmulCallDirectOS(r2)
        mtctr   r0
        bctrl                                   # call exec CloseLibrary()
        li      r3,0                            # return code 0
exit:
        addi    r1,r1,new_4_word_stack          # discard our stack frame
        lwz     r0,stack_pos1_callerLR(r1)      # reload the saved return address
        mtlr    r0
        blr                                     # return to the caller

        .rodata
        .global __abox__        #__abox__ is a special MorphOS symbol that will
__abox__:                       #differentiate this program from other PPC
        .word   1               #executables that can run on MorphOS...
        .type   __abox__,@object #When linking, care should be taken
        .size   __abox__,4      #to avoid stripping this symbol.

dosName:        .string "dos.library"
string1:        .string "Hello World\n"
Save this source file as HelloWorld.s and open a shell window. Change directory to where HelloWorld.s was just saved and enter:
vasmppc_std -Felf -o ram:hw.o HelloWorld.s
Once again, because of the simplicity of this program, linking isn't necessary so just enter the following in the shell window:
ram:hw.o
Are you impressed? Feel free to modify, experiment with and improve this source code. A not-too-difficult challenge might be to change this program so that it prints the arguments given to it in the shell window - a number of small but significant changes would be needed to do this. Hint: Have another look at the System V.4 register usage above.
The commented version of HelloWorld.s in the downloadable source archive briefly explains why the mtctr (move to Count Register) & bctrl (branch to Count Register and link) instruction pair is used in preference to mtlr (move to Link Register) & blrl (branch to Link Register and link): the latter pair can degrade the performance optimisations of some PPC cpus. However, the primary use of the Count Register is as a 32 bit loop counter that certain branch instructions can automatically decrement.
When choosing which registers to use in your own programs, be aware that using r0 in some instruction operands will not always work as you might expect. In these instructions (for example addi, addis, and the address calculation of loads and stores), when r0 is given as the source register rA, its actual content is ignored and the constant value zero is used instead. For example, imagine that r0 contains the value 100 when this instruction is executed:
addi r0,r0,50
It looks like the result should be r0 = r0 + 50 = 150, but the result will actually be r0 = 0 + 50 = 50.
This may seem odd but, as long as the programmer is aware of it, this behaviour can be useful. It is what allows li rD,value to be a simplified form of addi rD,0,value. A good PPC instruction reference manual will explain this, and many other things, in much greater detail. There are many of these reference documents available online - this is one of them: MPCFPE32B.pdf
MorphOS SDK, Objdump, ELFs & Sections
In addition to installing vbcc, it is also recommended to install the MorphOS SDK and the vbcc MorphOS compiler target.
The installer script of the latter seems to expect a pre-existing assign for include:. If your system already assigns include: somewhere, please skip over this next part.
What follows is my S:user-startup after the installation of vbcc and its MorphOS compiler target, plus an additional assign that I made, preceded by this comment:
; vbcc MorphOS target needs include: to be assigned to somewhere
;
; MorphOS user-startup
;
; This script is executed on system boot by
; startup-sequence. You can make personal
; changes in here.
;
; $VER: user-startup 1.1
;
; Enable the following to mount the inet-handler. Note that TCP: allows
; easy access to internet, and allows scripts to listen for incoming
; connections. Some malware could abuse this.
;Mount TCP:
;BEGIN vbcc
assign >NIL: vbcc: SYS:vbcc
assign >NIL: C: vbcc:bin ADD
setenv VBCC vbcc:
;END vbcc
; vbcc MorphOS target needs include: to be assigned to somewhere
assign include: SDK:GG/os-include
;BEGIN vbcc-ppc-morphos
assign >NIL: vincludemos: vbcc:targets/ppc-morphos/include
assign >NIL: vincludemos: include: add
assign >NIL: vlibmos: vbcc:targets/ppc-morphos/lib
;END vbcc-ppc-morphos
At this point it would be useful to have a number of freshly generated object files and executables to look at with objdump, although any ELF can be used for this purpose. Note that while MorphOS 2.x system files are compiled as ELFs, they are also signed to prevent their use on MorphOS 1.x, where they may not work. This signing also has the effect of causing objdump not to recognise them as ELFs.
objdump can reveal a lot of interesting information about an ELF, including any symbols and sections that are present, and it can also disassemble it. Be aware that, when disassembled, seemingly small programs can produce more output than the shell window's history buffer can hold. Either redirect the output to a file or specify start and stop addresses to limit the output. Having said that, none of the files generated so far in this tutorial are at risk of overwhelming the shell's history buffer when disassembled.
As mentioned just before the previous source code example, some quick and dirty methods were used for the sake of simplicity and readability. This refers to the way the library function offsets were defined with .set directives, an approach that quickly becomes unmanageable as more offsets are added. There are a number of solutions to this problem, but only one will be presented here. It involves removing the leading underscore character from the function offset names so that, for example, li r3,_LVOOpenLibrary becomes li r3,LVOOpenLibrary. Make this change wherever an offset name is used, remove or comment out the first three .set directives that define the library function offsets, and assemble with vasm as before. Note that the resulting object file is no longer executable, because the LVO names are now unresolved symbols. To make this object executable, vlink is needed.
vlink -o <desired executable name> <existing object name> -lamiga
In the above shell command, -lamiga refers to the libamiga.a file, which vlink knows to look for in vlibmos:. This file contains information used to resolve many library function names to numerical values. Note that any executable generated by the above vlink command will contain a lot of symbol information that, while potentially useful, increases the size of the executable. This can be avoided with a few additions to the above command.
vlink -s -P__abox__ -o <desired executable name> <existing object name> -lamiga
-s strip all symbols from the output file
-P<symbol> preserve this symbol
For more information, please refer to the documentation for vlink and other programs installed with vbcc.