An Introduction to MorphOS PPC Assembly

From MorphOS Library

Revision as of 00:10, 7 January 2011

Jump In The Deep End And Compile Something Right Now

You could copy and paste but just entering the two lines below into a suitable text editor is probably quicker.

.text
	blr

Save as shorty.s

.text is an assembler directive - more about it later. blr is one of many branch instructions. This particular one means 'Branch to Link Register'. When a program terminates, this is the last instruction to be executed and it causes the sequence of instruction execution to pass back to the calling program or environment (shell, Ambient etc). Subroutines can also terminate with this instruction but, again, more about this later.

This is as good a time as any to ask if you've installed vbcc yet - if not, go and do it. Download vbcc

Once vbcc is installed, open a shell window and change directory to where shorty.s has been saved.

Enter:

vasmppc_std -Felf -o ram:shorty.o shorty.s

This tells vasm to generate an ELF formatted object file called shorty.o in ram: from the shorty.s source file.

Hello? World's Shortest MorphOS Program - Or Is It?

There are times when a linker is needed to further process an object file to generate an executable - this isn't one of those times.

Go ahead and enter this into the shell window:

ram:shorty.o

It will appear as if nothing happens - but at least it happens very quickly. However, something did happen. shorty.o was identified as an executable program and loaded into memory where the cpu instruction execution sequence passed to it. As mentioned above, the single blr instruction in this program caused the instruction execution sequence to pass directly back to where it was called from - in this case, the shell.

It may be of interest to know that all PPC instructions are four bytes long. In PPC nomenclature, this length is known as a word. But let's take a moment to have a look at the size of shorty.o

Enter this into the shell window:

list ram:shorty.o

2664 bytes?! But this is typical for object files as they contain much additional information that is used in the process of debugging and linking. Linking, particularly with the -s 'strip all symbols' option, will generate an executable that is noticeably smaller.

Try entering this into the shell window:

vlink -s -o ram:shorty ram:shorty.o

list ram:shorty

The newly generated executable is 328 bytes - much smaller than its object file but what are those other 324 bytes doing there? They are the ELF 'container' that holds the single, four byte instruction of shorty. ELF stands for Executable and Linkable Format and it is used by MorphOS, AmigaOS4 and AROS - not to forget, Linux, UNIX, BSD and even video game consoles. There is abundant information about this file format online but this topic will be discussed a little further when the use of the objdump program is introduced.

The Old And The New

Generally, in order to make a program that does something useful or to at least produce an observable result, it is necessary to use operating system library functions. Despite the markedly different cpu family that MorphOS runs on compared to the 68k family used by the classic AmigaOS, MorphOS is largely compatible with AmigaOS. This compatibility is reflected in a very similar API shared with AmigaOS and extends to using a MorphOS system structure as if it were the internal data and address registers of an actual 68k processor. This is called the EmulHandle structure and it is always available through the PPC GPR2 register. Also, as with 68k AmigaOS, the address of the Exec library base is always to be found at memory location 4. Many aspects of the AmigaOS API fit closely to the features available in the 68k cpu and the following code comparison should illustrate how it is echoed in the PPC MorphOS API.

A common task performed during the initialisation of many programs is opening the Dos library. It is a particularly simple example of library usage but most functions follow this form.

[Image: Old-and-new.png - 68k and PPC code for opening the Dos library]
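As a rough sketch of such a comparison (reconstructed from the HelloWorld listing later in this tutorial, not copied from the original image), the 68k version is shown below as comments and the PPC version as code. It is only a fragment: it assumes that the Link Register has already been saved, as described further on, and that a dosName string label is defined elsewhere in the source.

```asm
# 68k AmigaOS version, shown here as comments only:
#	move.l	4.w,a6			; a6 = Exec library base
#	lea	dosName(pc),a1		; a1 = library name
#	moveq	#0,d0			; d0 = minimum version (0 = any)
#	jsr	_LVOOpenLibrary(a6)	; returns the Dos library base in d0

# PPC MorphOS version. The 68k 'registers' are fields of the
# EmulHandle structure that r2 points to.
.set	_LVOOpenLibrary,-552
.set	_AbsExecBase,4
.set	reg_d0,0
.set	reg_a1,36
.set	reg_a6,56
.set	EmulCallDirectOS,100

	li	r3,_AbsExecBase
	lwz	r3,0(r3)		# r3 = Exec library base
	stw	r3,reg_a6(r2)		# 'a6' = Exec library base
	lis	r3,dosName@ha
	addi	r3,r3,dosName@l
	stw	r3,reg_a1(r2)		# 'a1' = library name
	li	r3,0
	stw	r3,reg_d0(r2)		# 'd0' = minimum version (0 = any)
	li	r3,_LVOOpenLibrary	# r3 = offset of the function to call
	lwz	r0,EmulCallDirectOS(r2)
	mtctr	r0
	bctrl				# returns the Dos library base in r3
```

Four 68k instructions become twelve PPC instructions, mostly because each 68k register has to be written into the EmulHandle structure by an explicit store.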

The @h, @ha and @l attributes are needed to specify the high and low halfwords of 32 bit immediate values because the fixed 32 bit size of PPC instructions does not allow enough space for an instruction opcode and an additional 32 bits of data. The difference between @h and @ha is due to the common use of the addi (add immediate) instruction, which only takes a signed immediate value, to provide the lower 16 bits of a 32 bit constant - such as an address. Sometimes these lower 16 bits will equate to a negative value and it is then that prior use of the @ha attribute will cause its immediate value to be adjusted (incremented by one) to compensate. These details are only resolved before execution, when the program is loaded. The @h attribute just specifies the upper halfword of a 32 bit constant without regard for what follows. While I'm banging on about such intricacies, the 'zero' in the name of the lwz (load word and zero) instruction means that, on 64 bit PPC processors, this instruction will load a 32 bit value from memory into the lower word of a 64 bit GPR and then clear (or zero) its upper word. Something to consider when MorphOS is available on G5 class computers...
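The adjustment is easiest to follow with numbers. The small program below is only a sketch: the label counter and the address 0x12348000 used in the comments are made up for illustration, and the real address will be whatever the loader decides.

```asm
.text
	# Suppose 'counter' ends up at address 0x12348000 (hypothetical).
	# counter@l  = 0x8000, which addi treats as the signed value -32768
	# counter@ha = 0x1235, that is 0x1234 plus one to compensate
	# counter@h  = 0x1234, the plain upper halfword

	lis	r4,counter@ha		# r4 = 0x12350000
	addi	r4,r4,counter@l		# r4 = 0x12350000 - 0x8000 = 0x12348000

	# ori does not sign extend its immediate value, so @h is its partner
	lis	r5,counter@h		# r5 = 0x12340000
	ori	r5,r5,counter@l		# r5 = 0x12340000 | 0x8000 = 0x12348000

	# Pairing @h with addi would give 0x12340000 - 0x8000 = 0x12338000,
	# which is 64K below the intended address.

	li	r3,0			# return code
	blr

.data
counter:
.long	0
```

Both pairs end up with the same address. The @ha and @l pairing is the one to use whenever the second instruction sign extends its immediate operand, which includes addi and the displacement of loads and stores such as lwz.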

One could be forgiven for thinking that the PPC code snippet looks a little ungainly compared to the former. Whereas every PPC instruction is four bytes long, 68k instructions can be as little as two bytes but up to as many as ten. A single 68k instruction can load a value from a 32 bit memory address specified by one instruction operand and store it at another 32 bit memory address specified by the second instruction operand. The 68k can also perform other operations beyond simple loads and stores directly on memory. Or at least appear to... In truth, computer memory only sends and receives data - no other data processing (like adding, subtracting etc) occurs in memory. While the 68k instruction add.l #$12345678,(a0) appears to add the immediate value of its first operand to whatever may already be stored at the address pointed to by a0, the contents of that address are actually loaded from memory into a private work register, the addition is performed and then the result is stored back to the same memory location. So this instruction actually performs two memory accesses where it might appear that there was only one.

Contrast this with PPC assembly programming, where memory loads and stores are all done explicitly. Before data can be operated upon, it must be loaded from memory into a GPR (General Purpose Register). Zero (in the case of a simple memory copy) or more operations can then be performed, after which the data may be stored back to memory.
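To make the contrast concrete, here is a sketch of how the work done by add.l #$12345678,(a0) might be spelled out in PPC assembly. The label add_to_memory is made up, and the routine assumes that r3 holds the memory address (the role played by a0 in the 68k instruction).

```asm
# Hypothetical leaf routine: adds 0x12345678 to the word that r3 points to.
# r4 is a volatile register, so it can be used as the work register.
add_to_memory:
	lwz	r4,0(r3)		# first memory access: load the current value
	addis	r4,r4,0x1234		# add 0x12340000 (upper halfword)
	addi	r4,r4,0x5678		# add 0x00005678 (lower halfword)
	stw	r4,0(r3)		# second memory access: store the result
	blr
```

The two memory accesses that the 68k hides inside one instruction are now visible as lwz and stw. No @ha style adjustment is needed here because 0x5678 is positive when read as a signed 16 bit value.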

Unless you are brand-spanking-new to the topic of PPC assembly, you will already know that the PPC has 32 GPRs - r0 through r31. But did you know that the desired use of these registers is set out in something called the System V.4 ABI which MorphOS adheres to?

	r0 - volatile
	r1 - stack pointer
	r2 - for system use (with MorphOS, r2 points to the EmulHandle structure)

	r3 - initialised with a pointer to a Dos command buffer
	r4 - initialised with the length of the Dos command buffer
	r5 - initialised with a pointer to an ELF structure

	r3 ... r10 volatile & can be used to pass function arguments.  If more 
		arguments are required, the stack is used.  

	r11 & r12 - volatile

	r13 - small data area pointer.  If this register is needed by a function or 
		subroutine, it must be saved first and then restored before 
		returning to where it was called from.  

	r14 ... r31 - No predefined purpose.  If these registers are needed by a 
		function or subroutine, they must be saved first and then restored 
		before returning to where it was called from.  

The above use of the word 'volatile' means that functions are not expected to preserve the contents of these registers. r1 & r2 are the only registers that must be restored to their initial values by a terminating program. However, while most programs will modify r1 and later restore it, it is best not to modify r2 in the first place.

This System V.4 ABI also sets out a particular way for programs and subroutines to organise their stack frames. When a program is loaded into memory and the instruction execution sequence passes to it, the stack pointer is still pointing to the calling program's stack frame and there is also an important address stored in the Link Register at this time. Unless this program is very simple (like shorty) it must save the Link Register address and, most likely, create its own stack frame. Somewhat conveniently, there is a special position in the caller's stack frame that a callee program can save the contents of the Link Register to. This is a fairly common example of a program or subroutine's very first instructions.

[Image: Awkward-program-initialisation.png - example of a program's first instructions]
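For reference, the opening instructions of the HelloWorld program later in this tutorial follow the pattern just described. They are repeated here as a sketch with plain numbers in place of the named offsets, and they may not match the original image exactly. The value 0x1000 in the comments is a made-up stack pointer used only to follow the arithmetic.

```asm
	# On entry (hypothetical values): r1 = 0x1000, LR = return address
	mflr	r0			# r0 = return address copied from the Link Register
	stw	r0,4(r1)		# save it at 0x1004, the special position in the caller's frame
	stwu	r1,-16(r1)		# store the old r1 (0x1000) at 0x0FF0, then set r1 = 0x0FF0
```

The stwu (store word with update) instruction does two jobs at once: it writes the caller's stack pointer at the base of the new frame and moves r1 down to point at it.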

'Uh?' indeed... Let's try to visualise some memory being used as stack space.

[Image: Stack-diagram16.png - diagram of memory being used as stack space]

Note that stwu r1,-8(r1) creates a two-word stack frame, which is the smallest stack frame that a program or subroutine can have if it, in turn, calls another program, subroutine or library function. More often, a larger stack frame is created. stwu r1,-4(r1) could be used by a program or subroutine to create a one-word stack frame but this would be redundant, and such a program must not call any other program, subroutine or library function. Such a program would not need to store the value in the Link Register (LR) to prevent it from being overwritten by subsequent calls and can terminate and return to its caller with a simple blr instruction - just like shorty does.

A further note about larger stack frames and appropriate stack sizes: For reasons relating to PPC architecture, it is a good idea to choose stack frame sizes that are multiples of 16 bytes.

This is a less common example of a program's first instructions but it may better illustrate how to use stack frames.

[Image: Program-initialisation.png - a less common example of a program's first instructions]

Let's jump ahead and look at the last few instructions involved in program termination that would 'undo' the above instructions.

[Image: Program-termination.png - the last few instructions of a program]
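Putting the two ends together, the following is a minimal sketch of a complete program that creates a stack frame, uses one word of it and then removes it again. The frame_ names and the choice of five local words are made up for illustration. The frame size is rounded up to 32 bytes to follow the multiple-of-16 advice given above.

```asm
# Hypothetical frame layout with five words of local storage
.set	frame_backchain,0	# caller's stack pointer, written by stwu
.set	frame_lr_slot,4		# where a function called from here may save its LR
.set	frame_local0,8
.set	frame_local1,12
.set	frame_local2,16
.set	frame_local3,20
.set	frame_local4,24
.set	frame_size,32		# 28 bytes used, rounded up to a multiple of 16

.text
	mflr	r0
	stw	r0,4(r1)		# save LR in the caller's frame
	stwu	r1,-frame_size(r1)	# create the new frame

	li	r3,0
	stw	r3,frame_local0(r1)	# example use of a local slot

	addi	r1,r1,frame_size	# remove the frame, r1 points at the caller's frame again
	lwz	r0,4(r1)		# fetch the saved return address
	mtlr	r0			# put it back in the Link Register
	li	r3,0			# return code
	blr
```

The last instructions mirror the first ones in reverse order. Because stwu stored the caller's stack pointer at the base of the new frame, lwz r1,0(r1) would remove the frame just as well as the addi does.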

A Little Less Talk And A Little More Action Please

It's time to compile another program - this one will actually do something.

But wait...

Not another 'Hello World' program... I'm afraid so. This time, copy and paste is probably quicker. Alternatively, download this fully commented source archive.

Note that this example does not use the MorphOS SDK. Instead, some 'quick and dirty' methods are used for the sake of simplicity and readability.

# Various library function offsets
.set	_LVOOpenLibrary,-552
.set	_LVOCloseLibrary,-414
.set	_LVOVPrintf,-954

.set	_AbsExecBase,4

# EmulHandle structure (always pointed to by r2)
.set	reg_d0,0
.set	reg_d1,4
.set	reg_d2,8
.set	reg_d3,12
.set	reg_d4,16
.set	reg_d5,20
.set	reg_d6,24
.set	reg_d7,28
.set	reg_a0,32
.set	reg_a1,36
.set	reg_a2,40
.set	reg_a3,44
.set	reg_a4,48
.set	reg_a5,52
.set	reg_a6,56
.set	reg_a7,60
.set	EmulCallDirectOS,100

# Stack frame offsets
.set	stack_pos0_caller_stack,0
.set	stack_pos1_callerLR,4
.set	stack_pos2_ExecBase,8
.set	stack_pos3_DosBase,12
.set	new_4_word_stack,16

.text

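	mflr	r0				# copy the return address out of the Link Register
	stw	r0,stack_pos1_callerLR(r1)	# save it in the caller's stack frame
	stwu	r1,-new_4_word_stack(r1)	# create a new four-word stack frame

	lis	r3,dosName@ha			# r3 = address of the library name string
	addi	r3,r3,dosName@l
	stw	r3,reg_a1(r2)			# a1 = library name
	li	r3,0
	stw	r3,reg_d0(r2)			# d0 = minimum version (0 = any)
	li	r3,_AbsExecBase
	lwz	r3,0(r3)			# r3 = Exec library base, read from address 4
	stw	r3,stack_pos2_ExecBase(r1)	# keep a copy for CloseLibrary
	stw	r3,reg_a6(r2)			# a6 = Exec library base
	li	r3,_LVOOpenLibrary		# r3 = offset of the function to call
	lwz	r0,EmulCallDirectOS(r2)
	mtctr	r0
	bctrl					# call OpenLibrary

	cmpwi	r3,0				# r3 = Dos library base, or 0 on failure
	beq	exit				# give up if the library did not open

	stw	r3,stack_pos3_DosBase(r1)	# keep a copy for CloseLibrary

	lis	r4,string1@ha			# r4 = address of the text to print
	addi	r4,r4,string1@l
	stw	r4,reg_d1(r2)			# d1 = format string
	li	r4,0
	stw	r4,reg_d2(r2)			# d2 = argument array (0 = none)
	stw	r3,reg_a6(r2)			# a6 = Dos library base
	li	r3,_LVOVPrintf
	lwz	r0,EmulCallDirectOS(r2)
	mtctr	r0
	bctrl					# call VPrintf

	lwz	r3,stack_pos3_DosBase(r1)
	stw	r3,reg_a1(r2)			# a1 = library to close
	lwz	r3,stack_pos2_ExecBase(r1)
	stw	r3,reg_a6(r2)			# a6 = Exec library base
	li	r3,_LVOCloseLibrary
	lwz	r0,EmulCallDirectOS(r2)
	mtctr	r0
	bctrl					# call CloseLibrary

	li	r3,0				# return code

exit:	addi	r1,r1,new_4_word_stack		# remove the stack frame
	lwz	r0,stack_pos1_callerLR(r1)	# fetch the saved return address
	mtlr	r0
	blr					# return to the caller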

.rodata
# __abox__ is a special MorphOS symbol that will differentiate this program
# from other PPC executables that can run on MorphOS...
# When linking, care should be taken to avoid stripping this symbol.
__abox__:
dosName:
.string	"dos.library"
	
string1:
.string "Hello World\n"

Save this source file as HelloWorld.s and open a shell window. Change directory to where HelloWorld.s was just saved and enter:

vasmppc_std -Felf -o ram:hw.o HelloWorld.s

Once again, because of the simplicity of this program, linking isn't necessary so just enter the following in the shell window:

ram:hw.o

Are you impressed? Feel free to modify, experiment with and improve this source code. A not-too-difficult challenge might be to change this program so that it prints the arguments given to it in the shell window - a number of small but significant changes would be needed to do this. Hint: Have another look at the System V.4 register usage above.

The commented version of HelloWorld.s in the downloadable source archive gives a brief description of why the mtctr (move to Count Register) & bctrl (branch to Count Register and link) instruction pair are used in preference to mtlr (move to Link Register) & blrl (branch to Link Register and link) - being that the latter pair can degrade performance optimisations of some PPC cpus. However, the primary use of the Count Register is as a 32 bit loop counter that can be automatically decremented by certain branch instructions.
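As a small sketch of the Count Register in its primary role, the program below goes around a loop ten times using bdnz (branch, decrementing the Count Register, if it is not zero). The label loop and the choice of r4 and r5 are arbitrary.

```asm
.text
	li	r5,0			# running total
	li	r4,10			# number of times to go around the loop
	mtctr	r4			# Count Register = 10
loop:
	addi	r5,r5,1			# loop body: add one to the total
	bdnz	loop			# decrement the Count Register, branch back while it is not zero

	# the loop body has now run ten times, so r5 = 10
	li	r3,0			# return code
	blr
```

No compare instruction is needed and no GPR is tied up as the counter, which is what makes this form attractive for simple counted loops.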

When choosing which registers to use in your own programs, be aware that use of r0 in some instruction operands will not always work as you might expect. In these instructions, the actual content of r0 is ignored and the result is based on the constant value zero instead. For example, imagine that r0 contains the value 100 when this instruction is executed:

	addi	r0,r0,50

It looks like the result should be r0 = r0 + 50 = 150. The result will actually be r0 = 0 + 50 = 50.

This may seem odd but, as long as the programmer is aware of it, this behaviour can be useful. A good PPC instruction reference manual will explain this, and many other things, in much greater detail. There are many of these reference documents available online - this is one of them: MPCFPE32B.pdf
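A short sketch may help to show where r0 reads as zero and where it does not. The registers chosen here are arbitrary volatile ones.

```asm
.text
	li	r0,100			# r0 = 100 (li is itself shorthand for addi rD,0,value)
	addi	r4,r0,50		# r4 = 0 + 50 = 50: r0 as the source operand reads as zero
	add	r5,r0,r0		# r5 = 100 + 100 = 200: add uses the real contents of r0
	lwz	r6,4(r0)		# address = 0 + 4: r0 as a base register also reads as zero,
					# so this loads the Exec library base from address 4
	li	r3,0			# return code
	blr
```

The lwz line is an example of the behaviour being useful: an absolute address that fits in the signed 16 bit displacement can be read without first loading it into a register.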

MorphOS SDK, Objdump, ELFs & Sections

In addition to installing vbcc, it is also recommended to install the MorphOS SDK and the vbcc MorphOS compiler target.

The installer script of the latter seems to expect that there is a pre-existing assign for include: in place. If your system already assigns include: to somewhere, please skip over this next part.

What follows is my S:user-startup after the installation of vbcc and its MorphOS compiler target, with an additional assign that I made, preceded by this comment -

; vbcc MorphOS target needs include: to be assigned to somewhere

;
; MorphOS user-startup
;
; This script is executed on system boot by
; startup-sequence. You can make personal
; changes in here.
;
; $VER: user-startup 1.1
;

; Enable the following to mount the inet-handler. Note that TCP: allows
; easy access to internet, and allows scripts to listen for incoming
; connections. Some malware could abuse this.
;Mount TCP:
;BEGIN vbcc
assign >NIL: vbcc: SYS:vbcc
assign >NIL: C: vbcc:bin ADD
setenv VBCC vbcc:
;END vbcc

; vbcc MorphOS target needs include: to be assigned to somewhere
assign include: SDK:GG/os-include

;BEGIN vbcc-ppc-morphos
assign >NIL: vincludemos: vbcc:targets/ppc-morphos/include
assign >NIL: vincludemos: include: add
assign >NIL: vlibmos: vbcc:targets/ppc-morphos/lib
;END vbcc-ppc-morphos

At this point it would be useful to have a number of freshly generated object files and executables to look at with objdump although any ELF can be used for this purpose. Note that while MorphOS 2.x system files are compiled as ELFs, they are also signed to prevent their use on MorphOS 1.x where they may not work. This signing also has the effect of causing objdump not to recognise them as ELFs.

objdump can be used to reveal a lot of interesting information about an ELF including listing any symbols and sections that are present as well as disassembling. Be aware that, when disassembled, seemingly small programs can produce more output than the shell window's history buffer can hold so either redirect the output to a file or specify start and stop addresses to limit the output. Having said that, none of the files generated so far in this tutorial are at risk of overwhelming the shell's history buffer when disassembled.

more content needed here

As mentioned just before the previous source code example, some quick and dirty methods were used for the sake of simplicity and readability and this refers to the way library function offsets were established. The approach used will quickly become detrimental as more offsets are added. There are a number of solutions to this problem but only one will be presented here and it involves the removal of the leading underscore character from function names so that, for example, li r3,_LVOOpenLibrary becomes li r3,LVOOpenLibrary. Remove or comment out the first three .set directives that define the library function offsets and assemble with vasm as before but note that the resulting object file is no longer executable. To make this object executable, vlink is needed.

vlink -o <desired executable name> <existing object name> -lamiga

In the above shell command, -lamiga refers to the libamiga.a file that vlink knows to look for in the vlibmos: assign. This file contains information used to resolve many library function names to numerical values. Note that any executable generated from the above vlink command will contain a lot of symbol information that, while potentially useful, increases the size of the executable. This can be avoided with a few additions to the above command.

vlink -s -P__abox__ -o <desired executable name> <existing object name> -lamiga

-s strip all symbols from the output file
-P<symbol> preserve this symbol

For more information, please refer to the documentation for vlink and other programs installed with vbcc.