Reverse engineering a CSL Dualcom GPRS part 14 – interpreting disassembly

A few posts ago, we managed to disassemble the firmware from the CSL Dualcom site.

The entire listing is available here as a zip. There is a lot of blank space in the file which needs trimming, but this copy will be left as-is for reference.

I have also put the code on GitHub. It’s not ideal as you can’t use the web interface to show the code/diffs, but it is a good way of recording history, as mistakes will be made.

The process of turning disassembly into something useful isn’t easy. I find the most useful approach is to find the most commonly called subroutines first and work out what they do. If they aren’t obvious, skip them.

The raw listing doesn’t show us how often each subroutine is called. Python to the rescue again. First, we trim the fluff out of the file. 0x1000-0x2000 is the string table, which the disassembler doesn’t know about and tries to turn into code. The processor has a mirrored address structure so everything in the range 0x00000. Everything above 0x1FFFF isn’t code: it’s special function registers and a mirror area.
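The trimming can be scripted as well. Below is a minimal sketch. It assumes every instruction row starts with a hex address in the first column, as in the excerpts later in this post. It also assumes the string table occupies 0x1000 up to (but not including) 0x2000. The names keep_row and trim are hypothetical.

```python
# Hypothetical trimming helper for the raw disassembly listing.
STRING_TABLE = range(0x1000, 0x2000)  # assumed string table, not code
CODE_END = 0x1FFFF                     # anything above this isn't code


def keep_row(row):
    """Return True if a listing row should be kept."""
    fields = row.split()
    if not fields:
        return False  # blank padding lines
    try:
        address = int(fields[0], 16)
    except ValueError:
        return True  # not an instruction row (comment, label...), keep it
    if address in STRING_TABLE:
        return False  # strings misinterpreted as code
    return address <= CODE_END  # drop SFR / mirror area


def trim(lines):
    """Filter an iterable of listing rows down to the useful ones."""
    return [row for row in lines if keep_row(row)]
```

Passing an open file handle, for example trim(open(path)), returns the surviving rows ready to be written back out.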

Now we run the trimmed listing through a small script that counts how often each address is CALLed:

from collections import Counter

callAddress = []

with open('/Users/andrew/data/Disassemble1.txt', 'r') as datafile:
    for row in datafile:
        # Only rows containing a CALL instruction are of interest
        if 'CALL' in row:
            # Everything after CALL is the operand, e.g. "!!0E1B2H"
            operand = row.split('CALL', 1)[1].strip()
            # ! and !! indicate the addressing mode, the trailing H marks hex
            address = operand.lstrip('!').rstrip('H')
            callAddress.append(address)

# Count how often each address is called
freqs = Counter(callAddress)

# most_common() returns (address, count) tuples, highest count first.
# Whack it out as CSV for copy and paste
for address, count in freqs.most_common():
    print(address + ',' + str(count))

And we end up with a CSV of call frequencies. The top of the list is:

0E1B2,182
0E541,160
0E1D1,143
0D764,120
0DC44,105
0DED3,82
0DACC,79
0E322,68
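One wrinkle with these raw strings is that the same target can be written in more than one form. CALL !!0E1B2H uses a 20-bit address, but there is also a CALL !4C4H at 0xE1D1 that uses a 16-bit one. The script therefore keys these as "0E1B2" and "4C4".

The sketch below normalises operands to five hex digits before counting. It assumes that !addr16 targets lie in the first 64 KB. That comes from my reading of the 78K0R instruction set and is worth checking against the manual. normalise_target is a hypothetical helper. Relative ($!) and register-indirect calls are simply skipped.

```python
from collections import Counter


def normalise_target(operand):
    """Turn '!!0E1B2H' into '0E1B2' and '!4C4H' into '004C4'.

    Assumes '!!' marks a 20-bit absolute address and '!' a 16-bit absolute
    address in the first 64 KB. Anything else returns None.
    """
    operand = operand.strip()
    if not operand.startswith('!') or not operand.endswith('H'):
        return None
    digits = operand.lstrip('!')[:-1]  # drop the ! prefix and the H suffix
    try:
        address = int(digits, 16)
    except ValueError:
        return None
    return '%05X' % address  # zero-pad so every key has the same width


calls = ['!!0E1B2H', '!4C4H', '!!004C4H', 'AX']
freqs = Counter(t for t in map(normalise_target, calls) if t is not None)
```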

0xE1B2 looks like a good place to start.

0e1ac        bfcce0      MOVW            !0E0CCH,AX
0e1af        c2          POP             BC
0e1b0        61ec        RETB   
// Start of sub
0e1b2        4c01        CMP             A,#1H
0e1b4        df05        BNZ             $0E1BBH
0e1b6        63          MOV             A,B
0e1b7        ec01e100    BR              !!0E101H
0e1bb        4c02        CMP             A,#2H
0e1bd        df05        BNZ             $0E1C4H
0e1bf        63          MOV             A,B
0e1c0        ec47e100    BR              !!0E147H
0e1c4        4c03        CMP             A,#3H
0e1c6        63          MOV             A,B
0e1c7        61f8        SKNZ            
0e1c9        ec6ce100    BR              !!0E16CH
0e1cd        ecdfe000    BR              !!0E0DFH
0e1d1        fdc404      CALL            !4C4H
0e1d4        0233bd      ADDW            AX,!0BD33H
0e1d7        2013        SUBW            SP,#13H
0e1d9        72          MOV             C,A

The first thing to be aware of is that disassembly is not an exact science. Sometimes you will see an address CALLed but can’t find it in the listing. This probably means that the disassembly is misaligned in that area, so look a couple of addresses above and below. That is not the case here.
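Checking whether a CALL target lands on an instruction boundary can be automated. This is a minimal sketch that assumes the listing format shown above: a hex address, then the opcode bytes, then the mnemonic. The helper names are hypothetical.

```python
import bisect


def instruction_starts(lines):
    """Collect the addresses where the disassembler started an instruction."""
    starts = []
    for row in lines:
        fields = row.split()
        if len(fields) >= 3:  # address, opcode bytes, mnemonic
            try:
                starts.append(int(fields[0], 16))
            except ValueError:
                pass  # comments, labels and other non-instruction rows
    return sorted(starts)


def nearest_starts(starts, target):
    """Return (previous, next) instruction starts around target.

    Both values equal target if it is a real instruction start. Otherwise
    the listing is probably misaligned around that address.
    """
    i = bisect.bisect_left(starts, target)
    if i < len(starts) and starts[i] == target:
        return target, target
    before = starts[i - 1] if i > 0 else None
    after = starts[i] if i < len(starts) else None
    return before, after
```

Any result other than (target, target) is a hint to look at the disassembly around that address more carefully.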

Immediately above 0xE1B2 there is a POP and a RETB, which marks the end of the previous subroutine.

To work out what a sub does, it helps to know what parameters are passed to it and how. If we look through all the CALLs to 0xE1B2, we get an idea of what is going on:

03d31        530d        MOV             B,#0DH
03d33        e1          ONEB            A
03d34        fcb2e100    CALL            !!0E1B2H

B is always set to a value over quite a wide range. It’s probably a number or an ASCII character (0x0D in the example above is a carriage return).

A is always set to 0, 1, 2 or 3 (ONEB A above sets it to 1). This is likely some kind of option or enumeration.
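Gathering the instructions just before every call is easy to script. This minimal sketch assumes the listing has already been read into a list of lines. The function name and the three-line lookback window are arbitrary choices.

```python
def call_contexts(lines, target='!!0E1B2H', lookback=3):
    """Yield the few rows preceding each CALL to target.

    The register set-up (MOV B,#..., ONEB A, CLRB A, ...) usually sits
    directly above the CALL, so a short window is often enough to spot
    the parameters.
    """
    for i, row in enumerate(lines):
        fields = row.split()
        if 'CALL' in fields and fields[-1] == target:
            start = max(0, i - lookback)
            yield [line.strip() for line in lines[start:i]]
```

Printing each window for all the call sites makes the pattern in A and B obvious quite quickly.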

Going back to the subroutine, we can see how this could work:

0e1b2        4c01        CMP             A,#1H
0e1b4        df05        BNZ             $0E1BBH
	0e1b6        63          MOV             A,B		
	0e1b7        ec01e100    BR              !!0E101H	// If A = 1, branch to 0xE101
0e1bb        4c02        CMP             A,#2H
0e1bd        df05        BNZ             $0E1C4H
	0e1bf        63          MOV             A,B
	0e1c0        ec47e100    BR              !!0E147H	// If A = 2, branch to 0xE147
0e1c4        4c03        CMP             A,#3H
0e1c6        63          MOV             A,B
0e1c7        61f8        SKNZ            
	0e1c9        ec6ce100    BR              !!0E16CH 	// If A = 3, branch to 0xE16C
0e1cd        ecdfe000    BR              !!0E0DFH		// Otherwise (A = 0 or any other value), branch to 0xE0DF

So we are branching to other addresses based on the parameter in A.
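The dispatch logic can be modelled in a few lines of Python. This models the control flow only, under my reading of the listing:

- CMP sets the zero flag, and MOV A,B does not disturb it.
- SKNZ skips the following BR when the values differed.
- Any value of A other than 1, 2 or 3 ends up at 0xE0DF, not just 0.

The names are hypothetical.

```python
# Branch targets taken from the listing at 0xE1B2
DISPATCH = {1: 0xE101, 2: 0xE147, 3: 0xE16C}
DEFAULT_TARGET = 0xE0DF  # reached whenever A is not 1, 2 or 3


def sub_e1b2(a, b):
    """Model of 0xE1B2: choose a branch target from A.

    Returns (target_address, value_in_A_at_branch). Every path executes
    MOV A,B before its BR, so the target receives B's value in A.
    """
    target = DISPATCH.get(a, DEFAULT_TARGET)
    return target, b
```

For example, sub_e1b2(1, 0x0D) models the call shown earlier: the carriage return is handed to 0xE101 in A.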

There’s one thing to note about this function: it contains no RET instruction. Because it uses BR rather than CALL, no new return address is pushed. The RET at the end of whichever routine is branched to returns straight to the original caller.

Let’s look at 0xE101.

0e101        77          MOV             H,A
0e102        8efa        MOV             A,PSW
0e104        9803        MOV             [SP+3H],A
0e106        67          MOV             A,H
0e107        717bfa      DI              
0e10a        c3          PUSH            BC
0e10b        dbb6e0      MOVW            BC,!0E0B6H
0e10e        48b8e4      MOV             0E4B8H[BC],A
0e111        a2b6e0      INCW            !0E0B6H
0e114        afb6e0      MOVW            AX,!0E0B6H
0e117        440a04      CMPW            AX,#40AH
0e11a        dc04        BC              $0E120H
0e11c        f6          CLRW            AX
0e11d        bfb6e0      MOVW            !0E0B6H,AX
0e120        8f0401      MOV             A,!SSR02L
0e123        31631e      BT              A.6H,$0E144H
0e126        362201      MOVW            HL,#122H
0e129        71a2        SET1            [HL].2H
0e12b        71b2        SET1            [HL].3H
0e12d        dbb4e0      MOVW            BC,!0E0B4H
0e130        49b8e4      MOV             A,0E4B8H[BC]
0e133        9e44        MOV             SIO10,A
0e135        a2b4e0      INCW            !0E0B4H
0e138        afb4e0      MOVW            AX,!0E0B4H
0e13b        440a04      CMPW            AX,#40AH
0e13e        dc04        BC              $0E144H
0e140        f6          CLRW            AX
0e141        bfb4e0      MOVW            !0E0B4H,AX
0e144        c2          POP             BC
0e145        61ec        RETB            

It’s fairly long and complex, but there is one really key piece of information in there: the special function register SSR02L. According to the 78K0R data sheet, this is “Serial status register 02”, so it’s pretty likely this function concerns serial. It also ends with a return (RETB).
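Reading the listing more closely, 0xE101 looks like the "put" side of a circular transmit buffer. The Python model below is my interpretation, not a confirmed description. It assumes the following:

- The word at 0xE0B6 is a write index and the word at 0xE0B4 is a read index.
- 0xE4B8 is the base of a shared buffer that wraps at 0x40A entries.
- Bit 6 of SSR02L means "transmitter busy".
- Writing SIO10 starts sending a byte.

The class and method names are hypothetical. The interrupt handling (DI, and saving PSW for RETB) and the SET1 operations on [HL] with HL = 0x122 are left out.

```python
BUFFER_SIZE = 0x40A  # indices are cleared when they reach this (CMPW AX,#40AH)


class TxRingBuffer:
    """Model of the routine at 0xE101 as a circular transmit buffer."""

    def __init__(self):
        self.buffer = [0] * BUFFER_SIZE  # RAM at 0xE4B8 onwards
        self.write_index = 0             # word at 0xE0B6
        self.read_index = 0              # word at 0xE0B4
        self.sent = []                   # bytes written to SIO10

    def put(self, byte, transmitter_busy):
        # MOV 0E4B8H[BC],A, then INCW !0E0B6H, wrapping to 0 at 0x40A
        self.buffer[self.write_index] = byte
        self.write_index = (self.write_index + 1) % BUFFER_SIZE
        # BT A.6H: if bit 6 of SSR02L is set, leave the byte queued
        if transmitter_busy:
            return
        # Otherwise fetch the next queued byte, "send" it, advance the read index
        self.sent.append(self.buffer[self.read_index])
        self.read_index = (self.read_index + 1) % BUFFER_SIZE
```

If this reading is right, the remaining queued bytes would be sent from a transmit interrupt elsewhere in the firmware. That is a guess, but it is worth looking for.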

If we look at 0xE16C, it references SSR12L: another serial port.

It’s quite likely that this function concerns either reading or writing to the various serial ports on the board. I’ve not looked at it in enough depth to know exactly what it is doing, so we’ll do the following:

// B has char 
// A has 0,1,2,3 - probably different serial ports
// Return is in the branches
:sub_Serial_UnknownA_e1b2           
	0e1b2        4c01        CMP             A,#1H
	0e1b4        df05        BNZ             $0E1BBH
		0e1b6        63          MOV             A,B
		0e1b7        ec01e100    BR              !!0E101H // A = 1
	0e1bb        4c02        CMP             A,#2H
	0e1bd        df05        BNZ             $0E1C4H
		0e1bf        63          MOV             A,B
		0e1c0        ec47e100    BR              !!0E147H // A = 2
	0e1c4        4c03        CMP             A,#3H
	0e1c6        63          MOV             A,B
	0e1c7        61f8        SKNZ            
		0e1c9        ec6ce100    BR              !!0E16CH // A = 3
	0e1cd        ecdfe000    BR              !!0E0DFH // A = 0 (or any other value)

What have I done here?

  • Named the sub :sub_Serial_UnknownA_e1b2. The : denotes that this is the actual sub. It is something to do with serial, and it is the first unknown serial-related sub. I have put the address on the end to keep track of where it is.
  • Searched and replaced !!0E1B2H with this new name, so “sub_Serial_UnknownA_e1b2” now shows instead of the raw address. When I see it called, I know it is something to do with serial.
  • Put some brief notes above the sub so I know what it is doing.
  • Indented the branches so the function is a little clearer.
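The search-and-replace step is easy to script once more subs have names. This minimal sketch assumes a simple mapping from the five-digit address in a !!xxxxxH operand to a chosen name. LABELS, its example entry and apply_labels are all hypothetical, and the label definition line itself is still added by hand.

```python
import re

# Hypothetical table of subs named so far: 20-bit address -> label.
# The entry below is a placeholder, not a real finding.
LABELS = {
    '0E541': 'sub_Hypothetical_e541',
}

# Matches 20-bit absolute operands such as !!0E1B2H
OPERAND = re.compile(r'!!([0-9A-F]{5})H')


def apply_labels(text, labels=LABELS):
    """Replace known !!xxxxxH operands with their label, leaving others alone."""
    def substitute(match):
        return labels.get(match.group(1), match.group(0))
    return OPERAND.sub(substitute, text)
```

Running the whole listing through apply_labels after each naming session keeps every call site in sync with the notes.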

I’m now going to do the same for the other frequently called subs. Again, I am building up a broad picture, not going into extreme depth at this stage.
