Reverse engineering a CSL Dualcom GPRS part 15 – interpreting disassembly 2

In addition to finding the most frequently called functions, we should go through the memory map and identify the important parts of it.

1154 memory map

One part of this that is very important to how the device operates is the vector table, right at the bottom of the flash.

The vector table contains addresses that are called when certain interrupts are triggered. For these microcontrollers, this is structured like this:
Vector table

So we take a look right at the beginning of the disassembly:

00000        00          NOP             
00001        01          ADDW            AX,AX
00002        82          INC             C
00003        2084        SUBW            SP,#84H
00005        2086        SUBW            SP,#86H
00007        2088        SUBW            SP,#88H
00009        208a        SUBW            SP,#8AH
0000b        208c        SUBW            SP,#8CH
0000d        208e        SUBW            SP,#8EH
0000f        2090        SUBW            SP,#90H
00011        2092        SUBW            SP,#92H
00013        2001        SUBW            SP,#1H
00015        247d23      SUBW            AX,#237DH
00018        292494      MOV             A,9424H[C]
0001b        2096        SUBW            SP,#96H
0001d        203a        SUBW            SP,#3AH
0001f        21          ?               
00020        be20        MOVW            PM0,AX
00022        3c21        SUBC            A,#21H
00024        92          DEC             C
00025        225921      SUBW            AX,!2159H
00028        ba22        MOVW            [DE+22H],AX
0002a        9820        MOV             [SP+20H],A
0002c        00          NOP             
0002d        209a        SUBW            SP,#9AH
0002f        209c        SUBW            SP,#9CH
00031        209e        SUBW            SP,#9EH
00033        20a0        SUBW            SP,#0A0H
00035        20a2        SUBW            SP,#0A2H
00037        20a4        SUBW            SP,#0A4H
00039        20a6        SUBW            SP,#0A6H
0003b        205e        SUBW            SP,#5EH
0003d        23          SUBW            AX,BC
0003e        d7          RET             
0003f        226023      SUBW            AX,!2360H
00042        a820        MOVW            AX,[SP+20H]
00044        aa20        MOVW            AX,[DE+20H]
00046        ac20        MOVW            AX,[HL+20H]
00048        ae20        MOVW            AX,PM0
0004a        b020b2      DEC             !0B220H
0004d        20b4        SUBW            SP,#0B4H
0004f        20b6        SUBW            SP,#0B6H
00051        20b8        SUBW            SP,#0B8H
00053        20ba        SUBW            SP,#0BAH
00055        20ff        SUBW            SP,#0FFH

The disassembler has tried to disassemble when it shouldn’t – a common issue. Though, to be honest, it should know that this area is a vector table.

So if we re-organise the hex file into something a bit more readable, we get this:

0000 -> 0100 * RESET
0004 -> 2084
0006 -> 2086
0008 -> 2088
000A -> 208A
000C -> 208C
000E -> 208E
0010 -> 2090
0012 -> 2092
0014 -> 2401 * INTST3
0016 -> 237D * INTSR3
0018 -> 2429 * INTSRE3
001A -> 2094 
001C -> 2096 
001E -> 213A * INTST0
0020 -> 20BE * INTSR0 
0022 -> 213C * INTSRE0
0024 -> 2292 * INTST1
0026 -> 2159 * INTSR1
0028 -> 22BA * INTSRE1
002A -> 2098
002C -> 2000 * INTTM00
002E -> 209A
0030 -> 209C
0032 -> 209E
0034 -> 20A0
0036 -> 20A2
0038 -> 20A4
003A -> 20A6
003C -> 235E * INTST2
003E -> 22D7 * INTSR2
0040 -> 2360 * INTSRE2
0042 -> 20A8
0044 -> 20AA
0046 -> 20AC
0048 -> 20AE
004A -> 20B0
004C -> 20B2
004E -> 20B4
0050 -> 20B6
0052 -> 20B8
0054 -> 20BA
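The same re-organisation can be sketched in a few lines of Python. This is only an illustration: the byte string is the first 16 bytes copied from the code column of the listing above, and decode_vectors is a made-up helper name. Each entry is read as a 16-bit little-endian word, which is how 00 01 becomes 0100. Note that it also prints the word at 0002, which the table above doesn’t list.

```python
import struct

# First 16 bytes of flash, copied from the code column of the listing above.
FLASH_START = bytes.fromhex("0001822084208620" "88208a208c208e20")


def decode_vectors(data, base=0):
    """Return (table_address, handler_address) pairs.

    Each entry is treated as a 16-bit little-endian word.
    """
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data[:count * 2])
    return [(base + 2 * i, word) for i, word in enumerate(words)]


for table_addr, handler in decode_vectors(FLASH_START):
    print("%04X -> %04X" % (table_addr, handler))
```

Feeding it the whole of 0x0000-0x0055 would produce the full table in one go.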

Notice how a lot of the addresses are just incrementing: 20AA, 20AC, 20AE. These point into a massive block of RETI instructions, i.e. the interrupt handler just returns immediately because it is not implemented.

02092        61fc        RETI            
02094        61fc        RETI            
02096        61fc        RETI            
02098        61fc        RETI            
0209a        61fc        RETI            
0209c        61fc        RETI            
0209e        61fc        RETI            
020a0        61fc        RETI            
020a2        61fc        RETI            
020a4        61fc        RETI            
020a6        61fc        RETI            
020a8        61fc        RETI            
020aa        61fc        RETI            
020ac        61fc        RETI            
020ae        61fc        RETI            
020b0        61fc        RETI            
020b2        61fc        RETI     

All of the vectors that are marked with an asterisk and with a name are implemented or used by the board. There are some important handlers here – mainly the serial IO.

Reset jumps to 0x100. I’ll save looking at that for another time – mostly the reset vector will be setting up buffers, memory, pointers, some checks.

You can also see we have groups of interrupt handlers for INTST* (transmit finished), INTSR* (receive finished), INTSRE* (receive error). These are for UARTs 0-3 respectively. Their implementation is very similar – let’s look at UART1 which is used for the GPRS modem.

// INTST1
	02292        c1          PUSH            AX
	02293        c3          PUSH            BC
	02294        c7          PUSH            HL
		02295        fbb6e0      MOVW            HL,!0E0B6H
		02298        afb4e0      MOVW            AX,!0E0B4H
		0229b        47          CMPW            AX,HL
		0229c        dd17        BZ              $22B5H
		0229e        dbb4e0      MOVW            BC,!0E0B4H
		022a1        49b8e4      MOV             A,0E4B8H[BC] 	// Get data from E4B8 using offset from E0B4
		022a4        9e44        MOV             SIO10,A 		// Move to serial data TX register
		022a6        a2b4e0      INCW            !0E0B4H	    // Increment the offset
		022a9        afb4e0      MOVW            AX,!0E0B4H
		022ac        440a04      CMPW            AX,#40AH 		// Has the offset reached 1034 (0x40A)? If so reset to 0
		022af        dc04        BC              $22B5H
		022b1        f6          CLRW            AX
		022b2        bfb4e0      MOVW            !0E0B4H,AX
	022b5        c6          POP             HL
	022b6        c2          POP             BC
	022b7        c0          POP             AX
	022b8        61fc        RETI  

Again – I’m not really currently interested in precise detail, just an idea of what is happening. This handler takes a byte from a buffer at 0xE4B8 and writes it into the transmit register. That buffer will appear elsewhere in the code and hint to us when something is being sent out of UART1.
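To make the buffer handling concrete, here is a rough Python model of what the handler appears to do. It is a sketch under assumptions, not the firmware’s logic verified: the class and method names are made up, the word at 0xE0B4 is treated as a read index, the word at 0xE0B6 as a write index (the handler only compares against it), and 0x40A as the buffer length.

```python
BUF_SIZE = 0x40A  # 1034, the value both indexes are compared against


class TxRing:
    """Hypothetical model of the UART1 transmit ring buffer."""

    def __init__(self):
        self.buf = bytearray(BUF_SIZE)  # stands in for the buffer at 0xE4B8
        self.read_idx = 0               # stands in for the word at 0xE0B4
        self.write_idx = 0              # stands in for the word at 0xE0B6

    def put(self, byte):
        """Queue a byte (assumed to be what the main code does)."""
        self.buf[self.write_idx] = byte
        self.write_idx += 1
        if self.write_idx >= BUF_SIZE:
            self.write_idx = 0

    def on_tx_interrupt(self):
        """Model of INTST1: return the byte for the TX register, or None."""
        if self.read_idx == self.write_idx:
            return None  # indexes equal: nothing left to send
        byte = self.buf[self.read_idx]
        self.read_idx += 1
        if self.read_idx >= BUF_SIZE:
            self.read_idx = 0  # wrap back to the start
        return byte
```

Calling put() a few times and then on_tx_interrupt() repeatedly drains the bytes in order, which is the behaviour to look for when the same addresses turn up elsewhere in the listing.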

We can then go through all of the other UART/serial functions and identify potential transmit/receive buffers.

Interestingly, INTST0 and INTST2 are just RETI instructions. Why do these not require a transmit empty interrupt handler? Is it handled in software elsewhere?

The next handler that stands out from the others is INTTM00. This is the timer interrupt for timer 0 which will fire when the timer hits a certain value.

// INTTM00        
	02000        c1          PUSH            AX
	02001        c3          PUSH            BC
	02002        c7          PUSH            HL
	02003        aefc        MOVW            AX,0FFFFCH
	02005        c1          PUSH            AX
	02006        a0b3f6      INC             !0F6B3H
	02009        8fb3f6      MOV             A,!0F6B3H
	0200c        5c03        AND             A,#3H
	0200e        4c03        CMP             A,#3H
	02010        df38        BNZ             $204AH
		02012        a0b4f6      INC             !0F6B4H
		02015        fcfc2801    CALL            !!128FCH
		02019        fc932601    CALL            !!12693H
		0201d        fcf22701    CALL            !!127F2H
		02021        f45c        CLRB            0FFE5CH

		02023        fc132a01    CALL            !!12A13H // 7SEG display
		02027        fcaa3201    CALL            !!132AAH // Buttons

		0202b        8fb4f6      MOV             A,!0F6B4H
		0202e        5c03        AND             A,#3H
		02030        dd08        BZ              $203AH
		02032        91          DEC             A
		02033        dd0b        BZ              $2040H
		02035        91          DEC             A
		02036        dd0e        BZ              $2046H
		02038        ef10        BR              $204AH
		0203a        fccbff00    CALL            !!0FFCBH // Analog
		0203e        ef0a        BR              $204AH
		02040        fcd13101    CALL            !!131D1H
		02044        ef04        BR              $204AH
		02046        fc063301    CALL            !!13306H
	0204a        fc742e01    CALL            !!12E74H
	0204e        fc84ff00    CALL            !!0FF84H
	02052        72          MOV             C,A
	02053        81          INC             A
	02054        dd24        BZ              $207AH
        02056        62          MOV             A,C
        02057        70          MOV             X,A
        02058        f1          CLRB            A
        02059        01          ADDW            AX,AX
        0205a        04b8f5      ADDW            AX,#0F5B8H
        0205d        16          MOVW            HL,AX
        0205e        f6          CLRW            AX
        0205f        b1          DECW            AX
        02060        bb          MOVW            [HL],AX
        02061        62          MOV             A,C
        02062        d1          CMP0            A
        02063        dd11        BZ              $2076H
        02065        2c11        SUB             A,#11H
        02067        dd05        BZ              $206EH
        02069        91          DEC             A
        0206a        dd06        BZ              $2072H
        0206c        ef0c        BR              $207AH
        0206e        e46a        ONEB            0FFE6AH
        02070        ef08        BR              $207AH
        02072        e46b        ONEB            0FFE6BH
        02074        ef04        BR              $207AH
        02076        fcf7fc00    CALL            !!0FCF7H
        0207a        c0          POP             AX
	0207b        befc        MOVW            0FFFFCH,AX
	0207d        c6          POP             HL
	0207e        c2          POP             BC
	0207f        c0          POP             AX

This looks like it is fired periodically. A number of counters are used so that portions of the subroutine are only run now and then.
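The counter scheme can be modelled roughly in Python. This is a simplified sketch: the function and label names are invented, the two counters stand in for the bytes at 0xF6B3 and 0xF6B4, both are assumed to start at zero, and the three unidentified calls at the top of the slower block are left out.

```python
def simulate_ticks(n_ticks):
    """Return a list of (tick, routines_run) for the slower block."""
    fast = 0  # models the byte at 0xF6B3, incremented on every interrupt
    slow = 0  # models the byte at 0xF6B4, incremented on every 4th interrupt
    log = []
    for tick in range(n_ticks):
        fast = (fast + 1) & 0xFF
        if fast & 3 == 3:
            slow = (slow + 1) & 0xFF
            ran = ["display", "buttons"]
            phase = slow & 3
            if phase == 0:
                ran.append("analog")      # CALL !!0FFCBH
            elif phase == 1:
                ran.append("sub_131D1")   # CALL !!131D1H
            elif phase == 2:
                ran.append("sub_13306")   # CALL !!13306H
            # phase 3 runs nothing extra
            log.append((tick, ran))
    return log


for tick, ran in simulate_ticks(16):
    print(tick, ran)
```

So the display and buttons are serviced on every fourth interrupt, and each of the three slower routines on every sixteenth.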

There are a lot of calls, and if we look at them we can clearly identify the function of some:

// Suspect from IO this is output to 7 seg
	12a13        d45d        CMP0            0FFE5DH
	12a15        f1          CLRB            A
	12a16        61f8        SKNZ            
	12a18        e1          ONEB            A
	12a19        9d5d        MOV             0FFE5DH,A
	12a1b        d45d        CMP0            0FFE5DH
	12a1d        dd24        BZ              $12A43H
	12a1f        8f46f6      MOV             A,!0F646H
	12a22        d448        CMP0            0FFE48H
	12a24        dd0a        BZ              $12A30H
	12a26        36b4f6      MOVW            HL,#0F6B4H
	12a29        31d50e      BF              [HL].5H,$12A3AH
	12a2c        51ff        MOV             A,#0FFH
	12a2e        ef0a        BR              $12A3AH
	12a30        d446        CMP0            0FFE46H
	12a32        dd06        BZ              $12A3AH
	12a34        36b4f6      MOVW            HL,#0F6B4H
	12a37        31f3f2      BT              [HL].7H,$12A2CH
	12a3a        712305      CLR1            P5.2H // These are the common cathodes
	12a3d        713205      SET1            P5.3H
	12a40        9d06        MOV             P6,A // P6 is the 7SEG
	12a42        d7          RET       

	12a43        8f47f6      MOV             A,!0F647H
	12a46        d449        CMP0            0FFE49H
	12a48        dd0a        BZ              $12A54H
	12a4a        36b4f6      MOVW            HL,#0F6B4H
	12a4d        31d50e      BF              [HL].5H,$12A5EH
	12a50        51ff        MOV             A,#0FFH
	12a52        ef0a        BR              $12A5EH
	12a54        d447        CMP0            0FFE47H
	12a56        dd06        BZ              $12A5EH
	12a58        36b4f6      MOVW            HL,#0F6B4H
	12a5b        31f3f2      BT              [HL].7H,$12A50H
	12a5e        712205      SET1            P5.2H // common cathodes flip
	12a61        713305      CLR1            P5.3H
	12a64        9d06        MOV             P6,A
	12a66        d7          RET   

From the IO, we can see this is likely to be updating the 7 segment LED displays.

The method used – of setting one common cathode, then the segments for that half, then the other common cathode, then the segments for that half – means that this needs to be called relatively frequently otherwise flicker will be detected by the eye.
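As a rough illustration of that multiplexing, here is a Python model of the routine with the blink handling stripped out. The class and attribute names are invented, the toggle flag is assumed to start at zero, and I make no claim about which digit is left or right.

```python
class SevenSegMux:
    """Hypothetical model of the display refresh at 0x12A13 (no blinking)."""

    def __init__(self, pattern_a, pattern_b):
        self.patterns = (pattern_a, pattern_b)  # model 0xF646 and 0xF647
        self.toggle = 0                          # models the flag at 0xFFE5D
        self.p5_2 = 0                            # common cathode lines
        self.p5_3 = 0
        self.p6 = 0                              # segment port

    def refresh(self):
        """One call per timer slot: flip digit, then drive its segments."""
        self.toggle ^= 1
        if self.toggle:
            self.p5_2, self.p5_3 = 0, 1          # CLR1 P5.2 / SET1 P5.3
            self.p6 = self.patterns[0]
        else:
            self.p5_2, self.p5_3 = 1, 0          # SET1 P5.2 / CLR1 P5.3
            self.p6 = self.patterns[1]
```

Each call only lights one digit, so a full refresh of the display takes two calls.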

// Button detection and debounce?
	132aa        31220217    BT              P2.2H,$132C5H // Button A
		132ae        4029e0ff    CMP             !0E029H,#0FFH
		132b2        dd24        BZ              $132D8H
		132b4        a029e0      INC             !0E029H
		132b7        4029e007    CMP             !0E029H,#7H
		132bb        df1b        BNZ             $132D8H
		132bd        cf29e0ff    MOV             !0E029H,#0FFH
		132c1        e445        ONEB            0FFE45H
		132c3        ef13        BR              $132D8H
	132c5        d529e0      CMP0            !0E029H
	132c8        dd0e        BZ              $132D8H
	132ca        b029e0      DEC             !0E029H
	132cd        4029e0f8    CMP             !0E029H,#0F8H
	132d1        df05        BNZ             $132D8H
	132d3        f529e0      CLRB            !0E029H
	132d6        f445        CLRB            0FFE45H

	132d8        31320216    BT              P2.3H,$132F2H // Button B
		132dc        402ae0ff    CMP             !0E02AH,#0FFH
		132e0        dd23        BZ              $13305H
		132e2        a02ae0      INC             !0E02AH
		132e5        402ae007    CMP             !0E02AH,#7H
		132e9        df1a        BNZ             $13305H
		132eb        cf2ae0ff    MOV             !0E02AH,#0FFH
		132ef        e444        ONEB            0FFE44H
		132f1        d7          RET             
	132f2        d52ae0      CMP0            !0E02AH
	132f5        dd0e        BZ              $13305H
	132f7        b02ae0      DEC             !0E02AH
	132fa        402ae0f8    CMP             !0E02AH,#0F8H
	132fe        df05        BNZ             $13305H
	13300        f52ae0      CLRB            !0E02AH
	13303        f444        CLRB            0FFE44H
	13305        d7          RET 

Again, from the IO, we can see that the buttons are being polled. There are also some counters changing – probably some debounce.
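My reading of the counter logic for button A, as a Python sketch. The names are made up, and I am assuming that a low pin means the button is pressed; the counter stands in for the byte at 0xE029 and the flag for 0xFFE45.

```python
class Debounce:
    """Hypothetical model of the button A logic at 0x132AA."""

    def __init__(self):
        self.count = 0    # models the byte at 0xE029
        self.pressed = 0  # models the flag at 0xFFE45

    def sample(self, pin_high):
        """Feed one polled reading of the pin."""
        if not pin_high:
            # Pin low (assumed pressed): count up towards 7
            if self.count == 0xFF:
                return  # already latched as pressed
            self.count += 1
            if self.count == 7:
                self.count = 0xFF
                self.pressed = 1
        else:
            # Pin high (assumed released): count back down
            if self.count == 0:
                return  # already latched as released
            self.count -= 1
            if self.count == 0xF8:
                self.count = 0
                self.pressed = 0
```

With this model it takes seven low samples to set the flag and seven high samples to clear it again, which is what you would expect from a debounce.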

// Analog something or other
	0ffcb        f1          CLRB            A
	0ffcc        71042a      MOV1            CY,0FFE2AH.0H
	0ffcf        7189        MOV1            A.0H,CY
	0ffd1        70          MOV             X,A
	0ffd2        f1          CLRB            A
	0ffd3        710ce3      MOV1            CY,ADIF
	0ffd6        61dc        ROLC            A,1
	0ffd8        6158        AND             A,X
	0ffda        dd23        BZ              $0FFFFH
	0ffdc        710be3      CLR1            ADIF
	0ffdf        4031ff07    CMP             !ADS,#7H
	0ffe3        8d1f        MOV             A,ADCRH
	0ffe5        df0b        BNZ             $0FFF2H
	0ffe7        9f0af6      MOV             !0F60AH,A
	0ffea        717b30      CLR1            ADCS
	0ffed        ce3106      MOV             ADS,#6H
	0fff0        ef09        BR              $0FFFBH
	0fff2        9f0bf6      MOV             !0F60BH,A
	0fff5        717b30      CLR1            ADCS
	0fff8        ce3107      MOV             ADS,#7H
	0fffb        00          NOP             
	0fffc        717a30      SET1            ADCS
	0ffff        d7          RET     

This does something with one of the ADC inputs. I’ve not seen anything of interest that uses analog yet, so I’ll not look into this more currently. It could be the input voltage (the board can alarm on this) or PSTN line voltage.

There aren’t many other clearly identifiable subroutines, but these few give me confidence that this interrupt handler is mostly handling periodic IO.

This program structure of calling time-sensitive IO using a timer interrupt is fairly common in embedded systems. It means that IO is serviced regularly, allowing more time consuming (or non deterministic time) processing to happen outside of the interrupt in the main code. It means there are a lot of buffers and global variables to pass data back and forth that we can look at and play with.

From a security perspective, it can also produce problems. If we can stall something in the timer interrupt – by buffer overflow, bad input or so on – it can be possible to lock up a device. I’d hope that the board used a watchdog timer to recover from this though.

Reverse engineering a CSL Dualcom GPRS part 14 – interpreting disassembly

A few posts ago, we managed to disassemble the firmware from the CSL Dualcom site.

The entire listing is available here as a zip. There is a lot of blank space in the file which needs to be trimmed down, but for reference this file will be left as-is.

I have also put the code on github. It’s not ideal as you can’t use the web interface to show the code/diffs, but it is a good way of recording history as mistakes will be made.

The process of turning disassembly into something useful isn’t easy. I find the most useful thing is to find the very commonly called subroutines first, and work out what they do. If they aren’t obvious, skip them.

The raw listing doesn’t show us the frequency with which subroutines are called. Python, to the rescue again. We trim out the fluff from the file. 0x1000-0x2000 is the string table, which the disassembler doesn’t know about and tries to turn into code. The processor has a mirrored address structure so everything in the range 0x00000. Everything above 0x1FFFF isn’t the code – it’s special function registers and a mirror area.

Now we run the code through a small script:

from collections import Counter

# Path to the trimmed disassembly listing
LISTING = '/Users/andrew/data/Disassemble1.txt'

call_addresses = []

with open(LISTING, 'r') as datafile:
    for row in datafile:
        # Rows with CALL in
        if 'CALL' in row:
            # Get value after CALL, remove unwanted chars, strip
            # ! are for addressing mode
            address = row.split('CALL')[1].replace('!', '').strip()
            # The trailing H (hex suffix) isn't wanted
            if address.endswith('H'):
                address = address[:-1]
            call_addresses.append(address)

# Builds a dict of frequencies
freqs = Counter(call_addresses)

# Whack it out to CSV for copy and paste, most frequent first
for address, count in freqs.most_common():
    print(address + ',' + str(count))

And we end up with CSV of the frequency of calls:

0E1B2,182
0E541,160
0E1D1,143
0D764,120
0DC44,105
0DED3,82
0DACC,79
0E322,68

0xE1B2 looks like a good place to start.

0e1ac        bfcce0      MOVW            !0E0CCH,AX
0e1af        c2          POP             BC
0e1b0        61ec        RETB   
// Start of sub
0e1b2        4c01        CMP             A,#1H
0e1b4        df05        BNZ             $0E1BBH
0e1b6        63          MOV             A,B
0e1b7        ec01e100    BR              !!0E101H
0e1bb        4c02        CMP             A,#2H
0e1bd        df05        BNZ             $0E1C4H
0e1bf        63          MOV             A,B
0e1c0        ec47e100    BR              !!0E147H
0e1c4        4c03        CMP             A,#3H
0e1c6        63          MOV             A,B
0e1c7        61f8        SKNZ            
0e1c9        ec6ce100    BR              !!0E16CH
0e1cd        ecdfe000    BR              !!0E0DFH
0e1d1        fdc404      CALL            !4C4H
0e1d4        0233bd      ADDW            AX,!0BD33H
0e1d7        2013        SUBW            SP,#13H
0e1d9        72          MOV             C,A

First thing to be aware of is that disassembly is not an exact science. Sometimes you will see an address CALLed but you can’t find it. This probably means that the disassembly is misaligned in that area – look a couple of addresses above and below. This is not the case here.

We can see immediately above 0xE1B2 there is a POP and RETB, the end of a subroutine.

To work out what a sub does, it helps to know what parameters are passed to it and how. If we look through for all the CALLs to 0xE1B2, we get an idea of what is going on:

03d31        530d        MOV             B,#0DH
03d33        e1          ONEB            A
03d34        fcb2e100    CALL            !!0E1B2H

B is always set to a value over quite a wide range. It’s probably a number or an ASCII character.

A is set to either 0, 1, 2 or 3. This is likely some kind of option or enumeration.
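Rather than scrolling through the listing by hand, the call sites can be pulled out with a few lines of Python. This is just a sketch: call_sites is a made-up helper, and the three sample lines are the ones shown above rather than the whole file.

```python
def call_sites(lines, target, context=2):
    """Return each CALL to target along with the lines just before it."""
    sites = []
    for i, line in enumerate(lines):
        if 'CALL' in line and target in line:
            sites.append(lines[max(0, i - context):i + 1])
    return sites


listing = [
    "03d31        530d        MOV             B,#0DH",
    "03d33        e1          ONEB            A",
    "03d34        fcb2e100    CALL            !!0E1B2H",
]

for site in call_sites(listing, '0E1B2H'):
    print('\n'.join(site))
    print()
```

Two lines of context is a guess that happens to fit the example above. Where A and B are set further back, the context would need to be larger.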

Going back to the subroutine, we can see how this could work:

0e1b2        4c01        CMP             A,#1H
0e1b4        df05        BNZ             $0E1BBH
	0e1b6        63          MOV             A,B		
	0e1b7        ec01e100    BR              !!0E101H	// If A = 1, branch to 0xE101
0e1bb        4c02        CMP             A,#2H
0e1bd        df05        BNZ             $0E1C4H
	0e1bf        63          MOV             A,B
	0e1c0        ec47e100    BR              !!0E147H	// If A = 2, branch to 0xE147
0e1c4        4c03        CMP             A,#3H
0e1c6        63          MOV             A,B
0e1c7        61f8        SKNZ            
	0e1c9        ec6ce100    BR              !!0E16CH 	// If A = 3, branch to 0xE16C
0e1cd        ecdfe000    BR              !!0E0DFH		// If A = 0, branch to 0xE0DF

So we are branching to other addresses based on the parameter in A.
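Written as Python, the dispatch looks something like this. It is a sketch with made-up names. Note that the final branch is really taken for any value of A other than 1, 2 or 3, not only 0.

```python
# Branch targets taken from the listing above
BRANCH_TARGETS = {1: 0xE101, 2: 0xE147, 3: 0xE16C}
FALLTHROUGH = 0xE0DF


def dispatch(a, b):
    """Return (branch target, value left in A) for the sub at 0xE1B2."""
    target = BRANCH_TARGETS.get(a, FALLTHROUGH)
    # B is copied into A before every branch, so the target sees B's value
    return target, b


print(hex(dispatch(1, 0x0D)[0]))
```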

There’s one thing to note about this function. There is no immediate RET instruction there. These have to be dealt with in the code that is branched to.

Let’s look at 0xE101.

0e101        77          MOV             H,A
0e102        8efa        MOV             A,PSW
0e104        9803        MOV             [SP+3H],A
0e106        67          MOV             A,H
0e107        717bfa      DI              
0e10a        c3          PUSH            BC
0e10b        dbb6e0      MOVW            BC,!0E0B6H
0e10e        48b8e4      MOV             0E4B8H[BC],A
0e111        a2b6e0      INCW            !0E0B6H
0e114        afb6e0      MOVW            AX,!0E0B6H
0e117        440a04      CMPW            AX,#40AH
0e11a        dc04        BC              $0E120H
0e11c        f6          CLRW            AX
0e11d        bfb6e0      MOVW            !0E0B6H,AX
0e120        8f0401      MOV             A,!SSR02L
0e123        31631e      BT              A.6H,$0E144H
0e126        362201      MOVW            HL,#122H
0e129        71a2        SET1            [HL].2H
0e12b        71b2        SET1            [HL].3H
0e12d        dbb4e0      MOVW            BC,!0E0B4H
0e130        49b8e4      MOV             A,0E4B8H[BC]
0e133        9e44        MOV             SIO10,A
0e135        a2b4e0      INCW            !0E0B4H
0e138        afb4e0      MOVW            AX,!0E0B4H
0e13b        440a04      CMPW            AX,#40AH
0e13e        dc04        BC              $0E144H
0e140        f6          CLRW            AX
0e141        bfb4e0      MOVW            !0E0B4H,AX
0e144        c2          POP             BC
0e145        61ec        RETB            

It’s pretty long and complex. But there is one really key piece of info in there – the special function register SSR02L. Looking at the 78K0R data sheet, this is “Serial status register 02”. It’s pretty likely this function concerns serial. It has a return at the end as well.

If we look at 0xE16C, this has a reference to SSR12L. Another serial port.

It’s quite likely that this function concerns either reading or writing to the various serial ports on the board. I’ve not looked at it in enough depth to know exactly what it is doing, so we’ll do the following:

// B has char 
// A has 0,1,2,3 - probably different serial ports
// Return is in the branches
:sub_Serial_UnknownA_e1b3           
	0e1b2        4c01        CMP             A,#1H
	0e1b4        df05        BNZ             $0E1BBH
		0e1b6        63          MOV             A,B
		0e1b7        ec01e100    BR              !!0E101H // A = 1
	0e1bb        4c02        CMP             A,#2H
	0e1bd        df05        BNZ             $0E1C4H
		0e1bf        63          MOV             A,B
		0e1c0        ec47e100    BR              !!0E147H // A = 2
	0e1c4        4c03        CMP             A,#3H
	0e1c6        63          MOV             A,B
	0e1c7        61f8        SKNZ            
		0e1c9        ec6ce100    BR              !!0E16CH // A = 3
	0e1cd        ecdfe000    BR              !!0E0DFH // A = 0

What have I done here?

  • Called the sub :sub_Serial_UnknownA_e1b3. The : denotes that this is the actual sub. It is something to do with serial – the first unknown sub to do with serial. I have put the address on the end just to keep track of where it is.
  • Search and replace on !!0E1B2H with this new name. “sub_Serial_UnknownA_e1b3” now shows instead of the raw address – when I see it called I know it is something to do with serial.
  • Put some brief notes above the sub so I know what it is doing.
  • Indented branches so function is a little clearer

I’m now going to do similar for the other high-frequency subs. Again, I am building up a broad picture, not going into extreme depth at this stage.

Reverse engineering a CSL Dualcom GPRS part 13 – checking the SIM card

The ICCID is written on the outside of the Dualcom GPRS, stored in the EEPROM, read in from the GPRS modem, and read in from EEPROM immediately before a long, random looking, string is sent to a remote server. It seems quite important.
ICCID on case

The Dualcom board also frequently checks for received SMS.

It might be worth taking a look at the SIM to see what is on it.

From previous projects, I have an Omnikey Cardman 5321 card reader. This reads both RFID cards and smart cards. We can put the SIM card in a carrier and read it with this device.
SIM from Dualcom

SimSpy II is a free utility which can read most data from SIM cards including ICCID, IMSI, Kc (which can be used to decrypt communications), SMS messages and more.

Unfortunately, nothing too interesting comes up. The card never seems to have stored any SMS. There are no numbers in the phone book. We might end up coming back to this at some point.

SIM data

Reverse engineering a CSL Dualcom GPRS part 12 – board buzz out

We’ve now got the code disassembled. The disassembler has no concept of what is connected to the microcontroller though, so we need to work out which ports/pins/peripherals are used by which parts of the board. What is P11.1? What about P7? These are all I/O, but meaningless without looking at the physical board.

The best way of doing this is using a continuity tester and buzzing the board out. It’s not worth exhaustively mapping out the PCB at this stage – just the interesting bits. There might even be some mistakes.

When doing this, I find two tools are essential:

  • A meter with a quick continuity beep. Some have a lag. I’ve not got time for that. I use my Amprobe AM-140-A for this – it’s very quick, if a bit scratchy sounding.
  • Fine probes. I really like the Pomona 6275 probes – they are very sharp and very small.

Pop one end on the peripheral and just brush the other along the sides of the microcontroller. Not too hard or you risk dragging metal between the pins. It makes it very quick to find where things are going.

Watch out for transistors and resistors in the way though, e.g. the inputs from the alarm are likely transistor buffered, and some of the peripherals might have resistors to divide voltage.

IC8

IC8 is the socketed 93C86 EEPROM.

DI -> P111
DO -> P20
CLK -> P142
CS -> P141

P111 means port 11, bit 1.
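The naming is easy to trip over, so here is a tiny Python helper that splits a pin name under that convention. It is a sketch: parse_pin is a made-up name, and it assumes the last digit is always the bit number (0-7) and everything before it is the port.

```python
def parse_pin(name):
    """Split a pin name like 'P111' or 'P14.5' into (port, bit)."""
    digits = name.upper().lstrip('P').replace('.', '')
    # Last digit is the bit, the rest is the port number
    return int(digits[:-1]), int(digits[-1])


for pin in ("P111", "P20", "P142", "P141"):
    port, bit = parse_pin(pin)
    print("%s = port %d, bit %d" % (pin, port, bit))
```

This makes it quick to match the buzzed-out pins against operands like P5.2H in the disassembly.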

IC11

IC11 is the SMT 93C86 EEPROM.

DI, DO, CLK are shared with IC8

DI -> P111
DO -> P20
CLK -> P142
CS -> P145

GPRS Modem

Pin 14 – device control on/off -> P05/TI05/TO05
Pin 21 – GPIO -> P47
Pin 32 – DSR1 -> P26/ANI6
Pin 33 – LED control signal -> SVC LED (not to micro)
Pin 37 – DTR1 -> P04/SCK10/SCL10
Pin 40 – CTS1 -> P21/ANI1
Pin 41 – DTM1 -> P02/SO10/TxD1
Pin 42 – DFM1 -> P03/SI10/RxD1/SDA10

IC02

This is the PSTN modem (Si2401)

Pin 7 – CTS_ -> P10/SCK00
Pin 6 – TXD – 52 P12/SO00/TxD0
Pin 5 – RXD – 53 P11/SI00/RxD0

Buttons

Button A -> P22
Button B -> P23

7 Segment

Segments ->  P60/P61/P62/P63/P64/P65/P66/P67

RH common cathode -> P53

LH common cathode -> P52

LEDs

GSM -> P51/INTP2
PSTN -> P50/INTP1

Programming header

1 -> VCC
2 -> VSS
3 -> P40/TOOL0
4 -> P41/TOOL1
5 -> RESET
6 -> FLMD0
7 -> Switched to ground via reset

(this looks like it would work with a standard Renesas debug tool – the MiniCube2).

I’m not bothered about the other parts at the moment. We can come back to them if we need to.

Next step is to identify a few basic functions inside the disassembled code, probably starting with EEPROM reading.

Reverse engineering a CSL Dualcom GPRS part 11 – disassembling firmware

I find reverse engineering is about building up a broad picture instead of working in-depth on any one aspect of the system. Dip into one bit, check what you are seeing is reliable and makes sense, dip into another area to get more detail, repeat.

We’ve done this with the logic trace – seen the long string sent, looked at the EEPROM access, checked what this is in the .prm file, and seen that it is the ICCID. Now I would like to look at the firmware on the device.

CSL Dualcom have a v353 hex file available for download on their site. The board I am looking at uses v202. That’s a big difference.
I can see a few paths here:

  1. Get a v202 (or anything nearer to v202 than v353) hex file downloaded. A quick Google doesn’t help here.
  2. Upgrade the board I have to v353. This would require the EEPROM to be updated, possibly other changes. I don’t want to break this board.
  3. Recover the v202 firmware from the board I have. There is a programming header – it could be possible to get the code off. But it may not be possible.
  4. Live with the difference and hope that there is enough consistency between the two to be helpful.

I’m going to run with 4. It’s the lowest effort, and I think it will work. The EEPROM structure between some of the different boards seems identical – this is backed up by there only being a single Windows programming tool for the board, regardless of firmware version. In my experience, smaller embedded system firmware is quite consistent as new functionality is added, even if the toolchain changes (this really doesn’t hold true when you move up to anything bigger running Linux).

What can we do with the firmware? Well, we need to disassemble it. What does that actually mean? It means changing the raw machine code back into human-readable assembly language i.e. F7 becomes CLRW BC (clear register BC). This probably doesn’t meet everyone’s idea of human readable, but it is a lot better than machine code.
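At its simplest, a disassembler is a lookup from opcode bytes to mnemonics. The toy Python sketch below only knows a handful of one and two byte opcodes, all of which appear in the listings in this series. It is nowhere near a real 78K0R disassembler, and the names are made up.

```python
# One-byte opcodes seen in the listings
ONE_BYTE = {
    0x00: "NOP",
    0xC0: "POP AX", 0xC1: "PUSH AX",
    0xC2: "POP BC", 0xC3: "PUSH BC",
    0xC6: "POP HL", 0xC7: "PUSH HL",
    0xD7: "RET",
    0xF6: "CLRW AX", 0xF7: "CLRW BC",
}

# Two-byte opcodes seen in the listings
TWO_BYTE = {
    (0x61, 0xFC): "RETI",
    (0x61, 0xEC): "RETB",
    (0x61, 0xF8): "SKNZ",
}


def disassemble(code):
    """Turn raw bytes into a list of mnemonics, longest match first."""
    out = []
    i = 0
    while i < len(code):
        pair = tuple(code[i:i + 2])
        if pair in TWO_BYTE:
            out.append(TWO_BYTE[pair])
            i += 2
        elif code[i] in ONE_BYTE:
            out.append(ONE_BYTE[code[i]])
            i += 1
        else:
            out.append("DB %02XH" % code[i])  # unknown byte, emit as data
            i += 1
    return out


print(disassemble(bytes.fromhex("c1c3c7c6c2c061fc")))
```

Even this toy shows why alignment matters: start one byte late in a two-byte instruction and everything after it can decode differently.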

Some microcontrollers (like the ATmega series) have easy to understand and even read machine code. 90% of instructions are a fixed length (16bits). The number of instructions is limited and there are only limited addressing modes. With some practice you can make sense of machine code in a text editor.

The 78K0R is not like this.

MOVs

All of the enclosed red cells are MOV instructions. That’s a lot of them. There are 4 of these maps, with a total of 1024 cells. ~950 of them are populated.

There are several addressing modes and the instruction length varies from 8bits to 32bits. This makes the machine code incredibly hard to read.

We need an automated disassembler. Google isn’t much help here – these microcontrollers aren’t as popular as x86, ARM, AVR, or PIC.

There are two toolchains widely available for these processors. Renesas Cubesuite and IAR Embedded Workbench. There is a chance that one of these has either a disassembler or a simulator that allows a hex file to be loaded.

After a lot of messing around, it appears that Renesas Cubesuite can load the hex, disassemble it, and also simulate it.

1. Download Renesas Cubesuite and install it (Windows only)

2. Start Cubesuite.

Renesas Cubesuite

3. Go to Project -> Create new project.

4. Change the microcontroller to the “uPD78F1154_80” (the 80 pin variant)

Microcontroller

6. Once the project has been created, in the “Project Tree” on the right hand side, right click on “Download files” and click “Add”

Download files

7. Find your hex or bin file and load it (hex is preferable as it seems the disassembler takes into account the missing address space).

8. Go to Debug -> Build and Download

9. The simulator starts up and you can see the disassembled code.

0e7e3        bd22        MOVW            0FFE22H,AX
0e7e5        17          MOVW            AX,HL
0e7e6        70          MOV             X,A
0e7e7        80          INC             X
0e7e8        61f8        SKNZ            
0e7ea        5500        MOV             D,#0H
0e7ec        3149        SHL             A,4H
0e7ee        73          MOV             B,A
0e7ef        fa22        MOVW            HL,0FFE22H
0e7f1        8b          MOV             A,[HL]
0e7f2        fa22        MOVW            HL,0FFE22H
0e7f4        a7          INCW            HL
0e7f5        37          XCHW            AX,HL
0e7f6        bd22        MOVW            0FFE22H,AX
0e7f8        17          MOVW            AX,HL
0e7f9        70          MOV             X,A
0e7fa        80          INC             X
0e7fb        dd07        BZ              $0E804H
0e7fd        618d        XCH             A,D

Great – now we can get to work trying to see what the processor is doing.
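
To give a feel for what the disassembler is doing, here is a minimal sketch of a table-driven decoder in Python. It only knows the handful of opcodes that appear in the listing above. The table is built by hand from that listing rather than from the 78K0R instruction set manual, and the names (ONE_BYTE, decode) are made up for illustration. The short-address formatting assumes the operand byte simply fills in the low byte of 0FFE00H, which matches the listing but is not the full rule, and the branch offset is assumed to be signed.

```python
# Single-byte opcodes, copied from the CubeSuite listing above.
ONE_BYTE = {
    0x17: "MOVW AX,HL",
    0x37: "XCHW AX,HL",
    0x70: "MOV X,A",
    0x73: "MOV B,A",
    0x80: "INC X",
    0x8B: "MOV A,[HL]",
    0xA7: "INCW HL",
}


def decode(code, base):
    """Decode bytes loaded at address `base` into (address, text) pairs."""
    out = []
    pc = 0
    while pc < len(code):
        addr = base + pc
        op = code[pc]
        if op in ONE_BYTE:
            out.append((addr, ONE_BYTE[op]))
            pc += 1
        elif op in (0xBD, 0xFA, 0xDD) and pc + 1 < len(code):
            operand = code[pc + 1]
            if op == 0xBD:
                # Assumption: operand is the low byte of a 0FFExxH short address
                text = "MOVW 0FFE%02XH,AX" % operand
            elif op == 0xFA:
                text = "MOVW HL,0FFE%02XH" % operand
            else:
                # Relative branch: offset (assumed signed) from the next instruction
                offset = operand - 0x100 if operand >= 0x80 else operand
                text = "BZ $%05XH" % (addr + 2 + offset)
            out.append((addr, text))
            pc += 2
        else:
            # Anything not in this toy table is left undecoded
            out.append((addr, "?"))
            pc += 1
    return out


listing = decode(bytes.fromhex("bd22 17 70 80"), 0x0E7E3)
for addr, text in listing:
    print("%05x  %s" % (addr, text))
```

Even this toy version has to know how long each instruction is before it can find the start of the next one, which is why reading 78K0R machine code by eye is so painful.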

The next post will likely be buzzing the board out to find out which I/O is connected to what so we can make some sense of the code.

Reverse engineering a CSL Dualcom GPRS part 10 – analysing the logic trace 2

Last post, we looked at the comms between the board and the GPRS modem. There was a long, interesting string sent to a remote server:

LjS1WQjg8FHqR1a4P4DVsjO8eUITXY6ifHPlaFhkZ2SJ

When we look out to the rest of the logic trace, we can see that the EEPROM is accessed exactly as this begins:

EEPROM access

From this view, it might look like the EEPROM access is too late for it to be used to generate that long string. However, in microcontroller terms, there is ~0.3ms between the end of the EEPROM access and the start of the first character on serial (I suspect the ‘1’ is a start character). I’ve not checked the crystal speed, but it’s between 2MHz and 20MHz. Even at 2MHz, that’s 600 instructions – plenty of time to act on the EEPROM data.
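
The arithmetic behind that estimate, as a quick sketch. The 300µs gap and the 2MHz and 20MHz bounds are from the paragraph above. Treating one clock cycle as one instruction is a simplification, so these are upper bounds rather than exact instruction counts.

```python
def cycles_in_gap(gap_us, clock_hz):
    """Whole clock cycles that fit into a gap given in microseconds."""
    return gap_us * clock_hz // 1_000_000


# ~0.3ms gap between the end of the EEPROM access and the first serial character
for clock_hz in (2_000_000, 20_000_000):
    print(clock_hz, cycles_in_gap(300, clock_hz))
```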

We now need to zoom in on the EEPROM data and see what is happening:

Screen Shot 2014-04-02 at 22.04.43

The trace is maybe a little confusing here. The green bordered binary is the DI line – the microcontroller to the memory. The yellow bordered binary is the DO line – the memory to the microcontroller. The Saleae Logic software has no decoder to deal with Microwire, so we need to use the SPI decoder set with a bit-width to deal with both the receive and transmit sides.

According to the data sheet for the EEPROM, this should be 29 clock cycles. For whatever reason there is an additional clock cycle here though – the very short transition in the middle of the trace. So we set the SPI decoder to 30bits.

The first 3bits of the green bordered binary are 0b110 – a read command. The next 10bits are the address – 0b00 1001 1000 – i.e. 0x098. After this point, we can ignore the green bordered binary.

The last 16bits of the yellow bordered binary are the value from memory – 0x4489.
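
The field split described above can be sketched in a few lines of Python. The function name and the bit-string representation are my own choices for illustration. The padding bits (everything on DI after the address, and everything on DO before the data) are filled with zeros here because their real values are not given.

```python
def parse_read_frame(di_bits, do_bits):
    """Split a captured read transaction into (command, address, data).

    di_bits and do_bits are strings of '0'/'1' as shown by the SPI decoder,
    with the first clocked bit on the left.
    """
    command = int(di_bits[0:3], 2)    # 3-bit command, 0b110 = read
    address = int(di_bits[3:13], 2)   # 10-bit word address
    data = int(do_bits[-16:], 2)      # last 16 bits on DO are the value
    return command, address, data


# 30-bit capture: 3 command bits + 10 address bits on DI, 16 data bits on DO
di = "110" + "0010011000" + "0" * 17
do = "0" * 14 + "0100010010001001"
cmd, addr, data = parse_read_frame(di, do)
print(bin(cmd), hex(addr), hex(data))
```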

We should be able to find this in the .prm file. 0x098 is 152. The prm file has a byte per row, and 2 bytes on the first row, so we should need to go to row 304 of the file – and there we go – 89, 44. Perfect.
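
As a sketch of that lookup, assuming rows are counted from 1 and that the low byte of each word comes first (which is what the 89, 44 above suggests). The function names are hypothetical.

```python
def prm_rows(word_address):
    """Return the 1-based rows of the .prm file holding a 16-bit EEPROM word.

    Row 1 holds word 0 (both bytes on one row); every later row holds one byte.
    """
    if word_address == 0:
        return (1, 1)
    return (2 * word_address, 2 * word_address + 1)


def prm_bytes(word):
    """Bytes in the order they appear in the file: low byte, then high byte."""
    return ("%02X" % (word & 0xFF), "%02X" % (word >> 8))


print(prm_rows(0x098), prm_bytes(0x4489))
```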

If we continue going through the trace, we read the following addresses:

0x098
0x098
0x099
0x099
0x09A
0x09A
0x09B
0x09B
0x09C
0x09C

Strangely each of the addresses is read twice. Why? Not sure at this time.

What data do we get back?

8944 1000 3006 3711 7619

This is the ICCID – the unique ID assigned to the SIM card in the GPRS modem.

Could the board be using the ICCID as a key to encrypt the data?

Reverse engineering a CSL Dualcom GPRS part 9 – analysing the logic trace

We’ve captured a trace of:

  • The serial comms to the GPRS modem
  • The serial comms to the PSTN modem
  • The SPI comms to the EEPROM

Now we can take a look at the data in these traces. Let’s start with the communications with the GPRS modem.

The serial comms to the GPRS modem are normal ASCII characters, and it uses the AT command set. The GPRS modem is a Wavecom board – we can download the AT command set documentation.

Stepping through the logic trace and transcribing the commands, we end up with something like this:

TX | RX | Command | Notes
0202 Dualcom GPRS> | | Restart |
AT | AT OK | |
AT&F | AT&F OK | Set to Factory Defined Configuration Profile |
ATE0 | ATE0 OK | Command echo off | DCE does not echo characters during command state and online command state
ATX0 | OK | Call Progress Monitoring Control | Busy detection off
AT&D2 | OK | Circuit 108 (DTR) Response | When in on-line data mode, deassert DTR closes the current connection and switch to on-line command mode.
AT+CMEE=1 | OK | Mobile Equipment Error | Enable +CME ERROR: <err> result code and use numeric <err> values.

The whole CSV file of this trace can be downloaded here.

What can we see going on?

  1. The board is reset and some basic settings are sent (don’t echo commands, use DTR to close connection, turn on numeric errors).
  2. Set up SMS messaging to store messages on the ME (the GPRS modem, as compared to the SIM). There seems to be room for 100 messages in the modem.
  3. Set up three PDP contexts. I think these are essentially GPRS connections. The first two are generic and have no username/password – they might be Vodafone APNs. The third is csldual.com – likely a private APN. An APN is a gateway between a GPRS connection and an IP network.
  4. Set up three Internet accounts. These are credentials used with the PDP contexts. The generic ones have no username or password, but the csldual.com one does – dualcomgprsxx and QO6806xx.
  5. The board periodically checks for network registration and signal strength. The signal strength is shown on the 7-seg display when idling. The GPRS modem is connected to the home network with decent signal strength.
  6. The board then repeatedly scans the first 15 SMS slots for messages. There are no messages, so we get errors back. This is quite interesting – what is it that gets sent to the board as SMS?
  7. The board then tries to connect to a private IP address/port 172.16.6.20:8965 using the csldual.com APN. The first time this is attempted it fails with error code 094, which isn’t listed in the documentation (or on the wider Internet…)
  8. The board then tries to connect to the same IP again. This time it succeeds, and some data is sent back and forth. This is a string of ASCII text which looks, from a human perspective, fairly random.

The data looks as follows (sent on the left, received on the right):

DC4 HS87
r (immediately after response above)
LjS1WQjg8FHqR1a4P4DVsjO8eUITXY6ifHPlaFhkZ2SJ EE1404,0122,3343,’6’
‘3’ OK

What things are of note in this trace then?

  • The APN and the username and password used are constant across several devices and the Sample.prm I have looked at. It seems curious to require a password but for it not to vary.
  • SMS messages are checked for frequently, suggesting something important is received by SMS.
  • There is no notion of time/counters/nonce in any of the communications.
  • There doesn’t seem to be any key exchange.
  • There doesn’t seem to be any authentication of the APN/server with the GPRS Dualcom board.

This has raised a number of questions:

  • What data is used to authenticate a given APN? If the username and password are constant, is the ICCID and other data used?
  • Can anyone send SMS to the GPRS modem, or is there some form of blocking performed by the network in the other direction?
  • Whilst the notion of time/counters/nonce isn’t essential for strong/good encryption, it does make things easier.
  • A common failing of embedded systems that do use encryption is that they don’t change the key. Encryption with a fixed, known key is not really much better than no encryption.
  • It’s been possible to spoof a cell site for a few years now using Software Defined Radio. If the APN/server can be spoofed, then the signalling might stop working.

I’m not sure what the next step is:

  • Gather more traces and see if any patterns can be spotted in the data going between the board and server.
  • Look at EEPROM accesses whilst the GPRS modem is being used.
  • Disassemble the firmware and see if we can spot anything interesting.

I’ll see what takes my fancy next time I sit down and look at it.

Reverse engineering a CSL Dualcom GPRS part 8 – logic analyser

Last time we powered up a board to see what it did just by observing the normal IO with our eyes.

This time we are going to look at what happens in more detail with this particular board using a logic analyser.

First things first, we’ll take the EEPROM out, pop it into our Bus Pirate EEPROM reader, pass the data through our converter, and then open the resulting .prm file in the CS2364 Windows utility.

This indicates that this board only has GPRS and PSTN paths enabled – no LAN.

Comms paths

There isn’t much else of note.

Running the .prm file through the strings utility we wrote provides very similar output to before – the same IP addresses and possibly the same password.

We now need to work out exactly what we want to connect to the logic analyser. The Dualcom has convenient test points grouped in threes and labelled GSM, PSTN, LAN and 485. It’s highly likely that these are serial connections – GND, TX, RX. A quick check of data sheets and use of the continuity tester confirms this.

Let’s solder some pin headers onto GSM serial, PSTN serial and also the socketed EEPROM. Pin headers make connecting and reconnecting the logic analyser very quick and easy compared to using test hooks.

We already know what is on the EEPROM, but we don’t know when and how the data is accessed – using a logic analyser will allow us to see this. This could be compared to static analysis (reading out the EEPROM entirely) and dynamic analysis (seeing how the EEPROM is accessed).

Often test points on hardware end up full of solder due to the manufacturing process. It’s awkward to remove this solder, so I just tend to tack pin headers on at a slight angle. To hold these, I use White Tack – a bit like Blu-Tack but it holds up at soldering temperatures. Much easier than using helping hands or pliers.

DSCF0454

DSCF0456

Yep – the joints look dry. Lead-free solder + leaded solder seems to result in joints looking like this.

Once this is done, we connect up the logic analyser – the Saleae Logic. This is a USB logic analyser, and probably my most used reverse engineering tool. It is only 8-channel, but this is frequently more than adequate.

Saleae Logic

The connections end up as follows:

  • 1 (Black) GPRS RX
  • 2 (Brown) GPRS TX
  • 3 (Red) PSTN RX
  • 4 (Orange) PSTN TX
  • 5 (Yellow) DO EEPROM
  • 6 (Green) DI EEPROM
  • 7 (Blue) CLK EEPROM
  • 8 (Purple) CS EEPROM
  • GND (Grey) GND

I don’t have enough channels to monitor CS for the soldered on EEPROM. We’ll have to look at that another day.

Yes – GND is grey and channel 1 is black on the Saleae Logic. This has caught more than a few people out!

After a few trial runs, I find out that I can use the following settings for analysing the data:

  • GPRS RX/TX – 9600baud serial
  • PSTN RX/TX – 2400baud serial
  • EEPROM – SPI, CS active high, 30bits transferred

So now we have a good logic trace. At an overview level, you can see that everything is accessed at one point or another.

Logic trace

If we zoom in we can see EEPROM data transfers (this is a read – 0b110 is the command):

EEPROM

And a GPRS modem response:

GPRS

And the PSTN modem as well:

PSTN

You can download the settings/data for this trace here. This can be opened in the freely downloadable Saleae Logic program.

The next step is to decode some of this data further and see what is going on.

 

Reverse engineering a CSL Dualcom GPRS part 7 – board startup

So far, we’ve had a quick look at the hardware, the HEX file firmware, the utility used to program the NVM, and the contents of the NVM. It’s all building up a picture of what the board does and how it does it.

Next I want to power up one of the boards and look at it in operation – what does it actually do when we power it up?

The board just needs 9-30V applied to power up. The GPRS module needs an antenna – there is a chance it could be harmed without one. I ordered a cheap GPRS antenna with an MMCX connector on it from eBay for under £5.

Here is one of the boards starting up:

The power-up sequence seems to vary from one board to the next, but for this video, it goes:

  • Flashes 88 along with all LEDs (probably a test)
  • Flashes firmware version number (2.02)
  • Flashes grade (G2)
  • Shows “ro” (reset radio module)
  • Shows c1/2/3 (lower case c is radio call to ARC, 1 = dialling, 2=handshake, 3=sending data). The two GPRS status LEDs flash.
  • Shows A (Comms successful)
  • Shows E 21 – error 21 – “PSTN DC line voltage = low or none” – makes sense as I have no phone line connected

Some boards get to c1 and then fail with one of the lower numbered error codes related to the GPRS comms – probably because the SIM has been de-activated.

The next step will be getting the logic analyser onto some of the signals on the board to see exactly what it is doing.

 

 

Reverse engineering a CSL Dualcom GPRS part 6 – interpreting EEPROM

In the last post we read out the contents of an EEPROM for one of the Dualcom GPRS boards. This is in the native Bus Pirate format:

0x00 0x47 0x00 0x25 0x01 0x25 0x00 0x40 0x32 0x52 0x00 0x41 0x00 0x00 0x00
 0x00 0x33 0x32 0x33 0x35 0x39 0x33 0x30 0x30 0x31 0x31 0x32 0x39 0x32 0x36 0x00
 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
 0x00 0x33 0x32 0x33 0x35 0x39 0x33 0x30 0x30 0x31 0x31 0x32 0x39 0x30 0x36 0x00

and needs translating into a prm file for the Windows utility to read it.

Python comes to the rescue again:

import re

# Get all of the byte values into one big list, e.g. '0x47' -> '47'
with open('/Users/andrew/data/BP.txt', 'r') as datafile:
    hexValues = re.findall(r'0x([0-9A-Fa-f]{2})', datafile.read())

with open('/Users/andrew/data/BP.prm', 'w') as outfile:
    # We want to flip each pair of values around
    for i in range(0, len(hexValues) - 1, 2):
        if i == 0:
            # The first row is a special case - both bytes on one row
            outfile.write('H,' + hexValues[i+1] + ',' + hexValues[i] + '\n')
        else:
            outfile.write(hexValues[i+1] + '\n')
            outfile.write(hexValues[i] + '\n')

    # The fluff at the end of the file copied from Sample.prm - hope no checksums!
    with open('/Users/andrew/data/footer.txt', 'r') as footerfile:
        for row in footerfile:
            outfile.write(row)

We now have BP.prm. Let’s try opening that in the Windows utility:
Real EEPROM data

Excellent! It works fine. A very old version of the firmware – 1.25!

Then if we whack this through the Python utility that converts it into strings, we get very similar output to before:
Screen Shot 2014-03-31 at 15.40.11
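
The strings utility itself is not reproduced in this post. As a rough idea of what such a pass involves, here is a minimal sketch (not the original utility) that pulls runs of printable ASCII out of raw bytes. The function name and the minimum length of 4 are arbitrary choices. The input is the second row of the Bus Pirate dump above, in Bus Pirate byte order, so it has not had the pair flip applied.

```python
def ascii_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    found = []
    run = bytearray()
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            run.append(byte)
            continue
        # A non-printable byte ends the current run
        if len(run) >= min_length:
            found.append(run.decode("ascii"))
        run = bytearray()
    if len(run) >= min_length:
        found.append(run.decode("ascii"))
    return found


# Second row of the Bus Pirate dump, as raw bytes
row = bytes.fromhex("00 33 32 33 35 39 33 30 30 31 31 32 39 32 36 00")
print(ascii_strings(row))
```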