corCTF 2025 - bubble-vm

This blog was also posted to the Crusaders of Rust website.

This was my first year writing CTF challenges, and I mainly focused on software reverse engineering. I absolutely enjoyed playing with various models of computation and seeing other people try to solve my problems.

In this blog, I will talk about the design choices and decisions behind this VM architecture and how I created this challenge.

The Virtual Machine

the ctf challenge

I love virtual machines. According to my tallying, I invented four turing-complete systems for corCTF rev, and three of them were VMs. Although each challenge had its own quirks, bubble-vm ended up being the hardest VM with only 2 solves due to its obscure data manipulation and control flow tactics.

Having little prior experience with CTFs, I had no clue what types of challenges to write for corCTF 2025. I wanted to push myself by making an advanced challenge, and I found out that writing custom architectures was a potential way to isolate a challenge from existing analysis tools. I ended up stretching this idea in several different ways for this CTF, dabbling with all sorts of models of computation across three different challenge categories.

advice from a friend

This challenge is a testament to how complex program logic can emerge from a very simple and limited set of instructions. Borrowing from the BF programming language, the machine had only two pointers operating on two tapes: a program counter, which dictates interpretation over an immutable instruction buffer, and a data pointer, which allows operations on a large tape of 64-bit integers. Additionally, each instruction was encoded with a single character which couldn’t contain any metadata.

In order to facilitate more advanced computations, the instruction set also contained several binary operations that operated with the data in the next cell — for example, the ADD instruction writes the sum of data[A] and data[A+1] into data[A]. The BUBBLE instruction, from which the VM derives its name, swaps the data in the current and next data cell, similar to a bubble sort. As you can probably imagine, most bytecodes in this system are very standard and are straightforward to analyze.

the bubble vm contains an instruction tape and a data tape.

What made this challenge hard was the way the VM enabled random access along its instruction and memory tape. The VM contained two perculiar instructions which manipulated the program state in oddly specific ways: the SHIFT instruction would move the data pointer to the value stored in data[A] and copy the value stored in the previous neighbor data[A+1] into the new location, and the JUMP instruction would set the program count to the value stored in data[A] and write back into the same cell the successor of the original program count. If that sounds confusing, just remember that I had to write an entire 10,000-bytecode program using this system using my own assembler.

Here is the (sloppy) internal documentation of the bytecodes for bubble-vm:

/*
 * The VM is made of a linear memory space of cells which are addressable via indexing.
 * The VM has one register: a cell pointer A, which is used to calculate the A+1 pointer.
 * 
 * The program starts with A pointing to the cell at index 0.
 * Cell 0 is initially set to the length of the cell buffer, and all other cells are cleared.
 * 
 * Instruction set:
 * BUBBLE - swap values at A and A+1
 * TAKE - set A to A+1 value
 * LEFT, RIGHT - decrement/increment A pointer
 * BZ, BNZ - relative branch by numerical value in A if A+1 is zero/nonzero
 * ADD, SUB, MUL, DIV, MOD   - do integer operation on A and A+1 and store result in A
 * SLL, SRL, SRA, XOR, AND, OR - do integer operation on A and A+1 and store result in A
 * NONZERO - set cell A to 1 if A is nonzero, 0 otherwise (convert int to boolean 1/0)
 * NOT - flip bits of A
 * INC, DEC - increment/decrement the numerical value of A
 * CLEAR - set cell A to zero
 * INPUT - reads a char to A
 * PRINT - outputs A as char
 * EXIT - halt program and exit
 * IDENTITY - store cell address of A in A
 * SHIFT   - shift to A and write prev A+1 value (copies A+1 to target address)
 * JUMP - jump to A and write prev next instruction address (return address)
 */

The Control Primitives

Remember those obscure SHIFT and JUMP instructions? As it turns out, these instructions serve critical roles in managing the logic of the VM.

The JUMP opcode always writes back a return address, and is therefore ideal for emulating function calls. Combining this with the fact that most instructions operate on data to the right of the data pointer, it naturally makes sense for programs to use a downwards-growing stack, starting from the highest memory address and sliding left to obtain more space. When a function needs to return, it will simply slide the data pointer back to where it started, and call the JUMP instruction on the return address, returning control to the callee.

The SHIFT instruction, however, is trickier. It may seem quite bizzare and useless, since it can only transport a single data value to the target location without an obvious way to return back to the original position, but through a careful construction, we can enable arbitrary reads and writes to global memory addresses from anywhere on the callstack. The key is to use a sparse, two-wide memory layout, where load/store cells contain a temporary slot which can get overwritten, while the important data is stored in the cell’s neighbor.

I won’t bore you with all the details. Here’s how I implemented the load and store primitives in my assembler: load and store primtives can be implemented using a careful sequence of bytecodes.

Using these two instructions, we can create a call stack and access global variables at any point in the program.

The Data Primitives

bubble-vm doesn’t have instructions to load immediate data into the program, so I wrote a macro in my python assembler to create numbers using a bitmasking algorithm. The code first loads in the absolute value because it tends to be more efficient for negative numbers. This function probably isn’t the most optimal way to load in a number into the program, but it did the job.

def u64_to_bitstring(num):
    if num < 0:
        raise ValueError("Cannot convert negative number to bits")
    binary_string = bin(num) # format 0b1001
    return binary_string[2:]    

# loads a number into a cell. 
# Use this when exact sizing is not critical, and there is space to the left.
def load_number(num):
    needs_negate = num < 0
    unsigned_num = abs(num)

    # algorithm: bitmask the absval, and then subtract from 0 if negative
    prologue = ["CLEAR", "INC", "LEFT", "CLEAR", "INC", "LEFT", "CLEAR", "RIGHT"]
    # sets up format [result, mask, 1], initially [0, 1, 1]
    bitmask = []
    for digit in u64_to_bitstring(unsigned_num)[::-1]:
        if digit == '1':
            bitmask += ["LEFT", "OR", "RIGHT"]
        bitmask.append("SLL") # shift the mask bit to the left
    pos_epilogue = ["LEFT", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT"]
    neg_epilogue = ["CLEAR", "LEFT", "BUBBLE", "SUB", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT"]
    epilogue = neg_epilogue if needs_negate else pos_epilogue

    return prologue + bitmask + epilogue

The Control Flow, Part Two

The way the code loads numbers poses a problem to our control flow management. Functions must be called using a precise target address, but our VM doesn’t guarantee a fixed-length solution for loading numbers. This creates a nightmare for linking functions in our program — we would have to resolve a circular loop of dependencies between functions in order for the code to behave. This problem would be made even more annoying when a function call was made on a conditional branch, where the branch offset also required a precise offset calculation.

Rather than creating an overcomplicated symbol linker which solved the complex network of dependencies between objects in my code, I decided to make a jump table that would handle all the function offsets in a single location at the beginning of the program. This way, function callers would only need to load the index of the function they want to call, and jump to a fixed location for the dynamic dispatch.

dynamic dispatch using a jump table.

This ended up being my code for linking programs together. Although it is fairly cumbersome, it was much easier having this logic all in one place, rather than across the entire program.

# Jump table: does a bunch of BZ and DEC insns
def link_program():
    prologue = ["BUBBLE", "RIGHT", "BUBBLE"] # the name of the VM!

    # convention: [extra_space, ret_addr, idx, args] (idx missing on _start)
    _start_impl = ["RIGHT", "RIGHT", "DEC", "SHIFT", *call_function("main"), "EXIT"]
    jump_table_handler = [
        "LEFT", "CLEAR", *["INC"] * len(_start_impl), "INC", "BNZ",
            *_start_impl, # skip _start if ret addr exists
    ]

    relevant_functions = FUNCTIONS[1:]
    
    for funct_name in relevant_functions:
        if funct_name not in CALLED_FUNCTIONS:
            print(f"Warning: function {funct_name} is linked in program but not called anywhere")

    function_impls = [FUNCTION_IMPLEMENTATIONS[funct_name] for funct_name in relevant_functions]
    function_lengths = [len(function_impl) for function_impl in function_impls]
    function_offsets = [0]
    for function_length in function_lengths[:-1]:
        function_offsets.append(function_offsets[-1] + function_length)
    function_data_segment = [insn for funct_impl in function_impls for insn in funct_impl]
    
    jump_table_calculation_estimate = sum(
        load_number_estimate(offset) + load_number_estimate(load_number_estimate(offset)) + 15
        for offset in function_offsets
    )
    function_data_start = 50 + jump_table_calculation_estimate  # assume this starts in a good location

    # handle function jump table [extra_space, ret_addr, idx, args]
    jump_table_handler += load_number(function_data_start) # [data_start, ret_addr, idx, args] at data_start
    jump_table_handler += [
        "RIGHT", "BUBBLE", "LEFT", "BUBBLE" # bubble sort! drag idx back 2 spaces
    ] # [extra_space, idx, data_start, ret_addr, args] at idx

    for i in range(len(relevant_functions)):
        funct_name = relevant_functions[i]
        funct_offset = function_offsets[i]
        # Original layout: [extra_space, idx, data_start, ret_addr, args] at idx
        # Becomes at invocation: [offset, temp, idx, data_start, ret_addr, args] at temp
        table_invoke = ["LEFT", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT", "ADD", "BUBBLE", "RIGHT", "JUMP"]
        funct_handler = [
            "DEC", "LEFT", # decrease index, go to extra space
            *load_number(funct_offset), # [offset, idx, data_start, ret_addr, args] at offset
            "LEFT", "TAKE", "RIGHT", # [offset, temp, idx, data_start, ret_addr, args] at temp
            "CLEAR", *["INC"] * len(table_invoke), "INC", "BNZ", # [offset, skip_amount, idx, data_start, ret_addr, args]
                *table_invoke, # add offset to data_start and invoke. skip this if not zero (not matched)
            "RIGHT" # otherwise, go back and try next iteration
        ]

        jump_table_handler += funct_handler
    
    # if no function found, panic, print something, and exit
    jump_table_handler += [
        "NOP",
        "CLEAR", "INC", "INC", "INC", "INC", "SHIFT", # go to somewhere safe
        *load_number(101), # ASCII 'e'
        "PRINT", "PRINT", "PRINT", "EXIT"
    ]

    def get_padding_insn(i):
        # return "EXIT"
        choice_list = list(INSNS.keys())
        return choice_list[(i * 5 + 5 * i ** 3 + 16 - i ** 2) % len(choice_list)]

    padding = [get_padding_insn(i) for i in range(function_data_start - len(jump_table_handler) - len(prologue))]

    beginning_length = len(prologue) + len(jump_table_handler) + len(padding)
    for funct_name in relevant_functions:
        funct_impl = FUNCTION_IMPLEMENTATIONS[funct_name]
        funct_len = len(funct_impl)
        print(f"linking {funct_name} at program count {beginning_length} - {beginning_length + funct_len}")
        beginning_length += funct_len

    return prologue + jump_table_handler + padding + function_data_segment

With this system, as long as a function was declared beforehand, it could be called from anywhere, and there would not be any additional constraints during linkage. As a result, we are able to write very complex programs supporting function calls, recursion, global variables, and more using a very simple bytecode architecture.

The Program

The actual program running on the VM was a flag verifier that requested a flag corctf{...} with a body of 36 characters. As those characters were read, they were processed through a pseudorandom algorithm which would scramble the case of the letters:

HASH_MULTIPLIER = 13
HASH_SEED = 0xDEADBEEF # as a long (+3735928559)

define_function("init_hash", [
    *load_number(HASH_MULTIPLIER), "LEFT", *load_number(148), "LEFT",
    *memory_store(), # store multiplier at 148
    *load_number(HASH_SEED), "LEFT", *load_number(146), "LEFT", # store seed at 146
    *memory_store(), # store seed as hash state
    "RIGHT", "JUMP" # return
])

# recursively calls until num_iter is zero
define_function("update_hash_inner", [ # [ret_addr, temp, num_iter, data]
    "BUBBLE", "RIGHT", "TAKE", "LEFT", "BUBBLE", "LEFT", "BUBBLE",
    "RIGHT", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT", "TAKE", "LEFT", "BUBBLE", "LEFT", "BUBBLE",
    # [num_iter, data, ret_addr] args copied over, at data
    "LEFT", "LEFT", *load_number(5), "BNZ", # [temp, num_iter, data, ret_addr] at temp
        "RIGHT", "RIGHT", "RIGHT", "JUMP", # skip if num_iter is nonzero; return if zero
    *load_number(148), "BUBBLE", *memory_load(), # [temp, hash_multiplier, data, ret_addr]
    *load_number(146), "LEFT", *memory_load(), # [temp, old_state, hash_multiplier, data, ret_addr]
    "RIGHT", "MUL", "BUBBLE", "RIGHT", "ADD", "BUBBLE", # [temp, new_state, ret_addr]
    *load_number(146), "LEFT", *memory_store(), # store new state at 146
    "BUBBLE", "RIGHT", "TAKE", "LEFT", "BUBBLE", "LEFT", "BUBBLE",
    "RIGHT", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT", "TAKE", "LEFT", "BUBBLE", "LEFT", "BUBBLE",
    # [num_iter, data, ret_addr] args copied over, at data
    "LEFT", "DEC", "LEFT", # [temp, num_iter - 1, data, ret_addr] at temp
    *call_function("update_hash_inner"), # recursive call (no tail-call optimization)
    "RIGHT", "RIGHT",
    "RIGHT", "JUMP" # return from function
])

# updates the hash with the data five times
define_function("update_hash", [ # [ret_addr, data]
    "BUBBLE", "RIGHT", "TAKE", "LEFT", "BUBBLE", # pull function arg behind return address on stack
    "LEFT", "CLEAR", *["INC"] * 5, "LEFT", # [temp, num_iter, data, ret_addr] at temp
    *call_function("update_hash_inner"), # call inner recursive function
    "RIGHT", "RIGHT",
    "RIGHT", "JUMP" # return from function
])

define_function("get_hash", [
    *load_number(146), "LEFT", *memory_load(), "RIGHT", # load hash state from 146
    "BUBBLE", "RIGHT", "BUBBLE", "LEFT", "BUBBLE", # put result behind return address
    "RIGHT", "JUMP" # return from function
])

# flips the upper/lower case of a character depending on the hash value
define_function("get_char_scramble", [
    "LEFT", *call_function("get_hash"), # [temp, hash_value] at temp
    *load_number(32), "BUBBLE", "SRL", # [hash_value >> 32, _]
    "BUBBLE", "CLEAR", *["INC"] * 2, "BUBBLE", "MOD",
    "INC", "INC", "INC", "INC", "MOD", # normalize to 0 or 1
    "BUBBLE", "LEFT", *load_number(0x20), "LEFT", "INPUT", # [input, 0x20, _, bool]
    "LEFT", *call_function("update_hash"), "RIGHT",
    "RIGHT", "RIGHT", "CLEAR", *["INC"] * 6, "BZ", # [input, 0x20, skip_offset, bool] at skip_offset
        "LEFT", "LEFT", "XOR", "RIGHT", "RIGHT", # skip if zero
    "LEFT", "LEFT", "BUBBLE", "RIGHT", "BUBBLE", # [temp, scrambled_input, _, ret_addr]
    "RIGHT", "BUBBLE", "RIGHT", "BUBBLE", "RIGHT", "BUBBLE", "LEFT", "BUBBLE", # [temp, ret_addr, scrambled_input]
    "RIGHT", "JUMP" # return from function
])

The code performs a loop which iterates 36 times, processing the character through the scrambler and then encoding it into a number in a 6x6 matrix:

#ahaHa
# 0: a  (97)
# 1: h  (104)
# -1: H (72)
# 5: everything else (bad)
# [temp, _, addr, _, _, _, 72, 32] at temp
read_and_store = [
    *call_function("get_char_scramble"), "RIGHT",
    *["BUBBLE", "RIGHT"] * 4, *["LEFT"] * 4, "BUBBLE", # [_, temp, addr, _, _, scrambled_input, 72, 32] at temp
    "RIGHT", "RIGHT", "CLEAR", *["INC"] * 5, "RIGHT", "RIGHT",
    "SUB", # subtract 72. Should be zero if 'H'
    # "CLEAR", # : forces it to set to 'H' and -1 override
    # "RIGHT", "PRINT", "LEFT", # santity check. hope this is actually an H
    "LEFT", "CLEAR", *["INC"] * 9, "BNZ",
        "LEFT", *["DEC"] * 6, "RIGHT", # skip this if nonzero
    "RIGHT", "RIGHT", "BUBBLE", "LEFT", "SUB", # sub another 32. Should be zero if 'h'
    "RIGHT", "BUBBLE", "LEFT", # restore the [72, 32]
    "LEFT", "CLEAR", *["INC"] * 7, "BNZ",
        "LEFT", *["DEC"] * 4, "RIGHT", # skip this if nonzero
    "RIGHT", *["INC"] * 7, # add 7. Should be zero if 'a'
    "LEFT", "CLEAR", *["INC"] * 8, "BNZ",
        "LEFT", *["DEC"] * 5, "RIGHT", # skip this if zero
    "LEFT", # [temp, temp, temp, addr, num, _, _, 72, 32] at num
    "LEFT", "LEFT", "TAKE", "LEFT", "BUBBLE", # [temp, addr_copy, temp, addr, num, _, 72, 32]
    "RIGHT", "RIGHT", "BUBBLE", "LEFT", "TAKE",
    "RIGHT", "BUBBLE", "LEFT", # [temp, addr_copy, num_copy, addr, num, _, 72, 32] at num_copy
    #"CLEAR", *["INC"] * 1, # TODO REMOVE THIS, SET TO 5 FORCEFULLY
    "LEFT", "LEFT", *memory_store(), # [temp, addr, num, _, 72, 32] at temp
    "RIGHT", "INC", "INC", "LEFT", *load_number(144), "SUB", # increment address (twice), compare
    "LEFT"
]

read_and_store += load_back_branch_offset(len(read_and_store))
read_and_store.append("BNZ") # keep looping until we reach the end

define_function("read_input_as_b", [
    *flatten(
        [*load_number(ord(c)), "PRINT"]
        for c in "Flag Verifier\nEnter flag in this format:\ncorctf{36_ascii_chars_inside___}\n"
    ),
    *["INPUT"] * len("corctf{"), # dispose of first 7 characters
    *load_number(32), "LEFT", *load_number(72),
    "LEFT", "LEFT", "LEFT", "LEFT", *load_number(72),
    "LEFT", "LEFT", # [temp, _, addr, _, _, _, 72, 32] at temp
    *read_and_store, # loops until done, at same register location
    "INPUT", # consume closing brace }
    "RIGHT", "RIGHT", "RIGHT", "RIGHT", "RIGHT", "RIGHT", "RIGHT",
    *flatten(
        [*load_number(ord(c)), "PRINT"]
        for c in "\nChecking flag...\n"
    ),
    "RIGHT", "JUMP", # go back to return address and return
])

The flag verifier required this encoded matrix to be the inverse of a given matrix:

TARGET_MATRIX = [
    0, 1, 0, 0, 0, 0,
    1, 0, 1, 0, 0, 0,
    0, 1, 0, 1, 0, 0,
    0, 0, 1, 0, 1, 0,
    0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 1, 0,
]

define_function("load_matrix_a", [
    "IDENTITY", "LEFT", "CLEAR", "SHIFT", # go to 0, transport return address
    *flatten(
        ["RIGHT", "CLEAR", *["INC"] * load_value, "RIGHT"]
        for load_value in TARGET_MATRIX
    ),
    *flatten(["LEFT", "LEFT"] for load_value in TARGET_MATRIX),
    "SHIFT", # initialized cells 0, 2, 4, ..., 70
    "RIGHT", "JUMP" # return
])

Finally, I put together the pieces of my program together, defining the main function:

accept = call_function("accept_flag")
reject = call_function("reject_flag")

define_function("main", [
    *call_function("init_hash"),
    *call_function("load_matrix_a"),
    *call_function("read_input_as_b"),
    *call_function("mult_a_b_into_a"),
    "LEFT", *call_function("reduce_a_bitwise_or"), # [temp, a, ret_addr] (a must be 1)
    *call_function("mult_a_b_into_a"),
    *call_function("a_sub_b"),
    "LEFT", *call_function("reduce_a_bitwise_or"), # [temp, b, a, ret_addr] (b must be 0)
    "RIGHT", "RIGHT", "DEC", "LEFT", "OR", "BUBBLE", # [temp, flag_reject, ret_addr] at temp
    *load_number(len(accept) + 1), "BNZ", # skip if nonzero
        *accept, # accept flag
    *load_number(len(reject) + 1), "BZ", # skip if nonzero
        *reject, # reject flag
    "RIGHT",
    *call_function("thank_user"),
    "RIGHT", "JUMP" # return from main
])

The CTF

This was the intended flag for the program: corctf{aHAhaHHAaAaAAAAhAhhahAaAAAaAAhhaHahA}

This challenge was released 24 hours into corCTF, and was solved by two teams, making it the hardest rev challenge in the CTF. Although I’m pretty satisfied with the results, I feel like the challenge could’ve been made improved by making more control flow decisions dependent on the flag’s data. The current program operated “blindly” on the flag data, so a simulator could easily trace the symbolic relationship between the flag payload and the program result without the state space blowing up. Next time, I’ll be sure to put a more complex algorithm inside my VM.

I actually made some mistakes in my flag verifier program, and teams discovered another flag that also got accepted. Oops! Next time, I will make sure to have more quality assurance and code reviews on my challenges.

Additionally, since I put so much effort into making a sophisticated assembler, I probably could’ve written another layer of virtualization within the VM to force players to carefully decipher how the entire VM worked, rather than just fuzzing and diffing pieces to get a rough estimate of what was going on.

The Conclusion

bubble-vm doesn’t have explicit push, pop, read, write, call, or return instructions, and barely even supports random indexing, yet through careful manipulation of memory layouts and tape positions, we can create powerful abstractions which allow us to scale programs far beyond what a human can statically analyze. Even with extremely limiting architectures, complex behaviors can emerge from a handful of primtive constructions, allowing us to model any computable problem.

I had a lot of fun creating this VM and writing my own program and assembler for it. Working on this challenge has definitely changed my viewpoint on computing and program design, and I hope to continue to write more cool projects and CTF challenges in the future.