So here’s what I’ve spent the last couple days working on. Yes, it’s assembly code.
    ; assemble with nasm:
    ;   nasm -f elf -g welcome.asm && ld -o welcome welcome.o
    ; (on a 64-bit system, link with: ld -m elf_i386 -o welcome welcome.o)

    %macro print 2
        mov eax, 4          ; sys_write
        mov ebx, 1          ; stdout
        mov ecx, %1         ; address of message
        mov edx, %2         ; length of message
        int 0x80
    %endmacro

    section .text
        global _start

    section .data
        prompt db "Hi, what's your name? "
        prompt_length equ $ - prompt
        welcome_part_1 db "Hello, "
        welcome_part_1_length equ $ - welcome_part_1
        welcome_part_2 db "! Welcome to assembly.",0x0a
        welcome_part_2_length equ $ - welcome_part_2

    section .bss
        name resb 40
        name_max_length equ $ - name
        name_length resb 4
        extra resb 40
        extra_max_length equ $ - extra  ; measured from extra, not name, so the read fits the buffer

    section .text
    _start:
        print prompt, prompt_length

        ; read name
        mov eax, 3          ; sys_read
        mov ebx, 0          ; stdin
        mov ecx, name
        mov edx, name_max_length
        int 0x80

        ; eax is bytes read
        ; if 0 (ctrl-d/EOF), exit
        cmp eax, 0
        jz exit

        ; if max, there may be more input
        cmp eax, name_max_length
        jnz read_complete
        cmp byte [name + eax - 1], 0x0a
        jz read_complete

        ; clear out the rest of the input, or it will be read by the shell as the next command!
        push eax            ; save the name length (once, before the loop, so the pop below matches)
    clear_input:
        ; read extra
        mov eax, 3          ; sys_read
        mov ebx, 0          ; stdin
        mov ecx, extra
        mov edx, extra_max_length
        int 0x80

        ; if max, there may be more input
        cmp eax, extra_max_length
        jnz input_cleared
        cmp byte [extra + eax - 1], 0x0a
        jz input_cleared
        jmp clear_input

    input_cleared:
        pop eax             ; restore the name length

    read_complete:
        ; if last is \n, change it to \0 and decrement length
        cmp byte [name + eax - 1], 0x0a
        jne length_ok
        dec eax
        mov byte [name + eax], 0x00

    length_ok:
        cmp eax, 0          ; only if input was \n
        jz _start

        mov [name_length], eax      ; keep the real length; the prints below clobber eax

        print welcome_part_1, welcome_part_1_length
        print name, [name_length]   ; write only the bytes read, not the whole buffer
        print welcome_part_2, welcome_part_2_length

    exit:
        mov eax, 1          ; sys_exit
        mov ebx, 0          ; exit code
        int 0x80
In case you don’t know assembly, and since what this does probably isn’t obvious even if you do, here’s an equivalent shell script.
    while test -z "$name" ; do
        read -p "Hi! What's your name? " name
    done
    echo "Hello, $name! Welcome to bash."
And of course, the assembly program will only work on an x86 chip (Intel or AMD) running Linux, while the bash script will work on anything that runs bash.
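Strictly speaking, "equivalent" is a little generous: the assembly version exits on ctrl-d (EOF), while the script just prompts again. Here's a rough sketch of a closer match. The `welcome` function name is mine, purely for illustration, and I'm assuming a plain POSIX shell, so the prompt goes through `printf` instead of bash's `read -p`.

```shell
# Sketch only: same loop as the script, but it stops quietly at EOF,
# mirroring the "cmp eax, 0 / jz exit" check in the assembly.
# "welcome" is a hypothetical name, not something from the original script.
welcome() {
    name=""
    while test -z "$name"; do
        # Prompt on stderr so stdout carries only the greeting.
        printf "Hi, what's your name? " >&2
        # read fails at EOF; give up only if it also left name empty
        # (a final line without a trailing newline still counts as input).
        if ! read -r name && test -z "$name"; then
            return 0
        fi
    done
    echo "Hello, $name! Welcome to bash."
}
```

Call `welcome` after defining it. An empty line prompts again, just as before, and ctrl-d on an empty prompt now ends the function without printing a greeting.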
So why on earth, you might rightly ask, am I doing this?
Partly, it’s just curiosity. Or perhaps something stronger and more negative, an anxiety aroused by awareness of ignorance. It makes me deeply uncomfortable to have to say, “Oh, that’s a black box; I have no idea how it works.” For the work I’m doing now, it’s no more than that, but if I want to move into security work, it’s much more important because a lot of exploits operate at this level.
I’ve compared learning new programming languages to foreign travel before, and this has been a similar experience. It’s really weird and jarring at first, but I acclimated more quickly than I expected. I’m still very much an assembly newbie, but I’ve crossed some basic threshold of wrapping my head around it.
Assembly is different. You’re dealing with the hardware, shuffling data between registers and memory. You have to pay attention to each byte. And you’re keenly aware when you’re handing off control to the operating system (which is another black box that I’m digging into, on which more at a later date). It stops you from taking a lot for granted. It’s also strangely appealing to be working with such simple tools, at such a fundamental level.
As you can see, it’s a lot of work to do something very basic, but this is what any program ultimately boils down to. When the bash script runs, the processor is, under the hood, executing a set of instructions like this (but undoubtedly more complicated). All the Python code I write day-in, day-out, blithely schlepping objects around between databases and the web, ends up as an unimaginable spew of machine instructions. Working at this level gives you an appreciation for what an amazing structure of code we’ve built on top of this.