【Programming】How to Learn Assembly Language Using C
There are many ways to learn assembly language, but I recommend learning at least one assembly dialect.
Mastering one dialect makes it easier to learn others and allows you to deeply understand how computers work.
Without reading assembly, it's hard to truly understand a computer's mechanics.
Normally, I'd recommend starting with MIPS or RISC-V assembly because they're very simple.
However, this time, I'll assume you're using an Intel or AMD processor and focus on x64 assembly.
Workflow for Learning Assembly from C
- Write a simple C program.
- Compile it with cc -o program program.c.
- Disassemble the binary using objdump -d -Mintel program | less.
- Find the main function and analyze the assembly instructions.
- Map each assembly instruction to the corresponding C code.
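Compile without optimization (-O0, the default); in this article the command is cc main.c -o ftoc.
Optimization flags like -O2 rearrange and remove instructions, which makes the assembly much harder to map back to the C source, so stick with -O0 for learning.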
Converting C to Assembly
The easiest way to learn assembly is to first create a simple C program, compile it, and disassemble the binary with objdump.
This article uses GhostBSD, but Linux, Illumos, FreeBSD, OpenBSD, NetBSD, or other Unix-like OSes will work fine.
All the tools used in this article are likely already installed on your computer.
We'll use Intel syntax for assembly (specified with -Mintel in objdump).
This makes instructions read like mov eax, ebx instead of AT&T's movl %ebx, %eax.
Below is C code directly quoted from The C Programming Language, 2nd Edition by B.W. Kernighan and D.M. Ritchie, a monumental book for all C programmers.
#include <stdio.h>
int main() {
int fahr, celsius;
int lower, upper, step;
lower = 0; /* lower limit of temperature table */
upper = 300; /* upper limit */
step = 20; /* step size */
fahr = lower;
while (fahr <= upper) {
celsius = 5 * (fahr-32) / 9;
printf("%d\t%d\n", fahr, celsius);
fahr = fahr + step;
}
}
This C program converts Fahrenheit to Celsius, outputting a table from 0°F to 300°F in 20°F increments.
The formula used is celsius = 5 * (fahr - 32) / 9.
Next, compile it with cc main.c -o ftoc.
Then, run objdump -d -Mintel ./ftoc | less.
Press / and type main to find the main function.
What you see may vary slightly depending on your CPU architecture or OS.
In my case, the output looks like this:
00000000002016a0 <main>:
2016a0: 55 push rbp
2016a1: 48 89 e5 mov rbp, rsp
2016a4: 48 83 ec 20 sub rsp, 0x20
2016a8: c7 45 fc 00 00 00 00 mov dword ptr [rbp - 0x4], 0x0
2016af: c7 45 f0 00 00 00 00 mov dword ptr [rbp - 0x10], 0x0
2016b6: c7 45 ec 2c 01 00 00 mov dword ptr [rbp - 0x14], 0x12c
2016bd: c7 45 e8 14 00 00 00 mov dword ptr [rbp - 0x18], 0x14
2016c4: 8b 45 f0 mov eax, dword ptr [rbp - 0x10]
2016c7: 89 45 f8 mov dword ptr [rbp - 0x8], eax
2016ca: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016cd: 3b 45 ec cmp eax, dword ptr [rbp - 0x14]
2016d0: 7f 36 jg 0x201708 <main+0x68>
2016d2: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016d5: 83 e8 20 sub eax, 0x20
2016d8: 6b c0 05 imul eax, eax, 0x5
2016db: b9 09 00 00 00 mov ecx, 0x9
2016e0: 99 cdq
2016e1: f7 f9 idiv ecx
2016e3: 89 45 f4 mov dword ptr [rbp - 0xc], eax
2016e6: 8b 75 f8 mov esi, dword ptr [rbp - 0x8]
2016e9: 8b 55 f4 mov edx, dword ptr [rbp - 0xc]
2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs rdi, 0x2004d8
2016f6: b0 00 mov al, 0x0
2016f8: e8 c3 00 00 00 call 0x2017c0 <printf@plt>
2016fd: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
201700: 03 45 e8 add eax, dword ptr [rbp - 0x18]
201703: 89 45 f8 mov dword ptr [rbp - 0x8], eax
201706: eb c2 jmp 0x2016ca <main+0x2a>
201708: 8b 45 fc mov eax, dword ptr [rbp - 0x4]
20170b: 48 83 c4 20 add rsp, 0x20
20170f: 5d pop rbp
201710: c3 ret
201711: cc int3
201712: cc int3
201713: cc int3
201714: cc int3
201715: cc int3
201716: cc int3
201717: cc int3
201718: cc int3
201719: cc int3
20171a: cc int3
20171b: cc int3
20171c: cc int3
20171d: cc int3
20171e: cc int3
20171f: cc int3
The int3 instructions (opcode cc) at the end are padding that aligns the next function to a 16-byte boundary; int3 is a breakpoint trap, so if execution ever strays into the padding, the program traps instead of running garbage.
It doesn't affect the program's logic, so we can ignore it for now.
You'll notice all numbers are in hexadecimal.
For those who missed it, I wrote a detailed article on bitwise operations here.
Assembly Basics
Assembly language is a low-level language that directly corresponds to machine instructions.
Key concepts include:
- Registers: Small, fast storage in the CPU (e.g., rax, rbp, rsp).
- Stack: A memory area for managing local variables and function state, controlled by the stack pointer (rsp) and base pointer (rbp).
- Instructions: Commands like mov (move data), cmp (compare), jmp (jump to an address).
Simple Parts
Defining Integers
You might first notice this part:
2016af: c7 45 f0 00 00 00 00 mov dword ptr [rbp - 0x10], 0x0
2016b6: c7 45 ec 2c 01 00 00 mov dword ptr [rbp - 0x14], 0x12c
2016bd: c7 45 e8 14 00 00 00 mov dword ptr [rbp - 0x18], 0x14
2016c4: 8b 45 f0 mov eax, dword ptr [rbp - 0x10]
2016c7: 89 45 f8 mov dword ptr [rbp - 0x8], eax
This corresponds to the following C code:
lower = 0;
upper = 300;
step = 20;
fahr = lower;
rbp - 0x10 holds lower, rbp - 0x14 holds upper, rbp - 0x18 holds step, and rbp - 0x8 holds fahr.
The instruction mov eax, dword ptr [rbp - 0x10] loads lower (0) into eax (the lower 32 bits of the rax register).
rax is also the register used to store a function's return value.
mov dword ptr [rbp - 0x8], eax then assigns that value to fahr.
rbp is the base pointer for the current stack frame, and rsp is the current stack pointer.
At the start of the program, you'll see something like this:
2016a0: 55 push rbp
2016a1: 48 89 e5 mov rbp, rsp
2016a4: 48 83 ec 20 sub rsp, 0x20
These instructions set up the stack frame for the main function.
push rbp, mov rbp, rsp, and sub rsp, 0x20 form the standard prologue for an x64 function.
push rbp saves the caller's base pointer, mov rbp, rsp sets the current stack pointer as the new base pointer for main's stack frame, and sub rsp, 0x20 allocates 32 bytes for local variables and alignment.
This appears in almost every unoptimized function and doesn't directly correspond to any line of C code.
It's what the machine does in the background.
You'll also see notations like [rbp - 0x10].
The square brackets mean a memory access: take the address in rbp (the base pointer for the current stack frame), subtract the hexadecimal offset 0x10 (16), and read or write the local variable stored at that address (in this case, lower).
printf Function
Next, you can easily spot this part:
2016e6: 8b 75 f8 mov esi, dword ptr [rbp - 0x8]
2016e9: 8b 55 f4 mov edx, dword ptr [rbp - 0xc]
2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs rdi, 0x2004d8
2016f6: b0 00 mov al, 0x0
2016f8: e8 c3 00 00 00 call 0x2017c0 <printf@plt>
This corresponds to the printf() function inside the loop.
mov esi, dword ptr [rbp - 0x8] loads fahr into esi (printf's second argument).
mov edx, dword ptr [rbp - 0xc] loads celsius into edx (printf's third argument).
movabs rdi, 0x2004d8 loads the address of the format string "%d\t%d\n" into rdi (printf's first argument).
mov al, 0x0 sets al to 0, indicating no floating-point arguments are passed to printf in vector registers.
This is a requirement of the System V AMD64 calling convention for variadic functions, not of the processor itself.
Next, you'll see call 0x2017c0 <printf@plt>.
This calls printf via the Procedure Linkage Table (PLT) used in dynamically linked binaries, resolving the address of printf in libc.so at runtime.
In statically linked binaries, as we'll discuss later, the full printf implementation is included.
Let's check the address 0x2004d8.
First, press q to exit less.
Then, run objdump -d -Mintel -s -j .rodata ./ftoc.
In my case, the output looks like this:
./ftoc: file format elf64-x86-64
Contents of section .rodata:
2004d8 25640925 640a0000 %d.%d...
Disassembly of section .rodata:
00000000002004d8 <.rodata>:
2004d8: 25 64 09 25 64 and eax, 0x64250964
2004dd: 0a 00 or al, byte ptr [rax]
2004df: 00 <unknown>
In particular, look at the raw bytes 25 64 09 25 64 0a 00.
The mnemonics objdump prints next to them (and eax, ... and or al, ...) are meaningless here, because .rodata holds data, not code.
% = 0x25 and d = 0x64, so the byte sequence 25 64 represents %d.
\t = 0x09.
Then, 25 64 repeats, which makes sense since %d is used twice.
\n = 0x0a.
The final 0x00 is the null terminator inserted by the compiler.
The bytes appear in the same order as the characters; endianness doesn't matter for a string of single bytes.
Next, let's look at where the call to printf goes.
I won't go into detail, but if you search for address 2017c0, in my case, it's at the end of the binary.
You'll see something like this:
00000000002017c0 <printf@plt>:
2017c0: ff 25 aa 21 00 00 jmp qword ptr [rip + 0x21aa] # 0x203970 <printf+0x203970>
2017c6: 68 02 00 00 00 push 0x2
2017cb: e9 c0 ff ff ff jmp 0x201790 <.plt>
As homework, figure out what this means.
However, this is printf for a dynamically linked binary.
In a statically linked binary, the output would look like this:
0000000000227320 <printf>:
227320: 55 push rbp
227321: 48 89 e5 mov rbp, rsp
227324: 48 81 ec d0 00 00 00 sub rsp, 0xd0
22732b: 49 89 fa mov r10, rdi
22732e: 84 c0 test al, al
227330: 74 26 je 0x227358 <printf+0x38>
227332: 0f 29 85 60 ff ff ff movaps xmmword ptr [rbp - 0xa0], xmm0
227339: 0f 29 8d 70 ff ff ff movaps xmmword ptr [rbp - 0x90], xmm1
227340: 0f 29 55 80 movaps xmmword ptr [rbp - 0x80], xmm2
227344: 0f 29 5d 90 movaps xmmword ptr [rbp - 0x70], xmm3
227348: 0f 29 65 a0 movaps xmmword ptr [rbp - 0x60], xmm4
22734c: 0f 29 6d b0 movaps xmmword ptr [rbp - 0x50], xmm5
227350: 0f 29 75 c0 movaps xmmword ptr [rbp - 0x40], xmm6
227354: 0f 29 7d d0 movaps xmmword ptr [rbp - 0x30], xmm7
227358: 48 89 b5 38 ff ff ff mov qword ptr [rbp - 0xc8], rsi
22735f: 48 89 95 40 ff ff ff mov qword ptr [rbp - 0xc0], rdx
227366: 48 89 8d 48 ff ff ff mov qword ptr [rbp - 0xb8], rcx
22736d: 4c 89 85 50 ff ff ff mov qword ptr [rbp - 0xb0], r8
227374: 4c 89 8d 58 ff ff ff mov qword ptr [rbp - 0xa8], r9
22737b: 48 8b 05 ae 3c 07 00 mov rax, qword ptr [rip + 0x73cae] # 0x29b030 <__stack_chk_guard>
227382: 48 89 45 f8 mov qword ptr [rbp - 0x8], rax
227386: 48 8d 85 30 ff ff ff lea rax, [rbp - 0xd0]
22738d: 48 89 45 f0 mov qword ptr [rbp - 0x10], rax
227391: 48 8d 45 10 lea rax, [rbp + 0x10]
227395: 48 89 45 e8 mov qword ptr [rbp - 0x18], rax
227399: 48 b8 08 00 00 00 30 00 00 00 movabs rax, 0x3000000008
2273a3: 48 89 45 e0 mov qword ptr [rbp - 0x20], rax
2273a7: 48 8b 3d 22 ea 06 00 mov rdi, qword ptr [rip + 0x6ea22] # 0x295dd0 <__stdoutp>
2273ae: 48 8d 55 e0 lea rdx, [rbp - 0x20]
2273b2: 4c 89 d6 mov rsi, r10
2273b5: e8 46 40 00 00 call 0x22b400 <vfprintf>
2273ba: 48 8b 0d 6f 3c 07 00 mov rcx, qword ptr [rip + 0x73c6f] # 0x29b030 <__stack_chk_guard>
2273c1: 48 3b 4d f8 cmp rcx, qword ptr [rbp - 0x8]
2273c5: 75 09 jne 0x2273d0 <printf+0xb0>
2273c7: 48 81 c4 d0 00 00 00 add rsp, 0xd0
2273ce: 5d pop rbp
2273cf: c3 ret
2273d0: e8 1b cc 00 00 call 0x233ff0 <__stack_chk_fail_local>
2273d5: 66 66 2e 0f 1f 84 00 00 00 00 00 nop word ptr cs:[rax + rax]
As additional homework, investigate the printf@plt (dynamic) and printf (static) functions.
In static linking, the printf function and all its dependencies are included, but in dynamic linking, it only points to an address in the libc.so file somewhere on the system.
In that case, you can check with objdump -d -Mintel /lib/libc.so.7 | less (the filename or path may vary by OS).
You'll see something like this:
00000000001cd370 <printf@plt>:
1cd370: ff 25 82 1d 01 00 jmp qword ptr [rip + 0x11d82] # 0x1df0f8
1cd376: 68 93 01 00 00 push 0x193
1cd37b: e9 b0 e6 ff ff jmp 0x1cba30 <.plt>
Unfortunately, you won't find the full definition in this object dump.
However, you can dump the entire library with objdump -d /lib/libc.so.7 > libc_disasm.txt.
Then, run less libc_disasm.txt, press /, and search for <printf>.
You'll see the full definition, which looks like this (in AT&T syntax):
000000000011b220 <printf>:
11b220: 55 pushq %rbp
11b221: 48 89 e5 movq %rsp, %rbp
11b224: 53 pushq %rbx
11b225: 48 81 ec d8 00 00 00 subq $0xd8, %rsp
11b22c: 49 89 fa movq %rdi, %r10
11b22f: 84 c0 testb %al, %al
11b231: 74 29 je 0x11b25c <printf+0x3c>
11b233: 0f 29 85 50 ff ff ff movaps %xmm0, -0xb0(%rbp)
11b23a: 0f 29 8d 60 ff ff ff movaps %xmm1, -0xa0(%rbp)
11b241: 0f 29 95 70 ff ff ff movaps %xmm2, -0x90(%rbp)
11b248: 0f 29 5d 80 movaps %xmm3, -0x80(%rbp)
11b24c: 0f 29 65 90 movaps %xmm4, -0x70(%rbp)
11b250: 0f 29 6d a0 movaps %xmm5, -0x60(%rbp)
11b254: 0f 29 75 b0 movaps %xmm6, -0x50(%rbp)
11b258: 0f 29 7d c0 movaps %xmm7, -0x40(%rbp)
11b25c: 48 89 b5 28 ff ff ff movq %rsi, -0xd8(%rbp)
11b263: 48 89 95 30 ff ff ff movq %rdx, -0xd0(%rbp)
11b26a: 48 89 8d 38 ff ff ff movq %rcx, -0xc8(%rbp)
11b271: 4c 89 85 40 ff ff ff movq %r8, -0xc0(%rbp)
11b278: 4c 89 8d 48 ff ff ff movq %r9, -0xb8(%rbp)
11b27f: 48 8b 1d 3a dc 0b 00 movq 0xbdc3a(%rip), %rbx # 0x1d8ec0
11b286: 48 8b 03 movq (%rbx), %rax
11b289: 48 89 45 f0 movq %rax, -0x10(%rbp)
11b28d: 48 8d 85 20 ff ff ff leaq -0xe0(%rbp), %rax
11b294: 48 89 45 e0 movq %rax, -0x20(%rbp)
11b298: 48 8d 45 10 leaq 0x10(%rbp), %rax
11b29c: 48 89 45 d8 movq %rax, -0x28(%rbp)
11b2a0: 48 b8 08 00 00 00 30 00 00 00 movabsq $0x3000000008, %rax # imm = 0x3000000008
11b2aa: 48 89 45 d0 movq %rax, -0x30(%rbp)
11b2ae: 48 8b 05 1b dd 0b 00 movq 0xbdd1b(%rip), %rax # 0x1d8fd0
11b2b5: 48 8b 38 movq (%rax), %rdi
11b2b8: 48 8d 55 d0 leaq -0x30(%rbp), %rdx
11b2bc: 4c 89 d6 movq %r10, %rsi
11b2bf: e8 cc 11 0b 00 callq 0x1cc490 <vfprintf@plt>
11b2c4: 48 8b 0b movq (%rbx), %rcx
11b2c7: 48 3b 4d f0 cmpq -0x10(%rbp), %rcx
11b2cb: 75 0a jne 0x11b2d7 <printf+0xb7>
11b2cd: 48 81 c4 d8 00 00 00 addq $0xd8, %rsp
11b2d4: 5b popq %rbx
11b2d5: 5d popq %rbp
11b2d6: c3 retq
11b2d7: e8 a4 07 0b 00 callq 0x1cba80 <__stack_chk_fail@plt>
11b2dc: 0f 1f 40 00 nopl (%rax)
Simple, right?
Now, let's move on to the trickier parts!
Trickier Parts
The while loop condition is constructed like this:
2016ca: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016cd: 3b 45 ec cmp eax, dword ptr [rbp - 0x14]
2016d0: 7f 36 jg 0x201708 <main+0x68>
This checks the while loop condition, i.e., whether fahr is less than or equal to upper.
mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
cmp eax, dword ptr [rbp - 0x14] compares fahr with upper (stored at rbp - 0x14).
Together, these represent fahr <= upper.
jg 0x201708 jumps to address 201708 (the end of the loop) if fahr is greater than upper.
This is the negation of fahr <= upper, so the loop continues if fahr is less than or equal to upper.
Inside the loop, you'll see code like this:
2016d2: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016d5: 83 e8 20 sub eax, 0x20
2016d8: 6b c0 05 imul eax, eax, 0x5
2016db: b9 09 00 00 00 mov ecx, 0x9
2016e0: 99 cdq
2016e1: f7 f9 idiv ecx
2016e3: 89 45 f4 mov dword ptr [rbp - 0xc], eax
This calculates celsius = 5 * (fahr - 32) / 9.
mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
sub eax, 0x20 subtracts 32 (0x20) from eax, corresponding to fahr - 32 in C.
imul eax, eax, 0x5 multiplies eax by 5, corresponding to 5 * (fahr - 32) in C.
mov ecx, 0x9 loads 9 into ecx (the divisor).
cdq sign-extends eax into edx:eax to prepare for the idiv instruction.
This is used to perform signed division on the 64-bit value in edx:eax, correctly handling negative numbers for 5 * (fahr - 32) / 9.
idiv ecx divides edx:eax by ecx (9), storing the quotient in eax.
mov dword ptr [rbp - 0xc], eax stores the result in celsius (rbp - 0xc).
In other words, one line of C code requires seven assembly instructions.
Next, the following calculation is performed:
2016fd: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
201700: 03 45 e8 add eax, dword ptr [rbp - 0x18]
201703: 89 45 f8 mov dword ptr [rbp - 0x8], eax
With all the explanations so far, you should understand that this means fahr = fahr + step;.
Then, you'll see this:
201706: eb c2 jmp 0x2016ca <main+0x2a>
This jumps back to the start of the while loop (address 2016ca) to check if fahr is still less than or equal to upper.
If the answer is false, the loop ends; if true, it continues.
Finally, there's this:
201708: 8b 45 fc mov eax, dword ptr [rbp - 0x4]
20170b: 48 83 c4 20 add rsp, 0x20
20170f: 5d pop rbp
201710: c3 ret
This cleans up the stack frame and returns from main.
mov eax, dword ptr [rbp - 0x4] loads the value at rbp - 0x4 into eax, the register that carries the return value.
That slot was initialized to 0 at address 2016a8 (mov dword ptr [rbp - 0x4], 0x0).
In C (since C99), if main reaches its closing brace without an explicit return statement, it returns 0.
Note that rbp - 0x4 is not one of our variables (fahr lives at rbp - 0x8); it's a slot the compiler reserved for main's return value.
add rsp, 0x20 frees the 32 bytes of stack space.
pop rbp restores the caller's base pointer.
ret returns to the caller.
Still pretty simple, right?
Explicit vs. Implicit Return Value
In this example, we omitted return 0;, which is allowed in C (since C99) and C++ only for main().
However, adding return 0; to the C code slightly changes the assembly output.
00000000002016a0 <main>:
2016a0: 55 push rbp
2016a1: 48 89 e5 mov rbp, rsp
2016a4: 48 83 ec 20 sub rsp, 0x20
2016a8: c7 45 fc 00 00 00 00 mov dword ptr [rbp - 0x4], 0x0
2016af: c7 45 f0 00 00 00 00 mov dword ptr [rbp - 0x10], 0x0
2016b6: c7 45 ec 2c 01 00 00 mov dword ptr [rbp - 0x14], 0x12c
2016bd: c7 45 e8 14 00 00 00 mov dword ptr [rbp - 0x18], 0x14
2016c4: 8b 45 f0 mov eax, dword ptr [rbp - 0x10]
2016c7: 89 45 f8 mov dword ptr [rbp - 0x8], eax
2016ca: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016cd: 3b 45 ec cmp eax, dword ptr [rbp - 0x14]
2016d0: 7f 36 jg 0x201708 <main+0x68>
2016d2: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016d5: 83 e8 20 sub eax, 0x20
2016d8: 6b c0 05 imul eax, eax, 0x5
2016db: b9 09 00 00 00 mov ecx, 0x9
2016e0: 99 cdq
2016e1: f7 f9 idiv ecx
2016e3: 89 45 f4 mov dword ptr [rbp - 0xc], eax
2016e6: 8b 75 f8 mov esi, dword ptr [rbp - 0x8]
2016e9: 8b 55 f4 mov edx, dword ptr [rbp - 0xc]
2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs rdi, 0x2004d8
2016f6: b0 00 mov al, 0x0
2016f8: e8 b3 00 00 00 call 0x2017b0 <printf@plt>
2016fd: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
201700: 03 45 e8 add eax, dword ptr [rbp - 0x18]
201703: 89 45 f8 mov dword ptr [rbp - 0x8], eax
201706: eb c2 jmp 0x2016ca <main+0x2a>
201708: 31 c0 xor eax, eax
20170a: 48 83 c4 20 add rsp, 0x20
20170e: 5d pop rbp
20170f: c3 ret
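First, the int3 padding is gone: the function is now one byte shorter and ends exactly at address 201710, which is already a multiple of 16, so no alignment padding is needed.
The other change is at address 201708. Previously, it read:
201708: 8b 45 fc mov eax, dword ptr [rbp - 0x4]
Now, you'll see this:
201708: 31 c0 xor eax, eax
In other words, instead of loading the return value from the stack, the code sets eax to 0 directly (XORing a register with itself always yields 0).
This avoids a memory access and saves one byte (two bytes instead of three).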
Conclusion
Overall, it's not that difficult.
It might seem intimidating at first, but once you understand what's happening, assembly quickly becomes much easier to read.
Once you understand assembly, all software becomes open-source.
That's all