【Programming】Learning Machine Code Through Assembly Language
Before proceeding, if you haven't read the following articles or need a review, please read them in this order first:
This article builds on the knowledge explained in the above two articles, and you won't be able to understand its content without reading them beforehand.
Assembly is Almost Machine Code
Do you remember the objdump command used in the previous article?
We'll use it again in this article!
The reason is that it produces output like this:
2016db: b9 09 00 00 00 mov ecx, 0x9
Pay attention to the hexadecimal values on the left and the assembly code on the right.
Here's the secret: they mean exactly the same thing!
That's right, assembly code is a human-readable representation of those hexadecimal values!
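b9 means "move the next 32-bit immediate value into ecx" (the opcodes b8 through bf all encode this kind of move, and the low three bits of the opcode select the register), and 09 00 00 00 represents the immediate value "9" in little-endian format.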
In binary, it's written as: 10111001 00001001 00000000 00000000 00000000
And this binary representation is the machine code!
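You can check this decoding yourself with a few lines of Python. This is only a minimal sketch, not a real disassembler: the function name decode_mov_imm32 is made up for this example, and it assumes a plain 5-byte instruction with an opcode from b8 to bf and no prefixes.

```python
# 32-bit register names, indexed by their 3-bit register code.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(code: bytes) -> str:
    """Decode a 5-byte `mov r32, imm32` instruction (opcodes 0xB8..0xBF)."""
    opcode = code[0]
    if len(code) != 5 or not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a mov r32, imm32 encoding")
    # The distance from 0xB8 is the register code (0xB9 - 0xB8 = 1 = ecx).
    reg = REG32[opcode - 0xB8]
    # The four bytes after the opcode are the immediate, lowest byte first.
    imm = int.from_bytes(code[1:5], "little")
    return f"mov {reg}, {hex(imm)}"


print(decode_mov_imm32(bytes.fromhex("b909000000")))  # mov ecx, 0x9
```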
Next example:
2016d2: 8b 45 f8 mov eax, dword ptr [rbp - 0x8]
2016e6: 8b 75 f8 mov esi, dword ptr [rbp - 0x8]
2016e9: 8b 55 f4 mov edx, dword ptr [rbp - 0xc]
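8b means "move a 32-bit value from a register or memory location into the specified register". The second bytes, 45, 75, and 55, encode the addressing mode together with the destination register and the base register.
In these three examples, the first hex digit tracks the destination register: 4 goes with eax, 7 with esi, and 5 with edx.
The second digit, 5, is shared by all three and goes with rbp.
Note that this digit-level reading only happens to line up for these three bytes; the actual fields are defined at the bit level, as the ModR/M section explains.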
Finally, f8 represents -0x8, and f4 represents -0xc, which are offsets from the base pointer rbp.
Converting the hexadecimal to binary:
8b 45 f8 → 10001011 01000101 11111000
8b 75 f8 → 10001011 01110101 11111000
8b 55 f4 → 10001011 01010101 11110100
Again, the binary representation is what the machine truly understands.
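If you want to double-check these conversions, here's a small Python sketch. The helper name to_binary is just something made up for this example.

```python
def to_binary(hex_bytes: str) -> str:
    """Convert a string of hex bytes such as "8b 45 f8" to 8-bit binary groups."""
    # bytes.fromhex ignores the spaces between the byte pairs.
    return " ".join(f"{b:08b}" for b in bytes.fromhex(hex_bytes))


for instruction in ("8b 45 f8", "8b 75 f8", "8b 55 f4"):
    print(instruction, "→", to_binary(instruction))
```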
ModR/M Byte
Let's look at the middle part, namely 45, 75, and 55.
Now that we have the binary representation, we can explain the ModR/M byte.
The byte is divided into the following three bit fields:
Bit: 7 6 | 5 4 3 | 2 1 0
Field: Mod | Reg | R/M
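The layout above can be pulled apart with shifts and masks. Here's a minimal Python sketch; the function name split_modrm is made up for this example.

```python
def split_modrm(byte: int):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6
    reg = (byte >> 3) & 0b111  # bits 5-3
    rm = byte & 0b111          # bits 2-0
    return mod, reg, rm


for byte in (0x45, 0x75, 0x55):
    mod, reg, rm = split_modrm(byte)
    print(f"{byte:02x}: Mod={mod:02b} Reg={reg:03b} R/M={rm:03b}")
```

For 45 this prints Mod=01 Reg=000 R/M=101.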
In binary, the byte for 45 in 8b 45 f8 becomes 01000101.
The Mod is composed of the bits 01.
This indicates a memory reference with an 8-bit displacement ([rbp - 0x8]).
In other words, the memory address is calculated as [base register + displacement], where the displacement is a signed 8-bit value.
Here's a list of possible Mod field values:
- 00 → No displacement, register indirect addressing
- 01 → 8-bit signed displacement
- 10 → 32-bit signed displacement
- 11 → Register direct addressing
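One practical use of the Mod field is that it tells you how many displacement bytes come after the ModR/M byte. Here's a rough Python sketch of that rule. The function name displacement_size is made up for this example, and it ignores the extra rules that apply when a SIB byte is present.

```python
def displacement_size(mod: int, rm: int) -> int:
    """Return how many displacement bytes follow the ModR/M byte.

    Simplified: the additional rules for SIB bytes are not handled here.
    """
    if mod == 0b00:
        # Special case: R/M = 101 with Mod = 00 means a bare 32-bit displacement.
        return 4 if rm == 0b101 else 0
    if mod == 0b01:
        return 1  # 8-bit signed displacement
    if mod == 0b10:
        return 4  # 32-bit signed displacement
    if mod == 0b11:
        return 0  # register direct, no memory access
    raise ValueError("mod must be a 2-bit value")


print(displacement_size(0b01, 0b101))  # 1, as in 8b 45 f8
```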
The Reg is composed of the bits 000.
This specifies the destination register, in this case eax, which is register code 000 in x86 encoding.
In the other two examples, the Reg bits are 110 (6), which means esi, and 010 (2), which means edx.
Here's a list of standard x86 (32-bit) register codes:
- 000 → eax
- 001 → ecx
- 010 → edx
- 011 → ebx
- 100 → esp
- 101 → ebp
- 110 → esi
- 111 → edi
The R/M is composed of the bits 101.
This specifies the base register, in this case rbp.
Code list:
- 000 → Memory address [eax]
- 001 → Memory address [ecx]
- 010 → Memory address [edx]
- 011 → Memory address [ebx]
- 100 → Special case: indicates that a SIB (Scale-Index-Base) byte follows to specify a complex memory address (e.g., [eax + ebx*4])
- 101 → If Mod = 00, it means a 32-bit displacement (no base register); if Mod = 01 or Mod = 10, it means [ebp + displacement]
- 110 → Memory address [esi]
- 111 → Memory address [edi]
The displacement is f8, which comes right after 45, and as explained earlier, it means -8, or -0x8 in hexadecimal.
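Putting all the pieces together, here's a Python sketch that decodes the three example instructions from their raw bytes. It's deliberately narrow: the function name decode_mov_disp8 is made up for this example, it only accepts opcode 8b with Mod = 01 and no SIB byte, and it assumes 64-bit code without prefixes, which is why the base register is printed as rbp rather than ebp.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Assumption: 64-bit code, so base registers use their 64-bit names.
BASE64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(code: bytes) -> str:
    """Decode `8b /r` with Mod = 01: mov r32, dword ptr [base + disp8]."""
    if len(code) != 3 or code[0] != 0x8B:
        raise ValueError("expected a 3-byte instruction starting with 8b")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only Mod = 01 without a SIB byte is handled")
    # The displacement is one signed byte: f8 is -8, f4 is -12.
    disp = int.from_bytes(code[2:3], "little", signed=True)
    sign = "-" if disp < 0 else "+"
    return f"mov {REG32[reg]}, dword ptr [{BASE64[rm]} {sign} {hex(abs(disp))}]"


for raw in ("8b 45 f8", "8b 75 f8", "8b 55 f4"):
    print(raw, "→", decode_mov_disp8(bytes.fromhex(raw)))
```

The printed lines should match the assembly column of the objdump output shown earlier.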
By disassembling a program written in C, you can learn not only assembly but also the lowest-level machine code!
Pretty cool, right?
That's all