2025-09-23 15:31:46

Suwako

lowlevel

assembly

【Programming】How to Learn Assembly Language Using C

There are many ways to learn assembly language, but whichever route you take, I recommend learning at least one assembly dialect well.
Mastering one dialect makes the others easier to pick up and gives you a deep understanding of how computers work.
Without being able to read assembly, it's hard to truly understand what a computer is doing.
Normally, I'd recommend starting with MIPS or RISC-V assembly because they're very simple.
However, this time, I'll assume you're using an Intel or AMD processor and focus on x64 assembly.

Workflow for Learning Assembly from C

Write a simple C program.
Compile it with cc -o program program.c.
Disassemble the binary using objdump -d -Mintel program | less.
Find the main function and analyze the assembly instructions.
Map each assembly instruction to the corresponding C code.

Compile without optimization: cc main.c -o ftoc is enough, because -O0 is the default.
Optimization flags like -O2 rearrange and remove code, which makes the assembly much harder to map back to C, so stick with -O0 while learning.

Converting C to Assembly

The easiest way to learn assembly is to first create a simple C program, compile it, and disassemble the binary with objdump.
This article uses GhostBSD, but Linux, Illumos, FreeBSD, OpenBSD, NetBSD, or other Unix-like OSes will work fine.
All the tools used in this article are likely already installed on your computer.
We'll use Intel syntax for assembly (specified with -Mintel in objdump).
This makes instructions read like mov eax, ebx instead of AT&T's movl %ebx, %eax.

Below is C code directly quoted from The C Programming Language, 2nd Edition by B.W. Kernighan and D.M. Ritchie, a monumental book for all C programmers.

#include <stdio.h>

int main() {
  int fahr, celsius;
  int lower, upper, step;

  lower = 0;      /* lower limit of temperature table */
  upper = 300;    /* upper limit */
  step = 20;      /* step size */

  fahr = lower;
  while (fahr <= upper) {
    celsius = 5 * (fahr-32) / 9;
    printf("%d\t%d\n", fahr, celsius);
    fahr = fahr + step;
  }
}

This C program converts Fahrenheit to Celsius, outputting a table from 0°F to 300°F in 20°F increments.
The formula used is celsius = 5 * (fahr - 32) / 9.
Next, compile it with cc main.c -o ftoc.
Then, run objdump -d -Mintel ./ftoc | less.
Press / and type main to find the main function.

What you see may vary slightly depending on your CPU architecture or OS.
In my case, the output looks like this:

00000000002016a0 <main>:
  2016a0: 55                            push    rbp
  2016a1: 48 89 e5                      mov     rbp, rsp
  2016a4: 48 83 ec 20                   sub     rsp, 0x20
  2016a8: c7 45 fc 00 00 00 00          mov     dword ptr [rbp - 0x4], 0x0
  2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
  2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
  2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
  2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
  2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
  2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
  2016d0: 7f 36                         jg      0x201708 <main+0x68>
  2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016d5: 83 e8 20                      sub     eax, 0x20
  2016d8: 6b c0 05                      imul    eax, eax, 0x5
  2016db: b9 09 00 00 00                mov     ecx, 0x9
  2016e0: 99                            cdq
  2016e1: f7 f9                         idiv    ecx
  2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax
  2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
  2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
  2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
  2016f6: b0 00                         mov     al, 0x0
  2016f8: e8 c3 00 00 00                call    0x2017c0 <printf@plt>
  2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
  201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
  201706: eb c2                         jmp     0x2016ca <main+0x2a>
  201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]
  20170b: 48 83 c4 20                   add     rsp, 0x20
  20170f: 5d                            pop     rbp
  201710: c3                            ret
  201711: cc                            int3
  201712: cc                            int3
  201713: cc                            int3
  201714: cc                            int3
  201715: cc                            int3
  201716: cc                            int3
  201717: cc                            int3
  201718: cc                            int3
  201719: cc                            int3
  20171a: cc                            int3
  20171b: cc                            int3
  20171c: cc                            int3
  20171d: cc                            int3
  20171e: cc                            int3
  20171f: cc                            int3

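The int3 instructions (opcode cc) at the end are padding that the toolchain inserts so that the next function starts at an aligned address (here 201720, a multiple of 16).
int3 is a breakpoint trap, so a stray jump into the padding stops the program instead of executing garbage.
It doesn't affect the program's logic, so we can ignore it for now.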

You'll notice all numbers are in hexadecimal.
For those who missed it, I wrote a detailed article on bitwise operations here.

Assembly Basics

Assembly language is a low-level language that directly corresponds to machine instructions.
Key concepts include:

Registers: Small, fast storage in the CPU (e.g., rax, rbp, rsp).
Stack: A memory area for managing local variables and function state, controlled by the stack pointer (rsp) and base pointer (rbp).
Instructions: Commands like mov (move data), cmp (compare), jmp (jump to an address).

Simple Parts

Defining Integers

You might first notice this part:

  2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
  2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
  2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
  2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
  2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax

This corresponds to the following C code:

lower = 0;
upper = 300;
step = 20;
fahr = lower;

rbp - 0x10 holds lower, rbp - 0x14 holds upper, rbp - 0x18 holds step, and rbp - 0x8 holds fahr.
The instruction mov eax, dword ptr [rbp - 0x10] loads lower (0) into eax, which is the low 32 bits of the rax register.
rax is also where a function's return value goes, but here it only serves as a temporary.
mov dword ptr [rbp - 0x8], eax assigns lower to fahr.

rbp is the base pointer for the current stack frame, and rsp is the current stack pointer.
At the start of the program, you'll see something like this:

  2016a0: 55                            push    rbp
  2016a1: 48 89 e5                      mov     rbp, rsp
  2016a4: 48 83 ec 20                   sub     rsp, 0x20

These instructions set up the stack frame for the main function.

push rbp, mov rbp, rsp, and sub rsp, 0x20 form the standard prologue for an x64 function.
push rbp saves the caller's base pointer, mov rbp, rsp sets the current stack pointer as the new base pointer for main's stack frame, and sub rsp, 0x20 allocates 32 bytes for local variables and alignment.
You'll find this prologue at the start of most unoptimized functions, and it doesn't correspond to any line of C code.
It's bookkeeping that the compiler generates in the background.

You'll also see notations like [rbp - 0x10].
The square brackets denote a memory access: take the address in rbp (the base of the current stack frame), subtract the hexadecimal offset 0x10 (16), and read or write the value stored at that address (in this case, lower).

printf Function

Next, you can easily spot this part:

  2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
  2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
  2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
  2016f6: b0 00                         mov     al, 0x0
  2016f8: e8 c3 00 00 00                call    0x2017c0 <printf@plt>

This corresponds to the printf() function inside the loop.
mov esi, dword ptr [rbp - 0x8] loads fahr into esi (printf's second argument).
mov edx, dword ptr [rbp - 0xc] loads celsius into edx (printf's third argument).
movabs rdi, 0x2004d8 loads the address of the format string "%d\t%d\n" into rdi (printf's first argument).
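mov al, 0x0 sets al to 0, telling printf that no floating-point arguments are passed in vector registers.
This is a rule of the System V AMD64 calling convention for variadic functions like printf, not a requirement of the processor itself.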

Next, you'll see call 0x2017c0 <printf@plt>.
This calls printf via the Procedure Linkage Table (PLT) used in dynamically linked binaries, resolving the address of printf in libc.so at runtime.
In statically linked binaries, as we'll discuss later, the full printf implementation is included.

Let's check the address 0x2004d8.
First, press q to exit the current objdump instance.
Then, run objdump -d -Mintel -s -j .rodata ./ftoc.

In my case, the output looks like this:

./ftoc:  file format elf64-x86-64
Contents of section .rodata:
 2004d8 25640925 640a0000                    %d.%d...

Disassembly of section .rodata:

00000000002004d8 <.rodata>:
  2004d8: 25 64 09 25 64                and     eax, 0x64250964
  2004dd: 0a 00                         or      al, byte ptr [rax]
  2004df: 00                            <unknown>

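Because we asked objdump to disassemble .rodata, it tries to decode the string's bytes as instructions.
The and and or mnemonics are meaningless here; only the raw bytes matter.
In particular, look at 2004d8: 25 64 09 25 64.
% = 0x25
d = 0x64
A string is stored one byte after another in memory order, so the byte sequence 25 64 represents %d.
\t = 0x09
Then, 25 64 repeats, which makes sense since %d is used twice.
Little-endian byte order only shows up in the bogus immediate 0x64250964, which objdump assembled from the bytes 64 09 25 64.

The next line shows 2004dd: 0a 00.
\n = 0x0a
The final 0x00 is the null terminator inserted by the compiler.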

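Now let's look at the target of that call, printf@plt.
I won't go into detail, but if you search for address 2017c0 in the disassembly of ftoc (objdump -d -Mintel ./ftoc | less), you'll find it near the end of the binary, at least in my case.
You'll see something like this: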

00000000002017c0 <printf@plt>:
  2017c0: ff 25 aa 21 00 00             jmp     qword ptr [rip + 0x21aa] # 0x203970 <printf+0x203970>
  2017c6: 68 02 00 00 00                push    0x2
  2017cb: e9 c0 ff ff ff                jmp     0x201790 <.plt>

As homework, figure out what this means.

However, this is printf for a dynamically linked binary.
In a statically linked binary, the output would look like this:

0000000000227320 <printf>:
  227320: 55                            push    rbp
  227321: 48 89 e5                      mov     rbp, rsp
  227324: 48 81 ec d0 00 00 00          sub     rsp, 0xd0
  22732b: 49 89 fa                      mov     r10, rdi
  22732e: 84 c0                         test    al, al
  227330: 74 26                         je      0x227358 <printf+0x38>
  227332: 0f 29 85 60 ff ff ff          movaps  xmmword ptr [rbp - 0xa0], xmm0
  227339: 0f 29 8d 70 ff ff ff          movaps  xmmword ptr [rbp - 0x90], xmm1
  227340: 0f 29 55 80                   movaps  xmmword ptr [rbp - 0x80], xmm2
  227344: 0f 29 5d 90                   movaps  xmmword ptr [rbp - 0x70], xmm3
  227348: 0f 29 65 a0                   movaps  xmmword ptr [rbp - 0x60], xmm4
  22734c: 0f 29 6d b0                   movaps  xmmword ptr [rbp - 0x50], xmm5
  227350: 0f 29 75 c0                   movaps  xmmword ptr [rbp - 0x40], xmm6
  227354: 0f 29 7d d0                   movaps  xmmword ptr [rbp - 0x30], xmm7
  227358: 48 89 b5 38 ff ff ff          mov     qword ptr [rbp - 0xc8], rsi
  22735f: 48 89 95 40 ff ff ff          mov     qword ptr [rbp - 0xc0], rdx
  227366: 48 89 8d 48 ff ff ff          mov     qword ptr [rbp - 0xb8], rcx
  22736d: 4c 89 85 50 ff ff ff          mov     qword ptr [rbp - 0xb0], r8
  227374: 4c 89 8d 58 ff ff ff          mov     qword ptr [rbp - 0xa8], r9
  22737b: 48 8b 05 ae 3c 07 00          mov     rax, qword ptr [rip + 0x73cae] # 0x29b030 <__stack_chk_guard>
  227382: 48 89 45 f8                   mov     qword ptr [rbp - 0x8], rax
  227386: 48 8d 85 30 ff ff ff          lea     rax, [rbp - 0xd0]
  22738d: 48 89 45 f0                   mov     qword ptr [rbp - 0x10], rax
  227391: 48 8d 45 10                   lea     rax, [rbp + 0x10]
  227395: 48 89 45 e8                   mov     qword ptr [rbp - 0x18], rax
  227399: 48 b8 08 00 00 00 30 00 00 00 movabs  rax, 0x3000000008
  2273a3: 48 89 45 e0                   mov     qword ptr [rbp - 0x20], rax
  2273a7: 48 8b 3d 22 ea 06 00          mov     rdi, qword ptr [rip + 0x6ea22] # 0x295dd0 <__stdoutp>
  2273ae: 48 8d 55 e0                   lea     rdx, [rbp - 0x20]
  2273b2: 4c 89 d6                      mov     rsi, r10
  2273b5: e8 46 40 00 00                call    0x22b400 <vfprintf>
  2273ba: 48 8b 0d 6f 3c 07 00          mov     rcx, qword ptr [rip + 0x73c6f] # 0x29b030 <__stack_chk_guard>
  2273c1: 48 3b 4d f8                   cmp     rcx, qword ptr [rbp - 0x8]
  2273c5: 75 09                         jne     0x2273d0 <printf+0xb0>
  2273c7: 48 81 c4 d0 00 00 00          add     rsp, 0xd0
  2273ce: 5d                            pop     rbp
  2273cf: c3                            ret
  2273d0: e8 1b cc 00 00                call    0x233ff0 <__stack_chk_fail_local>
  2273d5: 66 66 2e 0f 1f 84 00 00 00 00 00      nop     word ptr cs:[rax + rax]

As additional homework, investigate the printf@plt (dynamic) and printf (static) functions.

In static linking, printf and everything it depends on are copied into the binary; in dynamic linking, the binary only contains a stub that jumps to printf inside the libc.so file somewhere on the system, which is located at runtime.

In that case, you can check with objdump -d -Mintel /lib/libc.so.7 | less (the filename or path may vary by OS).
You'll see something like this:

00000000001cd370 <printf@plt>:
  1cd370: ff 25 82 1d 01 00             jmp     qword ptr [rip + 0x11d82] # 0x1df0f8
  1cd376: 68 93 01 00 00                push    0x193
  1cd37b: e9 b0 e6 ff ff                jmp     0x1cba30 <.plt>

Unfortunately, this is only libc's own PLT stub, not the full definition.
To find the definition, dump the entire library with objdump -d /lib/libc.so.7 > libc_disasm.txt.
Then, run less libc_disasm.txt, press /, and search for <printf>.
You'll see the full definition, which looks like this (in AT&T syntax, since this command omits -Mintel):

000000000011b220 <printf>:
  11b220: 55                            pushq   %rbp
  11b221: 48 89 e5                      movq    %rsp, %rbp
  11b224: 53                            pushq   %rbx
  11b225: 48 81 ec d8 00 00 00          subq    $0xd8, %rsp
  11b22c: 49 89 fa                      movq    %rdi, %r10
  11b22f: 84 c0                         testb   %al, %al
  11b231: 74 29                         je      0x11b25c <printf+0x3c>
  11b233: 0f 29 85 50 ff ff ff          movaps  %xmm0, -0xb0(%rbp)
  11b23a: 0f 29 8d 60 ff ff ff          movaps  %xmm1, -0xa0(%rbp)
  11b241: 0f 29 95 70 ff ff ff          movaps  %xmm2, -0x90(%rbp)
  11b248: 0f 29 5d 80                   movaps  %xmm3, -0x80(%rbp)
  11b24c: 0f 29 65 90                   movaps  %xmm4, -0x70(%rbp)
  11b250: 0f 29 6d a0                   movaps  %xmm5, -0x60(%rbp)
  11b254: 0f 29 75 b0                   movaps  %xmm6, -0x50(%rbp)
  11b258: 0f 29 7d c0                   movaps  %xmm7, -0x40(%rbp)
  11b25c: 48 89 b5 28 ff ff ff          movq    %rsi, -0xd8(%rbp)
  11b263: 48 89 95 30 ff ff ff          movq    %rdx, -0xd0(%rbp)
  11b26a: 48 89 8d 38 ff ff ff          movq    %rcx, -0xc8(%rbp)
  11b271: 4c 89 85 40 ff ff ff          movq    %r8, -0xc0(%rbp)
  11b278: 4c 89 8d 48 ff ff ff          movq    %r9, -0xb8(%rbp)
  11b27f: 48 8b 1d 3a dc 0b 00          movq    0xbdc3a(%rip), %rbx     # 0x1d8ec0
  11b286: 48 8b 03                      movq    (%rbx), %rax
  11b289: 48 89 45 f0                   movq    %rax, -0x10(%rbp)
  11b28d: 48 8d 85 20 ff ff ff          leaq    -0xe0(%rbp), %rax
  11b294: 48 89 45 e0                   movq    %rax, -0x20(%rbp)
  11b298: 48 8d 45 10                   leaq    0x10(%rbp), %rax
  11b29c: 48 89 45 d8                   movq    %rax, -0x28(%rbp)
  11b2a0: 48 b8 08 00 00 00 30 00 00 00 movabsq $0x3000000008, %rax     # imm = 0x3000000008
  11b2aa: 48 89 45 d0                   movq    %rax, -0x30(%rbp)
  11b2ae: 48 8b 05 1b dd 0b 00          movq    0xbdd1b(%rip), %rax     # 0x1d8fd0
  11b2b5: 48 8b 38                      movq    (%rax), %rdi
  11b2b8: 48 8d 55 d0                   leaq    -0x30(%rbp), %rdx
  11b2bc: 4c 89 d6                      movq    %r10, %rsi
  11b2bf: e8 cc 11 0b 00                callq   0x1cc490 <vfprintf@plt>
  11b2c4: 48 8b 0b                      movq    (%rbx), %rcx
  11b2c7: 48 3b 4d f0                   cmpq    -0x10(%rbp), %rcx
  11b2cb: 75 0a                         jne     0x11b2d7 <printf+0xb7>
  11b2cd: 48 81 c4 d8 00 00 00          addq    $0xd8, %rsp
  11b2d4: 5b                            popq    %rbx
  11b2d5: 5d                            popq    %rbp
  11b2d6: c3                            retq
  11b2d7: e8 a4 07 0b 00                callq   0x1cba80 <__stack_chk_fail@plt>
  11b2dc: 0f 1f 40 00                   nopl    (%rax)

Simple, right?
Now, let's move on to the trickier parts!

Trickier Parts

The while loop condition is constructed like this:

  2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
  2016d0: 7f 36                         jg      0x201708 <main+0x68>

This checks the while loop condition, i.e., whether fahr is less than or equal to upper.
mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
cmp eax, dword ptr [rbp - 0x14] compares fahr with upper (stored at rbp - 0x14).
Together, these represent fahr <= upper.
jg 0x201708 jumps to address 201708 (the end of the loop) if fahr is greater than upper.
This is the negation of fahr <= upper, so the loop continues if fahr is less than or equal to upper.

Inside the loop, you'll see code like this:

  2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016d5: 83 e8 20                      sub     eax, 0x20
  2016d8: 6b c0 05                      imul    eax, eax, 0x5
  2016db: b9 09 00 00 00                mov     ecx, 0x9
  2016e0: 99                            cdq
  2016e1: f7 f9                         idiv    ecx
  2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax

This calculates celsius = 5 * (fahr - 32) / 9.
mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
sub eax, 0x20 subtracts 32 (0x20) from eax, corresponding to fahr - 32 in C.
imul eax, eax, 0x5 multiplies eax by 5, corresponding to 5 * (fahr - 32) in C.
mov ecx, 0x9 loads 9 into ecx (the divisor).
cdq sign-extends eax into edx:eax to prepare for the idiv instruction.
idiv expects a 64-bit signed dividend in edx:eax, so this step is what makes negative values of 5 * (fahr - 32) divide correctly.
idiv ecx divides edx:eax by ecx (9), storing the quotient in eax and the remainder in edx.
mov dword ptr [rbp - 0xc], eax stores the result in celsius (rbp - 0xc).
In other words, one line of C code requires seven assembly instructions.

Next, the following calculation is performed:

  2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
  201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax

With all the explanations so far, you should understand that this means fahr = fahr + step;.

Then, you'll see this:

  201706: eb c2                         jmp     0x2016ca <main+0x2a>

This jumps back to the start of the while loop (address 2016ca) to check if fahr is still less than or equal to upper.
If the answer is false, the loop ends; if true, it continues.

Finally, there's this:

  201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]
  20170b: 48 83 c4 20                   add     rsp, 0x20
  20170f: 5d                            pop     rbp
  201710: c3                            ret

This cleans up the stack frame and returns from main.
mov eax, dword ptr [rbp - 0x4] loads the value at rbp - 0x4 into eax.
This is the return value of main, kept in a hidden slot that the compiler initialized to 0 at address 2016a8 (mov dword ptr [rbp - 0x4], 0x0).

In C (since C99), reaching the end of main without a return statement is equivalent to return 0.
Note that the slot at rbp - 0x4 is not one of our variables (fahr lives at rbp - 0x8); the compiler reserved it just for main's return value.
Since eax holds a function's return value, loading that slot into eax at the end is what makes main return 0.
add rsp, 0x20 frees the 32 bytes of stack space.
pop rbp restores the caller's base pointer.
ret returns to the caller.

Still pretty simple, right?

Explicit vs. Implicit Return Value

In this example, we omitted return 0;, which C (since C99) and C++ allow only for main.
However, adding return 0; to the C code slightly changes the assembly output.

00000000002016a0 <main>:
  2016a0: 55                            push    rbp
  2016a1: 48 89 e5                      mov     rbp, rsp
  2016a4: 48 83 ec 20                   sub     rsp, 0x20
  2016a8: c7 45 fc 00 00 00 00          mov     dword ptr [rbp - 0x4], 0x0
  2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
  2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
  2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
  2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
  2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
  2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
  2016d0: 7f 36                         jg      0x201708 <main+0x68>
  2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  2016d5: 83 e8 20                      sub     eax, 0x20
  2016d8: 6b c0 05                      imul    eax, eax, 0x5
  2016db: b9 09 00 00 00                mov     ecx, 0x9
  2016e0: 99                            cdq
  2016e1: f7 f9                         idiv    ecx
  2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax
  2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
  2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
  2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
  2016f6: b0 00                         mov     al, 0x0
  2016f8: e8 b3 00 00 00                call    0x2017b0 <printf@plt>
  2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
  201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
  201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
  201706: eb c2                         jmp     0x2016ca <main+0x2a>
  201708: 31 c0                         xor     eax, eax
  20170a: 48 83 c4 20                   add     rsp, 0x20
  20170e: 5d                            pop     rbp
  20170f: c3                            ret

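First, the int3 padding is gone: the function is one byte shorter and now ends at 20170f, so the next address, 201710, is already a multiple of 16.
The line that changed is this one: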

201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]

Instead, you'll see this:

201708: 31 c0                         xor     eax, eax

In other words, instead of loading the return value from the stack, the code zeroes eax directly, because XORing a register with itself always yields 0.
This avoids a memory read and saves one byte (2 bytes instead of 3).

Conclusion

Overall, it's not that difficult.
It might seem intimidating at first, but once you understand what's happening, assembly quickly becomes easy to read.
Once you understand assembly, all software becomes open-source.

That's all