2025-09-23 15:31:46
Suwako
lowlevel
assembly

【Programming】How to Learn Assembly Language Using C

There are many assembly languages (one per instruction set, sometimes with several syntaxes), but I recommend learning at least one of them.
Mastering one dialect makes it easier to learn others and allows you to deeply understand how computers work.
Without reading assembly, it's hard to truly understand a computer's mechanics.
Normally, I'd recommend starting with MIPS or RISC-V assembly because they're very simple.
However, this time, I'll assume you're using an Intel or AMD processor and focus on x64 assembly.

Workflow for Learning Assembly from C

  • Write a simple C program.
  • Compile it without optimization, e.g. cc -O0 -o program program.c (-O0 is the default).
    Optimization flags like -O2 can produce very different assembly, so stick with -O0 while learning.
  • Disassemble the binary using objdump -d -Mintel program | less.
  • Find the main function and analyze the assembly instructions.
  • Map each assembly instruction to the corresponding C code.
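    If the Fahrenheit program below feels like a lot for a first attempt, a single tiny function is an even gentler starting point. The file name add.c and the function add are only illustrations; compiling with cc -O0 -c add.c and running objdump -d -Mintel add.o should give a listing short enough to map line by line.

```c
/* add.c: a hypothetical first target for disassembly.
 * Build and inspect it with:
 *   cc -O0 -c add.c && objdump -d -Mintel add.o
 */
int add(int a, int b) {
    /* At -O0, expect a and b to arrive in edi and esi, be copied to the
     * stack, and then be loaded back and added. */
    int sum = a + b;
    /* The result is returned in eax. */
    return sum;
}
```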

    Converting C to Assembly

    The easiest way to learn assembly is to first create a simple C program, compile it, and disassemble the binary with objdump.
    This article uses GhostBSD, but Linux, illumos, FreeBSD, OpenBSD, NetBSD, or other Unix-like OSes will work fine.
    All the tools used in this article are likely already installed on your computer.
    We'll use Intel syntax for assembly (specified with -Mintel in objdump).
    This makes instructions read like mov eax, ebx (destination first) instead of AT&T's movl %ebx, %eax (source first).

    Below is C code adapted from The C Programming Language, 2nd Edition by B.W. Kernighan and D.M. Ritchie, a monumental book for all C programmers (main() is written as int main() here).

    #include <stdio.h>
    
    int main() {
      int fahr, celsius;
      int lower, upper, step;
    
      lower = 0;      /* lower limit of temperature table */
      upper = 300;    /* upper limit */
      step = 20;      /* step size */
    
      fahr = lower;
      while (fahr <= upper) {
        celsius = 5 * (fahr-32) / 9;
        printf("%d\t%d\n", fahr, celsius);
        fahr = fahr + step;
      }
    }

    This C program converts Fahrenheit to Celsius, outputting a table from 0°F to 300°F in 20°F increments.
    The formula used is celsius = 5 * (fahr - 32) / 9.
    Next, compile it with cc main.c -o ftoc.
    Then, run objdump -d -Mintel ./ftoc | less.
    Press /, type <main>, and press Enter to jump to the main function.

    What you see may vary slightly depending on your CPU architecture or OS.
    In my case, the output looks like this:

    00000000002016a0 <main>:
      2016a0: 55                            push    rbp
      2016a1: 48 89 e5                      mov     rbp, rsp
      2016a4: 48 83 ec 20                   sub     rsp, 0x20
      2016a8: c7 45 fc 00 00 00 00          mov     dword ptr [rbp - 0x4], 0x0
      2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
      2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
      2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
      2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
      2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
      2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
      2016d0: 7f 36                         jg      0x201708 <main+0x68>
      2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016d5: 83 e8 20                      sub     eax, 0x20
      2016d8: 6b c0 05                      imul    eax, eax, 0x5
      2016db: b9 09 00 00 00                mov     ecx, 0x9
      2016e0: 99                            cdq
      2016e1: f7 f9                         idiv    ecx
      2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax
      2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
      2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
      2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
      2016f6: b0 00                         mov     al, 0x0
      2016f8: e8 c3 00 00 00                call    0x2017c0 <printf@plt>
      2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
      201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
      201706: eb c2                         jmp     0x2016ca <main+0x2a>
      201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]
      20170b: 48 83 c4 20                   add     rsp, 0x20
      20170f: 5d                            pop     rbp
      201710: c3                            ret
      201711: cc                            int3
      201712: cc                            int3
      201713: cc                            int3
      201714: cc                            int3
      201715: cc                            int3
      201716: cc                            int3
      201717: cc                            int3
      201718: cc                            int3
      201719: cc                            int3
      20171a: cc                            int3
      20171b: cc                            int3
      20171c: cc                            int3
      20171d: cc                            int3
      20171e: cc                            int3
      20171f: cc                            int3

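    The int3 instructions (opcode cc) after ret are padding: the toolchain fills the gap so that the next function starts on a 16-byte boundary, and because int3 is a breakpoint trap, the program stops immediately if execution ever strays into the gap.
    They don't affect the program's logic, so we can ignore them for now.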

    You'll notice all numbers are in hexadecimal.
    For those who missed it, I wrote a detailed article on bitwise operations here.

    Assembly Basics

    Assembly language is a low-level language that directly corresponds to machine instructions.
    Key concepts include:

  • Registers: Small, fast storage in the CPU (e.g., rax, rbp, rsp).
  • Stack: A memory area for managing local variables and function state, controlled by the stack pointer (rsp) and base pointer (rbp).
  • Instructions: Commands like mov (move data), cmp (compare), jmp (jump to an address).
    Simple Parts

    Defining Integers

    You might first notice this part:

      2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
      2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
      2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
      2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
      2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax

    This corresponds to the following C code:

    lower = 0;
    upper = 300;
    step = 20;
    fahr = lower;

    rbp - 0x10 holds lower, rbp - 0x14 holds upper, rbp - 0x18 holds step, and rbp - 0x8 holds fahr.
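    The instruction mov eax, dword ptr [rbp - 0x10] loads lower (0) into eax, the lower 32 bits (bytes 0-3) of the 64-bit rax register.
    x64 cannot move a value directly from one memory location to another, so the copy goes through eax as a scratch register; by convention, rax (or eax for an int) also holds a function's return value, as we'll see at the end of main.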
    mov dword ptr [rbp - 0x8], eax assigns lower to fahr.
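    The relationship between rax and eax (and al, which shows up later in the printf call) can be modeled with plain integer masking. This is a sketch for intuition only; the helper names eax_of, ax_of, al_of, write_eax and write_al are made up. It encodes two hardware rules: writing a 32-bit register clears the upper half of the 64-bit register, while writing an 8-bit register leaves the other bits untouched.

```c
#include <stdint.h>

/* Hypothetical helpers: eax, ax and al are not separate registers but
 * the low 32, 16 and 8 bits of rax. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; } /* bits 0-31 */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; } /* bits 0-15 */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }  /* bits 0-7  */

/* Writing eax zero-extends into rax: the old upper 32 bits are lost. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax; /* the previous contents do not survive */
    return (uint64_t)value;
}

/* Writing al (e.g. mov al, 0x0) replaces only the lowest 8 bits. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xff) | value;
}
```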

    rbp is the base pointer for the current stack frame, and rsp is the current stack pointer.
    At the start of main, you'll see this:

      2016a0: 55                            push    rbp
      2016a1: 48 89 e5                      mov     rbp, rsp
      2016a4: 48 83 ec 20                   sub     rsp, 0x20

    These instructions set up the stack frame for the main function.

    push rbp, mov rbp, rsp, and sub rsp, 0x20 form the standard frame-pointer prologue of an x64 function.
    push rbp saves the caller's base pointer, mov rbp, rsp makes the current stack pointer the base of main's stack frame, and sub rsp, 0x20 reserves 32 bytes for local variables while keeping rsp 16-byte aligned.
    You'll see this pattern at the start of practically every function compiled with -O0, and it doesn't correspond to any line of C code.
    It's bookkeeping the compiler inserts to manage the stack.
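    Why 32 bytes? A plausible reading, assuming six 4-byte slots (fahr, celsius, lower, upper, step, plus the slot at rbp - 0x4 discussed at the end of main) and the ABI rule that rsp must be 16-byte aligned at every call, is that the compiler rounds 24 bytes of locals up to the next multiple of 16. The helper names align_up and main_frame_size are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical reconstruction of main's frame size: six int slots,
 * rounded up so rsp stays 16-byte aligned when printf is called. */
size_t main_frame_size(void) {
    size_t locals = 6 * sizeof(int); /* 24 bytes with 4-byte ints */
    return align_up(locals, 16);     /* 32 bytes, i.e. sub rsp, 0x20 */
}
```

    After push rbp, rsp is already 16-byte aligned, so subtracting a multiple of 16 keeps it that way.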

    You'll also see notations like [rbp - 0x10].
    The square brackets mean a memory access: the address is rbp (the base of the current stack frame) minus the hexadecimal offset 0x10 (16), and dword ptr says the access is 4 bytes wide, the size of an int.
    So [rbp - 0x10] is the local variable stored at that address, in this case lower.
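    To make rbp-relative addressing concrete, here is a toy model that treats main's 32-byte frame as a byte array, with offset FRAME_SIZE standing in for rbp. FRAME_SIZE, store_dword and load_dword are hypothetical names; the model ignores the saved rbp and return address above the frame, and uses memcpy so that each access behaves like a 4-byte dword ptr load or store.

```c
#include <stdint.h>
#include <string.h>

/* Toy frame: frame[0] plays the role of rsp (rbp - 0x20) and
 * frame[FRAME_SIZE - off] the role of [rbp - off]. */
enum { FRAME_SIZE = 0x20 };

/* Models: mov dword ptr [rbp - off], value */
void store_dword(unsigned char frame[FRAME_SIZE], int off, int32_t value) {
    memcpy(&frame[FRAME_SIZE - off], &value, sizeof value);
}

/* Models: mov eax, dword ptr [rbp - off] */
int32_t load_dword(const unsigned char frame[FRAME_SIZE], int off) {
    int32_t value;
    memcpy(&value, &frame[FRAME_SIZE - off], sizeof value);
    return value;
}
```

    For example, store_dword(frame, 0x14, 300) mirrors mov dword ptr [rbp - 0x14], 0x12c, and store_dword(frame, 0x8, load_dword(frame, 0x10)) mirrors fahr = lower.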

    printf Function

    Next, you can easily spot this part:

      2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
      2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
      2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
      2016f6: b0 00                         mov     al, 0x0
      2016f8: e8 c3 00 00 00                call    0x2017c0 <printf@plt>

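    This corresponds to the printf() call inside the loop.
    On Unix-like systems, x64 code follows the System V AMD64 calling convention, which passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8, and r9, in that order.
    mov esi, dword ptr [rbp - 0x8] loads fahr into esi (printf's second argument).
    mov edx, dword ptr [rbp - 0xc] loads celsius into edx (printf's third argument).
    movabs rdi, 0x2004d8 loads the address of the format string "%d\t%d\n" into rdi (printf's first argument).
    mov al, 0x0 sets al to 0, indicating that no floating-point arguments are passed in vector registers.
    This is a rule of the calling convention rather than of the processor: before calling a variadic function such as printf, al must hold an upper bound on the number of vector registers used for arguments.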

    Next, you'll see call 0x2017c0 <printf@plt>.
    This calls printf via the Procedure Linkage Table (PLT) used in dynamically linked binaries, resolving the address of printf in libc.so at runtime.
    In statically linked binaries, as we'll discuss later, the full printf implementation is included.
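    The idea behind the PLT can be sketched in C with a function pointer that starts out pointing at a resolver and is patched on first use. This is a loose model, not the real mechanism: got_slot, resolver, resolver_calls and plt_printf are hypothetical names, vprintf stands in for the dynamic linker's lookup in libc, and the real stub pushes a relocation index and jumps into the .plt code instead of calling a C function. Whether resolution is lazy or happens at load time also depends on the system and linker settings.

```c
#include <stdarg.h>
#include <stdio.h>

typedef int (*print_fn)(const char *fmt, va_list ap);

static int resolver(const char *fmt, va_list ap);

/* Model of a GOT entry: it starts out pointing at the resolver. */
print_fn got_slot = resolver;
int resolver_calls = 0; /* counts how often "the dynamic linker" ran */

/* Model of lazy binding: look up the real function (here simply vprintf),
 * patch the GOT entry, then forward this first call. */
static int resolver(const char *fmt, va_list ap) {
    resolver_calls++;
    got_slot = vprintf;
    return got_slot(fmt, ap);
}

/* Model of printf@plt: always an indirect jump through the GOT entry. */
int plt_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = got_slot(fmt, ap);
    va_end(ap);
    return n;
}
```

    Only the first call pays for the lookup; every later call goes straight through the patched pointer.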

    Let's check the address 0x2004d8.
    First, press q to exit the current objdump instance.
    Then, run objdump -d -Mintel -s -j .rodata ./ftoc.

    In my case, the output looks like this:

    ./ftoc:  file format elf64-x86-64
    Contents of section .rodata:
     2004d8 25640925 640a0000                    %d.%d...
    
    Disassembly of section .rodata:
    
    00000000002004d8 <.rodata>:
      2004d8: 25 64 09 25 64                and     eax, 0x64250964
      2004dd: 0a 00                         or      al, byte ptr [rax]
      2004df: 00                            <unknown>

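    In particular, look at 2004d8: 25 64 09 25 64 and eax, 0x64250964.
    These bytes are data, not code: objdump -d disassembles .rodata anyway, so the and eax and or al instructions are meaningless, and only the raw bytes matter.
    % = 0x25
    d = 0x64
    \t = 0x09
    A string is stored one byte per character, in order, so 25 64 is %d and 09 is the tab.
    Then, 25 64 repeats, which makes sense since %d is used twice.
    Little-endianness only explains objdump's bogus operand: it reads the four bytes 64 09 25 64 after the opcode 25 as a 32-bit little-endian number, 0x64250964.

    The next line shows 2004dd: 0a 00 or al, byte ptr [rax].
    \n = 0x0a
    The 0x00 that follows is the null terminator the compiler appends to every string literal; the extra 00 at 2004df is most likely just padding.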
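    The same bytes can be checked from C. This sketch relies only on standard C: format_string holds the same literal, and the hypothetical helper read_le32 decodes four bytes the way objdump decoded the operand of the bogus and eax instruction.

```c
#include <stdint.h>

/* Same literal as in main; stored as 25 64 09 25 64 0a 00. */
const char format_string[] = "%d\t%d\n";

/* Decode 4 bytes as a little-endian 32-bit value: the first byte in
 * memory is the least significant. */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

    Calling read_le32 on the four bytes that follow the leading 25 gives 0x64250964, the operand objdump printed.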

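    Next, where does printf itself live?
    I won't go into detail, but if you open the full disassembly again (objdump -d -Mintel ./ftoc | less) and search for address 2017c0, you'll find the printf@plt stub; in my case, it's at the end of the binary.
    You'll see something like this: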

    00000000002017c0 <printf@plt>:
      2017c0: ff 25 aa 21 00 00             jmp     qword ptr [rip + 0x21aa] # 0x203970 <printf+0x203970>
      2017c6: 68 02 00 00 00                push    0x2
      2017cb: e9 c0 ff ff ff                jmp     0x201790 <.plt>

    As homework, figure out what this means.

    However, this is printf for a dynamically linked binary.
    In a statically linked binary, the output would look like this:

    0000000000227320 <printf>:
      227320: 55                            push    rbp
      227321: 48 89 e5                      mov     rbp, rsp
      227324: 48 81 ec d0 00 00 00          sub     rsp, 0xd0
      22732b: 49 89 fa                      mov     r10, rdi
      22732e: 84 c0                         test    al, al
      227330: 74 26                         je      0x227358 <printf+0x38>
      227332: 0f 29 85 60 ff ff ff          movaps  xmmword ptr [rbp - 0xa0], xmm0
      227339: 0f 29 8d 70 ff ff ff          movaps  xmmword ptr [rbp - 0x90], xmm1
      227340: 0f 29 55 80                   movaps  xmmword ptr [rbp - 0x80], xmm2
      227344: 0f 29 5d 90                   movaps  xmmword ptr [rbp - 0x70], xmm3
      227348: 0f 29 65 a0                   movaps  xmmword ptr [rbp - 0x60], xmm4
      22734c: 0f 29 6d b0                   movaps  xmmword ptr [rbp - 0x50], xmm5
      227350: 0f 29 75 c0                   movaps  xmmword ptr [rbp - 0x40], xmm6
      227354: 0f 29 7d d0                   movaps  xmmword ptr [rbp - 0x30], xmm7
      227358: 48 89 b5 38 ff ff ff          mov     qword ptr [rbp - 0xc8], rsi
      22735f: 48 89 95 40 ff ff ff          mov     qword ptr [rbp - 0xc0], rdx
      227366: 48 89 8d 48 ff ff ff          mov     qword ptr [rbp - 0xb8], rcx
      22736d: 4c 89 85 50 ff ff ff          mov     qword ptr [rbp - 0xb0], r8
      227374: 4c 89 8d 58 ff ff ff          mov     qword ptr [rbp - 0xa8], r9
      22737b: 48 8b 05 ae 3c 07 00          mov     rax, qword ptr [rip + 0x73cae] # 0x29b030 <__stack_chk_guard>
      227382: 48 89 45 f8                   mov     qword ptr [rbp - 0x8], rax
      227386: 48 8d 85 30 ff ff ff          lea     rax, [rbp - 0xd0]
      22738d: 48 89 45 f0                   mov     qword ptr [rbp - 0x10], rax
      227391: 48 8d 45 10                   lea     rax, [rbp + 0x10]
      227395: 48 89 45 e8                   mov     qword ptr [rbp - 0x18], rax
      227399: 48 b8 08 00 00 00 30 00 00 00 movabs  rax, 0x3000000008
      2273a3: 48 89 45 e0                   mov     qword ptr [rbp - 0x20], rax
      2273a7: 48 8b 3d 22 ea 06 00          mov     rdi, qword ptr [rip + 0x6ea22] # 0x295dd0 <__stdoutp>
      2273ae: 48 8d 55 e0                   lea     rdx, [rbp - 0x20]
      2273b2: 4c 89 d6                      mov     rsi, r10
      2273b5: e8 46 40 00 00                call    0x22b400 <vfprintf>
      2273ba: 48 8b 0d 6f 3c 07 00          mov     rcx, qword ptr [rip + 0x73c6f] # 0x29b030 <__stack_chk_guard>
      2273c1: 48 3b 4d f8                   cmp     rcx, qword ptr [rbp - 0x8]
      2273c5: 75 09                         jne     0x2273d0 <printf+0xb0>
      2273c7: 48 81 c4 d0 00 00 00          add     rsp, 0xd0
      2273ce: 5d                            pop     rbp
      2273cf: c3                            ret
      2273d0: e8 1b cc 00 00                call    0x233ff0 <__stack_chk_fail_local>
      2273d5: 66 66 2e 0f 1f 84 00 00 00 00 00      nop     word ptr cs:[rax + rax]

    As additional homework, investigate the printf@plt (dynamic) and printf (static) functions.

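    In static linking, the printf function and all its dependencies are copied into the executable; in dynamic linking, the executable contains only the small printf@plt stub, which jumps to printf in a shared library (libc.so) elsewhere on the system, whose address is filled in at run time.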

    To look at the library side, you can run objdump -d -Mintel /lib/libc.so.7 | less (the filename or path may vary by OS).
    You'll see something like this:

    00000000001cd370 <printf@plt>:
      1cd370: ff 25 82 1d 01 00             jmp     qword ptr [rip + 0x11d82] # 0x1df0f8
      1cd376: 68 93 01 00 00                push    0x193
      1cd37b: e9 b0 e6 ff ff                jmp     0x1cba30 <.plt>

    This is only libc's own PLT stub for printf; the full definition isn't in this snippet.
    To find it, you can dump the entire library to a file with objdump -d /lib/libc.so.7 > libc_disasm.txt.
    Then, run less libc_disasm.txt, press /, and search for <printf>.
    Because this command omits -Mintel, the full definition is shown in AT&T syntax:

    000000000011b220 <printf>:
      11b220: 55                            pushq   %rbp
      11b221: 48 89 e5                      movq    %rsp, %rbp
      11b224: 53                            pushq   %rbx
      11b225: 48 81 ec d8 00 00 00          subq    $0xd8, %rsp
      11b22c: 49 89 fa                      movq    %rdi, %r10
      11b22f: 84 c0                         testb   %al, %al
      11b231: 74 29                         je      0x11b25c <printf+0x3c>
      11b233: 0f 29 85 50 ff ff ff          movaps  %xmm0, -0xb0(%rbp)
      11b23a: 0f 29 8d 60 ff ff ff          movaps  %xmm1, -0xa0(%rbp)
      11b241: 0f 29 95 70 ff ff ff          movaps  %xmm2, -0x90(%rbp)
      11b248: 0f 29 5d 80                   movaps  %xmm3, -0x80(%rbp)
      11b24c: 0f 29 65 90                   movaps  %xmm4, -0x70(%rbp)
      11b250: 0f 29 6d a0                   movaps  %xmm5, -0x60(%rbp)
      11b254: 0f 29 75 b0                   movaps  %xmm6, -0x50(%rbp)
      11b258: 0f 29 7d c0                   movaps  %xmm7, -0x40(%rbp)
      11b25c: 48 89 b5 28 ff ff ff          movq    %rsi, -0xd8(%rbp)
      11b263: 48 89 95 30 ff ff ff          movq    %rdx, -0xd0(%rbp)
      11b26a: 48 89 8d 38 ff ff ff          movq    %rcx, -0xc8(%rbp)
      11b271: 4c 89 85 40 ff ff ff          movq    %r8, -0xc0(%rbp)
      11b278: 4c 89 8d 48 ff ff ff          movq    %r9, -0xb8(%rbp)
      11b27f: 48 8b 1d 3a dc 0b 00          movq    0xbdc3a(%rip), %rbx     # 0x1d8ec0
      11b286: 48 8b 03                      movq    (%rbx), %rax
      11b289: 48 89 45 f0                   movq    %rax, -0x10(%rbp)
      11b28d: 48 8d 85 20 ff ff ff          leaq    -0xe0(%rbp), %rax
      11b294: 48 89 45 e0                   movq    %rax, -0x20(%rbp)
      11b298: 48 8d 45 10                   leaq    0x10(%rbp), %rax
      11b29c: 48 89 45 d8                   movq    %rax, -0x28(%rbp)
      11b2a0: 48 b8 08 00 00 00 30 00 00 00 movabsq $0x3000000008, %rax     # imm = 0x3000000008
      11b2aa: 48 89 45 d0                   movq    %rax, -0x30(%rbp)
      11b2ae: 48 8b 05 1b dd 0b 00          movq    0xbdd1b(%rip), %rax     # 0x1d8fd0
      11b2b5: 48 8b 38                      movq    (%rax), %rdi
      11b2b8: 48 8d 55 d0                   leaq    -0x30(%rbp), %rdx
      11b2bc: 4c 89 d6                      movq    %r10, %rsi
      11b2bf: e8 cc 11 0b 00                callq   0x1cc490 <vfprintf@plt>
      11b2c4: 48 8b 0b                      movq    (%rbx), %rcx
      11b2c7: 48 3b 4d f0                   cmpq    -0x10(%rbp), %rcx
      11b2cb: 75 0a                         jne     0x11b2d7 <printf+0xb7>
      11b2cd: 48 81 c4 d8 00 00 00          addq    $0xd8, %rsp
      11b2d4: 5b                            popq    %rbx
      11b2d5: 5d                            popq    %rbp
      11b2d6: c3                            retq
      11b2d7: e8 a4 07 0b 00                callq   0x1cba80 <__stack_chk_fail@plt>
      11b2dc: 0f 1f 40 00                   nopl    (%rax)

    Simple, right?
    Now, let's move on to the trickier parts!

    Trickier Parts

    The while loop condition is constructed like this:

      2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
      2016d0: 7f 36                         jg      0x201708 <main+0x68>

    This checks the while loop condition, i.e., whether fahr is less than or equal to upper.
    mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
    cmp eax, dword ptr [rbp - 0x14] compares fahr with upper (stored at rbp - 0x14).
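    Together with the jump that follows, these implement the test fahr <= upper.
    jg 0x201708 jumps to address 201708 (just past the loop) if fahr is greater than upper; jg compares signed integers, which matches C's int.
    This is the negation of fahr <= upper: the code jumps out when the condition is false and otherwise falls through into the loop body.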
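    One way to see the shape of this code is to rewrite the loop in C with goto, so that each statement lines up with a block of the listing. This is a hand translation for illustration, not compiler output; count_rows is a hypothetical name, and the celsius calculation and printf call are replaced by a row counter so the function is easy to test.

```c
/* The while loop from main, lowered by hand into compare-and-jump form. */
int count_rows(int lower, int upper, int step) {
    int rows = 0;
    int fahr = lower;       /* mov dword ptr [rbp - 0x8], eax          */
loop:                       /* 2016ca: top of the loop                 */
    if (fahr > upper)       /* cmp eax, dword ptr [rbp - 0x14]         */
        goto done;          /* jg 0x201708: the negated condition      */
    rows++;                 /* stands in for the celsius/printf body   */
    fahr = fahr + step;     /* add eax, dword ptr [rbp - 0x18]         */
    goto loop;              /* jmp 0x2016ca                            */
done:                       /* 201708: code after the loop             */
    return rows;
}
```

    With the article's values, count_rows(0, 300, 20) counts the 16 rows for 0, 20, ..., 300.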

    Inside the loop, you'll see code like this:

      2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016d5: 83 e8 20                      sub     eax, 0x20
      2016d8: 6b c0 05                      imul    eax, eax, 0x5
      2016db: b9 09 00 00 00                mov     ecx, 0x9
      2016e0: 99                            cdq
      2016e1: f7 f9                         idiv    ecx
      2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax

    This calculates celsius = 5 * (fahr - 32) / 9.
    mov eax, dword ptr [rbp - 0x8] loads fahr into eax.
    sub eax, 0x20 subtracts 32 (0x20) from eax, corresponding to fahr - 32 in C.
    imul eax, eax, 0x5 multiplies eax by 5, corresponding to 5 * (fahr - 32) in C.
    mov ecx, 0x9 loads 9 into ecx (the divisor).
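    cdq sign-extends eax into edx:eax (it fills edx with copies of eax's sign bit) to prepare for the idiv instruction.
    idiv divides a 64-bit dividend held in the register pair edx:eax, so this step is what makes the division of a negative value such as 5 * (0 - 32) = -160 come out right.
    idiv ecx divides edx:eax by ecx (9), storing the quotient in eax and the remainder in edx; like C's /, it truncates toward zero, so -160 / 9 gives -17.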
    mov dword ptr [rbp - 0xc], eax stores the result in celsius (rbp - 0xc).
    In other words, one line of C code requires seven assembly instructions.
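    The seven instructions can be replayed step by step in C. In this sketch (celsius_like_asm is a hypothetical name), edx:eax is rebuilt explicitly as edx times 2^32 plus the unsigned bits of eax, which shows why cdq's sign extension matters: for eax = -160, edx becomes -1 and the pair still represents -160. It assumes small inputs such as the table's 0 to 300, where nothing overflows.

```c
#include <stdint.h>

/* Replays: mov, sub, imul, mov, cdq, idiv, mov. Returns the quotient
 * (what ends up in celsius) and optionally the remainder left in edx. */
int32_t celsius_like_asm(int32_t fahr, int32_t *remainder) {
    int32_t eax = fahr;                 /* mov eax, dword ptr [rbp - 0x8] */
    eax = eax - 32;                     /* sub eax, 0x20                  */
    eax = eax * 5;                      /* imul eax, eax, 0x5             */
    int32_t ecx = 9;                    /* mov ecx, 0x9                   */
    int32_t edx = (eax < 0) ? -1 : 0;   /* cdq: edx = copies of sign bit  */
    /* The 64-bit dividend edx:eax: high half edx, low half eax's bits. */
    int64_t dividend = (int64_t)edx * 4294967296LL + (uint32_t)eax;
    int32_t quotient = (int32_t)(dividend / ecx);  /* idiv ecx -> eax   */
    if (remainder)
        *remainder = (int32_t)(dividend % ecx);    /* idiv ecx -> edx   */
    return quotient;                    /* mov dword ptr [rbp - 0xc], eax */
}
```

    For 0°F it yields -17 with remainder -7, matching the first row of the table, and for 300°F it yields 148.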

    Next, the following calculation is performed:

      2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
      201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax

    With all the explanations so far, you should understand that this means fahr = fahr + step;.

    Then, you'll see this:

      201706: eb c2                         jmp     0x2016ca <main+0x2a>

    This jumps back to the start of the while loop (address 2016ca) to check if fahr is still less than or equal to upper.
    If the answer is false, the loop ends; if true, it continues.

    Finally, there's this:

      201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]
      20170b: 48 83 c4 20                   add     rsp, 0x20
      20170f: 5d                            pop     rbp
      201710: c3                            ret

    This cleans up the stack frame and returns from main.
    mov eax, dword ptr [rbp - 0x4] loads the value at rbp - 0x4 into eax, the register that carries main's return value.
    That slot was set to 0 right at the start, at address 2016a8 (mov dword ptr [rbp - 0x4], 0x0).

    In C (since C99), if main reaches its closing } without a return statement, it returns 0.
    The compiler implements this with that extra slot at rbp - 0x4: it is zeroed on entry and loaded into eax on exit (fahr, by contrast, lives at rbp - 0x8).
    add rsp, 0x20 frees the 32 bytes of stack space.
    pop rbp restores the caller's base pointer.
    ret returns to the caller.

    Still pretty simple, right?

    Explicit vs. Implicit Return Value

    In this example, we omitted return 0;, which is allowed in C and C++ only for int main().
    However, adding return 0; to the C code slightly changes the assembly output.

    00000000002016a0 <main>:
      2016a0: 55                            push    rbp
      2016a1: 48 89 e5                      mov     rbp, rsp
      2016a4: 48 83 ec 20                   sub     rsp, 0x20
      2016a8: c7 45 fc 00 00 00 00          mov     dword ptr [rbp - 0x4], 0x0
      2016af: c7 45 f0 00 00 00 00          mov     dword ptr [rbp - 0x10], 0x0
      2016b6: c7 45 ec 2c 01 00 00          mov     dword ptr [rbp - 0x14], 0x12c
      2016bd: c7 45 e8 14 00 00 00          mov     dword ptr [rbp - 0x18], 0x14
      2016c4: 8b 45 f0                      mov     eax, dword ptr [rbp - 0x10]
      2016c7: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
      2016ca: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016cd: 3b 45 ec                      cmp     eax, dword ptr [rbp - 0x14]
      2016d0: 7f 36                         jg      0x201708 <main+0x68>
      2016d2: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      2016d5: 83 e8 20                      sub     eax, 0x20
      2016d8: 6b c0 05                      imul    eax, eax, 0x5
      2016db: b9 09 00 00 00                mov     ecx, 0x9
      2016e0: 99                            cdq
      2016e1: f7 f9                         idiv    ecx
      2016e3: 89 45 f4                      mov     dword ptr [rbp - 0xc], eax
      2016e6: 8b 75 f8                      mov     esi, dword ptr [rbp - 0x8]
      2016e9: 8b 55 f4                      mov     edx, dword ptr [rbp - 0xc]
      2016ec: 48 bf d8 04 20 00 00 00 00 00 movabs  rdi, 0x2004d8
      2016f6: b0 00                         mov     al, 0x0
      2016f8: e8 b3 00 00 00                call    0x2017b0 <printf@plt>
      2016fd: 8b 45 f8                      mov     eax, dword ptr [rbp - 0x8]
      201700: 03 45 e8                      add     eax, dword ptr [rbp - 0x18]
      201703: 89 45 f8                      mov     dword ptr [rbp - 0x8], eax
      201706: eb c2                         jmp     0x2016ca <main+0x2a>
      201708: 31 c0                         xor     eax, eax
      20170a: 48 83 c4 20                   add     rsp, 0x20
      20170e: 5d                            pop     rbp
      20170f: c3                            ret

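    First, the int3 padding after ret is gone, most likely because main is now one byte shorter: it ends at 0x20170f, so the next function can start at the already 16-byte-aligned address 0x201710 without filler.
    The relevant changed line is: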

    201708: 8b 45 fc                      mov     eax, dword ptr [rbp - 0x4]

    Instead, you'll see this:

    201708: 31 c0                         xor     eax, eax

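    In other words, instead of loading the return value from the stack slot at rbp - 0x4, the code sets eax to 0 directly.
    xor eax, eax is the usual idiom for zeroing a register: it needs no memory access, and its encoding (31 c0) is one byte shorter than the load (8b 45 fc).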
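    Why does xor eax, eax produce 0? XOR of any bit with itself is 0, so XORing a value with itself clears every bit. The hypothetical function xor_self below just restates that in C; it says nothing about speed, which depends on the processor (modern x86 cores typically recognize this idiom as a zeroing operation that does not depend on the register's old value).

```c
#include <stdint.h>

/* What "xor eax, eax" computes: each bit XORed with itself is 0,
 * so the result is 0 whatever eax held before. */
uint32_t xor_self(uint32_t eax) {
    return eax ^ eax;
}
```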

    Conclusion

    Overall, it's not that difficult.
    It might seem intimidating at first, but once you understand what's happening, assembly quickly becomes easy to learn.
    Once you understand assembly, all software becomes open-source.

    That's all