2025-10-08 21:43:12
Suwako
lowlevel
c

【C Language】Why Doesn't the C Language Have a String Type?

For many programmers who work in high-level languages, the string is an essential data type.
C++ has std::string, and languages like Go, PHP, C#, Java, and JavaScript have string.
Rust, for some reason, has various string types.
However, C and Zig stand out as some of the few languages with no string type at all.
Why is that?

What is a String?

To understand what a string is, we first need to explain what a character is.
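In C, a character is simply a small integer: a char is one byte, which is 8 bits on virtually every modern platform.
ASCII characters range from 00 to 7F and need only 7 bits (7F is 0111 1111 in binary), but each is stored in an 8-bit byte.
Shift-JIS is a variable-length encoding: single-byte characters are almost entirely ASCII-compatible, and two-byte characters start with a lead byte from specific ranges above 7F.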

In C, a string is essentially an array of char values terminated by a null character (\0).
Arrays have a fixed size in memory, but the logical length of a string (up to the null terminator) can vary.
The memory that holds a C string has a fixed size, whether it is static, on the stack, or allocated dynamically.
To resize a string, you need to use pointers and manual memory allocation (malloc, realloc, etc.).
This is different from high-level languages that handle this automatically.
Zig follows the same philosophy for strings.

In assembly language, which I've been writing a lot lately, characters are stored using the db (Define Byte) directive, assigning 8-bit values.
These bytes are interpreted as characters based on the encoding.
As mentioned earlier, a byte is 8 bits, and C's char is one byte (the standard guarantees at least 8 bits, and it is exactly 8 on mainstream platforms).
Zig makes the width explicit with integer types like u8, u16, u32, i8, i16, i32, etc.

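Thus, in a high-level language, writing string foo = "Hello World"; translates in C to char foo[12] = "Hello World"; and in Zig to const foo: [11:0]u8 = "Hello World".*; (11 bytes plus a 0 sentinel, so 12 bytes in memory).
You might wonder why 12 bytes are needed instead of 11.
The extra byte holds the null terminator.
The compiler appends the terminator to string literals in C and Zig, but C and assembly programmers must still account for the extra byte whenever they size a buffer or build a string by hand.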
In other words, the string "Hello World" is actually ['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'].
Failing to include the null terminator can lead to undefined behavior, such as reading beyond allocated memory or outputting garbage data to the terminal.
You can use const char *foo = "Hello World"; to point to a string literal stored in read-only memory.
For dynamic strings, you'd allocate memory with char *foo = malloc(12); and need to manually free it to avoid memory leaks.
Failing to free it leaks that memory for as long as the program runs, and writing past the 12 bytes causes the same undefined behavior described above.
C++ introduced features like smart pointers and safe containers in C++11 and beyond to improve memory safety, but these remain optional to allow low-level programming.

Why Manage Memory Manually? Isn't the Future Memory-Safe?

That's what Rust zealots want you to believe.
In reality, it depends on the type of software you're working on.

Sure, using memory-safe languages is a smart choice for most fields, but it's not ideal for game development, embedded systems, or legacy systems where every bit of memory matters.
Memory safety is still an abstraction, meant to free you from worrying about the underlying technology, but it costs you some control and performance and adds complexity.
Game developers need to squeeze maximum performance from hardware for high frame rates, impressive graphics, or great music.
Embedded hardware operates in highly constrained environments, sometimes with as little as 256 bytes of RAM and a single-core 740kHz CPU.
In embedded systems, every bit and byte counts, so memory safety isn't an option.

Linux and Windows have started adding Rust code to their kernels, Android ships Rust in its system components, FreeBSD is debating it, and fully Rust-based OSes have been proven to work, but I don't think this will be beneficial in the long term.
The kernel is the most critical component of a computer's operating system, and full memory control remains essential.

On the other hand, for userland software, smartphone applications, server daemons, web backends, etc., memory control is less critical, so feel free to use Rust, Zig, C++11 or later, or other languages with stronger safety features if you want.

Of course, you could build a web server in Bash or make a roller coaster in Excel, and while such strange experiments are great fun, they don't belong in production environments.
That's why Apache and Nginx are written in C, and RollerCoaster Tycoon is written in assembly language.
That's why, despite memory safety demands, we still write in C, C++98, and assembly language.

Conclusion

Ultimately, the reason C, Zig, and assembly language don't have a string type is simply that they don't need one.
If you're used to high-level languages (C++11 and above, Rust, JavaScript, C#, etc.), this might be confusing.
However, once you understand how computers handle strings, you'll see why this is the case and even come to appreciate the absence of a string type.

That's all