mmap in C: High-Performance Memory Mapping and Shared Memory Guide
Master the mmap system call in C. Learn how to map files to memory for high-performance I/O, implement shared memory, and use anonymous mappings.
mmap is one of the most powerful and versatile system calls in the Unix-like programming world. For C developers, it represents the bridge between memory management and file I/O, allowing you to map a file or device directly into your process's virtual address space.
Instead of using traditional read() and write() calls to move data between a file and a buffer, mmap allows you to treat a file as if it were a giant array in memory. This "Memory-Mapped I/O" approach can significantly boost performance, simplify code, and enable advanced features like shared memory between processes.
This comprehensive guide explores the mmap system call in detail—from basic file mapping and anonymous memory allocation to shared memory IPC (Inter-Process Communication) and performance tuning. Whether you are building a high-performance database or a simple file processor, mastering mmap is essential for modern system-level C programming.
What Is mmap?
mmap() (short for Memory Map) is a system call that creates a new mapping in the virtual address space of the calling process. It establishes a direct relationship between a region of virtual memory and a "backing store," which is usually a file on disk or an anonymous region of RAM.
When a file is mapped, the kernel moves data between disk and memory through the virtual memory paging system. Data is loaded into RAM only when you first access it (demand paging). With a MAP_SHARED mapping, modified pages are written back to the file by the kernel's background flusher; with MAP_PRIVATE, modifications stay in your process and never reach the file.
mmap vs. malloc
While both can allocate memory, malloc is a C library function that manages a heap on top of system-level calls like brk or mmap. Large malloc requests (above 128 KiB by default in glibc) are typically served by mmap under the hood. Calling mmap directly, however, gives you much more control over permissions, page alignment, and file synchronization.
The mmap Function Signature
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Parameter Breakdown
| Parameter | Type | Description |
|---|---|---|
| addr | void * | Hint for the starting address. Almost always passed as NULL to let the kernel decide. |
| length | size_t | Size of the mapping in bytes. Must be greater than 0; the kernel rounds it up to a whole number of pages. |
| prot | int | Memory protection: PROT_NONE, or a bitwise OR of PROT_READ, PROT_WRITE, and PROT_EXEC. |
| flags | int | Mapping type: MAP_SHARED or MAP_PRIVATE, optionally combined with MAP_ANONYMOUS, MAP_FIXED, etc. |
| fd | int | File descriptor of the file to map (use -1 for anonymous mappings). |
| offset | off_t | Starting point in the file (must be a multiple of the page size). |
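Because offset must be page-aligned, mapping a byte range that starts in the middle of a page takes a little arithmetic. The sketch below is one way to do it, not a canonical recipe: struct file_window, map_window, unmap_window, and window_demo are hypothetical names introduced only for this illustration. The helper rounds the offset down to a page boundary, maps from there, and hands back a pointer adjusted to the byte that was actually requested.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: keeps what munmap() needs next to the
 * pointer the caller actually wants. */
struct file_window {
    void       *base;     /* page-aligned address returned by mmap */
    size_t      map_len;  /* length passed to mmap, reused for munmap */
    const char *data;     /* first requested byte inside the mapping */
};

/* Map `len` bytes of `fd` read-only, starting at any byte `offset`.
 * Returns 0 on success, -1 on failure (errno is set by mmap). */
int map_window(int fd, off_t offset, size_t len, struct file_window *w) {
    off_t page    = (off_t)sysconf(_SC_PAGESIZE);
    off_t delta   = offset % page;    /* bytes past the page boundary */
    off_t aligned = offset - delta;   /* offset rounded down to a page */

    w->map_len = len + (size_t)delta;
    w->base = mmap(NULL, w->map_len, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (w->base == MAP_FAILED)
        return -1;
    w->data = (const char *)w->base + delta;
    return 0;
}

void unmap_window(struct file_window *w) {
    munmap(w->base, w->map_len);
}

/* Usage sketch: write 10000 patterned bytes to a temporary file, then
 * map 100 bytes starting at byte 5000, which is not a page boundary.
 * Returns 0 if the mapped bytes match what was written. */
int window_demo(void) {
    char path[] = "/tmp/mmap_window_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                     /* file disappears once fd is closed */

    char buf[10000];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (char)('a' + i % 26);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
        close(fd);
        return -1;
    }

    struct file_window w;
    if (map_window(fd, 5000, 100, &w) == -1) {
        close(fd);
        return -1;
    }
    int ok = memcmp(w.data, buf + 5000, 100) == 0;
    unmap_window(&w);
    close(fd);
    return ok ? 0 : -1;
}
```

Keeping base and map_len alongside data matters because munmap must be given the address mmap returned, not the adjusted pointer.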
Use Cases for mmap
- High-Performance File I/O – Reading and writing large files without the overhead of multiple read()/write() system calls.
- Shared Memory IPC – Multiple processes mapping the same file or anonymous region to communicate at RAM speeds.
- Database Buffer Pools – Managed mapping of large data files for fast random access.
- Executable Loading – The kernel and the dynamic loader map program code and libraries into memory with PROT_EXEC.
- Anonymous Memory Allocation – Implementing custom allocators (like malloc itself) for large blocks of memory.
- Zero-Copy Networking – Mapping network card buffers directly into user space for ultra-low latency.
- Direct Device Access – Mapping device memory (e.g., frame buffers for graphics) directly into the process.
- Logging and Persistence – Mapping a log file with MAP_SHARED so that updates already stored in the mapping are written back by the OS even if the program crashes. Surviving a power loss still requires msync.
- Big Data Processing – Accessing datasets larger than physical RAM by letting the OS page data in and out on demand.
- Memory-Saving Clones – Using MAP_PRIVATE with fork() to allow children to share the parent's memory until they modify it (Copy-on-Write).
- Fault Isolation – Using mprotect() on mapped regions to catch illegal access or prevent changes during critical operations.
- Fixed-Address Allocation – Mapping specific hardware addresses for embedded or kernel-level work (dangerous and platform-specific).
Practical Examples
Example 1: Basic File Read Mapping
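#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void read_file_mmap(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) { perror("open"); return; }

    struct stat st;
    // mmap rejects a length of 0, so bail out on empty files as well
    if (fstat(fd, &st) == -1 || st.st_size == 0) { close(fd); return; }

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) { perror("mmap"); return; }

    // Index the file like a byte array (it is NOT NUL-terminated)
    printf("First byte: %c\n", map[0]);
    munmap(map, st.st_size);
}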
Example 2: Shared Memory between Processes
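#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

void shared_memory_example(void) {
    // MAP_ANONYMOUS | MAP_SHARED creates memory shared with children
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return; }
    *shared = 100;
    pid_t pid = fork();
    if (pid == 0) {
        *shared = 200; // Child changes it
        exit(0);
    }
    if (pid > 0) wait(NULL); // child has exited, so its write is complete
    printf("Parent sees: %d\n", *shared); // 200 if fork succeeded, else 100
    munmap(shared, sizeof(int));
}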
Example 3: Private CoW (Copy-on-Write) Mapping
If you map a file with MAP_PRIVATE and PROT_WRITE, the kernel will create a new copy of the page only when you try to change it. Your changes are never written back to the disk.
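A minimal sketch of that behavior, assuming a writable /tmp directory: private_cow_demo is a hypothetical function name, and the temporary file exists only for the demonstration. It writes through a MAP_PRIVATE mapping and then reads the file itself with pread to show that the two views have diverged.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if a write through a MAP_PRIVATE mapping changed the
 * in-memory view but left the file's contents untouched. */
int private_cow_demo(void) {
    char path[] = "/tmp/mmap_cow_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                     /* file disappears once fd is closed */

    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = 'J';   /* first write faults; the kernel copies the page */

    char on_disk[5];
    ssize_t n = pread(fd, on_disk, sizeof on_disk, 0);  /* read the file itself */

    int ok = n == 5
          && memcmp(map, "Jello", 5) == 0       /* private copy changed */
          && memcmp(on_disk, "hello", 5) == 0;  /* file did not */

    munmap(map, 5);
    close(fd);
    return ok ? 0 : -1;
}
```

Swapping MAP_PRIVATE for MAP_SHARED in the same function would make the pread see "Jello" instead, which is the whole difference between the two flags.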
Example 4: Using Anonymous Memory for Custom Allocators
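#include <sys/mman.h>
// Returns NULL on failure, like malloc; release the block with munmap(ptr, size)
void *custom_huge_alloc(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}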
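Example 5: Appending to a Persistent Log
Map the log file with MAP_SHARED, then call msync to force the kernel to write your changes to disk at a point you choose.
#include <stdio.h>
#include <sys/mman.h>
void sync_to_disk(void *addr, size_t len) {
    // addr must be page-aligned; MS_SYNC blocks until write-back completes
    if (msync(addr, len, MS_SYNC) == -1) perror("msync");
}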
Example 6: Reading a Massive File (Multi-GB)
On a 64-bit system, you can map a 10GB file even if you only have 4GB of RAM. The OS keeps only the pages you are currently touching in memory and evicts the rest when memory runs short.
Example 7: MAP_FIXED for Specific Addresses
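(Advanced) MAP_FIXED forces the mapping to the exact address you pass, which must be page-aligned. Any existing mappings that overlap the range are silently discarded and replaced, including ones you did not create yourself (heap, stack, shared libraries), which makes it extremely risky. On Linux 4.17 and later, MAP_FIXED_NOREPLACE fails with EEXIST instead of replacing an existing mapping.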
Tips for Performance and Best Practices
- Alignment matters – The offset must be a multiple of the system page size (usually 4KB). The length does not have to be, because the kernel rounds the mapping up to whole pages. Use sysconf(_SC_PAGESIZE) to check.
- Handle MAP_FAILED – mmap does NOT return NULL on error. It returns MAP_FAILED, which is defined as (void *) -1.
- munmap for Cleanup – Always unmap memory when it's no longer needed to release address space and avoid fragmentation.
- msync Strategy – Use MS_ASYNC to schedule write-back without waiting, or MS_SYNC if you need to block until the data has been written to disk.
- Protect Your Pages – Use PROT_NONE to create "guard pages" that cause a segmentation fault if accessed, helping catch bugs.
- Madvise for Hints – Use madvise() to tell the kernel your access pattern (e.g., MADV_SEQUENTIAL or MADV_RANDOM). This helps the kernel's pre-fetching.
- Huge Pages – For massive allocations, use MAP_HUGETLB (Linux) to use 2MB or 1GB pages, reducing TLB miss overhead.
- Check File Limits – You cannot map a zero-length file, because mmap fails with EINVAL when length is 0. If you need to grow a file before mapping it, use ftruncate().
- Thread Safety – While mmap itself is thread-safe, plain C reads and writes to the resulting memory are not synchronized. Use mutexes or atomics for shared access.
- Avoid Over-Mapping – Every mapping uses kernel resources. Don't map thousands of tiny files; better to map one large file containing multiple objects.
- Understand Signal Handling – If the underlying file is truncated by another process while you hold the mapping, accessing that memory will trigger a SIGBUS signal.
- Virtual Memory vs RAM – Remember that mmap consumes virtual address space. On 32-bit systems, this is a scarce resource (4GB in total, of which user space typically gets 2 to 3GB). On 64-bit systems, it's effectively unlimited for most workloads.
Troubleshooting Common mmap Issues
Issue 1: MAP_FAILED (Invalid Argument)
Problem: mmap returns MAP_FAILED with errno set to EINVAL.
Cause: Often an unaligned offset, a length of 0, or an invalid combination of flags.
Solution: Ensure offset % pagesize == 0 and length > 0. Check that exactly one of MAP_PRIVATE or MAP_SHARED is present.
Issue 2: SIGBUS Crash
Problem: The program crashes with a "Bus Error" when reading from the mapping.
Cause: You mapped a file of length N, but the file was shrunk to length < N by another process.
Solution: Check the file size before access or handle SIGBUS with a signal handler.
Issue 3: SIGSEGV (Invalid Permission)
Problem: Accessing memory causes a segmentation fault.
Cause: Trying to write to a mapping that was created with PROT_READ only.
Solution: Ensure prot includes PROT_WRITE.
Issue 4: Memory Not Syncing to Disk
Problem: You wrote to a MAP_SHARED file mapping, but the file on disk didn't change.
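Cause: You used MAP_PRIVATE, or the kernel hasn't written the modified pages back to disk yet.
Solution: Double-check you are using MAP_SHARED. Call msync() with MS_SYNC when the data must be on disk at a known point; closing the file descriptor does not flush a mapping.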
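A minimal sketch of the working combination, assuming a writable /tmp directory (shared_write_demo is a hypothetical name): size the file with ftruncate, map it with MAP_SHARED, write through the mapping, call msync, and read the file back through the descriptor to confirm the bytes are there.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if bytes stored through a MAP_SHARED mapping can be read
 * back from the file itself after msync. */
int shared_write_demo(void) {
    char path[] = "/tmp/mmap_sync_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                     /* file disappears once fd is closed */

    const size_t len = 4096;
    /* A new file has length 0. Give it a size first, otherwise touching
     * the mapping would raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, "persist me", 10);
    int rc = msync(map, len, MS_SYNC);    /* block until written back */

    char check[10];
    ssize_t n = pread(fd, check, sizeof check, 0);

    munmap(map, len);
    close(fd);
    return (rc == 0 && n == 10 && memcmp(check, "persist me", 10) == 0) ? 0 : -1;
}
```

On systems with a unified page cache, such as Linux, the pread would see the new bytes even without the msync call. What msync adds is the guarantee that the data has reached the storage device, which matters for crashes of the machine and not of the process.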
Related Concepts
Virtual Memory Paging
The underlying hardware mechanism that enables mmap. The CPU and OS work together to swap pages in and out of RAM.
Zero-Copy
Techniques that avoid copying data between kernel and user space. mmap is the foundation of many zero-copy patterns.
IPC (Inter-Process Communication)
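mmap with MAP_SHARED is one of the fastest ways to share data between processes. Related processes can inherit an anonymous shared mapping across fork(); unrelated processes must map the same named object, such as a regular file or a POSIX shared memory object created with shm_open().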
Frequently Asked Questions
Is mmap always faster than fread?
Not always. For small, sequential files, fread might be faster due to library-level buffering. For large files or random access, mmap usually wins by avoiding system call overhead.
Can I map a directory?
No. mmap works only on files, devices, or anonymous regions.
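What happens to a mapping when a process exits?
All mappings are automatically unmapped when the process terminates. Changes made through a file-backed MAP_SHARED mapping are not lost: the modified pages remain in the page cache and the kernel writes them back to the file.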
How do I check the system page size?
Use sysconf(_SC_PAGESIZE), or the older getpagesize(). On most x86 systems, it's 4096 bytes.
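Can I grow a file by writing to an mmap region?
No. You cannot write past the initial mapped length, and touching mapped pages that lie wholly beyond the end of the file raises SIGBUS. To grow a file, use ftruncate(), then create a mapping that covers the new length.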
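One portable way to do this is sketched below, with hypothetical names grow_mapping and grow_demo: extend the file with ftruncate, create a fresh mapping of the new length, and only then drop the old one. Linux also offers mremap() for resizing a mapping in place, which this sketch deliberately avoids because it is not portable.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow `fd` to `new_len` bytes and replace an
 * existing MAP_SHARED mapping of `old_len` bytes with one that covers
 * the new length. Returns the new mapping, or MAP_FAILED, in which
 * case the old mapping is still valid. */
void *grow_mapping(int fd, void *old_map, size_t old_len, size_t new_len) {
    if (ftruncate(fd, (off_t)new_len) == -1)
        return MAP_FAILED;

    void *new_map = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (new_map == MAP_FAILED)
        return MAP_FAILED;

    munmap(old_map, old_len);         /* pointers into old_map are now dead */
    return new_map;
}

/* Usage sketch: start with a one-page file, grow it to two pages, and
 * write into the second page. Returns 0 on success. */
int grow_demo(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/mmap_grow_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)page) == -1) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    map[0] = 'A';

    char *bigger = grow_mapping(fd, map, page, 2 * page);
    if (bigger == MAP_FAILED) {
        munmap(map, page);
        close(fd);
        return -1;
    }
    bigger[page] = 'B';               /* second page is now backed by the file */

    int ok = bigger[0] == 'A' && bigger[page] == 'B'
          && lseek(fd, 0, SEEK_END) == (off_t)(2 * page);

    munmap(bigger, 2 * page);
    close(fd);
    return ok ? 0 : -1;
}
```

Because the new mapping may land at a different address, any pointers saved into the old mapping have to be recomputed; storing offsets in place of raw pointers avoids that problem.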
What is an anonymous mapping?
A mapping that isn't backed by any file. It is just a block of zero-initialized memory. It is the modern alternative to sbrk/brk.
Is mmap portable?
It is part of the POSIX standard. However, some flags (like MAP_ANONYMOUS) might have different names (like MAP_ANON) on different Unix systems. Windows has a different but similar API called "File Mapping."
What is PROT_EXEC?
It allows the memory to be executed as code. This is essential for JIT compilers but should be used sparingly for security reasons (W^X - Write XOR Execute).
Quick Reference Card
| Flag | Effect | Use Case |
|---|---|---|
| MAP_SHARED | Changes are visible to all and persisted | Shared data, large file I/O |
| MAP_PRIVATE | Changes are local (Copy-on-Write) | Loading binaries, temp work |
| MAP_ANONYMOUS | No file backing | Custom heap/allocator |
Summary
mmap is a cornerstone of advanced C programming, giving you fine-grained control over how your application interacts with memory and storage. By treating files as memory, you can often achieve performance and simplicity that traditional I/O struggles to match, especially for large files and random access.
- Use mmap for large file access and shared memory.
- Remember alignment and page size rules.
- Always check for MAP_FAILED.
- Use msync when data persistence is critical.
Whether you're developing high-throughput systems or optimizing existing applications, mmap is one of the most potent tools in a C programmer's toolkit.