We have been chasing a very strange memory corruption bug in one of our daemons. As one of many approaches to finding it, we built and pushed to production a version of this daemon without jemalloc, the memory allocator library we use in all our projects.
A few hours after the release, the monitoring team reported an unusual number of errors returned from the system: a few percent of the requests were failing with an "out of memory" error. That was strange, because there was plenty of memory available on the machine.
The output of the strace utility showed that the mprotect() call we use to install a guard page at the bottom of each coroutine stack was returning an error:
strace -emprotect -ttt -T -p 25754 -c
1438269965.801936 mmap(NULL, 69632, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f73baf9c000 <0.000092>
1438269965.802065 mprotect(0x7f73baf9c000, 4096, PROT_NONE) = -1 ENOMEM (Cannot allocate memory) <0.000005>
At first we thought this simply could not happen, since mmap() had returned without an error, and suspected something was wrong with the kernel we were using, especially because this particular machine runs a very old one:
# uname -a
Linux cppbig31 3.0.82-0.7.99-default #1 SMP Thu Jun 27 13:19:18 UTC 2013 (6efde93) x86_64 x86_64 x86_64 GNU/Linux
But brief googling and a look at the kernel source gave us a much simpler answer: the limit on the number of memory mappings. Each mmap() call creates a mapping, and mprotect() can split an existing mapping into two or three. And we make a lot of these calls.
# cat /proc/25754/maps | wc -l
# cat /proc/sys/vm/max_map_count
The libc memory allocator seems to be another source of mappings: a similar daemon running with jemalloc doesn't use nearly as many.
# cat /proc/5667/maps | wc -l
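To see which kinds of mappings dominate, one can group the maps file by its permission column (guard pages show up as `---p`). Here /proc/self/maps is used so the example is runnable; for a daemon, substitute its pid:

```shell
# Group /proc/<pid>/maps entries by their permission flags:
# guard pages appear as ---p, ordinary anonymous memory as rw-p.
awk '{print $2}' /proc/self/maps | sort | uniq -c | sort -rn
```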
This is how one can increase the limit:
echo "655300" > /proc/sys/vm/max_map_count
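The same value can also be set through sysctl and, assuming a standard sysctl setup, made persistent across reboots (both commands require root; the value is the one from the echo above):

```shell
# One-shot, equivalent to writing to /proc directly:
sysctl -w vm.max_map_count=655300

# Persist across reboots, then reload:
echo "vm.max_map_count = 655300" >> /etc/sysctl.conf
sysctl -p
```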