• Uncategorized

About linux : glibc-function-fails-on-large-file

Question Detail

I have a utility I wrote years ago in C++ which takes all the files in all the subdirectories of a given directory and moves them to new numbered subdirectories based on a count of the files. It has worked without error for several years.

Yesterday it failed for the first time. It always fails on a 2.7Gig video file, perhaps the largest this utility has ever encountered. The file itself is not corrupt. It will play in a video player. I can move it with command line or file manager apps without a problem.

I use ntfw() to walk the directory subtree. On this file, ntfw() returns an error code of -1 on encountering the file, before calling my callback function. Since (I thought) the code is only dealing with filenames and not actually opening or reading the file, I don’t understand why the file size should be an issue.

The number of open file descriptors is not the problem. Nor the number of files. It was in a subtree of over 5,000 files, but when moving it to one of only 50 it still fails, while the original subtree is processed without error. File permissions are not the problem. This file has the same as all the others. This includes ACL permissions.

The question is: Is file size the issue? Why?

The file system is ext4.

ldd --version /usr/lib/i386-linux-gnu/libc.so
ldd (Ubuntu GLIBC 2.27-3ubuntu1.4) 2.27
Linux version 4.15.0-161-generic ([email protected])
(gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))
#169-Ubuntu SMP Fri Oct 15 13:39:59 UTC 2021

Question Answer

As you’re using a 32-bit application, in order to work properly with files larger than 2 GB you should compile with -D_FILE_OFFSET_BITS=64 in order to use 64-bit file handling syscalls and types.

In particular, nftw() calls stat() which fails with EOVERFLOW if the size of the file exceeds 2 GB: https://man7.org/linux/man-pages/man2/stat.2.html

Also, for using mmap() (which it seems you’re not using, but just in case as a comment was mentioning it), you can’t allocate all of 4 GB, some of the address space is reserved for the kernel (typically 1 GB on Linux). Then some space is used by the stack(s), shared libraries etc. Maybe you’ll be able to map 2 GB at a time, if you’re lucky.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.