File Size Discrepancies Across File Systems

If data is copied between the center-wide $CENTER file system and the $ARCHIVE archival file system, command-line programs may report different sizes for the same file on the two file systems. Some tools report the amount of data contained in the file itself (i.e., the exact number of bytes in the file), whereas others report the amount of space the file system has allocated to the file. For example, "ls -l" reports the exact number of bytes, while "du" reports the allocated space. These two values may differ depending on the type of file system.

For efficiency reasons, file systems allocate space in fixed-size units called blocks, so every file occupies at least one block even if it contains fewer bytes than that. Because $CENTER and $ARCHIVE allocate space differently, the effect is visible when comparing the reported size of an identical file stored on both. In the following example, "du" reports 4.0K on $CENTER but 2.0M on $ARCHIVE for the same small file:

pacman12% du -h $CENTER/testfortran.f90
4.0K    /center/w/username/testfortran.f90
pacman12% cp $CENTER/testfortran.f90 $ARCHIVE
pacman12% du -h $ARCHIVE/testfortran.f90
2.0M    /archive/u1/uaf/username/testfortran.f90
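To see both numbers for a single file, the sketch below prints the exact byte count next to the allocated space. It assumes GNU coreutils ("stat -c" and "du --apparent-size"; BSD and macOS tools use different options), and "show_sizes" is a hypothetical helper name, not an installed command. It runs on a temporary file so it works anywhere; on the system you would pass paths under $CENTER or $ARCHIVE instead.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's exact size with the space allocated for it.
# Assumes GNU coreutils; show_sizes is a hypothetical helper, not a system command.
set -euo pipefail

show_sizes() {
  local f=$1 bytes blocks blksz
  # %s = exact size in bytes, %b = number of allocated blocks, %B = bytes per block counted by %b
  read -r bytes blocks blksz < <(stat -c '%s %b %B' "$f")
  echo "$f: ${bytes} bytes of data, $((blocks * blksz)) bytes allocated"
}

# Demonstrate on a small temporary file (removed when the script exits)
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'program hello\nend program hello\n' > "$tmp"

show_sizes "$tmp"
du -h --apparent-size "$tmp"   # exact amount of data, like ls -l
du -h "$tmp"                   # allocated space, as in the example above
```

The allocated figure depends on the file system holding the temporary file, so it may be larger than, or on some file systems even smaller than, the data size.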

To verify that your data kept its integrity during the copy, compare the original and the new copy after copying between $CENTER and $ARCHIVE. For two files, use the "cmp" tool, which prints nothing if the files are identical and otherwise reports the position of the first difference:

cmp ${CENTER}/myfile.txt ${ARCHIVE}/myfile.txt
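In a script, it is often more convenient to test the exit status of "cmp" than to read its output. The sketch below relies on the standard "cmp -s" behavior (exit status 0 for identical files, 1 for differing files, greater than 1 on errors such as a missing file); "verify_copy" is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check that a copy is byte-for-byte identical using cmp's exit status.
# verify_copy is a hypothetical helper name, not an existing command.
set -u

verify_copy() {
  local src=$1 dst=$2
  # -s suppresses normal output; 0 = identical, 1 = differ, >1 = error
  cmp -s "$src" "$dst"
  case $? in
    0) echo "OK: $dst matches $src" ;;
    1) echo "MISMATCH: $dst differs from $src"; return 1 ;;
    *) echo "ERROR: could not compare $src and $dst"; return 2 ;;
  esac
}

# Usage with the paths from the example above:
# verify_copy "${CENTER}/myfile.txt" "${ARCHIVE}/myfile.txt"
```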

If you have many files in two directories and the file names are unchanged, apply the "diff" tool recursively:

diff -r ${CENTER}/result_data ${ARCHIVE}/result_data

If the file data in the two directories is identical, "diff" prints nothing. Otherwise, it reports which files do not match (showing the differing lines for text files) and lists any files that exist in only one of the directories. (Checksum programs such as cksum and md5sum can also be used instead of cmp and diff if you wish to record before-and-after checksum values.)
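One way to record before-and-after checksums is to write an MD5 list for the source directory before copying and then check the copy against that list. The sketch below assumes GNU coreutils ("md5sum -c --quiet") and file names without embedded newlines; "copy_and_verify" and the temporary manifest file are hypothetical choices for illustration, not a site-provided tool.

```shell
#!/usr/bin/env bash
# Sketch: record MD5 checksums before a copy and verify them afterwards.
# copy_and_verify is a hypothetical helper; assumes GNU coreutils (md5sum -c --quiet).
set -euo pipefail

copy_and_verify() {
  local src=$1 dst=$2 manifest status=0
  manifest=$(mktemp)

  # Before the copy: checksum every regular file, with paths relative to $src
  (cd "$src" && find . -type f -exec md5sum {} +) > "$manifest"

  # Copy the directory contents; -a also preserves timestamps and permissions
  mkdir -p "$dst"
  cp -a "$src/." "$dst"

  # After the copy: recompute checksums in $dst and compare with the recorded list
  if (cd "$dst" && md5sum -c --quiet "$manifest"); then
    echo "All checksums match in $dst"
  else
    echo "Checksum mismatch or missing file in $dst" >&2
    status=1
  fi
  rm -f "$manifest"
  return "$status"
}

# Usage with the directories from the diff example:
# copy_and_verify "${CENTER}/result_data" "${ARCHIVE}/result_data"
```

Keeping the manifest file instead of deleting it would let you re-check the archived copy later without rereading the original.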

Be aware that comparing files reads every byte of both copies, so it may take a very long time when many or large files are involved.

Finally, remember that a copy may preserve the file data exactly yet change the file's metadata. For example, plain "cp" gives the copy a new modification time; use "cp -p" or "cp -a" instead if you wish to preserve the original timestamps.
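The difference between "cp" and "cp -p" can be checked directly by comparing modification times. The sketch below assumes GNU coreutils ("touch -d" and "stat -c %Y", which prints the modification time in seconds since the epoch) and uses throwaway files in a temporary directory rather than real $CENTER or $ARCHIVE paths.

```shell
#!/usr/bin/env bash
# Sketch: plain cp gives the copy a new modification time; cp -p keeps the original one.
# Assumes GNU coreutils; all files live in a temporary directory removed at exit.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
orig="$dir/orig.txt"

echo "sample data" > "$orig"
touch -d '2000-01-01 00:00:00' "$orig"    # give the original an old timestamp

cp    "$orig" "$dir/plain_copy.txt"       # mtime becomes the time of the copy
cp -p "$orig" "$dir/preserved_copy.txt"   # mtime copied from the original

mtime() { stat -c %Y "$1"; }              # modification time, seconds since epoch
echo "original:   $(mtime "$orig")"
echo "cp copy:    $(mtime "$dir/plain_copy.txt")"
echo "cp -p copy: $(mtime "$dir/preserved_copy.txt")"

# The file data is identical in all three files; only the metadata differs
cmp -s "$orig" "$dir/plain_copy.txt" && cmp -s "$orig" "$dir/preserved_copy.txt" && echo "data identical"
```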
