How to deal with sparse files

Jephe Wu - http://linuxtechres.blogspot.com

Objective: understanding sparse files under Linux
Environment: CentOS 5.5 64bit


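1. What is a sparse file?
A sparse file is a type of computer file that attempts to use file system space more efficiently when the blocks allocated to the file are mostly empty. Instead of writing the empty blocks to disk, the file system records only metadata describing them (the "holes").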

http://en.wikipedia.org/wiki/Sparse_file

dd if=/dev/zero of=sparse-file bs=1k count=0 seek=5120


This creates a file with an apparent size of five megabytes, but with no data blocks stored on disk (only metadata).
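To see the difference in numbers, the same dd command can be followed by a few stat calls. This is only a minimal sketch: the temporary directory and the variable names are made up for illustration, and it assumes GNU coreutils (stat -c) on a file system that supports holes.

```shell
#!/bin/sh
# Sketch: create a 5 MiB sparse file and compare apparent vs. allocated size.
# The work directory and file name are illustrative only.
workdir=$(mktemp -d)
f="$workdir/sparse-file"

# count=0 writes no data; seek=5120 (x 1 KiB) only sets the file length.
dd if=/dev/zero of="$f" bs=1k count=0 seek=5120 2>/dev/null

apparent=$(stat -c %s "$f")      # file length in bytes
blocks=$(stat -c %b "$f")        # number of allocated blocks
blksz=$(stat -c %B "$f")         # size in bytes of each block counted by %b
allocated=$((blocks * blksz))

echo "apparent size : $apparent bytes"
echo "allocated size: $allocated bytes"
```

The apparent size is 5242880 bytes (5120 x 1024). The allocated size depends on the file system, but it should be far smaller.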

2. How to detect it

ls -lhs sparsefile  (the file is sparse if the allocated size in the first column is smaller than the file size)
or
du -sh sparsefile  (check the real size on disk)
ls -lh sparsefile   (check the apparent size)
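The same comparison can be scripted. The is_sparse function below is a hypothetical helper, not a standard tool; it assumes GNU stat and simply checks whether the allocated bytes are fewer than the file length. The two demo files are created only to show both outcomes.

```shell
#!/bin/sh
# is_sparse FILE: hypothetical helper. Succeeds (exit 0) when FILE occupies
# fewer bytes on disk than its length, fails (exit 1) otherwise.
is_sparse() {
    size=$(stat -c %s "$1") || return 2      # apparent size in bytes
    blocks=$(stat -c %b "$1") || return 2    # allocated blocks
    blksz=$(stat -c %B "$1") || return 2     # bytes per block counted by %b
    [ $((blocks * blksz)) -lt "$size" ]
}

# Demo files (names are illustrative): one made only of holes, one with real data.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/holes.img" bs=1k count=0 seek=5120 2>/dev/null
dd if=/dev/urandom of="$demo/data.img" bs=1k count=1024 conv=fsync 2>/dev/null

for f in "$demo/holes.img" "$demo/data.img"; do
    if is_sparse "$f"; then
        echo "$f: sparse"
    else
        echo "$f: not sparse"
    fi
done
```

Note that this is a heuristic: on file systems with transparent compression or inline data, a file without holes can also report fewer allocated bytes than its length.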

3. How to copy or transfer sparse files
a. transfer over the network
tar cvzSpf - sparsefiles |ssh jephe@server '(cd /path/to; tar xzSpf -)'

b. copy locally
cp [ --sparse=always ] sparsefile newsparsefile
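note: without the option, cp defaults to --sparse=auto, which keeps holes only when it detects that the source file is sparse. After copying, use 'ls -lhs newsparsefile' to check whether the copy is still sparse.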
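A quick way to see what --sparse=always does is to copy a file full of real zero bytes and compare the two. This is a sketch assuming GNU cp; the file names and temporary directory are illustrative only.

```shell
#!/bin/sh
# Sketch: cp --sparse=always turns runs of zero bytes into holes.
workdir=$(mktemp -d)
src="$workdir/zeros.img"
dst="$workdir/zeros-sparse.img"

# 4 MiB of real zero bytes written to disk, so the source is NOT sparse.
dd if=/dev/zero of="$src" bs=1M count=4 conv=fsync 2>/dev/null

# Copy, asking cp to create holes wherever it finds long runs of zeros.
cp --sparse=always "$src" "$dst"

# The contents must stay byte-for-byte identical.
cmp "$src" "$dst" && echo "contents identical"

# First column is the allocated size; the copy should use less space.
ls -lhs "$src" "$dst"
```

Both files report the same length in ls, but the copy should show a much smaller allocated size in the first column.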

c. tar locally
tar Scvpzf sparsefile.tar.gz sparsefiles
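A round trip through tar can be checked the same way. The sketch below assumes GNU tar and gzip; the directory layout is made up for the demo, and the archive is named .tar.gz because the z option compresses it with gzip.

```shell
#!/bin/sh
# Sketch: archive a sparse file with -S, extract it, and compare sizes.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"

# 5 MiB file consisting only of a hole.
dd if=/dev/zero of="$workdir/src/sparsefile" bs=1k count=0 seek=5120 2>/dev/null

# -S records the holes in the archive instead of storing 5 MiB of zeros.
tar -C "$workdir/src" -Sczpf "$workdir/sparsefile.tar.gz" sparsefile

# On extraction GNU tar recreates the holes recorded in the archive.
tar -C "$workdir/out" -xzpf "$workdir/sparsefile.tar.gz"

# Compare allocated size (first column) and length of original and extracted file.
ls -lhs "$workdir/src/sparsefile" "$workdir/out/sparsefile"
```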

d. rsync to remote server
rsync -S --progress sparsefile jephe@server:newsparsefile

4. How to create sparse files under CentOS 5
(https://access.redhat.com/kb/docs/DOC-2282)
# dd if=/dev/zero of=jephe.img bs=1M count=1 seek=4K
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.005998 seconds, 175 MB/s
# ls -lh jephe.img
-rw-rw-r-- 1 user user 4.1G Oct 17 10:55 jephe.img
# du -h jephe.img
2.1M    jephe.img
# mkfs.ext3 jephe.img
...
# du -h jephe.img
196M    jephe.img
# mount -o loop jephe.img /mnt/jephe
# df -h /mnt/jephe/
Filesystem            Size  Used Avail Use% Mounted on
/shared/jephe.img      4.0G  137M  3.7G   4% /mnt/jephe

5. Examples of sparse files
a. /var/log/faillog
b. KVM guest disk image file
c. database snapshots
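
To look for files like these on a system, GNU find can print a sparseness ratio (allocated bytes divided by file length) with the %S format. The list_sparse function below is a hypothetical helper, and it assumes a findutils version that supports %S, which may be newer than the one shipped with CentOS 5. The demo files exist only to show the output.

```shell
#!/bin/sh
# list_sparse DIR: hypothetical helper. Prints ratio, size and path of every
# non-empty regular file under DIR whose allocated size is below its length.
list_sparse() {
    find "$1" -type f -size +0 -printf '%S\t%s\t%p\n' 2>/dev/null |
        awk -F'\t' '$1 < 1.0 { printf "%.2f\t%s\t%s\n", $1, $2, $3 }'
}

# Demo directory (illustrative): one file made of a hole, one small text file.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/holes.img" bs=1k count=0 seek=5120 2>/dev/null
printf 'hello\n' > "$demo/small.txt"

list_sparse "$demo"
```

On a real system the function could be pointed at a directory such as /var/log instead of the demo directory. A ratio close to 0 means the file is almost entirely holes.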