Re: [Nolug] Fastest way to create a local mirror of a huge tree? from Jerry Wilborn on 2011-09-09 (nolugarchives)

From: Jerry Wilborn <jerrywilborn_at_gmail.com>
Date: Fri, 9 Sep 2011 08:28:20 -0500
Message-ID: <CAK2QZfTv9d2V8j_x9EVocONza80h8Hqk9w7uv8yMZSLX1WSD5Q@mail.gmail.com>

You're trading the overhead of reading non-filedata blocks for the overhead
of having to go through a filesystem to fetch the data you do want. With
large numbers of small files I'd be willing to bet that dd would outperform
any file-level copy on modern hardware.

This is (the guts of) 'cp file1 file2' with file1 containing the string
'hello':
stat("file2", 0x7fff3eb822c0) = -1 ENOENT (No such file or directory)
stat("file1", {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
stat("file2", 0x7fff3eb82110) = -1 ENOENT (No such file or directory)
open("file1", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
open("file2", O_WRONLY|O_CREAT, 0100644) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
read(3, "hello\n", 4096) = 6
write(4, "hello\n", 6) = 6
read(3, "", 4096) = 0
close(4) = 0
close(3) = 0
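
For comparison, here is a minimal Python sketch of the same per-file
sequence (stat, open, open, read, write, read, close, close). It is not
what cp itself runs. The function name copy_small_file and the call
counter are made up for illustration, and O_TRUNC is added on the
assumption that the destination may already exist. The counter only
tracks the os-level calls the function makes itself, not everything the
Python interpreter does underneath.

```python
import os


def copy_small_file(src, dst, bufsize=4096):
    """Copy src to dst with explicit os-level calls; return how many were made.

    Hypothetical helper for illustration only.
    """
    calls = 0
    st = os.stat(src)  # look at the source before opening it
    calls += 1
    in_fd = os.open(src, os.O_RDONLY)
    calls += 1
    # O_TRUNC is an assumption so that an existing destination is overwritten.
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
                     st.st_mode & 0o777)
    calls += 1
    try:
        while True:
            chunk = os.read(in_fd, bufsize)
            calls += 1
            if not chunk:
                break  # empty read means end of file
            view = memoryview(chunk)
            while view:
                # os.write may write fewer bytes than asked, so loop.
                written = os.write(out_fd, view)
                calls += 1
                view = view[written:]
    finally:
        os.close(out_fd)
        os.close(in_fd)
        calls += 2
    return calls
```

The point is that this fixed cost is paid once per file no matter how
little data the file holds, so with a huge number of tiny files the
bookkeeping calls can outnumber the calls that actually move data.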

Additionally, with dd you eliminate the variability that the filesystem
introduces, and at a minimum you get a firmer estimate of when the copy
will complete.
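
To illustrate why the estimate is firmer: a block-level copy knows the
total byte count up front, so the time remaining is just the bytes left
divided by the observed throughput. The sketch below is a rough
dd-style loop, not dd's actual implementation. The name block_copy, the
1 MiB block size, and the report callback are assumptions, and src and
dst are assumed to be buffered binary file objects.

```python
import time


def block_copy(src, dst, total_bytes, block_size=1 << 20, report=None):
    """Copy total_bytes from src to dst in fixed-size blocks.

    Hypothetical helper: report(copied, eta_seconds) is called after each
    block once a throughput figure is available.
    """
    copied = 0
    start = time.monotonic()
    while copied < total_bytes:
        block = src.read(min(block_size, total_bytes - copied))
        if not block:
            break  # source ended earlier than expected
        dst.write(block)
        copied += len(block)
        elapsed = time.monotonic() - start
        if report is not None and elapsed > 0:
            rate = copied / elapsed  # bytes per second so far
            report(copied, (total_bytes - copied) / rate)
    return copied
```

A file-level copy has no equivalent of total_bytes that predicts the
work well, since the cost also depends on how many files there are and
how they are laid out.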

Jerry Wilborn
jerrywilborn@gmail.com

On Thu, Sep 8, 2011 at 11:37 PM, Jimmy Hess <mysidia@gmail.com> wrote:

> On Wed, Sep 7, 2011 at 7:43 AM, B. Estrade <estrabd@gmail.com> wrote:
> > On Wed, Sep 07, 2011 at 06:34:34AM -0500, Ron Johnson wrote:
> >> If I wanted to make *bad* choices, I'd go back to Windows!
> >
> > Why is dd a bad choice?
>
> Because dd of a live filesystem copies a large number of bits you don't
> need...
> Like filesystem blocks that correspond to free or unused space.
> Like filesystem metadata structures, inode blocks, padding.
> There are a lot of low-level details in the file system that really
> don't need to pass through the network.
>
> Zero blocks on the block device that have not been used, and blocks
> that belong to deleted files.
> He didn't say he was trying to mirror a cramfs or other inherently
> space-efficient filesystem that has no such thing as an "unused block".
>
> Now the "real" efficient method for mirroring massive directories
> doesn't exist on Linux, but it does on Solaris 11.
> At the dataset level:
> zfs send -D ....
> zfs recv ....
>
> Which, when combined with LZJB compression and snapshots, also provides
> a block-layer mechanism for transmitting incremental changes that is
> much more efficient than rsync for a sufficiently massive number of
> files.
>
> Even in incremental mode where an initial copy has been done, rsync
> seriously tanks or has enormous memory requirements if there is a
> sufficiently massive number of files (e.g. backing up Maildirs where
> you might have a few hundred million files to deal with).
>
> In those cases, even a basic tar command piped into an nc/rsh command
> can result in enormously faster transfer progress than rsync.
>
> :/
>
>
> --
> -JH
> ___________________
> Nolug mailing list
> nolug@nolug.org
>

Received on 09/09/11