Re: [Nolug] Fastest way to create a local mirror of a huge tree?

From: Jimmy Hess <mysidia_at_gmail.com>
Date: Thu, 8 Sep 2011 23:37:07 -0500
Message-ID: <CAAAwwbWr6ASO-0WVY2sTTfW_RH2=9bc-Vgyj_SivGvYDD1PCFw@mail.gmail.com>

On Wed, Sep 7, 2011 at 7:43 AM, B. Estrade <estrabd@gmail.com> wrote:
> On Wed, Sep 07, 2011 at 06:34:34AM -0500, Ron Johnson wrote:
>> If I wanted to make *bad* choices, I'd go back to Windows!
>
> Why is dd a bad choice?

Because dd of a live filesystem copies a large number of bits you don't need...
Like filesystem blocks that correspond to free or unused space.
Like filesystem metadata structures, inode blocks, and padding.
There are a lot of low-level details in the filesystem that really
don't need to pass through the network.

It also copies zeroed blocks on the block device that have never been used,
and blocks that belong to deleted files.
He didn't say he was trying to mirror a cramfs or other inherently
space-efficient filesystem that has no such thing as an "unused block".
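To make that concrete, here's a rough sketch of the kind of block-level copy
being discussed (hostnames and device names are invented for illustration;
traditional netcat wants "-l -p" where OpenBSD netcat takes just "-l"):

  # Receiving box: listen and write the raw stream to a spare device.
  nc -l 2222 | dd of=/dev/sdb bs=1M

  # Sending box: every block of /dev/sda1 crosses the wire --
  # free space, deleted-file blocks, and metadata included.
  dd if=/dev/sda1 bs=1M | nc backuphost 2222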

Now the "real" efficient method for mirroring massive directories
doesn't exist on Linux, but Solaris 11.
At the dataset level
zfs send -D ....
zfs recv ....
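A rough sketch of what a full-dataset copy looks like (the pool, dataset,
and snapshot names here are made up for illustration):

  # Snapshot the dataset, then stream it to the remote pool over ssh.
  # -D deduplicates blocks within the send stream.
  zfs snapshot tank/maildata@mirror1
  zfs send -D tank/maildata@mirror1 | ssh backuphost zfs recv backup/maildata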

Which, when combined with LZJB compression and snapshots, also
provides a block-layer mechanism for transmitting incremental changes
that is much more efficient than rsync for a sufficiently massive
number of files.
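For the incremental case, a sketch along these lines (again with made-up
dataset and snapshot names) sends only the blocks that changed between
two snapshots:

  # Take a new snapshot, then send just the delta since the previous one.
  zfs snapshot tank/maildata@mirror2
  zfs send -D -i tank/maildata@mirror1 tank/maildata@mirror2 \
      | ssh backuphost zfs recv backup/maildata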

Even in incremental mode, where an initial copy has already been done, rsync
seriously tanks or has enormous memory requirements if there is a
sufficiently massive number of files
(e.g. backing up Maildirs where you might have a few hundred million
files to deal with).
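By "incremental mode" I just mean the usual re-run mirror, something like
(paths invented for illustration):

  # rsync still has to walk the entire tree and compare it against the
  # destination, so with hundreds of millions of files the scan itself
  # becomes the bottleneck.
  rsync -aH --delete /var/mail/ backuphost:/backup/mail/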

In those cases, even a basic tar command piped into an NC/RSH command can
result in enormously faster transfers than rsync.
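Something like this, as a rough sketch (hostnames and paths are made up;
as above, traditional netcat may want "-l -p" instead of "-l"):

  # Receiving side: listen and unpack the stream as it arrives.
  nc -l 2222 | tar -C /backup/mail -xf -

  # Sending side: stream the whole tree straight over the socket --
  # no per-file comparison, no giant file list held in memory.
  tar -C /var/mail -cf - . | nc backuphost 2222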

:/

--
-JH
___________________
Nolug mailing list
nolug@nolug.org
Received on 09/09/11

This archive was generated by hypermail 2.2.0 : 09/09/11 EDT