Re: [Nolug] Fastest way to create a local mirror of a huge tree?

From: Jerry Wilborn <jerrywilborn_at_gmail.com>
Date: Wed, 7 Sep 2011 07:50:26 -0500
Message-ID: <CAK2QZfSoamHbYYhs-B-XeHwDM+Pt=jCD2QcxrVEAMHdo_e93Gg@mail.gmail.com>

This is something I use from time to time to sync up large amounts of data
(millions of files consuming hundreds of gigs). If you have a large number
of subdirectories, this script may help you copy them in parallel. It's
still not as fast as dd, but you should be able to crank up the
throughput. The /tmp/maxchildren file can be tuned while it's running, but
at some point you reach diminishing returns.
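The script itself is not included in this archive. The following is a rough bash sketch of the idea, not Jerry's script. It assumes one copy job per top-level entry of the source tree and a control file whose contents set the job limit. The names `parallel_copy`, `max_children` and `MAXCHILDREN_FILE`, and the fallback limit of 4, are all hypothetical. It needs bash 4.3 or later for `wait -n`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the script mentioned in the message: copy each
# top-level entry of SRC into DST as a separate background job, never running
# more jobs at once than the number stored in a control file. The file is
# re-read before every new job starts, so it can be edited mid-run.

# Control file for the concurrency limit (assumed name, after /tmp/maxchildren).
MAXCHILDREN_FILE=${MAXCHILDREN_FILE:-/tmp/maxchildren}

# Print the current limit, falling back to 4 if the file is missing or invalid.
max_children() {
    local n
    n=$(cat "$MAXCHILDREN_FILE" 2>/dev/null)
    if [[ $n =~ ^[1-9][0-9]*$ ]]; then
        echo "$n"
    else
        echo 4
    fi
}

# parallel_copy SRC DST: returns non-zero if SRC is not a directory or any
# copy job failed. The body is a subshell so the shopt changes stay local.
parallel_copy() (
    src=$1 dst=$2 running=0 failed=0
    [ -d "$src" ] || { echo "not a directory: $src" >&2; exit 1; }
    mkdir -p "$dst" || exit 1
    shopt -s dotglob nullglob   # include hidden entries; empty dir -> no loop
    for entry in "$src"/*; do
        # Wait for a free slot; the limit is re-read on every pass.
        while (( running >= $(max_children) )); do
            wait -n || failed=1
            running=$((running - 1))
        done
        cp -a "$entry" "$dst"/ &   # -a keeps symlinks, modes, owners, times
        running=$((running + 1))
    done
    # Collect the remaining jobs and their exit statuses.
    while (( running > 0 )); do
        wait -n || failed=1
        running=$((running - 1))
    done
    exit "$failed"
)
```

For example, you could run `echo 8 > /tmp/maxchildren` followed by `parallel_copy /data /mnt/backups/data`, then `echo 16 > /tmp/maxchildren` from another terminal. A higher limit takes effect the next time a job is about to start. Parallelism here is per top-level entry, so a tree whose data sits mostly under one subdirectory gains little.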

Jerry Wilborn
jerrywilborn@gmail.com

On Wed, Sep 7, 2011 at 7:43 AM, B. Estrade <estrabd@gmail.com> wrote:

> On Wed, Sep 07, 2011 at 06:34:34AM -0500, Ron Johnson wrote:
> > If I wanted to make *bad* choices, I'd go back to Windows!
>
> Why is dd a bad choice?
>
> And rsync doesn't "datamine". It uses rolling checksums and other
> heuristics (with an incredibly high probability of success) to determine
> what bits of a file to transmit, thus making it very efficient for
> r"sync"'ing. That you have to initially transfer all files the first
> time is a consequence of the case where the target mirror is in no way
> similar to the source. Maybe there is a "do initial copy" mode, but I
> doubt it.
>
> Maybe you want to ghost your machine, if so check out g4u. You won't
> get any speed benefits from it, though.
>
> Bret
>
> >
> > On 09/07/2011 06:21 AM, Brad Bendily wrote:
> >>Because you can?
> >>Isn't that what Linux is all about, choices!
> >>
> >>
> >>
> >>On Sep 7, 2011, at 5:54 AM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> >>
> >>>Why in God's name would I dd-over-ssh on a single machine?
> >>>
> >>>On 09/07/2011 05:46 AM, B. Estrade wrote:
> >>>>I like the dd (over ssh) idea.
> >>>>
> >>>>B. Estrade <estrabd@gmail.com>
> >>>>On Sep 7, 2011 12:23 AM, "Jimmy Hess" <mysidia@gmail.com> wrote:
> >>>>>On Tue, Sep 6, 2011 at 10:21 PM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> >>>>>>The mirror doesn't exist yet, so rsync's clever data-minimizing
> >>>>>>algorithms aren't valid. (Also, there are lots of symlinks that need be
> >>>>>>preserved.)
> >>>>>>1. cp -av /data /mnt/backups/data
> >>>>>>2. cd /data && tar -cvf ??? . | (cd /mnt/backups/data && tar -xpvf -)
> >>>>>>3. rsync -avz --stats --progress /data /mnt/backups/data
> >>>>>>
> >>>>>>
> >>>>>Netcat + XZ or gzip + CPIO or tar.
> >>>>>
> >>>>>nc otherhost portnumber | xz -d | cpio -idm
> >>>>>find pathname -print | cpio -o -Hnewc | xz -1 | nc -l portnumber
> >>>>>
> >>>>>Then rsync to reconcile.
> >>>>>
> >>>>>
> >>>>>If on a dedicated partition, consider a block-based tool that won't
> >>>>>need to traverse the filesystem directory structure and won't need to
> >>>>>copy unused disk blocks, e.g. partimage.
> >>>>>
> >
> > --
> > Supporting World Peace Through Nuclear Pacification
> > ___________________
> > Nolug mailing list
> > nolug@nolug.org
>
> --
> B. Estrade <estrabd@gmail.com>
>
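The following is a minimal bash sketch of the tar pipe from option 2 in the thread. The helper name `mirror_tree` is hypothetical. `-f -` is tar's standard spelling for "archive on stdout" when creating and "archive on stdin" when extracting. `-v` is left off because listing millions of names can itself slow the copy.

```shell
#!/usr/bin/env bash
# Sketch of a tar pipe: one tar walks SRC and writes an archive to stdout,
# a second tar unpacks it from stdin inside DST.

# mirror_tree SRC DST (hypothetical helper). The body runs in a subshell so
# that pipefail stays local; a failure in either tar shows in the status.
mirror_tree() (
    set -o pipefail
    mkdir -p "$2" || exit 1
    # tar stores symlinks as symlinks; -p on extraction keeps permissions.
    ( cd "$1" && tar -cf - . ) | ( cd "$2" && tar -xpf - )
)
```

Usage would look like `mirror_tree /data /mnt/backups/data`, followed by an rsync pass to reconcile, e.g. `rsync -aH /data/ /mnt/backups/data/`. The trailing slash on the source matters: option 3 as written (`/data` with no slash) would create `/mnt/backups/data/data`. `-a` already preserves symlinks, `-H` adds hard links, and `-z` buys nothing when both paths are local.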

___________________
Nolug mailing list
nolug@nolug.org

Received on 09/07/11

This archive was generated by hypermail 2.2.0 : 09/07/11 EDT