Re: [Nolug] Getting rid of old files

From: Mark A. Hershberger <mah_at_everybody.org>
Date: Wed, 28 Jan 2004 15:25:09 -0600
Message-ID: <87r7xjdave.fsf@weblog.localhost>

Tim Kelley <entropy@r00tserverz.net> writes:

>> > Nope. xargs knows how much of a command line buffer you have and
>> > splits up the input appropriately.
>>
>> Ah, interesting....
>
> Actually the -n or --max-args switch does that for xargs ...

Incorrect. Those arguments set a specific limit on the number of
arguments passed to each command. The default behavior of xargs is to
limit input by the maximum buffer size.
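
To see the difference between an explicit -n limit and the default
batching on a small scale, here is a minimal sketch. It assumes a
seq command is available (it is not used in the original test), and
the variable names are only for illustration:

```shell
# With -n 3, xargs runs echo once per group of at most three arguments.
seq 1 10 | xargs -n 3 echo

# Without -n, ten short arguments fit easily on one command line.
seq 1 10 | xargs echo

# Count the echo invocations in each case (one output line per invocation).
with_n=$(seq 1 10 | xargs -n 3 echo | wc -l | tr -d ' ')
without_n=$(seq 1 10 | xargs echo | wc -l | tr -d ' ')
echo "with -n 3: $with_n, default: $without_n"
```

The first pipeline prints four lines (3 + 3 + 3 + 1 arguments), the
second prints one.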

Try it out. To prepare, create a directory with a bunch of files:

  $ mkdir tmp
  $ cd tmp
  $ perl -e 'print "$_\n" for 0..60000' | xargs touch

(WARNING: Some filesystems will give you trouble if you create this
many files in a single directory. Time for a real file system. XFS
works great.)

On my laptop, this creates 60001 files:

  $ ls | wc
    60001 60001 348896
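
If you would rather not put 60001 files in one directory, a
scaled-down version of the same setup might look like the sketch
below. It assumes mktemp and seq are available; the count of 2000 and
the variable names are arbitrary choices, not part of the original
test:

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)

# Create 2000 empty files named 0..1999; xargs batches the names for touch.
(cd "$workdir" && seq 0 1999 | xargs touch)

# Count them: ls prints one name per line when writing to a pipe.
file_count=$(ls "$workdir" | wc -l | tr -d ' ')
echo "created $file_count files in $workdir"

# Clean up afterwards.
rm -rf "$workdir"
```

The timings will of course be much smaller than the ones in this
message, but the relative behavior can still be observed.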

Now, instead of using rm, let's use /bin/echo with time to see the
difference.

  $ time sh -c 'find . -mtime +7 -exec /bin/echo {} \;| wc'
        0 0 0
 
  real 0m0.232s
  user 0m0.050s
  sys 0m0.150s

So, find works pretty quickly. wc returns all zeros because we just
created these files and they are all less than 7 days old. So switch
it around so that find returns all the files:

  $ time sh -c 'find . -mtime -7 -exec /bin/echo {} \;| wc'
    60002 60002 468900
 
  real 0m49.763s
  user 0m14.820s
  sys 0m32.150s

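Note that each file was echoed on its own line, because find ran
/bin/echo once per file. (The count is 60002 rather than 60001
because find also lists "." itself.)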
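
The two -mtime tests used above can be checked on a pair of files
with known ages. This is only a sketch: the file names "old" and
"new" are made up, and it assumes touch accepts the POSIX -t
timestamp form:

```shell
workdir=$(mktemp -d)

# "old" gets a modification time in the year 2000; "new" is created now.
touch -t 200001010000 "$workdir/old"
touch "$workdir/new"

# -mtime +7 matches files modified more than 7 days ago.
older=$(find "$workdir" -type f -mtime +7)

# -mtime -7 matches files modified less than 7 days ago.
newer=$(find "$workdir" -type f -mtime -7)

echo "older: $older"
echo "newer: $newer"
rm -rf "$workdir"
```

Only "old" shows up for +7 and only "new" for -7, which is why the
freshly created test files all appear in the -7 run.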

Do the same thing with xargs:
 
  $ time sh -c 'find . -mtime -7 | xargs /bin/echo | wc'
       59 60002 468900
 
  real 0m0.427s
  user 0m0.220s
  sys 0m0.210s

More than a 100-fold improvement!

Note that xargs has split the input up over 59 lines, but the output
is otherwise the same.
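
The speedup comes from the number of processes started. A small
sketch that counts invocations instead of timing them (assuming
mktemp and seq are available; 50 files is an arbitrary number small
enough to fit in one command line):

```shell
workdir=$(mktemp -d)
(cd "$workdir" && seq 1 50 | xargs touch)

# -exec ... \; starts one echo per file: 50 processes, 50 output lines.
per_file=$(find "$workdir" -type f -exec echo {} \; | wc -l | tr -d ' ')

# xargs packs all 50 names into a single echo: one process, one line.
batched=$(find "$workdir" -type f | xargs echo | wc -l | tr -d ' ')

echo "per file: $per_file invocations, batched: $batched invocation"
rm -rf "$workdir"
```

Each output line corresponds to one run of echo, so the line count is
a cheap stand-in for the number of forks.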

And, yes, you can split up the input using -n:

  $ time sh -c 'find . -mtime -7 | xargs -n 20 /bin/echo | wc'
     3001 60002 468900
 
  real 0m2.928s
  user 0m0.820s
  sys 0m1.870s

But, unless you have a good reason to do that, you should just let
xargs do its work.

This is yet another reason why the combination of find and xargs is
such a powerful one.

It's also the reason that find and xargs are both part of GNU
findutils.

As I originally said, xargs saves you time.

Mark.

Postscript: While testing all of this, I tried giving xargs some
ridiculous args. The result was interesting:

  $ time sh -c 'find . -mtime -7 | xargs -n 2000 /bin/echo | wc'
       31 60002 468900
 
  real 0m0.419s
  user 0m0.240s
  sys 0m0.170s

This time, xargs managed to stuff more on the line than the default
behavior did. Trying something else:

  $ time sh -c 'find . -mtime -7 | xargs -n 10000 /bin/echo | wc'
       25 60002 468900
 
  real 0m0.415s
  user 0m0.210s
  sys 0m0.170s

Here is where I hit xargs' real limit. If it had really tried to
stuff 10,000 lines of input in the command buffer (and succeeded),
there would've been only 7 lines (six full ones plus one for the
remaining two entries). Instead it split the input so that I got 25
lines.
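
The line counts for the various -n values can be checked with a bit
of shell arithmetic. This sketch assumes only that xargs fills every
command up to the -n limit; the batches function is a hypothetical
helper, not something xargs provides:

```shell
# Number of arguments find produced in the runs above (60001 files plus ".").
total=60002

# Ceiling division: how many commands xargs needs at a given -n.
batches() {
  echo $(( (total + $1 - 1) / $1 ))
}

batches 20      # 3001, as observed
batches 2000    # 31, as observed
batches 10000   # 7 if the buffer were large enough; 25 were observed
```

The first two match the wc output exactly, so -n was the binding
limit there. The last one does not, which is the sign that the buffer
size took over.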

So, the default behavior isn't "Max out the command line buffer" but
something more gentle. Looking at the source, I see that xargs forces
a maximum command line buffer of 20k minus the size of the environment
(which would fit the 25 lines we see above).
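
As a rough cross-check, the byte counts reported by wc can be divided
by the line counts. This is a back-of-the-envelope sketch: it treats
the echoed bytes as a proxy for buffer usage and ignores the command
name and any per-argument overhead:

```shell
# wc reported 468900 bytes of echoed arguments in every run above.
total_bytes=468900

# Average bytes per command line for the default run (59 lines)
# and for the -n 10000 run (25 lines).
default_avg=$(( total_bytes / 59 ))
capped_avg=$(( total_bytes / 25 ))
echo "default:  $default_avg bytes per line"
echo "-n 10000: $capped_avg bytes per line"

# Headroom left under a 20k (20480-byte) buffer in the -n 10000 run.
headroom=$(( 20 * 1024 - capped_avg ))
echo "headroom: $headroom bytes"

# Rough size of the current environment, for comparison (varies by system).
env | wc -c
```

The -n 10000 run averages 18756 bytes per line, leaving 1724 bytes
under 20k, which is a plausible size for an environment. The default
run averages only 7947 bytes per line.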

Mark.

-- 
Peace is only better than war if peace isn't hell, too.
    -- Walker Percy, "The Second Coming"
___________________
Nolug mailing list
nolug@nolug.org
Received on 01/29/04