Unix shell geekery: Finding the ten largest residents of a directory

Finding the largest files in a directory on Linux is ridiculously easy, with the right flags to ls:

[oracle@vir demo]$ ls -1srShA | tail
4.0K jptest_mmnl_4899.trc
4.0K jptest_mmon_3011.trc
4.0K jptest_mmon_10186.trc
4.0K jptest_mmon_4897.trc
4.0K jptest_p001_3996.trc
4.0K jptest_p000_3994.trc
4.0K jptest_p000_3034.trc
4.0K jptest_p001_3036.trc
4.0K big directory with annoying name
104K alert_JPTEST.log

But hey, wait…what about that directory? It certainly seems to be conveniently named, almost as if placed there specifically for my follow-up question: What if what I really want is to know the sizes of the largest files and subdirectories in my current directory? ls reports only the 4.0K used by the directory entry itself, not the space taken up by its contents, so I have to do something a bit more complicated than ls, but less complicated than writing a script:

[oracle@vir demo]$ topten
4.0K    ./jptest_mmnl_4899.trc
4.0K    ./jptest_mmon_10186.trc
4.0K    ./jptest_mmon_3011.trc
4.0K    ./jptest_mmon_4897.trc
4.0K    ./jptest_p000_3034.trc
4.0K    ./jptest_p000_3994.trc
4.0K    ./jptest_p001_3036.trc
4.0K    ./jptest_p001_3996.trc
104K    ./alert_JPTEST.log
529M    ./big directory with annoying name

topten is a shell function defined as follows (you'll probably need to scroll a bit):

[oracle@vir demo]$ type topten
topten is a function
topten ()
{
    find . -maxdepth 1 -not -name '.' -print0 | xargs -0 du -s | sort -n | tail | cut -f2- | awk '{ printf("%s%c", $0, 0) }' | xargs -0 du -sh
}
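
Split one stage per line, the pipeline is easier to follow. The sketch below is the same chain with a comment on each stage. The name topten_annotated is made up for illustration, and tr stands in for the awk step (a substitution, not the original code); like the original, it assumes no file name contains a newline.

```shell
# topten_annotated is a hypothetical name; the stages mirror topten.
topten_annotated ()
{
    # 1. List everything directly under the current directory (dotfiles
    #    included), NUL-terminated so whitespace in names survives.
    find . -maxdepth 1 -not -name '.' -print0 |
    # 2. Size each entry; xargs batches the names to stay under the
    #    system's argument limit.
    xargs -0 du -s |
    # 3. Sort numerically by size and keep the last (largest) ten.
    sort -n | tail |
    # 4. Drop the size column, leaving only the names.
    cut -f2- |
    # 5. Turn the newline terminators into NULs (replaces the awk step).
    tr '\n' '\0' |
    # 6. Run du again on the survivors for human-readable sizes.
    xargs -0 du -sh
}
```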

One could claim that this is ridiculously over-engineered, and I would sheepishly agree. In fact, sharing this on a blog is my only hope of recouping the time spent in constructing the function in the first place. After all, what’s wrong with something as simple as du -s * | sort -n | tail? Well, starting with item #1, below, things just sort of snowball:

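  1. In directories with a lot of files, du -s * fails with a “too many arguments” error (the shell typically reports it as “Argument list too long”, because the expanded * exceeds the system's limit on argument size). $APPLCSF/log on an E-Business Suite midtier springs to mind.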
  2. Using xargs solves the "too many arguments" problem, but introduces a new one: xargs will misinterpret whitespace in file names as an argument separator, and split the file name. Whitespace in file names? Hey, I work on a Mac. This stuff happens. Users of Cygwin on Windows probably get bitten by this, too.
  3. The use of find -print0 | xargs -0 is a common solution to the "whitespace in filenames" issue.
  4. I like the human-readable output from du -h, but that means I need to pass my top ten files through xargs again, which requires more filtering: cut removes the leading numbers from the output of the first du, and the awk command provides null-terminated output like find's -print0.
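
To see the splitting from item 2 and the fix from item 3 side by side, a small experiment helps. The helper names count_plain and count_nul are invented for this sketch; each one counts how many separate arguments xargs builds from the file names under a directory.

```shell
# Hypothetical helpers: each prints the number of arguments xargs builds
# from the files found under directory $1.
count_plain ()
{
    # Newline-delimited names: xargs also splits on blanks inside a name.
    find "$1" -type f | xargs -n 1 echo | grep -c ''
}

count_nul ()
{
    # NUL-delimited names: each file name stays in one piece.
    find "$1" -type f -print0 | xargs -0 -n 1 echo | grep -c ''
}

demo=$(mktemp -d)
touch "$demo/big directory with annoying name"
count_plain "$demo"   # more than 1: the name was split at its spaces
count_nul "$demo"     # 1
rm -rf "$demo"
```

The -n 1 option makes xargs run echo once per argument, so counting output lines counts arguments.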

Caveats and variations

  • Presumably, this function should work in Bourne-shell variants on any Unix-like system (works fine on Linux and Mac, for example). If you aren't running the GNU versions of the utilities invoked in this function, however, you may need to make some adjustments.
  • It can be expensive to use du. Choose your starting directory with care. If you run this command in the E-Business Suite APPL_TOP, for example, you might want to find something else to do for a few minutes. If you run this command on an already-busy production system, you may also want to find a place to hide from your sysadmins. ;-)
  • If you don’t care about the prettiness of the human-readable file sizes, then you can shorten the function to omit everything after the tail command:
    topten ()
    {
    find . -maxdepth 1 -not -name '.' -print0 | xargs -0 du -s | sort -n | tail
    }
    
  • The function is called topten because the default output for the tail command is 10 lines. Provide the appropriate option to tail to change the output length (tail -n 5 for 5, tail -n 20 for 20, etc.).
  • It may be advantageous to add the -x option to du to prevent it from crossing filesystem boundaries.
  • There's probably an easier, and still portable, way to do this. I'm kinda hoping that someone stops by with a comment to tell me what that is. :-)
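
Two of the variations above, a configurable count and du -x, can be folded into one function. This is only a sketch: the name topn and its optional argument are invented here, and it keeps the raw block counts from du -s rather than the human-readable sizes.

```shell
# topn is a hypothetical variant of topten; $1 is the number of entries
# to show and defaults to 10, matching tail's default.
topn ()
{
    find . -maxdepth 1 -not -name '.' -print0 |
        xargs -0 du -sx |       # -x: do not cross filesystem boundaries
        sort -n |
        tail -n "${1:-10}"
}
```

Running topn 5 lists the five largest entries; with no argument it behaves like the shortened topten.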

2 Comments

  1. Posted 7 April 2009 at 1:05

    Hi, not portable, but on Linux I tend to use the max-depth argument to du, as in du -h --max-depth=1

  2. Posted 7 April 2009 at 7:55

    Hi Niall,

    Good point. I started this work on my Mac, and moved to Linux later for more testing, so I didn't consider the -maxdepth option. Though, after a closer review of the man page, du on OS X has a similar option -d.

    Thanks for the suggestion!

    Regards,

    John P.
