Article 3567 of comp.lang.perl: Xref: feenix.metronet.com comp.lang.perl:3567 Path: feenix.metronet.com!news.ecn.bgu.edu!wupost!howland.reston.ans.net!gatech!usenet.ufl.edu!travis.csd.harris.com!travis!jimbo From: jimbo@rcx1.ssd.csd.harris.com (Jim Winters) Newsgroups: comp.lang.perl Subject: durf: disc usage report formatter Date: 19 Jun 1993 17:53:23 GMT Organization: Harris Computer Systems Division Lines: 430 Distribution: world Message-ID: NNTP-Posting-Host: rcx1.ssd.csd.harris.com If you are like me, you find the output of du(1) a little hard to interpret. I mean, the fields are simple enough to understand, but it's just hard to get the big picture if you're du'ing a large tree. durf is a perl script that that formats the output of du differently. "Durf" is for "disk usage report formatter", (really just because I like the name). I'm something of a novice perl programmer, so my perl code is probably not the best, but I've been using this script for a few months now with no problems or even any desire to change it! :-) I don't know how portable this will be to other systems, but I don't particularly expect any problems. (I'm on a flavor of unix.) Du on our system has a pretty simple format, and durf doesn't really care where the input comes from (if you use the '-' argument), as long as it's in that format. I have noticed that durf does not work with some older versions of perl. The script has had the man page thingie (forgot the name, sorry) applied to it, so it is its own man page. Also, durf has a -H option for help. Go to some (small) directory and type durf. The output will look something like this (from my bin directory): Tot% Blocks Full-pathname Tot% Sub% Blocks Child 100% 5019 . <--- name of top level directory 75% 75% 3804 m88k <--- a child, "m88k" 15 15 779 . <--- special child, "." 4 4 243 hcx 2 2 120 gcx 1 1 73 sun 75% 3804 ./m88k 75% 100% 3804 . (I've truncated the output for this article.) Basically durf is telling you how the space is distributed in the tree. Here 75% of the total space is in the m88k subtree. m88k also has 75% of the space in the subdirectory. (Since this is the top level, all the percentages are the same.) The other numbers are sizes in blocks. The special child "." is the amount of space in the directory exclusive of the children. After the list of top-level children, we see a listing for the m88k subtree. Within this subtree, we see that 75% of the total space is in ".", which is consistent with the previous entry. However, since m88k has no children, 100% of the space in THIS subtree is in ".". Naturally, this is all more interesting with bigger trees. durf requires "yagrip.pl" by jgreely@cis.ohio-state.edu, 89/11/1, so I have included it as well. It needs to go in your perl library. I release this code for any purpose whatsoever. Enjoy. --- Here comes the script, then "yagrip.pl". Trim signature at bottom. --- #!/usr/bin/perl 'di'; 'ig00'; # # $Header$ # # $Log$ # durf - disk usage report formatter. # sub Usage { die &usage( @Usage ) , <) { chop; if ( /^\s*(\d+)\s+(\S.*)$/ ) { &load( $1, $2 ); } else { print STDERR "Can't understand line:\n$_\n"; } } # # Establish list of roots # foreach (keys %size) { push( @root_list, $_ ) if ( ! $parent{$_} || ! $size{ $parent{$_} } ); } # # Calculate total size # grep( $total_size += $size{$_}, @root_list ); &print_headers(); # # Output formatted tree for each root # while( $_ = pop( @root_list ) ) { &do_subtree( $_ ); } exit( 0 ); sub do_subtree { push( @subtree_list, @_ ); while( $_ = pop( @subtree_list ) ) { &do_output( $_ ); } } sub sname { local( $name ) = @_; local( @name_array ) = split( '/', $name ); pop( @name_array ); } sub dname { local( $name ) = @_; local( @name_array ) = split( '/', $name ); if ( scalar( @name_array ) > 1 ) { pop( @name_array ); join( '/', @name_array ); } else { $name; } } sub load { local( $size, $name ) = @_; $size{ $name } = $size; $parent = &dname( $name ); if ( $parent ne $name ) { $parent{ $name } = $parent; @children = split( "\n", $children{ $parent } ); push( @children, $name ); $children{ $parent } = join( "\n", @children ); } } sub do_output { local( $name ) = @_; &print_long( $size{$name}, $name ); @children = split( "\n", $children{ $name } ); $dot_size = $size{$name}; $subtree_size = $dot_size; grep( $dot_size -= $size{$_}, @children ); $dot_name = $name . "/."; push( @children, $dot_name ); $size{$dot_name} = $dot_size; if ( $threshold_is_percent ) { @children = grep( ($size{$_}*100)/$total_size >= $threshold, @children ); } else { @children = grep( $size{$_} >= $threshold, @children ); } @children = reverse sort { $size{$a} <=> $size{$b}; } @children; @push_list = reverse grep( $_ ne $dot_name, @children ); push( @subtree_list, @push_list ); $num_components = scalar( split( '/', $name ) ); # do short output here $percent_char = '%'; foreach (@children) { $size = $size{$_}; @name_array = split( '/', $_ ); splice( @name_array, $[, $num_components ); $_ = @name_array[$[]; &print_short( $size, $_ ); $percent_char = ' '; } } sub print_short { local( $size, $name ) = @_; $sub_percent = int( ($size*100)/$subtree_size ); printf " %3d%s %3d%s %6d %s\n", int( ($size*100)/$total_size ), $percent_char, $sub_percent, $percent_char, $size, $name; } sub print_long { local( $size, $name ) = @_; print "\n"; printf "%3d%% %6d %s\n", int( ($size*100)/$total_size ), $size, $name; } sub print_headers { print "Tot% Blocks Full-pathname\n"; print " Tot% Sub% Blocks Child\n"; } ############################################################### # These next few lines are legal in both Perl and nroff. .00; # finish .ig 'di \" finish diversion--previous line must be blank .nr nl 0-1 \" fake up transition to first page again .nr % 0 \" start at page 1 '; __END__ ##### From here on it's a standard manual page ##### .TH DURF 1 "February 10, 1993" .AT 3 .SH NAME durf \- disk usage report formatter .SH SYNOPSIS .B durf [\-H] [\-t threshold[%]] [{du-args|\-}] .SH DESCRIPTION .I Durf reformats the output of .IR du\^ (1) to make it a little easier to get an understanding of the distribution of space. .P The output is grouped by subtrees. Each subtree is shown as a line like this: .sp .nf \fITotal\fP%\fI Blocks Full-pathname\fP .fi .sp which shows the position in the hierarchy and reports size information for the subtree, followed by a group of indented lines like this: .sp .nf \fITotal\fP%\fI Subtree\fP%\fI Blocks Child\fP .fi .sp which reports size information for that portion of the subtree. .I Child is an immediate child of .IR Full-pathname . .IR Total % is the percentage contribution to the total size. .IR Subtree % is the percentage contribution to the subtree. .I Blocks are disk blocks as reported by .IR du . The list of children is in decreasing order of size. After the list of all the children, the next non-blank line begins the subtree for the first child, and so on. .P A special fake child, .BR . , is listed in each subtree (if it meets the threshold requirement). This is simply the contribution of the directory independent of the children. (It is the size of the subtree less the size of the children.) .P If the argument to .I durf is .BR \- , then .I durf reads the standard input for the .I du list. Otherwise, it runs .IR du , passing it .IR du-args . You may pass options to .I du by terminating the .I durf options with .BR \-\- . .SS Options .TP 12 \-H Writes a usage message and dies. .TP \-t \fIthreshold\fP[%] Sets a threshold for which entries .I durf will show. If the threshold ends with '%', then .I durf will not show entries that contribute less than .I threshold percent to the total space. Otherwise, the threshold is in terms of disk blocks. The default is \fB\-t 1%\fP. To get a complete listing, use \fB\-t 0\fP. .SH AUTHOR Jim Winters .SH "SEE ALSO" du(1). .SH CAUTION Believes .IR du . ---------- End of script. "yagrip.pl" follows. ----------------------- #Yet Another Getopt Routine In Perl # jgreely@cis.ohio-state.edu, 89/11/1 #usage: #&getopt("f:bar") || # die &usage("script","f:bar","oo","[files ...]"); # sub getopt { local($_,$flag,$opt,$f,$r,@temp) = @_; @temp = split(/(.):/); while ($#temp >= $[) { $flag .= shift(@temp); $opt .= shift(@temp); } while ($_ = $ARGV[0], /^-(.)(.*)/ && shift(@ARGV)) { ($f,$r) = ($1,$2); last if $f eq '-'; if (index($flag,$f) >= $[) { eval "\$opt_$f++;"; $r =~ /^(.)(.*)/,redo if $r ne ''; }elsif (index($opt,$f) >= $[) { $r = $r eq '' ? shift(@ARGV) : $r; eval "\$opt_$f = \$r;"; }else{ print STDERR "Unrecognized switch \"-$f\".\n"; return 0; } } return 1; } #usage: usage: # &usage(progname,arglist,@names,@last); #ex: # &usage("script","f:bar","oo","[file ...]"); #would return # "usage: script [-f oo] [-bar] [file ...]" # sub usage { local($prog,$_,@list) = @_; local($string,$flag,@string,@temp,@last) = (); @temp = split(/(.):/); push(@string,"usage:",$prog); while ($#temp >= $[) { if (($flag = shift(@temp)) ne '') { push(@string,"[-$flag]"); } if (($flag = shift(@temp)) ne '') { push(@string,sprintf("[-%s %s]",$flag,shift(@list))); } } push(@string,@list) if $#list >= $[; return join(' ',@string) . "\n"; } 1; ------------ end of "yagrip.pl" --------------- -- -- Jim Winters jimbo@ssd.csd.harris.com ]O .