How to get a random line from a file in bash.

I work with a lot of data, and while I’d like to pretend it’s all in upside-down quasi-indexed b-tree rocket ships or some other advanced database, the truth is that much of it is in text files. I often find myself wanting to see a random line from one of these files, just to get a sense of what the data looks like.

I thought there must be an easy bash way to do this, but I couldn’t find it (‘shuf’ isn’t installed on my server), so I turned to twitter, and now I’m pleased to present more methods for finding a random line than you ever expected!

sort -R | head -n 1

If you can use this, do so! If it isn’t available, consider one of the following commands:
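
With a file argument, that's:

sort -R file | head -n 1

One thing worth knowing about GNU sort's -R: it orders lines by a random hash of each line, so identical lines end up adjacent, which means a file full of duplicate lines won't give you a uniform pick, and it can be slow on large files.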

@andrewgilmartin suggests using awk:

awk 'BEGIN { srand() } rand() >= 0.5 { print; exit }'

@devinteske offered one of the easiest solutions to read:

tail -$((RANDOM/(32767/`wc -l</etc/group|tr -d ' '`))) /etc/group|head -1

@terrycojones piped up with this gem:

split -l 1 < file; cat `for i in x*; do echo $RANDOM $i; done | sort -n | cut -f2 -d' ' | head -n 1`; rm x*

@FirefighterBlu3 does sed++:

file=/etc/passwd; lc="$(($RANDOM % $(wc -l $file|awk '{print $1}')))"; sed -n "${lc}p" $file

@burleyarch collects the whole set:

f=YOUR_FILE; n=$(expr $RANDOM \* `cat $f | wc -l` \/ 32768 + 1); head -n $n $f | tail -1

All of the options using $RANDOM should be used with the understanding that its maximum possible value is 32767, so they will only be random on files that have fewer than 32,767 lines.
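
If you're stuck with $RANDOM and a bigger file, one workaround (a rough sketch, not something from the thread) is to glue two draws together into a 30-bit number before taking the modulus:

f=YOUR_FILE; lines=$(wc -l < "$f"); sed -n "$(( (RANDOM * 32768 + RANDOM) % lines + 1 ))p" "$f"

Two 15-bit draws give a range of 0 to 2^30−1, which covers any file you'd sanely poke at by hand (there's still a small modulo bias, but no worse than in the one-liners above).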

@xn with an excellent use of cut:

awk 'BEGIN { OFS="\t"; srand() } { print rand(), $0 }' | sort -n | cut -f2- | head -1

@paulrbrown with a badass example of od:

echo `cat /dev/urandom | od -N4 -An -i `' % '`wc -l < file` | bc | sed 's/-//g' | xargs -I % head -n % file | tail -n 1
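
Unrolled with comments, that one works roughly like this (it assumes GNU od and bc are on hand; the variable names are mine):

n=$(od -N4 -An -i < /dev/urandom)                         # 4 random bytes as a (possibly negative) integer
pick=$(echo "$n % $(wc -l < file)" | bc | sed 's/-//g')   # reduce modulo the line count, drop any minus sign
head -n "$pick" file | tail -n 1                          # print that line (a result of 0 prints nothing, as in the original)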

And finally, from @alexlines, who actually developed his solution into a blog post:

dd if=file skip=$(expr $(date +%N) \% $(stat -c "%s" file)) ibs=1 count=200 2>/dev/null|sed -n '2{p;q;}'
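
In slow motion, with made-up variable names (this assumes GNU dd and stat, and a date that supports nanoseconds via %N):

f=file                                              # the input file, as in the one-liner above
offset=$(expr $(date +%N) \% $(stat -c "%s" "$f"))  # nanosecond clock modulo file size = a cheap random byte offset
dd if="$f" skip="$offset" ibs=1 count=200 2>/dev/null | sed -n '2{p;q;}'   # toss the partial line at the cut, print the next full one

The trade-offs: the first line can never be chosen, lines longer than about 200 bytes get truncated, and long lines are more likely to be landed on than short ones. In exchange, it reads at most a couple hundred bytes of a regular (seekable) file, no matter how big it is.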

And, of course, @ceonyc brought some comic relief:

@hmason Good bash one-liner? Take my code, please.


  • http://xn.pinkhamster.net/ Christian G. Warden

    shuf is the best answer. sort -R is slow on large files, e.g. /usr/share/dict/words.

    @andrewgilmartin’s solution is invalid if by random you mean that the probability of getting line x in a file containing n lines should be 1/n. The probability of getting each line is 1/2^x.
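
    Where shuf is installed, the whole post collapses to one short command, e.g.:

    shuf -n 1 /usr/share/dict/words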

  • http://profiles.google.com/bperkins Brian Perkins

    This is more efficient:

    perl -ne 'if ( rand() < 1/$.) { $save=$_;} END{ print "$save"; }'

    and in theory you get a uniform distribution over all the lines, though for particularly large files you may run into limitations of the random number generation.

    • http://www.hilarymason.com Hilary Mason

      Perl is not bash!!

      • http://profiles.google.com/bperkins Brian Perkins

        True, though that’s a flaw that bash can’t help having.

        • http://www.hilarymason.com Hilary Mason

          Ha! :)

      • http://twitter.com/schmerg Tim M

        If “bash solutions” are allowed to include awk, then you can’t really exclude perl.

        Personally I write it as
        perl -ne '$save = $_ if rand($.) < 1; END{ print "$save\n"; }'
        but YMMV

        The perldoc FAQs note: "This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2."
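
        Sketching that argument: line k overwrites $save with probability 1/k, and then has to survive every later line j, each of which overwrites it with probability 1/j, so

        P(line k is the one printed) = (1/k) × (k/(k+1)) × ((k+1)/(k+2)) × … × ((n−1)/n) = 1/n

        which is exactly uniform, whatever n turns out to be.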

      • http://www.thatpixguy.com pix

        in pure bash (the seq is just to give it some test input)

        seq 10 | ( FNR=1 ; while read ; do if [ $(($RANDOM % $FNR)) = 0 ] ; then x=$REPLY ; fi ; FNR=$(($FNR+1)) ; done ; echo $x )

        it looks a bit clearer in awk:
        seq 10 | awk '{if(rand()<1/FNR)x=$0}END{print x}'

  • Will

    This is sort of similar to FirefighterBlu3′s solution, but replaces the wc/awk line with sed:
    sed -n "$(($RANDOM % $(sed -n '$=' test.txt)))p" test.txt

  • http://calliopesounds.blogspot.com/ Andrew Gilmartin

    Warden, thanks for the correction.

  • http://twitter.com/petdance Andy Lester

    If you’re able to slurp the entire file:

    perl -e '@X = <>; print @X[rand @X];' file

  • http://profiles.google.com/jhofman Jake Hofman

    a slightly more compact version using sed:

    sed -n $((RANDOM%$(wc -l < file)+1))p file

    http://twitter.com/#!/jakehofman/status/62625233869348864

    • http://www.dancingbison.com vasudevram

      a slightly more compact version of that using sed:

      sed -n $((RANDOM%$(wc -l file)+1))p file

      I removed the "< " (less than, space) after "wc -l" because wc, like all well-written UNIX filters, can read from either stdin or a filename argument.

      Also, "< file" (if using that form) can be written as "<file", i.e. no intervening space, since the shell doesn't care about that. Though I normally prefer a space between tokens for readability, even when not required, in this case, I prefer the latter form, since it indicates more visually that the input is coming from "file". IIRC, Kernighan and Pike use that form too, in their classic book "The UNIX Programming Environment", one of the best books on all this stuff (i.e. UNIX shell/sed/awk/other tool wizardry), though it is not at all up to date for newer bash and UNIX features.

      • http://www.dancingbison.com vasudevram

        Oh, crap, something about this blog’s settings – I’m guessing – messed up half the text of my comment above, probably due to all the UNIX special characters I used. I’m not sure why, though, when others’ comments seem to be displaying fine (unless I used more “special” characters than the others :-), unless it is because we are supposed to use some syntax like Markdown for comments, but I didn’t see any notice about that. Can you implement a preview feature on the blog if possible?

    • Nini

      splendid!

  • http://jacobandreas.net/ Jacob Andreas

    Save yourself 2 characters: sort -R | head -1

  • Adriano

    awk "BEGIN { LUCK = `awk 'END { srand(); print 1+int(NR*rand()) }' $1`} NR == LUCK" $1
    # which is about five times slower than "shuf -n1 $1"
    # Note that $1 intentionally refers to the filename, rather than the first field;
    # bash syntax is sensitive wrt command substitution, single and double quotes.
    #
    # Clearly the "awk within awk" solution could be avoided if there
    # was a built-in variable for the total number of records ;-)

  • http://twitter.com/silentbicycle Scott Vokes

    It’s C, not bash, but sample (http://github.com/silentbicycle/sample) works for this. Unlike shuf, it returns a randomly filtered subset of the input *in the original order*.

  • Pingback: Random image from the Library of Congress « Alec's Web Log

  • Ishan Oshadi Jayawardene

    Had the same problem – wrote a Node.js script to serve lines over http. The script loaded the whole file into a Node.js Buffer and calculated the line offsets at load time.

    The application was to feed test data to a load-generating script running in The Grinder.

  • http://twitter.com/khanklatt Khan Klatt

    This one /might/ be a little more linear (to follow) for some folks:

    perl -e '$f="filename";$_=`wc -l $f`;@l=split( );$r=int rand(@l[0]);system("head -n$r $f|tail -1")'

    Define a file, use wc to get # of lines, use split to sanitize wc output; use rand to pick a random # in the range; use head -n on the file using the random number and show only one line. TMTOWTDI ;)

  • Pingback: Nutch Tutorial: Supplemental III « Another Word For It

  • Pingback: Ubuntu Personalized Shell Greeting « Deep Thought

  • http://www.paulkorir.com/ Paul Korir

    Thanks for your post. I have a variant of this problem. Any easy way to get n randomly sampled grouped lines from a file in which each group is made up of r lines? The file has N lines such that each line is in a group of r lines (N must be divisible by r). This is used a lot in bioinformatics e.g. FASTA files.
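
    One possible sketch, for a single group of r lines: pick a random group index and print its block, e.g.

    f=YOUR_FILE; r=4; groups=$(( $(wc -l < "$f") / r )); g=$(( RANDOM % groups )); sed -n "$(( g*r + 1 )),$(( (g+1)*r ))p" "$f"

    repeated n times, with the $RANDOM caveat from the post in mind for large files; r=4 here is just a placeholder for the group size.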

  • goobasatch

    The first command (@andrewgilmartin's) will not print a uniformly random line; rather, it will print lines according to a geometric distribution from the beginning of the file: later lines have an exponentially smaller chance of being chosen.

  • Jeramey

    > awk 'BEGIN { srand() } rand() >= 0.5 { print; exit }'

    This solution should absolutely NOT be used. While there’s a non-0 chance of it not returning anything, that’s not the most important reason to not use it. The output results will, on average, be heavily weighted towards the beginning of the file. By the time you hit only the 6th line of the file, there’s already a 98.4375% chance that output will have been emitted.

    (FWIW, in my testing of a 10-line file, I did manage to hit that magical non-0 probability just once. Lucky me!)
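
    The arithmetic behind that figure: each line independently has a 0.5 chance of stopping the scan, so

    P(nothing printed after k lines) = 0.5^k, and P(output by line 6) = 1 − 0.5^6 = 1 − 1/64 = 0.984375 = 98.4375%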