How to get a random line from a file in bash.

I work with a lot of data, and while I’d like to pretend it’s all in upside-down quasi-indexed b-tree rocket ships or some other advanced database, the truth is that much of it is in text files. I often find myself wanting to see a random line from one of these files, just to get a sense of what the data looks like.

I thought there must be an easy bash way to do this, but I couldn’t find it (‘shuf’ isn’t installed on my server), so I turned to Twitter, and now I’m pleased to present more methods for finding a random line than you ever expected!

sort -R | head -n 1
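
Given an actual file to read, that looks like:

sort -R file | head -n 1

(One commenter notes below that sort -R shuffles the whole file just to produce one line, so it can be slow on large inputs.)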

If you can use this, do so! If it isn’t available, consider one of the following commands:

@andrewgilmartin suggests using awk:

awk 'BEGIN { srand() } rand() >= 0.5 { print; exit }'

@devinteske offered one of the easiest solutions to read:

tail -$((RANDOM/(32767/`wc -l</etc/group|tr -d ' '`))) /etc/group|head -1

@terrycojones piped up with this gem:

split -l 1 < file; cat `for i in x*; do echo $RANDOM $i; done | sort -n | cut -f2 -d' ' | head -n 1`; rm x*

@FirefighterBlu3 does sed++:

file=/etc/passwd; lc="$(($RANDOM % $(wc -l $file|awk '{print $1}')))"; sed -n "${lc}p" $file

@burleyarch collects the whole set:

f=YOUR_FILE; n=$(expr $RANDOM \* `cat $f | wc -l` \/ 32768 + 1); head -n $n $f | tail -1

All of the options using $RANDOM come with a caveat: its maximum value is 32767, so they are only truly random on files with fewer than 32,767 lines; on longer files, lines past that point can never be chosen.
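
If you do need more range, one workaround (my sketch, not from the thread) is to splice two $RANDOM draws together into a single 30-bit value:

n=$(( ((RANDOM << 15 | RANDOM) % $(wc -l < file)) + 1 )); sed -n "${n}p" file

Here ‘file’ is a placeholder; the combined draw ranges from 0 to 1,073,741,823, though there is still a slight modulo bias unless the line count evenly divides 2^30.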

@xn with an excellent use of cut:

awk 'BEGIN { OFS="\t"; srand() } { print rand(), $0 }' | sort -n | cut -f2- | head -1

@paulrbrown with a badass example of od:

echo `cat /dev/urandom | od -N4 -An -i `' % '`wc -l < file` | bc | sed 's/-//g' | xargs -I % head -n % file | tail -n 1

And finally, from @alexlines, who actually developed his solution into a blog post:

dd if=file skip=$(expr $(date +%N) \% $(stat -c "%s" file)) ibs=1 count=200 2>/dev/null|sed -n '2{p;q;}'
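
(How it works, as far as I can tell: date +%N is the current time in nanoseconds, taken mod the file size to get a pseudo-random byte offset; dd reads 200 bytes starting there, and sed prints the second line in that window, i.e. the first complete line after the offset. Note the bias: lines that follow long lines are more likely to be picked, and lines longer than 200 bytes will be truncated.)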

And, of course, @ceonyc brought some comic relief:

@hmason Good bash one-liner? Take my code, please.


25 Comments on “How to get a random line from a file in bash.”

  1. shuf is the best answer. sort -R is slow on large files, e.g. /usr/share/dict/words.

    @andrewgilmartin’s solution is invalid if by random you mean that the probability of getting line x in a file containing n lines should be 1/n. The probability of getting each line is 1/2^x.
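
    (Each line independently passes the rand() >= 0.5 test with probability 1/2, so printing line x requires x − 1 rejections followed by one acceptance: (1/2)^x.)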

  2. This is more efficient:

    perl -ne 'if (rand() < 1/$.) { $save = $_; } END { print "$save"; }'

    and in theory you get a uniform distribution over all the lines, though for particularly large files you may run into limitations of the random number generator.

    • Hilary Mason says:

      Perl is not bash!!

      • True, though that’s a flaw that bash can’t help having.

      • Tim M says:

        If “bash solutions” are allowed to include awk, then you can’t really exclude perl.

        Personally I write it as
        perl -ne '$save = $_ if rand($.) < 1; END { print "$save\n"; }'
        but YMMV

        The perldoc FAQs note:
        This has a significant advantage in space over reading the whole file in.
        You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2.
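
        (Sketch of that proof: line i overwrites the saved line with probability 1/i and survives each later line j with probability (j−1)/j, so its chance of being the final survivor is (1/i) · (i/(i+1)) · … · ((n−1)/n), which telescopes to 1/n for every line.)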

      • pix says:

        in pure bash (the seq is just to give it some test input)

        seq 10 | ( FNR=1 ; while read ; do if [ $(($RANDOM % $FNR)) = 0 ] ; then x=$REPLY ; fi ; FNR=$(($FNR+1)) ; done ; echo $x )

        it looks a bit clearer in awk:
        seq 10 | awk 'BEGIN{srand()}{if(rand()<1/FNR)x=$0}END{print x}'
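
        and a quick sanity check of the uniformity claim (my sketch; everything inside one awk run so srand() is only seeded once):

        awk 'BEGIN { srand(); for (t = 1; t <= 10000; t++) { for (i = 1; i <= 10; i++) if (rand() < 1/i) x = i; count[x]++ }; for (l = 1; l <= 10; l++) print l, count[l] }'

        each of the 10 simulated lines should win roughly 1000 times.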

  3. Will says:

    This is sort of similar to FirefighterBlu3’s solution, but replaces the wc/awk line with sed:
    sed -n "$(($RANDOM % $(sed -n '$=' test.txt)))p" test.txt

  4. Warden, thanks for the correction.

  5. Andy Lester says:

    If you’re able to slurp the entire file:

    perl -e '@X = <>; print @X[rand @X];' file

  6. Jake Hofman says:

    a slightly more compact version using sed:

    sed -n $((RANDOM%$(wc -l < file)+1))p file

    http://twitter.com/#!/jakehofman/status/62625233869348864

    • vasudevram says:

      a slightly more compact version of that using sed:

      sed -n $((RANDOM%$(wc -l file)+1))p file

      I removed the "< " (less-than, space) after "wc -l" because wc, like all well-written UNIX filters, can read from either stdin or a filename argument.

      Also, "< file" (if using that form) can be written as "<file", i.e. no intervening space, since the shell doesn't care about that. Though I normally prefer a space between tokens for readability, even when not required, in this case, I prefer the latter form, since it indicates more visually that the input is coming from "file". IIRC, Kernighan and Pike use that form too, in their classic book "The UNIX Programming Environment", one of the best books on all this stuff (i.e. UNIX shell/sed/awk/other tool wizardry), though it is not at all up to date for newer bash and UNIX features.

      • vasudevram says:

        Oh, crap, something about this blog’s settings – I’m guessing – messed up half the text of my comment above, probably due to all the UNIX special characters I used. I’m not sure why, though, when others’ comments seem to be displaying fine (unless I used more “special” characters than the others :-), unless it is because we are supposed to use some syntax like Markdown for comments, but I didn’t see any notice about that. Can you implement a preview feature on the blog if possible?

    • Nini says:

       splendid!

  7. Save yourself 2 characters: sort -R | head -1

  8. Adriano says:

    awk "BEGIN { LUCK = `awk 'END { srand(); print 1+int(NR*rand()) }' $1` } NR == LUCK" $1
    # which is about five times slower than "shuf -n1 $1"
    # Note that $1 intentionally refers to the filename, rather than the first field;
    # bash is syntax-sensitive wrt command substitution, single and double quotes.
    #
    # Clearly the "awk within awk" solution could be avoided if there
    # were a built-in variable for the total number of records ;-)

  9. Scott Vokes says:

    It’s C, not bash, but sample (http://github.com/silentbicycle/sample) works for this. Unlike shuf, it returns a randomly filtered subset of the input *in the original order*.

  10. […] mason lists a bunch of ways to get a random line from a file with bash. I use the last one because I have a large number of lines and this benchmarked as the fastest […]

  11. Ishan Oshadi Jayawardene says:

    Had the same problem – wrote a Node.js script to serve lines over http. The script loaded the whole file into a Node.js Buffer and calculated the line offsets at load time.

    The application was to feed test data to a load-generating script running in The Grinder.

  12. Khan Klatt says:

    This one /might/ be a little more linear (to follow) for some folks:

    perl -e '$f="filename"; $_=`wc -l $f`; @l=split( ); $r=int rand(@l[0]); system("head -n$r $f|tail -1")'

    Define a file, use wc to get # of lines, use split to sanitize wc output; use rand to pick a random # in the range; use head -n on the file using the random number and show only one line. TMTOWTDI ;)

  13. […] Hillary Mason (source of the sort suggestion, has collected more extract one line (not exactly our requirement but you can be creative) at: How to get a random line from a file in bash. […]

  14. […] searching for the source on this code, I found this post. There are lots of ways to get random lines from a […]

  15. Paul Korir says:

    Thanks for your post. I have a variant of this problem: any easy way to randomly sample n groups of lines from a file where each group is made up of r lines? The file has N lines such that each line belongs to a group of r lines (N must be divisible by r). This comes up a lot in bioinformatics, e.g. FASTA files.
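
    • One way (a sketch; r=4 and f=file are placeholder names): pick a random group index, then print that group’s r lines:

      r=4; f=file; g=$(( RANDOM % ($(wc -l < "$f") / r) )); sed -n "$(( g*r + 1 )),$(( (g+1)*r ))p" "$f"

      To sample n groups, repeat (allowing repeats) or shuffle the group indices first. The $RANDOM caveat from the post applies once the number of groups exceeds 32768.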

  16. goobasatch says:

    The first command (@andrewgilmartin) will not print a uniformly random line; rather, it will print lines according to a geometric distribution from the beginning of the file. Later lines have an exponentially smaller chance of being chosen.

  17. Jeramey says:

    > awk 'BEGIN { srand() } rand() >= 0.5 { print; exit }'

    This solution should absolutely NOT be used. While there’s a non-0 chance of it not returning anything, that’s not the most important reason to avoid it. The output will, on average, be heavily weighted towards the beginning of the file: by the time you hit only the 6th line, there’s already a 98.4375% chance (that’s 1 − 0.5⁶) that output will have been emitted.

    (FWIW, in my testing of a 10-line file, I did manage to hit that magical non-0 probability just once. Lucky me!)