
About shell: What's an easy way to read a random line from a file?

Question Detail

What’s an easy way to read a random line from a file in a shell script?

Question Answer

You can use shuf:

shuf -n 1 "$FILE"

There is also a utility called rl. In Debian it is in the randomize-lines package. It does exactly what you want, though it is not available in all distros. Its home page actually recommends using shuf instead (which didn’t exist when rl was created, I believe). shuf is part of GNU coreutils; rl is not.

rl -c 1 "$FILE"
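
Since neither tool is guaranteed to be installed, a script may want to check which one is present before calling it. The following is only a minimal sketch: the function name random_line is a hypothetical helper invented for this example, and the rl invocation is simply the one quoted above.

```shell
#!/bin/sh
# random_line FILE: print one randomly chosen line of FILE.
# "random_line" is a hypothetical helper name, not part of shuf or rl.
random_line() {
    if command -v shuf >/dev/null 2>&1; then
        # GNU coreutils: output at most one line of the shuffled input
        shuf -n 1 -- "$1"
    elif command -v rl >/dev/null 2>&1; then
        # randomize-lines package, invoked as in the answer above
        rl -c 1 "$1"
    else
        echo "random_line: neither shuf nor rl is installed" >&2
        return 1
    fi
}
```

Usage: random_line /usr/share/dict/words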

……………………………………………………
Another alternative:

head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
……………………………………………………
sort --random-sort $FILE | head -n 1

(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)
……………………………………………………
This is simple.

cat file.txt | shuf -n 1

Granted this is just a tad slower than the "shuf -n 1 file.txt" on its own.
……………………………………………………
perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:

perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file

This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
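
The same one-pass idea can be sketched with awk for systems without perl: line number NR replaces the currently held line with probability 1/NR, so only one line is kept in memory at a time. This is an illustrative sketch, not the perlfaq5 code; the function name reservoir_line is hypothetical.

```shell
#!/bin/sh
# reservoir_line FILE: pick one line uniformly at random in a single pass.
# "reservoir_line" is a hypothetical name used only for this sketch.
reservoir_line() {
    awk '
        # srand() with no argument seeds from the time of day, so two
        # runs within the same second may select the same line.
        BEGIN { srand() }
        # rand() is in [0,1), so this holds with probability 1/NR;
        # for NR == 1 it is always true, so the first line is always held.
        rand() * NR < 1 { line = $0 }
        # Print nothing for an empty file.
        END { if (NR > 0) print line }
    ' "$1"
}
```

Usage: reservoir_line file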
……………………………………………………
Using a bash script:

#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < "${FILE}")
# generate random number in range 1-NUM
X=$(( RANDOM % NUM + 1 ))
# extract X-th line
sed -n "${X}p" "${FILE}"
……………………………………………………
Single bash line:

sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt

Slight problem: duplicate filename.
……………………………………………………
Here's a simple Python script that will do the job:

import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])

Usage:

python randline.py file_to_get_random_line_from
……………………………………………………
Another way using 'awk':

awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name
……………………………………………………
A solution that also works on Mac OS X, and should also work on Linux(?):

N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file

Where:

N is the number of random lines you want
NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2 --> save line numbers written in file1 and then print corresponding line in file2
jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in range (1, number_of_line_in_file) with jot. The process substitution <() makes the output look like a file to the interpreter, so it plays the role of file1 in the previous example.
……………………………………………………
#!/bin/bash
IFS=$'\n'
wordsArray=($(<$1))
numWords=${#wordsArray[@]}
sizeOfNumWords=${#numWords}
while [ True ]
do
    for ((i=0; i<$sizeOfNumWords; i++))
    do
        let ranNumArray[$i]=$(( ( $RANDOM % 10 ) + 1 ))-1
        ranNumStr="$ranNumStr${ranNumArray[$i]}"
    done
    # valid array indices are 0..numWords-1
    if [ $ranNumStr -lt $numWords ]
    then
        break
    fi
    ranNumStr=""
done
noLeadZeroStr=$((10#$ranNumStr))
echo ${wordsArray[$noLeadZeroStr]}
……………………………………………………
Here is what I discovered, since my Mac OS doesn't support all the easy answers. I used the jot command to generate a number, since the $RANDOM variable solutions did not seem very random in my tests. When testing my solution I saw a wide variance in the output.

# range of jot ( 1 235886 ) found from earlier wc -w /usr/share/dict/web2
RANDOM1=`jot -r 1 1 235886`
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1

The echo of the variable is to get a visual of the generated random number.
……………………………………………………
Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:

sed -n $(awk 'END {srand(); r=rand()*NR; if (r
