
Bash sort on Linux: how do I sort using a timestamp?

Question Detail

I need to sort a file using the shell sort command on Linux. The sort needs to be based on the timestamp values contained in each of the file’s rows. The timestamps have an irregular format and omit the leading zeros on months, days, etc., so the sorts I am performing are not correct. Their format is “M/D/YYYY H:MI:S AM”, so “10/12/2012 12:16:18 PM” comes before “7/24/2012 12:16:18 PM”, which comes before “7/24/2012 12:17:18 AM”.

Is it possible to sort based on timestamps?

I am using the following command to sort my file:

sort -t= -k3 file.txt -o file.txt.sorted

(use equal sign as a separator => -t=; use 3rd column as a sort column => -k3)

A sample file is as follows:

<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>

Question Answer

GNU date (the date command on most Linux systems) does a fine job of parsing dates like this, and it can translate them into something more sortable, such as plain Unix-time integers.
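To make the difference concrete, here is a small sketch comparing the two orderings on two of the timestamps from the question. It assumes GNU date is installed; the variable names (lexical_first, july, october) are made up for illustration.

```shell
# Plain sort compares character by character: "1" sorts before "7",
# so the October timestamp lands ahead of the July one.
lexical_first=$(printf '%s\n' '7/24/2012 12:16:18 PM' '10/12/2012 12:16:18 PM' |
    LC_ALL=C sort | head -n 1)
echo "$lexical_first"    # 10/12/2012 12:16:18 PM

# Assumption: GNU date, whose -d option parses free-form date strings.
# +%s prints seconds since the epoch, which compare correctly as integers.
july=$(date -d '7/24/2012 12:16:18 PM' +%s)
october=$(date -d '10/12/2012 12:16:18 PM' +%s)
if [ "$july" -lt "$october" ]; then
    echo "July sorts first once converted"
fi
```

The exact epoch values depend on the local time zone, but their relative order does not.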

Example:

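# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    # extract the value of the t="..." attribute
    datestring=$(sed -e 's/^.* t="\([^"]*\)".*$/\1/' <<<"$line")
    # prefix the line with its timestamp as seconds since the epoch
    echo "$(date -d "$datestring" +%s) $line"
done < file | sort -n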

If you want the Unix timestamp removed again, pipe the result through an appropriate cut invocation, such as cut -d' ' -f2-.
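Put together, the loop plus the cleanup step might look like the sketch below. The function name sort_by_t and the three demo records are hypothetical; the sketch assumes GNU date, one record per line, and a t="..." attribute on every line.

```shell
# Hypothetical helper: print FILE sorted by its t="..." attribute.
# Assumes GNU date and a t="..." attribute on every line.
sort_by_t() {
    while IFS= read -r line; do
        # Pull out the text between t=" and the next double quote.
        datestring=$(printf '%s\n' "$line" | sed -e 's/^.* t="\([^"]*\)".*$/\1/')
        # Prefix the line with its epoch seconds, separated by one space.
        printf '%s %s\n' "$(date -d "$datestring" +%s)" "$line"
    done < "$1" |
        sort -n |
        cut -d' ' -f2-    # drop the epoch prefix, keep the original line
}

# Demo on a throwaway file with shortened, made-up records.
demo=$(mktemp)
printf '%s\n' \
    '<r id="a" t="10/12/2012 12:16:17 AM"></r>' \
    '<r id="b" t="7/24/2012 12:16:17 PM"></r>' \
    '<r id="c" t="7/24/2012 12:16:18 AM"></r>' > "$demo"
sort_by_t "$demo"    # prints the c, b and a records, in that order
rm -f "$demo"
```

This starts one sed and one date process per line, so it is only comfortable for small files.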

sort is a nice tool, but it doesn’t have enough bells and whistles to take pseudo-XML apart, convert an attribute to a sensible time value, and then sort on it.

However, such tools do exist. While the best way to do this would probably be with XSLT, if the file is really as consistent as your example command expects, you could extract the time values with cut -d'"' -f4, and you can convert each one to a more sortable format with date. For example (needs GNU date):

paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-

which extracts the date-times, one per line; feeds them to date to convert them to seconds-since-epoch; pastes each timestamp on the beginning of each line; sorts the pasted result numerically, now with numeric timestamps at the beginning, and finally removes the timestamp to get the original file back.
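A variant of the same pipeline, as a sketch: it reads the timestamps through paste - so it also runs in a plain POSIX shell without process substitution, and it adds sort -s so that records with equal timestamps keep their original relative order instead of being ordered by the rest of the line. The function name sort_by_t_stable is hypothetical, and the sketch assumes GNU date, a sort that supports -s (GNU and BSD sort do), and that the timestamp is the fourth double-quote-delimited field on every line.

```shell
# Hypothetical helper: stable sort of FILE by the timestamp in its
# 4th double-quote-delimited field. Assumes GNU date and sort -s.
sort_by_t_stable() {
    cut -d'"' -f4 "$1" |       # one timestamp per line
        date -f- +%s |         # GNU date: convert each line to epoch seconds
        paste - "$1" |         # epoch<TAB>original line
        sort -s -n -k1,1 |     # numeric on the epoch only; ties keep input order
        cut -f2-               # strip the epoch column again
}

# Demo on a throwaway file with shortened, made-up records.
demo=$(mktemp)
printf '%s\n' \
    '<r id="zxy" t="7/24/2012 12:16:17 PM"></r>' \
    '<r id="abcd" t="7/24/2012 12:16:17 PM"></r>' \
    '<r id="defg" t="7/24/2012 12:16:18 AM"></r>' > "$demo"
sort_by_t_stable "$demo"    # prints defg, then zxy, then abcd
rm -f "$demo"
```

One caveat applies to both forms: if date cannot parse a line it prints nothing for it, so the pasted columns drift out of step. It is worth checking that the number of converted timestamps matches the number of lines in the file.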

Test:

$ cat >file.txt <<'EOF'
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
EOF
$ paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
