
Linux: delete duplicate files based on modified time, limited to minutes

Question Detail

I have a webcam that uploads an image every few minutes via FTP into a directory. Unfortunately, the webcam has a bug that makes it upload two images at once (the vendor does not want to fix it). The two images are not identical, so I cannot compare the md5sum of the .jpg files and delete the duplicate.

The filenames of the two images are similar, but still different. They look like this:

  • Jan 3 09:43 image220103_094305_20.jpg
  • Jan 3 09:43 image220103_094306_00.jpg

The format is as follows:

imageYYMMDD_HHMMSS_??.jpg

My goal is to keep the first uploaded image and delete the second one.

My idea was to use “stat” and check the modified time. However, stat reports the time in full detail, down to the second of the upload. So my question is:

Is it possible to limit the modification time to minutes? I don’t care about the seconds or anything finer.

If two images get uploaded at 09:43, I want to delete the last one.

The bash script will be executed via a crontab every hour.

My approach would be:

  • List all files in the directory with their modified time, truncated to minutes (don’t show seconds)
  • If any files share the same modified time, delete the later one

Question Answer

Here is a simple script that removes any file modified less than one minute after the last file it kept, assuming the file names sort by timestamp.

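prev=0
for file in /path/to/files/*; do
    # If the directory is empty, the pattern stays unexpanded; skip it.
    [ -e "$file" ] || continue
    # Last modification time in seconds since the epoch.
    timestamp=$(stat -c %Y -- "$file")
    # Less than 60 seconds after the last kept file: treat as a duplicate.
    if ((timestamp - prev < 60)); then
        rm -- "$file"
    else
        prev=$timestamp
    fi
done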

The shell expands the wildcard * to an alphabetical listing; I am assuming this is enough to ensure that any two files which were uploaded around the same time will end up one after the other.
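Because the loop deletes files based on that ordering, it may be worth previewing its decisions before running it from cron. The following is a minimal dry-run sketch, not part of the original answer: list_near_duplicates is a hypothetical helper name, the directory is passed as an argument, and the [ -e ] guard for an empty directory is an addition. It applies the same 60-second window as the loop above but only prints the files that would be removed.

```shell
#!/usr/bin/env bash
# Dry run: print the files the loop would delete, without deleting anything.
# list_near_duplicates is a hypothetical helper name. Assumes stat on Linux (-c %Y).
list_near_duplicates() {
    local dir=$1 prev=0 file timestamp
    for file in "$dir"/*; do
        [ -e "$file" ] || continue          # empty directory: pattern did not expand
        timestamp=$(stat -c %Y -- "$file")  # mtime in seconds since the epoch
        if ((timestamp - prev < 60)); then
            printf '%s\n' "$file"           # would be removed
        else
            prev=$timestamp                 # kept; becomes the new reference point
        fi
    done
}
```

Calling list_near_duplicates /path/to/files prints one candidate per line. If the list looks right, the printf line can be swapped for rm -- "$file".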

The arguments to stat are system-dependent; on Linux, -c %Y produces the file’s last modification time in seconds since the epoch.
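The question asked specifically about limiting the modification time to minutes. One way to do that, sketched here as an assumption rather than as part of the original answer, is to divide the epoch seconds by 60 with integer arithmetic, which drops the seconds. remove_same_minute is a hypothetical helper name, and the directory is passed as an argument.

```shell
#!/usr/bin/env bash
# Variant: treat files as duplicates only when their modification times fall
# in the same minute. remove_same_minute is a hypothetical helper name.
# Assumes stat on Linux (-c %Y) and file names that sort by timestamp.
remove_same_minute() {
    local dir=$1 prev_minute=-1 file minute
    for file in "$dir"/*; do
        [ -e "$file" ] || continue
        # Integer division drops the seconds: epoch seconds -> epoch minutes.
        minute=$(( $(stat -c %Y -- "$file") / 60 ))
        if ((minute == prev_minute)); then
            rm -- "$file"           # same minute as the previously kept file
        else
            prev_minute=$minute     # first file seen in this minute; keep it
        fi
    done
}
```

This behaves differently from the 60-second window at minute boundaries. Two files modified at 09:43:59 and 09:44:00 land in different minutes and are both kept, whereas the window-based loop would delete the second one. For a camera that uploads its pair within a second or so, the window is probably the safer choice.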

The -- should not be necessary as long as the wildcard has a leading path, but I’m including it as a “belt and suspenders” solution in case you’d like to change the wildcard to something else. (Still, probably a good idea to add ./ before any relative path, which then again ensures that the -- will not be necessary.)
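To make the point about -- and ./ concrete, here is a small demonstration that runs in a scratch directory. It is only an illustration: demo_leading_dash and the file name -image.jpg are made up for the example. Both loops handle a file whose name starts with a dash, the first thanks to --, the second thanks to the ./ prefix.

```shell
#!/usr/bin/env bash
# Demonstration only: a file name starting with "-" could be mistaken for an option.
# demo_leading_dash and "-image.jpg" are hypothetical names for this example.
demo_leading_dash() (
    # The function body is a subshell, so the cd does not affect the caller.
    scratch=$(mktemp -d) || exit 1
    cd "$scratch" || exit 1
    touch -- '-image.jpg'

    # Bare wildcard: "--" ends option parsing, so the name is taken as a file.
    for file in *; do stat -c %n -- "$file"; done

    # "./" prefix: the name no longer starts with a dash, so "--" is optional.
    for file in ./*; do stat -c %n "$file"; done

    cd / && rm -rf -- "$scratch"
)
```

Without either safeguard, stat -c %n "$file" in the first loop would receive -image.jpg as its argument and try to parse it as options.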
