ImageSort — using Python!

Palash Sharma
4 min read · Aug 15, 2020

How to sort images according to the dates that they were clicked on!

Hey everyone! I am a sophomore pursuing a CS degree in India. I learnt Python early this year and have made the cliché projects successfully ;) You know, like a game, an RSS feed filter and a website using Django. I was feeling good about my capabilities and was confident in my knowledge of the Python programming language. But I really wanted to build something I could actually use!

Then, something happened…

Photo by William Montout on Unsplash

My uncle has this huge dataset of photographs and he wanted to sort them according to their dates. For example, a photo clicked on 20th July 2020 will be saved as IMG_2020-07-20 or a similar format. Now he, being the annoying uncle, wanted me to do this with the help of Python.

I did!

Here is how I did it…

What are we trying to accomplish here?

Problem statement: Given a set of images/videos/any other file really, store these files in the parent directory under separate folders for each date.

For example, an image stored as “IMG_2020-07-20.png” in the Pictures directory will go under the directory named 2020-07-20 in the parent directory of Pictures, which in my case is my home directory.
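To make the goal concrete, here is a minimal sketch of the mapping from a file name to its destination folder. The helper name `destination_for` is hypothetical (it is not part of the original code), and the sketch assumes the date appears in the file name as YYYY-MM-DD.

```python
import os
import re

# Assumption: dates are embedded in file names as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def destination_for(filename, parent_dir):
    """Return the folder a file should end up in, or None if its name has no date."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    return os.path.join(parent_dir, match.group())


print(destination_for("IMG_2020-07-20.png", "/home/palash"))
print(destination_for("notes.txt", "/home/palash"))
```

Files without a date in their name map to `None`, so the caller can decide whether to skip them or park them somewhere else.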

How I did it:

If you haven’t already, install Python! python.org

NOTE: This tutorial is tailored for Linux. The GitHub link at the bottom of the page contains code for the Windows OS as well.

Next, let’s import the dependencies.

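import shutil, os, re
from PIL import Image  # Pillow; not used in the snippets below
import datetime
from datetime import datetime  # rebinds the name datetime to the class, which provides strptime()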

Follow this thread on Stack Overflow if you have trouble with the third and fourth import lines. I didn’t, so it should work just fine for you as well.
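The two datetime imports are easy to confuse because the module and the class inside it share a name. Here is a small sketch of the difference; the alias `dt_module` is mine and is used only for illustration.

```python
import datetime as dt_module        # the module
from datetime import datetime       # the class that lives inside the module

# Both spellings reach the same strptime classmethod.
via_module = dt_module.datetime.strptime("2020-07-20", "%Y-%m-%d")
via_class = datetime.strptime("2020-07-20", "%Y-%m-%d")

print(via_module == via_class)  # True
print(via_class.date())         # 2020-07-20
```

If you write `import datetime` and then call `datetime.strptime(...)`, Python raises an AttributeError because the module itself has no `strptime`; importing the class avoids that.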

parent_dir = "/home/palash/"  # for my system
photo_dir = '/home/palash/Pictures/'  # OR os.getcwd() + '/'
# parent_dir = os.path.dirname(photo_dir.rstrip('/')) + '/'  if you used getcwd()

These lines are used to store the name of the directory and the parent directory. Pretty self explanatory.
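If you would rather not hard-code the parent directory, it can be derived from the photo directory. This is a minimal sketch using `os.path`; the helper name `parent_of` is hypothetical.

```python
import os


def parent_of(directory):
    """Return the parent of `directory`, tolerating a trailing slash."""
    # normpath drops the trailing slash so that dirname steps up exactly one level.
    return os.path.dirname(os.path.normpath(directory))


photo_dir = "/home/palash/Pictures/"
parent_dir = parent_of(photo_dir)
print(parent_dir)
```

Note that the result has no trailing slash, so paths built from it should use `os.path.join` rather than string concatenation.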

Now, we need to walk the photo directory and store everything in it.

photos = []
for photo in os.walk(photo_dir):
    photos += [photo]

Now, we need to parse the input and figure out where the images are. You see, the walk function does its job, but this job is a bit more than what we need at this moment.

Here is what the photos list might look like after the above loop:

In [1]: photos
Out[1]: [('/home/palash/Pictures/', [], ['IMG_2020-07-20.png', 'IMG_2020-09-30.png'])]
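To see the shape of what `os.walk` returns without touching real photos, here is a small sketch that builds a throwaway directory first. The file and folder names are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: two files at the top level and one inside a subfolder.
    os.mkdir(os.path.join(root, "old"))
    for name in (
        "IMG_2020-07-20.png",
        "IMG_2020-09-30.png",
        os.path.join("old", "IMG_2019-01-01.png"),
    ):
        open(os.path.join(root, name), "w").close()

    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    walked = list(os.walk(root))
    for dirpath, dirnames, filenames in walked:
        print(os.path.relpath(dirpath, root), dirnames, sorted(filenames))
```

Each tuple holds the directory path, the subdirectory names and the file names, which is why a folder with subfolders produces more than one entry.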

So, let’s pick out the files we need from this, which are stored inside a list, which is inside a tuple, which again, is inside a list.

a = []
for dirpath, dirnames, filenames in photos:
    # each os.walk entry is (dirpath, dirnames, filenames); keep only the file names
    a += filenames

This stores all the needed files inside the list ‘a’. Finally, we can get started with the sorting and placing!
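One thing the flat list of names loses is which folder each file came from. If your photo directory has subfolders, a variation like the following keeps the full path. This is a sketch, and `list_files` is a hypothetical helper name.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every file under `top`, including subfolders."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


# Demo on a temporary directory with one nested file.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "a.png"), "w").close()
    open(os.path.join(top, "sub", "b.png"), "w").close()
    found = sorted(os.path.relpath(p, top) for p in list_files(top))

print(found)
```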

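for images in a:
    match = re.search(r'\d{4}-\d{2}-\d{2}', images)
    if match is None:
        continue  # no date in this file name, so leave the file where it is
    # the format string has to include the dashes that the regex matched
    date = str(datetime.strptime(match.group(), '%Y-%m-%d').date())
    if os.path.isdir(parent_dir + date):
        shutil.move(photo_dir + images, parent_dir + date)
    else:
        os.mkdir(parent_dir + date)
        shutil.move(photo_dir + images, parent_dir + date)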

There’s a lot going on here. So, let’s break this down.

  • We loop over the list containing all the images and search each file name for a date of the form ‘\d{4}-\d{2}-\d{2}’, that is, four digits, two digits and two digits separated by dashes.
  • We then parse the matched text with strptime and turn the result into a date string such as 2020-07-20.
  • If a directory with that date as its name already exists in the parent directory, we move the image into it.
  • Otherwise, we create a new directory named after the date and move the image into this new directory.
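The steps above can be sketched as one function and tried on a throwaway directory, so no real photos are touched. This is not the original code: the name `sort_by_date` is hypothetical, it uses `os.listdir` (top level only) instead of `os.walk`, and it builds paths with `os.path.join` instead of string concatenation.

```python
import os
import re
import shutil
import tempfile
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def sort_by_date(photo_dir, parent_dir):
    """Move dated files from photo_dir into parent_dir/<YYYY-MM-DD>/; return the moved names."""
    moved = []
    for name in sorted(os.listdir(photo_dir)):
        source = os.path.join(photo_dir, name)
        match = DATE_PATTERN.search(name)
        if match is None or not os.path.isfile(source):
            continue  # no date in the name, or not a regular file
        try:
            date = str(datetime.strptime(match.group(), "%Y-%m-%d").date())
        except ValueError:
            continue  # digits in the right shape but not a real date, e.g. 2020-13-45
        target_dir = os.path.join(parent_dir, date)
        os.makedirs(target_dir, exist_ok=True)  # create the folder only if it is missing
        shutil.move(source, os.path.join(target_dir, name))
        moved.append(name)
    return moved


with tempfile.TemporaryDirectory() as parent:
    pictures = os.path.join(parent, "Pictures")
    os.mkdir(pictures)
    for name in ("IMG_2020-07-20.png", "IMG_2020-07-20 (1).png", "notes.txt"):
        open(os.path.join(pictures, name), "w").close()

    moved = sort_by_date(pictures, parent)
    in_date_folder = sorted(os.listdir(os.path.join(parent, "2020-07-20")))
    left_behind = os.listdir(pictures)

print(moved, in_date_folder, left_behind)
```

Using `os.makedirs(..., exist_ok=True)` folds the "does the folder exist?" check and the creation into one call, so the move only has to be written once.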

And that is it! Congratulations! You have now stored each image in a new directory according to its date!

Next Steps:

Here’s a list of what you could add to this project:

  1. Group the created date folders into month folders, and those into year folders.
  2. Check if you have multiple copies of images; they usually end with (1), (2), etc.
  3. To take it one step further, you could check whether these apparent copies are actual duplicates or just misleadingly named, using machine learning.
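As a starting point for the first and third ideas, here is a rough sketch. All helper names are hypothetical, and the duplicate check uses plain SHA-256 hashing rather than machine learning, so it only catches byte-for-byte copies, not resized or re-encoded versions of the same picture.

```python
import hashlib
import os
import tempfile
from datetime import date


def nested_folder(parent_dir, day):
    """Build parent_dir/YYYY/MM/YYYY-MM-DD for a datetime.date."""
    return os.path.join(parent_dir, f"{day.year:04d}", f"{day.month:02d}", day.isoformat())


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_true_copy(path_a, path_b):
    """True when two files hold identical bytes, whatever their names say."""
    return file_digest(path_a) == file_digest(path_b)


print(nested_folder("/home/palash", date(2020, 7, 20)))

# Demo: one real copy and one file that only looks like a copy by name.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "IMG_2020-07-20.png")
    copy = os.path.join(folder, "IMG_2020-07-20 (1).png")
    other = os.path.join(folder, "IMG_2020-07-20 (2).png")
    for path, payload in ((original, b"same bytes"), (copy, b"same bytes"), (other, b"different")):
        with open(path, "wb") as handle:
            handle.write(payload)
    copy_matches = is_true_copy(original, copy)
    other_matches = is_true_copy(original, other)

print(copy_matches, other_matches)
```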

The entire code for this can be found on my GitHub here: https://github.com/palashsharma891/ImageSort

I write about MOOCs that I take and cool projects that I make. If that sounds interesting, please do consider following me.

You could also connect with me on LinkedIn.

Thanks for reading. Have a great day!
