I am trying to build an image similarity retrieval system: given a query image, the system should return the top ‘k’ most similar images. For this particular example, I am using the DeepFashion dataset, where given an image containing, say, a shirt, the system should show the 5 items of clothing most similar to it. The subset I am using has 289,222 diverse clothing images, each of shape (300, 300, 3).
The approach I have includes:
- Train an autoencoder
- Feed each image in the dataset through the encoder to get a reduced n-dimensional latent space representation, for example a 100-d representation
- Create a table of shape m x (n + 2), where ‘m’ is the number of images and each image is compressed to n dimensions. One of the two extra columns is the image name and the other is the path to where the image is stored on the local system
- Given a new image, feed it through the encoder to get its n-dimensional latent space representation
- Use a metric such as cosine similarity to compare the n-d latent vector of the new image against the m x (n + 2) table obtained in step 3 and retrieve the top k closest clothes
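To make the last two steps concrete, here is a minimal sketch of the lookup I have in mind. It assumes the latent vectors are already available as a NumPy array; the random `table` and the `paths` list below are hypothetical stand-ins for real encoder output and real file locations, and `top_k_similar` is a name I made up for illustration.

```python
import numpy as np


def top_k_similar(query, embeddings, k=5):
    """Return indices and cosine scores of the k rows closest to `query`."""
    query = np.asarray(query, dtype=np.float32)
    embeddings = np.asarray(embeddings, dtype=np.float32)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    rows = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    q = query / (np.linalg.norm(query) + eps)
    scores = rows @ q  # cosine similarity of every row with the query
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Stand-in data: 1000 "images" with 100-d latent vectors (assumption, not real output).
rng = np.random.default_rng(0)
paths = [f"train/shirts/img_{i:05d}.jpg" for i in range(1000)]
table = rng.normal(size=(1000, 100)).astype(np.float32)

# A query that is a slightly perturbed copy of row 42.
query = table[42] + 0.01 * rng.normal(size=100)
idx, scores = top_k_similar(query, table, k=5)
for i, s in zip(idx, scores):
    print(paths[i], round(float(s), 4))
```

This is a brute-force comparison against every row, which is probably fine as a first pass but may need something faster for the full 289,222 images.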
How do I create the table mentioned in step 3?
I am planning on using TensorFlow 2.5 with Python 3.8 and the code for getting an image generator is as follows:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 135)

train_data_gen = image_generator.flow_from_directory(
    directory = train_dir,
    batch_size = batch_size,
    shuffle = False,
    target_size = (IMG_HEIGHT, IMG_WIDTH),
    class_mode = 'sparse')
How can I get the image name and the path to each image in order to create the m x (n + 2) table in step 3?
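For reference, this is roughly the shape of the table I am after. The sketch assumes that the iterator returned by `flow_from_directory` exposes a `filepaths` attribute and that, with `shuffle = False`, its order matches the order of the encoder's predictions; I have not verified either. `encoder` and `build_table` are hypothetical names, and the file paths and latent values below are stand-ins so the snippet runs without TensorFlow.

```python
import os

import numpy as np


def build_table(filepaths, latents):
    """Combine paths and latent vectors into m rows of (name, path, z_1 .. z_n)."""
    latents = np.asarray(latents)
    if latents.ndim != 2 or latents.shape[0] != len(filepaths):
        raise ValueError("expected one n-d latent vector per file path")
    rows = []
    for path, z in zip(filepaths, latents):
        rows.append([os.path.basename(path), path, *z.tolist()])
    return rows


# With the real objects this would presumably be (assumption, untested):
#   filepaths = train_data_gen.filepaths
#   latents = encoder.predict(train_data_gen)
filepaths = ["train/shirts/a.jpg", "train/shirts/b.jpg", "train/jeans/c.jpg"]
latents = np.arange(12, dtype=np.float32).reshape(3, 4)  # m = 3, n = 4

table = build_table(filepaths, latents)
print(len(table), len(table[0]))  # m rows, n + 2 columns
```

Each row holds the image name, its path, and then the n latent values, so the rows could be written to a CSV file or loaded into a DataFrame afterwards.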
Also, is there a better approach that I am missing?