Image Captioning with Visual Attention: TF, TPU, BLEU, BEAM

Anyone interested in deep learning image captioning has probably come across the Show, Attend and Tell paper, and anyone interested in implementing the architecture in TensorFlow has probably come across TensorFlow’s tutorial. @ratthachat provided a great notebook that extends TensorFlow’s tutorial with additions such as TPU training, an EfficientNet encoder, and GloVe embeddings. When I wanted to do image captioning on my own custom dataset, his notebook was the best starting point I could find online. While working on my dataset I needed to customize his notebook to add the features listed below. Afterwards, I felt many others could benefit from the extensions, so I decided to share them. Hope you all find it helpful.

BLEU score metrics
Decoders
1. Pure Sampling
2. Top K Sampling
3. Greedy Search
4. Beam Search
Scheduled Sampling from https://arxiv.org/pdf/1506.03099.pdf
Early stopping based on validation BLEU score
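
For anyone unfamiliar with how the four decoders differ, here is a minimal sketch in plain Python. It is not the code from the notebook: `toy_step`, `END`, and the probability table are made up for illustration and stand in for a real decoder that would condition on image features. The beam search here ranks hypotheses by raw cumulative log-probability with no length normalization, which is a simplification.

```python
import math
import random

END = 0  # hypothetical id of the end-of-caption token


def toy_step(prefix):
    """Hypothetical stand-in for the decoder: returns next-token probabilities.

    A real model would look at the image and the prefix; this toy table only
    looks at the prefix length so the example stays self-contained.
    """
    table = [
        [0.05, 0.50, 0.40, 0.05],  # first token
        [0.10, 0.20, 0.60, 0.10],  # second token
        [0.70, 0.10, 0.10, 0.10],  # third token onward: END is most likely
    ]
    return table[min(len(prefix), len(table) - 1)]


def sample_from(probs, rng, top_k=None):
    """Pure sampling when top_k is None, else sample among the k best tokens."""
    ids = list(range(len(probs)))
    if top_k is not None:
        ids = sorted(ids, key=lambda i: probs[i], reverse=True)[:top_k]
    weights = [probs[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


def decode(step_fn, max_len=10, mode="greedy", top_k=None, seed=0):
    """Greedy search (mode="greedy") or pure / top-k sampling (mode="sample")."""
    rng = random.Random(seed)
    prefix = []
    for _ in range(max_len):
        probs = step_fn(prefix)
        if mode == "greedy":
            token = max(range(len(probs)), key=lambda i: probs[i])
        else:
            token = sample_from(probs, rng, top_k)
        prefix.append(token)
        if token == END:
            break
    return prefix


def beam_search(step_fn, beam_width=3, max_len=10):
    """Keep the beam_width best partial captions by cumulative log-probability."""
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, p in enumerate(step_fn(prefix)):
                if p > 0.0:
                    candidates.append((prefix + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            # Hypotheses that emitted END stop growing; the rest stay in the beam.
            (finished if prefix[-1] == END else beams).append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return max(finished, key=lambda c: c[1])[0]


greedy_caption = decode(toy_step)                          # greedy search
sampled_caption = decode(toy_step, mode="sample")          # pure sampling
top2_caption = decode(toy_step, mode="sample", top_k=2)    # top-k sampling
beam_caption = beam_search(toy_step, beam_width=3)         # beam search
```

Top-k sampling with `top_k=1` reduces to greedy search, and beam search with `beam_width=1` behaves the same way; larger values trade compute for a wider exploration of candidate captions.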

https://www.kaggle.com/kagglethomas88/flickr-image-captioning-tpu-tf2-glove-extended

submitted by /u/International_Fix_94