Image Captioning with Visual Attention: TF, TPU, BLEU, BEAM

Anyone interested in deep learning image captioning has probably come across the Show, Attend and Tell paper, and anyone interested in implementing that architecture in TensorFlow has probably come across TensorFlow’s tutorial. @ratthachat provided a great notebook that extends the tutorial with TPU training, an EfficientNet encoder, and GloVe embeddings. When I wanted to caption images from my own custom dataset, his notebook was the best starting point I could find online. While working on that dataset I customized the notebook to add the features listed below. I felt many others could benefit from these extensions, so I am sharing them. I hope you all find them helpful.

  • BLEU score metrics
  • Decoders
      1. Pure sampling
      2. Top-k sampling
      3. Greedy search
      4. Beam search
  • Scheduled sampling, from https://arxiv.org/pdf/1506.03099.pdf
  • Early stopping based on validation BLEU score
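
To make the difference between the four decoders concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the vocabulary, the `TOY_MODEL` table, and every function name below are hypothetical stand-ins, with `next_token_probs` playing the role of one step of the attention decoder. The beam search is also simplified (no length normalization, and finished captions are never pruned).

```python
import math
import random

# Hypothetical toy vocabulary; id 0 marks the end of a caption.
VOCAB = ["<end>", "a", "the", "dog", "cat"]
END = 0

# Hypothetical next-token distributions keyed by the previous token id
# (None = start of caption). A real decoder would compute these from the
# image features and its hidden state.
TOY_MODEL = {
    None: [0.0, 0.55, 0.45, 0.0, 0.0],
    1: [0.0, 0.0, 0.0, 0.6, 0.4],   # after "a"
    2: [0.0, 0.0, 0.0, 0.9, 0.1],   # after "the"
    3: [1.0, 0.0, 0.0, 0.0, 0.0],   # after "dog"
    4: [1.0, 0.0, 0.0, 0.0, 0.0],   # after "cat"
}

def next_token_probs(prefix):
    """Stand-in for one decoder step: probabilities over VOCAB ids."""
    return TOY_MODEL[prefix[-1] if prefix else None]

def pick_greedy(probs, rng=None):
    """Greedy search: always take the single most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

def pick_pure_sample(probs, rng):
    """Pure sampling: draw from the full distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def make_top_k_picker(k):
    """Top-k sampling: draw only among the k most probable tokens."""
    def pick(probs, rng):
        top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
        return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]
    return pick

def decode(pick, max_len=10, seed=0):
    """Generate one caption token by token using the given picking rule."""
    rng = random.Random(seed)
    prefix = []
    while len(prefix) < max_len:
        token = pick(next_token_probs(prefix), rng)
        if token == END:
            break
        prefix.append(token)
    return [VOCAB[t] for t in prefix]

def beam_search(beam_width=2, max_len=10):
    """Keep the beam_width best partial captions by summed log-probability."""
    beams = [(0.0, [])]  # (log-probability, token ids)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for token, p in enumerate(next_token_probs(prefix)):
                if p == 0.0:
                    continue  # skip impossible tokens; log(0) is undefined
                entry = (score + math.log(p), prefix + [token])
                (finished if token == END else candidates).append(entry)
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return [VOCAB[t] for t in best[1] if t != END]

print("greedy:", decode(pick_greedy))
print("beam:  ", beam_search(beam_width=2))
print("top-k: ", decode(make_top_k_picker(2), seed=1))
print("sample:", decode(pick_pure_sample, seed=1))
```

On this toy table, greedy search commits to "a" because it is the most probable first word and ends with "a dog" (probability 0.55 × 0.6 = 0.33), while beam search with width 2 keeps "the" alive and finds "the dog" (0.45 × 0.9 = 0.405). The two sampling decoders trade that determinism for variety, which is why their output depends on the seed.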

https://www.kaggle.com/kagglethomas88/flickr-image-captioning-tpu-tf2-glove-extended

submitted by /u/International_Fix_94