Hi, guys 🤗
I just want to share my GitHub repository for a custom training loop with “custom layers,” “XLA compiling,” “distributed learning,” and a “gradient accumulator.”
As you know, TF2 works best with static graphs, so combining TF2 with XLA compilation is easy and powerful. However, to my knowledge, there is no source code or tutorial for XLA compilation with distributed learning. Also, TF2 doesn’t natively provide a gradient accumulator, which is a well-known strategy for training with large effective batch sizes on limited hardware.
My source code provides all of these and makes it possible to train ResNet-50 with a mini-batch size of 512 on two 1080 Ti GPUs. All parts are XLA-compiled, so the training loop is reasonably fast even on older GPUs.
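For anyone new to gradient accumulation, here is a minimal, framework-free sketch of the idea in plain Python. It is not the code from the repository: the toy linear model, the data, and the names `mse_grad` and `accumulated_step` are all made up for illustration. It only shows that summing gradients over equally sized micro-batches, averaging them, and applying one update gives the same step as a single large batch.

```python
# Toy model y = w * x with a mean-squared-error loss (hypothetical example,
# not the ResNet-50 setup from the repository).
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_step(w, xs, ys, micro_batch_size, lr):
    """One SGD step built from several equally sized micro-batches."""
    accum = 0.0
    num_micro = 0
    for start in range(0, len(xs), micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        accum += mse_grad(w, mx, my)  # accumulate only, no weight update yet
        num_micro += 1
    grad = accum / num_micro  # average over the micro-batches
    return w - lr * grad      # single update for the whole large batch


xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w0 = 0.0

# Reference: one step computed on the full batch of 8 samples.
full = w0 - 0.01 * mse_grad(w0, xs, ys)
# Same step computed from 4 micro-batches of 2 samples each.
acc = accumulated_step(w0, xs, ys, micro_batch_size=2, lr=0.01)
```

Note that the equivalence assumes every micro-batch has the same size. With unequal sizes, the accumulated gradients would need to be weighted by the number of samples in each micro-batch.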
This repository is actually the source code for a search-based filter pruning algorithm, so if you want to know more about it, please take a look at the README and the paper.