TFLite optimization best practices for deployment on Android?

Hi everyone. I’m deploying a ResNet-based UNet with a 928×928 input on an Android device. Performance is suboptimal even when running on the GPU. Currently the only optimization I apply is the tf.lite.Optimize.DEFAULT flag. Has anyone had experience with more advanced optimization techniques aimed specifically at latency, not necessarily at size reduction?
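For context, tf.lite.Optimize.DEFAULT on its own (without a representative dataset) applies dynamic-range quantization, which stores weights as int8. The TensorFlow Lite post-training quantization guide also describes float16 quantization as an option when targeting the GPU. The sketch below compares three conversions of the same network. It rests on two assumptions. First, `TinyUNet` is a hypothetical stand-in with a 64×64 input, not the 928×928 model from the question. Second, it runs on the desktop Python interpreter, so it only checks that the float16 model converts and stays numerically close to float32. It says nothing about on-device latency.

```python
import numpy as np
import tensorflow as tf

SIZE = 64  # hypothetical small input; the real model uses 928x928


class TinyUNet(tf.Module):
    """Hypothetical stand-in: one down/up level with a skip connection."""

    def __init__(self):
        super().__init__()
        init = tf.random_normal_initializer(stddev=0.1)
        self.w_enc = tf.Variable(init([3, 3, 3, 8]))
        self.w_mid = tf.Variable(init([3, 3, 8, 16]))
        self.w_out = tf.Variable(init([1, 1, 24, 1]))

    @tf.function(input_signature=[tf.TensorSpec([1, SIZE, SIZE, 3], tf.float32)])
    def __call__(self, x):
        skip = tf.nn.relu(tf.nn.conv2d(x, self.w_enc, 1, "SAME"))
        y = tf.nn.max_pool2d(skip, 2, 2, "SAME")
        y = tf.nn.relu(tf.nn.conv2d(y, self.w_mid, 1, "SAME"))
        y = tf.image.resize(y, [SIZE, SIZE], method="nearest")
        y = tf.concat([y, skip], axis=-1)  # 16 + 8 = 24 channels
        return tf.sigmoid(tf.nn.conv2d(y, self.w_out, 1, "SAME"))


def convert(module, mode):
    concrete = module.__call__.get_concrete_function()
    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete], module)
    if mode in ("dynamic_range", "float16"):
        # The flag from the question; alone it gives dynamic-range (int8 weight) quantization.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if mode == "float16":
        # Store weights as float16 instead of int8.
        converter.target_spec.supported_types = [tf.float16]
    return converter.convert()


def run_once(tflite_model, x):
    # Runs the converted model once with the Python TFLite interpreter (CPU).
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])


module = TinyUNet()
models = {m: convert(module, m) for m in ("float32", "dynamic_range", "float16")}
x = np.random.rand(1, SIZE, SIZE, 3).astype(np.float32)
outputs = {m: run_once(b, x) for m, b in models.items()}
```

The unquantized float32 conversion serves only as a reference, so you can check that the float16 weights do not noticeably change the output. Whether the float16 variant is actually faster for the real model has to be measured on the target phone with the GPU enabled, for example with TFLite's `benchmark_model` tool.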

