TensorFlow performing worse on better hardware

I am migrating from a laptop with an RTX 3080 Mobile to a desktop with an RTX 3080 Ti. I am using the TensorFlow Docker image from the NGC Catalog (tensorflow:22.03-tf2-py3) on both machines, with the same code and dataset. The laptop was running Pop!_OS 20.04 LTS and the desktop runs Ubuntu 20.04 LTS, so the setup is basically the same.

The laptop took about 2 seconds per epoch (I can't retest because I have to return the device, but the epoch times are stored in the Jupyter notebook), while the desktop, with the better GPU, takes 4 to 5 seconds. I have already reinstalled the drivers and the whole Docker engine, with no luck. Currently, nvidia-smi reports GPU usage of only about 30%.
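In case it helps narrow this down, here is a minimal sketch of how the nvidia-smi reading could be logged alongside the SM clock, performance state, and PCIe link, which might differ between the two machines. The helper names `parse_sample` and `query_gpu` are hypothetical, and the sketch assumes `nvidia-smi` is on the PATH wherever it runs (host or container).

```python
import subprocess

# Fields requested from nvidia-smi via --query-gpu.
FIELDS = [
    "utilization.gpu",          # GPU utilization in percent
    "clocks.sm",                # current SM clock in MHz
    "pstate",                   # performance state, e.g. P0 or P8
    "pcie.link.gen.current",    # current PCIe generation
    "pcie.link.width.current",  # current PCIe lane width
]


def parse_sample(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by field name."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def query_gpu():
    """Run nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_sample(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpu()` a few times while an epoch is running would show whether the card stays in a low performance state or on a narrow PCIe link, which is only one of several possible explanations for the low utilization.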

Has anyone encountered a problem like this? Thanks.

