I’m benchmarking a model in a controlled environment (a Docker container with 1 CPU and 4 GB RAM).
Running 100 inferences on the SATRN model with batch size 1 takes on average 1.26 seconds/inference using the TFLite model and 0.86 seconds/inference using the SavedModel.
Is this expected? What would explain the performance difference?
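For reference, this is roughly the timing harness I mean — a minimal sketch using only the standard library, with warmup runs excluded so one-time setup cost (graph tracing, tensor allocation) doesn't skew the per-inference average. The `run_inference` callable is a placeholder for whichever backend is being measured (e.g. `interpreter.invoke()` for TFLite, or a call to the SavedModel's serving signature):

```python
import statistics
import time

def benchmark(run_inference, runs=100, warmup=5):
    """Time `run_inference` over `runs` calls, after `warmup`
    untimed calls; returns (mean, stdev) in seconds per call."""
    for _ in range(warmup):
        run_inference()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append(time.perf_counter() - start)

    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical usage with a TFLite interpreter (model path assumed):
#   interpreter = tf.lite.Interpreter(model_path="satrn.tflite", num_threads=1)
#   interpreter.allocate_tensors()
#   mean, stdev = benchmark(interpreter.invoke)
```

Reporting the standard deviation alongside the mean helps confirm the 1.26 s vs 0.86 s gap isn't just run-to-run noise.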
submitted by /u/BarboloBR