BERT fine-tuning training time

Hello

I am trying to fine-tune a BERT model on the well-known Movie Review dataset, using an Apple M1 chip.

With all 66M parameters trainable, the estimated time (ETA) for one epoch is 10 hours.

To reduce the ETA, I set the first two layers to `trainable=False`, which brings the number of trainable parameters down to 2K.
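
For reference, here is a minimal sketch of the freezing step I mean, assuming TensorFlow/Keras. The small `Sequential` model and the layer names `body_1`, `body_2`, and `head` are hypothetical stand-ins so the snippet runs without downloading anything; they are not my actual BERT model. The model is compiled again after changing `trainable`, since Keras only picks up a changed `trainable` flag at compile time.

```python
import tensorflow as tf

# Toy stand-in for the real network (hypothetical, NOT the actual BERT model):
# two larger "body" layers followed by a small classification head.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu", name="body_1"),
    tf.keras.layers.Dense(64, activation="relu", name="body_2"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="head"),
])


def count_params(weights):
    """Sum the number of scalar entries over a list of weight variables."""
    return sum(w.numpy().size for w in weights)


total = count_params(model.weights)
trainable_before = count_params(model.trainable_weights)

# Freeze the first two layers so only the head is updated during training.
for name in ("body_1", "body_2"):
    model.get_layer(name).trainable = False

# Compile AFTER changing `trainable`; a model compiled earlier keeps
# training with the old setting until it is compiled again.
model.compile(optimizer="adam", loss="binary_crossentropy")

trainable_after = count_params(model.trainable_weights)
print(f"total={total} before={trainable_before} after={trainable_after}")
```

Note that the frozen layers still run in the forward pass, so a lower trainable-parameter count does not by itself remove their compute cost.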

Even though the number of trainable parameters dropped, nothing changed: the ETA is still 10 hours.

Do you think this is normal, or is something wrong on my side?

Thanks
