BERT fine-tuning training time


I am trying to fine-tune a BERT model on the well-known Movie Review dataset using an M1 chip.

The estimated time (ETA) for one epoch is 10 hours when fine-tuning all 66M parameters.

To reduce the ETA, I set the first two layers to `trainable=False`, which leaves only 2K trainable parameters.

Even after reducing the trainable parameters, nothing changed: the ETA is still 10 hours.
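As a rough illustration (not part of the original post), the sketch below estimates per-step training cost with the common heuristic that a forward pass costs about 2 FLOPs per parameter per token and a backward pass about 4. It assumes that all frozen parameters sit below the trainable ones, so no gradients flow back through the frozen layers, and it ignores attention-score FLOPs. The batch shape (`TOKENS`) is a hypothetical value; only the 66M and 2K figures come from the post.

```python
# Back-of-envelope training-cost estimate (heuristic, not a measurement).
# Assumption: frozen parameters are all below the trainable ones, so the
# backward pass only has to cover the trainable part of the model.

def flops_per_step(n_total: int, n_trainable: int, tokens: int) -> int:
    """Rough FLOPs for one training step (attention-score FLOPs ignored)."""
    # Every layer, frozen or not, still runs in the forward pass.
    forward = 2 * n_total * tokens
    # Weight gradients (~2) plus activation gradients (~2) for trainable params only.
    backward = 4 * n_trainable * tokens
    return forward + backward


N_TOTAL = 66_000_000   # total parameters (from the post)
N_HEAD = 2_000         # trainable parameters after freezing (from the post)
TOKENS = 32 * 128      # hypothetical batch: 32 sequences x 128 tokens

full = flops_per_step(N_TOTAL, N_TOTAL, TOKENS)
frozen = flops_per_step(N_TOTAL, N_HEAD, TOKENS)
speedup = full / frozen

print(f"full fine-tuning: {full:.3e} FLOPs/step")
print(f"frozen layers:    {frozen:.3e} FLOPs/step")
print(f"ideal speedup:    {speedup:.2f}x")
```

Under these assumptions the ideal speedup is only about 3x, not the roughly 33,000x ratio between 66M and 2K trainable parameters, because the forward pass still runs through all 66M parameters. This does not by itself explain why the ETA stayed exactly the same; real wall-clock time also depends on memory bandwidth, data loading, and the hardware backend.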

Do you think this is normal, or is something wrong on my side?


submitted by /u/i_cook_bits