Training slows down and eventually freezes?

Hello, I’m training a segmentation model and it freezes partway through training with no error. I’m running on a dedicated Google Cloud notebook with 2 GPUs (not Colab) and about 900 training images, and it freezes in the middle of roughly the 30th epoch. For context, each epoch takes ~1.5 min with a batch size of 1 until training slows down and then freezes. I’m wondering whether this is a problem with Google Cloud cutting out or whether I’m running out of memory on my GPUs. I also included the image resizing and the normalization from 0-255 to 0-1 in the sequential model; I can remove that if it is causing an issue, but I doubt that’s the problem. Any advice is very welcome, as I’m not finding much on SO or here. Cheers!

submitted by /u/quipkick