DL server configuration for UNet training

My company is working on a very specific semantic segmentation task using a small UNet, and we need to speed up training. Our current workstation has a single Tesla V100. We are looking at a new workstation with several A100s, but we have no way to measure how much speedup we would get from the larger number of GPUs combined with the new Ampere architecture. The second question is whether we need NVSwitch or NVLink for training UNet, and what speed improvement it would give us. Within our budget we could get either a DGX A100 (40 GB A100s) or a custom configuration without options that are useless for our task, for example NVSwitch. The only thing I could find is NVIDIA’s UNet Industrial performance data, but that evaluation was done on DGX-1 and DGX A100, both of which use NVLink/NVSwitch, so the impact of the GPU interconnect is not obvious.
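
To frame the interconnect question, here is a rough back-of-envelope sketch of how I would estimate the gradient synchronization cost per training step. It assumes plain data-parallel training with a ring all-reduce, counts only the bandwidth term (no latency, no overlap of communication with the backward pass), and uses hypothetical function and parameter names. The numbers in the usage example are placeholders, not measurements; the parameter count, link bandwidth, and per-step compute time would have to come from our own model and hardware.

```python
def ring_allreduce_seconds(num_params, num_gpus, bandwidth_gb_s, bytes_per_param=4):
    """Bandwidth-only estimate of one ring all-reduce over the gradients.

    In a ring all-reduce each GPU sends (and receives) 2 * (N - 1) / N times
    the gradient payload. Latency and compute/communication overlap are
    ignored, so this is a pessimistic upper bound on exposed communication.
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to synchronize
    payload_bytes = num_params * bytes_per_param
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_bytes / (bandwidth_gb_s * 1e9)


def scaling_efficiency(compute_seconds, comm_seconds):
    """Fraction of each step spent on useful compute if nothing overlaps."""
    return compute_seconds / (compute_seconds + comm_seconds)


if __name__ == "__main__":
    # Placeholder inputs: replace with the real parameter count, the measured
    # per-step time on one GPU, and the link bandwidths of the candidate systems.
    params = 2_000_000
    step_compute_s = 0.050
    for label, bw in [("slower link (placeholder)", 10.0),
                      ("faster link (placeholder)", 200.0)]:
        comm = ring_allreduce_seconds(params, num_gpus=4, bandwidth_gb_s=bw)
        eff = scaling_efficiency(step_compute_s, comm)
        print(f"{label}: comm {comm * 1e3:.3f} ms, efficiency {eff:.3f}")
```

If the estimated communication time is a small fraction of the per-step compute time even at the slower bandwidth, that would suggest the interconnect matters little for a model this small, but this is only a model of the problem and not a substitute for a real benchmark.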

