Incorrect dimensions for output of speech extraction model

Hello everyone, below I have linked a recent Stack Overflow post of mine that goes into more depth on my problem.

In short, I am unable to get proper output from my model. I believe this is caused by problems with my input and by my lack of understanding of how the shape of an input differs from the shape of a tensor. If there is anything I can provide to give a better idea of my problem, let me know. I'd appreciate any help I can get.
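To illustrate the kind of mismatch I mean, here is a minimal NumPy sketch. I'm assuming, without being certain it matches my model, that the model expects a batched array laid out as `(batch, channels, samples)`, while a loaded waveform is usually a plain 1-D array of shape `(samples,)`. The helper names `to_model_input` and `from_model_output` and the 16 kHz sample rate are hypothetical and only for illustration.

```python
import numpy as np

def to_model_input(waveform: np.ndarray) -> np.ndarray:
    """Reshape a waveform to an ASSUMED (batch, channels, samples) layout."""
    if waveform.ndim == 1:   # (samples,) -> (1, 1, samples): add batch and channel axes
        return waveform[np.newaxis, np.newaxis, :]
    if waveform.ndim == 2:   # (channels, samples) -> (1, channels, samples): add batch axis
        return waveform[np.newaxis, :, :]
    if waveform.ndim == 3:   # already (batch, channels, samples)
        return waveform
    raise ValueError(f"unexpected waveform shape {waveform.shape}")

def from_model_output(output: np.ndarray) -> np.ndarray:
    """Undo the batching: keep the first batch item and drop a single channel axis."""
    out = output[0]          # (batch, channels, samples) -> (channels, samples)
    if out.shape[0] == 1:    # mono: (1, samples) -> (samples,)
        out = out[0]
    return out

# Hypothetical 1 second of mono audio at an assumed 16 kHz sample rate.
waveform = np.zeros(16_000, dtype=np.float32)
x = to_model_input(waveform)
print(waveform.shape, "->", x.shape)  # (16000,) -> (1, 1, 16000)
```

The point is that the data is the same 16,000 samples either way; only the number of axes changes, and a model that expects three axes will misinterpret a 1-D input.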

