I have searched through a lot of tutorials and courses, and most start
with a BERT model or some variation of it. I want to watch/learn
how a transformer/attention model is trained from scratch.
I want to try to build an attention/transformer model for solved
games like chess (i.e., I will be able to generate my own training data).