Does anyone have a good example/tutorial for building TF attention/transformers from scratch?

I have searched through a lot of tutorials and courses, but most start
with a BERT model or some variation of it. I want to watch/learn
how an attention/transformer model is trained from scratch.

I want to try to build an attention/transformer model for
games like chess (i.e., I will be able to generate the training data myself).

