I am currently stuck at a dead end. I am trying to build an image caption generator using a federated approach. My initial idea was to have a separate tokenizer for each client. However, that raises the following issues:
Every client will have a different-sized vocabulary, and therefore a different shape for y, which will cause issues with the global model configuration.
To counter the above issue, I could make the size of y in each client equal to the largest vocabulary size across all clients, and fill the extra columns in each client with zeros.
E.g.: [0,1,1,1] padded to a size of 6 would become [0,1,1,1,0,0]
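To make the padding idea concrete, here is a minimal sketch of what I mean (the function name `pad_labels` is just something I made up for illustration):

```python
import numpy as np

def pad_labels(y, target_vocab_size):
    """Pad each one-hot label row with zeros up to target_vocab_size.

    y: 2-D array of shape (num_samples, local_vocab_size).
    """
    num_samples, local_size = y.shape
    padded = np.zeros((num_samples, target_vocab_size), dtype=y.dtype)
    padded[:, :local_size] = y  # original columns are kept, extras stay 0
    return padded

y_client = np.array([[0, 1, 1, 1]])       # local vocab size 4
print(pad_labels(y_client, 6).tolist())   # [[0, 1, 1, 1, 0, 0]]
```

This makes every client's y the same width, but the padded columns carry no information about the words themselves.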
This brings me to the last possible flaw: the same word will have different indices in different clients. The word "rock" might have an index of 6 in client 1, while the same word has an index of 9 in another client. While training the global model, this will cause issues, since the model is effectively being asked to learn different label indices for the same word, which will hurt accuracy.
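Here is a toy illustration of the index-mismatch problem (a simplified stand-in for a real tokenizer such as Keras' `Tokenizer`, which I assume behaves similarly by assigning indices per fit):

```python
def fit_tokenizer(corpus):
    """Toy tokenizer: assigns word indices in order of first appearance."""
    word_index = {}
    for sentence in corpus:
        for word in sentence.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1  # 0 reserved for padding
    return word_index

# Each client fits its own tokenizer on its own captions
client1 = fit_tokenizer(["a dog on a rock"])
client2 = fit_tokenizer(["the cat sat near the big grey rock"])

# The same word ends up with a different index on each client
print(client1["rock"], client2["rock"])  # 4 7
```

So identical target words map to different output neurons on different clients, which is exactly what confuses the aggregated global model.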
This brings me to the final question: is it against the idea of Federated Learning to fit a single tokenizer on the words of all the training clients?
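To make that question concrete, this is what I mean by a single shared tokenizer (same toy tokenizer as a stand-in; the point is that building it requires some party to see every client's raw vocabulary, which is what I suspect conflicts with the federated setting):

```python
def fit_tokenizer(corpus):
    """Toy tokenizer: assigns word indices in order of first appearance."""
    word_index = {}
    for sentence in corpus:
        for word in sentence.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1  # 0 reserved for padding
    return word_index

client1_texts = ["a dog on a rock"]
client2_texts = ["the cat sat near the rock"]

# One tokenizer fit on the union of all clients' captions gives every
# client the same index for the same word -- but the fitting step needs
# access to text from all clients.
shared = fit_tokenizer(client1_texts + client2_texts)
print(shared["rock"])  # 4, the same index for every client
```

With this, the shape of y and the word-to-index mapping are consistent everywhere, so the two problems above disappear; the open question is whether centralizing the vocabulary like this breaks the privacy assumptions of Federated Learning.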