Here in this example: Using pre-trained word embeddings | Keras, we can see that by providing pre-trained word embedding in embedding layer initialization, we can boost the performance of the model. But before doing that, they are removing the tokens which are not available in the current data-set. But I wonder, if it is helpful or not. If we kept all the tokens wouldn’t it be more helpful to classify text? As in that case the unknown words would also get some representations and which would be helpful in further classification process. Why we are not taking this advantage? Correct me if I am mistaken.