Algorithm for interrogating a model to find what activates a specific class?

I have an NN multi-class text classifier. The input is a sequence of words; each word is looked up in a vocabulary and replaced by its embedding, so the model actually sees a sequence of embedding vectors. Since the domain of this model is a closed set (a fixed vocabulary and a fixed sequence length), I’m thinking it should be possible to try a small sample of all possible inputs (random phrases) to get an idea of which words would maximize a given class.
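
For concreteness, this is roughly the brute-force version I have in mind (just a sketch; `model`, `vocab_size`, `seq_len`, and `target_class` are placeholders for my actual classifier and settings, and I'm assuming a Keras-style `predict`):

```python
import numpy as np

# Minimal brute-force sketch; `model` is assumed to be a trained Keras-style
# classifier taking integer token IDs of shape (batch, seq_len) and returning
# class probabilities of shape (batch, n_classes).
def random_search(model, vocab_size, seq_len, target_class,
                  n_samples=10000, batch_size=256):
    best_score, best_seq = -1.0, None
    for _ in range(n_samples // batch_size):
        # Sample random sequences of token IDs from the vocabulary.
        seqs = np.random.randint(0, vocab_size, size=(batch_size, seq_len))
        probs = model.predict(seqs, verbose=0)[:, target_class]
        i = int(np.argmax(probs))
        if probs[i] > best_score:
            best_score, best_seq = float(probs[i]), seqs[i].copy()
    return best_seq, best_score
```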

My question: is there a more efficient algorithm to do this, other than trying random sequences of words from the vocabulary?

I’m thinking there has to be a way to do this backwards: looking at the weights of the units in the last layers relative to the embedding values.
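
This is a sketch of the kind of "backwards" computation I'm imagining: take the gradient of the target class score with respect to the embedding input and rank vocabulary words by how well their embeddings align with it. It assumes I can split the model into the embedding lookup and the rest (`model_from_embeddings` below); all names are placeholders.

```python
import numpy as np
import tensorflow as tf

# Sketch of a gradient-based "backwards" pass; all names are placeholders.
# `embedding_matrix` is the (vocab_size, emb_dim) lookup table, and
# `model_from_embeddings` maps a (1, seq_len, emb_dim) tensor of embeddings
# to class logits, i.e. the network minus the embedding lookup.
def rank_words_by_gradient(model_from_embeddings, embedding_matrix,
                           seq_len, target_class, top_k=20):
    # Start from a neutral input: the mean embedding at every position.
    x = tf.Variable(
        np.tile(embedding_matrix.mean(axis=0), (1, seq_len, 1)),
        dtype=tf.float32)

    with tf.GradientTape() as tape:
        logits = model_from_embeddings(x)
        score = logits[0, target_class]
    grad = tape.gradient(score, x)  # shape (1, seq_len, emb_dim)

    # Words whose embeddings point in the gradient direction are the ones
    # that increase the target class score fastest at each position.
    word_scores = tf.einsum(
        've,se->sv', tf.constant(embedding_matrix, tf.float32), grad[0])
    top_ids = tf.argsort(word_scores, axis=-1, direction='DESCENDING')
    return top_ids[:, :top_k].numpy()  # (seq_len, top_k) candidate token IDs
```

(That's only a first-order approximation around one input point, so I'm not sure how well it captures what the deeper layers actually do.)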

Thanks

submitted by /u/AbIgnorantesBurros