I have a large database of hundreds of thousands of users interacting with thousands of products. Each interaction has a duration, with more time indicating more interest. My company wants to understand whether there are particular subgroups of similar consumers. To find out, I’ve built a 2-stage ML approach:
– using a tfrs model based on the basic retrieval tutorial (https://www.tensorflow.org/recommenders/examples/basic_retrieval), I’ve trained embeddings to represent my users and products.
– using k-means clustering on the user embeddings, I classify a particular user as a member of a particular cluster.
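For concreteness, the second stage looks roughly like the sketch below. It is a simplified stand-in rather than my actual code: the random `user_embeddings` array replaces the weights exported from the trained user tower, the L2 normalisation is an assumption on my part (the retrieval model scores with dot products), and `k = 8` is arbitrary.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows (fancy indexing makes a copy).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment and within-cluster sum of squares (inertia).
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(x)), labels].sum())
    return labels, centroids, inertia


# Stand-in for the trained user embeddings (hypothetical shape: 1000 users, 32 dims).
rng = np.random.default_rng(42)
user_embeddings = rng.normal(size=(1000, 32))

# Assumption: L2-normalise so Euclidean distance tracks cosine similarity.
user_embeddings /= np.linalg.norm(user_embeddings, axis=1, keepdims=True)

labels, centroids, inertia = kmeans(user_embeddings, k=8)
print(labels[:10], inertia)
```

Each entry of `labels` is the cluster I assign to the corresponding user.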
With this approach, I run into 2 challenges:
– the basic retrieval model does not take the implicit feedback (the amount of time) into account. Weighting user-item interactions by some measure of implicit feedback seems to be a recurring theme in this space, but I can’t seem to find any TF implementations. Any tips?
– my trained user embedding layer does not seem well suited to k-means clustering. Its measure of inter- vs intra-cluster distance does not meaningfully decrease over training iterations and, more importantly, decreases linearly(!) as k increases. That makes it impossible to use the elbow method to determine an objectively good trade-off between k and explained variance.
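To make the first challenge concrete, this is the kind of weighting I have in mind, written in plain NumPy rather than TFRS. It is only a sketch under my own assumptions: that the retrieval loss is an in-batch softmax over dot-product scores, that `log(1 + seconds)` is a sensible weight, and that the loss should be normalised by the sum of the weights. The function name and the toy data are hypothetical.

```python
import numpy as np


def weighted_in_batch_softmax_loss(user_emb, item_emb, weights):
    """Row i of user_emb interacted with row i of item_emb.

    All other items in the batch act as negatives for user i.
    """
    scores = user_emb @ item_emb.T                        # (batch, batch) dot products
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    per_example = -np.diag(log_softmax)                   # cross-entropy of the true pair
    # Weighted mean: interactions with more time contribute more to the loss.
    return float((weights * per_example).sum() / weights.sum())


rng = np.random.default_rng(0)
user_emb = rng.normal(size=(4, 8))
item_emb = rng.normal(size=(4, 8))

seconds = np.array([5.0, 60.0, 600.0, 30.0])  # hypothetical interaction times
weights = np.log1p(seconds)                    # assumed dampening of very long sessions

loss = weighted_in_batch_softmax_loss(user_emb, item_emb, weights)
unweighted = weighted_in_batch_softmax_loss(user_emb, item_emb, np.ones(4))
print(loss, unweighted)
```

With all weights equal to 1 this reduces to the ordinary unweighted mean, which is what I believe my current model optimises.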
What would you advise to tackle both of these issues? Thanks for thinking along!
I know some of these questions are more ‘applied machine learning’ than ‘tensorflow’ per se, but I didn’t know where else to take this question, so apologies if this is in the wrong category.