I am trying to understand what a tensor is and how its properties are useful in machine learning.
I’m looking for feedback on whether I’m on the right track in this journey.
I want to answer why and how a tensor works for classifying, really, anything.
A tensor classifies by its very nature by defining a space. The more dimensions you add to that space the more complex a space you can describe.
Is the act of applying transform rules to a tensor what allows it to describe all the other variations of the concept the tensor is trying to describe? Or is it just transforming the tensor into one of the mirror representations (more on this later)?
A tensor is like a feature. A hotdog is something you can classify using a tensor. The “crispness” of that classification increases as you increase the rank: the more ranks you add to the tensor, the better you can represent what a hotdog is.
With too few ranks, a nose feature is easily confused with a carrot. Maybe a feature described by a tensor of lower rank can never gain the resolution required to distinguish between a hotdog and a carrot at all.
Is there such a thing as too many ranks? Or does it just become harder and harder to train? Do more ranks increase the possibility of overfitting? I don’t know – but I’d love someone to reason through it.
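On the wording of "rank" and "dimension" above, here is a small sketch of how the terms are commonly used in array libraries. It assumes NumPy, where `ndim` is the number of axes (often called rank or order) and `shape` is the size along each axis. The feature values and names are made up for illustration.

```python
import numpy as np

# Hypothetical feature values, made up for illustration.
few_features = np.array([0.9, 0.2])              # e.g. length, width
more_features = np.array([0.9, 0.2, 0.7, 1.0])   # adds colour, edible
image = np.zeros((32, 32, 3))                    # height x width x colour channels

# ndim is the number of axes; shape is the size along each axis.
for name, t in [("few_features", few_features),
                ("more_features", more_features),
                ("image", image)]:
    print(name, "ndim =", t.ndim, "shape =", t.shape)
```

In this usage, adding more features to a vector makes one axis longer but leaves the number of axes at one, so "more ranks" in the text above may be closer to "more features" or "more parameters".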
The permutations of values across the dimensions the object represents must have an unimaginable number of mirror representations that would also represent a hotdog. That’s why trained models with different values can give the same outcome. Could this be what a transform is doing?
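One concrete kind of "mirror representation" is reordering the hidden units of a small network. The sketch below assumes NumPy and a hypothetical two-layer network with random weights (nothing here is trained). It shuffles the hidden units and checks that the outputs are unchanged even though the weight values sit in different places.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Tiny two-layer network: ReLU hidden layer, linear output."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2

# Hypothetical random weights: 4 inputs, 5 hidden units, 1 output.
W1 = rng.normal(size=(4, 5))
b1 = rng.normal(size=5)
W2 = rng.normal(size=(5, 1))
b2 = rng.normal(size=1)

# Reorder the hidden units: shuffle the columns of W1 and b1,
# and the rows of W2 in the same way.
perm = rng.permutation(5)
W1_m, b1_m, W2_m = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(3, 4))  # three made-up inputs
original = forward(x, W1, b1, W2, b2)
mirrored = forward(x, W1_m, b1_m, W2_m, b2)

# Same outputs from differently arranged weights.
print(np.allclose(original, mirrored))
```

With 5 hidden units there are already 5! = 120 such orderings, and the count grows factorially with layer width. This only shows one family of equivalent weight settings; it does not say whether that is what a tensor "transform" means.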
There are even more slightly skewed representations of a hotdog that exist as the values at each dimension are wiggled. But those skews only cause confusion in, for example, purely visual data; adding a rank of “what is it used for” to the data makes those visual confusions impossible. You would never confuse a hotdog with a flashlight if the value of edible were added to the dataset being trained.
One or more of the tensor’s dimensions would be 99.99999% successful, because they would all converge on using that as the best datapoint.
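A rough way to see the effect of the "edible" value is to compare how far apart two objects are with and without it. This is a sketch assuming NumPy, with made-up feature vectors and Euclidean distance as a stand-in for "how easy they are to tell apart".

```python
import numpy as np

# Made-up feature vectors: [length, width, brightness].
hotdog_visual = np.array([0.8, 0.2, 0.5])
flashlight_visual = np.array([0.8, 0.2, 0.6])

# Same objects with one extra feature appended: edible (1) or not (0).
hotdog_full = np.append(hotdog_visual, 1.0)
flashlight_full = np.append(flashlight_visual, 0.0)

visual_gap = np.linalg.norm(hotdog_visual - flashlight_visual)
full_gap = np.linalg.norm(hotdog_full - flashlight_full)

print("distance on visual features only:", visual_gap)
print("distance with the edible feature:", full_gap)
```

On the visual features alone the two objects are about 0.1 apart; with the extra feature they are about 1.0 apart, so a single threshold on that one value separates them.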
But visual data doesn’t have such obvious binary data – I mean it’s binary, but visual data can’t take the distinct property of:
“1” – edible and “0” – inedible.
Instead, the binary nature of the decision exists in a more complex representation between the dimensional values (lol, actually also all the mirrors) – e.g. you can represent it as:
“0” – edible and “1” – inedible.
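The two encodings above can be checked on a small logistic classifier. This is a sketch assuming NumPy, with hypothetical weights and inputs: negating the weights and bias turns a model that outputs "1 means edible" into one that outputs "1 means inedible", and nothing else changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and inputs for a one-layer logistic classifier.
w = np.array([1.5, -2.0, 0.5])
b = 0.25
x = np.array([[0.8, 0.2, 0.5],
              [0.1, 0.9, 0.3]])

p_edible_is_one = sigmoid(x @ w + b)       # "1" means edible
p_edible_is_zero = sigmoid(-(x @ w + b))   # negated weights: "1" means inedible

# The two encodings carry the same information: the outputs sum to 1.
print(np.allclose(p_edible_is_one + p_edible_is_zero, 1.0))
```

This works because sigmoid(-z) = 1 - sigmoid(z), so which label is called "1" is a convention, not something the data decides.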
Training on data is the process of bumping the values at random until they fall into one of these permutations that’s able to answer the question with the desired classification.
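Taken literally, "bump at random and keep what helps" is a random hill climb. Here is a minimal sketch of that idea, assuming NumPy, made-up 1-D data, and a two-parameter logistic model. Note the hedge: most real training uses gradient descent, which picks the direction of each bump from the gradient of the loss rather than at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1-D data: label is 1 when x > 0.5.
x = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])

def loss(params):
    """Mean squared error of a logistic model with weight w and bias b."""
    w, b = params
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return np.mean((p - y) ** 2)

params = np.zeros(2)
initial_loss = loss(params)
best_loss = initial_loss

for _ in range(2000):
    candidate = params + rng.normal(scale=0.5, size=2)  # random bump
    candidate_loss = loss(candidate)
    if candidate_loss < best_loss:                      # keep only improvements
        params, best_loss = candidate, candidate_loss

print("loss before:", initial_loss, "loss after:", best_loss)
```

Because a bump is only kept when it lowers the loss, the loss can never go up. Different random seeds would generally end at different parameter values that classify the same way, which ties back to the idea of many equivalent solutions.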
Overtraining is when you bump it until it only recognizes the data itself as the “space”: the data becomes the defining binary decision it encodes for, not what the data is trying to embody.
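An extreme version of this is a model that is nothing but a lookup table of its training data. The sketch below uses plain Python and made-up 1-D data; `memorizer` and `rule` are hypothetical names for illustration, not a standard API.

```python
# Made-up 1-D data: the underlying concept is "label is 1 when x > 0.5".
train = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
unseen = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]

# "Overtrained" model: a lookup table of the exact training points.
memory = dict(train)

def memorizer(x):
    return memory.get(x, 0)  # anything not seen before falls back to 0

def rule(x):
    return 1 if x > 0.5 else 0  # encodes the concept, not the points

def accuracy(model, data):
    return sum(model(x) == label for x, label in data) / len(data)

print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, unseen))
print("rule:     ", accuracy(rule, train), accuracy(rule, unseen))
```

The memorizer scores 1.0 on the training points and 0.5 on the unseen ones, while the rule scores 1.0 on both. That gap between training and unseen accuracy is the usual symptom of overfitting.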