[R] Product of the probability matrix with its transpose not changing despite a loss term?
Hello,
I'll keep it short. Say we have a neural network with a layer that outputs probabilities using a softmax. This gives us a [batch size, probabilities] tensor. Let's call it P.
If I compute P_transposed x P, I get a square matrix of size [probabilities, probabilities]. My loss uses the Frobenius norm of this matrix's off-diagonal entries to enforce that it is diagonal (so the off-diagonal values go to 0). My hope is that this directly impacts the structure of the original matrix P.
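For reference, here is a minimal PyTorch sketch of the setup I mean (the shapes, names, and the random logits are just illustrative, not my actual code):

    import torch
    import torch.nn.functional as F

    # Stand-in for a network output: [batch_size, num_classes] logits
    logits = torch.randn(32, 10, requires_grad=True)
    P = F.softmax(logits, dim=1)  # rows are probability distributions

    # Gram matrix of shape [num_classes, num_classes]
    G = P.t() @ P

    # Penalize the off-diagonal entries with a squared Frobenius norm
    off_diag = G - torch.diag(torch.diag(G))
    loss_ortho = off_diag.pow(2).sum()

    # In training this would be added to the main objective, e.g.
    # total_loss = task_loss + 100.0 * loss_ortho
    loss_ortho.backward()
    print(logits.grad.abs().mean())  # gradients do flow back to the logits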
However, this is not the case: the P_transposed x P matrix does not approach a diagonal structure, nor does P get affected. This holds even if I scale the loss by 100.
I would have thought this would work; am I wrong? Would this not indirectly affect our P matrix? Thanks!
submitted by /u/Grand_Comparison2081