The torch `nn.Linear` operator implements what is traditionally called the fully connected (FC) layer in neural networks. Looking at the `nn.Linear` documentation, I noticed that although the weight matrix is applied as `(in_features, out_features)` in the FC computation, it is stored as its transpose, `(out_features, in_features)`. Why is that?
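To make the question concrete, here is a minimal sketch (layer sizes are arbitrary) showing the stored shape and the forward computation `y = x @ W.T + b` that `nn.Linear` documents:

```python
import torch
from torch import nn

# A layer mapping 5 input features to 3 output features.
layer = nn.Linear(in_features=5, out_features=3)

# The weight is stored transposed: (out_features, in_features).
print(layer.weight.shape)  # torch.Size([3, 5])

# The forward pass computes y = x @ W.T + b,
# transposing the stored weight on the fly.
x = torch.randn(2, 5)  # batch of 2 samples
y = layer(x)
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(y, manual))  # True
```

So the transpose is purely a storage convention; the math of the FC layer is unchanged.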
The consensus seems to be that: