PyTorch's `nn.Linear` module implements what is traditionally called the fully connected (FC) layer in neural networks. Looking at the `nn.Linear` documentation, I noticed that although the weight matrix is applied as
(in_features, out_features) in the FC computation, it is stored as its transpose,
(out_features, in_features). Why is that?
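To make the question concrete, here is a small sketch (the specific dimensions are just illustrative) showing that the stored weight has shape `(out_features, in_features)` and that the forward pass effectively computes `x @ W.T + b`:

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=5)

# The weight is stored transposed: (out_features, in_features)
print(layer.weight.shape)  # torch.Size([5, 3])

x = torch.randn(2, 3)
out = layer(x)

# Forward pass is equivalent to x @ W.T + b
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual))  # True
```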
The consensus seems to be that: