As mentioned in class, a perceptron without a step or sigmoid threshold is simply a linear function, so learning it is equivalent to linear regression. Is this still true for multi-layer networks built out of the same kind of non-thresholding perceptrons? Explain specifically why or why not.
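As a hint for exploring this question numerically, the sketch below (an illustration, not part of the original question; it assumes NumPy and bias-free layers for simplicity) compares the output of two stacked linear layers against a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" of non-thresholding perceptrons: each is just a linear map.
# The layer sizes (3 -> 4 -> 2) are arbitrary choices for illustration.
W1 = rng.standard_normal((4, 3))   # first-layer weights
W2 = rng.standard_normal((2, 4))   # second-layer weights

x = rng.standard_normal(3)         # an arbitrary input vector

# Forward pass through the two-layer network (no activation functions).
h = W1 @ x
y = W2 @ h

# Compare against a single linear layer with weights W = W2 @ W1.
W = W2 @ W1
print("two linear layers match one linear layer:", np.allclose(y, W @ x))
```

Running experiments like this for different shapes and random seeds is one way to form (or check) a conjecture before writing up the explanation.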