Problem
In some cases, it is possible to decompose a nonlinear dependency Y = f(X) into finer-grained dependencies. For example, we may be able to decompose the nonlinear function f as f(X) = g(g1(X1), g2(X2)), where X1, X2 ⊂ X are smaller subsets of variables. Show how this decomposition can be used to linearize the function f in several steps rather than in a single step. What are the trade-offs of this approach versus linearizing f directly?
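As a starting point, the following minimal sketch illustrates the stepwise idea for the simplest case, a first-order Taylor expansion around a point mu. It is not a solution to the exercise: the functions g1, g2, and g, the point mu, and the helper `jacobian` are all hypothetical choices made for illustration. Each inner function is linearized over its own small subset of variables, g is then linearized at the point (g1(mu1), g2(mu2)), and the pieces are composed. For a Taylor expansion, the chain rule implies that the composed slope agrees with the slope from linearizing f directly; with other linearization schemes (for example, ones based on numerical integration) the two routes can differ, which is where the trade-offs come in.

```python
import numpy as np

def jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian of fn at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(fn(x))
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (np.atleast_1d(fn(x + step)) - np.atleast_1d(fn(x - step))) / (2 * eps)
    return J

# Hypothetical component functions, chosen only for illustration.
def g1(x1):            # depends on X1 = (x[0], x[1])
    return np.array([np.sin(x1[0]) * x1[1]])

def g2(x2):            # depends on X2 = (x[2],)
    return np.array([np.exp(0.5 * x2[0])])

def g(z):              # combines the two intermediate values
    return np.array([z[0] * z[1] + z[0] ** 2])

def f(x):              # f(X) = g(g1(X1), g2(X2))
    return g(np.concatenate([g1(x[:2]), g2(x[2:])]))

mu = np.array([0.3, 1.2, -0.4])   # assumed linearization point

# Step 1: linearize each inner function over its own variables.
J1 = jacobian(g1, mu[:2])         # shape (1, 2)
J2 = jacobian(g2, mu[2:])         # shape (1, 1)

# Step 2: linearize g at the intermediate point (g1(mu1), g2(mu2)).
z0 = np.concatenate([g1(mu[:2]), g2(mu[2:])])
Jg = jacobian(g, z0)              # shape (1, 2)

# Step 3: compose. The inner Jacobian is block-diagonal because
# X1 and X2 are disjoint in this example.
J_inner = np.zeros((2, 3))
J_inner[0, :2] = J1
J_inner[1, 2:] = J2
J_stepwise = Jg @ J_inner         # shape (1, 3)

# Direct linearization of f in a single step, for comparison.
J_direct = jacobian(f, mu)

print("stepwise:", J_stepwise)
print("direct:  ", J_direct)
```

Note that each stepwise linearization here involves a function of at most two variables, whereas the direct route works with all three at once; how that difference in dimensionality affects cost and accuracy under a given linearization method is part of what the exercise asks you to analyze.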