Figure P12.20 depicts a neural-network-based scheme for approximating the target Q-factor, denoted by Qtarget(i, α, w), where i denotes the state of the network, α denotes the action to be taken, and w denotes the weight vector of the neural network used in the approximation. Correspondingly, Table P12.16 presents a summary of the approximate Q-learning algorithm. Explain the operation of the approximate dynamic programming scheme of Fig. P12.20 so as to justify the summary presented in Table P12.16.
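
As a point of reference for the explanation requested, a single update of approximate Q-learning might be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Table P12.16: it assumes a cost-minimizing formulation (so the target takes a minimum over actions), it replaces the neural network with a linear approximator over one-hot features, and it treats the target as fixed when forming the gradient. The names features, q_value, and approximate_q_step, the discount factor gamma, and the learning rate eta are hypothetical and chosen only for illustration.

```python
def features(state, action, n_states, n_actions):
    """One-hot feature vector for a (state, action) pair (illustrative choice)."""
    phi = [0.0] * (n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi


def q_value(w, state, action, n_states, n_actions):
    """Linear stand-in for the network output: Q(i, a, w) = w . phi(i, a)."""
    phi = features(state, action, n_states, n_actions)
    return sum(wk * pk for wk, pk in zip(w, phi))


def approximate_q_step(w, i, a, cost, j, n_states, n_actions,
                       gamma=0.9, eta=0.1):
    """One update after observing the transition i -> j under action a.

    Assumed target: cost + gamma * min_b Q(j, b, w).
    The weights move along the gradient of Q(i, a, w), scaled by the
    difference between the target and the current estimate.
    """
    q_target = cost + gamma * min(
        q_value(w, j, b, n_states, n_actions) for b in range(n_actions)
    )
    error = q_target - q_value(w, i, a, n_states, n_actions)
    # For a linear approximator, the gradient with respect to w is phi(i, a).
    grad = features(i, a, n_states, n_actions)
    return [wk + eta * error * gk for wk, gk in zip(w, grad)]


# Usage: two states, two actions, weights initialized to zero.
w = [0.0] * 4
w = approximate_q_step(w, i=0, a=1, cost=1.0, j=1, n_states=2, n_actions=2)
```

With a neural network in place of the linear map, the same structure would apply, with the gradient obtained by back-propagation rather than read off the feature vector.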