1. Short Answer:
   a) What is the difference between PCA and LDA for feature dimension reduction?
   b) What is the difference between the maximum likelihood and Bayesian methods for parameter estimation?
   c) With respect to the estimation of the probability density function p(x|X), where X is the set of given training samples, under what conditions does the Bayesian estimate approximate the maximum likelihood solution?
   d) As far as convergence properties and the optimization criterion are concerned, what is the difference between perceptron learning and the MSE solution?
   e) What methods can be used with a multilayer perceptron to avoid overfitting?
   f) In RBF networks, why is a nonlinear transformation used, followed by a linear one?
   g) What unsupervised methods can be used to choose the RBF centers?
   h) What is the basic principle of structural risk minimization?
   i) What is the objective function of the linear support vector machine?
   j) What is the basic idea behind the nonlinear SVM? Does the nonlinear SVM share some similarity with the radial basis function network, and if so, what is it?
   k) In kernel density estimation, does kernel independence imply feature independence?
   l) How can the maximum likelihood method be used to select the bandwidth parameter for kernel density estimation?
   m) What is the basic assumption of the naive Bayes classifier?
   n) One shortcoming of the basic kNN method is its sensitivity to noisy axes (features). What method do you think can be used to remedy this?
   o) Does the simulated annealing method have the ability to escape local minima?
   p) As far as the estimation of the error rate is concerned, what is the difference between the random subsampling method and the bootstrap method?

2. Show that if x1 and x2 are two points in a high-dimensional space, the hyperplane bisecting the segment with endpoints x1 and x2, leaving x1 on its positive side, is given by

       g(x) = (x1 - x2)^T x - (1/2) (x1 - x2)^T (x1 + x2) = 0.
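For the bandwidth-selection question, the usual maximum likelihood approach is to maximize the leave-one-out (cross-validated) log-likelihood of the data under the kernel density estimate, since maximizing the ordinary likelihood drives the bandwidth to zero. A minimal sketch with a Gaussian kernel (the data set and bandwidth grid below are illustrative, not from the problem set):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=50)  # illustrative 1-D sample

def loo_log_likelihood(h, x):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n = len(x)
    diff = x[:, None] - x[None, :]                      # pairwise differences
    k = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)                            # leave each point out
    p = k.sum(axis=1) / (n - 1)                         # LOO density at each x_i
    return np.log(p).sum()

bandwidths = np.linspace(0.05, 2.0, 40)
scores = [loo_log_likelihood(h, data) for h in bandwidths]
best_h = bandwidths[int(np.argmax(scores))]
print(best_h)
```

A very small h overfits (each left-out point sees almost no density), while a very large h oversmooths, so the LOO likelihood is maximized at an intermediate bandwidth.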
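The bisecting-hyperplane result in Problem 2 can be checked numerically. This sketch assumes the two endpoints are denoted x1 and x2 and uses the perpendicular-bisector form g(x) = (x1 - x2)^T x - (1/2)(x1 - x2)^T (x1 + x2):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=5)   # two arbitrary points in R^5
x2 = rng.normal(size=5)

# Hyperplane bisecting the segment [x1, x2]:
#   g(x) = (x1 - x2)^T x - 0.5 * (x1 - x2)^T (x1 + x2)
w = x1 - x2
b = -0.5 * w @ (x1 + x2)

def g(x):
    return w @ x + b

midpoint = 0.5 * (x1 + x2)
print(g(midpoint))   # ~0: the midpoint lies on the hyperplane
print(g(x1) > 0)     # x1 lies on the positive side
print(g(x2) < 0)     # x2 lies on the negative side
```

Note that g(x1) = 0.5 * ||x1 - x2||^2 > 0 and g(x2) = -0.5 * ||x1 - x2||^2 < 0, which is the symmetry the proof should establish.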