1: One question will be on the following section:
  In all cases, though, the larger n is, the smaller the SE.
  – Estimates get “more precise (less variable) as the sample size increases”

Lecture 2.2: One question will be on how to compute the slope for a certain model.

Lecture 2.3: Not going to be tested.

Lecture 2.4: Very important. Read all of it. Conceptual understanding of the universal function, model error, irreducible error, etc.

Lecture 3:
I will test you on an example similar to the one you see in Figure 2.
Validation error versus test error: what’s the difference?
One question will be on the following section:
  This is used a lot, but there are obvious problems with it:
  – Smaller training set means higher variance in fitted models, especially more complex ones.
  – Maintaining large training set means potentially small validation and/or test sets
    * Remember that MSPE is a statistic and has its own variability and standard error
    * If computed on too few observations, MSPE is too variable to be useful
  • Typical recommendation: 70%/15%/15%
  – But it depends on complexity of models and sample size
  – If models are all relatively simple and sample is large, can make training set smaller fraction (e.g., 40/30/30)
  – If models are complex and/or sample size is small, need to keep training set pretty large.
  • Good idea to repeat splitting/MSPE calculation
  – Average results from each split
  – Reduces variability of MSPE calculation
  – Allows maintaining larger training sets.
One question will be on how to use the R code to do the cross validation for a certain model.
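The splitting advice above (repeat the split, average the MSPE) and the cross-validation item can be sketched in a few lines. This is only an illustration of the logic, written in Python with the standard library rather than the course's R code. The simulated data, the simple linear model, and the function names (`fit_slope_intercept`, `repeated_split_mspe`, `kfold_mspe`) are all assumptions made for the example and are not taken from the lectures.

```python
import random
import statistics


def fit_slope_intercept(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx  # slope = Sxy / Sxx
    return y_bar - b1 * x_bar, b1


def mspe(b0, b1, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    return statistics.fmean((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def repeated_split_mspe(xs, ys, train_frac=0.7, n_splits=20, seed=1):
    """Repeat a random train/validation split; return one MSPE per split."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(round(train_frac * n))
    results = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        b0, b1 = fit_slope_intercept([xs[i] for i in train],
                                     [ys[i] for i in train])
        results.append(mspe(b0, b1,
                            [xs[i] for i in valid],
                            [ys[i] for i in valid]))
    return results


def kfold_mspe(xs, ys, k=10, seed=1):
    """K-fold cross-validation: every observation is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k roughly equal folds
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_slope_intercept([xs[i] for i in train],
                                     [ys[i] for i in train])
        sq_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return statistics.fmean(sq_errors)


# Simulated data (an assumption for illustration): y = 2 + 3x + N(0, 1) noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]

split_mspes = repeated_split_mspe(xs, ys)
cv_mspe = kfold_mspe(xs, ys)
print("single-split MSPEs range:", min(split_mspes), "to", max(split_mspes))
print("average over splits:", statistics.fmean(split_mspes))
print("10-fold CV MSPE:", cv_mspe)
```

The spread of the single-split values shows why MSPE from one small validation set is too variable to trust, and the average over splits (or the K-fold value) is the steadier number to compare models with.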
For bootstrap, one question will be on the following section:
  Observations may be drawn 0, 1, 2, 3,… times, with pure chance determining how many of each observation are in resample
  – On average, about 2/3 of observations get included at least once
  – Remaining observations (random, but about 1/3 on average) are the validation/test sample
  – These are sometimes called “out-of-bag” data
R code to plot the boxplot of MSPE/relative MSPE.

Lecture 4: Not going to be tested.

Lecture 5: Not going to test you on the detailed steps of stepwise/all subset/forward/backward. One question will be on information criterion.

Lecture 6:
One question will be on the following section.
  Some of the worst predictions from stepwise in Figure 3 are from cases where it actually found the right model!
  – When X1 was included in the model, its parameter was usually overestimated
  – Generally, variables are more likely to be included when delta errors in data help to make them look important
    * When slope magnitude appears to be too close to 0, it gets set to 0
  – LS follows by chasing the same errors, overestimating slope magnitude.
Focus on 3.1 and 3.2.
Difference between ridge and lasso.
Relationship between lasso and linear regression least squares estimates.
One question will be on the following section:
  Variability is a serious issue when it comes to doing prediction
  • Often, using biased regression parameter estimates (“shrinkage”) can yield predictions that have smaller average error than using unbiased estimates
  • Variable selection does the same thing in a different way, reducing variance by eliminating needless parameters from a model
  • Combining these two things, which LASSO does, can be a winning strategy for prediction in linear models.
  • BUT: NONE OF THIS HELPS US TO OVERCOME BIAS IF THE TRUE STRUCTURE IS NOT LINEAR!

Lecture 7:
Math relationship between the principal components, their variance, etc.
Focus on 2.2.
I may show you a plot and ask you which M to choose.
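As a reminder of the relationship between the principal components and their variances, the standard textbook identities are collected below. The notation is an assumption (X is the n × p data matrix with centered columns, S its sample covariance matrix, v_j and λ_j the eigenvectors and eigenvalues of S); section 2.2 of the lecture may use different symbols.

```latex
\begin{align*}
S &= \frac{1}{n-1} X^\top X, \qquad S v_j = \lambda_j v_j, \qquad
     \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
Z_j &= X v_j, \qquad \widehat{\mathrm{Var}}(Z_j) = \lambda_j, \qquad
     \widehat{\mathrm{Cov}}(Z_j, Z_k) = 0 \quad (j \ne k) \\
\sum_{j=1}^{p} \lambda_j &= \operatorname{tr}(S)
     = \sum_{j=1}^{p} \widehat{\mathrm{Var}}(X_j) \\
\mathrm{PVE}(M) &= \frac{\sum_{j=1}^{M} \lambda_j}{\sum_{j=1}^{p} \lambda_j}
\end{align*}
```

On a scree or cumulative-variance plot, a common reading is to take the smallest M beyond which extra components add little to PVE(M). For PCR, M is often chosen by cross-validated MSPE instead. The criterion used in the lecture takes precedence if it differs.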
Not going to test you on partial least squares.
May test you on the R code for doing PCA and PCR.

Lecture 12: One question will be on 2.3.

Lecture 13: Two questions will be on 3.2 and 4.1.

Lecture 14:
One question will be on 3.1.
One question will be on the following section.
  Bagging is most effective when
  – It is applied to machines that have low bias.
  – It is applied to machines that are easy to fit (fast and with minimal tuning), since the process needs to be automated and replicated many times.

Lecture 15:
Difference between random forest and bagging regression tree.
May test you on the R code for random forest.

Lecture 16:
Focus on 2.2.
A few questions on the effect on the model when you change number of trees, tree size, learning rate.
May test you on the R code for boosting.
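To build intuition for how the number of trees and the learning rate affect a boosted model, here is a minimal sketch of boosting for squared-error loss. It is not the course's R code. It is plain Python, the tree size is fixed at a single split (a stump), the data are simulated, and the function names (`fit_stump`, `boost`) are hypothetical.

```python
import math
import random
import statistics


def fit_stump(xs, residuals):
    """Best one-split regression tree on 1-D x, chosen by least squares.

    Returns (threshold, left_mean, right_mean).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for pos in range(1, len(order)):
        left = [residuals[i] for i in order[:pos]]
        right = [residuals[i] for i in order[pos:]]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (xs[order[pos - 1]] + xs[order[pos]]) / 2
            best = (sse, threshold, lm, rm)
    return best[1:]


def mse(ys, preds):
    return statistics.fmean((y - p) ** 2 for y, p in zip(ys, preds))


def boost(x_train, y_train, x_test, y_test, n_trees, learning_rate):
    """Fit stumps to residuals one at a time.

    Returns (train_path, test_path): the MSE after each added tree.
    """
    base = statistics.fmean(y_train)  # start from the mean response
    p_train = [base] * len(y_train)
    p_test = [base] * len(y_test)
    train_path, test_path = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(y_train, p_train)]
        thr, lm, rm = fit_stump(x_train, residuals)
        # Add only a fraction (the learning rate) of the new tree's prediction.
        p_train = [p + learning_rate * (lm if x <= thr else rm)
                   for x, p in zip(x_train, p_train)]
        p_test = [p + learning_rate * (lm if x <= thr else rm)
                  for x, p in zip(x_test, p_test)]
        train_path.append(mse(y_train, p_train))
        test_path.append(mse(y_test, p_test))
    return train_path, test_path


# Simulated data (an assumption for illustration): y = sin(x) + N(0, 0.3^2).
rng = random.Random(7)


def simulate(n):
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


x_tr, y_tr = simulate(60)
x_te, y_te = simulate(60)

paths = {}
for lr in (0.1, 1.0):
    paths[lr] = boost(x_tr, y_tr, x_te, y_te, n_trees=100, learning_rate=lr)
    train_path, test_path = paths[lr]
    for m in (5, 25, 100):
        print(f"lr={lr} trees={m} "
              f"train MSE={train_path[m - 1]:.4f} test MSE={test_path[m - 1]:.4f}")
```

Training error can only go down as trees are added, so the comparison that matters is the test-error column: it shows whether more trees or a larger learning rate still helps, or whether the model has started to fit noise.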