Date: 2020-04-03 11:02

STA302/1001 - Assignment # 4

Due Friday April 10 by 11:59PM on Crowdmark

Student 1 Name:

Student 1 Email:

Student 2 Name:

Student 2 Email:

Instructions:

Assignments must be submitted electronically through Crowdmark. Each student will receive a
personalized link to view the assignment (this is where you will submit your assignment when
finished). If you do not receive this email from Crowdmark, check your spam/junk folder.
Instructions for how to upload completed assignments can be found here:
https://crowdmark.com/help/completing-and-submitting-an-assignment/. Note that only PDF, PNG or
JPG file types are accepted by Crowdmark. You will need to upload certain questions into certain
places, so make sure you are submitting pages in the right place.

Students may work in groups of no more than 2 people, with only one assignment submitted to
Crowdmark per group. When you receive your personalized link to the assignment, you may then
enter your group member's name. A shared submission link will be sent to both group members, so
you can both submit the assignment or edit submissions. Only one assignment should be submitted
per group.

The assignment is divided into four questions, each with subparts. Each question needs to be
uploaded under the correct section in Crowdmark, otherwise it may be overlooked when graded. One
question is a calculation-type question, one will be a theoretical/derivation/proof type question,
and one will involve using R. You should make sure to show all your work with the first two
questions, while the R questions should be presented in a report-type format (i.e. include output
and graphs with written explanations of answers in the main document, with R code placed in an
appendix at the end). If you are comfortable with RMarkdown, it is recommended to complete your
assignment with it. Otherwise, any word processing document will suffice for the R question. You
may submit handwritten answers for questions 1 and 2, but they must be legible and neat.

Note that there is a 20% per day late penalty on assignments. After 48 hours of being late, the
assignment will no longer be accepted. This means that you should submit your assignment no
later than Sunday April 12 at 11:59PM to avoid receiving a grade of zero.

Question 1 (14 points) - Derivations/Proof Question

This question walks you through how to prove the equivalence of two of the formulae for
Cook's Distance using matrix notation. Suppose $X_{(i)}$ is the $(n - 1) \times (p + 1)$ design
matrix with observation $i$ removed, and $X$ is the $n \times (p + 1)$ design matrix with all
observations. Let $x_i$ be a column vector representing the predictor values for observation $i$,
and let $y_i$ be the response value (scalar) for observation $i$. Finally, $\hat{\beta}$ is the
column vector of coefficients estimated using the full data, and $\hat{\beta}_{(i)}$ is the column
vector of coefficients estimated without using observation $i$.
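For orientation, two commonly quoted forms of Cook's Distance are written out below in the notation just defined. Which pair of formulae the question intends is not stated here, so treat this as an assumption. The symbols $S^2$ (residual mean square from the full fit), $h_{ii}$ (leverage of observation $i$), $\hat{e}_i$ (raw residual) and $r_i$ (standardized residual) are introduced only for this sketch.

```latex
% Assumed notation (not defined in the question text):
%   S^2    = RSS / (n - p - 1), residual mean square of the full model
%   h_{ii} = x_i' (X'X)^{-1} x_i, leverage of observation i
%   r_i    = \hat{e}_i / ( S \sqrt{1 - h_{ii}} ), standardized residual
\[
D_i
  = \frac{\bigl(\hat{\beta}_{(i)} - \hat{\beta}\bigr)' \, X'X \,
          \bigl(\hat{\beta}_{(i)} - \hat{\beta}\bigr)}{(p + 1)\, S^2}
  = \frac{r_i^2}{p + 1} \cdot \frac{h_{ii}}{1 - h_{ii}}
\]
```

The first form measures how far the coefficient vector moves when observation $i$ is dropped; the second needs only quantities from the full fit.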

(a) (2 points) Show how we would predict the response for observation $i$ using the regression
model that has been fit without observation $i$.

(b) (3 points) Using the fact that $\hat{\beta} = (X'X)^{-1}X'Y$, and the following result,

show that post-multiplication of this result by $(X'Y - x_i y_i)$ yields

$\hat{\beta}_{(i)} = \hat{\beta} - (X'X)^{-1} x_i y_i + (X'X)^{-1} x_i x$.
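The result that part (b) refers to does not appear in this copy. One standard identity that fits the described step (post-multiplying by $X'Y - x_i y_i$ to obtain $\hat{\beta}_{(i)}$) is the Sherman-Morrison formula applied to $X'X - x_i x_i'$. This is an assumption about what was intended, not a restatement of the assignment; $h_{ii}$ is notation introduced for this sketch.

```latex
% Facts used: removing row i from X gives
%   X_{(i)}' X_{(i)} = X'X - x_i x_i'   and   X_{(i)}' Y_{(i)} = X'Y - x_i y_i .
% Assumed notation: h_{ii} = x_i' (X'X)^{-1} x_i  (leverage of observation i).
\[
\bigl(X_{(i)}' X_{(i)}\bigr)^{-1}
  = \bigl(X'X - x_i x_i'\bigr)^{-1}
  = (X'X)^{-1}
    + \frac{(X'X)^{-1} x_i x_i' (X'X)^{-1}}{1 - h_{ii}}
\]
% Since \hat{\beta}_{(i)} = (X_{(i)}' X_{(i)})^{-1} X_{(i)}' Y_{(i)}, post-multiplying
% the identity by (X'Y - x_i y_i) expresses \hat{\beta}_{(i)} in terms of full-data quantities.
```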

(d) (4 points) Using your result in (c),

Question 2 (16 points) - Hand Calculation Question

We previously considered building multiple linear regression models for gas mileage of cars based
on characteristics of each vehicle model. We can now consider a few different models and attempt
to determine which model is better.

(a) (4 points) Using the table of summary values below, and that we have taken a sample of 30
vehicles, compute the AIC for each of the three models. Based on these values, which model
would you say is better?

Model     Predictors                                                                Residual Standard Error
Model 1   All 11 predictors                                                         3.227
Model 2   Displacement, Horsepower, Torque, Number of Transmission Speeds, Weight   3.245
Model 3   Displacement, Horsepower, Weight                                          3.171
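The arithmetic behind part (a) can be sketched as follows. This is a minimal sketch, assuming the convention AIC = n log(RSS/n) + 2p and recovering RSS from the residual standard error via RSS = (n - p - 1) * RSE^2. Other references add a constant to the penalty (for example 2(p + 2)), which shifts every model by the same amount and leaves the ranking unchanged. The function and variable names are illustrative only.

```python
import math

N = 30  # sample size stated in part (a)

# (number of predictors p, residual standard error) taken from the table
MODELS = {
    "Model 1": (11, 3.227),
    "Model 2": (5, 3.245),
    "Model 3": (3, 3.171),
}


def rss_from_rse(rse, n, p):
    """Recover the residual sum of squares: RSS = (n - p - 1) * RSE^2."""
    return (n - p - 1) * rse ** 2


def aic(rse, n, p):
    """AIC under the assumed convention n * log(RSS / n) + 2p."""
    return n * math.log(rss_from_rse(rse, n, p) / n) + 2 * p


aic_values = {name: aic(rse, N, p) for name, (p, rse) in MODELS.items()}

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.2f}")

# Smaller AIC is preferred under this criterion.
print("Smallest AIC:", min(aic_values, key=aic_values.get))
```

Working from RSE rather than RSS keeps the inputs identical to the table, so any rounding in the table carries through to the AIC values.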

(b) (4 points) Using the above summary table, calculate the corrected AIC for each of the above
models. Based on this, would we prefer the same model as in part (a)?
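A small-sample correction for part (b) might be computed as below. This is a sketch under stated assumptions: AIC = n log(RSS/n) + 2p with RSS = (n - p - 1) * RSE^2, and the correction term 2(p + 2)(p + 3)/(n - p - 1). Some references use n - p - 3 in the denominator instead, so check which convention the course uses. All names are illustrative.

```python
import math

N = 30  # sample size stated in part (a)

# (number of predictors p, residual standard error) from the summary table
MODELS = {
    "Model 1": (11, 3.227),
    "Model 2": (5, 3.245),
    "Model 3": (3, 3.171),
}


def aic(rse, n, p):
    """AIC under the assumed convention n * log(RSS / n) + 2p."""
    rss = (n - p - 1) * rse ** 2
    return n * math.log(rss / n) + 2 * p


def aicc_correction(n, p):
    """Assumed small-sample penalty: 2(p + 2)(p + 3) / (n - p - 1)."""
    return 2 * (p + 2) * (p + 3) / (n - p - 1)


def aicc(rse, n, p):
    """Corrected AIC = AIC + small-sample penalty."""
    return aic(rse, n, p) + aicc_correction(n, p)


aicc_values = {name: aicc(rse, N, p) for name, (p, rse) in MODELS.items()}

for name, value in aicc_values.items():
    print(f"{name}: AICc = {value:.2f}")
```

The correction grows quickly with p when n is small, which is why the 11-predictor model is penalized far more heavily here than under plain AIC.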

(c) (4 points) Now, knowing that the sample variance of gas mileage is 39.28 MPG², find the
adjusted coefficient of determination for each of the models in (a). Based on this measure,
which model is preferred?
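One way to organize the calculation in part (c) is sketched below. It relies on the identity adjusted R² = 1 - [RSS/(n - p - 1)] / [SST/(n - 1)], where the numerator is RSE² and the denominator is the sample variance of the response, so n and p cancel out of the formula once those two summaries are known. The names are illustrative.

```python
VAR_MPG = 39.28  # sample variance of gas mileage given in part (c)

# residual standard error of each model from the table in part (a)
RSE = {
    "Model 1": 3.227,
    "Model 2": 3.245,
    "Model 3": 3.171,
}


def adjusted_r_squared(rse, var_y):
    """Adjusted R^2 = 1 - RSE^2 / s_y^2, with s_y^2 the sample variance of y."""
    return 1 - rse ** 2 / var_y


adj_r2 = {name: adjusted_r_squared(rse, VAR_MPG) for name, rse in RSE.items()}

for name, value in adj_r2.items():
    print(f"{name}: adjusted R^2 = {value:.4f}")

# Larger adjusted R^2 is preferred under this measure.
print("Largest adjusted R^2:", max(adj_r2, key=adj_r2.get))
```

Because the variance of the response is the same for all three models, ranking by adjusted R² is equivalent to ranking by residual standard error.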

(d) (4 points) Suppose we consider the smallest model (model 3 from part (a)). We can fit a
model with each predictor in turn as the response and the remaining predictors as predictors.
Below is a summary of each of these models.

Response       Predictors                 Residual SE   Sample Variance of Response
Displacement   Horsepower, Weight         27.21         13511.05
Horsepower     Displacement, Weight       15.64         1993.689
Weight         Displacement, Horsepower   299.1         885420.2

Find the Variance Inflation Factor of each predictor. Should we be concerned about
multicollinearity in model 3 from (a)?
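The VIF calculation in part (d) can be sketched from the summary table alone. This is a minimal sketch, assuming the auxiliary regressions were fit on the same 30 vehicles: for each predictor, R_j² = 1 - RSS_j/SST_j with RSS_j = (n - k - 1) * RSE_j² and SST_j = (n - 1) * s_j², and then VIF_j = 1/(1 - R_j²). The cutoff of 5 used in the final line is a common rule of thumb, not something stated in the assignment.

```python
N = 30  # assumed: same sample of 30 vehicles as in part (a)
K = 2   # each auxiliary regression uses the two remaining predictors

# (residual SE, sample variance of the response) from the table in part (d)
AUXILIARY = {
    "Displacement": (27.21, 13511.05),
    "Horsepower": (15.64, 1993.689),
    "Weight": (299.1, 885420.2),
}


def r_squared(rse, var_y, n, k):
    """R^2 of an auxiliary regression: 1 - RSS / SST."""
    rss = (n - k - 1) * rse ** 2
    sst = (n - 1) * var_y
    return 1 - rss / sst


def vif(rse, var_y, n, k):
    """Variance inflation factor: 1 / (1 - R_j^2)."""
    return 1 / (1 - r_squared(rse, var_y, n, k))


vif_values = {name: vif(rse, var_y, N, K) for name, (rse, var_y) in AUXILIARY.items()}

for name, value in vif_values.items():
    print(f"{name}: VIF = {value:.2f}")

# Rule-of-thumb check (assumed cutoff of 5).
print("Predictors above cutoff:", [name for name, v in vif_values.items() if v > 5])
```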

Question 3 (14 points) - R Data Analysis Question

Use the dataset found on Quercus under Assignment 4 materials to answer the questions below.
These data contain measurements on 50 men: their percent body fat, their height, their waist
size, and their chest size.

(a) (1 point) Fit a multiple linear regression model to predict Percent Body Fat (Pct.BF) from
waist size, height and chest size.

(b) (3 points) Determine whether there are any:
    (i) leverage points
    (ii) outlier observations
    (iii) influential observations. If there are, in what way do they influence the regression surface?
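The three checks in part (b) are usually made against rule-of-thumb cutoffs. The sketch below assumes leverage above 2(p + 1)/n, absolute standardized residual above 2, and Cook's Distance above 4/(n - 2). These cutoffs are conventions that vary between references (4/(n - p - 1) is another common choice for Cook's Distance), so confirm which ones the course uses. The function names are hypothetical, and the leverages, residuals and distances themselves would come from the fitted model in R.

```python
N = 50  # number of men in the dataset
P = 3   # predictors: waist size, height, chest size

OUTLIER_CUTOFF = 2.0  # assumed cutoff on |standardized residual|


def leverage_cutoff(n, p):
    """Assumed rule of thumb: flag leverage values above 2(p + 1)/n."""
    return 2 * (p + 1) / n


def cooks_cutoff(n):
    """Assumed rule of thumb: flag Cook's Distance values above 4/(n - 2)."""
    return 4 / (n - 2)


def flag_observation(leverage, std_residual, cooks_d, n=N, p=P):
    """Return the labels that apply to one observation under the assumed cutoffs."""
    flags = []
    if leverage > leverage_cutoff(n, p):
        flags.append("leverage")
    if abs(std_residual) > OUTLIER_CUTOFF:
        flags.append("outlier")
    if cooks_d > cooks_cutoff(n):
        flags.append("influential")
    return flags


print(f"Leverage cutoff: {leverage_cutoff(N, P):.3f}")
print(f"Cook's Distance cutoff: {cooks_cutoff(N):.4f}")
```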

(c) (3 points) Can we use our residual plots to make conclusions regarding which assumptions
are violated? Support your answer using appropriate plots.

(d) (3 points) Determine whether the model from (a) is a valid model using appropriate plots.

(e) (2 points) Determine whether there is multicollinearity among the predictor variables.

(f) (2 points) Create an indicator variable for each of the observations 19 and 38. Add these
indicator variables to the model in (a) as main effect terms. Show that we have now removed
the influence of these observations from the model by showing that they no longer appear as
influential observations.
