Date: 2020-05-10 10:07

Stat 462 - Individual Project 2

Due May 10, 2020

Regression Analysis

This project is to be completed individually. Submit a PDF only (the Rmd file is not needed, and you may use

another word-processing tool if you like).

We will use the Ames Housing dataset, which has 82 variables and 2930 observations (AmesHousing.txt).

The 82 features include 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables. You may use only the 20

continuous variables in this project.

Your goal is to predict the sale price. Exclude the Order, PID, and of course SalePrice variables

from your predictors. You may want to combine variables (e.g. summing square feet) and perform various

manipulations (e.g. transformations for nonlinearity) we have learned about.

You should try at least multiple linear regression (with and without transformations) and weighted least

squares (or generalized least squares), but you are welcome to explore more! You need to preprocess the

dataset. After preprocessing, use the following code to define your test set, with its complement as the

training set.

set.seed(2020)

testindices = sample(2930, round(2930/4)) ## train indices are the rest
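
The split above can be completed and used roughly as follows. This is only a sketch: the data frame `ames` is simulated, its columns `area` and `SalePrice` are stand-ins for the real preprocessed data, and the weighting scheme (regressing the log squared residuals on the fitted values) is one common heuristic rather than a prescribed method.

```r
# Simulated stand-in for the preprocessed data (hypothetical columns);
# replace with the real data frame built from AmesHousing.txt.
set.seed(1)
n <- 2930
ames <- data.frame(area = runif(n, 500, 4000))
ames$SalePrice <- exp(10.5 + 0.0005 * ames$area + rnorm(n, sd = 0.2))

# Split prescribed by the assignment
set.seed(2020)
testindices <- sample(2930, round(2930/4))
trainindices <- setdiff(seq_len(2930), testindices)  # the rest
train <- ames[trainindices, ]
test  <- ames[testindices, ]

# Multiple linear regression without and with a log transformation
fit_raw <- lm(SalePrice ~ area, data = train)
fit_log <- lm(log(SalePrice) ~ area, data = train)

# Weighted least squares: estimate the variance function from the
# OLS residuals, then weight each observation by 1 / estimated variance.
aux <- lm(log(resid(fit_raw)^2) ~ fitted(fit_raw))
w <- 1 / exp(fitted(aux))
fit_wls <- lm(SalePrice ~ area, data = train, weights = w)
```

Comparing `coef(fit_raw)` with `coef(fit_wls)` shows how much the weighting changes the estimates.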

Write up your results in a professional report, like you would present to a client or internal customer for

your analysis. The report should be no more than 4 double-spaced pages long and submitted in

PDF format. You should put important tables/figures in the report and put additional tables/figures in

the appendix.

It should include an appropriate analysis of the performance of the models you consider, and the reasons

for your final choice of model(s). Include any other details from your analysis that you feel are worthy of

mention.

The report should have four sections (Introduction, Analysis, Results, Conclusion) and provide sufficient

details that anyone with a reasonable statistics background could understand exactly what you have done and

what you concluded. In the introduction part, you should present some background and motivation to analyze

the housing data. In the analysis part, you should outline the analysis and some necessary methodological

details. In the results part, you should use tables/figures to summarize your results and explain your findings.

In the conclusion part, you should connect your analysis and results to your motivation and discuss some

possible future work. Do not embed R code in the body of your report (if you are using rmarkdown, use

{r echo=FALSE} to suppress the printing of the R code), but instead attach the code in an appendix. The

appendix does not count towards the page limit.

Grading criteria (out of 15)

10 points: fulfilling the project requirements. You may want to remove variables or observations with missing

values and combine variables (e.g. summing square feet).

5 points: the quality of your report (including: clarity of writing, organization, and layout; appropriate use of

tables and figures; careful proof-reading; adherence to report guidelines).

Project Requirements

Requirement 1:

Preprocess the dataset (e.g., checking missing values, combining highly correlated features). Perform some

exploratory data analysis, such as summary statistics, boxplots, and correlation plots.
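
As an illustration of this preprocessing step, the sketch below runs on a small toy data frame. The column names (`first_flr_sf`, `second_flr_sf`, `lot_area`, `garage_area`) are hypothetical and are not the names used in AmesHousing.txt. Dropping incomplete rows is only one of several reasonable ways to handle missing values.

```r
# Toy data with hypothetical column names (not the real Ames columns)
set.seed(1)
n <- 200
ames <- data.frame(
  first_flr_sf  = round(runif(n, 400, 2000)),
  second_flr_sf = round(runif(n, 0, 1200)),
  lot_area      = round(runif(n, 2000, 20000))
)
ames$garage_area <- round(0.3 * ames$first_flr_sf + rnorm(n, sd = 40))
ames$lot_area[sample(n, 5)] <- NA   # inject a few missing values

# Count missing values per column
na_counts <- colSums(is.na(ames))

# Drop rows with any missing value (one simple option)
ames_cc <- na.omit(ames)

# Combine related variables, e.g. summing square feet
ames_cc$total_sf <- ames_cc$first_flr_sf + ames_cc$second_flr_sf

# Summary statistics and pairwise correlations
summary(ames_cc)
cor_mat <- cor(ames_cc)

# Boxplot statistics; set plot = TRUE to draw the figure
bp <- boxplot(ames_cc$total_sf, plot = FALSE)
```

Large off-diagonal entries in `cor_mat` point to pairs of features that are candidates for combining or dropping.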

Requirement 2:

Fit the regression model on the training dataset. Perform the appropriate diagnostics for your regression

analysis (checking the regression assumptions, influential observations, outliers, collinearity). You may need to

remove some collinear variables and/or outliers.
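
The diagnostics mentioned here can be sketched with base R alone. The data below are simulated, with a nearly collinear predictor and one outlier planted on purpose. The cutoffs (|standardized residual| > 3, Cook's distance > 4/n) are common rules of thumb and are assumptions, not requirements of this project.

```r
# Simulated data with a planted collinearity and a planted outlier
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
y[1] <- y[1] + 15               # planted outlier
dat <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Outliers: standardized residuals; influence: Cook's distance
r_std <- rstandard(fit)
cook  <- cooks.distance(fit)
flag_outlier     <- which(abs(r_std) > 3)
flag_influential <- which(cook > 4 / n)

# Variance inflation factors: the diagonal of the inverse of the
# correlation matrix of the predictors
X   <- dat[, c("x1", "x2", "x3")]
vif <- diag(solve(cor(X)))

# plot(fit) would draw the standard residual diagnostic plots
```

Here `vif` is large for `x1` and `x2` and close to 1 for `x3`, which suggests dropping one of the two collinear predictors.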

Requirement 3:

After removing the collinear variables and/or outliers, you may fit the following regression models: (i) the

full model, (ii) the submodel chosen by AIC, (iii) the submodel chosen by BIC, (iv) the submodel chosen

by Lasso, and (v) the submodel chosen by the Elastic Net. Summarize the fit of these models and compare

their coefficient estimates.
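
One possible way to fit the five models is sketched below on simulated data. It assumes stepwise search with `step()` for AIC and BIC, and the `glmnet` package (if installed) for the Lasso and the Elastic Net. The mixing value `alpha = 0.5` and the use of `lambda.min` are arbitrary illustrative choices.

```r
# Simulated data: only x1 and x2 truly matter
set.seed(1)
n <- 300
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- 3 + 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)
dat <- data.frame(y = y, X)

# (i) full model
full <- lm(y ~ ., data = dat)

# (ii) AIC: step() uses penalty k = 2 by default
fit_aic <- step(full, direction = "both", trace = 0)

# (iii) BIC: same search with penalty k = log(n)
fit_bic <- step(full, direction = "both", k = log(n), trace = 0)

# (iv) Lasso and (v) Elastic Net, only if glmnet is available
if (requireNamespace("glmnet", quietly = TRUE)) {
  cv_lasso <- glmnet::cv.glmnet(X, y, alpha = 1)
  cv_enet  <- glmnet::cv.glmnet(X, y, alpha = 0.5)
  coef_lasso <- as.matrix(coef(cv_lasso, s = "lambda.min"))
  coef_enet  <- as.matrix(coef(cv_enet,  s = "lambda.min"))
}

# Coefficient estimates to compare across models
coef(full)
coef(fit_aic)
coef(fit_bic)
```

Predictors missing from `coef(fit_aic)` or `coef(fit_bic)`, or with a zero entry in `coef_lasso` or `coef_enet`, were dropped by that selection method.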

Requirement 4:

For each model, calculate the “mean prediction error” in the test dataset.
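
The assignment does not define “mean prediction error” precisely. The sketch below assumes it means the mean squared prediction error on the test set, and also reports the mean absolute error as an alternative reading. The data and the split are simulated for illustration only.

```r
# Simulated data and an illustrative train/test split
set.seed(1)
n <- 400
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
test_idx <- sample(n, round(n / 4))
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Fit on the training set, predict on the test set
fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)

# Mean squared prediction error (assumed interpretation)
mspe <- mean((test$y - pred)^2)

# Mean absolute prediction error (alternative interpretation)
mae <- mean(abs(test$y - pred))
```

If a model is fit to a transformed response such as log(SalePrice), its predictions should be put back on the original scale (for example with `exp()`) before computing the error, so that all models are compared on the same scale.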
