Date: 2019-05-19 09:33

Applied Statistics

Assignment, Semester 1. Due: Thursday 23 May 2019, 5:00 pm

You are expected to write your assignment using R Markdown (see Lecture 6) or MS Word and submit a PDF.

You are required to write your name, student ID and Unit Code on the first page. You need to submit your assignment via the provided submission link on iLearn.

You may discuss the assignment with your fellow students in the early stages. However, the assignment submitted should be your own individual work.

The R Markdown ‘Cheatsheet’ from the RStudio team is given here.

In your answers to the questions below, produce the appropriate R output and/or an explanation of the steps and results. Don't include any more R output than necessary, and keep explanations concise.

Question 1 [28 marks]

Since 1979, satellites have regularly measured the extent of sea ice in the Arctic Ocean. Rapid melting of Arctic sea ice is seen both as a symptom and as a cause of a changing climate. The average September sea ice extent (in 1,000,000 km²) is recorded for each year, and the data are available in the file seaice.dat on iLearn. The variables are defined below.

Extent: sea ice extent (in 1,000,000 km²)
Year: calendar year

Using the data for the years 1979 to 2002 only, answer the following questions.

Hint: dat1 <- subset(dat, dat$Year<=2002) is one way to create a new dataset (dat1) in R that is a subset of the original dataset (dat) and contains only the data up to the year 2002.

a. [3 marks] State the statistical model for a simple linear regression of Extent explained by Year. Carefully define all the necessary variables and parameters in your answer.
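For reference, a minimal sketch of the standard textbook form of this model, written with the variable names above. The symbols β₀, β₁, σ² and n are the usual notation, not taken from this sheet: β₀ is the intercept, β₁ is the change in mean September extent per calendar year, and the errors are assumed independent and normal with constant variance σ². For 1979 to 2002, n = 24, assuming one observation per year.

```latex
\text{Extent}_i = \beta_0 + \beta_1\,\text{Year}_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2),
\qquad i = 1, \dots, n
```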

b. [3 marks] A simple linear regression seems appropriate for the 1979-2002 data. Justify the use of a simple linear regression model.

c. [2 marks] Fit a simple linear regression to the 1979-2002 data. Explain why there is a linear relationship.
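A minimal R sketch of the fit in part c. The data frame is synthetic, invented only so the example runs; with the real file one would instead use something like dat <- read.table("seaice.dat", header = TRUE), assuming the file has a header row naming the columns Extent and Year.

```r
# SYNTHETIC stand-in for seaice.dat (made-up numbers, not the real data)
set.seed(1)
dat <- data.frame(Year = 1979:2012)
dat$Extent <- 7.5 - 0.05 * (dat$Year - 1979) + rnorm(nrow(dat), sd = 0.3)

# Keep 1979-2002 only, as in the hint
dat1 <- subset(dat, Year <= 2002)

# Simple linear regression of Extent on Year
fit1 <- lm(Extent ~ Year, data = dat1)
summary(fit1)                 # slope estimate, t-test for the slope, R-squared
cor(dat1$Year, dat1$Extent)   # sample correlation r; R-squared equals r^2 here

# Scatterplot with the fitted line, to eyeball linearity
plot(Extent ~ Year, data = dat1)
abline(fit1)
```

For part d, it is the size of r (equivalently R-squared) that describes strength; a small p-value for the slope only says the slope is unlikely to be zero.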

d. [2 marks] Is this a strong linear relationship? Explain your answer in the context of this data.

e. [2 marks] Predict the extent of the sea ice (in km²) for the year 2000.

f. [2 marks] Compute a 95% prediction band for the extent of sea ice (in km²) in the year 2000.

g. [2 marks] Compute a 95% confidence band for the extent of sea ice (in km²) in the year 2000.
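Parts e to g can be sketched with predict() on a fitted lm object. The data are again synthetic, not seaice.dat. Because Extent is recorded in millions of km², the results are multiplied by 1e6 to express them in km², as the questions ask.

```r
# SYNTHETIC 1979-2002 data (made-up numbers, not seaice.dat)
set.seed(1)
dat1 <- data.frame(Year = 1979:2002)
dat1$Extent <- 7.5 - 0.05 * (dat1$Year - 1979) + rnorm(nrow(dat1), sd = 0.3)
fit1 <- lm(Extent ~ Year, data = dat1)

new <- data.frame(Year = 2000)

# e. point prediction (the "fit" column); f. prediction interval; g. confidence interval
pred_int <- predict(fit1, newdata = new, interval = "prediction", level = 0.95)
conf_int <- predict(fit1, newdata = new, interval = "confidence", level = 0.95)

# Columns are fit, lwr, upr; convert from millions of km^2 to km^2
pred_int * 1e6
conf_int * 1e6
```

The prediction interval is always the wider one. At x₀ = 2000 its standard error is s·sqrt(1 + 1/n + (x₀ − x̄)²/Sxx), which adds the variance of a single new year's extent to the uncertainty in the estimated mean, whose standard error is s·sqrt(1/n + (x₀ − x̄)²/Sxx). The confidence interval therefore targets the mean extent the model implies for 2000, while the prediction interval targets the extent actually observed in one year 2000.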

h. [2 marks] Explain clearly what the prediction band represents and what the confidence band represents.


Using all the data for the years 1979 to 2012, answer the following questions.

i. [2 marks] Justify why a simple linear regression is inappropriate for the 1979-2012 data.

j. [3 marks] Fit a second order polynomial regression model to the data and validate the model.

k. [1 mark] Plot the fitted polynomial over your data.

l. [2 marks] Using the second order model you fitted, predict the extent of the sea ice (in km²) for the year 2000.
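Parts j to l can be sketched together in R. The data are synthetic, generated with a deliberately curved trend so that the quadratic term matters. Writing I(Year^2) is one way to add the squared term; poly(Year, 2) is a better-conditioned alternative that gives the same fitted values.

```r
# SYNTHETIC 1979-2012 data with a curved trend (made-up, not seaice.dat)
set.seed(2)
dat <- data.frame(Year = 1979:2012)
t0 <- dat$Year - 1979
dat$Extent <- 7.2 - 0.01 * t0 - 0.003 * t0^2 + rnorm(nrow(dat), sd = 0.2)

# j. Second-order polynomial: Extent = b0 + b1*Year + b2*Year^2 + error
fit2 <- lm(Extent ~ Year + I(Year^2), data = dat)
summary(fit2)            # check whether the Year^2 term is significant
par(mfrow = c(2, 2))
plot(fit2)               # residual diagnostics to validate the model

# k. Fitted curve over the data (Year is already in increasing order)
par(mfrow = c(1, 1))
plot(Extent ~ Year, data = dat)
lines(dat$Year, fitted(fit2))

# l. Prediction for 2000, converted from millions of km^2 to km^2
predict(fit2, newdata = data.frame(Year = 2000)) * 1e6
```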

m. [2 marks] Compare your answers in part e) and part l). Which predicted value do you recommend, and why?


Question 2 [25 marks]

A study into the quality of Portuguese Vinho Verde red wine was conducted to examine the possible relationship between wine quality and the chemical composition of the wine. Overall quality scores were obtained by combining the scores from several tasters. Information was recorded on bottles of Vinho Verde, and the data are available in the file pwine.dat on iLearn. The variables are defined below.

Quality: aggregated score across the taste testers
Alcohol: level of alcohol, in percent
Density: density (specific gravity) of the wine
pH: acidity level of the wine

a. [4 marks] State the statistical model for a multiple regression with Quality as the response and all other variables as predictors, defining any parameters as necessary.
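For reference, a sketch of the standard form of this model using the usual textbook notation (the β symbols and σ² are not taken from this sheet): β₀ is the intercept, and each β_j is the change in mean Quality for a one-unit increase in that predictor with the other two held fixed.

```latex
\text{Quality}_i = \beta_0 + \beta_1\,\text{Alcohol}_i + \beta_2\,\text{Density}_i
  + \beta_3\,\text{pH}_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2)
```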

b. [2 marks] Fit this multiple regression model and write down the fitted model.

c. [4 marks] What are the assumptions required for a multiple regression analysis? If possible, validate those assumptions for the multiple regression model you fitted in part b.
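Parts b and c can be sketched in R as follows. The data frame is synthetic (invented ranges and coefficients, not pwine.dat); with the real file, something like pwine <- read.table("pwine.dat", header = TRUE) would replace it, assuming a header row. The residual plots address linearity, constant variance and normality; independence usually has to be argued from how the data were collected rather than from plots.

```r
# SYNTHETIC stand-in for pwine.dat (made-up numbers, not the real data)
set.seed(3)
n <- 100
pwine <- data.frame(
  Alcohol = runif(n, 8.5, 14),
  Density = runif(n, 0.990, 1.003),
  pH      = runif(n, 2.9, 3.9)
)
pwine$Quality <- 1 + 0.35 * pwine$Alcohol - 0.5 * pwine$pH + rnorm(n, sd = 0.6)

# b. Full model with all three predictors
fit <- lm(Quality ~ Alcohol + Density + pH, data = pwine)
coef(fit)                          # estimates for the fitted equation

# c. Residual diagnostics
par(mfrow = c(2, 2))
plot(fit)                          # residuals vs fitted, normal Q-Q, scale-location, leverage
sw <- shapiro.test(residuals(fit)) # optional formal test of normality
sw
pairs(pwine)                       # scatterplot matrix: linearity and predictor correlation
```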

d. [6 marks] Conduct an F-test for the overall regression, i.e. test whether there is any relationship between the response and the predictors. Write your answer as a formal hypothesis test and include the ANOVA table (a single combined regression SS source is sufficient).
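One way to obtain the overall F-test in R, sketched on synthetic data (not pwine.dat), is to compare the intercept-only model with the full model. In the anova() output, RSS in the first row is the total SS, RSS in the second row is the residual SS, and "Sum of Sq" in the second row is the combined regression SS; the same F statistic is reported at the foot of summary().

```r
# SYNTHETIC stand-in for pwine.dat (made-up numbers, not the real data)
set.seed(3)
n <- 100
pwine <- data.frame(
  Alcohol = runif(n, 8.5, 14),
  Density = runif(n, 0.990, 1.003),
  pH      = runif(n, 2.9, 3.9)
)
pwine$Quality <- 1 + 0.35 * pwine$Alcohol - 0.5 * pwine$pH + rnorm(n, sd = 0.6)

full <- lm(Quality ~ Alcohol + Density + pH, data = pwine)
null <- lm(Quality ~ 1, data = pwine)

# H0: beta1 = beta2 = beta3 = 0   vs   H1: at least one beta_j != 0
tab <- anova(null, full)   # row 2: regression SS on 3 df, F statistic, p-value
tab

# Same F statistic, with its numerator and denominator df
summary(full)$fstatistic
```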

e. [3 marks] From the analysis in part b, determine the 95% CI for the Alcohol slope parameter and comment on its meaning in this context.
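A sketch of part e using confint() on a synthetic full model (not pwine.dat), with the same interval computed by hand as estimate ± t(0.975, n − p) × SE, where p = 4 is the number of coefficients including the intercept.

```r
# SYNTHETIC stand-in for pwine.dat (made-up numbers, not the real data)
set.seed(3)
n <- 100
pwine <- data.frame(
  Alcohol = runif(n, 8.5, 14),
  Density = runif(n, 0.990, 1.003),
  pH      = runif(n, 2.9, 3.9)
)
pwine$Quality <- 1 + 0.35 * pwine$Alcohol - 0.5 * pwine$pH + rnorm(n, sd = 0.6)
fit <- lm(Quality ~ Alcohol + Density + pH, data = pwine)

ci <- confint(fit, "Alcohol", level = 0.95)
ci

# By hand: estimate +/- t quantile * standard error, residual df = n - 4
est <- coef(summary(fit))["Alcohol", "Estimate"]
se  <- coef(summary(fit))["Alcohol", "Std. Error"]
by_hand <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se
by_hand
```

Read in context: holding Density and pH fixed, each extra percentage point of alcohol is associated with a change in mean Quality somewhere between the two bounds; if the interval excludes 0, the Alcohol slope is significant at the 5% level.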

f. [2 marks] Using the model selection procedures covered in this course, find the best multiple regression model that explains the data, giving reasons for your choice(s).
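The sheet does not say which selection procedures the course uses, so the sketch below assumes two common ones: F-tests for dropping each term with drop1(), and backward elimination by AIC with step(). The data are synthetic, not pwine.dat.

```r
# SYNTHETIC stand-in for pwine.dat (made-up numbers, not the real data)
set.seed(3)
n <- 100
pwine <- data.frame(
  Alcohol = runif(n, 8.5, 14),
  Density = runif(n, 0.990, 1.003),
  pH      = runif(n, 2.9, 3.9)
)
pwine$Quality <- 1 + 0.35 * pwine$Alcohol - 0.5 * pwine$pH + rnorm(n, sd = 0.6)
full <- lm(Quality ~ Alcohol + Density + pH, data = pwine)

# F-test for dropping each predictor from the full model
drop1(full, test = "F")

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
best <- step(full, direction = "backward", trace = 0)
formula(best)
summary(best)$adj.r.squared   # compare with summary(full)$adj.r.squared
```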

g. [4 marks] State the final fitted regression model and comment on its interpretation.
