Date: 2019-05-13 10:15

RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS

REGRESSION MODELLING

(STAT2008/STAT4038/STAT6014/STAT6038)

Assignment 2 for Semester 1, 2019

INSTRUCTIONS:

This assignment is worth 20% of your overall marks for this course.

Please submit your assignment on Wattle. When uploading to Wattle you must submit the following, combined into a single document:

1. Your assignment/report in a PDF or Word document.

2. The R code you have used for the assignment as an appendix. Failure to upload the R code will result in a penalty.

Assignments should be typed. Scanned PDF files will not be marked and will result in a penalty. Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of these results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.

Unless otherwise advised, use a significance level of 5% and two decimal places for all answers.

Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix that is in addition to the above page limits; however, the appendix will not be assessed. It will only be used if there is some question about what you have actually done.

You may ask me (Abhinav Mehta) questions about this assignment up to 24 hours before the submission time. This will allow me enough time to respond to your questions.

Late submissions will attract a penalty of 5% of your mark for each day of delay. No assignments will be accepted more than 10 days beyond the due date.

Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission no later than 24 hours before the submission date. If you are granted an extension and submit your assignment after the extended deadline, then the late submission penalty will still apply.

Assignment 2 - Sem 1, 2019 Page 1 of 3

Question 1 [40 Marks]

A group of researchers in the US attempted to look at the pollution-related factors affecting mortality. Sixty US cities were sampled. Total age-adjusted mortality (mortality), from all causes, in deaths per 100,000 population, was measured, along with the following covariates: mean annual precipitation in inches (precipitation); median number of school years completed for persons aged 25 years or older (education); percentage of population that is non-white (nonwhite); relative pollution potential of oxides of nitrogen (nox); and relative pollution potential of sulphur dioxide (so2). “Relative pollution potential” is the product of tons emitted per day per square kilometre and a factor correcting for the city dimension and exposure. The data is available in a .csv file, pollution.

(a) [6 marks] Fit a multiple linear regression (MLR) model with mortality as the response variable and all other covariates as predictors. Is the regression model significant?

(b) [8 marks] What are the estimated coefficients of the MLR model in part (a) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients with regard to model specification.

(c) [8 marks] There is a t-test associated with each of these coefficients. Briefly explain what these tests can and cannot be used for. In your answer, be sure to mention the appropriate hypotheses that can be assessed using these t-tests.

(d) [6 marks] Construct an appropriate test of the hypothesis that education and nox are not significant contributors to the model. That is, test β_education = β_nox = 0.

(e) [6 marks] A researcher from this group suggested that a model with coefficients β_precipitation = 2, β_education = 10, β_nonwhite = 3, β_nox = 0, and β_so2 = 1 may be a better model. Can you test whether this new model is significant? How would you fit such a model and what would be the estimate of the intercept term with these coefficients?

(f) [6 marks] One of the researchers is from the city of San Antonio, and has recorded a new set of measurements on each of the predictors. The precipitation is 33, education is 11.5, nonwhite is 17.2, and nox and so2 are each 1. What do you predict the mortality rate to be? Find a 99% interval for this prediction.


Question 2 [60 Marks]

The data for this question comprises measurements on breeding pairs of land-bird species collected from 16 islands around Britain over the course of several decades, available in a .csv file, bird. For each species, the data set contains: an average time to extinction (extinct) on those islands where the species appeared (this is actually the reciprocal of the average of 1/T, where T is the length of time the species remained on the island, and 1/T is taken to be zero if the species did not become extinct on the island); the average number of nesting pairs per year, over all islands where the species appeared (nest.pair); the size of the species (size: S = Small, L = Large); and the migratory status of the species (mig.status: R = Resident, M = Migrant). It is expected that species with large numbers of nesting pairs will tend to remain longer before becoming extinct. Of particular interest is whether, after accounting for the number of nesting pairs, size or migratory status has any effect.
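The definition of extinct given above (the reciprocal of the average of 1/T) can be illustrated with a small sketch. This is only an illustration in Python, not part of the required R analysis: the function name, the use of None to mark an island where the species did not become extinct, and the example times are all assumptions made here and do not come from the bird data set.

```python
def average_extinction_time(times):
    """Reciprocal of the mean of 1/T over the islands where a species appeared.

    `times` has one entry per island: the length of time T the species
    remained there, or None if it did not become extinct on that island
    (in which case 1/T is taken to be zero, as in the definition above).
    """
    if not times:
        raise ValueError("need at least one island")
    # Per-island extinction rates 1/T, with 0 for islands with no extinction.
    rates = [0.0 if t is None else 1.0 / t for t in times]
    mean_rate = sum(rates) / len(rates)
    if mean_rate == 0.0:
        # The species never became extinct on any island.
        return float("inf")
    return 1.0 / mean_rate


# Hypothetical species observed on three islands: extinct after 2 and 4
# time units on two of them, and still present on the third.
print(average_extinction_time([2, 4, None]))
```

Under this definition, islands where the species survived pull the mean of 1/T towards zero and so lengthen the reported extinction time, which is why the value can exceed every observed T.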

(a) [10 marks] Fit a multiple linear regression (MLR) model with extinct as the response variable and all other covariates as predictors. Is the regression model significant? Interpret the coefficients for the categorical variables in this model. Does the coefficient for nest.pair support the expectation that a large number of nesting pairs tends to delay extinction?

(b) [6 marks] As the question indicates, of particular interest is whether, after accounting for the number of nesting pairs, size or migratory status has any effect. Conduct a formal test of the hypothesis that β_Size = β_MigStatus = 0 using an appropriate ANOVA table. Evaluate the F-statistic and the corresponding p-value.

(c) [6 marks] The Red-crested Periwinkle is a small, migratory species of bird, while the Great Plover is a large, resident species of bird. Assuming that the number of nesting pairs is the same for each species over the period, based on the model in part (a), what would you predict the difference in extinction times to be for these two species?

(d) [8 marks] A noted theory suggests that size and migratory status should contribute equally to the extinction time. Test whether the coefficients of size and mig.status are the same. Construct an appropriate model to test this hypothesis.

(e) [20 marks] Produce the appropriate diagnostic plots for the model fitted in part (a) and assess the model assumptions. Produce the relevant influence diagnostics for this model. Which data points appear to be influential in the analysis, and in what sense would you consider them influential? Also, do any points appear to be outliers? If so, to which species do these points correspond?

(f) [10 marks] Two transformations are suggested for the response variable, log(extinct) and 1/extinct. Investigate whether using these transformations improves on the model fit. Comment on the assumptions of MLR for these models as compared to your original model. Which of the three models would you choose based on your analysis?


