
Assignment 2, Question 2
MAST90125: Bayesian Statistical Learning

Due: Friday 20 September 2019

There are places in this assignment where R code will be required. Therefore set the random seed so the assignment is reproducible.

set.seed(123456) # Please change the random seed to your student id number.

Question Two (20 marks)

In lecture 3, we discussed how a Bayesian framework readily lends itself to combining information from sequential experiments. To demonstrate, consider the following data extracted from the HealthIron study. Serum ferritin levels were measured for two samples of women, one of C282Y homozygotes (n = 88) and the other of women with neither of the key mutations (C282Y and H63D) in the HFE gene, so-called HFE ‘wildtypes’ (n = 242). The information available is:

• idnum: Participant id.
• homc282y: Indicator whether individual is Homozygote (1) or Wildtype (0).
• time: Time since onset of menopause, measured in years.
• logsf: The natural logarithm of the serum ferritin in μg/L.

The data required to answer this question are in Hiron.csv, which can be downloaded from the LMS.

a) Fit a standard linear regression,

E(logsf) = β0 + β1·time

with responses restricted to those who are homozygotes (homc282y = 1). This can be done using the lm function in R. Report the estimated coefficients β̂ and the estimated error variance.
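
A minimal sketch for a), assuming Hiron.csv is in the working directory and has the columns listed above:

hiron <- read.csv("Hiron.csv")
homo  <- subset(hiron, homc282y == 1)  # restrict to C282Y homozygotes
fit.a <- lm(logsf ~ time, data = homo)
coef(fit.a)                            # estimated coefficients beta0-hat, beta1-hat
sigma2.hat <- summary(fit.a)$sigma^2   # estimated error variance
sigma2.hat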

b) Fit a Bayesian regression using a Gibbs sampler to only the wildtype (homc282y = 0) data. Use the output from your answer in a) to define proper priors for β, τ. For help, refer to lecture 13. For the Gibbs sampler, run two chains for 10,000 iterations. Discard the first 1000 iterations as burn-in and then remove every second remaining iteration to reduce auto-correlation. When storing results, convert τ back to σ². When running the Gibbs sampler, incorporate posterior predictive checking, using the test statistic T(y, β) = Σ_{i=1}^n e_{ij}^2, where e_{ij} is the predicted residual for observation i at simulation j, and the replicate statistic T(y_rep, β) = Σ_{i=1}^n (e_{ij}^rep)^2, where e_{ij}^rep is the replicate residual for observation i at simulation j. Report posterior means, standard deviations and 95% central credible intervals for β0, β1, σ², combining results for the two chains.
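
One plausible sketch of such a sampler is given below, reusing the objects from the a) sketch above. The priors m0, V0, a0, b0 built from fit.a, and the helper gibbs.lm, are illustrative assumptions, not values prescribed by the question.

m0 <- coef(fit.a)     # prior mean for beta, taken from the a) fit
V0 <- vcov(fit.a)     # prior covariance for beta
a0 <- nrow(homo)/2    # Gamma shape for tau (an assumed choice)
b0 <- a0 * sigma2.hat # Gamma rate, so E(tau) = 1/sigma2.hat

gibbs.lm <- function(y, X, m0, V0, a0, b0, niter = 10000) {
  n <- length(y); p <- ncol(X)
  V0inv <- solve(V0); XtX <- crossprod(X); Xty <- crossprod(X, y)
  beta <- as.vector(m0); tau <- 1
  out <- matrix(NA, niter, p + 3)
  for (j in 1:niter) {
    # beta | tau, y ~ N(mn, Vn): conjugate normal update
    Vn <- solve(V0inv + tau * XtX)
    mn <- Vn %*% (V0inv %*% m0 + tau * Xty)
    beta <- as.vector(mn + t(chol(Vn)) %*% rnorm(p))
    # tau | beta, y ~ Gamma(a0 + n/2, b0 + sum(e^2)/2)
    e   <- y - as.vector(X %*% beta)
    tau <- rgamma(1, a0 + n/2, b0 + sum(e^2)/2)
    # replicate data for the posterior predictive check
    yrep <- as.vector(X %*% beta) + rnorm(n, 0, sqrt(1/tau))
    erep <- yrep - as.vector(X %*% beta)
    out[j, ] <- c(beta, 1/tau, sum(e^2), sum(erep^2)) # store sigma2 = 1/tau
  }
  colnames(out) <- c(paste0("beta", 0:(p - 1)), "sigma2", "T.y", "T.yrep")
  out
}

wild <- subset(hiron, homc282y == 0)
X <- cbind(1, wild$time)
chain1 <- gibbs.lm(wild$logsf, X, m0, V0, a0, b0)
chain2 <- gibbs.lm(wild$logsf, X, m0, V0, a0, b0)
keep <- seq(1001, 10000, by = 2) # burn-in of 1000, then every second draw
chain1 <- chain1[keep, ]; chain2 <- chain2[keep, ]

post <- rbind(chain1, chain2)[, c("beta0", "beta1", "sigma2")]
rbind(mean = colMeans(post), sd = apply(post, 2, sd),
      apply(post, 2, quantile, c(0.025, 0.975)))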

c) Perform convergence checks for the chains obtained in b). Report both graphical summaries and Gelman-Rubin diagnostic results. For the calculation of Gelman-Rubin diagnostics, you will need to install the R package coda. An example of processing chains for calculating Gelman-Rubin diagnostics is given below.

Processing chains for the calculation of Gelman-Rubin diagnostics: imagine you have four chains of a multi-parameter problem, with thinning already completed, called par1, par2, par3, par4.


Step one: Converting the chains into mcmc lists.

library(coda)
par1 <- as.mcmc.list(as.mcmc(par1))
par2 <- as.mcmc.list(as.mcmc(par2))
par3 <- as.mcmc.list(as.mcmc(par3))
par4 <- as.mcmc.list(as.mcmc(par4))

Step two: Calculating diagnostics.

par.all <- c(par1, par2, par3, par4)
gelman.diag(par.all)
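
For the graphical summaries, a minimal sketch using the thinned chains from the b) sketch above (chain1 and chain2 are assumptions carried over from that code):

chains.mc <- mcmc.list(as.mcmc(chain1[, 1:3]), as.mcmc(chain2[, 1:3]))
plot(chains.mc)        # trace plots and marginal densities for beta0, beta1, sigma2
gelman.diag(chains.mc) # scale reduction factors near 1 indicate convergence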

d) Fit a standard linear regression,

E(logsf) = β0 + β1·time

to all the data using the lm function in R. Report β̂ and the associated 95% confidence intervals. Comparing these results to the results from b), do you believe that the sequential analysis gave the same results as fitting the regression on the full data?
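
A minimal sketch for d), assuming the hiron data frame read in above:

fit.d <- lm(logsf ~ time, data = hiron)
coef(fit.d)    # beta-hat on the full data
confint(fit.d) # 95% confidence intervals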

e) Report the results of the posterior predictive checking requested in b). Do you believe the postulated model was plausible? If not, what do you think is a potential flaw in the postulated model?
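
One way to summarise the check in b) is a posterior predictive p-value, sketched below using the T.y and T.yrep columns stored by the b) sketch (names assumed in that code, not notation from the question):

ppc <- rbind(chain1, chain2)
mean(ppc[, "T.yrep"] > ppc[, "T.y"]) # values near 0 or 1 suggest model misfit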


