Date: 2020-05-31 03:54

STATS762 Regression for Data Science

Assignment 3

Due date: 10am, 1 June 2020

Instructions

• Please submit both your R Markdown document and a PDF file containing the document it generates. To create a PDF you should start your R Markdown document with the following lines (having made the appropriate changes):

---
title: "STATS 762 Assignment 3"
author: "Your Name, ID 1234567"
date: "Due: 10am, 1 June 2020"
output: pdf_document
---

• Call set.seed() at the start of your R script so that the same output is obtained when the code is rerun.

• All answers should be written with corresponding question numbers.

• Working must be shown.

• Each answer must be stated explicitly; R code on its own does not constitute an answer. For example, suppose the question asks for the average height of 6 trees: (1, 2, 1, 3, 1.5).

Good answer Bad answer

• If any of the above requirements is not met, a penalty may be applied.
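The set.seed() instruction above can be illustrated with a minimal sketch. The seed value 762 is an arbitrary choice made here for illustration, not one prescribed by the assignment; any fixed integer behaves the same way.

```r
# Fixing the seed makes random draws repeatable across runs.
set.seed(762)            # arbitrary seed, chosen for illustration only
first_run <- rnorm(3)

set.seed(762)            # resetting the same seed ...
second_run <- rnorm(3)   # ... reproduces exactly the same draws

same_draws <- identical(first_run, second_run)
print(same_draws)
```

Placing the call once near the top of the R Markdown document is usually enough, since later chunks then consume the same random stream each time the document is knitted.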

1. The spreadsheet avocado2.csv contains 338 historical records of avocado sales in various markets in California, US. The attributes are as follows:

Total.Volume - Total number of avocados sold
AveragePrice - Average price of a single avocado
type - Production type (organic or conventionally produced avocados)

A researcher wants to investigate how the sales volume relates to the average price and the production type (organic/conventional). Total.Volume is transformed to the log scale so that a linear regression model with AveragePrice and type can be fitted.

(a) Explain why log-transforming the total number of avocados sold is useful for modelling a quantile with a linear regression. [2 marks]

(b) Find a suitable linear regression model for the 0.2 quantile of log(Total.Volume) and express a typical 0.2 quantile of the total number of avocados sold for a given price and production type. [5 marks]

(c) Find a suitable linear regression model for the 0.8 quantile of log(Total.Volume) and express a typical 0.8 quantile of the total number of avocados sold for a given price and production type. [5 marks]

(d) Using your model, predict the 0.2 quantile of total sales for conventional avocados priced at $1.2 and for organic avocados priced at $1.8. [1 mark]

(e) At what price of conventional avocados do 80% of markets sell at most 5.4 million avocados? [3 marks]
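The steps in Question 1 can be sketched as follows. This is an illustration under stated assumptions, not a model answer: it assumes the quantreg package is installed, it uses simulated stand-in data rather than avocado2.csv, and the simulation coefficients, the additive model without an interaction, and the object names (avocado, fit20, fit80, new_markets, price_e) are all invented here.

```r
library(quantreg)  # assumed installed; provides rq()

# Simulated stand-in data, NOT the contents of avocado2.csv.
set.seed(762)
n <- 338
type <- factor(sample(c("conventional", "organic"), n, replace = TRUE))
AveragePrice <- round(runif(n, 0.8, 2.2), 2)
# Hypothetical data-generating process, chosen only so the code runs.
Total.Volume <- exp(17 - AveragePrice - 2 * (type == "organic") + rnorm(n, sd = 0.5))
avocado <- data.frame(Total.Volume, AveragePrice, type)

# Quantiles commute with increasing transformations, so a quantile of
# log(Total.Volume) maps back to a quantile of Total.Volume through exp().
# type = 1 returns an order statistic, which makes the identity exact.
q_raw <- quantile(avocado$Total.Volume, 0.2, type = 1)
q_log <- quantile(log(avocado$Total.Volume), 0.2, type = 1)
equivariant <- isTRUE(all.equal(unname(log(q_raw)), unname(q_log)))

# Linear models for the 0.2 and 0.8 quantiles of log(Total.Volume).
fit20 <- rq(log(Total.Volume) ~ AveragePrice + type, tau = 0.2, data = avocado)
fit80 <- rq(log(Total.Volume) ~ AveragePrice + type, tau = 0.8, data = avocado)

# Predicted 0.2 quantile of sales, back-transformed to the original scale.
new_markets <- data.frame(
  AveragePrice = c(1.2, 1.8),
  type = factor(c("conventional", "organic"), levels = levels(avocado$type))
)
pred_volume <- exp(predict(fit20, newdata = new_markets))

# Price at which the fitted 0.8 quantile for conventional avocados equals
# 5.4 million. "conventional" is the reference level, so only the intercept
# and the price slope enter.
b <- coef(fit80)
price_e <- unname((log(5.4e6) - b["(Intercept)"]) / b["AveragePrice"])
```

With the real data, the simulation lines would be replaced by reading avocado2.csv, and whether an interaction between AveragePrice and type is needed would have to be checked before relying on the additive form used here.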

2. The spreadsheets (banktrain.csv and banktest.csv) relate to the direct marketing campaigns of a bank. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether or not the product (a bank term deposit) would be subscribed to. The aim is to predict whether the client will subscribe to a term deposit (variable y).

The attributes are as follows:

gender - gender (categorical: "male", "female")
age - age (numeric)
marital - marital status (categorical: "married", "divorced", "single")
education - education level of the client (categorical: "unknown", "secondary", "primary", "tertiary")
default - credit account status (categorical: "yes", "no")
balance - average yearly balance, in euros (numeric)
housing - housing loan status (categorical: "yes", "no")
loan - personal loan status (categorical: "yes", "no")
contact - contact communication type (categorical: "unknown", "telephone", "cellular")
duration - last contact duration, in seconds (numeric)
campaign - number of contacts performed during this campaign for this client (numeric)
previous - number of contacts performed before this campaign for this client (numeric)
poutcome - outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")
y - Has the client subscribed to a term deposit? (categorical: "yes", "no")

We use the training data (banktrain.csv) to find a model and the test data (banktest.csv) to examine the predictive performance of the model. Note that the number of cross-validation folds is 10.

The function in make.r reformats a data set so that each categorical variable is replaced by indicator variables, one for each of its levels. It returns a list with two objects: the reformatted data (data) and a vector of group memberships (gpname).
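The reformatting described above can be sketched in base R. The function below is a hypothetical stand-in, since the contents of make.r are not shown: the name make_indicators, the choice to keep one indicator per level (no reference level dropped), the column naming, and the exclusion of the response y are all assumptions.

```r
# Hypothetical stand-in for the function in make.r (the real one is not shown).
make_indicators <- function(df, response = "y") {
  predictors <- df[, setdiff(names(df), response), drop = FALSE]
  blocks <- lapply(names(predictors), function(v) {
    x <- predictors[[v]]
    if (is.character(x)) x <- factor(x)
    if (is.factor(x)) {
      # One 0/1 column per level of the categorical variable.
      m <- outer(as.character(x), levels(x), "==") + 0
      colnames(m) <- paste0(v, "_", levels(x))
    } else {
      # Numeric variables are kept as a single column.
      m <- matrix(x, ncol = 1, dimnames = list(NULL, v))
    }
    m
  })
  list(
    data = as.data.frame(do.call(cbind, blocks)),
    # Each column is labelled with the variable it came from.
    gpname = rep(names(predictors), vapply(blocks, ncol, integer(1)))
  )
}

# Toy rows using variable names from the attribute list; values are invented.
toy <- data.frame(
  age = c(30, 45, 52),
  marital = c("married", "single", "divorced"),
  housing = c("yes", "no", "yes"),
  y = c("no", "yes", "no")
)
out <- make_indicators(toy)
print(out$gpname)
```

A group-membership vector of this kind is what allows a penalty to treat all indicators of one categorical variable as a unit, so that a variable is either kept or dropped as a whole.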

(a) Using the training data, answer the following questions.

i. Using an appropriate penalty on the model complexity, find a model that minimizes the cross-validation error. Show how you found the model and describe it in terms of the client characteristics included. [4 marks]

ii. Using an appropriate penalty on the model complexity, find a parsimonious model. Show how you found the model and describe it in terms of the client characteristics included. [4 marks]

(b) Estimate the predictive performance of each model using an appropriate measure, and compare the two models. [3 marks]

(c) ... very likely to subscribe to a term deposit. [3 marks]

(d) If a marketing campaign were to focus on a single client characteristic, which feature would be most likely to make the campaign succeed? [3 marks]
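One way the model search in Question 2 might be set up is sketched below. It assumes the glmnet package is installed and uses an ordinary lasso penalty on simulated data; the gpname vector suggests that a group-wise penalty may be what the assignment intends, so the lasso here is a swapped-in simplification. The data, the signal in x1 and x2, and all object names are hypothetical.

```r
library(glmnet)  # assumed installed; provides cv.glmnet()

# Simulated stand-in data, NOT banktrain.csv or banktest.csv.
set.seed(762)
n <- 400
p <- 8
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
# Hypothetical signal: only x1 and x2 affect the response.
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x[, 1] - 1.0 * x[, 2]))
train <- 1:300
test <- 301:400

# Lasso-penalised logistic regression with 10-fold cross-validation.
cv_fit <- cv.glmnet(x[train, ], y[train], family = "binomial",
                    alpha = 1, nfolds = 10, type.measure = "deviance")

# lambda.min minimises the cross-validation error; lambda.1se is the largest
# lambda within one standard error of that minimum, giving a sparser model.
coef_min <- coef(cv_fit, s = "lambda.min")
coef_1se <- coef(cv_fit, s = "lambda.1se")

# Test-set misclassification rate for each choice of lambda.
p_min <- predict(cv_fit, newx = x[test, ], s = "lambda.min", type = "response")
p_1se <- predict(cv_fit, newx = x[test, ], s = "lambda.1se", type = "response")
err_min <- mean((p_min > 0.5) != y[test])
err_1se <- mean((p_1se > 0.5) != y[test])
```

Misclassification rate with a 0.5 cut-off is only one possible measure; if subscribers are rare in the real data, a measure that does not depend on a single threshold may be more informative.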
