Date: 2019-11-11 11:05

MATH1041 Statistics for Life and Social Science

Term 3, 2019

MATH1041 Assignment

Assignment release date: The assignment will be released to all students on Friday

the 1st of November on Moodle (see “Assessments” section).

Submission due date: Friday 15th November (Week 9) before 2pm (Sydney

time).

Please submit your assignment through Turnitin via Moodle, see the “Assessments Information”

section on Moodle for further information regarding online submission. You

must submit a neatly typed assignment converted to pdf format.

Data: A data set (in the text file format) will be sent to you via email at your official

university email address (see page 2 of this document for further details).

Assignment length: No more than SIX single-sided A4 pages including this cover

sheet as the first page. Also, please make sure that you include your name and zID

somewhere in the assignment.

Obtaining the data via email and reading it into RStudio

The data (that is, your data set) are available in a text file with a name similar to:

“z1234567.txt” (where z1234567 in the text file name is replaced by your unique student

zID number). This text file has been sent to you via email at your official

university email address. PLEASE CHECK YOUR UNIVERSITY EMAILS

REGULARLY TO MAKE SURE THAT YOU HAVE OBTAINED YOUR

DATA SET. Please email Dr Jakub Stoklosa (j.stoklosa@unsw.edu.au) if you haven’t

received your data set yet.

The first step is to read the data into RStudio. The data format is simple and similar

to what you have already done in the Introduction labs. Follow the instructions given in

section R1.4 “How to import a text file into RStudio” of the RStudio “How-To-Manual”

available on Moodle. Once you’ve uploaded the data then you are ready to start your

analysis!
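
For illustration only, the import step can be sketched in Python (the assignment itself expects RStudio and the How-To-Manual instructions). This sketch assumes a whitespace-delimited file with a header row naming the four columns Height, Weight, Type and Polin described in the Scenario section; the real layout of your file may differ. The function name read_plants and the sample rows are made up.

```python
import io

def read_plants(handle):
    """Parse a whitespace-delimited table with a header row into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        row = dict(zip(header, fields))
        # Assumed column types: Height and Weight numeric, Type 0/1, Polin text.
        row["Height"] = float(row["Height"])
        row["Weight"] = float(row["Weight"])
        row["Type"] = int(row["Type"])
        rows.append(row)
    return rows

# Made-up sample standing in for a file such as z1234567.txt.
sample = io.StringIO(
    "Height Weight Type Polin\n"
    "182.5 12.1 0 Wind\n"
    "201.3 30.4 1 Insect\n"
)
plants = read_plants(sample)
print(len(plants), plants[0]["Polin"])
```

With a real file, the same function could be called on an opened file handle instead of the in-memory sample.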

Computing assignment format

Here are some more details that may assist you:

• Regarding the overall assignment structure, please answer all questions in the given

order (that is, 1a), b), etc.). You don’t need to re-write the assignment questions.

Keep your answers brief, clear and concise.

• You are required to type up your entire assignment (rather than scanning and taking

screenshots), including any equations. If you are using Word you should use the

equation editor for any maths notation. If you don’t have Word then please use the

School computers, or you can download Word for free, see:

https://student.unsw.edu.au/notices/office

• Please convert and submit your assignment in pdf.

• We recommend adding some working out for some of the questions involving calculations.

But try to keep your solutions brief and concise (since there is a page

limit). It’s good practice for the exam and in case you get the wrong answer you

have some workings to gain marks from. Depending on what the question is asking,

your working could consist of RStudio commands or perhaps the main steps on how

you arrived at your answer. You don’t need to add all of your R-code!

• Keeping your results to 2 or 3 decimal places should be fine.

• There is no requirement for font size and line spacing but obviously don’t make

things too small.

Scenario

A group of research ecologists were interested in studying the impacts of climate change

on different species of plants that grow in New South Wales, Australia. Some of these

plants are native to Australia while others are non-native (exotic).

To obtain their data, the research team decided to collect a random sample of plants

from a national park. Some measurements were then taken on each plant. The random

sample of data consists of plant height measurements (measured in centimeters), dry

weight measurements (measured in grams), whether the plant was native or non-native

to Australia and the pollination mode of the plant (this could be one of four types: wind,

water, insect and self-pollination).

The text file contains your unique data set of n rows, each consisting of 4

variables: Height, which corresponds to plant height; Weight, which corresponds to the dry

weight of a plant; Type, which corresponds to plant type (native = 0 and exotic = 1); and

Polin, which corresponds to the pollination mode of the plant (Wind, Water, Insect and

Self).

Your job is to assist the research team by analysing the data set provided to you.

The Analysis Tasks

The questions you need to answer in your assignment submission are given below. Please

make sure your assignment is converted to pdf format.

1. (a) Calculate the sample mean and sample standard deviation of your plant height

(Height) measurements.

(b) Produce a normal quantile plot of your sample of plant height measurements

(see Section R2.6 “How to produce a normal quantile plot using RStudio”).

Include this plot in your submitted assignment, properly labelled.

(c) By referring to the normal quantile plot obtained in Part 1b briefly discuss if

the plant heights are approximately normally distributed.
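
As a rough illustration of the quantities behind Question 1, here is a minimal Python sketch (the assignment expects RStudio; Python is used here only to show the arithmetic). The heights are made-up values, the helper normal_quantile_points is hypothetical, and the plotting positions (i - 0.5) / n are one common convention among several, so a plot from RStudio may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_points(values):
    """Return (theoretical z, observed value) pairs for a normal quantile plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are an assumed convention.
    return [(NormalDist().inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

heights = [182.0, 186.0, 188.0, 190.0, 194.0]   # made-up values, not real data
x_bar = mean(heights)      # sample mean
s = stdev(heights)         # sample standard deviation (n - 1 denominator)
points = normal_quantile_points(heights)
print(x_bar, round(s, 3))
```

If the observed values plotted against the theoretical z values fall close to a straight line, that is usually read as support for approximate normality.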

2. Let μ be the population mean plant height (in centimeters) in

the national park now (Spring, 2019). The research team decided to compare the

current plant height mean with the mean from 20 years ago using plant height data

obtained from the same national park. The known mean plant height from 20 years

ago was 190 centimeters.

(a) Test the hypothesis that μ is equal to 190 centimeters. You must summarize

all steps: state the null (H0) and alternative hypotheses (Ha) relevant to the

research objectives stated in this scenario, the value of a suitable test statistic,

the sampling distribution for this statistic, a P-value, your summary of

significance and conclusion in plain language.

(b) Some assumptions need to be made for the sampling distribution of the test

statistic (as given in Part 2a) to be valid. State these assumptions, and briefly

discuss whether these assumptions are satisfied.

(c) Produce a 95% confidence interval for μ, the mean plant height. For this question

you may assume that it is appropriate to use a t-distribution. Make sure you

write down all the required steps to calculate this interval.

(d) Does your confidence interval (constructed in Part 2c) include the value 190

centimeters?

(e) Explain whether your confidence interval (constructed in Part 2c) is consistent

with your conclusions from the hypothesis test in Part 2a.

(f) Next, produce a 90% and a 99% confidence interval for μ, the mean plant

height. Again, for this question you may assume that it is appropriate to

use a t-distribution. You don’t need to write down all the required steps to

calculate these intervals, reporting the values is fine.

(g) Briefly comment on how these confidence intervals compare with the confidence

interval you calculated in Part 2c.

(h) Other than changing the confidence level, what two other quantities could we

change to decrease the length of a confidence interval?
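
The calculations in Question 2 can be sketched as follows, assuming the usual one-sample t procedure. This is a Python illustration with made-up heights, not the required RStudio workflow. The function names are hypothetical, and the critical value t_star is supplied by hand from a t table because the Python standard library has no t distribution; a P-value would likewise need tables or statistical software.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """t statistic for H0: mu = mu0, with n - 1 degrees of freedom."""
    n = len(values)
    se = stdev(values) / sqrt(n)          # standard error of the mean
    return (mean(values) - mu0) / se, n - 1

def t_interval(values, t_star):
    """Confidence interval x_bar +/- t_star * s / sqrt(n)."""
    n = len(values)
    margin = t_star * stdev(values) / sqrt(n)
    return mean(values) - margin, mean(values) + margin

heights = [182.0, 186.0, 188.0, 190.0, 194.0]   # made-up values
t_stat, df = one_sample_t(heights, mu0=190.0)
# 2.776 is the tabulated 95% critical value for 4 degrees of freedom.
lower, upper = t_interval(heights, t_star=2.776)
print(round(t_stat, 3), df, round(lower, 3), round(upper, 3))
```

For a different confidence level or sample size, t_star has to be looked up again for the matching degrees of freedom.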

3. The research team were also interested in studying the relationship between:

• Plant type and height

(a) Produce a comparative boxplot for plant type against height. Include this plot

in your submitted assignment, properly labelled.

(b) Describe any differences or similarities in the distribution of plant height for the

different types (native or exotic) using your comparative boxplot from Part 3a.

Include in your answer comments on shape, location, and spread.

• Plant type and pollination mode

(c) Construct an appropriate numerical summary for the plant type and pollination

mode.

(d) Briefly describe any differences or similarities of plant type and pollination

mode from your numerical summary from Part 3c.

• Plant height and weight

(e) Construct an appropriate graphical summary to visualize the relationship between

plant height and weight. Include this plot in your assignment, properly

labelled.

(f) Summarize the key features of your plot from Part 3e.

(g) Suggest an appropriate numerical summary to quantify the strength of the linear

relationship between plant weight and height. Report and briefly comment

on this value.

(h) The research team wanted to predict plant weight from plant height measurements

by fitting a linear regression model. Would you recommend the research

team do this? Explain briefly. You are not required to carry out any prediction

in this question.
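
Two of the numerical summaries that could be considered in Question 3 are a two-way table of counts for two categorical variables and a sample correlation coefficient for two quantitative variables. The Python sketch below shows one possible way to compute each; the data are made up, the function names are hypothetical, and whether these are the appropriate summaries for your own data is for you to judge.

```python
from collections import Counter
from math import sqrt

def two_way_counts(types, modes):
    """Count observations for each (plant type, pollination mode) pair."""
    return Counter(zip(types, modes))

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up values for illustration only.
types = [0, 0, 1, 1, 1]
modes = ["Wind", "Insect", "Wind", "Wind", "Self"]
height = [180.0, 185.0, 190.0, 195.0, 200.0]
weight = [10.0, 12.0, 14.0, 16.0, 18.0]

counts = two_way_counts(types, modes)
r = pearson_r(height, weight)
print(counts[(1, "Wind")], round(r, 3))
```

The correlation coefficient only measures the strength of a linear relationship, so it is best read alongside a plot of the two variables.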

4. The research team decided to investigate the plant weight (Weight) measurement

in more detail.

(a) Produce a five number summary for the Weight measurements.

(b) Using the appropriate measure found in Part 4a, comment on the location of

the Weight measurements.

(c) Produce a histogram for the Weight measurements. Include this histogram in

your submitted assignment, properly labelled.

(d) Comment on the shape (skewness/symmetry) of your histogram from Part 4c.

(e) A common technique that can be used to remove skewness in data is known as

a log-transformation. That is, for each value in your data (denoted by x_i), you

can log-transform it as y_i = log(x_i). The function in RStudio that performs a

log-transformation on a value is log().

Produce a histogram for the log(Weight) measurements. Include this new

histogram in your submitted assignment, properly labelled.

(f) Again, comment on the shape (skewness/symmetry) of your histogram from

Part 4e.

(g) Do you think this log-transformation reduced any skewness? Explain briefly.
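
To illustrate the ideas in Question 4, the sketch below computes a five number summary and applies a natural-log transformation in Python (R's log() is also the natural log by default). The weights are made up and deliberately right-skewed, the function name is hypothetical, and the quartile method used here is only one of several conventions, so RStudio may report slightly different quartiles for the same data.

```python
from math import log
from statistics import mean, median, quantiles

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

weights = [1.0, 2.0, 4.0, 8.0, 16.0]          # made-up, right-skewed values
log_weights = [log(w) for w in weights]       # natural log of each value

summary = five_number_summary(weights)
# A mean well above the median hints at right skew; after the log
# transformation the two coincide for this particular made-up sample.
raw_gap = mean(weights) - median(weights)
log_gap = mean(log_weights) - median(log_weights)
print(summary, round(raw_gap, 2), round(log_gap, 2))
```

Comparing the mean and median is only a rough numerical check; the histograms remain the main tool for judging shape.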
