IE 6200 Midterm 2 Project
190 points
Fall 2019
For your final project you will find a dataset online and complete an extensive statistical
analysis. The material relevant to this final project will be covered through the December
2nd lecture, but I recommend that you find your dataset and complete most of the analysis this
week, as you already have the tools to complete a large portion of this project.
1 Logistics
Due: December 6th, 9:00 pm PST - No Exceptions
2 Grading
Grading of the project: The point distribution for the project will be as follows:
50% Writing, professionalism, organization, clarity and completeness.
50% Statistical analysis.
You will need to turn in:
1. A complete, well-formatted, professional report in PDF format. The report should not
contain code or raw R output. The report can be completed in R Markdown; just use
the appropriate formatting to render the report so that it looks good. The appendix
should contain a copy of your code and a bibliography that lists your sources, including
the source of your data.
2. An .R file that contains all code and allows reproduction of the analysis by pushing
"run."
3. The complete dataset in CSV or Excel format that can be loaded and run with your R
script. We will run your R script to ensure that it loads the data, runs, and matches
the analysis presented in your report.
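As an illustration of what a script that reproduces the analysis by pushing "run" might look like, here is a minimal sketch. The file name `project_data.csv`, the column names, and the seed value are all hypothetical. The example writes a small CSV to a temporary folder only so that it is self-contained; in a real submission the CSV would sit next to the .R file and be read with a relative path.

```r
# Sketch of a reproducible analysis script (all names are illustrative).
set.seed(6200)  # fixing the seed makes resampling results repeatable

# Self-contained setup: create a small example CSV in a temporary folder.
# In a real submission this step is not needed; the file already exists.
data_path <- file.path(tempdir(), "project_data.csv")
example <- data.frame(
  id    = 1:40,
  score = round(rnorm(40, mean = 70, sd = 10), 1),
  group = sample(c("A", "B", "C"), 40, replace = TRUE)
)
write.csv(example, data_path, row.names = FALSE)

# Load the data. In a real script this would be a relative path such as
# "project_data.csv", never a path that only exists on one machine.
project_data <- read.csv(data_path, stringsAsFactors = TRUE)

# Fail early if the file did not load as expected.
stopifnot(nrow(project_data) >= 30)
str(project_data)
```

Setting the seed once at the top matters for this project because the bootstrap parts rely on random resampling; without it, the numbers produced by the script will differ slightly from those in the report.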
3 Dataset Requirements
To complete this final project you’ll need to use a dataset where the samples are independent
and the sample size is at least 30. The dataset should contain at least two quantitative
variables and at least 3 categorical variables. One of the categorical variables should have
at least 3 categories. The dataset cannot have been used previously in class, in homework, or
in any other setting for this course. You cannot use the same dataset as any other student. As
soon as you know which dataset you’ll be using, post a link to it in the discussion area on
Blackboard.
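The size and variable-type requirements above can be checked quickly in R before committing to a dataset. The sketch below is one possible approach, not a required one: `check_requirements` is a hypothetical helper and `toy` is simulated data. It cannot check independence of the samples, which has to be argued from how the data were collected, and it will treat a categorical variable stored as numeric codes as quantitative.

```r
# Hypothetical helper: checks the size and variable-type requirements.
check_requirements <- function(df) {
  is_quant <- vapply(df, is.numeric, logical(1))
  is_cat <- vapply(
    df,
    function(x) is.factor(x) || is.character(x) || is.logical(x),
    logical(1)
  )
  # Number of distinct non-missing categories in each categorical column.
  n_levels <- vapply(
    df[is_cat],
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
  c(
    n_at_least_30     = nrow(df) >= 30,
    two_quantitative  = sum(is_quant) >= 2,
    three_categorical = sum(is_cat) >= 3,
    one_with_3_levels = any(n_levels >= 3)
  )
}

# Simulated data, used only to demonstrate the helper.
set.seed(1)
n <- 50
toy <- data.frame(
  height = rnorm(n, 170, 10),
  weight = rnorm(n, 70, 12),
  smoker = sample(c("yes", "no"), n, replace = TRUE),
  sex    = sample(c("F", "M"), n, replace = TRUE),
  region = sample(c("north", "south", "east", "west"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

print(check_requirements(toy))
```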
4 Analysis Objectives
You will complete all of the following tests using your chosen dataset.
1. One sample t-test
(a) traditional statistical tools
(b) bootstrap methods
2. One sample test of proportion
(a) traditional statistical tools
(b) bootstrap methods
3. Two sample t-test for difference in means
(a) traditional statistical tools
(b) bootstrap methods
4. Two sample test for difference in proportions
(a) traditional statistical tools
(b) bootstrap methods
5. ANOVA
(a) traditional statistical tools
6. Chi-square goodness of fit OR test of association (you can do both but you only need
to pick one).
(a) traditional statistical tools
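For orientation, the traditional version of each test in the list above can be run with functions from base R's `stats` package. The sketch below uses a simulated data frame; `toy`, its column names, and the null values (7 hours, proportion 0.5) are invented for illustration and say nothing about what your own questions should be.

```r
# Simulated data: one quantitative and three categorical variables.
set.seed(2019)
n <- 120
toy <- data.frame(
  hours   = rnorm(n, mean = 7, sd = 1.5),
  passed  = sample(c("yes", "no"), n, replace = TRUE),
  section = sample(c("A", "B"), n, replace = TRUE),
  major   = sample(c("IE", "ME", "CS"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# 1. One-sample t-test of H0: mu = 7.
t1 <- t.test(toy$hours, mu = 7)

# 2. One-sample test of proportion, H0: p = 0.5.
p1 <- prop.test(sum(toy$passed == "yes"), n, p = 0.5)

# 3. Two-sample t-test for a difference in means between sections.
t2 <- t.test(hours ~ section, data = toy)

# 4. Two-sample test for a difference in proportions between sections.
successes <- c(
  sum(toy$passed == "yes" & toy$section == "A"),
  sum(toy$passed == "yes" & toy$section == "B")
)
trials <- c(sum(toy$section == "A"), sum(toy$section == "B"))
p2 <- prop.test(successes, trials)

# 5. One-way ANOVA comparing mean hours across the three majors.
a1 <- aov(hours ~ major, data = toy)
print(summary(a1))

# 6. Chi-square test of association between two categorical variables.
c1 <- chisq.test(table(toy$passed, toy$major))

print(t1)
print(c1)
```

Two defaults are worth knowing when interpreting the output: `t.test` performs the Welch test (unequal variances) unless `var.equal = TRUE` is set, and `prop.test` applies a continuity correction unless `correct = FALSE` is set.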
5 Report
The analysis should contain the following parts.
Part 1. Introduction: Describe the dataset that you have chosen. Describe why you chose
this dataset and why it is of interest to you. You will want to consider which questions can be
answered with your chosen dataset and how you should frame the questions for each of the
tests.
• You will need to provide an introduction to your data and question that draws the
reader in - why should I care about your analysis?
• You must use a dataset that has at least 30 data points.
• Try to find information about the sampling strategy used for your data and
summarize it. You should describe your concerns about the sampling strategy and
surface any questions you have about the methodology.
• You will need to describe why this dataset is of interest to you.
• You will need to describe what each variable measures, the type of the variable and
the scale of the variable.
• Data must be included in the appendix and should also be uploaded in CSV or Excel
format so that it can be used to re-create your analysis.
Part 2. Exploratory analysis and data visualization: Familiarize the reader with your data
using visual tools.
• Provide an exploratory analysis of your data.
• Provide at least 4 different types of graphs that help the reader understand important
aspects of your dataset. Graphs must have all appropriate titles, labels and legends.
For each graph provide a description of why it is relevant to your study and the
question you are trying to answer. It should be clear that the visualization adds color
and interest to the statistical analysis and question of interest.
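As one way to meet the "four different types of graphs" requirement, the sketch below draws a histogram, a boxplot, a scatterplot, and a bar chart with base R graphics, each with a title and axis labels and a legend where colour is used. The data are simulated and the variable names (`hours`, `score`, `major`) are hypothetical. The plots are written to a PDF in a temporary folder so that the script also runs non-interactively.

```r
# Simulated data, for illustration only.
set.seed(42)
n <- 80
toy <- data.frame(
  hours = rnorm(n, 7, 1.5),
  major = factor(sample(c("IE", "ME", "CS"), n, replace = TRUE))
)
toy$score <- 50 + 4 * toy$hours + rnorm(n, sd = 5)

# Send the plots to a PDF file, laid out as a 2 x 2 grid.
plot_file <- file.path(tempdir(), "exploratory_plots.pdf")
pdf(plot_file, width = 8, height = 8)
par(mfrow = c(2, 2))

# Histogram: shape of one quantitative variable.
hist(toy$hours,
     main = "Distribution of study hours",
     xlab = "Study hours per week", ylab = "Number of students")

# Boxplot: a quantitative variable compared across categories.
boxplot(score ~ major, data = toy,
        main = "Exam score by major",
        xlab = "Major", ylab = "Exam score (points)")

# Scatterplot: relationship between two quantitative variables,
# coloured by a categorical variable, with a matching legend.
plot(toy$hours, toy$score,
     col = as.integer(toy$major), pch = 19,
     main = "Exam score vs. study hours",
     xlab = "Study hours per week", ylab = "Exam score (points)")
legend("topleft", legend = levels(toy$major),
       col = seq_along(levels(toy$major)), pch = 19, title = "Major")

# Bar chart: counts of a categorical variable.
barplot(table(toy$major),
        main = "Students per major",
        xlab = "Major", ylab = "Count")

invisible(dev.off())
```

In the report itself each figure would normally appear on its own with the accompanying description of why it is relevant; the grid layout here is only to keep the example compact.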
Part 3. Statistical analysis: You must complete all of the tests listed in the Analysis
Objectives section. Use traditional statistical methods for every test and bootstrap
methods wherever they are listed, and compare the results of the two methods for each
test that uses both.
• You will also need to include and discuss these points for each test in your report.
– Finalize your question of interest.
– What is the statistical test you are going to use?
– What is the population parameter you would like to make inferences about?
– What is the test statistic (aka sample statistic)?
– What is your null hypothesis? State clearly in words and using correct mathematical
notation.
– What is your alternative hypothesis? State clearly in words and using correct
mathematical notation.
• Explain your choice of statistical methodology and why it is the right choice in answering
your question.
• Confirm that the requirements to use the statistical method have been met. If they are
not met, explain why they are not met and what the impact will be on your analysis.
• Provide the results of your analysis in the context of your problem, complete with
correct units and interpretation.
• You must include a histogram of the sampling distribution of your statistic along with
a description of the distribution.
• You must include a histogram of the null distribution.
• You must provide a confidence interval and interpret it.
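To make the last three bullets concrete, here is a minimal sketch of a bootstrap one-sample test of a mean, compared against the traditional t-test. It is one common way to build the bootstrap sampling distribution, the null distribution, and a percentile confidence interval, and is not necessarily the exact procedure taught in lecture. The sample `x`, the null value `mu0`, and the number of resamples `B` are made up for illustration.

```r
set.seed(6200)
x   <- rexp(40, rate = 1 / 5)  # simulated, right-skewed sample
mu0 <- 4                       # hypothesized mean under H0: mu = mu0
B   <- 5000                    # number of bootstrap resamples
obs_mean <- mean(x)

# Bootstrap sampling distribution of the sample mean:
# resample the data with replacement and recompute the mean.
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Null distribution: shift the sample so its mean equals mu0,
# then resample from the shifted data.
x_null     <- x - obs_mean + mu0
null_means <- replicate(B, mean(sample(x_null, replace = TRUE)))

# Two-sided bootstrap p-value: share of null means at least as far
# from mu0 as the observed mean is.
p_boot <- mean(abs(null_means - mu0) >= abs(obs_mean - mu0))

# 95% percentile bootstrap confidence interval for the mean.
ci_boot <- quantile(boot_means, c(0.025, 0.975))

# Traditional counterpart for comparison.
tt <- t.test(x, mu = mu0)

# Histograms of both distributions, written to a temporary PDF.
hist_file <- file.path(tempdir(), "bootstrap_histograms.pdf")
pdf(hist_file, width = 8, height = 4)
par(mfrow = c(1, 2))
hist(boot_means, main = "Bootstrap sampling distribution",
     xlab = "Resampled mean", ylab = "Frequency")
abline(v = ci_boot, lty = 2)   # percentile interval limits
hist(null_means, main = "Bootstrap null distribution",
     xlab = "Resampled mean under H0", ylab = "Frequency")
abline(v = obs_mean, lwd = 2)  # observed sample mean
invisible(dev.off())

cat("Bootstrap p-value:", p_boot, " t-test p-value:", tt$p.value, "\n")
cat("Bootstrap 95% CI:", ci_boot, "\n")
cat("t-based 95% CI:  ", tt$conf.int, "\n")
```

The same pattern extends to the other tests that require a bootstrap version by changing the statistic that is recomputed on each resample, for example a sample proportion or a difference in group means.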
Part 4. Discussion: The discussion should gracefully conclude your analysis.
• Summary of your findings.
• Implications of your findings.
• Extensions and limitations.
• Further questions, next steps.
Part 5. Appendix: The Appendix should contain a copy of all of your code. It should also
contain a bibliography that cites any resources used including the dataset source. Use proper
citation guidelines.