Date: 2018-12-15 10:45

Fall 2018 IS542 Final

Due Tuesday December 18, 5:00PM US Central Time

Discuss two or more of the following questions, in your own words. You may choose to address any two, three, four, or even all questions but should target 3-4 pages of text in total (not counting figures, tables, and references). Upload your answers to the final section of the class Moodle page as a single narrative document in PDF format. You may, and are encouraged to, illustrate your answers using R, but that's no substitute for lucid natural language explanations. To preserve the natural flow of the narrative, figures and tables should be embedded into the document near their first mention. Any supplementary files like code or data should be referenced in the text and separately uploaded. You may use books, articles, notes, search engines, or computers, but may not solicit or receive direct assistance from other human beings. Cite sources if you use them. For the first three questions you may want to illustrate technical detail using R, and discuss both practical aspects that are important for applications and theoretical aspects of the subject.

Question 1. Construct a dataset with at least 8 observations and 3 variables (y, x1, and x2) such that least squares linear regression of y versus x1 produces y = -2x1 + e1 and regressing y versus x1 and x2 produces y = 2x1 - x2 + e2. How might you interpret the relationship between y and x1? Show your work in R.
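
One possible construction for Question 1, offered as a minimal sketch rather than the expected answer: build x2 so that it rises about 4 units per unit of x1, and keep the remaining pieces mutually orthogonal so both fits come out exact. The vectors r and e and the factor 4 are illustrative choices, not part of the assignment.

```r
# Orthogonal polynomial contrasts for 8 equally spaced points:
# each vector sums to zero and is orthogonal to the other two.
x1 <- c(-7, -5, -3, -1,  1,  3,  5, 7)  # linear
r  <- c( 7,  1, -3, -5, -5, -3,  1, 7)  # quadratic (illustrative choice)
e  <- c(-7,  5,  7,  3, -3, -7, -5, 7)  # cubic (illustrative choice)

x2 <- 4 * x1 + r            # x2 is strongly and positively related to x1
y  <- 2 * x1 - x2 + e / 10  # e is orthogonal to the constant, x1, and x2

d <- data.frame(y, x1, x2)
fit1 <- lm(y ~ x1, data = d)       # intercept 0, slope -2
fit2 <- lm(y ~ x1 + x2, data = d)  # intercept 0, slopes 2 and -1

round(coef(fit1), 10)
round(coef(fit2), 10)
```

The sign flip arises because x2 is omitted from the first model: the marginal slope combines the direct effect with the effect carried through x2, 2 + (-1)(4) = -2. Holding x2 fixed, y increases with x1.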

Question 2. Write a short essay, in your own words, explaining the four assumptions of linear regression and show how to test them on a dataset of your choice. Show your work in R.
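
A minimal sketch of how such checks might look in base R. It assumes the four assumptions are the commonly listed ones (linearity, independence of errors, constant error variance, normality of errors); the assignment does not name them. The built-in cars dataset and the specific tests are illustrative choices.

```r
fit <- lm(dist ~ speed, data = cars)  # built-in data: stopping distance vs speed
e   <- resid(fit)
n   <- length(e)

# 1. Linearity: does adding a quadratic term improve the fit significantly?
fit_quad    <- update(fit, . ~ . + I(speed^2))
linearity_p <- anova(fit, fit_quad)[2, "Pr(>F)"]

# 2. Independence: Durbin-Watson statistic. Values near 2 suggest no
#    first-order autocorrelation; only meaningful if the row order matters.
dw <- sum(diff(e)^2) / sum(e^2)

# 3. Constant variance: Breusch-Pagan-style statistic, n * R^2 from regressing
#    the squared residuals on the predictor, compared to chi-square with 1 df.
aux_data <- data.frame(e2 = e^2, speed = cars$speed)
aux      <- lm(e2 ~ speed, data = aux_data)
bp       <- n * summary(aux)$r.squared
bp_p     <- pchisq(bp, df = 1, lower.tail = FALSE)

# 4. Normality of residuals: Shapiro-Wilk test.
sw <- shapiro.test(e)

c(linearity_p = linearity_p, durbin_watson = dw,
  bp_p = bp_p, shapiro_p = sw$p.value)
```

Formal tests are best read alongside diagnostic plots; calling plot(fit) draws the standard residual plots for the same model.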

Question 3. Write a short essay, in your own words, on the subject of Bayes' theorem, and illustrate its use in an application of your own making.
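
As a small worked illustration of the arithmetic behind Bayes' theorem, the sketch below computes the probability of having a condition given a positive test result. The scenario and all three input numbers are hypothetical.

```r
prior       <- 0.01  # P(condition): hypothetical prevalence
sensitivity <- 0.95  # P(positive | condition), hypothetical
specificity <- 0.90  # P(negative | no condition), hypothetical

# Law of total probability: overall chance of a positive result
p_positive <- sensitivity * prior + (1 - specificity) * (1 - prior)

# Bayes' theorem: P(condition | positive)
posterior <- sensitivity * prior / p_positive
round(posterior, 4)  # 0.0876
```

Even with a fairly accurate test, the posterior stays below 9% here because the condition is rare, so false positives outnumber true positives.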

Question 4. R challenge. During the last class session we worked with the circle.arff dataset, assessing the cross-validated performance of a wide variety of classification algorithms such as decision trees, random forest, rules, support vector machine, Naïve Bayes, Bayes Net, logistic regression, neural net, k-nearest neighbor, and boosting. Replicate some of these experiments using R.

http://abel.lis.illinois.edu/data/circle.arff
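
The cross-validation procedure itself can be sketched in base R. The layout of circle.arff is not described here, so the example below uses simulated stand-in data (points labeled by whether they fall inside a circle, with a little label noise); the column names, the seed, and the helper cv_accuracy are all hypothetical. The real file could be loaded with read.arff from the foreign package instead.

```r
set.seed(542)  # arbitrary seed, for reproducibility only
n   <- 400
sim <- data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1))
inside    <- sim$x^2 + sim$y^2 < 0.5
flip      <- runif(n) < 0.05               # 5% label noise avoids perfect separation
sim$class <- as.integer(xor(inside, flip))

# k-fold cross-validated accuracy of a logistic regression.
# Assumes the 0/1 response column is named "class" (hypothetical helper).
cv_accuracy <- function(formula, data, k = 10) {
  fold    <- sample(rep(seq_len(k), length.out = nrow(data)))
  correct <- logical(nrow(data))
  for (i in seq_len(k)) {
    test <- fold == i
    fit  <- glm(formula, data = data[!test, ], family = binomial)
    p    <- predict(fit, newdata = data[test, ], type = "response")
    correct[test] <- (p > 0.5) == (data$class[test] == 1)
  }
  mean(correct)
}

acc_linear <- cv_accuracy(class ~ x + y, sim)
acc_quad   <- cv_accuracy(class ~ x + y + I(x^2) + I(y^2), sim)
c(linear = acc_linear, quadratic = acc_quad)
```

On data like this, plain logistic regression has no linear boundary to find, while adding squared terms lets it represent a circular one. The same fold loop can wrap other classifiers by swapping the fit and predict calls.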

Question 5. R challenge: The data directory contains a file with author names and associated Ethnea and Genni predictions. Use logistic regression to identify character n-grams of first and/or last names that may help predict the Ethnea categories. It might be helpful to install and use an R package such as tm that is able to extract character n-grams. Classification performance can be assessed using precision and recall for each Ethnea ethnicity category, and the classes that are most similar can be identified using the confusion matrix.

Full dataset:

http://abel.ischool.illinois.edu/data/names_ethnea_genni_country.csv

A smaller random sample of the full dataset is given here:

http://abel.ischool.illinois.edu/data/names_ethnea_genni_country_sample.csv
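
Two building blocks for Question 5 can be sketched in base R without loading the data: extracting character n-grams from a name, and computing per-class precision and recall from a confusion matrix. The helper char_ngrams, the class labels A, B, C, and the toy predictions are hypothetical and are not taken from the Ethnea file.

```r
# Character n-grams of a single string (hypothetical helper)
char_ngrams <- function(s, n = 2) {
  s <- tolower(s)
  if (nchar(s) < n) return(character(0))
  starts <- seq_len(nchar(s) - n + 1)
  substring(s, starts, starts + n - 1)
}

char_ngrams("Torvik", 3)  # "tor" "orv" "rvi" "vik"

# Toy labels standing in for actual and predicted categories
lev       <- c("A", "B", "C")
actual    <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"), levels = lev)
predicted <- factor(c("A", "A", "B", "B", "B", "C", "A", "C"), levels = lev)

# Confusion matrix: rows are actual classes, columns are predicted classes
cm <- table(actual = actual, predicted = predicted)

precision <- diag(cm) / colSums(cm)  # correct / all predicted as that class
recall    <- diag(cm) / rowSums(cm)  # correct / all truly in that class
round(rbind(precision, recall), 3)
```

Large off-diagonal cells of cm point to the pairs of classes that are most often confused with each other.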

References:

Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927

