Date: 2024-03-31 04:19

RMHI/ARMP Assignment 2024

Hello everyone! This is the description for the assignment, which is due on Canvas on Monday

April 15, 2024 before 08:00am Melbourne time. You’ll need to submit a Word-knitted version of

the completed R Markdown file found in this zip file, according to the following instructions:

1. Rename the document called pset1.Rmd as studentID-pset1.Rmd. (Replace studentID with your

student ID number). This is your R Markdown file, where you’ll be putting all your code and

answers.

2. Replace “Your name and ID goes here” in the header of the R Markdown file with your name and

student ID. (Keep the quotes or it won’t knit properly.)

3. While we encourage collaboration in tutorials and learning in general, you should not be

collaborating with anybody AT ALL for this assignment. That means sharing code privately

or publicly; even talking in the abstract about problems will effectively be collusion.

You should be completing it independently, with no help from any other person in any capacity. Of

course, as always, you are free to use any of the resources from the class to help you, and you're

also free to google or look anything up that you like (as long as you aren't asking anybody,

including discussion boards or AIs, questions related to this assignment). Note that we do look at

places like chegg and will follow up if anything from this problem set is posted there.

4. Plagiarism check is enabled and you can check the similarity report on your submission. In

previous years we have found people who tried to cheat, so please don’t risk it! That said,

understand that we will not be naively looking at the overall % figure: with this sort of assignment a

certain amount of overlap is inevitable, so don’t worry if you get what looks like a high % score as

long as you know you didn’t plagiarise or collude. With this sort of assessment, that % overlap is

higher than essays and the like. We will be using the plagiarism check for the parts of the

assignment where we'd expect some variability, and to give a general sense of the overall gestalt.

5. Complete all of the problems below in the R Markdown document. Do not remove any of the

arguments to the code chunks, like the names of the code chunks or where it says message=FALSE

or whatever. If a problem asks you to display a tibble or variable so it shows up in the knitted

version, make sure that you do, as the marker cannot evaluate it without seeing it, and if they

can't see it then they won’t be able to award you points for it! Remember that to display a tibble (or

any variable) you just type its name on a line of its own within the R chunk, or use print().

6. We've structured this so that, as much as possible, questions do not build on each other.

That means that if, say, you can't get Q5 then you can still get Q6. Try to do all of them.

7. Go for partial credit! Many of these questions have some form of partial credit possible. What

that means is that if it is asking for some R code, break down the problem into pieces. Even if you

can only do some of the pieces, or do them part of the way, that will be worth something. [Note that

there is no question-by-question rubric available because designing one would mean giving away

the answers. In general we will give full credit for responses that correctly address all of the parts of

the question.] Short answer questions (SAQs) can also be given partial credit and are generally

asking for some thoughtful interpretation. If it is based on a previous graph or test you've done, if

you did the first part wrong but discuss it well, you can still get most or all points for the SAQ part.

If your code does not run but you want to include it for possible partial credit, just comment it out

(using the # sign) or type eval=FALSE in the R chunk so that it shows up in the knitted document

but R does not try to run it. If you include a lot of commented-out code and some is correct and

some isn’t, we will not give you credit for the commented-out code; put the thing in there that you

think is the closest to the correct answer, don’t just include everything.

8. We are not overly worried about what decimal place you round answers to, and

you will not lose credit for this unless you round so much that your answer is impossible to discern

(e.g., don’t round p-values to the nearest integer!), or unless it is specifically instructed by the

question. Similarly, you will not lose points for trivial presentation things like using parentheses

instead of commas around statistical references, as long as it’s clear. That said, for those who want a

guideline, we suggest that you follow APA format or round p-values to three decimal places,

degrees of freedom to one, and test statistics and probabilities to two. (Note: this problem set

doesn’t incorporate all of these things, this is just our standard guideline).

9. Some questions specify a word count. In that case you need to either calculate it from the knitted

document or type up your answer in Word[1] and then cut and paste it into the R Markdown file.

(Please put your answer in between the word ANSWER and [Word count: XX]; needless to say,

those two bits do not count towards your word count.) We know that's annoying; sorry. Anything

else we thought of, like specifying a number of sentences or having no limit, was worse in terms of

equity across students. The word counts we've specified in each question are designed to give you a

guideline about the maximum number of words you should need to answer completely and correctly.

So don’t feel like you must use all of the words; if you can answer it fully with less, that’s fine. In

fact, the total word count for the solution set I wrote up is around 1070, so it’s possible to fully

answer the questions while going substantially under the word limit. That said, it is okay to go over

the word limit for individual questions as long as the total word count for all of the questions

combined is fewer than 1320 words (i.e., fewer than 1200+10%, with the standard penalty if it is

1200+10% or over. See the student manual for details on word count penalties).

10. There is no word count for code chunks. Word count only applies to the short answer questions

as indicated. Remember to report your total word count for the assignment as a whole at the

top of the document. Your total word count is the sum of the word counts for all of the SAQs.

11. You'll be turning in the knitted output of your R Markdown file. We prefer that you knit to

Word but if you can't get Word to knit then HTML is okay. In the worst case, you can turn in the

completed Rmd file. I highly, highly recommend that you knit as you go: (a) knitting can

identify problems in your code that you would have otherwise missed; and (b) you do not want to

get close to the deadline and think you’re done only to find that you’re having trouble knitting.

Save yourself the panic and knit often.

12. Similarly, you can turn in the assignment multiple times before the deadline, so I

strongly encourage you to turn it in even before it’s perfectly polished. We will automatically mark

the latest submitted assignment. Submitting often will save you last-minute panic or computer

issues. Also, take a screenshot for proof of having turned it in just in case you need it. If you submit

a corrupted file or the wrong assignment that is not grounds for waiving any late penalties; it is

your responsibility to make sure that the submission is correct. If you run into last-minute

computer issues and can’t even succeed in uploading an Rmd, email us (rmhiarmp@unimelb.edu.au) your assignment as soon as possible to demonstrate that it was done at

that time. We cannot make promises about whether you will receive any late penalties if you do

this, but if you don’t, you very probably will get penalised because we have no way of knowing if the

problems were genuine.
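
As a rough illustration of the points above about displaying output, keeping non-running code, and rounding, here is a minimal sketch. The data frame `toy` and the values `p_value` and `test_stat` are made up for illustration; they are not part of the assignment data, and the rounding simply follows the suggested guideline rather than any required format.

```r
# Hypothetical toy data, NOT the assignment's tibble `d`.
toy <- data.frame(
  name  = c("A", "B", "C"),
  score = c(7.25, 8.5, 9.125)
)

# Typing a name on a line of its own displays it in the knitted output.
toy

# print() does the same thing explicitly.
print(toy)

# Code that does not run can still be shown by commenting it out:
# toy$score <- toy$score + "one"   # would fail: non-numeric argument

# Rounding along the lines of the suggested guideline (made-up values).
p_value   <- 0.04567
test_stat <- 2.34567
p_rounded    <- round(p_value, 3)    # p-values: three decimal places
stat_rounded <- round(test_stat, 2)  # test statistics: two decimal places
p_rounded
stat_rounded
```

The alternative to commenting out is to leave the code as it is and add `eval=FALSE` to the chunk header, so the code appears in the knitted document without being run.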

[1] We know different software calculates word count in slightly different ways, so we are using Word as the

standard, as per the guidelines in the student manual.
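
The word limit arithmetic can be checked with a short tally. This is only a sketch: `suggested` holds the suggested word counts that appear with the short answer questions in this handout, while `my_counts` is a hypothetical set of per-question counts invented for the example. The counts themselves should still come from Word, as the footnote says.

```r
# Suggested word counts as listed with the short answer questions.
suggested <- c(Q2c = 100, Q3c = 100, Q4c = 90, Q5c = 100, Q6b = 120,
               Q7b = 50, Q7c = 130, Q8 = 80, Q9b = 100, Q9c = 130,
               Q10a = 100, Q10b = 100)

base_limit <- sum(suggested)               # 1200
hard_limit <- base_limit + base_limit / 10 # 1200 + 10% = 1320

# Hypothetical per-question counts (made up for illustration).
my_counts <- c(Q2c = 95, Q3c = 100, Q4c = 85, Q5c = 110, Q6b = 120,
               Q7b = 40, Q7c = 125, Q8 = 80 - 5, Q9b = 100, Q9c = 120,
               Q10a = 90, Q10b = 95)

# Individual answers may exceed their suggestion (Q5c does here);
# what matters is that the total stays under the hard limit.
total <- sum(my_counts)
within_limit <- total < hard_limit

total
within_limit
```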

Talent Show!

Our friends in Bunnyland are starting to get upset and angry at each other, so in an effort to have

some fun and promote bonding, they all decide to have a talent show. They decide to have two

different levels: a fun one where people just do their talent, and a competitive one where there are

judges giving 1st, 2nd, and 3rd place trophies. There are also lots of different kinds of talents and

some rules for participation, explained in the description of the dataset below.

The nerds of the group (ahem, Shadow) decided to keep track of how it went. This data can be

found in the tibble d, which has been loaded for you in the R Markdown document. Each row is a

person, and Table 1 below describes the columns.

The Markdown also loads a few other tibbles. dd contains additional data and will be explained in

Q4; you don’t need it before then. There are a few other tibbles (e.g., d3b, d6) which will be

explained on the questions where they are relevant and you can ignore until then.

Q1 [8% of total mark]

(a) Use the table() function to determine how many performances there were for each type of

talent at each level. Make sure the table shows up in the knitted Markdown. You don’t need to

report anything else or assign the table to a variable.

(b) Change the order that the talents show up in the table. We have not taught you how to do this

but the very first chunk in the Markdown contains code that changes the order of the level variable

in d, so you just need to adapt that code and apply it to the talent variable. The new order should be

the same order as the talent variable description in Table 1. Now use the table() function to

display how many performances there were for each talent (don’t split by level this time). You don’t

need to assign the table to a variable but make sure the output of the table() function shows up in

the knitted Markdown. Which talent was most common, and how many performances of it were

there?

(c) Rename the kind variable to species and use the head() function to make sure that only the top

rows of d are visible in the knitted document. (Note: we have not taught you how to rename

variables, you will need to google around yourself to figure out how to do this. It can be done with

one function but if you code it in another way, as long as it works and your code comments make it

clear that you understand what it does and how, it is possible to earn full marks).

Q2 [11% of total mark]

(a) Use baseR only (i.e., only things you were taught before Week 3) to keep only the people who

won 1st or 2nd and achieved an audience rating of 8 or more. You don’t need to assign the result to

any tibble (and don’t write over the existing d!) but your output should look like the screenshot

below when it is knitted. (Don’t worry if the order of the rows/columns is different, but there

should be the same number of rows and columns and they should have the same values).

(b) Use function(s) from tidyverse that you were taught in Week 3 to accomplish the same task as

in part (a): keep only the people who won 1st or 2nd and achieved an audience rating of 8 or more.

As before, you don’t need to assign the result to any tibble (and don’t write over the existing d!).

Your output should look like the screenshot below when it is knitted. (Don’t worry if the order of

the rows/columns is different, but there should be the same number of rows and columns and they

should have the same values).

(c) You will notice that (b) and (a) do not match. Why? Answer in terms of what exactly the

relevant part of baseR code is doing and how that is different from what exactly the relevant

tidyverse code is doing. Note that you don’t need to discuss all of the components of your code, just

the parts that are relevant to explaining the difference between (a) and (b).

[Suggested word count: 100]

(d) Use baseR only (i.e., only things you were taught before Week 3) to create output that matches

the screenshot in (b). As before you don’t need to assign the result to any tibble, just make sure that

the output when knitted looks like (b). (Don’t worry if the order of the rows/columns is different).

Q3 [12% of total mark]

(a) Use a single tidyverse function you were taught to remove the judge and audience columns

from d and assign the result to a new tibble called dshort. Make sure that the top rows of dshort

are visible in the knitted Markdown.

(b) Use tidyverse function(s) you were taught in Week 3 to transform dshort so that it looks like

the tibble in the screenshot below. (Don’t worry if the order of the rows/columns is different, but

there should be the same number of rows and columns with the same values). Assign the result to a

new tibble called d2. Make sure that the top rows of d2 are visible in your knitted Markdown.

(c) Why did we have you perform the transformation in (b) using dshort instead of d? In other

words, what happens if you were to do it on d, and why does this happen? You do not need to show

any code or output to get full marks on this question but you can if you want to. If you do, be sure

to refer to the code or output in your answer so it is clear why/how it is relevant.

[Suggested word count: 100]

(d) Use your d2 tibble to determine if anybody broke either of the two rules of the talent show that

are explained in the description for level in Table 1. For each rule, you should include code that

identifies individuals that broke this rule – don’t just look at the tibble manually to find them. In

your answer, be sure to list everyone who broke a rule along with what rule(s) they broke. If you did

not succeed in creating d2 in part (b), you can use the tibble called d3b that has already been

loaded for you.

Q4 [7% of total mark]

(a) Change d so that the order of the name variable in it is alphabetical. Make sure that the top

rows of d are visible in the knitted Markdown.

(b) One of the tibbles that has already been loaded for you is called dd. It contains the same data as

d in the columns name, level, and talent (i.e., the same people and performances) but contains a

new variable. A full explanation of the variables in dd is shown in Table 2.

Combine d and dd together using the function full_join(). We have not taught you this function

so you will need to use your investigative skills to look it up and play around with it until you have

figured it out. Assign the combined dataset to a new tibble called d_full, and make it so the top

rows of d_full show up in the knitted Markdown. It should look like the screenshot below (rows

may be in a different order, but the column order, column names[2], size of the tibble, and data in

each cell should be the same).

(c) The code given in the chunk here combines two tibbles by using the function cbind() rather

than the function full_join(). The output has been assigned to a tibble called dc whose output in

the console is shown below. Based on a comparison of dc and d_full, describe two major

differences between what cbind() and full_join() do, making clear reference to the parts of the

tibbles that illustrate each difference. Finally, explain why these differences have occurred: how

exactly cbind() combines tibbles that is different from how full_join() combines tibbles.

[Suggested word count: 90]

[2] Note that if you did not succeed in Q1(c) in renaming kind to species, your tibble here will have a column

called kind instead. That is fine; you will only be penalised for this in Q1(c) and can still obtain full marks in

Q4(b).

Q5 [15% of total mark]

(a) A tibble has been loaded for you called df, which is the same as d_full. We are providing you

with df here in case you weren’t able to create d_full in Q4(b). Use the mutate() function along

with case_when() to make a new character variable in df called durType. [Note: We have not

taught you case_when()]. The value of durType is "long" if duration is more than 10, "short" if it is

less than 5, and "medium" otherwise. Be sure to show the top of df in the knitted Markdown.

(b) Using only functions we have taught you, use df as the basis to create the tibble shown in the

screenshot below. Assign it to the name ds, and make sure ds is visible in your knitted Markdown.

Helpful hint: all of the variables are calculated from the audience variable. medAud indicates the

median, and the others are self-explanatory.

(c) Based on the data in ds, what talent is the least popular based on the mean audience ratings,

and what is the least popular based on median audience ratings? Why do the mean and median

ratings for these give different results? Your answer should refer to the idea of central tendency

that both mean and median each capture, and it should explain the discrepancy by relating this

idea to the actual talent show data.

[Suggested word count: 100]

Q6 [12% of total mark]

(a) Make a bar plot like the one below using the d6 tibble, which has been loaded for you. For full

credit, your figure should have all the components in the figure below (i.e., two panels, semitransparent bars, dots, error bars, title, angled x-axis tick labels, three y-axis tick labels, etc.). Note

that your individual data points will not be in exactly the same place as here because the geom

introduces randomness; that is fine. The error bars should indicate one standard error. It’s fine if

your colours aren’t exactly the same (you aren’t expected to guess what palette was used) as long as

you use a sensible palette and theme, and the colours of the dots match the bars and vary as they do

here. Note that if your knitted figure has a slightly different aspect ratio that is fine, as long as all of

the elements are present and correct; different systems knit figures in slightly different ways.

(b) Based on the graph in 6(a), describe any trends or regularities in performance that you observe.

This is not an R question but rather a thought question asking you to critically think about what the

data might be demonstrating and why this might be happening (you should speculate; just make

sure to ground the speculation in the pattern of data and clearly indicate the part that is

speculative). You’re not expected to make claims about significance but think about the meaning of

the variables and discuss what (if anything) this figure might suggest about the talent show.

[Suggested word count: 120]

Q7 [11% of total mark]

(a) Make a figure of your own using any of the tibbles provided (or any that you make from them if

you want). Your goal is to show something new about the data that hasn't been shown by the

previous figure. You should use at least one geom that you didn’t use in Q6, and you also need to

incorporate two elements that you haven’t been taught in this subject. These can be anything from

new geoms, a different palette package than RColorBrewer, a different theme, changing the size or

style of your fonts, putting text inside the figure, changing aesthetic properties, or many other

possibilities; you can do basically whatever you want as long as it’s new. The figure should have an

informative title and axis labels, and a theme and colour palette other than the default. The

aesthetic choices should add to its clarity rather than detract from it; part of what you are being

marked on is if the figure illustrates the data in a clear and useful way.

(b) Explain what each of the two new elements is and how you made it. Your explanations

don’t need to be extensive – for instance, if you hadn’t already been taught show.legend you might

say “I got rid of the legend by adding show.legend=FALSE as an argument to the geom”.

[Suggested word count: 50]

(c) Explain what your figure suggests about the data. In your explanation be sure to describe the

variables on each axis (and panel, if you have multiple panels) as well as what the pattern is and

what it suggests about what is going on. (It is fine for you to say there is no pattern and it suggests

that nothing much is happening if that is what you observe!) You won’t be evaluated on how

interesting your result is, but on how clear and appropriate your explanation is given the figure.

That said, it’s worth thinking about what kinds of research questions would be interesting to look

at, since those are more likely to yield interesting patterns which are easier to discuss.

[Suggested word count: 130]

Q8 [3% of total mark]

Gladly ran a statistical test and obtained a p-value of 0.07. “That means the null hypothesis is true

according to the traditional alpha threshold of 0.05,” he explains. “However, I’m going to set my

alpha threshold to be 0.1 instead; that will make the test statistic significant, so I can conclude the

null hypothesis is false instead.” There are several distinct problems with Gladly’s idea. Explain two

of them to him. For each, be sure to be clear about what the problem is and why it is a problem.

[Suggested word count: 80]

Q9 [11% of total mark]

You are provided with a code chunk that calculates the highest and lowest audience scores in our

dataset (called highest and lowest respectively). Note also that parts (b) and (c) use the tibble that

you used in Q5 called df. Regardless of whether or not you succeeded in completing Q5, you can

use df for Q9.

(a) Bunny observes that on average, in past talent shows about 70% of the audience sample has

liked any given act. If we presume that average describes this talent show as well, what is the

probability of observing the highest score we saw? The lowest? You should answer these questions

using the function(s) taught in Week 5; you do not need to use any of the datasets themselves.

Report probabilities as percentages, rounded to one decimal place.

(b) Gladly points out that they have other data from previous talent shows as well, not just about

audience ratings. For instance, in previous years the average duration was 6.5 minutes, with a

standard deviation of 3. Shadow, inspired, writes the code given to you in the code chunk. What

does the calculated variable prob reflect? How is this related to the idea of a p-value? Is it possible

to identify which individual data points are significantly different from previous averages? If so,

which ones, and why? If not, why not?

[Suggested word count: 100]

(c) Can we draw conclusions about how significant the entire variable duration (i.e., the full dataset

of data about duration) is, based on a single calculation combining only the individual prob values?

If so, explain why. If not, explain why not and what other information is necessary. Note that you

do not need to do any calculations here; this is a thought question about Week 5 concepts.

[Suggested word count: 130]

Q10 [8% of total mark]

It’s evident from the data in Q6 that some kinds of talents have a much larger range of audience

ratings than others. For instance, the range of magic tricks is 7 (i.e., with a low rating of 3 to a high

rating of 10) while the range for singing and dancing is 4 (i.e., a low rating of 6 to a high of 10).

Foxy starts wondering what kind of range one might expect to see in a random talent show, and

how to determine if magic tricks are unusual.

Let’s help her out! Remember that one can have sampling distributions of any kind of statistic.

We’ve spent a lot of time talking about the sampling distribution of the mean, but we could also

think about the sampling distribution of the range, which applies when thinking about this

question. In this problem you will reason about this situation, by direct analogy and extrapolation

from what you’ve learned about the sampling distribution of the mean.

Foxy thinks that the true underlying distribution of the audience ratings looks something like the

figure directly below this paragraph: it’s very unlikely for 0 people to like a performance, slightly

more likely for exactly 1 person to like it, and so forth, with it being most likely that 10 audience

members like it. For the purposes of this question, let’s assume that she is correct and this is the

true distribution.

(a) Suppose talent shows become the next huge thing and as a result over the next few years there

are 1000 talent shows. Each of the 1000 shows is divided into timeslots with 30 performances

each. It is possible to calculate the range of audience rating for each of these timeslots.

Consider now the six panels U through Z below. Give the letter of the panel that most accurately

captures what you expect the sampling distribution of the range to look like, on the

assumption that the true distribution of audience ratings is as shown in the figure above. Explain

your answer, making reference to the definition of sampling distribution and the figure. Hint: begin

by thinking about what you would expect the range for a single timeslot of 30 performances to be.

[Suggested word count: 100]

(b) Suppose now that the underlying true distribution was uniform, as in the figure directly below

this sentence.

How would this change your answer to part (a), if at all? Considering the same panels U through Z,

give the letter of the panel that you would pick as being the closest answer in this case. Explain

why. How is the behaviour of the sampling distribution of the range similar to and different from

the behaviour of the sampling distribution of the mean, as the shape of the underlying true

distribution varies?

[Suggested word count: 100]

* Note: You do not need to code or do any calculations in order to answer this question. This is a

conceptual question designed to probe your knowledge about what a sampling distribution is.

Moreover, if your intuition about the nature of a range is incorrect but your explanation of sampling

distributions in general is solid, you can still get most of the partial credit.

Q11 [2% of total mark]

These marks are free as long as you say anything! What is your current theory about why everyone

in Bunnyland is going hungry? (No word limit here, say as much or as little as you want)
