联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp2

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2021-03-09 11:00

Computer Analysis

and Visualisation

Assignment 1

Tweet Analysis

Worth: 5% of the unit

Submission: Answer the questions on the quiz server.

Deadline: 11 March 2021 5pm

Late submissions: late submissions attract 5% raw penalty per day up to 7 days (i.e., 18 March 2021 5pm). After, the mark

will be 0 (zero). Also, any plagiarised work will be marked zero.

1. Outline

Natural language processing (NLP) is useful yet a difficult task. Our UWA Cybersecurity Research Group

has been focusing on rumour detection and generation in order to prevent rumours causing harm to the

society. As a first step, we built an Automated Rumour Generation Hub (ARGH) that uses various machine

learning (ML) and NLP techniques to generate rumours that are difficult to be identified by both humans

and machines. In particular, Twitter has been used as the source dataset as we often observe different

rumours circulating this social media platform. However, they don't provide analytical functions for us to

summarise the data, so we have to do that ourselves.

In this assignment, you will be carrying out simple data analysis tasks using tweets as outlined is the

Tasks section below, mostly just to test your basic Excel competency. More complex tasks will be carried

out in other assignments (stay tuned!).

Note1: This is an individual assignment, please don't share your solution/code/files with others (only

high-level discussion is allowed, e.g., the syntax of the formula, use of array formula with other examples

etc.). If it is found to be not your original work, then you may be penalised.

Note2: You may use intelligent formatting and colour combinations to display your worksheet in an

understandable manner. However, don't "pimp" up the worksheet.

Note3: You can find ARGH here: https://github.com/argh-rumor-detection/ARGH-Rumor-Generation,

where you can run ARGH yourself using Google Colab.

CITS 2401

Computer Analysis

and Visualisation

2. Tasks

Task 1

Import the original.txt into excel word by word. Here the term "word" refers to any sequence of letters

separated by a space. Note, the text qualifier should be set to {none} when you import the text. This

sheet should be named words_data. Finally, the whole data range should be named words. Figure 1

shows the example output of what it would look like if this task is done correctly.

Figure 1. words_data sheet snippet.

Task 2

Create a new sheet named uniques_data. Import the list of unique words from the uniques.txt file

provided. The words should be located from Cell A1. The whole range should be named uniques.

Task 3

1. In Column B: Calculate the frequency of the unique words from the words_data sheet. You must

use an array formula to do this. Name the cell range as freq.

2. In Column C: Calculate the number of letters used for each word from the words_data. This can

be calculated by simply multiplying the number of letters by its frequency count. Name the cell

range as letters.

3. In Column D: Calculate the rank based on the frequency values. You must use an array formula

to do this. Name the cell range as rank.

In addition, apply conditional formatting on rank where the bottom 10 ranked values (i.e., the 10

smallest values) are formatted with light red filled with dark red text.

Task 4

Create a new sheet named stats. Add the following columns From A2 to A7:

CITS 2401

Computer Analysis

and Visualisation

1. Average

2. Max

3. Min

4. Median

5. Mode

6. SD

Note, SD stands for standard deviation. Also, Average and SD should be rounded to 2 decimal places.

Next, add labels as follows (Cell: Value):

1. B1: Frequency

2. C1: Letters

3. D1: >Average

4. E1: <Average

? The "Frequency" category (Column B) is using freq to calculate the statistics of the data.

? The "Letters" category (Column C) is using letters.

? The ">Average" category (Column D) uses letters where values are greater than the average (i.e.,

value in cell C2).

? The "<Average" category (Column E) uses letters where values are smaller than the average.

Populate all the statistics fields (i.e., B2:C7). You must not use any other supporting cells (i.e., you

should calculate all those stats directly using Excel formulas using previously populated cells only).

Note: the values in the uniques_data tab (i.e., freq and letters) should be treated as the entire

population.

Task 5

Create a new sheet named charts. In this sheet, create a histogram of letters. The bin size should

start from 0 and the gap between the bins are 20. For this, you must use an array formula.

Then, format the chart as follows:

1. The gap width is set to 0%.

2. The series outline is solid black line.

3. The series data labels are set to "Outside End".

4. The title is removed.

Task 6

In "charts" sheet, create a scatter chart using freq in x-axis and letters in y-axis. Then, format the

chart as follows:

1. Remove the title.

2. Add a linear trend line and display the R

2 value.

3. Add x-axis label "Frequency".

4. Add y-axis label "Letters".

The sample image of the charts sheet is provided in Figure 2, and your solution may look similar to this.

However, the data shown in the image is sample data and is not the correct result (I.e., your figures

may look different). The yellow blocks are added to the image to hide the sample data to avoid confusion.

Task 7

Insert a Treemap of uniques and letters into the "charts" sheet. Remove the title and the legend.

Change the size to 15cm height and 25cm width.

CITS 2401

Computer Analysis

and Visualisation

Figure 2. Sample image of the charts sheet (treemap not shown)

3. Submission

You should answer the questions related to the tasks above on the quiz server by the due date - 11 March

2021 5pm (drop dead due date 18 March 2021 with 5% raw penalty per day).

Submit your Excel workbook on the quiz server, you should name the file as A1_[student id].xlsx.

For example, if your student ID is 12345678, then your file name is A1_12345678.xlsx.

Fail to follow this will result in a penalty of 50%.

CITS 2401

Computer Analysis

and Visualisation

4. Rubrics

Criteria Highly Satisfactory (D, HD) Satisfactory (P, CR) Unsatisfactory (N)

Excel functions

(10 marks)

? Understand various

Excel functions.

? Demonstrate the

ability to carry out

various Excel

functionalities and

tools.

Demonstrated the ability to use

Excel functions fluently:

? Correct use of Excel functions

as appropriate.

Demonstrated the ability

to use Excel functions:

? Some correct uses of

Excel functions.

Failed to demonstrate

the ability to use Excel

functions:

? Incorrect use of

Excel functions.

Excel formulas

(20 marks)

? Understand the use of

Excel formulas.

Demonstrated the ability to

utilise Excel formulas fluently:

? Correct use of Excel formulas

most adequate for the

problem.

? Comprehensive understanding

of Excel formulas and their

usage.

Demonstrated the ability

to utilise Excel formulas:

? Correct use of Excel

formulas for the

problem.

? Understanding of

Excel formulas and

their usage.

Failed to demonstrate

the ability to utilise

Excel formulas:

? Incorrect use of

Excel formulas.

? Misunderstanding

of Excel formulas

and their usage..

Excel visualisation

(20 marks)

? Understand the use of

Excel visualisation

tools.

Demonstrated the ability to

visualise using Excel:

? The visualisation generated is

accurate and comprehensive.

Demonstrated the ability

to visualise using Excel:

? The visualisation

generated is accurate.

Demonstrated the

ability to visualise using

Excel:

? The visualisation

generated is not

accurate.

This assignment is worth a total of 50 marks.


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。