联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2021-10-07 11:04

The University of Melbourne

School of Computing and Information Systems

COMP90038 Algorithms and Complexity

Assignment 2, Semester 2 2021

Released: Monday September 27 2021. Deadline: Wednesday October 13 2021 23:59.

This assignment is marked out of 30 and is worth 15% of your grade for COMP90038.

Objectives

To improve your understanding of the time complexity of algorithms and data structures. To design computational

solutions to problems and develop problem-solving skills. To improve written communication

skills; in particular the ability to present algorithms clearly, precisely and unambiguously.

Scenario

In this assignment we consider a data center that receives requests for webpages and returns back the

requested pages. The data center receives millions of requests and needs to answer them with minimal

delay. As a result, it stores millions of webpages across servers to ensure it can answer requests in parallel.

The data center has two o?erings on hosting sites with di?erent guarantees on availability and response

time. The first one is aimed at critical web applications (e.g., hosted by hospitals and banks) and the

second one for less critical applications that can sustain a small delay. In this assignment you will design

data structures and algorithms for answering webpage requests eciently

over time.

Note that you can work on each problem separately as each problem is independent of others even if

the notation and the scenario is shared. However, it is advisable to read all problems before commencing.

Problem 1 [8 points]

Each request r arrives with the following information represented as a tuple: r.arrival time, r.webaddress,

r.is critical and r.source, where

? r.arrival time is the arrival time in milliseconds (represented as integers);

? r.webaddress is the address of the requested page;

? r.is critical is the o?ering that was chosen by the owner of webaddress and can either be True to

denote a critical address and False otherwise;

? r.source denotes where the request was made from and where the webpage should be sent back to.

More than one request can arrive at the same time. Hence, every new request that comes has arrival time

that is either the same or higher than arrival time of previous requests. More than one request can arrive

for the same page. However, if more than one request arrives for the same page, their r.source will be all

distinct. Note that is critical is the same for all requests with the same webaddress as it depends on the

webaddress only.

A dispatcher program processes the requests, stores them, decides which request to process next based

on their priorities. In order to determine which request to answer next, the dispatcher chooses the request

depending on its arrival time and its is critical value. Any request r1 with is critical of True is answered

before a request r2 with is critical of False, if r1 arrives at most 5 milliseconds after r2. Ties are broken

arbitrarily.

1

A. [3 points] Suppose you have access to is higher priority(r1, r2) function that takes two requests r1

and r2 and returns True if r1 should be answered before r2 given the priority rule described above,

and False otherwise. Given is higher priority function one can use a heap to insert a new request and

eject the next request. We refer to this as Approach 1.

Consider Approach 2 that does not use is higher priority. Instead it uses two heaps: one heap to

maintain requests to critical pages (i.e., those with is critical set to True) and one heap for requests

to non-critical pages.

Let m0 and m1 be the number of requests to critical and non-critical webpages, respectively. In

the worst case, what is the number of comparisons that have to be performed by each of the

approaches for inserting and ejecting a new request. Give your answers using asymptotical notation.

Is Approach 1 or Approach 2 more superior asymptotically? Argue your answer.

B. [4 points] As users, we tend to start refreshing our browsers if a webpage is not loading. This results

in a duplicate request coming to a data center with a new arrival time. We call a request rnew a

duplicate of rold if their webaddress and source fields are the same. To avoid creating a duplicate

request, we wish to update an existing request instead. However, the update to a new request time

also results in the update of the priority of this request.

Write a pseudo-code for a function HeapUpdate(H, rold, rnew) that updates a data structure H of

Approach 1 so that rold is replaced with rnew and the data structure is adopted to reflect the change

in the priority. You can assume that rold is in H. When operating on heap, assume that H is stored

as an array, starting at index 1 as defined in Lecture 13 (slides “Heaps as Array”).

(No mark: Think whether this is this a good idea to update requests based on their new time.)

C. [1 point] The data center has k servers to improve the response time and hence can answer up to

k requests in parallel. Consider the following incorrect algorithm for finding top k requests using

Approach 1: the algorithm returns elements H[1], H[2],...,H[k].

Give two examples of heaps with number of requests m

5 and k

3. In the first example,

the above algorithm would return an incorrect top k requests. In the second example, the above

algorithm would return the correct top k requests.

In your examples, assume that is critical is True in all requests. Your heaps should be given using

array notation, e.g., H = [ , a, b, c], defines a heap with root a, left node b and right node c. You

only need to indicate the value of arrival time in your heap and ignore all other fields.

cen

Problem 2 [12 points]

Once the dispatcher finds the next request to answer, it needs to decide which server to send the request

to. Webpages are replicated across the servers to improve the throughput of the system as some webpages

are more popular than others and receive more requests.

Each server has an ID, however server IDs are not sequential. Every webaddress is associated with

lo and hi which signify the range of server IDs that have a replica of the corresponding webpage. Each

webaddress can be answered by any one of the servers whose ID falls in the range [lo, hi]. For example, if

there are in total 5 servers with IDs 1, 3, 7, 8, 10, and www.myaddress.com has lo and hi set to 3 and 8,

respectively, then either server 3, 7 or 8 can answer the request for www.myaddress.com.

Servers are stored in a balanced binary search tree rooted at T. You can assume that each node of the

tree has fields T.key, T.left, T.right for accessing the server ID, and nodes on the left and right of node T,

correspondingly. T is null if the node is empty.

2

A. [4 points] Given T, lo and hi, devise an algorithm FindServerBounds(T, lo, hi) that returns two server

IDs k1 and k2 in T representing the smallest and highest server ID that fall in the range [lo, hi].

That is, if S are all the server IDs in the tree, then lo ? k1 ? k2 ? hi, k1, k2 2 S and there is no k0

1

s.t. k0

1 2 S and lo < k0

1 < k1 and no k0

2 2 S such that k2 < k0

2 < hi.

Your task here is to implement the FindServerBounds(T, lo, hi) algorithm. Only algorithms with

optimal time complexity may receive full marks.

B. [1 point] If the number of server IDs stored in the BST is k, what is the asymptotic performance

of FindServerBounds(T, lo, hi)? Show your derivations.

C. [7 points] We now will consider a setting where the system administrator could secure k servers

with sequential IDs. The servers are stored in an array S. Hence, any server stored at S[lo],

S[lo + 1], S[lo + 2], ..., S[hi

1], S[hi] could answer a webpage request. However, some servers may

be more busy than others. We wish to send the current request to the least busy server among

S[lo], S[lo + 1],..., S[hi]. We define the number of current requests assigned to a server, as the load

of that server. Let L be an array that stores the load associated with each server in S (the size

of S and L are the same), such that L[j] stores the number of current requests assigned to the

server S[j]. A naive way to find the server with the smallest number of requests is by traversing

L[lo], L[lo + 1],..., L[hi] and finding the minimum. However, this would incur linear cost.

In this question you are asked to write pseudo-code of a function FindLeastLoadedServer(S, L, lo, hi)

that returns the ID of the least busy server in the range between lo and hi. Your algorithm has

to use the knowledge that no address is assigned to more than (log k)2 servers. Your algorithm

should run in time O(log k) where k is the total number of servers.

Hint: consider precomputing an auxiliary data structure of minimums for some predefined ranges

that you can later use to answer other range queries. That is your solution can consist of two

functions: one function that does the precomputation to generate an auxiliary data structure and

FindLeastLoadedServer that uses this data structure later on when answering a query. Your precomputation

function can take time O(k) or O(k log k).

Problem 3 [10 points]

In this problem, we assume that each webaddress is assigned to only one server, there are k servers in

total and each server is assigned a unique ID between 1 and k.

Once in a while a system administrator notices that some servers become overloaded with requests and

have a low response time as a result. At this point the administrator is required to reassess the assignment

of webaddresses to servers so that popular webaddresses are distributed among multiple servers as opposed

to being loaded to one server.

Information about each server is stored in a tuple s that contains the following fields:

? s.p, the number of addresses it is assigned to.

? s.A, an array of addresses assigned to it. The array size is s.p.

? s.R, an array of requests that an address in s.A has received such that s.R[i] stores the number of

requests of address s.A[i], for each 1 ? i ? s.p. The array size is s.p.

All the above server information is stored in an array W of size k. That is W[j], for 1 ? j ? k, returns a

tuple s as defined above for a server with ID j.

Consider the following request balancing algorithm, ReBalanceServers. It creates a new array W0 of k

servers with no addresses assigned to them at the beginning. It then goes sequentially over old servers

3

in W, one server at a time starting at index 1, and reassigns each webaddress from this old server, one

webaddress at a time starting at index 1, to one of the new servers. It selects the least request-loaded new

server when doing so (breaking ties arbitrarily). At the beginning of the process, the load of each new

server is 0. It is then increased by the number of requests of the webaddresses assigned to it (as given

in s.R[i]). At the end of the algorithm, all addresses stored in W are assigned to W0

. At this point, the

content of W can be overwritten with W0

.

A. [2 points] Consider the following 9 addresses and their request numbers that exist in the system

represented as a tuple (address, number of requests):

(a1.com, 5), (b1.com, 20), (c1.com, 30),

(d.com, 50), (a2.com, 5), (b2.com, 20),

(c2.com, 30), (b3.com, 20), (c3.com, 30).

For example, address b3.com received 20 requests.

An optimal solution to balancing the above requests assigns each server 70 requests in total. Below

is one optimal assignment that achieves this:

? server 1 is allocated (b2.com, 20), (d.com, 50)

? server 2 is allocated (a1.com, 5), (c1.com, 30), (c2.com, 30), (a2.com, 5)

? server 3 is allocated (b1.com, 20), (c3.com, 30), (b3.com, 20)

For 1 point, give an example of how these addresses had to be assigned to original servers in W so

that the balancing algorithm ReBalanceServers would output this optimal assignment.

For 1 point, give another example of how these addresses had to be assigned to to original servers

such that ReBalanceServers algorithm would end up assigning d.com, b1.com and c1.com all to one

server, and hence leading to servers being unbalanced.

You answer for each example should be of the form:

? server 1 is allocated ....com, ....com, ...

? server 2 is allocated ...

? server 3 is allocated ....

If a server is not assigned to any address, put null.

B. [6 points] Write the pseudo-code of ReBalanceServers(W) algorithm, defining any data structures

that you are using. Your algorithm should run in time O(w log k) where w is the number of all

addresses in the system. Only algorithms with this time complexity, justifying why/how their

pseudo-code achieves it, will receive full marks.

Note: You can use any sorting algorithm or data structure that we have studied in the lectures,

without implementing them. If you choose to use a data structure, you are asked to state what it

is in the comments. For example, if T is a binary search tree please add

// T is a BST

If a data structure or algorithm you choose to use relies on comparing two elements, you need to also

provide a function cmp(x1, x2) that describes the pseudo-code for comparing x1 and x2. cmp(x1, x2)

returns 0, if x1 = x2, 1 if x1 < x2 and -1, otherwise. You are asked to then capture it as

// T is a BST using comparator function cmp defined below/above

4

C. [2 points] Describe how ReBalanceServers algorithm can be modified to avoid giving a non-optimal

assignment as described in Problem 3.A. Your solution should be within four sentences in English

and no pseudo-code is needed. If your solution exceeds this limit, only the first four sentences will

be assessed.

Submission and Evaluation

? You must submit a PDF document via the LMS.

Note: handwritten and scanned images, are not acceptable.

Do not submit Microsoft Word documents — if you use Word, create a PDF version for submission.

? Marks are primarily allocated for correctness, but elegance of algorithms and how clearly you communicate

your thinking will also be taken into account. Where indicated, the complexity of algorithms

also matters.

? We expect your work to be neat – parts of your submission that are dicult

to understand or

decipher will be deemed incorrect. Make sure that you have enough time towards the end of the

assignment to present your solutions carefully. Time you put in early will usually turn out to be

more productive than a last-minute e?ort.

? You are reminded that your submission for this assignment is to be your own individual work.

For many students, discussions with friends will form a natural part of the undertaking of the

assignment work. However, it is still an individual task. You should not share your answers (even

draft solutions) with other students. Do not post solutions (or even partial solutions) on social

media or the discussion board. It is University policy that cheating by students in any form is not

permitted, and that work submitted for assessment purposes must be the independent work of the

student concerned.

Please see https://academicintegrity.unimelb.edu.au

If you have any questions, you are welcome to post them on Piazza so long as you do not reveal

details about your own solutions. You can also email Olya (email can be found on LMS). In your message,

make sure you include COMP90038 in the subject line. In the body of your message, include a precise

description of the problem.

Late Submission and Extension

Late submission will be possible, but a late submission penalty will apply of 3 marks (10% of the assignment)

per day.

Extensions will only be awarded in extreme/emergency cases, assuming appropriate documentation is

provided – simply submitting a medical certificate on the due date will not result in an extension.

5


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp