
Date: 2022-04-19 10:39

COMP3310 2022 - Assignment 2: An annoying web-proxy


This assignment is worth 15% of the final mark

It is due by 23:55 Friday 22 April AEST (note: Canberra time, GMT+10)

Late submissions will not be accepted, except in special circumstances

o Extensions must be requested as early as possible before the due date, via the course

convenor, with suitable evidence or justification.

If you would like feedback on particular aspects of your submission, please note that in the

README file within your submission.

This is a coding assignment, to enhance and check your network programming skills. The main focus is

on native socket programming, and your ability to understand and implement the key elements of an

application protocol from its RFC specification.

Assignment 2 outline

A web-proxy is a simple web-client and web-server wrapped in a single application. It receives requests

from one or more clients (web-browsers) for particular content URLs, and forwards them on to the

intended server, then returns the result to your web-browser - in some form. How is this useful?

It can cache content, so the second and later clients to make the same request get a more rapid

response, and free up network capacity.

It can filter content, to ensure that content coming back is ‘safe’, e.g. for children or your home,

or for staff/their computers inside an organisation.

It can filter requests, to ensure that people don’t access things they shouldn’t, for whatever

policy reasons one might have.

It can listen to requests/responses and learn things, i.e. snoop on the traffic. Getting people to

use your proxy though is a different challenge...

o And of course it can listen to and modify requests/responses, for fun or profit.

For this assignment, you need to write a web proxy in C, Java or Python [1], without the use of any

external web/http-related libraries (html-parsing support is ok though).

Your code MUST open sockets in the standard socket() API way, as per the tutorial exercises. Your code

MUST make appropriate and correctly-formed HTTP/1.0 (RFC1945) or HTTP/1.1 enhanced requests (to a

web-server, as a client) and responses (to a web-browser, as a server) on its own, and capture/interpret

the results on its own in both directions. You will be handcrafting HTTP packets, so you’ll need to

understand the structures of requests/responses and key HTTP headers.
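As a rough illustration of the server-facing half, the following Python sketch opens a listening socket with the plain socket() API and reads one request up to the blank line that ends its headers. The function names, the 127.0.0.1:8080 address and the respond callback are assumptions made for this example; they are not part of the assignment.

```python
import socket

def open_listener(host="127.0.0.1", port=8080):
    """Create a TCP listening socket (address and port are example values)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts while testing.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener

def read_request_head(conn, limit=65536):
    """Read until the blank line ending the headers (accepts CRLF or bare LF)."""
    data = b""
    while b"\r\n\r\n" not in data and b"\n\n" not in data and len(data) < limit:
        chunk = conn.recv(4096)
        if not chunk:  # client closed the connection
            break
        data += chunk
    return data

def handle_client(conn, respond):
    """Read one request head and send back whatever respond(head) returns."""
    head = read_request_head(conn)
    conn.sendall(respond(head))

def serve_forever(listener, respond):
    """Accept one browser connection at a time, as in the marking setup."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            handle_client(conn, respond)
```

In a real proxy, respond would forward the request to the web-server and return the (possibly modified) response bytes.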

Wireshark will be helpful for debugging purposes. The most common trap is not getting your line-ending

‘\r\n\r\n’ right on requests (HTTP lines end in CRLF, not a bare ‘\n’), and this is rather OS and language-specific. Remember to be conservative in

what you send and reasonably liberal in what you accept.
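A minimal sketch of a hand-built request, assuming Python: every line ends in an explicit CRLF and the header block is closed by an empty line, so the result does not depend on the platform's newline convention. The names build_request and fetch, the Connection: close header and the 10-second timeout are choices made for this example only.

```python
import socket

CRLF = "\r\n"

def build_request(host, path="/", extra_headers=None):
    """Return the bytes of a simple HTTP/1.0 GET request."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "Connection: close"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # One CRLF ends the last header, a second one ends the header block.
    return (CRLF.join(lines) + CRLF + CRLF).encode("iso-8859-1")

def fetch(host, path="/", port=80, timeout=10.0):
    """Send the request and read the whole response until the server closes."""
    upstream = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    upstream.settimeout(timeout)
    try:
        upstream.connect((host, port))
        upstream.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = upstream.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        upstream.close()
    return b"".join(chunks)
```

Comparing the bytes from build_request with what wget or curl sends, in Wireshark, is a quick way to spot line-ending mistakes.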

[1] As most high-performance networking servers, and kernel networking modules, are written in C with other

languages a distant second, it is worth learning it. But, time is short. If you want to use another language (outside

of C/Java/Python), discuss with your tutor – it has to have native socket access, and somebody has to be able to

mark it.


What your successful and highly-rated proxy will need to do:

1. Act as a proxy against a famous website, http://comp3310.ddns.net/

a. That website is not yet fully operational; an announcement will be made when it is. It

will be an approximate mirror site of the Australian National Botanic Gardens site.

2. Rewrite (simple) absolute URL links that originally pointed to the website to now point to your

proxy, so all subsequent requests (to our website) also go via your proxy.

a. Sometimes links are not written in pure <a href="..."> style, e.g. they are calculated

within javascript, and we will accept those breaking, after checking.

3. Modify the text content, by replacing every instance of the word “the” in the body text with

the word “eht” in bold (i.e. in html you write <b>eht</b>)

4. Log (print to STDOUT):

a. The timestamp of each request

b. Each client-request that comes into your proxy, as received (‘GET / HTTP/1.0’, etc.)

i. Don’t log other headers

c. Each server-status-response that comes back (200 OK, 404 Not found, etc.)

i. Don’t log other headers

d. A count of the modifications made to that page by your proxy, counting text changes

and link rewrites separately (i.e. return two labelled numbers)
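The rewriting, text replacement and counting steps above could be sketched as follows in Python. This is an illustration under stated assumptions, not a reference solution: the proxy address http://localhost:8080, all function names and the log layout are invented for the example, only the lower-case word "the" is matched, and a regular-expression split stands in for real HTML parsing, so unusual markup (for example a bare "<" inside a script) may confuse it.

```python
import re
import time

UPSTREAM = "http://comp3310.ddns.net"
PROXY_BASE = "http://localhost:8080"   # assumed address of the running proxy

TAG_SPLIT = re.compile(r"(<[^>]*>)")   # splits into text, tag, text, tag, ...
WORD_THE = re.compile(r"\bthe\b")      # whole word, lower-case only (assumption)

def rewrite_links(html, upstream=UPSTREAM, proxy=PROXY_BASE):
    """Point absolute href links at the proxy; returns (html, link_count)."""
    pattern = re.compile(
        r"""(href\s*=\s*["']?)""" + re.escape(upstream) + r"""(?=[/?#"'\s>])""",
        re.IGNORECASE,
    )
    return pattern.subn(lambda m: m.group(1) + proxy, html)

def replace_the(html):
    """Replace 'the' with bold 'eht' outside tags; returns (html, text_count)."""
    parts = TAG_SPLIT.split(html)
    count = 0
    skip_until = None  # closing-tag prefix while inside <script> or <style>
    for i, part in enumerate(parts):
        if i % 2 == 1:  # odd indices hold the tags themselves
            lowered = part.lower()
            if skip_until is None:
                for name in ("script", "style"):
                    if lowered.startswith("<" + name):
                        skip_until = "</" + name
            elif lowered.startswith(skip_until):
                skip_until = None
        elif skip_until is None:
            parts[i], n = WORD_THE.subn("<b>eht</b>", part)
            count += n
    return "".join(parts), count

def modify_page(html):
    """Apply both changes; returns (html, text_changes, link_rewrites)."""
    html, link_rewrites = rewrite_links(html)
    html, text_changes = replace_the(html)
    return html, text_changes, link_rewrites

def format_log(request_line, status_line, text_changes, link_rewrites):
    """Build one log line: timestamp, request line, status, two labelled counts."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (f"[{stamp}] {request_line} -> {status_line} | "
            f"text changes: {text_changes}, link rewrites: {link_rewrites}")
```

For example, print(format_log("GET / HTTP/1.0", "200 OK", 2, 1)) writes a single line to STDOUT. Links to other hosts are left untouched because only URLs that start with UPSTREAM are matched.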

We will test this against the specified website, by running your code, opening our web browser, making

a top-level (‘/’) page request to your running proxy as if it were the server and we should get back our

remote homepage, modified suitably [2]. Any (simple) links we click on that page in our browser should

take us back to your proxy and again through to the site for that next page, and so on. We’re not going

to go too deep; there are some overly complex pages, so we will just pick a few. There will be only one

client-browser at a time running against your proxy. Note, this is an interactive process, you’re not

caching or otherwise storing modified pages.

You should only need to manage the HTML pages (check the Content-Type header) for modifications.

Any non-HTML content (e.g. images, JS, CSS, etc.) from the site can be passed through unchanged. Don’t

forget to capture and pass through all the headers in the request, as the site may require at least the

HTTP/1.1 ‘Host:’ header.
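One possible way to decide whether a response needs modifying is sketched below in Python: split the raw response into status line, headers and body, then test the Content-Type header. The helper names are hypothetical. The sketch assumes the whole response is already in memory and is not chunk-encoded. It also recomputes Content-Length, since a modified body will usually have a different size.

```python
def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = []
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return lines[0], headers, body

def get_header(headers, name, default=""):
    """Case-insensitive header lookup; returns the first match."""
    for key, value in headers:
        if key.lower() == name.lower():
            return value
    return default

def is_html(headers):
    """True when Content-Type says text/html (parameters such as charset ignored)."""
    return get_header(headers, "Content-Type").lower().startswith("text/html")

def rebuild_response(status_line, headers, body):
    """Re-serialise the response with a Content-Length matching the new body."""
    kept = [(k, v) for k, v in headers if k.lower() != "content-length"]
    kept.append(("Content-Length", str(len(body))))
    head = "\r\n".join([status_line] + [f"{k}: {v}" for k, v in kept])
    return head.encode("iso-8859-1") + b"\r\n\r\n" + body
```

Responses for which is_html returns False can be relayed to the browser byte for byte.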

For efficiency you can also use persistent http/tcp connections, but beware of connection-timeouts.
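On a persistent connection the server does not close the socket after each response, so the end of a body has to be found another way, typically from the Content-Length header, and a read that stalls needs a timeout. A hedged Python sketch follows; recv_exact and the 5-second default are assumptions for illustration.

```python
import socket

def recv_exact(sock, length, initial=b"", timeout=5.0):
    """Read until `length` bytes are held, or the peer closes or goes quiet.

    `initial` carries any body bytes already received along with the headers.
    """
    sock.settimeout(timeout)
    data = initial
    try:
        while len(data) < length:
            chunk = sock.recv(min(4096, length - len(data)))
            if not chunk:  # peer closed the connection early
                break
            data += chunk
    except socket.timeout:
        pass  # connection went idle; hand back whatever arrived
    return data
```

The caller should compare len(result) with the expected length and, if it falls short, treat the connection as dead and reconnect rather than reuse it.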

Submission and Assessment

You need to submit your source code, and an executable (where appropriate). If it needs instructions to

run, please provide those in a README file. Your submission must be a zip file, packaging everything as

needed, and submitted through the appropriate link on Wattle.

There are many existing web-proxying/caching tools and libraries out there, many of them with source.

While perhaps educational for you, the assessors know they exist and they will be checking your code

against them, and against other submissions from this class.

[2] Most browsers support the direct configuration of a proxy address, but the behaviour can be a bit inconsistent,

so we’re trying to avoid that.


Your code will be assessed on [with marks% available]

1. Output correctness [40%]

o The http queries it sends to the server on behalf of the client-browser

o The modified server-content and http packaging it returns to the client-browser

o The ability for users to follow links in their browser

o The log of requests/responses as above.

2. Performance [20%]

o A great proxy should be perfectly transparent, not causing any significant delays.

o How easy the code is to run, using a standard Linux environment (like the CS Labs, WSL)

3. Code correctness, clarity, and style [40%]

o Use of native sockets, writing own HTTP sender/receiver messages

o Documentation, i.e. comments and any README - how easily can somebody new pick

this up and modify it.

There can be gnarly html pages, with embedded scripts and links in image-maps, and other tricks, as

well as many references to other websites. We’ll be relaxed about broken links in some cases, and you

should not proxy links or requests that are not on the comp3310.ddns.net website.

You should be able to test your code against any HTTP-based website you like, although a lot of sites use

HTTPS now, or have complex html/js pages that can make parsing harder. Wireshark is very helpful to

check behaviours of your code against browsers or command line tools like wget/curl. Your tutors can

help you with advice (direct or via the forum) as can fellow students.
