IN3063/INM702: Mathematics and Programming for AI

Coursework

Submission deadlines:

Report and Code: Sunday 2nd January 2022, 5pm

Presentation: Wednesday 19th January 2022, 5pm

Introduction

This coursework builds on the material covered in the lecture slides, the classroom

presentations, and the tutorial Jupyter notebooks used in the labs. On completing this

coursework, you should be able to code your analysis in Python, to implement and understand

regression methods and classification techniques, as well as to implement advanced neural

network techniques from scratch. You will make use of the different concepts learned in the

module:

How to convert mathematical principles into algorithms

How to implement those algorithms in Python

How to organize your ideas in an appropriate code structure

How to evaluate different algorithms

Python should be used for all implementations. Deliverables are:

Written reports of your work.

Your practical implementation (code), with the appropriate comments.

For INM702 only: an individual oral presentation (15 minutes)

Module marking:

INM702: 70% Coursework (Code and Report) and 30% Presentation.

IN3063: 100% Coursework

See the Appendix for details on grade-related criteria.

Teamwork

This coursework should be completed either in groups of two or individually (at least Task

1 individually). We encourage you to work in pairs, and no additional marks will be granted if

you do the Coursework alone. If you decide to work in a pair, you should declare it in the report.

All team members are expected to contribute to all parts of the work: both the coding and the

report. Teamwork does NOT mean division of labour. You can distribute the leading role for

each assignment, but each of you must contribute to all the tasks. If you don’t, you will not be

evaluated for the tasks that you did not contribute to. Distributing the assignments is

considered a form of academic misconduct.

You are required to explain your personal contribution to each task in the coursework report,

in the reflection part.

For MSc AI students, there is a limit on the number of modules in which you can work with the

same team: no more than 2 modules per term and no more than 4 modules in total with the

same team.

The Coursework is divided into 4 different Tasks. Even if you are in a team, Task 1 should be

solved individually; no teamwork is allowed for Task 1.

Submission

Submission is through Moodle (https://moodle.city.ac.uk), and no other method of submission

will be accepted. You should submit the following files:

Report number 1: Task 1 (Individual), (maximum 3 pages. Any extra space is allowed

for citations).

Report number 2: Task 2, including a description of the work, your analysis, and a

reflection on the work indicating sources and personal contributions, if working in pairs.

(3 pages + 1 extra page containing additional space for supplemental figures. Any

extra space is allowed for citations).

Report number 3: Task 3-4, including a description of the work, your analysis, and a

reflection on the work indicating sources and personal contributions, if working in pairs.

(maximum 6 pages + 2 extra pages containing additional space for supplemental

figures. Any extra space is allowed for citations).

Zip file of the Code for Task 1 (properly commented and with references to code

sources, if any).

Zip file of the Code for Task 2 (properly commented and with references to code

sources, if any).

Zip file of the Code for Task 3-4 (properly commented and with references to code

sources, if any).

Code files should comprise: a Jupyter notebook, with executable cells showing results

and with markdown cells explaining the steps of the analysis or of the modelling

process. Random number generators with fixed seeds should be included to ensure

the reproducibility of the results as described in your report. However, note that your

results should be robust to changes of the seeds. Additionally, corresponding Python

scripts should be included, collecting all the parts of your code.
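
The seed requirement above can be illustrated with a minimal sketch, assuming NumPy's Generator API is used for random numbers. The seed value 42 and the helper name sample_cell_values are hypothetical and are not prescribed by this coursework.

```python
import random

import numpy as np

SEED = 42  # hypothetical value; any fixed integer works


def sample_cell_values(seed, count=5, upper=10):
    """Draw `count` integers in [0, upper) from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, upper, size=count)


# Seed the standard-library generator as well, in case it is used anywhere.
random.seed(SEED)

first = sample_cell_values(SEED)
repeat = sample_cell_values(SEED)       # same seed, so the same values
other = sample_cell_values(SEED + 1)    # a different seed, for robustness checks

print(first, repeat, other)
```

Re-running the analysis with a few different seeds, as with `other` here, is one simple way to check that the conclusions do not depend on a particular seed.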

In addition:

Your code must be developed and be available on a git server (GitHub), with a full

revision history indicating who has created what code. This repository should be

available to the Lecturers of the module, Atif Riaz and Daniel Chicharro. You should

provide the link of the git repository in your report.

Format for reports: PDF format, single column, standard A4 margins, standard default line

spacing of 1.15, font Arial 11, including all figures.

Late submissions will score 0. You can upload work to Moodle more than once, so there is no

need for last-minute submission. The submission period will open on December 20. Don't

leave submission to the last minute: make sure to submit something and then revise it.

Presentation (INM702 only)

You will be evaluated in an oral presentation (15 minutes). During this presentation, you will

present the results for the 4 tasks and answer questions from the Lecturers.

This is an individual exercise.

Oral presentations will take place January 19-21. A timetable with your personal spot will be

released in advance.

Feedback

In the labs we can check your progress and give formative feedback. Evaluative feedback and

marks on your coursework will be given out after the submissions. Drop-in hours are available

on the Moodle site for additional feedback and questions.

Datasets

Fashion-MNIST (https://www.kaggle.com/zalando-research/fashionmnist) – a dataset

of online retailer Zalando’s article images. It is very similar in flavour to MNIST, but

instead of handwritten digits it contains fashion/clothing items (for an overview see

https://github.com/zalandoresearch/fashion-mnist). Fashion-MNIST consists of a

training set of 60,000 examples and a test set of 10,000 examples. Each example is a

28x28 greyscale image. Each image is associated with a label from 10 different

classes representing the type of clothing item (e.g., 0: T-shirt/top, 1: Trouser, 2:

Pullover, etc.). Both training and test sets have 785 columns: the first column consists

of the class labels (0-9); the remaining 784 columns contain the pixel values of the

associated image. You can use any library or API to load the data (such as PyTorch). You

will need this dataset for Task 3.
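
The 785-column layout described above can be unpacked as in the following minimal sketch. It uses a small synthetic array in place of the real CSV file, and it assumes that the 784 pixel columns are stored row by row; the function name split_labels_and_images is hypothetical.

```python
import numpy as np


def split_labels_and_images(table):
    """Split a (n, 785) table into labels of shape (n,) and images of shape (n, 28, 28)."""
    table = np.asarray(table)
    if table.ndim != 2 or table.shape[1] != 785:
        raise ValueError("expected a 2-D array with 785 columns")
    labels = table[:, 0].astype(np.int64)       # first column: class label 0-9
    images = table[:, 1:].reshape(-1, 28, 28)   # remaining 784 columns: pixels
    return labels, images


# Synthetic stand-in for four rows of the dataset (not real data).
rng = np.random.default_rng(0)
fake_rows = np.hstack([
    rng.integers(0, 10, size=(4, 1)),      # labels
    rng.integers(0, 256, size=(4, 784)),   # pixel values
])

labels, images = split_labels_and_images(fake_rows)
print(labels.shape, images.shape)
```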

CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html) – the CIFAR-10 dataset

consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.

There are 50000 training images and 10000 test images. The dataset is divided into

five training batches and one test batch, each with 10000 images. The test batch

contains exactly 1000 randomly-selected images from each class. The training

batches contain the remaining images in random order, but some training batches may

contain more images from one class than another. Between them, the training batches

contain exactly 5000 images from each class. You will need this dataset for Task 4.

Coding

As indicated above, each task should be presented as an individual Jupyter Notebook

with a corresponding Python script and possibly additional modules and packages that

you developed and are used by the notebooks.

Code quality, clarity, organization, and comments will be taken into account in the marking.

The Tasks

In this coursework, you are expected to demonstrate what you have learned in the module

in terms of Programming, Regression methods, Neural Networks, and Deep Learning.

The maximum number of marks which can be scored is 100. Each Task is worth 25 marks.

In all tasks, you can use the built-in libraries of Python (math, random, …), numpy, and

matplotlib. If you think that you might benefit from using another library, you can ask the

Lecturers about it.

You will use PyTorch in Task 4, and you are allowed to use any library in Task 4.

Note that you can use any library for the purpose of loading the training and testing dataset of

the Fashion-MNIST for Task 3.

Task 1: 25 marks

The first task tests your Python skills and capacity to plan a statistical analysis. You need to

develop a simple game consisting of a rectangular grid (of size height x width) where each

cell has a random value between 0 and n. An agent starts at the upper-left corner of the grid

and must reach the lower-right corner of the grid as fast as possible. Accordingly, the task

consists of finding the shortest path.

There are two game modes:

The time spent on a cell is the number on this cell

The time spent on a cell is the absolute value of the difference between the number on the

previous cell the agent was on and the number on the current cell it is on
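
The two game modes can be made concrete with a minimal sketch that computes the total time of a given path. The mode names "value" and "difference", the function name path_time, and the choice not to count the starting cell in the first mode are assumptions made for illustration only.

```python
import numpy as np


def path_time(grid, path, mode):
    """Total time of `path`, a list of (row, col) cells, under the given mode.

    mode "value":      time on a cell is the number on that cell
                       (assumption: the starting cell is not counted).
    mode "difference": time on a cell is |current number - previous number|.
    """
    grid = np.asarray(grid)
    values = [int(grid[r, c]) for r, c in path]
    if mode == "value":
        return sum(values[1:])
    if mode == "difference":
        return sum(abs(b - a) for a, b in zip(values, values[1:]))
    raise ValueError(f"unknown mode: {mode!r}")


example_grid = [[1, 4, 2],
                [3, 0, 5],
                [2, 6, 1]]
example_path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]  # numbers 1, 4, 0, 6, 1

time_value = path_time(example_grid, example_path, "value")            # 4 + 0 + 6 + 1
time_difference = path_time(example_grid, example_path, "difference")  # 3 + 4 + 6 + 5
print(time_value, time_difference)
```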

The task is divided into the following parts:

-Implementation of the game. Implement the game in a structured and flexible way to allow

the selection of game modes and parameters. Build a method to build and visualize the grid

filled with random numbers. Build a method to visualise a path. (30%)

-Develop your own heuristic algorithm. Identify simple criteria and strategies to find short

paths. This algorithm should be taken as a baseline. It does not have to be optimized to

perform fast or well, but should be better than random movements. Please implement this part

without searching for standard algorithms to find short paths. (10%)

-Implement Dijkstra's algorithm to find the shortest path between two points. There are

many refined versions of this algorithm that you can find in the literature. Implement a simple

version close to the original algorithm, using a simple priority queue. Write your own code as

much as possible and provide detailed comments on each step. The objective of the task is

not to rely on more sophisticated implementations available online, but to write your

own code. (30%)

-Plan and implement a statistical analysis to characterize the length of the shortest path as a

function of several parameters of the grid, and compare the two game modes. Relevant

parameters are: size of the grid, distribution from which cell numbers are generated, etc. (30%)
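
As a rough illustration of the priority-queue idea mentioned above, the following sketch runs a Dijkstra-style search on a small grid using Python's heapq module. It is only one possible reading of the task: movement in four directions, the mode names, the function name dijkstra_grid, and not counting the starting cell are all assumptions, and it is not intended as a reference solution.

```python
import heapq

import numpy as np


def dijkstra_grid(grid, mode="value"):
    """Shortest time from the upper-left to the lower-right cell (4-neighbour moves)."""
    grid = np.asarray(grid)
    height, width = grid.shape
    start, goal = (0, 0), (height - 1, width - 1)
    dist = {start: 0}      # best known time to reach each cell
    prev = {}              # predecessor of each cell on its best path
    queue = [(0, start)]   # priority queue of (time, cell)
    while queue:
        d, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale queue entry; a shorter route was already found
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            if mode == "value":
                step = int(grid[nr, nc])
            else:  # "difference" mode
                step = abs(int(grid[nr, nc]) - int(grid[r, c]))
            new_d = d + step
            if new_d < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = new_d
                prev[(nr, nc)] = (r, c)
                heapq.heappush(queue, (new_d, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


example_grid = [[1, 4, 2],
                [3, 0, 5],
                [2, 6, 1]]
cost_value, path_value = dijkstra_grid(example_grid, mode="value")
cost_difference, _ = dijkstra_grid(example_grid, mode="difference")
print(cost_value, path_value, cost_difference)
```

Both modes give non-negative step times, which is the condition Dijkstra's algorithm needs; in the second mode the time depends on the pair of cells, so it acts as an edge weight rather than a cell weight.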

A detailed exposition of the Task and parameters will be presented in week 2.

Task 2: 25 Marks

Study several factors that affect the performance and interpretation of a simple model such as

Linear Regression analysis. The factors that will be discussed include model mismatch, the

presence of outliers in the data, the presence of hidden confounders or of selection bias, and

the presence of correlations between the covariates (multicollinearity). You will characterize

how these factors affect performance and the interpretation of the parameters in the model.

You will examine different solutions to eliminate or mitigate these effects (e.g. normalization

or transformation of the covariates). You should choose 2 of these factors to be studied in

your work. Each will contribute equally to your mark (50% each). A detailed exposition of the task

and parameters will be presented in week 4.
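
To give a flavour of one of the factors listed above, the following minimal sketch fits a straight line by ordinary least squares to noise-free synthetic data, with and without a single outlier. The data, the outlier value, and the function name fit_line are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    design = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array([intercept, slope])


x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0        # exact line: intercept 1, slope 2

y_outlier = y_clean.copy()
y_outlier[-1] = 100.0          # one corrupted observation at x = 9

coef_clean = fit_line(x, y_clean)
coef_outlier = fit_line(x, y_outlier)
print("clean fit:       ", coef_clean)
print("fit with outlier:", coef_outlier)
```

A single extreme point at the edge of the covariate range is enough to pull the estimated slope far from 2, which is the kind of effect on parameter interpretation this task asks you to characterize.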

Task 3: 25 Marks

The third task is about classifying the Fashion-MNIST dataset. A short description of the dataset

is provided in the Datasets section above.

You can use other APIs/libraries for loading the dataset, but not for the neural network

implementation. The point of this task is to develop a multi-layer neural network for

classification using mostly numpy.

- Implement sigmoid and ReLU layers (with forward and backward pass) (10%)

- Implement a softmax output layer (10%)

- Implement a fully parameterizable neural network (number and types of

layers, number of units can be changed) (20%)

- Implement an optimizer (e.g. SGD or Adam) and a stopping criterion of your

choosing (10%)

- Train your Neural Network using backpropagation (20%)

- Evaluate different neural network architectures/parameters, present and

discuss your results (30%)
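
The forward and backward passes mentioned in the list above can be sketched with numpy as follows. This is only one possible layer interface: the class names, the forward/backward method names, and the caching of intermediate values are assumptions, not a required design.

```python
import numpy as np


class ReLU:
    def forward(self, x):
        self.mask = x > 0                  # remember where the input was positive
        return np.where(self.mask, x, 0.0)

    def backward(self, grad_out):
        return grad_out * self.mask        # gradient passes only where input > 0


class Sigmoid:
    def forward(self, x):
        # Plain formula; very negative inputs may trigger overflow warnings.
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_out):
        return grad_out * self.out * (1.0 - self.out)   # sigma'(x) = sigma (1 - sigma)


def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


x = np.array([[-1.0, 2.0]])
relu, sigmoid = ReLU(), Sigmoid()
relu_out = relu.forward(x)
relu_grad = relu.backward(np.ones_like(x))
sig_out = sigmoid.forward(np.zeros((1, 1)))
sig_grad = sigmoid.backward(np.ones((1, 1)))
probs = softmax(np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]))
print(relu_out, relu_grad, sig_out, sig_grad, probs.sum(axis=1))
```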

Task 4: 25 Marks

The fourth task is about implementing a neural network using PyTorch. You will use the

CIFAR-10 dataset for this task.

- Implement a neural network (20%)

- Propose improvements (e.g. Convolutional Neural Network, dropout, etc.) (30%)

- Evaluate different parameters (20%)

- Present and discuss the results in the report (30%)

For INM702 only: you may use another dataset instead of CIFAR-10, but please check with

the Lecturers first.

Reports

Each report must have an additional first title page (not included in the page count), and as

many references as needed (not counted in the page total). Graphical illustration of your

results is expected as well as numerical results and analysis.

You should present the results clearly and provide a discussion of the results, with conclusions

related to the problems being addressed. The conclusions section might also discuss some

further work based on the results of this coursework.

Of particular importance, you should indicate in the report an estimate of the percentage of

code that you borrowed from external sources for each task, and cite them properly. This will

matter for the evaluation of your work. Failure to do so will lead to a fail mark.

Reflection

In the case of teamwork, the reflection part should address who did what.

Note

You are not only being marked on how good the results are. It matters that you try something

sensible and clearly describe the problem, methods, what you did, and your interpretation of

the results.

Coding & Referencing

This is, in large part, a coding assignment. If you use code (or other materials) written by

someone else, you must cite that code (or other material). If you do not cite work appropriately

you will have committed academic misconduct. Making superficial changes to the code does

not make it yours. You are also expected to make a coding contribution, so if you use a large

amount of code written by someone else, and cite it appropriately, your coding contribution

will be low, and your work marked accordingly in this respect.

Extenuating Circumstances

If you are not able to submit your coursework on time for unforeseen medical reasons or

personal reasons beyond your control you should contact the Programmes Office as soon as

possible and fill in an Extenuating Circumstances form. Strong evidence in the form of, for

instance, medical certificates or legal statements will have to be produced.

https://studenthub.city.ac.uk/help-and-support/extenuating-circumstances-complaints-appeals

Plagiarism

If you copy the work of others (either that of another team or of a third party), with or without

their permission, you will score no marks and further disciplinary action will be taken against

you. The same applies if you allow others to copy your work.

See more at

https://studenthub.city.ac.uk/__data/assets/pdf_file/0006/372822/6.-Referencing-and-avoiding-plagiarism_FINAL.pdf

and see

https://www.citystudents.co.uk/pageassets/advice/selfhelpguides/academicmisconduct/Academic-Misconduct-Policy-and-Guidance-1920.pdf

for general guidance on academic misconduct.
