联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp2

您当前位置:首页 >> Matlab编程Matlab编程

日期:2021-04-02 11:40

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

HOMEWORK ASSIGNMENT 3:

STOCHASTIC OPTIMISATION FOR BINARY SVM

Guidelines

Assessment The assignment will contribute 20% to the final mark for the course. For each task, marks

will be given for correctly using Matlab to carry out the task and for your interpretation and comments

about the results. Each task will have the same weight in the assignment. Additional 2 marks will be

given for respecting the guidelines.

General style The report should be a pdf file. There should be a header page that includes your name

and matriculation number.

Length Your assignment should be NOT more than four sides of A4 including title, graphs, tables,

references, but not including the Appendix. Use a standard font that can be read when printing the report

(be also careful with the font of figure’ labels and captions).

Content The main part of your assignment should give the results for each task that you carry out and

your comments on and conclusions from these results. Figures and tables must be included in the main

report.

Appendix The report should not contain Matlab codes, but they must be included in an Appendix. The

codes can be provided as pdf and/or snapshots, and must be properly commented

Submission The assignment should be submitted through Vision. It should not be submitted by email.

Collaboration Students are encouraged to discuss the methods used with other students, but your

submitted assignment must be all your own work. In particular, copying a section of your assignment,

either plots or commentary, from another student is a serious disciplinary offence. It is also an offence to

allow another student to copy your work, so your assignment files should not be made available to other

students.

Deadline The deadline for submission is 26 March 2021; assignments may be submitted early.

Late submissions Any assignment that is submitted after the submission deadline but within 5 working

days of the deadline will be penalised by a 30% reduction in the mark. Assignments that are submitted

more than 5 working days past the deadline will not be marked.

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 1/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

Figure 1: Sample of the MNIST dataset (http://yann.lecun.com/exdb/mnist/).

Figure 2: Sample of the MNIST dataset showing handwritten digits 0 and 1.

Outline

? Stochastic proximal algorithms

? Machine Learning

? Binary classifier

1 Project description

This project aims to illustrate stochastic proximal methods in the context of machine learning. You will

need to implement a stochastic gradient forward-backward (FB) algorithm with Matlab, to train a binary

support vector machine (SVM) classifier. To help with this task, you will also implement the standard

FB algorithm.

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 2/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

Binary SVM

The objective in binary regression is to be able to correctly predict a binary label (e.g. yes vs. no, or 0

vs. 1, etc.). Binary classification aims to pair an input x ∈ R

N to a binary output y ∈ {?1, +1}, via a

model of the form

y = dw(x) = sign

x

>w



, (1)

where > denotes the transpose operation, and w ∈ R

N are weights that need to be learned. Precisely,

the supervised learning task consists in finding w such that for a given input, the output will be correctly

predicted by the classifier. The classifier is trained on a training set of pairs

S =



(x`

, y`)16`6L |(?` ∈ {1, . . . , L}) x` ∈ R

N , y` ∈ {?1, +1}

. (2)

A sample of the first 21 images from the MNIST test set are shown in Figure 1. The full MNIST

dataset can be found at http://yann.lecun.com/exdb/mnist/, it contains 60000 training images of size

28 × 28, and 10000 testing images of size 28 × 28, all representing handwritten digits between 0 and 9.

A binary classifier will aim to provide a binary answer to a question (e.g. “yes” vs. “no”, or 0 vs. 1).

For this data set, the classifier could be trained, e.g., to recognise the digit 0 among the other digits. A

simplified version would be to only look at 0 and 1 digits, and classify the digits depending if they are 0

or 1. A sub-sample of the MNIST test set is given in Figure 2, only showing digits 0 and 1.

Minimisation problem

A classical approach for learning the vector of weights w is to define it as a solution to

minimise

w∈RN

g(w) + h(w), (3)

where g ∈ Γ0(R

N ) is a regularisation function, and h ∈ Γ0(R

N ) is the loss function. In the context of

sparse learning, g can be chosen to be an `1 norm:

(?w ∈ R

N ) g(w) = λkwk1, (4)

where λ > 0. Regarding h, a classical choice is the Huber loss. Let

(?w ∈ R

N ) h(w) = 1

L

X

L

`=1

φ(y` ? x

>

` w), (5)

with φ ∈ Γ0(R) is the Huber function defined as

(?u ∈ R) φ(u) = (

1

2

u

2

if |u| 6 δ,

δ


|u| ? 1

2

δ



otherwise.

(6)

Matlab toolbox

The file Main.m contains the main commands to implement the binary classifier.

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 3/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

Datasets. In this project, you will handle two datasets, both obtained from the MNIST dataset.

? dataset subMNIST.mat: Dataset that only contains digits 0 and 1.

? dataset MNIST.mat: Full MNIST dataset.

The variables X test and X train are tensors of dimension Nx × Ny × I and Nx × Ny × L, respectively,

where Nx = Ny = 28 and I (resp. L) is the total number of images contained in the testing set (resp.

training set). For instance, the i-th image in X test is given by X test(:,:,i). The variables Y test and

Y train are vectors of dimension L containing labels in {?1, +1}. The datasets have been preprocessed

such that the labels correspond to +1 if the associated the digit is 0, and the labels are ?1 if the associated

digit is not 0. For instance, for the dataset dataset subMNIST.mat, the first digit in X test (i.e.

X test(:,:,1)) is 1, so the the first coefficient in Y test (i.e. Y test(1)) is ?1. The second digit X test(:,:,2)

is 0, so the the second coefficient Y test(2) is +1.

Binary classifier model. You will aim to train a classifier to determine whether a hand-written digit is

a 0 or not. You will use a linear model of the form (1). This model in Matlab is implemented as follows:

where x is the image to be tested to check whether it is the digit 0 or not, and w is the variable that

needs to be trained.

In order to test the full training set with the learned variable w, the Matlab function d test will be

used, defined as follows:

To determine the percentage of errors done by the classifier, one can compute

E(w) = 100 ×

1

2

PI

i=1 |dw(xi) ? yi

|

I

, (7)

where (xi)16i6I are the testing images, and (yi)16i6I are the associated labels (= +1 if the associated

digit is a 0, and = ?1 if it is not a 0).

Additional information. Further details are provided below.

? op norm.m: val = op norm(A, At, im size)

– Compute the squared spectral norm of operator A

– A and At: operator and its adjoint. Must be defined as functions.

– im size: size of the input, e.g. if A applies to a variable of size [Nx,Ny], then im size=[Nx,Ny].

? huber.m: h = huber(x, delta)

– Compute the Huber function φ defined in (6).

– x can be either a scalar or a vector.

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 4/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

– delta: scalar value corresponding to the parameter δ in (6).

– If x is a vector of size L, then the output of the function is a vector of size L as well.

? huber grad: g=huber grad(x,delta)

– Compute the gradient of the Huber function φ defined in (6).

– x can be either a scalar or a vector.

– delta: scalar value corresponding to the parameter δ in (6).

– If x is a vector of size L, then the output of the function is a vector of size L as well.

– This function must be updated.

? TO BE COMPLETED: The parts of the code to be completed (or changed) are highlighted with

comments, e.g.:

2 Tasks

1. Before designing a stochastic gradient forward-backward to train a classifier, you will first build

a classical forward-backward algorithm to solve (3) (without stochastic gradient). The questions

below will drive you for this task.

(a) [1 Mark]

Let X = (x`)16`6L ∈ R

N×L be the matrix containing in the `-th column the input x`

. Let

Y = (y`)16`6L ∈ R

L.

Show that

(?w ∈ R

N ) h(w) = 1

L

?(Y ? X

>w) (8)

where ?: R

L → R is defined as

(?u = (u`)16`6L ∈ R

L

) ?(u) = X

L

`=1

φ(u`), (9)

with φ defined in (6).

(b) [2 Mark]

What is the gradient of function h? What is the Lipschitz constant β of its gradient?

(c) [1 Mark]

What is the proximity operator of function g?

(d) [1 Mark]

Give the FB iterations to solve (3).

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 5/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

2. [2 Marks]

You will now implement the FB algorithm with MATLAB.

? In file main.m, compute compute the Lipschitz constant of h:

? In file FB.m update the first lines corresponding to the different parameter, functions and

operators involved to run the FB algorithm to solve (3):

– gamma: step-size for FB

– g and h: compute functions g and h given in equations (4) and (5), respectively.

– grad =@(w): gradient of the smooth function, computed at w. To compute this gradient,

you also need to update the file huber grad.m, with function g=huber grad(x,delta).

– prox =@(w, T): proximity operator of the proximable function, computed at w, with

parameter T.

? In file FB.m update the FB steps:

3. [2 Marks]

Run the FB algorithm to solve problem (3), with both the two MNIST datasets. For each dataset,

comment on the results, and give a figure showing the percentage of error of the classifier obtained

during the training process, as a function of time.

4. For this task, you will aim to build a stochastic gradient forward-backward (SGFB) algorithm to

solve (3).

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 6/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

(a) [1 Mark]

For every k ∈ N, let Lk ? {1, . . . , D}, and Lk = ] Lk. Let XLk = (x`)`∈Lk ∈ R

Lk×N and

YLk = (y`)`∈Lk ∈ R

Lk .

In the context of SGFB, give the approximated gradient of h, only involving the functions

and operators associated with indices Lk.

(b) [1 Mark]

Give the SGFB iterations to solve problem (3), assuming that, at each iteration k ∈ N, a

subset Dk ? {1, . . . , D} of the D functions is randomly selected.

(c) [1 Mark]

State clearly the convergence conditions.

5. [2 Marks]

You will now implement the SGFB algorithm with MATLAB.

? In file FB sto.m, update the first lines corresponding to the different parameters, functions,

and operators involved to run the SGFB algorithm to solve (3):

They are all the same as for FB, apart from the computation of the approximated gradient,

that should only involve the functions/operators associated with the selected indices Ind.

? In file FB sto.m update the FB steps:

Additional information:

? In file Main.m, you will need to choose the parameters p and theta:

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 7/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

– p: parameter with values in ]0, 1], corresponding to the proportion of indices d ∈

{1, . . . , D} that are randomly selected at each iteration.

– theta: parameter to control convergence speed that must appear in the choice of the

step-size.

Remark: In this task, the main difficulty is to build the approximated gradient, hence finding a

suitable Matlab syntax is part of the question.

6. [2 Marks]

Run the SGFB algorithm for the MNIST dataset only containing 0 and 1 digits, considering the

following parameters: p ∈ {1‰, 1%, 10%, 50%} and θ ∈ {0.6, 0.8, 1}.

Give tables (see below) with (i) the percentage error when applying the learned classifier to the test

sets, (ii) the total time necessary to train the classifier, and (iii) the final objective value of h + g.

You may want to run the algorithm multiple times to take an average of both percentage error and

convergence time.

Remark: If you prefer, you can provide figures.

Percentage of error

θ \ p 1‰ 1% 10% 50%

0.6

0.8

1

Convergence time

θ \ p 1‰ 1% 10% 50%

0.6

0.8

1

Final objective value

θ \ p 1‰ 1% 10% 50%

0.6

0.8

1

7. [2 Marks]

Comment on the Matlab results obtained for the previous questions.

Based on your comments, choose the best values for p and θ and train the classifier with the SGFB

algorithm on the full MNIST dataset. Comment your results.

Remark: The problem being convex, the minimum objective value should be the same for all

methods. It may happen that you observe that it is not the case. This might be due to the considered

stopping criterion.

Keeping this in mind, you can use the results obtained from the classical FB algorithm as ground

truth results.

Do not change the stopping criterion, only acknowledge this fact if you observe it in your simulations.

[Additional 2 Marks] If guidelines are respected.

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 8/9

Dr Audrey Repetti

HOMEWORK ASSIGNMENT 3

(Heriot-Watt MSc Course B31XN)

Matlab help

To obtain average values in Task 6, one can use loops in Matlab to automatically change parameters and

run multiple times an algorithm:

HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 9/9


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。