machine-learning-hw4: annotate nnCostFunction.m @ 1:42b6020b2fdb

Do regularised cost function

author   | Jordi Gutiérrez Hermoso <jordigh@octave.org>
date     | Fri, 11 Nov 2011 14:13:51 -0500
parents  | 395fc40248c3
children | e09973b9190f
function [J grad] = nnCostFunction(nn_params,
                                   input_layer_size,
                                   hidden_layer_size,
                                   num_labels,
                                   X, y, lambda)
##NNCOSTFUNCTION Implements the neural network cost function for a two-layer
##neural network which performs classification
##   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, ...
##   hidden_layer_size, num_labels, X, y, lambda) computes the cost and
##   gradient of the neural network. The parameters for the neural network
##   are "unrolled" into the vector nn_params and need to be converted back
##   into the weight matrices.
##
##   The returned parameter grad should be an "unrolled" vector of the
##   partial derivatives of the neural network.
##

## Reshape nn_params back into the parameters Theta1 and Theta2, the
## weight matrices for our 2-layer neural network
Theta1 = reshape (nn_params(1:hidden_layer_size * (input_layer_size + 1)),
                  hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape (nn_params((1 + (hidden_layer_size
                                  * (input_layer_size + 1))):end),
                  num_labels, (hidden_layer_size + 1));

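## (For reference, the "unrolled" vector is just the column-major
## concatenation of the two matrices: an optimiser would typically be
## handed [Theta1(:); Theta2(:)], which is exactly what the two reshapes
## above undo.)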
## Setup some useful variables
m = rows (X);
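## Column of ones used to prepend the bias unit to each layer's input.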
one_vec = ones (m, 1);

Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

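## Feedforward in one expression: prepend the bias column to X, map
## through the hidden layer, prepend a bias column to the result, and
## map through the output layer. ht is then the m x num_labels matrix of
## hypothesis values h_theta(x) for every example and label.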
ht = sigmoid ([one_vec, sigmoid([one_vec, X]*Theta1')]*Theta2');

## This is a bit tricky. In order to avoid expanding the y entries
## into those useless 0-1 vectors (why represent the same data with
## more space?), instead we use bsxfun together with an indexing
## trick. Recall the long form of the cost function
##
##           / -log( h_theta(x))      if y == 1
##   cost = {
##           \ -log(1 - h_theta(x))   if y != 1
##
## thus the indices formed with bsxfun pick out the entries of ht that
## are the first form for this label or not the first form for this
## label. Then everything just gets added together.
##
## Note that although the bsxfun does generate the 0-1 logical matrix
## of the y's, it's useful that it's a logical matrix because
## internally the indexing with a logical matrix can be done faster.
## Also, logical indexing returns vectors, so the double summations
## get flattened into a single summation.
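## As a tiny illustration of those masks (hypothetical values:
## num_labels = 3 and y = [2; 1]), bsxfun (@eq, 1:3, [2; 1]) yields the
## logical matrix
##
##   [0 1 0
##    1 0 0]
##
## so ht(mask) extracts the hypothesis value for each example's true
## label, and the @ne mask extracts the complementary entries. Note that
## the statement below is a single expression: each trailing backslash
## continues it across the blank and comment lines that follow.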
J = -(sum (log (ht(bsxfun (@eq, 1:num_labels, y)))) \
     + sum (log (1 - ht(bsxfun (@ne, 1:num_labels, y)))))/m \

## The regularisation term has to exclude the first column of the Thetas,
## because we don't regularise the bias nodes.
    + lambda*(sum (Theta1(:, 2:end)(:).^2) \
              + sum (Theta2(:, 2:end)(:).^2))/(2*m);

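## Backpropagation is not implemented at this revision, so the unrolled
## gradient returned here is still all zeros.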
grad = [Theta1_grad(:) ; Theta2_grad(:)];

endfunction
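## Note: sigmoid is a helper provided elsewhere in the exercise, not in
## this file. A minimal sketch of what it is assumed to compute:
##
##   function g = sigmoid (z)
##     g = 1 ./ (1 + exp (-z));  ## elementwise logistic function
##   endfunction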