annotate nnCostFunction.m @ 1:42b6020b2fdb

Do regularised cost function
author Jordi Gutiérrez Hermoso <jordigh@octave.org>
date Fri, 11 Nov 2011 14:13:51 -0500
parents 395fc40248c3
children e09973b9190f
function [J grad] = nnCostFunction(nn_params,
                                   input_layer_size,
                                   hidden_layer_size,
                                   num_labels,
                                   X, y, lambda)
  ## NNCOSTFUNCTION Implements the neural network cost function for a
  ## two-layer neural network which performs classification.
  ##
  ## [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size,
  ## hidden_layer_size, num_labels, X, y, lambda) computes the cost and
  ## gradient of the neural network. The parameters for the neural
  ## network are "unrolled" into the vector nn_params and need to be
  ## converted back into the weight matrices.
  ##
  ## The returned parameter grad should be an "unrolled" vector of the
  ## partial derivatives of the neural network.

  ## Reshape nn_params back into the parameters Theta1 and Theta2, the
  ## weight matrices for our 2-layer neural network.
  Theta1 = reshape (nn_params(1:hidden_layer_size * (input_layer_size + 1)),
                    hidden_layer_size, (input_layer_size + 1));

  Theta2 = reshape (nn_params((1 + (hidden_layer_size
                               * (input_layer_size + 1))):end),
                    num_labels, (hidden_layer_size + 1));

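  ## For illustration only (hypothetical sizes, not fixed by this code):
  ## with input_layer_size = 400, hidden_layer_size = 25 and
  ## num_labels = 10, Theta1 is 25x401 and Theta2 is 10x26, so nn_params
  ## has 25*401 + 10*26 = 10285 entries; the first 10025 reshape into
  ## Theta1 and the remaining 260 into Theta2. The "+ 1" in each
  ## dimension accounts for the bias column.
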
  ## Setup some useful variables
  m = rows (X);
  one_vec = ones (m, 1);

  Theta1_grad = zeros (size (Theta1));
  Theta2_grad = zeros (size (Theta2));

  ## Forward propagation: prepend the bias column, push through Theta1
  ## and the sigmoid, then do the same through Theta2 to get the
  ## hypothesis for every training example at once.
  ht = sigmoid ([one_vec, sigmoid([one_vec, X]*Theta1')]*Theta2');

  ## This is a bit tricky. In order to avoid expanding the y entries
  ## into wasteful 0-1 vectors (why represent the same data with more
  ## space?), we instead use bsxfun together with an indexing trick.
  ## Recall the long form of the cost function:
  ##
  ##          / -log (h_theta(x))      if y == 1
  ## cost = {
  ##          \ -log (1 - h_theta(x))  if y != 1
  ##
  ## Thus the indices formed with bsxfun pick out the entries of ht that
  ## take the first form for this label, or those that do not. Then
  ## everything just gets added together.
  ##
  ## Note that although the bsxfun does generate the 0-1 logical matrix
  ## of the y's, it is useful that it is a logical matrix, because
  ## internally indexing with a logical matrix can be done faster.
  ## Also, logical indexing returns vectors, so the double summation
  ## gets flattened into a single summation.
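  ## A tiny worked example of the indexing trick (hypothetical values):
  ## with num_labels = 3 and y = [2; 1],
  ##
  ##   bsxfun (@eq, 1:3, y)  =>  [0 1 0
  ##                              1 0 0]   (logical)
  ##
  ## so ht(bsxfun (@eq, 1:3, y)) picks out, in column-major order,
  ## ht(2,1) and ht(1,2): exactly the hypothesis entries whose column
  ## matches each row's label. The complementary @ne mask picks out all
  ## the remaining entries of ht.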
  J = -(sum (log (ht(bsxfun (@eq, 1:num_labels, y)))) \
        + sum (log (1 - ht(bsxfun (@ne, 1:num_labels, y)))))/m \

      ## The regularisation term has to exclude the first column of the
      ## Thetas, because we don't regularise the bias nodes.
      + lambda*(sum (Theta1(:, 2:end)(:).^2) \
                + sum (Theta2(:, 2:end)(:).^2))/(2*m);
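
  ## For illustration (hypothetical 3x3 Theta): Theta(:, 2:end)(:) drops
  ## the first column (the bias weights) and flattens the remaining 3x2
  ## block, so only the 6 non-bias weights contribute to the penalty.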

  ## Backpropagation is not implemented yet: Theta1_grad and Theta2_grad
  ## are still all zeros, so grad is the zero vector for now.
  grad = [Theta1_grad(:) ; Theta2_grad(:)];

endfunction