machine-learning-hw4: annotate nnCostFunction.m @ 1:42b6020b2fdb

Do regularised cost function

author   | Jordi Gutiérrez Hermoso <jordigh@octave.org>
date     | Fri, 11 Nov 2011 14:13:51 -0500
parents  | 395fc40248c3
children | e09973b9190f
function [J grad] = nnCostFunction(nn_params,
                                   input_layer_size,
                                   hidden_layer_size,
                                   num_labels,
                                   X, y, lambda)
##NNCOSTFUNCTION Implements the neural network cost function for a two-layer
##neural network which performs classification
##   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, ...
##   hidden_layer_size, num_labels, X, y, lambda) computes the cost and
##   gradient of the neural network. The parameters for the neural network
##   are "unrolled" into the vector nn_params and need to be converted back
##   into the weight matrices.
##
##   The returned parameter grad should be an "unrolled" vector of the
##   partial derivatives of the neural network.
##

## Reshape nn_params back into the parameters Theta1 and Theta2, the
## weight matrices for our 2-layer neural network
Theta1 = reshape (nn_params(1:hidden_layer_size * (input_layer_size + 1)),
                  hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape (nn_params((1 + (hidden_layer_size
                                  * (input_layer_size + 1))):end),
                  num_labels, (hidden_layer_size + 1));

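## (For reference, the "unrolled" vector is just the column-major
## concatenation of the two matrices: an optimiser would typically be
## handed [Theta1(:); Theta2(:)], which is exactly what the two reshapes
## above undo.)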
## Setup some useful variables
m = rows (X);
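## Column of ones used to prepend the bias unit to each layer's input.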
one_vec = ones (m, 1);

Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

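## Feedforward in one expression: prepend the bias column to X, map
## through the hidden layer, prepend a bias column to the result, and
## map through the output layer. ht is then the m x num_labels matrix of
## hypothesis values h_theta(x) for every example and label.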
ht = sigmoid ([one_vec, sigmoid([one_vec, X]*Theta1')]*Theta2');

## This is a bit tricky. In order to avoid expanding the y entries
## into those useless 0-1 vectors (why represent the same data with
## more space?), instead we use bsxfun together with an indexing
## trick. Recall the long form of the cost function
##
##           / -log( h_theta(x))      if y == 1
##   cost = {
##           \ -log(1 - h_theta(x))   if y != 1
##
## thus the indices formed with bsxfun pick out the entries of ht that
## are the first form for this label or not the first form for this
## label. Then everything just gets added together.
##
## Note that although the bsxfun does generate the 0-1 logical matrix
## of the y's, it's useful that it's a logical matrix because
## internally the indexing with a logical matrix can be done faster.
## Also, logical indexing returns vectors, so the double summations
## get flattened into a single summation.
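## As a tiny illustration of those masks (hypothetical values:
## num_labels = 3 and y = [2; 1]), bsxfun (@eq, 1:3, [2; 1]) yields the
## logical matrix
##
##   [0 1 0
##    1 0 0]
##
## so ht(mask) extracts the hypothesis value for each example's true
## label, and the @ne mask extracts the complementary entries. Note that
## the statement below is a single expression: each trailing backslash
## continues it across the blank and comment lines that follow.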
J = -(sum (log (ht(bsxfun (@eq, 1:num_labels, y)))) \
     + sum (log (1 - ht(bsxfun (@ne, 1:num_labels, y)))))/m \

## The regularisation term has to exclude the first column of the Thetas,
## because we don't regularise the bias nodes.
    + lambda*(sum (Theta1(:, 2:end)(:).^2) \
              + sum (Theta2(:, 2:end)(:).^2))/(2*m);

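## Backpropagation is not implemented at this revision, so the unrolled
## gradient returned here is still all zeros.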
grad = [Theta1_grad(:) ; Theta2_grad(:)];

endfunction
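## Note: sigmoid is a helper provided elsewhere in the exercise, not in
## this file. A minimal sketch of what it is assumed to compute:
##
##   function g = sigmoid (z)
##     g = 1 ./ (1 + exp (-z));  ## elementwise logistic function
##   endfunction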