# HG changeset patch
# User Jordi Gutiérrez Hermoso
# Date 1323066159 18000
# Node ID 90d2a292663c6eee69825f0fd9c8e2bde9653b54
# Parent  ded78d0b4987ec2c89534e22f80a026a3454cf9f
Do k-means

diff --git a/computeCentroids.m b/computeCentroids.m
--- a/computeCentroids.m
+++ b/computeCentroids.m
@@ -1,40 +1,17 @@
 function centroids = computeCentroids(X, idx, K)
-%COMPUTECENTROIDS returs the new centroids by computing the means of the
-%data points assigned to each centroid.
-%   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by
-%   computing the means of the data points assigned to each centroid. It is
-%   given a dataset X where each row is a single data point, a vector
-%   idx of centroid assignments (i.e. each entry in range [1..K]) for each
-%   example, and K, the number of centroids. You should return a matrix
-%   centroids, where each row of centroids is the mean of the data points
-%   assigned to it.
-%
-
-% Useful variables
-[m n] = size(X);
-
-% You need to return the following variables correctly.
-centroids = zeros(K, n);
-
+  ##COMPUTECENTROIDS returns the new centroids by computing the means of the
+  ##data points assigned to each centroid.
+  ##   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by
+  ##   computing the means of the data points assigned to each centroid. It is
+  ##   given a dataset X where each row is a single data point, a vector
+  ##   idx of centroid assignments (i.e. each entry in range [1..K]) for each
+  ##   example, and K, the number of centroids. You should return a matrix
+  ##   centroids, where each row of centroids is the mean of the data points
+  ##   assigned to it.
+  ##
 
-% ====================== YOUR CODE HERE ======================
-% Instructions: Go over every centroid and compute mean of all points that
-%               belong to it. Concretely, the row vector centroids(i, :)
-%               should contain the mean of the data points assigned to
-%               centroid i.
-%
-% Note: You can use a for-loop over the centroids to compute this.
-%
-
-
+  centroids = cell2mat(cellfun(@(i) mean (X(idx == i, :), 1),
+                               num2cell([1:K]'), "uniformoutput", false));
 
-
-
-
-
+endfunction
 
-% =============================================================
-
-
-end
-
diff --git a/findClosestCentroids.m b/findClosestCentroids.m
--- a/findClosestCentroids.m
+++ b/findClosestCentroids.m
@@ -1,33 +1,16 @@
 function idx = findClosestCentroids(X, centroids)
-%FINDCLOSESTCENTROIDS computes the centroid memberships for every example
-%   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
-%   in idx for a dataset X where each row is a single example. idx = m x 1
-%   vector of centroid assignments (i.e. each entry in range [1..K])
-%
-
-% Set K
-K = size(centroids, 1);
-
-% You need to return the following variables correctly.
-idx = zeros(size(X,1), 1);
+  ##FINDCLOSESTCENTROIDS computes the centroid memberships for every example
+  ##   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
+  ##   in idx for a dataset X where each row is a single example. idx = m x 1
+  ##   vector of centroid assignments (i.e. each entry in range [1..K])
+  ##
 
-% ====================== YOUR CODE HERE ======================
-% Instructions: Go over every example, find its closest centroid, and store
-%               the index inside idx at the appropriate location.
-%               Concretely, idx(i) should contain the index of the centroid
-%               closest to example i. Hence, it should be a value in the
-%               range 1..K
-%
-% Note: You can use a for-loop over the examples to compute this.
-%
+  ## Set K
+  K = rows (centroids);
+
+  ## Squared distances (m x K) via automatic broadcasting, as in Octave 3.5.0+
+  d = sum ((permute (X, [1,3,2]) - permute (centroids, [3,1,2])).^2, 3);
+  [~, idx] = min (d, [], 2);
 
-
-
-
-
+endfunction
 
-
-% =============================================================
-
-end
-
diff --git a/kMeansInitCentroids.m b/kMeansInitCentroids.m
--- a/kMeansInitCentroids.m
+++ b/kMeansInitCentroids.m
@@ -1,26 +1,12 @@
 function centroids = kMeansInitCentroids(X, K)
-%KMEANSINITCENTROIDS This function initializes K centroids that are to be
-%used in K-Means on the dataset X
-%   centroids = KMEANSINITCENTROIDS(X, K) returns K initial centroids to be
-%   used with the K-Means on the dataset X
-%
-
-% You should return this values correctly
-centroids = zeros(K, size(X, 2));
-
-% ====================== YOUR CODE HERE ======================
-% Instructions: You should set centroids to randomly chosen examples from
-%               the dataset X
-%
-
-
-
-
-
-
-
-
-% =============================================================
+##KMEANSINITCENTROIDS This function initializes K centroids that are to be
+##used in K-Means on the dataset X
+##   centroids = KMEANSINITCENTROIDS(X, K) returns K initial centroids to be
+##   used with K-Means on the dataset X
+##
+
+  ## Uses the second argument of randperm, only in the Octave dev version at the time
+  centroids = X(randperm (rows (X), K), :);
 
 
 end
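As a sanity check on the vectorized code, here is a small self-contained Octave sketch of one k-means iteration (assignment step, then update step) written with explicit loops and compared against the same `permute` broadcasting expression used for the distance matrix. The toy dataset and the fixed initial centroids are made up for illustration; random initialization with `randperm` is replaced by fixed rows so the result is deterministic.

```octave
## Toy data (illustrative values only): two well-separated groups in 2-D.
X = [1 1; 1.5 2; 5 7; 6 8; 5.5 6];
## Fixed initial centroids (rows 1 and 3) so the result is deterministic;
## a random choice of rows would normally be used instead.
centroids = X([1 3], :);
[m, n] = size (X);
K = rows (centroids);

## Assignment step with explicit loops: squared Euclidean distances, m x K.
d_loop = zeros (m, K);
for i = 1:m
  for k = 1:K
    d_loop(i, k) = sum ((X(i, :) - centroids(k, :)).^2);
  endfor
endfor

## Same matrix via automatic broadcasting:
## (m x 1 x n) - (1 x K x n) -> m x K x n, then sum over dimension 3.
d_bcast = sum ((permute (X, [1 3 2]) - permute (centroids, [3 1 2])).^2, 3);
assert (d_bcast, d_loop, 1e-12);

## Nearest centroid for each row (min returns the first index on ties).
[~, idx] = min (d_loop, [], 2);

## Update step: mean along dimension 1 keeps a 1 x n row even when a
## cluster holds a single point (plain mean () would return a scalar).
new_centroids = zeros (K, n);
for k = 1:K
  new_centroids(k, :) = mean (X(idx == k, :), 1);
endfor

disp (idx');            # 1 1 2 2 2
disp (new_centroids);   # rows [1.25 1.5] and [5.5 7]
```

Passing the dimension explicitly to `mean` matters because `X(idx == k, :)` is a row vector whenever exactly one point is assigned to centroid `k`, and `mean` of a row vector averages across its columns.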