Coursera Machine Learning Week 8 Quiz Answers: Unsupervised Learning | Andrew Ng

1. For which of the following tasks might K-means clustering be a suitable algorithm?
Select all that apply.

•  Given a set of news articles from many different news websites, find out what are the main topics covered.

K-means can cluster the articles; we can then inspect the clusters or use other methods to infer which topic each one represents.

•  Given historical weather records, predict if tomorrow’s weather will be sunny or rainy.

•  From the user usage patterns on a website, figure out what different groups of users exist.

We can cluster the users with K-means to find different, distinct groups.

•  Given many emails, you want to determine if they are Spam or Non-Spam emails.

•  Given a database of information about your users, automatically group them into different market segments.

You can use K-means to cluster the database entries, and each cluster will correspond to a different market segment.

•  Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say, are frequently purchased together) and thus should be put on the same shelf.

If you cluster the sales data with K-means, each cluster should correspond to a coherent group of items.

•  Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.

3. K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner loop. Which two?

•  Using the elbow method to choose K.

•  Feature scaling, to ensure each feature is on a comparable scale to the others.

•  Test on the cross-validation set.

•  Randomly initialize the cluster centroids.
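
None of the four options listed above runs inside the loop: choosing K, feature scaling, and random initialization all happen before it starts, and K-means has no cross-validation step. In the standard formulation of K-means, the two repeated steps are the cluster assignment step and the move-centroid step. The following is a minimal sketch of those two steps in plain Python, assuming points are tuples of floats; the function names `assign_clusters` and `move_centroids`, the toy data, and the choice to leave an empty cluster's centroid unchanged are illustrative assumptions, not something taken from the quiz.

```python
def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def move_centroids(points, labels, centroids):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == k]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return new_centroids


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# The inner loop simply alternates the two steps.
for _ in range(10):
    labels = assign_clusters(points, centroids)
    centroids = move_centroids(points, labels, centroids)
```

A fixed iteration count is used here for brevity; a fuller implementation would typically stop once the assignments no longer change.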

4. Suppose you have an unlabeled dataset $\{x^{(1)}, \ldots, x^{(m)}\}$. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data.

What is the recommended way for choosing which one of these 50 clusterings to use?

•  Use the elbow method.

•  Plot the data and the cluster centroids, and pick the clustering that gives the most “coherent” cluster centroids.

•  Manually examine the clusterings, and pick the best one.

•  Always pick the final (50th) clustering found, since by that time it is more likely to have converged to a good solution.

•  The answer is ambiguous, and there is no good way of choosing.
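
None of the options listed above is marked as the answer here. A common criterion, and the one usually taught alongside K-means, is to compute the distortion (the average squared distance from each example to its assigned centroid) for every clustering and keep the one with the lowest value. The sketch below illustrates that selection under stated assumptions: the `distortion` helper, the toy points, and the two candidate clusterings are all hypothetical and stand in for the 50 runs in the question.

```python
def distortion(points, centroids, labels):
    """Average squared distance between each point and its assigned centroid."""
    total = 0.0
    for p, c in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[c]))
    return total / len(points)


# Toy data (hypothetical): four corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two hypothetical clusterings of the same data, as (centroids, labels) pairs.
candidates = [
    ([(0.0, 1.0), (10.0, 1.0)], [0, 0, 1, 1]),  # splits left vs. right
    ([(5.0, 0.0), (5.0, 2.0)], [0, 1, 0, 1]),   # splits bottom vs. top
]

# Score every clustering and keep the index of the lowest-distortion one.
costs = [distortion(points, cents, labs) for cents, labs in candidates]
best = min(range(len(candidates)), key=lambda i: costs[i])
```

Both candidates are stable under further K-means iterations, yet the left/right split has a much lower distortion than the bottom/top split, which is why comparing costs is more reliable than inspecting the clusterings by eye.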

5. Which of the following statements are true? Select all that apply.

•  A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.

This is the recommended method of initialization.

•  K-Means will always give the same results regardless of the initialization of the centroids.

•  Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid.

•  For some datasets, the “right” or “correct” value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.

In many datasets, different choices of K will give different clusterings which appear quite reasonable. With no labels on the data, we cannot say one is better than the other.

•  If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is to try multiple random initializations.

Since each run of K-means is independent, multiple runs can find different optima, and some should avoid bad local optima.

•  Since K-Means is an unsupervised learning algorithm, it cannot overfit the data, and thus it is always better to have as large a number of clusters as is computationally feasible.
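
The initialization described in the first statement above (pick K distinct training examples and use them as the starting centroids) can be sketched as follows. This is a minimal illustration, assuming the examples are stored as tuples; the name `init_centroids` and its `seed` parameter are hypothetical and chosen only for this example.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct training examples to serve as the initial centroids."""
    rng = random.Random(seed)
    # Drop duplicate examples (keeping order) so the k centroids are distinct.
    distinct = list(dict.fromkeys(points))
    if k > len(distinct):
        raise ValueError("k must not exceed the number of distinct examples")
    return rng.sample(distinct, k)


# Toy data (hypothetical), including one duplicated example.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Each seed gives an independent random initialization of K = 2 centroids.
starts = [init_centroids(points, k=2, seed=s) for s in range(5)]
```

Running K-means once from each entry in `starts` gives several independent runs, which is the idea behind using multiple random initializations to reduce the risk of a bad local optimum.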

