Coursera machine learning week 3 Quiz answer Logistic Regression | Andrew Ng

1. Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction h_θ(x) = 0.2. This means (check all that apply):

•  Our estimate for P(y = 1|x; θ) is 0.8.

h(x) gives P(y=1|x; θ), not 1 - P(y=1|x; θ), so this estimate is 0.2, not 0.8.

•  Our estimate for P(y = 0|x; θ) is 0.8.

Since we must have P(y=0|x; θ) = 1 - P(y=1|x; θ), the former is 1 - 0.2 = 0.8.

•  Our estimate for P(y = 1|x; θ) is 0.2.

h(x) is precisely P(y=1|x; θ), so the estimate is 0.2.

•  Our estimate for P(y = 0|x; θ) is 0.2.

h(x) is P(y=1|x; θ), not P(y=0|x; θ); the latter is 0.8.
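The arithmetic behind question 1 can be checked with a minimal sketch. The input `z` below is a hypothetical value of θᵀx, chosen only so that the sigmoid returns 0.2. It is not taken from the quiz.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical value of theta^T x, chosen so that h(x) comes out as 0.2.
z = math.log(0.2 / 0.8)

h = sigmoid(z)      # the classifier output h(x)
p_y1 = h            # estimate of P(y = 1 | x; theta)
p_y0 = 1.0 - h      # estimate of P(y = 0 | x; theta); the two must sum to 1

print(round(p_y1, 3), round(p_y0, 3))
```

The two probabilities are complementary, which is why an output of 0.2 implies an estimate of 0.8 for y = 0.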

2. Suppose you have the following training set, and fit a logistic regression classifier h_θ(x).

Which of the following are true? Check all that apply.

•  Adding polynomial features (e.g., instead using a hypothesis with higher-order terms) could increase how well we can fit the training data.
•  At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.
•  Adding polynomial features (e.g., instead using a hypothesis with higher-order terms) would increase J(θ) because we are now summing over more terms.
•  If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain [...]

3. For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j$. Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.
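As an illustration of the gradient formula above, here is a minimal pure-Python sketch of the simultaneous update θ_j := θ_j - α · ∂J/∂θ_j. The toy data, the learning rate of 0.1, and all function names are assumptions made for this example. They do not come from the quiz.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / m

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j using the gradient formula."""
    m = len(X)
    errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
    grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grad)]

# Hypothetical toy data (not linearly separable); x_0 = 1 is the intercept term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 1, 0, 1]

theta = [0.0, 0.0]
costs = [cost(theta, X, y)]
for _ in range(100):
    theta = gradient_step(theta, X, y, alpha=0.1)
    costs.append(cost(theta, X, y))

print(costs[0], costs[-1])
```

In this sketch the cost starts at log 2 (all predictions equal 0.5), decreases as the updates proceed, and never goes below zero, which is consistent with the statement in question 2 that J(θ) ≥ 0.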

4. Which of the following statements are true? Check all that apply.

•  The one-vs-all technique allows you to use logistic regression for problems in which each $y^{(i)}$ comes from a fixed, discrete set of values.

If each $y^{(i)}$ is one of k different values, we can give a label to each $y^{(i)} \in \{1, 2, \ldots, k\}$ and use one-vs-all as described in the lecture.

•  For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).

The cost function for logistic regression is convex, so it has no local minima other than the global one, and gradient descent with a suitable learning rate will converge to the global minimum. We still might use a more advanced optimisation algorithm, since these can be faster and don’t require you to select a learning rate.

•  The cost function J(θ) for logistic regression trained with m examples is always greater than or equal to zero.

The cost for any example $x^{(i)}$ is always ≥ 0, since it is the negative log of a quantity between 0 and 1. The cost function J(θ) is a summation over the cost for each sample, so the cost function itself must be greater than or equal to zero.

•  Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).

We will need 3 classifiers, one for each class.
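The "one classifier per class" point can be sketched as a relabeling step. The snippet below only builds the binary training targets that one-vs-all would use. The labels and the helper name `one_vs_all_labels` are hypothetical, and the training of each classifier is left out.

```python
def one_vs_all_labels(y, classes):
    """For each class c, build binary targets: 1 where y_i == c, else 0."""
    return {c: [1 if yi == c else 0 for yi in y] for c in classes}

# Hypothetical labels drawn from three classes.
y = [1, 2, 3, 2, 1]
classes = sorted(set(y))

binary_problems = one_vs_all_labels(y, classes)

# One binary logistic regression problem per class: three classes, three classifiers.
for c, targets in binary_problems.items():
    print(c, targets)
```

At prediction time, one-vs-all would evaluate all k trained classifiers on a new example and pick the class whose classifier outputs the highest h(x).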

5. Suppose you train a logistic classifier h_θ(x) and are given specific values of θ. Which of the following figures represents the decision boundary found by your classifier?

•  Figure:

In this figure, we transition from negative to positive when x1 goes from left of 6 to right of 6, which matches the given values of θ.

•  Figure:
•  Figure:
•  Figure:
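To make the decision-boundary reasoning concrete, here is a small sketch. The values of θ did not survive in the text above, so the vector `theta = [-6, 1, 0]` below is an assumption: it is one choice consistent with the explanation that predictions flip from negative to positive as x1 crosses 6. The sketch also assumes a hypothesis of the form g(θ0 + θ1·x1 + θ2·x2).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x1, x2):
    """Predict y = 1 when h(x) >= 0.5, i.e. when theta0 + theta1*x1 + theta2*x2 >= 0."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if sigmoid(z) >= 0.5 else 0

# Assumed parameters (not from the source): boundary at x1 = 6, independent of x2.
theta = [-6.0, 1.0, 0.0]

left_of_boundary = predict(theta, 5.0, 2.0)     # x1 < 6
right_of_boundary = predict(theta, 7.0, -3.0)   # x1 > 6

print(left_of_boundary, right_of_boundary)
```

Because θ2 is zero in this assumed setting, x2 has no effect on the prediction, so the boundary is the vertical line x1 = 6.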

