Wednesday 21 August 2019

Introduction to Machine Learning: Assignment 2

1. The parameters obtained in linear regression

a. can take any value in the real space
b. are strictly integers
c. always lie in the range [0,1]
d. can take only non-zero values

Ans: a
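
For a quick sanity check, here is a minimal sketch with NumPy on made-up data: an ordinary least-squares fit freely produces negative and fractional parameter values.

    import numpy as np

    # Made-up data: y = -1.5 * x + 0.25 plus noise, so the true parameters
    # are a negative number and a fraction in (0, 1).
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = -1.5 * x + 0.25 + rng.normal(scale=0.1, size=x.shape)

    # Design matrix with an intercept column; lstsq returns the OLS solution.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

    print(slope, intercept)  # ~ -1.5 and ~ 0.25: real-valued, not integers or in [0, 1]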

2. Consider forward selection, backward selection and best subset selection with respect to the
same data set. Which of the following is true?

a. Best subset selection can be computationally more expensive than forward selection
b. forward selection and backward selection always lead to the same result
c. best subset selection can be computationally less expensive than backward selection
d. best subset selection and forward selection are computationally equally expensive
e. both (b) and (d)

Ans: a
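
To see why, count the model fits: with p candidate features, best subset selection examines all 2^p feature subsets, while forward selection fits at most p + (p - 1) + ... + 1 = p(p + 1)/2 models. A small sketch of the counts:

    # Number of models fitted by each selection strategy for p features
    # (one fit per candidate subset; exact constants vary by formulation).
    for p in (5, 10, 20):
        best_subset = 2 ** p             # every possible feature subset
        forward = p * (p + 1) // 2       # p choices, then p - 1, then p - 2, ...
        print(f"p={p:2d}: best subset {best_subset:>9,} fits, forward {forward:>4} fits")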

3. Adding interaction terms (such as products of two dimensions) along with the original
features in linear regression

a. can reduce training error
b. can increase training error
c. cannot affect training error

Ans: a
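
Because the enlarged feature set still contains all of the original features, the least-squares training error can only stay the same or go down. A minimal sketch (assuming scikit-learn is available) where the interaction term helps a lot:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 1]  # truth contains an interaction

    def train_mse(features):
        model = LinearRegression().fit(features, y)
        return np.mean((model.predict(features) - y) ** 2)

    # Adds the product feature x1*x2 alongside the original columns.
    X_inter = PolynomialFeatures(degree=2, interaction_only=True,
                                 include_bias=False).fit_transform(X)

    print(train_mse(X))        # large: the plain linear model misses the x1*x2 term
    print(train_mse(X_inter))  # ~0: the enlarged feature set nests the original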

4. Consider the following five training examples: X = [2 3 4 5 6], Y = [12.89 17.75 23.31 28.31 32.13]. We want to learn a function of the form f(x) = ax + b, which is parameterized by (a, b). Using squared error as the loss function, which of the following parameters would you use to model this function?

a. (4 3)
b. (5 3)
c. (5 1)
d. (1 5)

Ans: b
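
The options can be checked directly by computing the squared-error loss of f(x) = ax + b for each candidate (a, b); a minimal sketch:

    import numpy as np

    X = np.array([2, 3, 4, 5, 6])
    Y = np.array([12.89, 17.75, 23.31, 28.31, 32.13])

    # Sum of squared errors for f(x) = a*x + b at each candidate (a, b)
    for a, b in [(4, 3), (5, 3), (5, 1), (1, 5)]:
        sse = np.sum((a * X + b - Y) ** 2)
        print(f"(a, b) = ({a}, {b}): SSE = {sse:.2f}")
    # (5, 3) gives by far the smallest loss (SSE ~ 1.02), matching the answer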

5. A study was conducted to understand the effect of the number of hours students spent
studying on their performance in the final exams. You are given the following 8 samples from the study. What is the best linear fit for this dataset?

a. y = -3.39x + 11.62
b. y = 4.59x + 12.58
c. y = 3.39x + 10.58
d. y = 4.69x + 11.62

Ans: b
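
The fit itself reduces to a single least-squares call. Since the 8 samples are not reproduced here, the numbers below are hypothetical placeholders; running the same call on the actual table should reproduce the coefficients in option (b).

    import numpy as np

    # Hypothetical placeholder data: substitute the 8 (hours, score) samples
    # from the question before reading off the coefficients.
    hours = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
    score = np.array([17.0, 19.5, 21.0, 24.0, 26.5, 28.0, 31.0, 33.5])

    slope, intercept = np.polyfit(hours, score, deg=1)  # degree-1 least-squares fit
    print(f"y = {slope:.2f}x + {intercept:.2f}")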

6. Which of the following shrinkage methods is more likely to lead to a sparse solution?
a. Lasso regression
b. Ridge regression
c. Lasso and ridge regression both return sparse solutions

Ans: a
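
A minimal sketch (assuming scikit-learn is available) that fits both penalties to the same data and counts coefficients shrunk exactly to 0:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 20))
    beta = np.zeros(20)
    beta[:3] = [4.0, -3.0, 2.0]               # only 3 of 20 features matter
    y = X @ beta + rng.normal(scale=0.5, size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)
    ridge = Ridge(alpha=0.1).fit(X, y)

    print("Lasso zeros:", np.sum(lasso.coef_ == 0.0))  # many coefficients exactly 0
    print("Ridge zeros:", np.sum(ridge.coef_ == 0.0))  # typically none exactly 0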

7. Consider the design matrix X of dimension N x (p+1). Which of the following statements
are true?

a. The row space of X is the same as the column space of X^T
b. The row space of X is the same as the row space of X^T
c. both (a) and (b)
d. none of the above

Ans: a
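
Option (a) holds because the rows of X are, by definition of the transpose, exactly the columns of X^T, so the two subspaces they span coincide. A tiny NumPy check on an arbitrary matrix:

    import numpy as np

    X = np.arange(12).reshape(3, 4)   # a 3 x 4 "design matrix"

    # Row i of X is column i of X^T, so row space(X) = column space(X^T).
    for i in range(X.shape[0]):
        assert np.array_equal(X[i, :], X.T[:, i])
    print("each row of X is a column of X^T")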

8. How does LASSO differ from Ridge Regression?

a. LASSO uses L1 regularization while Ridge Regression uses L2 regularization
b. LASSO uses L2 regularization while Ridge Regression uses L1 regularization
c. The LASSO constraint is a high-dimensional rhomboid while the Ridge Regression constraint is a
high-dimensional ellipsoid
d. Ridge Regression shrinks more coefficients to 0 compared to LASSO
e. The Ridge Regression constraint is a high-dimensional rhomboid while the LASSO constraint is a
high-dimensional ellipsoid
f. Ridge Regression shrinks fewer coefficients to 0 compared to LASSO

Ans: a, c and f
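
Written out, the two estimators differ only in the penalty norm (lambda >= 0 controls the strength):

    Lasso:  minimize over beta   ||y - X beta||_2^2 + lambda * ||beta||_1
    Ridge:  minimize over beta   ||y - X beta||_2^2 + lambda * ||beta||_2^2

Equivalently, Lasso constrains ||beta||_1 <= t, a rhomboid whose corners lie on the coordinate axes, so the constrained solution often lands on a corner and zeroes out coefficients; Ridge constrains ||beta||_2^2 <= t, an ellipsoid with no corners, so it shrinks coefficients toward 0 without making them exactly 0.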

9. Principal Component Regression (PCR) is an approach that finds an orthogonal set of basis vectors which can then be used to reduce the dimension of the input. Which of the following matrices contains the principal component directions as its columns (follow the notation from the lecture video)?

a. X
b. S
c. Xc
d. V
e. U

Ans: d (the SVD of the centered data matrix Xc = U S V^T has the principal component directions as the columns of V)
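
A minimal NumPy sketch on made-up data: center the data, take the SVD Xc = U S V^T, and read the principal component directions off the columns of V (the variance along direction k is S_k^2 / (N - 1)):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic data

    Xc = X - X.mean(axis=0)            # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt.T                           # principal component directions as columns

    print(V)                           # orthonormal direction vectors
    print(S**2 / (Xc.shape[0] - 1))    # explained variances, in decreasing order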

10. Let v_1, v_2, . . . , v_p denote the Principal Components of some data X, as extracted by
Principal Components Analysis, where v_1 is the First Principal
Component. What can you say about the variance of X in the directions defined by v_1, v_2, . . . , v_p?

a. X has the highest variance along v_1
b. X has the lowest variance along v_p
c. X has the lowest variance along v_1
d. X has the highest variance along v_p
e. Order of variance : v_1 ≥ v_2 ≥ . . . ≥ v_p
f. Order of variance : v_1 ≤ v_2 ≤ . . . ≤ v_p

Ans: a, b and e
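
This can be verified numerically: project (hypothetical) data onto the directions returned by the SVD, and the per-direction variances come out sorted from v_1 down to v_p. A minimal sketch:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(500, 4)) @ np.diag([5.0, 2.0, 1.0, 0.2])
    Xc = X - X.mean(axis=0)

    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt.T                   # coordinates of X along v_1 ... v_p

    variances = proj.var(axis=0, ddof=1)
    print(variances)                        # highest along v_1, lowest along v_p
    assert np.all(np.diff(variances) <= 0)  # v_1 >= v_2 >= ... >= v_p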
