A Exam: Machine Learning with Python



The machine-learning problems have been worked out: Problems 1-7 and 9, eight in total. The full set is 99 RMB; message me privately if interested.

Problem 1 [8 pts]

You are a robot in a lumber yard and must learn to discriminate Oak wood from Pine wood. You
choose to learn a Decision Tree Classifier. You are given the following examples:

(a) [3 pts] Calculate the information gain for each attribute for (1). Which attribute is the best for
(1)? Fill in ① and ②.

(b) [3 pts] Calculate the information gain for each attribute for (2). Which attribute is the best for
(2)? Fill in ③ and ④.

(c) [2 pts] Classify these new examples as Oak or Pine using your decision tree above.
i. What class is [Density = Light, Grain = Small, Hardness = Hard]?
ii. What class is [Density = Light, Grain = Small, Hardness = Soft]?
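
The example table is not reproduced above, but the entropy and information-gain calculations the problem asks for follow the standard definitions. Below is a minimal sketch in Python; the two example rows at the bottom are hypothetical placeholders, not the problem's actual data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: H(S) = -sum_i p_i * log2(p_i) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical rows standing in for the missing table:
examples = [{"Density": "Heavy", "Grain": "Small", "Hardness": "Hard"},
            {"Density": "Light", "Grain": "Large", "Hardness": "Soft"}]
labels = ["Oak", "Pine"]
print(information_gain(examples, labels, "Density"))
```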

Problem 2 [3 pts]

Consider the following training set in the 2-dimensional Euclidean space:


(a) [1 pt] What is the prediction of the 3-nearest-neighbor classifier at the point (1,1)?

(b) [1 pt] What is the prediction of the 5-nearest-neighbor classifier at the point (1,1)?

(c) [1 pt] What is the prediction of the 7-nearest-neighbor classifier at the point (1,1)?
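
The training-set figure is not reproduced, but each k-NN prediction at (1, 1) is just a majority vote among the k closest training points. A sketch, with placeholder points standing in for the missing figure:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k):
    """Predict the majority label among the k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Placeholder points standing in for the missing figure:
X_train = np.array([[0, 0], [0, 2], [2, 0], [2, 2], [3, 3], [-1, 1], [1, 3]])
y_train = np.array(["+", "-", "+", "-", "-", "+", "+"])
for k in (3, 5, 7):
    print(k, knn_predict(X_train, y_train, np.array([1, 1]), k))
```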

Problem 3 [4 pts]

Suppose we are learning a classifier with binary output values Y = 0 and Y = 1. There are two real-valued input attributes X1 and X2. Here is our data:


Assume we will learn a decision tree using the ID3 algorithm on this data.
Assume that when the decision tree splits on the real-valued attributes, it puts the split threshold halfway
between the values that surround the highest-scoring split location. For example, if X2 is selected as
the root attribute, the decision tree would choose to split at X2 = 1, which is halfway between X2 = 0
and X2 = 2.
Let Algorithm DT2 be the method of learning a decision tree with only two leaf nodes (i.e. only one
split), and Algorithm DT* be the method of learning a decision tree fully with no pruning.
(a) (2 pts) What will be the training set error of DT2 and DT* on our data? Express your answer
as the number of misclassifications out of 10.
DT2:
DT*:

(b) (2 pts) What will be the leave-one-out cross-validation error of DT2 and DT* on our data?
DT2:
DT*:

  • Leave-one-out cross validation is K-fold cross validation taken to its logical extreme, with K equal
    to N, the number of data points in the set. That means that N separate times, the function approximator
    is trained on all the data except for one point and a prediction is made for that point.
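
A minimal sketch of this leave-one-out protocol using scikit-learn; the 10 points below are placeholders for the problem's missing table, and scikit-learn's tree is CART rather than ID3, so this only illustrates the procedure:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; the problem's actual 10-point table is not reproduced here.
X = np.array([[0, 0], [0, 2], [1, 1], [2, 0], [2, 2],
              [3, 1], [3, 3], [4, 0], [4, 2], [5, 1]])
y = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])

# DT2 (one split) corresponds to max_depth=1; DT* (no pruning) to max_depth=None.
for depth, name in [(1, "DT2"), (None, "DT*")]:
    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = DecisionTreeClassifier(max_depth=depth).fit(X[train_idx], y[train_idx])
        errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])
    print(name, "LOOCV misclassifications:", errors, "out of", len(X))
```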

Problem 4 [3 pts]

Support vector machines learn the decision boundary with the largest margin from both classes.
You are training a Support Vector Machine (SVM) on a tiny dataset with 4 points, shown in Figure 1.

(a) [2 pts] Find w1, w2, and b for the decision boundary f(x1, x2) = w1⋅x1 + w2⋅x2 + b.
(b) [1 pt] Choose all support vectors (the data points that lie closest to the decision surface)
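
Figure 1 is not reproduced, but w1, w2, b, and the support vectors can be read off a fitted linear SVM. A sketch with four placeholder points (a large C approximates the hard-margin case):

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder 4-point dataset; Figure 1 is not reproduced here.
X = np.array([[1, 1], [2, 2], [3, 0], [4, 1]])
y = np.array([-1, -1, 1, 1])

# Large C approximates a hard-margin SVM.
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w1, w2 = svm.coef_[0]
b = svm.intercept_[0]
print(f"f(x1, x2) = {w1:.3f}*x1 + {w2:.3f}*x2 + {b:.3f}")
print("support vectors:", svm.support_vectors_)
```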

Problem 5 [6 pts]

Consider fitting the linear regression model for these data.

Assume you solve the problems using least squares.
(a) [3 pts] Fit Yᵢ = β₀; find β₀. Explain how you got your answer.

(b) [3 pts] Fit Yᵢ = β₁Xᵢ; find β₁. Explain how you got your answer.
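
The data table is not reproduced, but both fits have closed forms obtained by setting the derivative of the squared error to zero: β₀ = mean(Y) for (a), and β₁ = ΣXᵢYᵢ / ΣXᵢ² for (b). A sketch with placeholder data:

```python
import numpy as np

# Placeholder data; the problem's table is not reproduced here.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 2.5, 4.0])

# (a) minimizing sum_i (Y_i - b0)^2 gives b0 = mean(Y)
beta0 = Y.mean()

# (b) minimizing sum_i (Y_i - b1*X_i)^2 gives b1 = sum(X*Y) / sum(X^2)
beta1 = (X * Y).sum() / (X ** 2).sum()

print(beta0, beta1)
```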

Problem 6 [4 pts]

Suppose your training set for two-class classification in one dimension (d = 1, xᵢ ∈ ℝ) contains three
sample points:
point x₁ = 3 with label y₁ = 1,
point x₂ = 1 with label y₂ = 1, and
point x₃ = −1 with label y₃ = −1.
What are the values of w and b given by logistic regression (with no regularization)?

Problem 7 [9 pts]

We are interested in predicting whether a person makes over 50K a year. For simplicity, suppose we
model the two features with two boolean variables X₁, X₂ ∈ {0, 1} and the label Y ∈ {0, 1}, where Y = 1 indicates a person makes over 50K. In the figure, we show three positive samples ("+" for Y = 1) and
one negative sample ("−" for Y = 0). Please complete the following questions.

(a) [1 pt] If we train a k-NN classifier (k = 3) on the data in the figure and then classify the
same data, which sample(s) must be misclassified by this classifier?

(b) [1 pt] For predicting the samples in the figure, which model is better: Logistic Regression or Linear
Regression? Why?

(c) [1.5 pts] Is there any logistic regression classifier using X₁ and X₂ that can perfectly classify
the examples in the figure? Why?

(d) [1.5 pts] What if we change the label of the point (0, 1) from "+" to "−"? Why?

(e) [2 pts] Suppose we have trained a linear regression model y = ax + b, where a = 0.5 and b = 1.0, on
a set of training data points D = {(1.0, 1.6), (1.5, 1.5), (3.0, 2.4)}. Please calculate the mean
squared error (MSE) of this model on D (MSE = mean of error²). Write down how you got
your answer.
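
Part (e) is fully specified, so the arithmetic can be checked directly:

```python
# Model y = 0.5*x + 1.0 evaluated on D = {(1.0, 1.6), (1.5, 1.5), (3.0, 2.4)}
D = [(1.0, 1.6), (1.5, 1.5), (3.0, 2.4)]
a, b = 0.5, 1.0

# Squared error at each point: (prediction - target)^2.
# Predictions are 1.5, 1.75, 2.5, giving squared errors 0.01, 0.0625, 0.01.
sq_errors = [((a * x + b) - y) ** 2 for x, y in D]
mse = sum(sq_errors) / len(sq_errors)
print(mse)  # 0.0275
```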

(f) [2 pts] Suppose we train a classifier on the data in the figure and then apply it to a testing
dataset. The testing confusion matrix is given by:


What are the precision and recall of this classifier?
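
The confusion matrix is not reproduced, but precision and recall follow directly from its four entries. A sketch with placeholder counts:

```python
# Placeholder counts; the problem's confusion matrix is not reproduced here.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)  # of predicted positives, fraction truly positive
recall = TP / (TP + FN)     # of actual positives, fraction found
print(precision, recall)    # 0.8 0.888...
```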

Problem 8 [9 pts]

Use the k-means algorithm to create two clusters. The initial centroids are μ₁ = (2, 2) and μ₂ = (1, 1).
Measure the distance using the Euclidean distance:
e.g., distance between μ₁ and μ₂ = √((2 − 1)² + (2 − 1)²) = √2

  1. [5 pts] Fill in the following table:
  2. [2 pts] Re-compute the new centroids.
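
The point table is not reproduced, but one full k-means iteration with the given initial centroids looks like this; the four points below are placeholders:

```python
import numpy as np

# Placeholder points; the problem's table is not reproduced here.
points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0]])
centroids = np.array([[2.0, 2.0], [1.0, 1.0]])  # mu_1, mu_2 as given

# Assignment step: each point goes to its nearest centroid (Euclidean).
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
assign = dists.argmin(axis=1)

# Update step: each new centroid is the mean of its assigned points.
new_centroids = np.array([points[assign == k].mean(axis=0) for k in range(2)])
print(assign, new_centroids)
```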

Problem 9 [3 pts]

Suppose you have inputs x = −2, y = 5, and z = −4. You have a neuron q and a neuron f with the functions:
q = x + y
f = q ∗ z
What is the gradient of f with respect to x, y, and z? See the figure below:
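
The computation-graph figure is not reproduced, but all values are given, so the gradients follow from one application of the chain rule:

```python
# Forward pass: q = x + y, f = q * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule):
df_dq = z          # d(q*z)/dq = z = -4
df_dz = q          # d(q*z)/dz = q = 3
df_dx = df_dq * 1  # dq/dx = 1, so df/dx = -4
df_dy = df_dq * 1  # dq/dy = 1, so df/dy = -4
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```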

Problem 10 [3 pts]

Consider the following convolution procedure:
Greyscale Image --> Convolution with filter A (no padding, stride=1) --> 2x2 maxpool --> Output
A =


When the input image is

what is the output of the convolution procedure?
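
Filter A and the input image are not reproduced, but the procedure itself (valid convolution with stride 1, then 2x2 max pooling) can be sketched in NumPy; the filter and image below are placeholders:

```python
import numpy as np

def conv2d_valid(img, filt):
    """Valid cross-correlation, stride 1 (the usual CNN convention)."""
    H, W = img.shape
    h, w = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * filt)
    return out

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# Placeholder filter and image; the problem's A and input are not reproduced here.
A = np.array([[1, 0], [0, -1]])
img = np.arange(25).reshape(5, 5).astype(float)
print(maxpool2x2(conv2d_valid(img, A)))
```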

Problem 11 [2 pts]

Suppose you are learning a CNN on grayscale images of size 105x154, so the image has only one
channel. In the first convolutional layer, you use a filter of size 21x14 with a stride of 7 in both the x and
y dimensions, without any padding or bias term. How many neurons will there be in the next layer?
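
The neuron count follows the standard output-size formula, floor((input − filter) / stride) + 1 per dimension:

```python
# Output size per dimension: floor((input - filter) / stride) + 1 (no padding)
h = (105 - 21) // 7 + 1   # 13
w = (154 - 14) // 7 + 1   # 21
print(h, w, h * w)        # 13 21 273
```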

Problem 12 [2 pts]

After performing SVD on a dataset with 5 features, you retrieve the eigenvalues 6, 5, 4, 3, 2. How many
components should we include to explain at least 75% of the variance of the dataset?
(Hint: as in PCA, choose the smallest k whose components explain the required fraction of the variance.)
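
With the given eigenvalues, the cumulative fraction of variance is straightforward to compute:

```python
import numpy as np

eigenvalues = np.array([6, 5, 4, 3, 2])  # already sorted descending
frac = np.cumsum(eigenvalues) / eigenvalues.sum()
print(frac)  # [0.3  0.55 0.75 0.9  1.  ]

# Smallest k with cumulative fraction >= 0.75:
print(np.argmax(frac >= 0.75) + 1)  # 3
```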

Problem 13 [4 pts]

Consider a recommender system using collaborative filtering. The movie rating matrix is as follows:

(1) [2 pts] Predict User A's ratings for (Movie1, Movie3, Movie4).

(2) [2 pts] Predict the ratings of a new User E for (Movie1, Movie2, Movie3, Movie4) using mean
normalization.
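
The rating matrix is not reproduced, but with mean normalization a brand-new user's predicted rating for each movie reduces to that movie's mean rating (the new user's prediction in the normalized space is 0). A sketch with a placeholder matrix:

```python
import numpy as np

# Placeholder rating matrix (rows = users, cols = movies); NaN = unrated.
R = np.array([[5.0, np.nan, 4.0, np.nan],
              [4.0, 2.0, np.nan, 3.0],
              [1.0, 5.0, 2.0, 4.0]])

movie_means = np.nanmean(R, axis=0)
# A brand-new user E has no ratings, so mean normalization predicts the means:
pred_E = movie_means
print(pred_E)
```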
