CS5783: Machine Learning

Posted 2020-11-22 jahjava

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了CS5783: Machine Learning相关的知识，希望对你有一定的参考价值。

CS5783: Machine Learning
Assignment 4
1 Gaussian process regression
Use the crash test dataset from assignment 3 again. In order to make numerical instability less of
an issue, scale the x and t values of your dataset to values between 0 and 1, i.e. normalize each
value by dividing it by the maximum value in that dimension.
We will be generating Gaussian processes using two different kernels:
• Squared exponential: k(x, x0) = exp{−(x−x0)22σ2 }
• Exponential: k(x, x0) = exp{−|x−x0|σ}
For each of these kernel families, construct your Gram matrix K and add diagonal noise to form
C. In the last assignment, we estimated the β precision parameter for the noise as 0.0025 (because
we eyeballed the standard deviation σ = 20, and β =1σ2 ). If you scale σ by the same magnitude
as you scaled all of the t values, you can compute the appropriate β for C.
You can now use C, t and the kernel function distances between x
∗ and each x to predict y∗
values at x∗. First, figure out an appropriate order of magnitude for the σ parameter (this is the σ
parameter for the kernels, not the standard deviation of the noise, as in the previous paragraph!).
Look at the output of your Gaussian process (perhaps by plotting using evenly-spaced x values)
and look for values that seem to be relatively well-behaved (poorly chosen ones might look nothing
like the data, or might crash your evaluator).
Once you have found a reasonable value of σ, perform five-fold cross-validation on 100 values
of σ of the same order of magnitude as your rough calculation found, computing average MSE and
determining a best-fit hyperparameter value.
For each of the kernel functions, plot the training data and the output of the Gaussian process
with the best-fit hyperparameter (by plotting 100 evenly spaced x values and their corresponding
GP outputs).
2 K-means clustering
Use the MNIST test set rather than the training set, simply because 10000 examples will be a
代做CS5783留学生作业、代写Machine Learning作业
little easier to work with then 60000, and we’re doing unsupervised learning anyhow. We wish to
minimize the K-means objective function
J(z, µ) = PNn=1
PK
k=1 znk||xn − µk||2,
where znk is 1 if example n is in cluster k and 0 otherwise.
1
Implement a K-means algorithm function that takes a value for the number of clusters to be
found (K), a set of training examples and a K-dimensional vector µ0k
that serves as an initial
mean vector. This function should return the n-dimensional cluster assignment (presumably as an
n × k one-hot matrix, since that is most convenient), as well as the converged µk vector. At each
iteration, print a dot as a progress indicator. Once J has converged, print out its value, as well as
the number of iterations it took.
Run your algorithm with K=10 (the true number of clusters) on the following intializations µ0k:
1. Ten data points chosen uniformly at random
2. Ten data points found using the K-means++ assignment algorithm
3. A data point drawn from each labeled class (found by looking at the test set labels – and yes,
this is cheating)
Visualize the 28×28-pixel images corresponding to each cluster mean found by your algorithm,
for each of these initializations.
Cluster the data using K=3, initialized using K-means++. Plot the cluster mean images and
a few randomly chosen representatives from the data for each class.
3 Hidden Markov Models
Construct a state machine that mimics the “occasionally dishonest casino” used as an example in
lecture. This machine has two states, “Loaded” and “Fair”. When in the “Fair” state, it outputs a
value between 1 and 6, chosen uniformly at random. When in the “Loaded” state, it also outputs
a value between 1 and 6, but this time the odds of emitting 1-5 are 1
10 each, while the odds of
emitting a 6 are 5
10 . This can be represented in a table:
p(xt|zt) =xt zt = F zt = L
1 0.16667 0.1
2 0.16667 0.1
3 0.16666 0.1
4 0.16667 0.1
5 0.16667 0.1
6 0.16666 0.5
Furthermore, the transition matrix A between hidden variables is the following:
p(zt|zt−1) =zt zt−1 = F zt−1 = L
F 0.95 0.10
L 0.05 0.90
The process should start in the “Fair” state. Capture the output of this process for 1000 steps
in a vector x, and record the true state of the hidden variable z for each step, as well.
Use the forward-backward algorithm on your vector of outputs, as well as the true probabilities
contained in the transition and emission matrices, to construct the MAP estimate of the state
distribution at each time point. Produce two plots of the estimate of ˆz of the probability of a
loaded die at time t, compared to the actual state which you saved when you generated the process
in the first place. In other words, one line on the graph will be a probability somewhere between 0
and 1, while the other will be a step function that transitions between exactly 0 and exactly 1. One
2
of your plots should be your estimate after performing your forward pass but before computing
the backward pass, and the other should be your estimate of ˆz when the entire inference process is
complete.
4 Turning in
Your code must run on Prof. Crick’s Python3 interpreter. He has the numpy, matplotlib and scipy
libraries installed, as well as the standard Python libraries such as random and math. You should
not need any others to do this assignment, and if you use any others, he will not be able to execute
it.
You must have a file named ’assn4.py’, and within it, three functions named ’problem1()’,
’problem2()’, and ’problem3()’.
You may have any number of .py files in your submission, which your assn4.py will import as
necessary. You do not have to include ’crash.txt’ or ’t10k-images-idx3-ubyte’ with your submission,
but you should assume that I will put files with those names into the working directory along with
your code.
If I execute the Python commands below, I am expecting to see something like the following.
Note that your program’s output should be qualitatively similar, but will not likely be identical,
since both you and the random number generator will make different choices than I did.
>>> import assn4
>>> assn4.problem1()
Squared Exponential
Best sigma = 0.11
See plot.
Exponential
Best sigma = 0.15
See plot.
>>> assn4.problem2()
Random initialization
................................................................

因为专业，所以值得信赖。如有需要，请加QQ：99515681 或微信：codehelp

以上是关于CS5783: Machine Learning的主要内容，如果未能解决你的问题，请参考以下文章

HDU 5783 Divide the Sequence（数列划分）

HDU_5783_DivideTheSequence

贪心HDU 5783 Divide the Sequence

HDU 5783 Divide the Sequence

CS224W摘要01.Introduction； Machine Learning for Graphs

CS224W摘要02.Traditional Methods for Machine Learning in Graphs