Preparing Vectors for Mahout: ApplesToVectors

Posted by gccbuaa


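Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces ApplesToVectors, an example of preparing vectors for Mahout, and we hope it is a useful reference.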

/**
 * @author YangXin
 * @info Prepare vectors for Mahout:
 * convert information about apples into input vectors.
 */
package unitEight;

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

/**
 * The vector's name or description (here carried by a NamedVector) can serve as the key,
 * and the vector itself as the value.
 *
 * Mahout's Vector classes do not implement the Writable interface, so that they are
 * not coupled directly to Hadoop. A Vector can, however, be wrapped in a VectorWritable
 * to make it Writable, which is how Mahout vectors are written to a SequenceFile.
 */
public class ApplesToVectors {
    public static void main(String[] args) throws Exception {
        // Three features per apple; judging by the descriptions they appear to be
        // weight, colour (roughly a light wavelength in nm) and size (1 = small, 3 = large).
        List<NamedVector> apples = new ArrayList<NamedVector>();
        apples.add(new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "Small round green apple"));
        apples.add(new NamedVector(new DenseVector(new double[]{0.23, 650, 3}), "Large oval red apple"));
        apples.add(new NamedVector(new DenseVector(new double[]{0.09, 630, 1}), "Small elongated red apple"));
        apples.add(new NamedVector(new DenseVector(new double[]{0.25, 590, 3}), "Large round yellow apple"));
        // The original listing created this fifth vector but never added it to the list.
        apples.add(new NamedVector(new DenseVector(new double[]{0.18, 520, 2}), "Medium oval green apple"));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Despite the .txt extension, this is a binary SequenceFile.
        Path path = new Path("E:\\apples.txt");

        // Write: key = vector name (Text), value = the vector wrapped in a VectorWritable.
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, VectorWritable.class);
        VectorWritable vec = new VectorWritable();
        for (NamedVector vector : apples) {
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        // Read the records back from the same path and print each name with its vector.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while (reader.next(key, value)) {
            System.out.println(key.toString() + " " + value.get().asFormatString());
        }
        reader.close();
    }
}
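The class comment above notes that Mahout keeps its Vector classes free of Hadoop's Writable interface and relies on VectorWritable as a wrapper. As a rough illustration of that wrapper pattern, here is a JDK-only sketch. NamedVec and NamedVecWritable are hypothetical names invented for this example; their write/readFields methods only mirror the method names of Hadoop's org.apache.hadoop.io.Writable interface, and the byte layout is an assumption, not Mahout's actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/**
 * Hypothetical sketch of the "wrap a plain value to make it serializable" idea
 * behind VectorWritable. The byte layout below is NOT Mahout's real format.
 */
public class WritableWrapperSketch {

    /** Plain value type with no serialization code (plays the role of a Mahout Vector). */
    static final class NamedVec {
        final String name;
        final double[] values;

        NamedVec(String name, double[] values) {
            this.name = name;
            this.values = values.clone();
        }
    }

    /** Hypothetical wrapper that owns the write/read logic, like VectorWritable does. */
    static final class NamedVecWritable {
        private NamedVec vec;

        void set(NamedVec vec) { this.vec = vec; }

        NamedVec get() { return vec; }

        // Assumed layout: UTF name, element count, then each double.
        void write(DataOutput out) throws IOException {
            out.writeUTF(vec.name);
            out.writeInt(vec.values.length);
            for (double v : vec.values) {
                out.writeDouble(v);
            }
        }

        // Reads the same assumed layout back and replaces the wrapped value.
        void readFields(DataInput in) throws IOException {
            String name = in.readUTF();
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            vec = new NamedVec(name, values);
        }
    }

    public static void main(String[] args) throws IOException {
        NamedVec apple = new NamedVec("Small round green apple", new double[] {0.11, 510, 1});

        // Serialize through the wrapper into an in-memory byte buffer.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        NamedVecWritable writable = new NamedVecWritable();
        writable.set(apple);
        writable.write(new DataOutputStream(bytes));

        // Deserialize into a fresh wrapper and unwrap the value.
        NamedVecWritable copy = new NamedVecWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        NamedVec restored = copy.get();

        System.out.println(restored.name + " " + Arrays.toString(restored.values));
    }
}
```

Running main prints `Small round green apple [0.11, 510.0, 1.0]`. Because all serialization code lives in the wrapper, the value type compiles without any Hadoop dependency, which is the decoupling the class comment describes.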


