Is there any seqFileDir option for "clusterdump" in the latest "apache mahout" library?

Posted 2023-03-12


Asked: 2012-06-24 06:41:40

Question:

I am trying to run clusterdump on the output of the Mahout k-means clustering example (the synthetic_control example), but I get the following error:

> ~/MAHOUT/trunk/bin/mahout clusterdump --seqFileDir clusters-10-final --pointsDir clusteredPoints --output a1.txt

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf/
MAHOUT-JOB: /home/<username>/MAHOUT/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar

12/06/21 22:43:18 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

12/06/21 22:43:25 ERROR common.AbstractJob: Unexpected --seqFileDir while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
.....

So I guess clusterdump has no "seqFileDir" option, but all the online tutorials (e.g. https://cwiki.apache.org/MAHOUT/cluster-dumper.html) refer to it. Can you suggest a fix, or tell me what I am missing?


Answer 1:

Did you try specifying it as the --input option?
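As a rough sketch of what this suggestion amounts to: the command from the question stays the same except that `--seqFileDir` becomes `--input`. The directory and file names are the ones from the question. The `clusterdump_cmd` helper and the `MAHOUT_BIN` variable are hypothetical names introduced here for illustration. The helper only prints the command, so it can be checked before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): prints the clusterdump command
# from the question with --input in place of the rejected --seqFileDir.
#   $1 = clusters directory, $2 = clustered points directory, $3 = output file
# MAHOUT_BIN is an assumed variable naming the mahout launcher script;
# it falls back to plain "mahout" when unset.
clusterdump_cmd() {
    printf '%s clusterdump --input %s --pointsDir %s --output %s\n' \
        "${MAHOUT_BIN:-mahout}" "$1" "$2" "$3"
}

# Values taken from the question.
clusterdump_cmd clusters-10-final clusteredPoints a1.txt
```

Once the printed line looks right, it can be pasted into the shell. Setting `MAHOUT_BIN` to the script path used in the question (`~/MAHOUT/trunk/bin/mahout`) would reproduce the original invocation with only the flag changed.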

Comments:

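Yes, I did. When I replaced --seqFileDir with --input, an output file was generated, but it was EMPTY! Do you know why the tutorials talk about a --seqFileDir option?

I kept working on this issue and, as if by magic, found the solution! Thanks for suggesting --input in place of the --seqFileDir option. What I was doing wrong was that I did not realize that clusterdump (with HADOOP_HOME set) reads from HDFS and writes its output to the local file system. Anyway, everything works now!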
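The HDFS-versus-local point in the last comment can be sketched as a quick check of where each path is expected to live. This is only an illustration: `list_cmd` is a hypothetical helper, not part of Mahout or Hadoop. It assumes, based on the "MAHOUT_LOCAL is not set ... Running on hadoop" lines in the log above, that a non-empty `MAHOUT_LOCAL` makes the launcher run locally. It prints the listing commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout or Hadoop): prints the listing
# command that matches the file system a clusterdump input path should be on.
# Assumption: a non-empty MAHOUT_LOCAL means the mahout script runs locally;
# otherwise, with Hadoop configured, input paths are resolved on HDFS.
list_cmd() {
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        printf 'ls -l %s\n' "$1"
    else
        printf 'hadoop fs -ls %s\n' "$1"
    fi
}

# Input directories from the question: these must exist where clusterdump reads.
list_cmd clusters-10-final
list_cmd clusteredPoints

# Per the comment above, the --output file lands on the local file system.
printf 'ls -l %s\n' a1.txt
```

If the `hadoop fs -ls` listing for an input directory does not show the expected cluster files, clusterdump has no cluster data to read. That would be consistent with the empty output file described above.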
