(HDFS)部署开发平台Intellij IDEA&&单词统计练习——大数据分析及其可视化5

Posted 2021-12-19 new DFP

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了(HDFS)部署开发平台Intellij IDEA&&单词统计练习——大数据分析及其可视化5相关的知识，希望对你有一定的参考价值。

1.配置Maven工具

左上角点击File-> Settings

选择Maven工具的路径打开到bin文件夹

复制远程仓库的地址：C:\\Users\\Administrator\\.m2

在windows下点开我的电脑复制进去

打开我们的文件确保该文件下有settings.xml 如果没有我们打开maven的

进入conf文件夹

复制这个settings.xml到 C:\\Users\\Administrator\\.m2 文件下

打开配置55行

提示《有些的maven版本不同在mvan的文件夹里有responsitory这个时候我我们在配置55行地址的时候要选择到responsitory文件的地址，我这个3.8.2的版本没有我直接选择到maven的地址》

2.创建项目

左上角 FIle->new project

展开项目的目录结构

在windows下找到hadoop的配置文件导入到resource

（这个hadoop的配置文件是在linux主节点下配置好下载到hadoop的不懂可以看hadoop配置文件的来源）

打开文件夹复制复制已选择的4项

粘贴到IDEA

配置pom.xml 双击打开

配置文件的信息的地址可以参考阅读

添加如下代码注意自己的版本号<version>2.7.3</version>我的是2.7.3

    <dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
    </dependency>
    </dependencies>

添加进去后IDEA会弹出提示下载包我们要选择下载不然会报错

3.创建包和类

创建3个java类

Mynain.java

package com.MapTest;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Mymain 
    public static void main(String[] args) throws Exception 
        // Configuration类：读取Hadoop的配置文件，如 site-core.xml...；
        // 也可用set方法重新设置（会覆盖）：conf.set("fs.default.name", "hdfs://xxxx:9000")
        Configuration conf = new Configuration();
        // 将命令行中参数自动设置到变量conf中
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        /**
         * 这里必须有输入输出
         */

        if (otherArgs.length != 2) 
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        

        Job job = new Job(conf, "word count"); // 新建一个 job，传入配置信息
        job.setJarByClass(Mymain.class); // 设置 job 的主类
        job.setMapperClass(MyMap.class); // 设置 job 的 Mapper 类
        job.setCombinerClass(MyReduce.class); // 设置 job 的 作业合成类
        job.setReducerClass(MyReduce.class); // 设置 job 的 Reducer 类
        job.setOutputKeyClass(Text.class); // 设置 job 输出数据的关键类
        job.setOutputValueClass(IntWritable.class); // 设置 job 输出值类
        FileInputFormat.addInputPath(job, new Path(otherArgs[0])); // 文件输入
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // 文件输出
        boolean result = false;
        try 
            result = job.waitForCompletion(true);
         catch (Exception e) 
            e.printStackTrace();
        
        System.out.println(new Date().toGMTString() + (result ? "成功" : "失败"));
        System.exit(result ? 0 : 1); // 等待完成退出

MyMap.java

package com.MapTest;
import java.io.IOException;
import java.util.Date;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// 创建一个 WordMapper类 继承于 Mapper抽象类
public class MyMap extends Mapper<Object, Text, Text, IntWritable> 
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Mapper抽象类的核心方法，三个参数
    public void map(Object key, // 首字符偏移量
                    Text value, // 文件的一行内容
                    Context context) // Mapper端的上下文，与 OutputCollector 和 Reporter 的功能类似
            throws IOException, InterruptedException 
        String[] ars = value.toString().split("['.;,?| \\t\\n\\r\\f]");
        for (String tmp : ars) 
            if (tmp == null || tmp.length() <= 0) 
                continue;
            
            word.set(tmp);
            System.out.println(new Date().toGMTString() + ":" + word + "出现一次，计数+1");
            context.write(word, one);

MyReduce.java

package com.MapTest;
import java.io.IOException;
import java.util.Date;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// 创建一个 WordReducer类 继承于 Reducer抽象类
public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable>

    private IntWritable result = new IntWritable(); // 用于记录 key 的最终的词频数

    // Reducer抽象类的核心方法，三个参数
    public void reduce(Text key, // Map端 输出的 key 值
                       Iterable<IntWritable> values, // Map端 输出的 Value 集合（相同key的集合）
                       Context context) // Reduce 端的上下文，与 OutputCollector 和 Reporter 的功能类似
            throws IOException, InterruptedException
    
        int sum = 0;
        for (IntWritable val : values) // 遍历 values集合，并把值相加
        
            sum += val.get();
        
        result.set(sum); // 得到最终词频数
        System.out.println(new Date().toGMTString()+":"+key+"出现了"+result);
        context.write(key, result); // 写入结果

3.打包操作

打开操作台

清理包点击clean