Hadoop API Serialization and Deserialization in Java

Posted by pblh123


Learning goals:

  1. Become familiar with Java development
  2. Master Java serialization and deserialization
  3. Understand serialization in Hadoop MapReduce

Software versions

| Tool | Description |
| --- | --- |
| VMware-workstation-full-15.5.1-15018445.exe | VMware Workstation installer |
| MobaXterm_Portable_v20.3.zip | Unzip and run; remote client for the CentOS system, supports SSH login and file upload |
| CentOS-7-x86_64-DVD-1511.iso | CentOS 7 ISO image; no need to unzip, used by VMware during installation |
| jdk-8u171-linux-x64.tar.gz | JDK tarball, uploaded to the CentOS system |
| hadoop-2.7.3.tar.gz | Hadoop tarball, uploaded to the virtual machine |

Preface / Background

When working with Hadoop, data is usually transferred in a serialized format: serialized data is easier to send over the network, takes less space to store, and a serialized byte stream is also more convenient for passing between processes.
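As a quick illustration of the storage point, here is a minimal sketch (SizeCompare is a made-up class for this example, not from the original post; it only needs hadoop-common on the classpath) comparing Java's built-in serialization with Hadoop's Writable encoding of the same string:

package hdfs;

import org.apache.hadoop.io.Text;

import java.io.*;

public class SizeCompare {
    public static void main(String[] args) throws IOException {
        String phone = "15210369999";

        // Java built-in serialization: stream header plus type info plus the value
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(phone);
        }

        // Hadoop Writable: just a varint length followed by the UTF-8 payload
        ByteArrayOutputStream hadoopBytes = new ByteArrayOutputStream();
        new Text(phone).write(new DataOutputStream(hadoopBytes));

        System.out.println("ObjectOutputStream: " + javaBytes.size() + " bytes");
        System.out.println("Hadoop Text:        " + hadoopBytes.size() + " bytes");
    }
}

The Writable stream is noticeably smaller because it carries no class metadata, which is the property Hadoop relies on when shipping records between nodes.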

Developing Java with IntelliJ IDEA Community

Goal: develop Java programs with IDEA.

1. Create a new Maven project

Follow the wizard prompts to finish creating the project.

2. Configure the pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.com.lh</groupId>
    <artifactId>GUNBigdata</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <!-- Project dependencies -->
    <dependencies>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>

        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.7.3</version>
        </dependency>

    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                    <!--    <verbal>true</verbal>-->
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>true</minimizeJar>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
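
With this pom.xml in place, running mvn clean package from the project root compiles the code and, through the shade plugin bound to the package phase above, produces a self-contained runnable jar under target/ (minimizeJar strips classes that are never referenced).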

3. Write the HadoopPerson class

package hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @Classname HadoopPerson
 * @Description Custom Writable bean for Hadoop serialization
 * @Date 2022/9/7 21:50
 * @Created by Tiger_Li
 */
public class HadoopPerson implements Writable {

    private IntWritable id;
    private Text name;
    private IntWritable age;
    private Text phone;

    /**
     * No-arg constructor.
     * The fields must be initialized here, otherwise deserialization
     * fails with a null-field error when readFields touches them.
     */
    public HadoopPerson() {
        this.id = new IntWritable(0);
        this.name = new Text("");
        this.age = new IntWritable(0);
        this.phone = new Text("");
    }

    // Parameterized constructor
    public HadoopPerson(int id, String name, int age, String phone) {
        this.id = new IntWritable(id);
        this.name = new Text(name);
        this.age = new IntWritable(age);
        this.phone = new Text(phone);
    }

    // Serialization: write this object's fields to the output stream
    @Override
    public void write(DataOutput out) throws IOException {
        this.id.write(out);
        this.name.write(out);
        this.age.write(out);
        this.phone.write(out);
    }

    // Deserialization: read data from the input stream into this object's fields
    @Override
    public void readFields(DataInput in) throws IOException {
        this.id.readFields(in);
        this.name.readFields(in);
        this.age.readFields(in);
        this.phone.readFields(in);
    }

    @Override
    public String toString() {
        return "HadoopPerson{" +
                "id=" + id +
                ", name=" + name +
                ", age=" + age +
                ", phone=" + phone +
                '}';
    }

    public IntWritable getAge() {
        return age;
    }

    public void setAge(IntWritable age) {
        this.age = age;
    }

    public Text getName() {
        return name;
    }

    public void setName(Text name) {
        this.name = name;
    }

    public IntWritable getId() {
        return id;
    }

    public void setId(IntWritable id) {
        this.id = id;
    }

    public Text getPhone() {
        return phone;
    }

    public void setPhone(Text phone) {
        this.phone = phone;
    }
}

Note the difference between the parameterized constructor and the no-arg constructor: Hadoop creates instances with the no-arg constructor during deserialization, so its fields must be pre-initialized.
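
To make this concrete, here is a minimal in-memory round-trip sketch (RoundTripDemo is a hypothetical class, not part of the original post):

package hdfs;

import java.io.*;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        HadoopPerson original = new HadoopPerson(1, "test", 20, "12345678901");

        // Serialize into an in-memory buffer instead of a file
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialize into a fresh object created with the no-arg constructor
        HadoopPerson copy = new HadoopPerson();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy); // prints the same field values as the original
    }
}

If the no-arg constructor left id as null, copy.readFields(...) would fail with a NullPointerException at this.id.readFields(in).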

4. TestSeriMain: serialization and deserialization test on the local file system

package hdfs;

import java.io.*;

/**
 * @Classname TestSeriMain
 * @Description Serialization/deserialization test on the local file system
 * @Date 2022/9/7 16:35
 * @Created by Tiger_Li
 */
public class TestSeriMain {

    public static void main(String[] args) throws IOException {
        ser();
        deSer();
    }

    // Serialization: write a HadoopPerson object to a local file
    public static void ser() throws IOException {
//        1. Create a local File object to hold the serialized HadoopPerson
        File file = new File("D:\\pblh123\\2022\\part-r-00001");
//        2. Create a FileOutputStream that receives the bytes from the DataOutputStream and writes them to the file
        FileOutputStream outputstr = new FileOutputStream(file);
//        3. Create a DataOutputStream that receives the serialized bytes of the HadoopPerson and passes them on to the FileOutputStream
        DataOutputStream dataoutputstr = new DataOutputStream(outputstr);
//        4. Create a HadoopPerson with the parameterized constructor
        HadoopPerson aperson = new HadoopPerson(500, "李先生", 31, "15210369999");
//        5. Call the object's write method to serialize it into the DataOutputStream
        aperson.write(dataoutputstr);
//        6. Close the stream
        dataoutputstr.close();
//        7. Print a status message
        System.out.println("One record serialized successfully~");
    }

    // Deserialization: read the byte stream from the local file and turn it back into a HadoopPerson object
    public static void deSer() throws IOException {
//        1. Point at the local File written above
        File file = new File("D:\\pblh123\\2022\\part-r-00001");
//        2. Create a FileInputStream to read the file contents
        FileInputStream inputstr = new FileInputStream(file);
//        3. Create a DataInputStream that receives the file's bytes and feeds them to the HadoopPerson
        DataInputStream datainputstr = new DataInputStream(inputstr);
//        4. Create a HadoopPerson with the no-arg constructor and call readFields to populate it
        HadoopPerson person = new HadoopPerson();
        person.readFields(datainputstr);
//        5. Close the streams
        datainputstr.close();
        inputstr.close();
//        6. Print a status message
        System.out.println("Deserialized successfully: " + person);
    }
}

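For reference, the file written by ser() is raw binary, not text: IntWritable.write emits a fixed 4 bytes, and Text.write emits a varint length followed by the UTF-8 bytes. For this record that works out to 4 + (1 + 9) + 4 + (1 + 11) = 30 bytes in total: two ints, the 9 UTF-8 bytes of "李先生", and the 11 ASCII digits of the phone number.
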

5. HadoopPersonMapper: serialization in the Map phase

package hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @Classname HadoopPersonMapper
 * @Description Hadoop serialization test implemented with a MapReduce Mapper
 * @Date 2022/9/7 21:46
 * @Created by Tiger_Li
 */
public class HadoopPersonMapper extends Mapper<LongWritable, Text, IntWritable, HadoopPerson> {

    /**
     * Testing Hadoop serialization through a MapReduce Mapper.
     * Approach:
     * 1. Create an implementation class that implements the Writable interface
     * 2. Define its member variables (all of Hadoop data types)
     * 3. Add get/set methods, a no-arg constructor, a parameterized constructor, and override write and readFields
     * 4. Override toString
     * 5. Write HadoopPersonMapper extending Mapper (this is where it differs from the local test)
     * 6. Write a Driver class and configure the Mapper and its input/output (this also differs)
     */

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, IntWritable, HadoopPerson>.Context context) throws IOException, InterruptedException {
        // Split one comma-separated line into fields and wrap them in a HadoopPerson
        String string = value.toString();
        String[] split = string.split(",");
        HadoopPerson person = new HadoopPerson(Integer.parseInt(split[0]), split[1], Integer.parseInt(split[2]), split[3]);
        // Emit (id, person); both are serialized by Hadoop when written to the context
        context.write(person.getId(), person);
    }
}
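Note that the driver below never sets a reducer class, so the job runs with Hadoop's default identity reducer: each (IntWritable, HadoopPerson) pair emitted here is serialized into the shuffle via write, deserialized on the reduce side via readFields, and finally written out by TextOutputFormat, which calls toString() on the key and the value. This is exactly where the Writable implementation gets exercised.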

6. HadoopPersonMDriver: the MapReduce driver

package hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @Classname HadoopPersonMDriver
 * @Description Driver class
 * @Date 2022/9/7 23:12
 * @Created by Tiger_Li
 */
public class HadoopPersonMDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setMapperClass(hdfs.HadoopPersonMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(HadoopPerson.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(HadoopPerson.class);
//        The input file must have the right format: pure data rows, no header line
        FileInputFormat.setInputPaths(job, new Path("D:\\pblh123\\2022\\javademo1.txt"));
//        The output directory must not already exist, otherwise the job fails
        FileOutputFormat.setOutputPath(job, new Path("D:\\pblh123\\2022\\hdfspersonser"));
        boolean completed = job.waitForCompletion(true);
        if (completed) {
            System.out.println("Serialization job succeeded");
        } else {
            System.out.println("Serialization job failed");
        }
    }
}

Contents of javademo1.txt:

1,suben1,30,18275128970
2,suben2,32,18275128972
3,李四,33,18275128973
4,布丁,34,18275128974
5,张三,35,18275128975
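
Running the driver on this input should produce a single output file, hdfspersonser\part-r-00000, that looks roughly like the following (one reducer by default, rows sorted by key and tab-separated; the exact layout follows the toString method above):

1	HadoopPerson{id=1, name=suben1, age=30, phone=18275128970}
2	HadoopPerson{id=2, name=suben2, age=32, phone=18275128972}
3	HadoopPerson{id=3, name=李四, age=33, phone=18275128973}
4	HadoopPerson{id=4, name=布丁, age=34, phone=18275128974}
5	HadoopPerson{id=5, name=张三, age=35, phone=18275128975}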

Personal takeaways

Make sure you really understand serialization and deserialization, and pay attention to how serialization in MapReduce differs from plain local Java serialization.
