Integrating Hadoop with Spring Boot

Posted by 程序员超时空


The configuration steps for integrating Spring Boot with Hadoop are described below. They assume a Hadoop cluster (Hadoop 2.8.5) is already installed and running on Linux.

I. Integrating HDFS

1. Key application.properties settings

# HDFS connection settings (blocksize 67108864 = 64 MB)
hdfs.url=hdfs://192.168.2.5:9000
hdfs.username=root
hdfs.replication=2
hdfs.blocksize=67108864
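The `hdfs.url` value is a standard URI, so its parts can be inspected with the JDK alone. A small sketch (using the value from the properties above; the class name is illustrative) showing how the scheme, host, and port break down before the URI is handed to the Hadoop client:

```java
import java.net.URI;

public class HdfsUrlParse {
    public static void main(String[] args) {
        // Value of hdfs.url from application.properties above
        URI uri = URI.create("hdfs://192.168.2.5:9000");
        System.out.println(uri.getScheme()); // hdfs
        System.out.println(uri.getHost());   // 192.168.2.5
        System.out.println(uri.getPort());   // 9000
    }
}
```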

2. Key pom.xml dependencies

<dependencies>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-web</artifactId>
		<exclusions>
			<exclusion>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-starter-logging</artifactId>
			</exclusion>
		</exclusions>
	</dependency>
	<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-common</artifactId>
		<version>2.8.5</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-hdfs</artifactId>
		<version>2.8.5</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-client</artifactId>
		<version>2.8.5</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
	<dependency>
		<groupId>com.alibaba</groupId>
		<artifactId>fastjson</artifactId>
		<version>1.2.44</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/io.springfox/springfox-swagger2 -->
	<dependency>
		<groupId>io.springfox</groupId>
		<artifactId>springfox-swagger2</artifactId>
		<version>2.9.2</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/io.springfox/springfox-swagger-ui -->
	<dependency>
		<groupId>io.springfox</groupId>
		<artifactId>springfox-swagger-ui</artifactId>
		<version>2.9.2</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
	<dependency>
		<groupId>com.google.guava</groupId>
		<artifactId>guava</artifactId>
		<version>27.0.1-jre</version>
	</dependency>
</dependencies>

Complete the remaining configuration and code as needed.
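As one possible sketch of that "remaining configuration and code", a minimal Spring `@Configuration` class could bind the four properties above and expose a Hadoop `FileSystem` bean. The class and field names here are illustrative, not from the original article:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;

// Fully qualified to avoid the name clash with Hadoop's own Configuration class
@org.springframework.context.annotation.Configuration
public class HdfsConfig {

    @Value("${hdfs.url}")
    private String url;          // e.g. hdfs://192.168.2.5:9000

    @Value("${hdfs.username}")
    private String username;     // e.g. root

    @Value("${hdfs.replication}")
    private String replication;  // e.g. 2

    @Value("${hdfs.blocksize}")
    private String blocksize;    // e.g. 67108864 (64 MB)

    @Bean
    public FileSystem fileSystem() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", url);
        conf.set("dfs.replication", replication);
        conf.set("dfs.blocksize", blocksize);
        // Connect as the configured user (root in this article)
        return FileSystem.get(new URI(url), conf, username);
    }
}
```

`FileSystem.get(URI, Configuration, String)` performs the connection as the given user, which matches the `hdfs.username=root` setting above.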

3. Setting environment variables on Windows

When the application is started on Windows, requests to HDFS fail with the following error:

java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Running Hadoop client code from Java on Windows requires winutils.exe; the error above indicates that neither HADOOP_HOME nor hadoop.home.dir is set. Download the binaries from https://github.com/steveloughran/winutils; the closest available version to this cluster's hadoop-2.8.5 is hadoop-2.8.3, so download hadoop-2.8.3 to a local disk. The error can then be fixed either by setting a Windows environment variable or by setting hadoop.home.dir in code.

(1) Set a Windows environment variable

Set the Windows environment variable HADOOP_HOME=D:\soft\hadoop\winutils\hadoop-2.8.3

(2) Set hadoop.home.dir in code

Add the following line before the Hadoop Configuration is initialized:

// On Windows, point hadoop.home.dir at the local winutils directory
System.setProperty("hadoop.home.dir","D:\\soft\\hadoop\\winutils\\hadoop-2.8.3");

4. Windows host mapping

The local hosts file must also map the Hadoop cluster's hostnames, mirroring the cluster's own configuration. Edit C:\Windows\System32\drivers\etc\hosts and add:

# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host

# localhost name resolution is handled within DNS itself.
#	127.0.0.1       localhost
#	::1             localhost
192.168.2.5 hadoop.master
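With the hostname mapping in place, the client can reach the cluster. A sketch of basic HDFS operations tying the pieces above together (the local and remote paths are illustrative, and running it requires a live cluster at the configured address):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // On Windows, set hadoop.home.dir before touching any Hadoop class
        System.setProperty("hadoop.home.dir", "D:\\soft\\hadoop\\winutils\\hadoop-2.8.3");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.2.5:9000");
        // Connect as root, matching hdfs.username in application.properties
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.2.5:9000"), conf, "root");

        Path dir = new Path("/tmp/springboot-demo");  // illustrative remote directory
        if (!fs.exists(dir)) {
            fs.mkdirs(dir);
        }
        // Upload a local file into the directory (illustrative local path)
        fs.copyFromLocalFile(new Path("D:\\data\\hello.txt"), new Path(dir, "hello.txt"));
        fs.close();
    }
}
```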

This concludes the main content on integrating Hadoop with Spring Boot.
