Distributed configure (hadoop 2.7.2 & spark 2.1.0)

Posted by 杨铖


1. environment

Hadoop 2.7.2
spark 2.1.0
scala 2.11.8
sbt 0.13.15
java 1.8
maven 3.3.9
protobuf 2.5.0
findbugs 2.0.2

2. configure details

2.1 download the required software

  1. download the hadoop source code from https://dist.apache.org/repos/dist/release/hadoop/common/
  2. download the spark source code from http://spark.apache.org/downloads.html
  3. download scala from http://www.scala-lang.org/download/all.html
  4. download sbt from http://www.scala-sbt.org/download.html
  5. download java from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
  6. download maven from http://maven.apache.org/download.cgi
  7. download protobuf from https://github.com/google/protobuf/tree/master/src
  8. download findbugs from https://sourceforge.net/projects/findbugs/?source=typ_redirect

2.1.1 configure the requirements

2.1.1.1 java8 environment

First, remove any Java environment currently present on the system.

# list all the java packages installed on the system
rpm -qa | grep java
# then remove each of them like this
rpm -e --nodeps XXXXX   # XXXXX is one of the packages listed by 'rpm -qa | grep java'

# upload the JDK8 'jdk-8u131-linux-x64.tar.gz', which can be downloaded from the Oracle official website
tar -zxvf jdk-8u131-linux-x64.tar.gz
vim /etc/profile
# add this code into the file.
JAVA_HOME=/usr/local/java/jdk1.8.0_131             # mind the path.
JRE_HOME=$JAVA_HOME/jre
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH JAVA_HOME JRE_HOME CLASSPATH

source /etc/profile

Type java -version and javac in the console; if you see messages like these, Java has been installed successfully:

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:lines,vars,source     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
  -classpath <path>          Specify where to find user class files and annotation processors
  -cp <path>                 Specify where to find user class files and annotation processors
  -sourcepath <path>         Specify where to find input source files
  -bootclasspath <path>      Override location of bootstrap class files
  -extdirs <dirs>            Override location of installed extensions
  -endorseddirs <dirs>       Override location of endorsed standards path
2.1.1.2 scala-2.11.8 environment
tar -zxvf scala-2.11.8.tgz

vim /etc/profile
# add the following code into the file.
export SCALA_HOME=/usr/local/scala/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin

source /etc/profile

Type scala -version in the console; if a message like this appears, Scala has been installed successfully:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
2.1.1.3 sbt-0.13.15 environment
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
yum install sbt
sbt

Type sbt sbt-version in the console; if messages like these appear, sbt has been installed successfully:

[info] Set current project to sbt (in build file:/usr/local/sbt/)
[info] 0.13.15
2.1.1.4 maven-3.3.9 environment
tar -zxvf apache-maven-3.3.9-bin.tar.gz
vim /etc/profile
# add the following code into the file.
export MAVEN_HOME=/usr/local/maven/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin

source /etc/profile

Type mvn -v in the console; if messages like these appear, Maven has been installed successfully:

Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T11:41:47-05:00)
Maven home: /usr/local/maven/apache-maven-3.3.9
Java version: 1.8.0_131, vendor: Oracle Corporation
Java home: /usr/local/java/jdk1.8.0_131/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-573.el6.x86_64", arch: "amd64", family: "unix"

2.2 configure the distributed system

2.2.1 configure Hadoop

2.2.1.1 compile Hadoop

First, uncompress hadoop-2.7.2-src.tar.gz:

tar -zxvf hadoop-2.7.2-src.tar.gz

Second, download and install Maven and protobuf:

tar -zxvf apache-maven-3.3.9-bin.tar.gz
cd /apache-maven-3.3.9
vim /etc/profile
export MAVEN_HOME=/your directory/apache-maven-3.3.9
export PATH=.:$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin
source /etc/profile
ln -s /your directory/apache-maven-3.3.9/bin/mvn /usr/bin/mvn

tar -zxvf protobuf-2.5.0.tar.gz
cd /protobuf-2.5.0
./configure --prefix=/your directory/protobuf-2.5.0
make
make install
vim /etc/profile
# add the following code into the file.
export PATH=$PATH:/usr/local/protobuf/protobuf-2.5.0/bin/
export PKG_CONFIG_PATH=/usr/local/protobuf/protobuf-2.5.0/lib/pkgconfig/

source /etc/profile
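
To confirm that the newly built protobuf is the one on the PATH, a quick check (assuming the install prefix above) is:

protoc --version    # should print: libprotoc 2.5.0
which protoc        # should point into /usr/local/protobuf/protobuf-2.5.0/bin/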

Third, download the Hadoop dependencies with Maven and then compile it:

cd hadoop-2.7.2-src
mvn clean package -Pdist,native -DskipTests -Dtar 

When compiling the Hadoop source code, a problem like this may appear:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (dist) on project hadoop-dist: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/usr/local/hadoop/hadoop-2.7.2-src/hadoop-dist/target" executable="sh">... @ 38:104 in /usr/local/hadoop/hadoop-2.7.2-src/hadoop-dist/target/antrun/build-main.xml

In that case, you need to install four dependencies (a combined install command is sketched after the list):

  1. cmake : yum install cmake (on CentOS)

  2. findbugs : download it from https://sourceforge.net/projects/findbugs/?source=typ_redirect, uncompress it, and then:

    yum install ant
    unzip findbugs-2.0.2-source.zip
    cd /your findbugs directory
    ant
    export FINDBUGS_HOME=/usr/local/findbugs/findbugs-2.0.2
  3. openssl-dev : yum install openssl-devel (on CentOS)

  4. zlib-dev : yum install zlib-devel (on CentOS)
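
On CentOS, the first, third, and fourth dependencies can be installed in one step (a sketch; the package names are the standard CentOS ones, and FindBugs still has to be unpacked and built with ant as shown above):

yum install -y cmake ant openssl-devel zlib-devel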

Finally, when messages like these appear, Hadoop has compiled successfully:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21:45 min
[INFO] Finished at: 2017-04-24T19:20:15+08:00
[INFO] Final Memory: 119M/419M
[INFO] ------------------------------------------------------------------------

Now we can get the compiled code from:

/hadoop-2.7.2-src/hadoop-dist/target/hadoop-2.7.2.tar.gz
2.2.1.2 configure Hadoop in Standalone Operation

By default, Hadoop is configured to run in non-distributed mode, as a single Java process, and we can test it right away:

mkdir distributed_input
cp /your directory/hadoop-2.7.2/etc/hadoop/*.xml distributed_input
/your directory/hadoop-2.7.2/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep /usr/local/distributed_input /usr/local/distributed_output 'dfs[a-z.]+'
cat distributed_output/*

After a few seconds, the console will show output like this:

    File System Counters
        FILE: Number of bytes read=1153568
        FILE: Number of bytes written=2210810
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Map output bytes=17
        Map output materialized bytes=25
        Input split bytes=134
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=25
        Reduce input records=1
        Reduce output records=1
        Spilled Records=2
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=638582784
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=123
    File Output Format Counters 
        Bytes Written=23

That means we have run the example successfully!

In order to invoke the hadoop command from anywhere, we can configure the profile like this:

vim /etc/profile
# add the following code into the file.
export HADOOP_HOME=/your directory/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin

source /etc/profile
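
A quick way to verify the profile change (the output will vary with your build, but the first line should report version 2.7.2):

hadoop version    # the first line should be: Hadoop 2.7.2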
2.2.1.3 configure Hadoop in Pseudo-Distributed Operation

Pseudo-distributed mode also runs on a single node, but each Hadoop daemon runs in its own Java process.

There are several files that need to be modified:

hosts ssh network ifcfg-eth0 resolv.conf

# modify the hosts file.
vim /etc/hosts
# add the IP address and hostname of the master and of every worker
# here is my example.
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.132 master
192.168.159.134 worker2
192.168.159.133 worker3

# modify the hostname.
vim /etc/sysconfig/network
# add the proper code, here is my example.
NETWORKING=yes
HOSTNAME=master # on a worker machine, set HOSTNAME to that worker's name

# restart the network
service network restart

# set up SSH keys, because every machine needs to be able to connect to the others without a password.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# do the above on each machine, then gather the workers' keys into the master's authorized_keys and finally send the merged file to all workers.
# log in to worker2 and then:
ssh-copy-id -i master # append worker2's public key to the master's authorized_keys
# log in to worker3 and then:
ssh-copy-id -i master # append worker3's public key to the master's authorized_keys
# back on the master:
scp /root/.ssh/authorized_keys worker2:/root/.ssh/ # send the authorized_key to worker2
scp /root/.ssh/authorized_keys worker3:/root/.ssh/ # send the authorized_key to worker3
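# (optional check, assuming the hostnames above) verify passwordless login from the master:
ssh worker2 hostname   # should print 'worker2' without prompting for a password
ssh worker3 hostname   # should print 'worker3' without prompting for a password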

# create the directories for the Hadoop data, HDFS, logs, and so on; the directory tree looks like this (the mkdir commands are sketched after the tree):
distribute_data
├── hadoop
│   ├── data
│   ├── hdfs
│   ├── logs
│   ├── name
│   └── temp
└── spark


####################################################################################################
# the steps below are optional; only do them if the network is not already configured correctly.
####################################################################################################
# modify the network adapter.
vim /etc/sysconfig/network-scripts/ifcfg-eth0
# add the following code.
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
HWADDR=00:02:c9:03:00:31:78:f2
PEERDNS=yes
PEERROUTES=yes
IPADDR=192.168.159.130
NETMASK=255.255.255.0
GATEWAY=192.168.159.2
DNS1=100.100.100.1

# modify the DNS
vim /etc/resolv.conf
# add the following code(It depends on your system and network).
nameserver 100.100.100.1
nameserver 114.114.114.114
nameserver 8.8.8.8

etc/hadoop/hadoop-env.sh

# modify the JAVA_HOME variable.
export JAVA_HOME=/your JAVA_HOME directory
# here is mine.
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64

etc/hadoop/slaves

# add all the hostname of the worker machine to the slaves file.
worker2
worker3

etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <!-- Size of read/write buffer used in SequenceFiles. -->
        <property>
             <name>io.file.buffer.size</name>
             <value>131072</value>
       </property>
        <!-- hadoop temp directory, it depends on you -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/distribute_data/hadoop/temp</value>
        </property>
</configuration>

etc/hadoop/hdfs-site.xml

<configuration>
<property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master:50090</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/distribute_data/hadoop/hdfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/distribute_data/hadoop/hdfs/data</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml

<configuration>
 <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.address</name>
          <value>master:10020</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>master:19888</value>
  </property>
</configuration>

etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
     <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
     </property>
     <property>
           <name>yarn.resourcemanager.address</name>
           <value>master:8032</value>
     </property>
     <property>
          <name>yarn.resourcemanager.scheduler.address</name>
          <value>master:8030</value>
      </property>
     <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>master:8031</value>
     </property>
     <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>master:8033</value>
     </property>
     <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>master:8088</value>
     </property>
</configuration>

After modifying the slaves / core-site.xml / hdfs-site.xml / mapred-site.xml / yarn-site.xml files on the master, copy them to the workers, like this:

scp core-site.xml worker2:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/ 
scp core-site.xml worker3:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/  
scp hdfs-site.xml worker2:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/  
scp hdfs-site.xml worker3:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/   
scp yarn-site.xml worker2:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/
scp yarn-site.xml worker3:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/
scp mapred-site.xml worker2:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/
scp mapred-site.xml worker3:/usr/local/hadoop/hadoop-2.7.2/etc/hadoop/

Finally, format the HDFS file system and start the cluster.

cd /your directory/hadoop-2.7.2
./bin/hdfs namenode -format

If messages like these come up, the file system has been formatted successfully:

17/04/26 16:02:20 INFO common.Storage: Storage directory /usr/local/distribute_data/hadoop/hdfs/name has been successfully formatted.
17/04/26 16:02:20 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/04/26 16:02:20 INFO util.ExitUtil: Exiting with status 0
17/04/26 16:02:20 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.159.132
************************************************************/

Then start the cluster:

./sbin/start-all.sh  # the Hadoop team actually recommends using start-dfs.sh and start-yarn.sh instead

If messages like these come up, the cluster has started successfully:

Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-master.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-master.out
worker2: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-worker2.out
worker3: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-worker3.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-master.out
worker2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-worker2.out
localhost: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-master.out
worker3: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-worker3.out

Type jps in the master's console, and we can see:

18544 NodeManager
17540 NameNode
17764 DataNode
18437 ResourceManager
19557 Jps
18092 SecondaryNameNode
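
Beyond jps, a quick way to confirm that the DataNodes have registered is to ask the NameNode for a cluster report and to check the web UIs (50070 is the Hadoop 2.7.x NameNode default; 8088 is the yarn.resourcemanager.webapp.address configured above):

./bin/hdfs dfsadmin -report    # worker2 and worker3 should appear as live datanodes
# NameNode web UI:        http://master:50070
# ResourceManager web UI: http://master:8088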
2.2.1.4 configure Hadoop in Fully-Distributed Operation

Fully-distributed operation is very similar to pseudo-distributed operation; the configuration steps are actually the same as above.

2.2.1.5 Test a computation on the Hadoop cluster
cp /usr/local/hadoop/hadoop-2.7.2/etc/hadoop/*.xml /usr/local/distribute_data/hadoop/data/
/your directory/hadoop-2.7.2/bin/hdfs dfs -mkdir /in
hdfs dfs -put /usr/local/distribute_data/hadoop/data/* /in
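
With the input in HDFS, the same grep example used in standalone mode can now be submitted to the cluster; the /out directory name here is just an illustration and must not exist before the job runs:

hadoop jar /usr/local/hadoop/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep /in /out 'dfs[a-z.]+'
hdfs dfs -cat /out/*    # print the matches found by the job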

2.2.2 configure spark

2.2.2.1 compile spark

First, we need to add MAVEN_OPTS to the profile; this will help us avoid heap-space errors during the build.

vim /etc/profile
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
source /etc/profile

Second, uncompress the Spark source code, download the Spark dependencies with Maven, and then compile it.

tar -zxvf spark-2.1.0.tgz
cd /spark-2.1.0
# declare the Hadoop version, matching the version compiled above.
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2 -DskipTests clean package  # this takes a while.

If an error message like this appears:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20.189 s
[INFO] Finished at: 2017-04-27T14:58:14+08:00
[INFO] Final Memory: 41M/211M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-tags_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-tags_2.11

Make sure you have installed the right versions of Scala (2.11.8) and Maven (3.3.9), then reboot the system and run the same command again; in practice, a reboot fixes this error.
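
Before re-running the build, it may be worth double-checking the toolchain the build will see (the versions shown are the ones targeted by this guide):

java -version      # expect 1.8.0_131
scala -version     # expect 2.11.8
mvn -v             # expect Apache Maven 3.3.9
echo $MAVEN_OPTS   # expect -Xmx2g -XX:ReservedCodeCacheSize=512m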

If messages like these appear, we have downloaded and compiled the Spark libraries successfully:

[INFO] Spark Project Parent POM ........................... SUCCESS [ 13.441 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 17.570 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 17.121 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 22.711 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 14.905 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 22.380 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 22.645 s]
[INFO] Spark Project Core ................................. SUCCESS [10:59 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [04:20 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:17 min]
[INFO] Spark Project Streaming ............................ SUCCESS [04:35 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:20 min]
[INFO] Spark Project SQL .................................. SUCCESS [14:59 min]
[INFO] Spark Project ML Library ........................... SUCCESS [09:13 min]
[INFO] Spark Project Tools ................................ SUCCESS [01:12 min]
[INFO] Spark Project Hive ................................. SUCCESS [11:40 min]
[INFO] Spark Project REPL ................................. SUCCESS [03:23 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [02:26 min]
[INFO] Spark Project YARN ................................. SUCCESS [05:34 min]
[INFO] Spark Project Assembly ............................. SUCCESS [01:25 min]
[INFO] Spark Project External Flume Sink .................. SUCCESS [02:42 min]
[INFO] Spark Project External Flume ....................... SUCCESS [03:03 min]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 40.208 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [02:38 min]
[INFO] Spark Project Examples ............................. SUCCESS [08:19 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [01:10 min]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [04:44 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [02:35 min]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [03:42 min]
[INFO] Spark Project Java 8 Tests ......................... SUCCESS [06:04 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:58 h
[INFO] Finished at: 2017-04-27T19:44:19+08:00
[INFO] Final Memory: 88M/852M
[INFO] ------------------------------------------------------------------------
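
As an alternative to copying the whole source tree to the workers later on, the Spark source also ships a dev/make-distribution.sh script that packs the compiled build into a deployable tarball; a sketch with the same profiles as the build above (the --name value is arbitrary):

./dev/make-distribution.sh --name hadoop2.7 --tgz -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2
# should produce something like spark-2.1.0-bin-hadoop2.7.tgz in the source root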
2.2.2.2 configure distributed Spark on Hadoop

Once Spark has been compiled successfully, the conf directory will be available; what we need to do is add the hostname of every machine to the slaves file and configure the spark-env.sh file.

# add this code into slaves file.
master
worker2
worker3

# add this code into spark-env.sh file
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=512M
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.2/etc/hadoop

For convenience, we can also configure the profile for the Spark environment.

vim /etc/profile
# add the following code.
export SPARK_HOME=/your directory/spark-2.1.0
export PATH=$PATH:$SPARK_HOME/bin

source /etc/profile

Finally, we should send the compiled Spark code to the worker machines.

scp -r spark-2.1.0 worker2:/usr/local/spark/
scp -r spark-2.1.0 worker3:/usr/local/spark/

Now we can start spark-2.1.0 on hadoop-2.7.2 and run some tests.

cd /your directory/spark-2.1.0
./sbin/start-all.sh
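
If the standalone cluster came up, jps on the master should now show a Master and a Worker process (master is also listed in slaves) alongside the Hadoop daemons, and the Spark master web UI is served on its default port 8080:

jps    # expect Master and Worker in addition to the Hadoop processes
# Spark master web UI: http://master:8080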

Then start the spark-shell, and the following messages will appear:

./bin/spark-shell

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/04/28 14:01:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/28 14:01:14 WARN spark.SparkConf: 
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --num-executors to specify the number of executors
 - Or set SPARK_EXECUTOR_INSTANCES
 - spark.executor.instances to configure the number of instances in the spark config.

Spark context Web UI available at http://127.0.0.1:4040
Spark context available as 'sc' (master = local[*], app id = local-1493402474643).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
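
As a final smoke test, the bundled SparkPi example can be run against the standalone master started above; this is a sketch, and run-example simply forwards spark-submit options such as --master:

./bin/run-example --master spark://master:7077 SparkPi 100
# success looks like a line such as: Pi is roughly 3.14...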

2.3 develop Spark in IDEA (IntelliJ)

2.3.1 con
