Spark 1.6.0 Cluster Deployment

1. Runtime Environment

This post sets up Spark in Standalone mode, deploying Spark on top of an already-running Hadoop 2.6.4 cluster.

Version information:

  • JDK: 1.8.0_74, 64-bit
  • Hadoop: 2.6.4
  • Spark: 1.6.0
  • Scala: 2.12.0-M3

Note: the prebuilt spark-1.6.0-bin-hadoop2.6 package bundles its own Scala 2.10 runtime, so the Scala installed below serves only as a local toolchain; an application jar (section 8.2) should be compiled against Scala 2.10 to match.

2. Download Spark
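
A minimal sketch for fetching both packages, assuming the official archive URLs (any mirror works; adjust the file names to match the versions listed above):

wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
wget https://downloads.lightbend.com/scala/2.12.0-M3/scala-2.12.0-M3.tgz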

3. Install Scala

3.1 Extract the archive

mkdir /home/hadoop/scala
tar -zxf scala-2.12.0-M3.tgz -C /home/hadoop/scala

3.2 Edit environment variables

vim ~/.bashrc
export SCALA_HOME=/home/hadoop/scala/scala-2.12.0-M3
export PATH=$HADOOP_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

3.3 Apply the configuration

source ~/.bashrc 

3.4 Check the version and test

scala -version  
scala  
5*5  
:q (or Ctrl+D to exit)
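
For reference, evaluating the expression in the REPL should produce:

scala> 5*5
res0: Int = 25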

4. Distribute Scala to the slave nodes

scp -r /home/hadoop/scala hadoop@hadoop2:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@hadoop2:/home/hadoop/
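
With several slaves, a short loop avoids repetition (hadoop2 and hadoop3 are this cluster's slaves; extend the list as needed); remember to run source ~/.bashrc on each slave afterwards:

for host in hadoop2 hadoop3; do
  scp -r /home/hadoop/scala hadoop@$host:/home/hadoop/
  scp /home/hadoop/.bashrc hadoop@$host:/home/hadoop/
done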

5. Install Spark

5.1 Extract the archive

tar -zxf spark-1.6.0-bin-hadoop2.6.tgz -C /home/hadoop/

5.2 Configure environment variables

vim ~/.bashrc
export SPARK_HOME=/home/hadoop/spark-1.6.0-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

5.3 Apply the configuration

source ~/.bashrc  

5.4 spark-env.sh

In the conf directory under the Spark home, copy spark-env.sh.template to spark-env.sh and add:

cp spark-env.sh.template spark-env.sh 

export JAVA_HOME=/home/hadoop/java/jdk1.8.0_74   
export SCALA_HOME=/home/hadoop/scala/scala-2.12.0-M3  
export SPARK_MASTER_IP=hadoop1  
export SPARK_WORKER_MEMORY=2g  
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.6.4/etc/hadoop   # Hadoop 2.x keeps its config under etc/hadoop, not conf/

5.5 slaves

Copy slaves.template to slaves in the conf directory and list each slave node (hadoop2, hadoop3, …), one hostname per line, as sketched below.
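
A sketch of the setup (hostnames from this cluster):

cp slaves.template slaves

# conf/slaves: one worker hostname per line
hadoop2
hadoop3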

6. Distribute Spark to the slave nodes

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop2:/home/hadoop/
scp ~/.bashrc hadoop@hadoop2:~/

Repeat for hadoop3 and any other slaves (or adapt the loop from step 4), then source ~/.bashrc on each.

7. Start the cluster

7.1 Start HDFS

On the master, from the Hadoop home directory:

cd /home/hadoop/hadoop-2.6.4
sbin/start-dfs.sh

7.2 Start Spark

Spark's start-all.sh shares its name with Hadoop's script, so run it from the Spark directory:

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6
sbin/start-all.sh

7.3 New processes

After running jps, the hadoop1 master node should show a Master process, and each slave node a Worker process.
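
A hypothetical jps listing on the master, assuming the HDFS daemons from step 7.1 also run on hadoop1 (PIDs will differ):

$ jps
2817 NameNode
3012 SecondaryNameNode
3301 Master
3408 Jps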

7.4 Spark UI

The master's web UI listens on port 8080 by default and should list every registered Worker:

http://hadoop1:8080

7.5 Spark-Shell

Without a master URL, spark-shell runs in local mode; pass the master URL to attach it to the standalone cluster:

bin/spark-shell --master spark://hadoop1:7077

8. Testing

8.1 WordCount test in the spark-shell
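
The input directory must exist in HDFS before the job runs; a minimal sketch for populating it (the local file name words.txt is an assumption):

hdfs dfs -mkdir -p /input
hdfs dfs -put ~/words.txt /input/

With the input in place, run the word count: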

spark-shell
val file = sc.textFile("hdfs://hadoop1:9000/input/")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
count.saveAsTextFile("hdfs://hadoop1:9000/output/wordcount")

Note that saveAsTextFile fails if the output directory already exists in HDFS.

8.2 Jar submission test

bin/spark-submit --class WordCount.WordCount --master spark://hadoop1:7077 ~/data/WordCount.jar hdfs://hadoop1:9000/input/ hdfs://hadoop1:9000/output/
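
For reference, a minimal sketch of the WordCount.WordCount class assumed by the --class argument above (an illustration, not the original jar; compile it against the same Scala version as the Spark binaries, i.e. 2.10 for this prebuilt package):

package WordCount

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Master URL and deploy settings come from spark-submit;
    // args(0) and args(1) are the input and output HDFS paths.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))
    sc.stop()
  }
}

As with the shell test, the output directory must not already exist, or the job aborts.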