Building A Raspberry Pi Cluster with Apache Spark

See slides from our talk at Rockville Raspberry Pi Jam 2017: Cluster Computing with Raspberry Pi

 

Figure out or set the IP Addresses of all nodes on your network.

[Cool trick for seeing Raspberry Pis on your local network]

sudo apt-get install nmap
sudo nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}'

(Newer versions of nmap call this flag -sn; -sP still works as a deprecated alias.)
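To see what that awk filter is doing, here is the same pipeline run against a captured sample of nmap output (the hosts and MAC addresses below are made up for illustration):

```shell
# Hypothetical capture of `nmap -sP` output, saved to a file so the
# filter can be demonstrated without a live scan:
cat > /tmp/nmap_sample.txt <<'EOF'
Nmap scan report for 192.168.1.3
Host is up (0.0042s latency).
MAC Address: B8:27:EB:AA:BB:CC (Raspberry Pi Foundation)
Nmap scan report for 192.168.1.9
Host is up (0.0100s latency).
MAC Address: 00:11:22:33:44:55 (SomeOtherVendor)
EOF

# The awk program stores the last IP seen on an "Nmap scan report" line,
# then prints it whenever the Raspberry Pi MAC prefix B8:27:EB appears.
awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}' /tmp/nmap_sample.txt
```

Only 192.168.1.3 is printed, since the other host's MAC does not carry the Raspberry Pi Foundation prefix.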

Getting your current IP

hostname -I

 

[Get SSH set up into all Devices]

 sudo raspi-config
 ->Advanced Options
 ->SSH
 Enable -> Yes

On the master: I assume you are using the user pi, but if not, make sure all Pis have the same user with the same password

ssh-keygen -t rsa -P ""
 ->Enter file in which to save the key (/home/pi/.ssh/id_rsa): [Enter]

 

[Generate a Host Addition File]

 sudo apt-get install vim
 vim hosts_addition

Inside vim
press i to enter insert mode

192.168.1.2 master
192.168.1.3 slave01
192.168.1.4 slave02

press ESC to exit insert mode
type :wq and press Enter to save and quit

Make a copy of your hosts file:

cp /etc/hosts ~/hosts.bak

sudo apt-get update
sudo apt-get upgrade

[Install Master Dependencies]
install the prerequisites:

 sudo apt-get install scala
 sudo apt-get install oracle-java8-jdk

 

[Install Master Spark]
Note: past versions of Spark needed the oracle-java7-jdk; however, the new version will throw an error with Java 7.

Download the latest version of Spark:
http://spark.apache.org/downloads.html

wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz

 

Untar Tarball:

tar xzf spark-2.2.0-bin-hadoop2.7.tgz

Edit .bashrc (in your user's home directory)
Add this to the end:

export JAVA_HOME=<path-of-Java-installation> (eg: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/)
export SPARK_HOME=<path-to-the-root-of-your-spark-installation> (eg: /home/pi/spark-2.2.0-bin-hadoop2.7/)
export PATH=$PATH:/home/pi/spark-2.2.0-bin-hadoop2.7/bin

Reload .bashrc

source ~/.bashrc

Test your variables from the command line:

 echo $JAVA_HOME
 echo $SPARK_HOME
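For a more explicit check, a small helper can report whether each variable is set and actually points at a directory. The check_dir function and its messages are my own sketch, not part of Spark or the original post:

```shell
# check_dir: report whether an environment variable is set and points
# at a real directory. (Hypothetical helper for sanity-checking .bashrc.)
check_dir() {
  name=$1
  val=$2
  if [ -z "$val" ]; then
    echo "$name is not set"
  elif [ ! -d "$val" ]; then
    echo "$name is set but is not a directory: $val"
  else
    echo "$name OK: $val"
  fi
}

check_dir JAVA_HOME "$JAVA_HOME"
check_dir SPARK_HOME "$SPARK_HOME"
```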

 

 

[Configure Master Spark]

cd ~/spark-2.2.0-bin-hadoop2.7/conf

sudo cp spark-env.sh.template spark-env.sh
sudo vim spark-env.sh

export SPARK_WORKER_CORES="2"
export SPARK_WORKER_MEMORY="512m"
export SPARK_MASTER_HOST="192.168.1.2"

:wq

Note that in older versions of Spark this variable was called SPARK_MASTER_IP.

Suggested SPARK_WORKER_MEMORY by model:
Raspberry Pi 1: 256m
Raspberry Pi 2/3: 512m

sudo vim slaves

press i

Enter your slaves, one per line:

slave01
slave02

 

[Create A Configured Version of Spark to Share]

cd ~
tar czf spark.tar.gz spark-2.2.0-bin-hadoop2.7
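With more than a couple of slaves, the per-slave scp steps in the next section get repetitive. Here is a sketch of a loop that pushes the three files to every slave (hostnames assumed from the hosts_addition file above; the run/DRY_RUN wrapper is my own addition so the commands can be previewed before anything is copied):

```shell
# Hostnames as defined in hosts_addition above.
SLAVES="slave01 slave02"

# run: execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Preview mode -- remove this line to actually copy the files.
DRY_RUN=1

for host in $SLAVES; do
  run scp spark.tar.gz "$host:~"
  run scp hosts_addition "$host:~"
  run scp "$HOME/.ssh/id_rsa.pub" "$host:~"
done
```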

 

FOR ALL SLAVES: repeat each of the steps below for every slave.

[Copy Over Files Needed from Master to Slave]

From the master:

scp spark.tar.gz slave01:~
scp hosts_addition slave01:~
scp ~/.ssh/id_rsa.pub slave01:~

Then ssh into the slave:

ssh slave01

[Update All Slaves]

sudo apt-get update
sudo apt-get upgrade

Install the dependencies:

sudo apt-get install oracle-java8-jdk scala

sudo apt-get install vim

 

[Add authorized SSH key on all slaves]
on slave:

mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys

Note that this does not damage an existing directory or its files, if any.
Verify the state of the files:

 ls -a /home/pi/.ssh

Then append the master's public key and clean up:

 cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
 rm ~/id_rsa.pub

Allow the slave to SSH into the master:
ssh-keygen -t rsa -P ""
scp ~/.ssh/id_rsa.pub master:~
ssh master
ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
logout

[Add hosts data to all slaves]
add the relevant lines to your hosts file:

sudo vim /etc/hosts

press G
to move the cursor to the last line of the file. This is important so you don’t corrupt the structure of the existing data.
type :r /home/pi/hosts_addition and press Enter to read in the additions
type :wq and press Enter to save and quit
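If you'd rather skip the vim session, the same append can be done non-interactively. On the Pi the real command would be `cat ~/hosts_addition | sudo tee -a /etc/hosts`; the sketch below performs the identical step on /tmp copies so nothing system-wide is touched:

```shell
# Work on scratch copies instead of the real /etc/hosts.
cp /etc/hosts /tmp/hosts.demo
printf '192.168.1.2 master\n192.168.1.3 slave01\n192.168.1.4 slave02\n' \
  > /tmp/hosts_addition.demo

# Append the additions to the end of the file -- the same effect as
# :r followed by :wq in vim.
cat /tmp/hosts_addition.demo >> /tmp/hosts.demo
```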

[Uncompress the Configured Spark]

tar xzf spark.tar.gz
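At this point every node has a configured Spark, but the final step of actually starting the cluster deserves a mention. Spark 2.2's standalone mode ships launch scripts in sbin; the sketch below (my addition, to be run on the master) writes those commands to a file so you can review them before running:

```shell
# Write the launch steps to a reviewable script. These are the standard
# Spark 2.2 standalone-mode scripts; run the script on the master node.
cat > /tmp/start_cluster.sh <<'EOF'
#!/bin/sh
cd "$SPARK_HOME"
./sbin/start-master.sh   # start the master; web UI on http://master:8080
./sbin/start-slaves.sh   # ssh into each host listed in conf/slaves
EOF

# Sanity-check the script's syntax without executing it.
sh -n /tmp/start_cluster.sh && echo "start_cluster.sh syntax OK"
```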

Useful References:

How to Install Apache Spark on Multi-Node Cluster


http://bailiwick.io/2015/07/07/create-your-own-apache-spark-cluster-using-raspberry-pi-2/

 
