Building A Raspberry Pi Cluster with Apache Spark

See slides from our talk at Rockville Raspberry Pi Jam 2017: Cluster Computing with Raspberry Pi

 

Figure out or set the IP Addresses of all nodes on your network.

[Cool trick for seeing local network Raspberry Pis]

sudo apt-get install nmap
sudo nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}'
Getting your current IP
hostname -I
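
If you would rather pin each node to a fixed address than rediscover it every time, one common approach on Raspbian (which uses dhcpcd) is a static entry in /etc/dhcpcd.conf. This is just a sketch using the addresses assumed later in this guide; adjust the interface, addresses, and router for your own network:

# /etc/dhcpcd.conf on the master (use .3, .4, ... on the slaves)
interface eth0
static ip_address=192.168.1.2/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1

Reboot (or restart dhcpcd) for the change to take effect.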

[Get SSH set up into all Devices]

sudo raspi-config

->Advanced Options
->SSH
Enable -> Yes

On the master, I assume you are using the user pi; if not, make sure all Pis have the same user with the same password.

ssh-keygen -t rsa -P ""
->Enter file in which to save the key (/home/pi/.ssh/id_rsa): [Enter]
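
The public key generated here gets copied to each slave by hand further down. If ssh-copy-id is available on your image, it can do that append in one step instead; pi@slave01 is just the example user and hostname used in this guide (use the slave's IP address if the name doesn't resolve yet):

ssh-copy-id pi@slave01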

[Generate a Host Addition File]

sudo apt-get install vim
vim hosts_addition

Inside vim
press i to enter insert mode

192.168.1.2 master
192.168.1.3 slave01
192.168.1.4 slave02

press ESC to exit insert mode, then type :wq to save

Make a copy of your hosts file:

cp /etc/hosts ~/hosts.bak

sudo apt-get update
sudo apt-get upgrade

[Install Master Dependencies]
install the prerequisites:

sudo apt-get install scala
sudo apt-get install oracle-java8-jdk
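
To confirm the prerequisites installed cleanly, you can check the versions; the exact output depends on what the Raspbian repositories shipped:

java -version
scala -version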

[Install Master Spark]
Note that past versions of Spark needed the oracle-java7-jdk; with the new version, Java 7 will throw an error.

Download the latest version of Spark:
http://spark.apache.org/downloads.html

wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz

Untar the tarball:

tar xzf spark-2.2.0-bin-hadoop2.7.tgz

Edit .bashrc (in your user's home directory)
Add this to the end:

export JAVA_HOME=<path-of-Java-installation> (eg: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/)
export SPARK_HOME=<path-to-the-root-of-your-spark-installation> (eg: /home/pi/spark-2.2.0-bin-hadoop2.7/)
export PATH=$PATH:/home/pi/spark-2.2.0-bin-hadoop2.7/bin

Reload .bashrc

source ~/.bashrc

Test your variables by echoing them at the command line:

echo $JAVA_HOME
echo $SPARK_HOME

 

[Configure Master Spark]

cd ~/spark-2.2.0-bin-hadoop2.7/conf

sudo cp spark-env.sh.template spark-env.sh
sudo vim spark-env.sh

export SPARK_WORKER_CORES="2"
export SPARK_WORKER_MEMORY="512m"
export SPARK_MASTER_HOST="192.168.1.2"

Press ESC to exit insert mode, then type :wq to save.
Note that the variable in older versions was called SPARK_MASTER_IP

Suggested SPARK_WORKER_MEMORY values: Raspberry Pi 1 = 256m, Raspberry Pi 2/3 = 512m

sudo vim slaves

press i

Enter your slaves, one per line:

slave01
slave02

[Create A Configured Version of Spark to Share]

1
2
cd ~
tar czf spark.tar.gz spark-2.2.0-bin-hadoop2.7

FOR ALL SLAVES: do each of the steps below on each slave.

The copy commands below run on the master and refer to the slaves by the names in hosts_addition, so make sure those entries are also in the master's own /etc/hosts (same procedure as shown for the slaves near the end of this guide). You will SSH into the slave right after the copy.

[Copy Over Files Needed from Master to Slave]

scp spark.tar.gz slave01:~
scp hosts_addition slave01:~
scp ~/.ssh/id_rsa.pub slave01:~

ssh slave01
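
If you have more than a couple of slaves, a small loop on the master saves retyping the three scp commands; the hostnames here are just the ones from hosts_addition above:

# run on the master, before ssh'ing into each slave
for h in slave01 slave02; do
  scp spark.tar.gz hosts_addition ~/.ssh/id_rsa.pub "$h":~
done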

[update all slaves]

sudo apt-get update
sudo apt-get upgrade

install dependencies

sudo apt-get install oracle-java8-jdk scala

sudo apt-get install vim

[Add authorized SSH key on all slaves]
on slave:

mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys

Note that this does not damage the existing directory or files, if any.
Verify the files are there, then append the master's key and clean up:

ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
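
OpenSSH can quietly ignore authorized_keys if the directory or file permissions are too loose, so it's worth tightening them while you're here (this is standard sshd behavior, not specific to this setup):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys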

Allow the slave to SSH into the master (if the name master doesn't resolve on the slave yet, use the master's IP address in the commands below, or do the [Add hosts data to all slaves] step first):

ssh-keygen -t rsa -P ""
scp ~/.ssh/id_rsa.pub master:~
ssh master
ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
logout
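
With keys installed in both directions, logging in either way should no longer prompt for a password. A quick check, using the hostnames assumed in this guide:

ssh master hostname    # run on a slave; should print "master" and exit
ssh slave01 hostname   # run on the master; should print "slave01" and exit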

[Add hosts data to all slaves]
add the relevant lines to your hosts file:

sudo vim /etc/hosts

press G to move the cursor to the last line of the file (this is important so you don't corrupt the structure of the existing data)
type :r /home/pi/hosts_addition to read the new entries in below it
type :wq to save
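
If you would rather skip the interactive editor, the same append can be done non-interactively with tee (same hosts_addition file as above):

cat /home/pi/hosts_addition | sudo tee -a /etc/hosts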

[Uncompress the Configured Spark]

tar xzf spark.tar.gz
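
Once every slave has been through the steps above, a quick sanity check, assuming the master address and Spark path used earlier in this guide, is to start the standalone daemons from the master and attach a shell to the cluster:

# on the master
cd ~/spark-2.2.0-bin-hadoop2.7
sbin/start-all.sh                                 # starts the master plus the workers listed in conf/slaves
bin/spark-shell --master spark://192.168.1.2:7077

The master's web UI at http://192.168.1.2:8080 should list both workers once they register.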

Additional Useful References:

Install Apache Spark on Multi-Node Cluster


http://bailiwick.io/2015/07/07/create-your-own-apache-spark-cluster-using-raspberry-pi-2/

 
