Building A Raspberry Pi Cluster with Apache Spark

See slides from our talk at Rockville Raspberry Pi Jam 2017: Cluster Computing with Raspberry Pi


Figure out or set the IP Addresses of all nodes on your network.

[Cool trick for finding Raspberry Pis on your local network]

sudo apt-get install nmap
sudo nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}'

(Replace 192.168.1.0/24 with your own subnet. B8:27:EB is the Raspberry Pi Foundation's MAC-address prefix, so this prints the IP of every Pi that nmap finds.)

Getting your current IP

hostname -I
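If you would rather set the addresses than discover them, Raspbian's DHCP client (dhcpcd) can pin a static address on each node. A sketch of the lines to append to /etc/dhcpcd.conf; the interface name and every address below are example values for an assumed 192.168.1.0/24 network:

```
# Append to /etc/dhcpcd.conf on each node, then reboot.
# Example values -- adjust the interface, addresses, and router to your network.
interface eth0
static ip_address=192.168.1.10/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1
```

Give each node a different ip_address (e.g. .10 for the master, .11 and .12 for the slaves) so the hostnames used later in this guide stay stable.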


[Get SSH set up into all Devices]

 sudo raspi-config
 -> Advanced Options (Interfacing Options on newer releases)
 -> SSH -> Enable -> Yes

On the master I assume you are using the default user pi; if not, make sure all Pis have the same user with the same password.

ssh-keygen -t rsa -P ""
 ->Enter file in which to save the key (/home/pi/.ssh/id_rsa): [Enter]


[Generate a Host Addition File]

 sudo apt-get install vim
 vim hosts_addition

Inside vim
press i to enter insert mode, then add one line per node: the node's IP address followed by its hostname (the addresses below are examples; use the ones you found earlier):

192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02

press ESC to exit insert mode
type :wq to save and quit

Make a copy of your hosts file:

cp /etc/hosts ~/hosts.bak

sudo apt-get update
sudo apt-get upgrade

[Install Master Dependencies]
install the prerequisites:

 sudo apt-get install scala
 sudo apt-get install oracle-java8-jdk


[Install Master Spark]
Note: past versions of Spark needed the oracle-java7-jdk, but with the new version this will throw an error.

Download the latest version of Spark (2.2.0 at the time of writing). The URL below is the Apache archive path for the tarball used in the next step:

wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

Untar the tarball:

tar xzf spark-2.2.0-bin-hadoop2.7.tgz

Edit .bashrc (in your user's home directory)
Add this to the end:

export JAVA_HOME=<path-of-Java-installation> (eg: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/)
export SPARK_HOME=<path-to-the-root-of-your-spark-installation> (eg: /home/pi/spark-2.2.0-bin-hadoop2.7/)
export PATH=$PATH:/home/pi/spark-2.2.0-bin-hadoop2.7/bin

Reload .bashrc

source ~/.bashrc

Test your variables by typing in the command line:

echo $JAVA_HOME
echo $SPARK_HOME

Each should print the path you set above.


[Configure Master Spark]

cd ~/spark-2.2.0-bin-hadoop2.7/conf

sudo cp spark-env.sh.template spark-env.sh
sudo vim spark-env.sh

Set SPARK_MASTER_HOST to the master's hostname and SPARK_WORKER_MEMORY to how much RAM each worker may use.

Note that the variable in older versions was called SPARK_MASTER_IP

Suggested SPARK_WORKER_MEMORY by model:

RPi 1 = 256m
RPi 2 / RPi 3 = 512m
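Putting it together, a minimal spark-env.sh for a cluster of Pi 2/3 boards might look like this (the hostname master and the 512m cap are the values assumed throughout this walkthrough):

```shell
# conf/spark-env.sh -- example values; adjust hostname and memory for your nodes
SPARK_MASTER_HOST=master      # hostname the standalone master binds to
SPARK_WORKER_MEMORY=512m      # RAM each worker may use (256m on a Pi 1)
```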

sudo vim slaves

press i

Enter your slaves, one per line:

slave01
slave02

[Create A Configured Version of Spark to Share]

cd ~
tar czf spark.tar.gz spark-2.2.0-bin-hadoop2.7


FOR ALL SLAVES: do each of the steps below on each slave.

ssh into your slave

[Copy Over Files Needed from Master to Slave]

scp spark.tar.gz slave01:~
scp hosts_addition slave01:~
scp ~/.ssh/id_rsa.pub slave01:~
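If you have more than one slave, the three copies can be scripted. A sketch, with the hostnames assumed from the hosts file above; the echo makes this a dry run that only prints each command, so delete the echo to actually copy:

```shell
# Copy the tarball, hosts snippet, and master's public key to every slave.
# Dry run: each scp command is printed, not executed; remove "echo" to copy.
for s in slave01 slave02; do
  echo scp spark.tar.gz "$s":~
  echo scp hosts_addition "$s":~
  echo scp ~/.ssh/id_rsa.pub "$s":~
done
```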

ssh slave01

[update all slaves]

sudo apt-get update
sudo apt-get upgrade

Install the dependencies:

sudo apt-get install oracle-java8-jdk scala

sudo apt-get install vim


[Add authorized SSH key on all slaves]
on slave:

mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys

Note that this does not damage an existing .ssh directory or its files, if any.
Verify the state of the files:

 ls -a /home/pi/.ssh
 cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
 rm ~/id_rsa.pub

Allow the slave to SSH into the master:

ssh-keygen -t rsa -P ""
scp ~/.ssh/id_rsa.pub master:~
ssh master
ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub

[Add hosts data to all slaves]
add the relevant lines to your hosts file:

sudo vim /etc/hosts

press G
to move the cursor to the last line of the file. This is important so you don't corrupt the structure of the existing entries.
type :r /home/pi/hosts_addition
to read in the additions, then type :wq to save and quit

[Uncompress the Configured Spark]

tar xzf spark.tar.gz
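Once every slave has unpacked the configured Spark, the cluster can be started from the master. A sketch under the assumptions used throughout this guide (user pi's home directory, hostname master); the standalone scripts live in Spark's sbin directory, and start-slaves.sh starts a worker on every host listed in conf/slaves:

```shell
# Start the standalone cluster from the master node.
SPARK_HOME="$HOME/spark-2.2.0-bin-hadoop2.7"
MASTER_URL="spark://master:7077"   # 7077 is the standalone master's default port

"$SPARK_HOME/sbin/start-master.sh"   # master daemon; web UI on port 8080
"$SPARK_HOME/sbin/start-slaves.sh"   # one worker per host in conf/slaves

# Smoke test: run the bundled SparkPi example against the cluster.
"$SPARK_HOME/bin/run-example" --master "$MASTER_URL" SparkPi 10
```

If everything is wired up, the master's web UI at http://master:8080 should list a worker for each slave.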

Useful References:

How to Install Apache Spark on Multi-Node Cluster

