Local Testing

Content

First Time Setup
- Hadoop, Giraph, GPS, Mizan, GraphLab
Initialization
Running Tests

The systems can all be tested locally in a pseudo-distributed environment. Note that these instructions are for Ubuntu 12.04. Other versions of Ubuntu or Linux may or may not work!

First Time Setup

Choose a folder to work in. We'll call it `*/`.
Set up each system as follows.

Hadoop

Grab Hadoop 1.0.4 and untar the source:

```bash
cd */
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xf hadoop-1.0.4.tar.gz
```

Modify `*/hadoop-1.0.4/conf/hadoop-env.sh` and add:

```bash
export JAVA_HOME=*/jdk1.6.0_30/
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
```

`*/` must be an **absolute** path.
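Since every config file on this page needs the same absolute path, it may help to resolve the working folder once and copy the result. The following is a minimal sketch, not part of the repo: the helper name `abs_dir` and the variable `WORK_DIR` are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the absolute path of an
# existing directory, or return non-zero if the directory does not exist.
abs_dir() {
    # Run in a subshell so the caller's working directory is unchanged.
    (cd -- "$1" 2>/dev/null && pwd)
}

# Example: resolve the current directory and print the line that would go
# into hadoop-env.sh, with */ replaced by the absolute path.
WORK_DIR="$(abs_dir .)"
echo "export JAVA_HOME=${WORK_DIR}/jdk1.6.0_30/"
```

Run it from inside the working folder; the printed path is what `*/` stands for throughout these instructions.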

===

Giraph

Grab Giraph 1.0.0 and untar the source:

```bash
cd */
wget http://apache.mirror.vexxhost.com/giraph/giraph-1.0.0/giraph-1.0.0.tar.gz
tar -xf giraph-1.0.0.tar.gz
```

Note: Source also available [here](https://github.com/xvz/graph-processing/wiki/packages/giraph-1.0.0.tar.gz).

Grab the Maven 3 dependency:

```bash
sudo aptitude install maven
```

===

GPS

Grab GPS rev 110 with Hadoop 1.0.4:

```bash
cd */
wget https://github.com/xvz/graph-processing/wiki/packages/gps-rev-110.tar.gz
tar -xf gps-rev-110.tar.gz
```

This includes only the changes needed to use Hadoop 1.0.4.

---
*Note*: If you want to do this manually:
```bash
cd */
svn checkout https://subversion.assembla.com/svn/phd-projects/gps/trunk/@110 ./gps-rev-110/
cp ./hadoop-1.0.4/hadoop-core-1.0.4.jar ./gps-rev-110/libs/
```

and change all instances of `hadoop-core-0.20.203.0.jar` to `hadoop-core-1.0.4.jar` in:
```
./gps-rev-110/local-master-scripts/make_gps_node_runner_jar.sh
./gps-rev-110/local-master-scripts/manifest.txt
./gps-rev-110/local-master-scripts/make_debug_monitoring_runner_jar.sh
```
---

Modify `*/gps-rev-110/conf/gps-env.sh` to contain:

```bash
GPS_LOG_DIRECTORY=*/var/tmp/
GPS_DIR=*/gps-rev-110/
```

These must be **absolute** paths.

===

Mizan

Grab Mizan-0.1bu1 and untar the source:

```bash
cd */
wget https://mizan-graph-bsp.googlecode.com/files/Mizan-0.1bu1.tar.gz
tar -xf Mizan-0.1bu1.tar.gz
mv Mizan-0.1b Mizan-0.1bu1
```

**Note 1**: The folder should be `Mizan-0.1bu1` to ensure compatibility with the repo source code.

**Note 2**: Also available [here](https://github.com/xvz/graph-processing/wiki/packages/Mizan-0.1bu1.tar.gz). (This does not contain the example web-Google graph, but has the correct folder name.)

Grab Mizan's dependencies:

```bash
sudo aptitude install libboost-all-dev

cd */
wget http://downloads.sourceforge.net/project/threadpool/threadpool/0.2.5%20%28Stable%29/threadpool-0_2_5-src.zip
unzip threadpool-0_2_5-src.zip
sudo mv ./threadpool-0_2_5-src/threadpool/boost/* /usr/include/boost/
rm -rf ./threadpool-0_2_5-src

cd */
wget https://github.com/xvz/graph-processing/wiki/packages/jdk-6u30-linux-x64.bin
chmod +x jdk-6u30-linux-x64.bin
./jdk-6u30-linux-x64.bin     # installs to */jdk1.6.0_30/

cd */
wget http://www.mpich.org/static/downloads/3.0.2/mpich-3.0.2.tar.gz
tar -xf mpich-3.0.2.tar.gz
```

**Note 1**: JDK 6u30 can be obtained directly from Oracle, but you'll need to register an account.

**Note 2**: threadpool and MPICH also available [here](https://github.com/xvz/graph-processing/wiki/packages/threadpool-0_2_5-src.zip) and [here](https://github.com/xvz/graph-processing/wiki/packages/mpich-3.0.2.tar.gz).

Compile MPICH:

```bash
cd */
mkdir mpich2
cd mpich-3.0.2/
./configure --disable-fc --disable-f77 --prefix=*/mpich2    # note the absolute path!
make
make install    # installs to */mpich2/
```

Add to your `~/.bashrc` (replace every `*/`!):

```bash
# exports for Mizan
export MPI_HOME=*/mpich2
export JAVA_HOME=*/jdk1.6.0_30
export HADOOP_HOME=*/hadoop-1.0.4
export BOOST_ROOT=/usr/include/boost
export GIRAPH_HOME=*/giraph-1.0.0

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$HADOOP_HOME/c++/Linux-amd64-64/lib/:$BOOST_ROOT/lib:$LD_LIBRARY_PATH
export CLASSPATH=$HADOOP_HOME/lib/commons-configuration-1.6.jar:$HADOOP_HOME/lib/commons-lang-2.4.jar:$HADOOP_HOME/lib/commons-logging-api-1.0.4.jar:$HADOOP_HOME/hadoop-core-1.0.4.jar:$HADOOP_HOME/conf:$CLASSPATH
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$MPI_HOME/bin:$PATH
```

Note again that `*/` must be an **absolute** path!

Close and re-open your terminals. Ensure that `which mpic++ mpicc mpirun` returns binaries in `*/mpich2/bin/` for all three.
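The check above can also be scripted. The sketch below is illustrative only: `check_mpi_bins` is a hypothetical helper name, and it uses the shell built-in `command -v` in place of `which` to look up each tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeed only if mpic++, mpicc
# and mpirun all resolve to binaries inside the given bin directory.
check_mpi_bins() {
    local want="${1%/}" tool path    # strip one trailing slash, if any
    for tool in mpic++ mpicc mpirun; do
        path="$(command -v "$tool")" || { echo "$tool: not found" >&2; return 1; }
        if [ "$(dirname "$path")" != "$want" ]; then
            echo "$tool: found at $path, expected under $want" >&2
            return 1
        fi
    done
}
```

Call it with the absolute path of the MPICH bin directory, for example `check_mpi_bins "$MPI_HOME/bin"`. A non-zero exit status means at least one tool is missing or comes from a different MPI installation.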

===

GraphLab

Grab GraphLab and configure:

```bash
cd */
git clone https://github.com/graphlab-code/graphlab.git ./graphlab-2a063b3829
cd ./graphlab-2a063b3829 && git reset --hard 2a063b3829
./configure
```

Note: Ensure Mizan's ~/.bashrc changes are applied before running ./configure! Otherwise GraphLab will be compiled against the wrong MPI binaries.

Install sysstat (for sar), SSH, and generate a dummy key to enable SSHing to localhost:

```bash
sudo aptitude install sysstat
sudo aptitude install ssh
ssh-keygen -t dsa -P ""
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
```
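To confirm that password-less SSH to localhost works, a quick non-interactive check can help. This is a sketch under assumptions: `check_local_ssh` is a hypothetical helper name, and because `BatchMode=yes` disables all prompts, the check also fails if localhost's host key has not been accepted yet (connect once by hand with `ssh localhost` first).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): return 0 only if SSH to
# localhost succeeds without asking for a password or any confirmation.
check_local_ssh() {
    # BatchMode=yes makes ssh fail instead of prompting;
    # ConnectTimeout=5 gives up after five seconds if sshd is not reachable.
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}
```

For example, `check_local_ssh && echo "ssh to localhost OK"`.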

Check out this repo to the same directory and force an overwrite of untracked files:

```bash
cd */
git init
git remote add origin -t master https://github.com/xvz/graph-processing.git
git fetch --all
git reset --hard origin/master
```

Modify `*/benchmark/common/get-dirs.sh` so that `DIR_PREFIX` contains the absolute path to `*/` (e.g., `/home/young/cs848`).

Modify `*/benchmark/common/get-configs.sh` to correct the JVM `Xmx` values, so that Giraph and GPS do not run out of memory.

Create the log directories:

```bash
cd */
mkdir -p ./benchmark/giraph/logs
mkdir -p ./benchmark/gps/logs
mkdir -p ./benchmark/graphlab/logs
mkdir -p ./benchmark/mizan/logs
```

Run */benchmark/local-init.sh.

Note: You can change the number of pseudo-machines to use by changing LOCAL_MACHINES in local-init.sh. Default is 1 slave.

Perform a special first time compile for Giraph and Mizan:

```bash
cd */giraph-1.0.0/
mvn clean install -Phadoop_1.0 -DskipTests

cd */Mizan-0.1bu1/Release/
make clean && make all
```

If you want GPS's debug monitor, compile it with:

```bash
cd */gps-rev-110/local-master-scripts/
./make_debug_monitoring_runner_jar.sh
```

Compile the remaining systems using `*/benchmark/<system>/recompile-<system>.sh`. (Also see Recompiling Systems.)
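If several systems need recompiling, the per-system scripts can be driven from a small loop. The sketch below is an illustration, not part of the repo: `recompile_systems` is a hypothetical helper, and running each script from its own directory is an assumption carried over from the algorithm scripts rather than a documented requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): run
# <bench_dir>/<system>/recompile-<system>.sh for each named system,
# stopping at the first failure.
recompile_systems() {
    local bench_dir="$1" system script
    shift
    for system in "$@"; do
        script="${bench_dir}/${system}/recompile-${system}.sh"
        if [ ! -x "$script" ]; then
            echo "missing or not executable: $script" >&2
            return 1
        fi
        # Assumption: run from the script's own directory, in a subshell
        # so the caller's working directory is left untouched.
        (cd "${bench_dir}/${system}" && "./recompile-${system}.sh") || return 1
    done
}
```

For example, `recompile_systems /home/young/cs848/benchmark gps graphlab`, where the system names match the folder names under `benchmark/`.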

Initialization

Run */benchmark/local-init.sh. Note that this will erase HDFS and Hadoop logs. (No need to do this if you just did First Time Setup.)
Place datasets into */datasets/. See Datasets.

Note: */benchmark/datasets/load-splits.sh will not work for local testing, so split the datasets for GraphLab using */benchmark/datasets/split-input.sh and upload them manually.

Running Tests

Start Hadoop using `*/benchmark/hadoop/restart-hadoop.sh 1`. (Or `start-all.sh` and wait for it to exit safemode.)
Use the individual scripts in `*/benchmark/<system>/<alg>.sh` to run specific algorithms. Note that you must be in the same directory as the scripts: `cd */benchmark/<system>` before running `./<alg>.sh`.
To parse the results, use

```
*/benchmark/parser/batch-parser.py <system> <time-log-files> --master
```

For example, `./batch-parser.py 0 ../giraph/logs/*time.txt --master` to parse Giraph's logs.
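The start-and-run steps above can be sketched as two small helpers. Both names (`wait_for_hdfs`, `run_alg`) are hypothetical and not part of the repo. `hadoop dfsadmin -safemode wait` is the stock Hadoop 1.x command that blocks until HDFS leaves safemode; it is only needed when Hadoop was started with `start-all.sh`. Any arguments an algorithm script expects are passed through unchanged, since they differ per script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block until HDFS has left safemode.
wait_for_hdfs() {
    hadoop dfsadmin -safemode wait
}

# Hypothetical helper: run <bench_dir>/<system>/<alg>.sh from inside its own
# directory, as the scripts require. Extra arguments go to the script as-is.
run_alg() {
    local bench_dir="$1" system="$2" alg="$3"
    shift 3
    # Subshell: the caller's working directory is not changed.
    (cd "${bench_dir}/${system}" && "./${alg}.sh" "$@")
}
```

For example, `wait_for_hdfs && run_alg /home/young/cs848/benchmark giraph <alg>`, with `<alg>` replaced by the name of one of the scripts in that folder.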

Note: "Total net I/O" reported by the parser is invalid for local testing, as all communication is via loopback and not Ethernet. The logs do track loopback I/O ("lo"), which may be of limited use.
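To look at loopback traffic independently of the benchmark logs, the kernel's per-interface counters can be sampled directly. This is a rough sketch and not how the repo's logs are produced: `lo_bytes` is a hypothetical helper that reads the standard Linux `/proc/net/dev` table and prints the cumulative received and transmitted byte counts for `lo`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print "<rx_bytes> <tx_bytes>"
# for the loopback interface. Reads /proc/net/dev unless a file is given.
lo_bytes() {
    local dev_file="${1:-/proc/net/dev}"
    # Replace the first ':' with a space so "lo:123" and "lo: 123" split the
    # same way; after that, field 2 is received bytes and field 10 is
    # transmitted bytes.
    awk '{ sub(/:/, " ") } $1 == "lo" { print $2, $10 }' "$dev_file"
}
```

Sampling `lo_bytes` before and after a run and subtracting the two readings gives the loopback bytes moved in between, which includes any unrelated local traffic.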
