Local Testing
The systems can all be tested locally in a pseudo-distributed environment. Note that these instructions are for Ubuntu 12.04. Other versions of Ubuntu or Linux may or may not work!
- Choose a folder to work in. We'll call it `*/`.
- Set up each system as follows...
- Grab Hadoop 1.0.4 and untar the source:
```bash
cd */
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xf hadoop-1.0.4.tar.gz
```
- Modify `*/hadoop-1.0.4/conf/hadoop-env.sh` and add:
```bash
export JAVA_HOME=*/jdk1.6.0_30/
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
```
`*/` must be an **absolute** path.
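Because `*/` has to be replaced by an absolute path in several files, it may help to keep that path in a shell variable and check it once. A minimal sketch, assuming a hypothetical variable `WORK_DIR` and helper `is_absolute_path` (neither is read by any script in this repo):

```bash
# Hypothetical helper: succeeds only if its argument starts with "/".
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# WORK_DIR is an illustrative name standing in for */ ; adjust it to your folder.
WORK_DIR="${WORK_DIR:-$HOME/graph-processing}"

if is_absolute_path "$WORK_DIR"; then
    echo "OK: $WORK_DIR is absolute"
else
    echo "ERROR: $WORK_DIR is not an absolute path" >&2
fi
```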
===
- Grab Giraph 1.0.0 and untar the source:
```bash
cd */
wget http://apache.mirror.vexxhost.com/giraph/giraph-1.0.0/giraph-1.0.0.tar.gz
tar -xf giraph-1.0.0.tar.gz
```
Note: Source also available [here](https://github.com/xvz/graph-processing/wiki/packages/giraph-1.0.0.tar.gz).
- Grab the Maven 3 dependency:
```bash
sudo aptitude install maven
```
===
- Grab GPS rev 110 with Hadoop 1.0.4:
```bash
cd */
wget https://github.com/xvz/graph-processing/wiki/packages/gps-rev-110.tar.gz
tar -xf gps-rev-110.tar.gz
```
This includes only the changes needed to use Hadoop 1.0.4.
---
*Note*: If you want to do this manually:
```bash
cd */
svn checkout https://subversion.assembla.com/svn/phd-projects/gps/trunk/@110 ./gps-rev-110/
cp ./hadoop-1.0.4/hadoop-core-1.0.4.jar ./gps-rev-110/libs/
```
and change all instances of `hadoop-core-0.20.203.0.jar` to `hadoop-core-1.0.4.jar` in:
```
./gps-rev-110/local-master-scripts/make_gps_node_runner_jar.sh
./gps-rev-110/local-master-scripts/manifest.txt
./gps-rev-110/local-master-scripts/make_debug_monitoring_runner_jar.sh
```
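The rename in those three files can also be scripted. A minimal sketch, assuming GNU `sed` (for in-place `-i`) and a hypothetical helper name `update_hadoop_jar_refs`; it is not part of GPS or this repo:

```bash
# Replace the old Hadoop core jar name with the 1.0.4 one in each given file.
# Relies on GNU sed's in-place flag (-i), as shipped with Ubuntu.
update_hadoop_jar_refs() {
    sed -i 's/hadoop-core-0\.20\.203\.0\.jar/hadoop-core-1.0.4.jar/g' "$@"
}

# Example call, run from */ :
# update_hadoop_jar_refs \
#     ./gps-rev-110/local-master-scripts/make_gps_node_runner_jar.sh \
#     ./gps-rev-110/local-master-scripts/manifest.txt \
#     ./gps-rev-110/local-master-scripts/make_debug_monitoring_runner_jar.sh
```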
---
- Modify `*/gps-rev-110/conf/gps-env.sh` to contain:
```bash
GPS_LOG_DIRECTORY=*/var/tmp/
GPS_DIR=*/gps-rev-110/
```
These must be **absolute** paths.
===
- Grab Mizan-0.1bu1 and untar the source:
```bash
cd */
wget https://mizan-graph-bsp.googlecode.com/files/Mizan-0.1bu1.tar.gz
tar -xf Mizan-0.1bu1.tar.gz
mv Mizan-0.1b Mizan-0.1bu1
```
**Note 1**: The folder should be `Mizan-0.1bu1` to ensure compatibility with the repo source code.
**Note 2**: Also available [here](https://github.com/xvz/graph-processing/wiki/packages/Mizan-0.1bu1.tar.gz). (This does not contain the example web-Google graph, but has the correct folder name.)
- Grab Mizan's dependencies:
```bash
sudo aptitude install libboost-all-dev
cd */
wget http://downloads.sourceforge.net/project/threadpool/threadpool/0.2.5%20%28Stable%29/threadpool-0_2_5-src.zip
unzip threadpool-0_2_5-src.zip
sudo mv ./threadpool-0_2_5-src/threadpool/boost/* /usr/include/boost/
rm -rf ./threadpool-0_2_5-src
cd */
wget https://github.com/xvz/graph-processing/wiki/packages/jdk-6u30-linux-x64.bin
chmod +x jdk-6u30-linux-x64.bin
./jdk-6u30-linux-x64.bin # installs to */jdk1.6.0_30/
cd */
wget http://www.mpich.org/static/downloads/3.0.2/mpich-3.0.2.tar.gz
tar -xf mpich-3.0.2.tar.gz
```
**Note 1**: JDK 6u30 can be obtained directly from Oracle, but you'll need to register an account.
**Note 2**: threadpool and MPICH also available [here](https://github.com/xvz/graph-processing/wiki/packages/threadpool-0_2_5-src.zip) and [here](https://github.com/xvz/graph-processing/wiki/packages/mpich-3.0.2.tar.gz).
- Compile MPICH:
```bash
cd */
mkdir mpich2
cd mpich-3.0.2/
./configure --disable-fc --disable-f77 --prefix=*/mpich2   # note the absolute path!
make
make install   # installs to */mpich2/
```
- Add to your `~/.bashrc` (replace the `*/`s!!):
```bash
# exports for Mizan
export MPI_HOME=*/mpich2
export JAVA_HOME=*/jdk1.6.0_30
export HADOOP_HOME=*/hadoop-1.0.4
export BOOST_ROOT=/usr/include/boost
export GIRAPH_HOME=*/giraph-1.0.0
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$HADOOP_HOME/c++/Linux-amd64-64/lib/:$BOOST_ROOT/lib:$LD_LIBRARY_PATH
export CLASSPATH=$HADOOP_HOME/lib/commons-configuration-1.6.jar:$HADOOP_HOME/lib/commons-lang-2.4.jar:$HADOOP_HOME/lib/commons-logging-api-1.0.4.jar:$HADOOP_HOME/hadoop-core-1.0.4.jar:$HADOOP_HOME/conf:$CLASSPATH
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$MPI_HOME/bin:$PATH
```
Note again that `*/` must be an **absolute** path!
- Close and re-open your terminals. Ensure that `which mpic++ mpicc mpirun` all return binaries in `*/mpich2/bin/`.
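The `which` check above can be turned into a small pass/fail script. A minimal sketch, assuming a hypothetical helper name `check_mpi_binaries` that takes the expected bin directory as its argument:

```bash
# Check that each MPI tool resolves to a binary inside the expected bin directory.
# $1 is the expected directory, e.g. the absolute form of */mpich2/bin
check_mpi_binaries() {
    local expected_dir found status tool
    expected_dir="${1%/}"   # drop one trailing slash, if any
    status=0
    for tool in mpic++ mpicc mpirun; do
        # "|| true" keeps the script alive under "set -e" when a tool is missing.
        found="$(command -v "$tool" 2>/dev/null || true)"
        if [ -n "$found" ] && [ "${found%/*}" = "$expected_dir" ]; then
            echo "OK:    $tool -> $found"
        else
            echo "WRONG: $tool -> ${found:-not found} (expected in $expected_dir)" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (MPI_HOME as exported in ~/.bashrc):
# check_mpi_binaries "$MPI_HOME/bin"
```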
===
- Grab GraphLab and configure:
```bash
cd */
git clone https://github.com/graphlab-code/graphlab.git ./graphlab-2a063b3829
cd ./graphlab-2a063b3829 && git reset --hard 2a063b3829
./configure
```
Note: Ensure Mizan's `~/.bashrc` changes are applied before running `./configure`! Otherwise GraphLab will be compiled against the wrong MPI binaries.
- Install sysstat (for `sar`), SSH, and generate a dummy key to enable SSHing to localhost:
```bash
sudo aptitude install sysstat
sudo aptitude install ssh
ssh-keygen -t dsa -P ""
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
```
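To confirm that the key setup above works, a non-interactive login can be attempted. A minimal sketch, assuming a hypothetical helper name `check_localhost_ssh`; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options:

```bash
# Succeeds only if key-based login to localhost works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
check_localhost_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}

# Example:
# check_localhost_ssh && echo "passwordless SSH to localhost works"
```

If the host key for localhost has not been accepted yet, the batch-mode check is expected to fail as well; running `ssh localhost` once interactively should take care of that.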
- Check out this repo to the same directory and force an overwrite of untracked files:
```bash
cd */
git init
git remote add origin -t master https://github.com/xvz/graph-processing.git
git fetch --all
git reset --hard origin/master
```
- Modify `*/benchmark/common/get-dirs.sh` so `DIR_PREFIX` contains the absolute path to `*/`. (E.g., this may be `/home/young/cs848`.) Modify `*/benchmark/common/get-configs.sh` to set suitable JVM Xmx values, so that Giraph and GPS do not run out of memory.
- Create the log directories:
```bash
cd */
mkdir -p ./benchmark/giraph/logs
mkdir -p ./benchmark/gps/logs
mkdir -p ./benchmark/graphlab/logs
mkdir -p ./benchmark/mizan/logs
```
- Run `*/benchmark/local-init.sh`.
Note: You can change the number of pseudo-machines to use by changing `LOCAL_MACHINES` in `local-init.sh`. Default is 1 slave.
- Perform a special first-time compile for Giraph and Mizan:
```bash
cd */giraph-1.0.0/
mvn clean install -Phadoop_1.0 -DskipTests
cd */Mizan-0.1bu1/Release/
make clean && make all
```
If you want GPS's debug monitor, compile it with:
```bash
cd */gps-rev-110/local-master-scripts/
./make_debug_monitoring_runner_jar.sh
```
- Compile the remaining systems using `*/benchmark/<system>/recompile-<system>.sh`. (Also see Recompiling Systems.)
- Run `*/benchmark/local-init.sh`. Note that this will erase HDFS and Hadoop logs. (No need to do this if you just did First Time Setup.)
- Place datasets into `*/datasets/`. See Datasets.
Note: `*/benchmark/datasets/load-splits.sh` will not work for local testing, so split the datasets for GraphLab using `*/benchmark/datasets/split-input.sh` and upload them manually.
- Start Hadoop using `*/benchmark/hadoop/restart-hadoop.sh 1`. (Or `start-all.sh` and wait for it to exit safemode.)
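For the `start-all.sh` route, the wait for safemode can be scripted rather than watched by hand. A minimal sketch, assuming `$HADOOP_HOME/bin` is on `PATH` as set up in `~/.bashrc`; the wrapper name `start_hadoop_and_wait` is hypothetical, while `hadoop dfsadmin -safemode wait` is the Hadoop 1.x admin command that blocks until HDFS leaves safemode:

```bash
# Start the Hadoop daemons, then block until the NameNode leaves safemode.
start_hadoop_and_wait() {
    start-all.sh || return 1
    # "-safemode wait" returns once HDFS has exited safemode.
    hadoop dfsadmin -safemode wait
}

# Example:
# start_hadoop_and_wait && echo "HDFS is ready"
```

This is not necessarily what `restart-hadoop.sh` does internally; it only covers the manual alternative mentioned above.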
- Use the individual scripts in `*/benchmark/<system>/<alg>.sh` to run specific algorithms. Note that you must be in the same directory as the scripts: `cd */benchmark/<system>` before running `./<alg>.sh`.
- To parse the results, use `*/benchmark/parser/batch-parser.py <system> <time-log-files> --master`. For example, `./batch-parser.py 0 ../giraph/logs/*time.txt --master` to parse Giraph's logs.
Note: "Total net I/O" reported by the parser is invalid for local testing, as all communication is via loopback and not Ethernet. The logs do track loopback I/O ("lo"), which may be of limited use.