
Setting up EC2



Our experiments were performed in availability zone us-west-2c, which is in the US West (Oregon) region in the EC2 console.

Setting up boto

We'll use */ to refer to the repo's base directory on your local machine. You'll want to have */benchmark, */ec2, and */results available locally. The other folders are relevant only if you're looking to do local testing.

The */ec2/uw-ec2.py script makes use of the boto 2.27.0 Python package.

To install boto, ensure Python and pip are both installed (for Ubuntu, get python-pip; for Cygwin, use get-pip.py) and run sudo pip install boto or sudo pip install boto --upgrade.
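On Ubuntu, for example, the full sequence would be:

sudo apt-get install python-pip
sudo pip install boto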

Next, you'll need to set up the appropriate AWS credentials:

  1. Log in to the AWS console and go to the AWS IAM system.

  2. Select Users and click on "Create New User". Enter a user name (e.g., "awscli"):

  3. Copy down the Access Key ID and Secret Access Key.

  4. Assign "Power User Access" permissions to the user:

  5. On your local machine, create ~/.boto containing:
[Credentials]
aws_access_key_id=<Access Key ID>
aws_secret_access_key=<Secret Access Key>

For the previous example, this would be:

[Credentials]
aws_access_key_id=AKIAJCXHDFHKDMGRDSWQ
aws_secret_access_key=7zOngn72gn/vfDxoqJoX2wUYiYH/VWwiGwdX4Ng4
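To sanity-check the credentials before launching anything, a minimal test like the following (a sketch assuming boto 2.x is installed and the us-west-2 region used above) should print a list of reservations rather than raise an authentication error:

import boto.ec2

# boto reads aws_access_key_id/aws_secret_access_key from ~/.boto automatically
conn = boto.ec2.connect_to_region("us-west-2")
print(conn.get_all_instances())  # empty list if no instances are running yet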

Automated Setup

First Time Setup

If this is your first time setting up EC2, you'll need to create a key pair and security group:

*/ec2/uw-ec2.py create-kp
*/ec2/uw-ec2.py create-sg

create-kp creates a key pair named uwbench and saves the private key as */ec2/uwbench.pem. You'll need this for SSHing to the instances.
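If you ever need to SSH to an instance manually (outside of the connect command described below), this is the key to use; for example (the ubuntu login name here is an assumption and depends on the AMI):

ssh -i */ec2/uwbench.pem ubuntu@<instance-public-dns>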

create-sg creates a security group named uwbench, containing the required connection rules.

Creating a Cluster

Launch, initialize, and connect to a cluster by using:

*/ec2/uw-ec2.py launch
*/ec2/uw-ec2.py init
*/ec2/uw-ec2.py connect

This creates a cluster of m1.xlarge instances (1 master and 4 slaves), performs some initialization tasks, and then SSHs to the master.

We recommend trying things out with t1.micro instances (free, except for storage costs):

*/ec2/uw-ec2.py -n <num-slaves> -t t1.micro launch
*/ec2/uw-ec2.py -n <num-slaves> init
*/ec2/uw-ec2.py -n <num-slaves> connect

where <num-slaves> is the number of slave machines (>0).

To launch spot instances, add the flag -p <spot-price>, where <spot-price> is the maximum price (in $ per hour per instance) you are willing to pay. If the market price exceeds this, your instances are terminated. We recommend 0.20 for m1.xlarge on us-west-2c. You can check the market price by going to Spot Requests and clicking on "Pricing History".
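For example, a 4-slave m1.xlarge spot cluster with a $0.20/hour bid would be launched with (init and connect are unchanged):

*/ec2/uw-ec2.py -n 4 -p 0.20 launch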

Without the -p flag, the instances are on-demand instances, which are more expensive but will not be terminated due to changes in the market price.

Note 1: While <num-slaves> can be anything > 0, it must be 4, 8, 16, 32, 64, or 128 for the batch benchmarking scripts to work.

Note 2: The first time you SSH to any instance is always slow, so be patient with init. That said, sometimes some of the slaves will fail to be connectable. Give it a few minutes to see if the problem clears up by itself. If not, try rebooting them. If that doesn't help, terminate them and create replacements.

Resizing Cluster Volumes

By default, a volume created from the master image is 6GB while one created from the slave image is 3GB, both of which are too small for large datasets. Specifying the --size-master or --size-slave arguments at launch will enable instances to automatically resize their volumes:

*/ec2/uw-ec2.py -n <num-slaves> --size-master <size-gb> --size-slave <size-gb> launch

We recommend 500GB for the master (i.e., --size-master 500) and 30GB for each slave if you plan on testing with all datasets (and storing the full datasets at the master).
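For example, to launch a 16-slave cluster sized this way:

*/ec2/uw-ec2.py -n 16 --size-master 500 --size-slave 30 launch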

Note 1: Resizing volumes to large sizes (>50GB) can take some time. If */ec2/uw-ec2.py -n <num-slaves> init fails with "Connection refused", wait a few minutes before trying again.

Note 2: After resizing and downloading datasets, you may want to create your own private AMI images to reuse for future clusters. To use your own images, add the --ami-master and --ami-slave arguments.
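For example (the AMI IDs are placeholders for your own private images):

*/ec2/uw-ec2.py -n 16 --ami-master <master-ami-id> --ami-slave <slave-ami-id> launch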

To shrink an existing image or volume, see Shrinking AMI Images.

Starting, Stopping, Terminating

For an existing on-demand cluster, you can stop and start its instances by using:

*/ec2/uw-ec2.py -n <num-slaves> start
*/ec2/uw-ec2.py -n <num-slaves> stop

A stopped cluster can be started again (its "temporary" ephemeral storage is deleted, but its EBS storage volumes stay intact).

To terminate (kill) a cluster, use:

*/ec2/uw-ec2.py -n <num-slaves> terminate

This deletes all instances and their EBS storage volumes! (See also: Terminating EC2)

Replacing Instances

Sometimes some of the slave instances of a cluster will act up. You can terminate them manually by filtering for their name via Instances in the EC2 console. Then, launch replacements using:

*/ec2/uw-ec2.py <options> launch
*/ec2/uw-ec2.py -n <num-slaves> init

Make sure to pass in the same options (-n, -p, -t, --size-slave, etc.) used for the existing cluster!
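For example, if the cluster was originally launched with -n 16 -p 0.20 --size-slave 30, the replacements would be launched and initialized with the same flags:

*/ec2/uw-ec2.py -n 16 -p 0.20 --size-slave 30 launch
*/ec2/uw-ec2.py -n 16 init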

HVM Images

Our AMI images are PV rather than HVM, and PV is not supported by some EC2 instance types. To convert PV images to HVM images, see Converting PV to HVM.

Manual Setup

See Manual Setup.
