This repository was archived by the owner on Jan 6, 2023. It is now read-only.

Commit 93cdb7d

vreis authored and facebook-github-bot committed
Small fixes to AWS README (#15)
Summary: Found these small issues while running the commands myself.
Pull Request resolved: #15
Differential Revision: D18837656
Pulled By: kiukchung
fbshipit-source-id: 354ee07ee65e9516642db62d8b9ceb434b9551e1
1 parent 6d6e44a commit 93cdb7d

1 file changed: +11 -10 lines changed

aws/README.md

Lines changed: 11 additions & 10 deletions
@@ -10,15 +10,16 @@ jobs on AWS.
 
 ## Requirements
 
-1. `pip install boto3`
-2. `git clone https://github.com/pytorch/elastic.git`
+1. `git clone https://github.com/pytorch/elastic.git`
+2. `cd elastic/aws && pip install -r requirements.txt`
 3. The following AWS resources:
 1. EC2 [instance profile](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html)
 2. [Subnet(s)](https://docs.aws.amazon.com/vpc/latest/userguide/default-vpc.html#create-default-subnet)
 3. [Security group](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#DefaultSecurityGroup)
 4. EFS volume
 5. S3 Bucket
-
+4. [Install](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html)
+the AWS Session Manager plugin
 
 ## Quickstart
 

@@ -69,7 +70,7 @@ you have downloaded the imagenet dataset to `/mnt/efs/fs1/data/imagenet/train`.
 To run the script we'll use `petctl`,
 
 ``` bash
-python3 petctl.py run_job --size 2 --min_size 1 --max_size 3 --name ${USER}-job examples/imagenet/main.py -- --input_path /mnt/efs/fs1/data/imagenet/train
+python3 aws/petctl.py run_job --size 2 --min_size 1 --max_size 3 --name ${USER}-job examples/imagenet/main.py -- --input_path /mnt/efs/fs1/data/imagenet/train
 ```
 
 In the example above, the named arguments, such as, `--size` , `--min_size`, and
@@ -158,20 +159,20 @@ You can take a look at their console outputs by running
 
 ``` bash
 # see the status of the worker
-systemctl status torchelastic_worker
+sudo systemctl status torchelastic_worker
 # get the container id
-docker ps
+sudo docker ps
 # tail the container logs
-docker logs -f <container id>
+sudo docker logs -f <container id>
 ```
 
 > Note since we have configured the log driver to be `awslogs` tailing
 the docker logs will not work. For more information see: https://docs.docker.com/config/containers/logging/awslogs/
 
 You can also manually stop and start the workers by running
 ``` bash
-systemctl stop torchelastic_worker
-systemctl start torchelastic_worker
+sudo systemctl stop torchelastic_worker
+sudo systemctl start torchelastic_worker
 ```
 
 > **EXCERCISE:** Try stopping or adding worker(s) to see elasticity in action!
@@ -188,7 +189,7 @@ that is monitoring the job!). In practice consider using EKS, Batch, or SageMake
 To stop the job and tear down the resources, use the `kill_job` command:
 
 ``` bash
-python3 petctl.py kill_job --name ${USER}-job
+python3 petctl.py kill_job ${USER}-job
 ```
 
 You'll notice that the two ASGs created with the `run_job` command are deleted.
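
For readers scripting these commands, the post-commit `run_job` and `kill_job` invocations can be sketched as argument lists. This is only an illustration built from the command lines shown in the diff: the helper names `run_job_argv` and `kill_job_argv` and the job name `alice-job` are hypothetical, and nothing here calls `petctl` itself. The reading that `--` separates `petctl` options from the training script's own arguments is an assumption based on the example command. Note that the diff uses `aws/petctl.py` for `run_job` but `petctl.py` for `kill_job`; the sketch keeps both paths as shown, so the working directory is left to the reader.

```python
import shlex


def run_job_argv(name, size, min_size, max_size, script, script_args,
                 petctl="aws/petctl.py"):
    """Build the run_job command as shown in the updated README line.

    Assumption: everything after `--` is passed to the training script.
    """
    return [
        "python3", petctl, "run_job",
        "--size", str(size),
        "--min_size", str(min_size),
        "--max_size", str(max_size),
        "--name", name,
        script, "--", *script_args,
    ]


def kill_job_argv(name, petctl="petctl.py"):
    """Build the kill_job command; after this commit the job name is
    positional rather than passed with --name."""
    return ["python3", petctl, "kill_job", name]


# Hypothetical job name, standing in for ${USER}-job.
run_cmd = run_job_argv(
    "alice-job", 2, 1, 3,
    "examples/imagenet/main.py",
    ["--input_path", "/mnt/efs/fs1/data/imagenet/train"],
)
kill_cmd = kill_job_argv("alice-job")

# Print shell-quoted command lines for inspection (a dry run).
print(shlex.join(run_cmd))
print(shlex.join(kill_cmd))
```

The lists could be handed to `subprocess.run` once the working directory and AWS credentials are in place; printing them first makes it easy to compare against the README.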
