Embedding engine implementation using `torch` + `transformers`.

## Usage

On GPUs (e.g., CUDA):

```bash
PYTHONUNBUFFERED=1 \
/path/to/fmwork/embed/tf/driver \
--platform cuda \
--model_root /path/to/models \
--model_name ibm-granite/granite-embedding-125m-english \
--model_class RobertaModel \
--input_sizes 512 \
--batch_sizes 1 \
--reps 100
```

On Spyre:

```bash
PYTHONUNBUFFERED=1 \
DTLOG_LEVEL=error \
DT_DEEPRT_VERBOSE=-1 \
DTCOMPILER_KEEP_EXPORT=-1 \
TORCH_SENDNN_LOG=CRITICAL \
/path/to/fmwork/embed/tf/driver \
--platform spyre \
--model_root /path/to/models \
--model_name ibm-granite/granite-embedding-125m-english \
--model_class RobertaModel \
--input_sizes 512 \
--batch_sizes 1 \
--reps 100 \
--torch.call:set_grad_enabled@ False \
--compile \
--compile:backend sendnn
```

`--torch.call:set_grad_enabled@ False`, `--compile`, and `--compile:backend sendnn` are required.
The additional environment variables are optional but can help reduce verbose output.
Also, on Spyre only one combination of input size and batch size can be executed at a time;
in other words, `--input_sizes` and `--batch_sizes` must not be lists.

## Example of output

The `driver` can take one or more combinations of input and batch sizes.
For each combination, the following output block will be produced:

```
--------------------------------------------------------------------------------
RUN 128 / 1
--------------------------------------------------------------------------------

FMWORK REP 1 100 1108911.088153850 1108911.845211219 0.757057369
FMWORK REP 2 100 1108911.845509595 1108911.849978236 0.004468641
FMWORK REP 3 100 1108911.850187359 1108911.854049551 0.003862192
...
FMWORK REP 98 100 1108912.224051913 1108912.227749102 0.003697189
FMWORK REP 99 100 1108912.227934311 1108912.231636694 0.003702383
FMWORK REP 100 100 1108912.231828882 1108912.235531503 0.003702621

FMWORK RES 1108911.088153850 1108912.235531503 ibm-granite/granite-embedding-125m-english RobertaModel 128 1 3.691 270.9
```

The `FMWORK REP` lines provide information about each repetition (controlled by `--reps`):

* Current rep (e.g., `1`)
* Total reps to run (e.g., `100`)
* Start timestamp of rep (e.g., `1108911.088153850`)
* End timestamp of rep (e.g., `1108911.845211219`)
* Duration of rep in seconds (e.g., `0.757057369`)
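
To make the relationship between these per-rep numbers and the summary `FMWORK RES` line concrete, here is a minimal sketch (not the actual `driver` or `process` code) that aggregates rep durations into latency and throughput estimates, assuming the first rep is treated as warm-up and that throughput is batch size divided by per-batch latency; the actual aggregation used by the tool may differ:

```python
# Minimal sketch (not the actual fmwork code): aggregate FMWORK REP durations.
# Assumes the first rep is warm-up and throughput = batch_size / latency.
durations = [0.757057369, 0.004468641, 0.003862192, 0.003697189, 0.003702383, 0.003702621]
batch_size = 1

steady = durations[1:]                  # drop the warm-up rep
latency_s = sum(steady) / len(steady)   # mean steady-state latency, in seconds
throughput = batch_size / latency_s     # sequences per second

print(f"latency: {latency_s * 1000:.3f} ms, throughput: {throughput:.1f} seq/s")
```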

The `FMWORK RES` line has a summary of the results:

* First timestamp of first rep (e.g., `1108911.088153850`)
* Last timestamp of last rep (e.g., `1108912.235531503`)
* Model name (e.g., `ibm-granite/granite-embedding-125m-english`)
* Model class (e.g., `RobertaModel`)
* Input (prompt) size (e.g., `128`)
* Batch size (e.g., `1`)
* Latency in milliseconds (e.g., `3.691`)
* Throughput (speed) in sequences per second (e.g., `270.9`)
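
If a quick ad-hoc look at a single result is all that is needed (the `process` script described below is the supported path), a `FMWORK RES` line can be split into the fields above, for example:

```python
# Sketch: split a FMWORK RES line into the fields listed above.
line = ("FMWORK RES 1108911.088153850 1108912.235531503 "
        "ibm-granite/granite-embedding-125m-english RobertaModel 128 1 3.691 270.9")

_, _, t0, t1, model_name, model_class, input_size, batch_size, latency_ms, throughput = line.split()

result = {
    "t0": float(t0),                  # first timestamp of first rep
    "t1": float(t1),                  # last timestamp of last rep
    "model_name": model_name,
    "model_class": model_class,
    "input_size": int(input_size),    # prompt length
    "batch_size": int(batch_size),
    "latency_ms": float(latency_ms),
    "throughput": float(throughput),  # sequences per second
}
print(result)
```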

## More on parameters

```python
parser = argparse.ArgumentParser()
parser.add = parser.add_argument
parser.add('--platform', type=str, required=True)
parser.add('--model_class', type=str, required=True)
parser.add('--model_root', type=str)
parser.add('--model_name', type=str, required=True)
parser.add('--compile', action='store_true')
parser.add('--eval', action='store_true')
parser.add('--input_sizes', type=str, required=True)
parser.add('--batch_sizes', type=str, required=True)
parser.add('--reps', type=int, required=True)
args, opts = parser.parse_known_args()
fmwork.args.process_opts(args, opts, [
    'compile', 'model', 'torch.call', 'torch.set',
], globals())
```

The `driver` script takes a number of "fixed" parameters:

* `--platform` :
  Hardware/software platform identifier.
  Currently supported/tested: `cuda` and `spyre`.
* `--model_class` :
  Model class from the `transformers` library used to instantiate the model.
  Depends on the selected model.
  Usual values include `BertModel` and `RobertaModel`.
  `AutoModel` can also be used.
* `--model_root` :
  Path to the root folder where models are located.
  This is not the path to the model itself;
  it is just a helper for pretty-printing the model name.
* `--model_name` :
  Model name.
  If `--model_root` is not specified,
  this should be the full path to the model.
* `--compile` :
  Controls whether `torch.compile` is called.
  Further options might be required in the form of dynamic sub-options.
* `--eval` :
  Controls whether `model.eval()` is called.
* `--input_sizes` :
  Comma-separated list of input sizes (sequence / prompt lengths).
  On `spyre` this must be a single value.
* `--batch_sizes` :
  Comma-separated list of batch sizes (concurrent requests / users).
  On `spyre` this must be a single value.
* `--reps` :
  Number of repetitions to run for each input / batch size combination
  (see the sweep sketch after this list).

The `driver` script can also take a number of dynamic sub-options.
For instance, if `--compile` is passed,
one might specify how to compile the model using one or more sub-options:

```bash
PYTHONUNBUFFERED=1 \
/path/to/fmwork/embed/tf/driver \
--platform cuda \
--model_root /path/to/models \
--model_name ibm-granite/granite-embedding-125m-english \
--model_class RobertaModel \
--input_sizes 512 \
--batch_sizes 1 \
--reps 100 \
--compile:backend inductor \
--compile:dynamic@ True \
--compile:mode reduce-overhead
```

For each `--key val` parameter,
the part of the key before the `:` names the set of sub-options
-- in this case, `compile`.
The actual option / parameter name comes next
-- e.g., `backend` or `dynamic`.
If the option contains a `@`,
the value is passed through `eval()`;
otherwise, the value is kept as a `str()`.
In this example, `inductor` stays a plain string (a backend name),
while `True` is evaluated to Python's `True`.
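
A minimal sketch of this naming convention (illustrative only; the actual implementation lives in `fmwork.args.process_opts`):

```python
# Sketch of the sub-option convention described above
# (illustrative only; the real logic is in fmwork.args.process_opts).
def parse_sub_option(key, val):
    key = key.lstrip('-')                 # e.g. 'compile:dynamic@'
    group, _, name = key.partition(':')   # group = 'compile', name = 'dynamic@'
    if name.endswith('@'):
        name, val = name[:-1], eval(val)  # '@' -> value is eval()'d
    # otherwise the value stays a plain str
    return group, name, val

print(parse_sub_option('--compile:backend', 'inductor'))  # ('compile', 'backend', 'inductor')
print(parse_sub_option('--compile:dynamic@', 'True'))     # ('compile', 'dynamic', True)
```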

Current `driver` defines four sets of sub-options:

* `compile` :
  Options passed to the `torch.compile` call.
  Please refer to the documentation associated with the `torch` version
  you are currently using --
  e.g., https://docs.pytorch.org/docs/stable/generated/torch.compile.html.
  Nested options are supported --
  e.g., the `options` dict that can be passed to `torch.compile`.
* `model` :
  Options passed to the `<model_class>.from_pretrained()` call.
* `torch.call` :
  `torch` functions called during engine initialization.
  For instance, `--torch.call:set_grad_enabled@ False`
  calls `torch.set_grad_enabled(False)`, which is useful to disable
  gradient computation during the execution of the benchmark
  (required on Spyre).
* `torch.set` :
  `torch` variables to be assigned.
  This can be used, for instance, to set
  `torch.backends.cudnn.benchmark = True` via
  `--torch.set:backends.cudnn.benchmark@ True`
  (the equivalent plain `torch` calls are shown after this list).
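
In plain `torch` code, these two examples correspond to:

```python
import torch

# --torch.call:set_grad_enabled@ False  ->  call a torch function at init time
torch.set_grad_enabled(False)            # disable gradient computation (required on Spyre)

# --torch.set:backends.cudnn.benchmark@ True  ->  assign a torch variable
torch.backends.cudnn.benchmark = True    # let cuDNN pick the fastest algorithms
```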

## Processing results

A file containing the results of an experiment (outputs from the `driver`)
can be processed using the `process` script.
Usage:

```bash
/path/to/fmwork/embed/tf/process \
--path <path> \
--metadata_id <id>
```

* `--path` is the path to the file.
* `--metadata_id` can be used to associate the generated JSON
  with external information that, for instance, describes the environment
  where the experiment was executed.

The script will print a JSON document containing a list of the results present in the file.
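
One way to consume that output downstream is sketched below (the paths and metadata id are placeholders, and the JSON schema is whatever `process` emits):

```python
# Sketch: capture and load the JSON printed by the process script.
# Paths and metadata id are placeholders.
import json
import subprocess

out = subprocess.run(
    ['/path/to/fmwork/embed/tf/process',
     '--path', 'results.log',
     '--metadata_id', 'exp-001'],
    capture_output=True, text=True, check=True,
).stdout

results = json.loads(out)
print(type(results), len(results) if isinstance(results, list) else results)
```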
