@@ -6,16 +6,15 @@ The CPython interpreter is defined in C, meaning that the semantics of the
bytecode instructions, the dispatching mechanism, error handling, and
tracing and instrumentation are all intermixed.

- This document proposes defining a custom C-like DSL for defining the
+ This document proposes defining a custom C-like DSL for defining the
instruction semantics and tools for generating the code deriving from
the instruction definitions.

These tools would be used to:
-
- - Generate the main interpreter (done)
- - Generate the tier 2 interpreter
- - Generate documentation for instructions
- - Generate metadata about instructions, such as stack use (done).
+ * Generate the main interpreter (done)
+ * Generate the tier 2 interpreter
+ * Generate documentation for instructions
+ * Generate metadata about instructions, such as stack use (done).

Having a single definition file ensures that there is a single source
of truth for bytecode semantics.
@@ -46,7 +45,7 @@ passes from the semantic definition, reducing errors.

As we improve the performance of CPython, we need to optimize larger regions
of code, use more complex optimizations and, ultimately, translate to machine
- code.
+ code.

All of these steps introduce the possibility of more bugs, and require more code
to be written. One way to mitigate this is through the use of code generators.
@@ -62,9 +61,10 @@ blocks as the instructions for the tier 1 (PEP 659) interpreter.
Rewriting all the instructions is tedious and error-prone, and changing the
instructions is a maintenance headache as both versions need to be kept in sync.

- By using a code generator and using a common source for the instructions, or
+ By using a code generator and using a common source for the instructions, or
parts of instructions, we can reduce the potential for errors considerably.

+
## Specification

This specification is a work in progress.
@@ -74,7 +74,7 @@ We update it as the need arises.

Each op definition has a kind, a name, a stack and instruction stream effect,
and a piece of C code describing its semantics::
-
+
```
  file:
    (definition | family | pseudo)+
@@ -85,7 +85,7 @@ and a piece of C code describing its semantics::
    "op" "(" NAME "," stack_effect ")" "{" C-code "}"
    |
    "macro" "(" NAME ")" "=" uop ("+" uop)* ";"
-
+
  stack_effect:
    "(" [inputs] "--" [outputs] ")"

@@ -128,9 +128,9 @@ and a piece of C code describing its semantics::

The following definitions may occur:

- - `inst`: A normal instruction, as previously defined by `TARGET(NAME)` in `ceval.c`.
- - `op`: A part instruction from which macros can be constructed.
- - `macro`: A bytecode instruction constructed from ops and cache effects.
+ * `inst`: A normal instruction, as previously defined by `TARGET(NAME)` in `ceval.c`.
+ * `op`: A part instruction from which macros can be constructed.
+ * `macro`: A bytecode instruction constructed from ops and cache effects.

`NAME` can be any ASCII identifier that is a C identifier and not a C or Python keyword.
`foo_1` is legal. `$` is not legal, nor is `struct` or `class`.
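
The `NAME` rule above can be sketched as a small check. This is a minimal illustration, not part of the DSL tooling: `is_valid_name` and `SAMPLE_KEYWORDS` are hypothetical names, and the keyword list is deliberately abbreviated, so a real checker would need the complete C and Python keyword sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated keyword sample (assumption: illustration only, not the
   full C or Python keyword sets). */
static const char *const SAMPLE_KEYWORDS[] = {
    "struct", "typedef", "int",   /* C keywords */
    "class", "def", "lambda",     /* Python keywords */
};

static bool is_ascii_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical helper: true if `name` is an ASCII C identifier that does
   not appear in the sample keyword list. */
static bool is_valid_name(const char *name)
{
    assert(name != NULL);
    /* First character: a letter or underscore. */
    if (!is_ascii_alpha(name[0]) && name[0] != '_') {
        return false;
    }
    /* Remaining characters: letters, digits, or underscores. */
    for (size_t i = 1; name[i] != '\0'; i++) {
        char c = name[i];
        if (!is_ascii_alpha(c) && !is_ascii_digit(c) && c != '_') {
            return false;
        }
    }
    /* Reject anything that collides with a listed keyword. */
    size_t n = sizeof(SAMPLE_KEYWORDS) / sizeof(SAMPLE_KEYWORDS[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, SAMPLE_KEYWORDS[i]) == 0) {
            return false;
        }
    }
    return true;
}
```

With this sketch, `is_valid_name("foo_1")` is true, while `"$"`, `"struct"` and `"class"` are rejected, matching the examples in the text.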
@@ -165,9 +165,9 @@ part of the DSL.

Those functions include:

- - `DEOPT_IF(cond, instruction)`. Deoptimize if `cond` is met.
- - `ERROR_IF(cond, label)`. Jump to error handler at `label` if `cond` is true.
- - `DECREF_INPUTS()`. Generate `Py_DECREF()` calls for the input stack effects.
+ * `DEOPT_IF(cond, instruction)`. Deoptimize if `cond` is met.
+ * `ERROR_IF(cond, label)`. Jump to error handler at `label` if `cond` is true.
+ * `DECREF_INPUTS()`. Generate `Py_DECREF()` calls for the input stack effects.

Note that the use of `DECREF_INPUTS()` is optional -- manual calls
to `Py_DECREF()` or other approaches are also acceptable
@@ -203,7 +203,6 @@ two idioms are valid:
`ERROR_IF(true, error)`.

An example of the latter would be:
-
```cc
    res = PyObject_Add(left, right);
    if (res == NULL) {
@@ -232,16 +231,13 @@ The same is true for all members of a pseudo instruction
Some examples:

### Output stack effect
-
```C
inst ( LOAD_FAST, (-- value) ) {
    value = frame->f_localsplus[oparg];
    Py_INCREF(value);
}
```
-
This would generate:
-
```C
TARGET(LOAD_FAST) {
    PyObject *value;
@@ -253,15 +249,12 @@ This would generate:
```

### Input stack effect
-
```C
inst ( STORE_FAST, (value --) ) {
    SETLOCAL(oparg, value);
}
```
-
This would generate:
-
```C
TARGET(STORE_FAST) {
    PyObject *value = PEEK(1);
@@ -272,17 +265,14 @@ This would generate:
```

### Input stack effect and cache effect
-
```C
op ( CHECK_OBJECT_TYPE, (owner, type_version/2 -- owner) ) {
    PyTypeObject *tp = Py_TYPE(owner);
    assert(type_version != 0);
    DEOPT_IF(tp->tp_version_tag != type_version);
}
```
-
This might become (if it was an instruction):
-
```C
TARGET(CHECK_OBJECT_TYPE) {
    PyObject *owner = PEEK(1);
@@ -298,14 +288,12 @@ This might become (if it was an instruction):
### More examples

(For explanations see "Generating the interpreter" below.)
-
```C
op ( CHECK_HAS_INSTANCE_VALUES, (owner -- owner) ) {
    PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
    DEOPT_IF(!_PyDictOrValues_IsValues(dorv));
}
```
-
```C
op ( LOAD_INSTANCE_VALUE, (owner, index/1 -- null if (oparg & 1), res) ) {
    res = _PyDictOrValues_GetValues(dorv)->values[index];
@@ -315,13 +303,11 @@ For explanations see "Generating the interpreter" below.)
    Py_DECREF(owner);
}
```
-
```C
macro ( LOAD_ATTR_INSTANCE_VALUE ) =
    counter/1 + CHECK_OBJECT_TYPE + CHECK_HAS_INSTANCE_VALUES +
    LOAD_INSTANCE_VALUE + unused/4 ;
```
-
```C
op ( LOAD_SLOT, (owner, index/1 -- null if (oparg & 1), res) ) {
    char *addr = (char *)owner + index;
@@ -332,18 +318,15 @@ For explanations see "Generating the interpreter" below.)
    Py_DECREF(owner);
}
```
-
```C
macro ( LOAD_ATTR_SLOT ) = counter/1 + CHECK_OBJECT_TYPE + LOAD_SLOT + unused/4;
```
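
The `/N` suffixes in the definitions above are cache effects, and their sizes determine how far `next_instr` advances; the generated code for `LOAD_ATTR_SLOT` later in this document uses `next_instr += (1 + 1 + 2 + 1 + 4)`. The following is a minimal sketch of that arithmetic, assuming one unit for the instruction itself plus the size of each cache entry. `CacheEffect`, `instruction_length` and `LOAD_ATTR_SLOT_CACHE` are hypothetical names introduced for illustration, not part of the generator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one cache effect: the name and the N from
   `name/N` in a definition. */
typedef struct {
    const char *name;
    int size;
} CacheEffect;

/* Instruction length: one unit for the instruction itself, plus the
   size of every cache effect that follows it. */
static int instruction_length(const CacheEffect *effects, size_t count)
{
    assert(effects != NULL || count == 0);
    int total = 1;
    for (size_t i = 0; i < count; i++) {
        assert(effects[i].size > 0);
        total += effects[i].size;
    }
    return total;
}

/* Cache effects of LOAD_ATTR_SLOT, flattened from its macro definition. */
static const CacheEffect LOAD_ATTR_SLOT_CACHE[] = {
    {"counter", 1},       /* counter/1 in the macro */
    {"type_version", 2},  /* type_version/2 from CHECK_OBJECT_TYPE */
    {"index", 1},         /* index/1 from LOAD_SLOT */
    {"unused", 4},        /* unused/4 in the macro */
};
```

Calling `instruction_length(LOAD_ATTR_SLOT_CACHE, 4)` gives 9, the same total as `1 + 1 + 2 + 1 + 4`.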
-
```C
inst ( BUILD_TUPLE, (items[oparg] -- tuple) ) {
    tuple = _PyTuple_FromArraySteal(items, oparg);
    ERROR_IF(tuple == NULL, error);
}
```
-
```C
inst ( PRINT_EXPR ) {
    PyObject *value = POP();
@@ -367,21 +350,20 @@ For explanations see "Generating the interpreter" below.)
A _family_ maps a specializable instruction to its specializations.

Example: These opcodes all share the same instruction format:
-
```C
- family(LOAD_ATTR) = { LOAD_ATTR_INSTANCE_VALUE, LOAD_SLOT };
+ family(load_attr) = { LOAD_ATTR, LOAD_ATTR_INSTANCE_VALUE, LOAD_SLOT };
```

### Defining a pseudo instruction

A _pseudo instruction_ is used by the bytecode compiler to represent a set of possible concrete instructions.

Example: `JUMP` may expand to `JUMP_FORWARD` or `JUMP_BACKWARD`:
-
```C
pseudo(JUMP) = { JUMP_FORWARD, JUMP_BACKWARD };
```
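
As a rough illustration of what "expanding" a pseudo instruction could mean, the sketch below picks a concrete jump from the direction of the target. Everything here is an assumption made for the example: the opcode numbers, the `ConcreteJump` type, the `resolve_jump` helper, and the simplified offset arithmetic are hypothetical and do not describe the actual compiler.

```c
#include <assert.h>

/* Hypothetical opcode numbers, for illustration only. */
enum { JUMP_FORWARD = 1, JUMP_BACKWARD = 2 };

/* Hypothetical result of resolving a pseudo JUMP. */
typedef struct {
    int opcode;  /* the concrete instruction chosen */
    int oparg;   /* non-negative distance to the target */
} ConcreteJump;

/* Choose a member of pseudo(JUMP) from the jump direction.
   Assumption: offsets are plain instruction indices. */
static ConcreteJump resolve_jump(int from_offset, int to_offset)
{
    ConcreteJump jump;
    assert(from_offset >= 0 && to_offset >= 0);
    if (to_offset >= from_offset) {
        jump.opcode = JUMP_FORWARD;
        jump.oparg = to_offset - from_offset;
    }
    else {
        jump.opcode = JUMP_BACKWARD;
        jump.oparg = from_offset - to_offset;
    }
    return jump;
}
```

For example, `resolve_jump(10, 14)` selects `JUMP_FORWARD` with a distance of 4, and `resolve_jump(14, 10)` selects `JUMP_BACKWARD` with the same distance.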

+
## Generating the interpreter

The generated C code for a single instruction includes a preamble and dispatch at the end
@@ -430,7 +412,7 @@ rather than popping and pushing, such that `LOAD_ATTR_SLOT` would look something
            stack_pointer += 1;
        }
        s1 = res;
-     }
+     }
    next_instr += (1 + 1 + 2 + 1 + 4);
    stack_pointer[-1] = s1;
    DISPATCH();