Commit a88f497

Gamal Sallam authored and facebook-github-bot committed
Backport perf trampoline
Summary: Backport the perf trampoline introduced in python/cpython#96123. The perf trampoline does not work properly with the JIT, so we have submitted python/cpython#103546, a PR that adds a C API to unify writing to the perf map files.
Reviewed By: czardoz
Differential Revision: D45419843
fbshipit-source-id: 16bd13d7981e48c9eb7bc0e5eef1c1f4748965f6
1 parent 455709d commit a88f497

23 files changed, with 1417 additions and 3 deletions.

Cinder/module/known-core-python-exported-symbols

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1738,6 +1738,8 @@ _Py_tracemalloc_config
17381738
_PyTraceMalloc_GetTraceback
17391739
PyTraceMalloc_Track
17401740
PyTraceMalloc_Untrack
1741+
_Py_trampoline_func_end
1742+
_Py_trampoline_func_start
17411743
_PyTrash_begin
17421744
_PyTrash_cond
17431745
_PyTrash_deposit_object

Doc/c-api/init_config.rst

Lines changed: 14 additions & 0 deletions
@@ -318,6 +318,20 @@ PyPreConfig
318318
319319
Default: ``1`` in Python config, ``0`` in isolated config.
320320
321+
.. c:member:: int perf_profiling
322+
323+
Enable compatibility mode with the perf profiler?
324+
325+
If non-zero, initialize the perf trampoline. See :ref:`perf_profiling`
326+
for more information.
327+
328+
Set by :option:`-X perf <-X>` command line option and by the
329+
:envvar:`PYTHONPERFSUPPORT` environment variable.
330+
331+
Default: ``-1``.
332+
333+
.. versionadded:: 3.12
334+
321335
.. c:member:: int use_environment
322336
323337
Use :ref:`environment variables <using-on-envvars>`? See

Doc/howto/index.rst

Lines changed: 1 addition & 1 deletion
@@ -30,4 +30,4 @@ Currently, the HOWTOs are:
3030
clinic.rst
3131
instrumentation.rst
3232
annotations.rst
33-
33+
perf_profiling.rst

Doc/howto/perf_profiling.rst

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
1+
.. highlight:: shell-session
2+
3+
.. _perf_profiling:
4+
5+
==============================================
6+
Python support for the Linux ``perf`` profiler
7+
==============================================
8+
9+
:author: Pablo Galindo
10+
11+
The Linux ``perf`` profiler is a very powerful tool that allows you to profile and
12+
obtain information about the performance of your application. ``perf`` also has
13+
a very vibrant ecosystem of tools that aid with the analysis of the data that it
14+
produces.
15+
16+
The main problem with using the ``perf`` profiler with Python applications is that
17+
``perf`` only provides information about native symbols, that is, the names of
18+
the functions and procedures written in C. This means that the names and file names
19+
of the Python functions in your code will not appear in the output of ``perf``.
20+
21+
Since Python 3.12, the interpreter can run in a special mode that allows Python
22+
functions to appear in the output of the ``perf`` profiler. When this mode is
23+
enabled, the interpreter will interpose a small piece of code compiled on the
24+
fly before the execution of every Python function and it will teach ``perf`` the
25+
relationship between this piece of code and the associated Python function using
26+
`perf map files`_.
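
For orientation, the perf map format linked from this document is plain text: a file named ``/tmp/perf-<pid>.map`` with one line per symbol, holding a hexadecimal start address, a hexadecimal size, and the symbol name. The sketch below parses such lines. The helper name ``parse_perf_map`` and the sample addresses are made up for illustration, and the ``py::name:filename`` symbol shape is assumed from the report output shown later in this document.

```python
from typing import Iterator, NamedTuple


class PerfMapEntry(NamedTuple):
    start: int   # start address of the generated code
    size: int    # size of the generated code in bytes
    symbol: str  # name that perf displays for this range


def parse_perf_map(text: str) -> Iterator[PerfMapEntry]:
    """Yield one entry per non-empty line of perf map text."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # START and SIZE are hex without a 0x prefix; the symbol name is
        # the rest of the line and may itself contain spaces.
        start, size, symbol = line.split(" ", 2)
        yield PerfMapEntry(int(start, 16), int(size, 16), symbol)


# Hypothetical sample content, not taken from a real run.
sample = """\
7f3a10000000 b py::foo:/src/script.py
7f3a1000000b b py::bar:/src/script.py
"""
entries = list(parse_perf_map(sample))
```

Each entry can then be matched against a sampled instruction pointer by checking whether it falls in ``[start, start + size)``.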
27+
28+
.. warning::
29+
30+
Support for the ``perf`` profiler is only currently available for Linux on
31+
selected architectures. Check the output of the configure build step or
32+
check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
33+
to see if your system is supported.
34+
35+
For example, consider the following script:
36+
37+
.. code-block:: python
38+
39+
def foo(n):
40+
    result = 0
41+
    for _ in range(n):
42+
        result += 1
43+
    return result
44+
45+
def bar(n):
46+
    foo(n)
47+
48+
def baz(n):
49+
    bar(n)
50+
51+
if __name__ == "__main__":
52+
    baz(1000000)
53+
54+
We can run perf to sample CPU stack traces at 9999 Hertz:
55+
56+
$ perf record -F 9999 -g -o perf.data python my_script.py
57+
58+
Then we can use perf report to analyze the data:
59+
60+
.. code-block:: shell-session
61+
62+
$ perf report --stdio -n -g
63+
64+
# Children Self Samples Command Shared Object Symbol
65+
# ........ ........ ............ .......... .................. ..........................................
66+
#
67+
91.08% 0.00% 0 python.exe python.exe [.] _start
68+
|
69+
---_start
70+
|
71+
--90.71%--__libc_start_main
72+
Py_BytesMain
73+
|
74+
|--56.88%--pymain_run_python.constprop.0
75+
| |
76+
| |--56.13%--_PyRun_AnyFileObject
77+
| | _PyRun_SimpleFileObject
78+
| | |
79+
| | |--55.02%--run_mod
80+
| | | |
81+
| | | --54.65%--PyEval_EvalCode
82+
| | | _PyEval_EvalFrameDefault
83+
| | | PyObject_Vectorcall
84+
| | | _PyEval_Vector
85+
| | | _PyEval_EvalFrameDefault
86+
| | | PyObject_Vectorcall
87+
| | | _PyEval_Vector
88+
| | | _PyEval_EvalFrameDefault
89+
| | | PyObject_Vectorcall
90+
| | | _PyEval_Vector
91+
| | | |
92+
| | | |--51.67%--_PyEval_EvalFrameDefault
93+
| | | | |
94+
| | | | |--11.52%--_PyLong_Add
95+
| | | | | |
96+
| | | | | |--2.97%--_PyObject_Malloc
97+
...
98+
99+
As you can see, the Python functions are not shown in the output; only ``_PyEval_EvalFrameDefault``
100+
(the function that evaluates the Python bytecode) shows up. Unfortunately, that is not very useful because all Python
101+
functions use the same C function to evaluate bytecode, so we cannot tell which Python function each
102+
bytecode-evaluating frame corresponds to.
103+
104+
Instead, if we run the same experiment with perf support activated we get:
105+
106+
.. code-block:: shell-session
107+
108+
$ perf report --stdio -n -g
109+
110+
# Children Self Samples Command Shared Object Symbol
111+
# ........ ........ ............ .......... .................. .....................................................................
112+
#
113+
90.58% 0.36% 1 python.exe python.exe [.] _start
114+
|
115+
---_start
116+
|
117+
--89.86%--__libc_start_main
118+
Py_BytesMain
119+
|
120+
|--55.43%--pymain_run_python.constprop.0
121+
| |
122+
| |--54.71%--_PyRun_AnyFileObject
123+
| | _PyRun_SimpleFileObject
124+
| | |
125+
| | |--53.62%--run_mod
126+
| | | |
127+
| | | --53.26%--PyEval_EvalCode
128+
| | | py::<module>:/src/script.py
129+
| | | _PyEval_EvalFrameDefault
130+
| | | PyObject_Vectorcall
131+
| | | _PyEval_Vector
132+
| | | py::baz:/src/script.py
133+
| | | _PyEval_EvalFrameDefault
134+
| | | PyObject_Vectorcall
135+
| | | _PyEval_Vector
136+
| | | py::bar:/src/script.py
137+
| | | _PyEval_EvalFrameDefault
138+
| | | PyObject_Vectorcall
139+
| | | _PyEval_Vector
140+
| | | py::foo:/src/script.py
141+
| | | |
142+
| | | |--51.81%--_PyEval_EvalFrameDefault
143+
| | | | |
144+
| | | | |--13.77%--_PyLong_Add
145+
| | | | | |
146+
| | | | | |--3.26%--_PyObject_Malloc
147+
148+
149+
150+
Enabling perf profiling mode
151+
----------------------------
152+
153+
There are two main ways to activate the perf profiling mode. If you want it to be
154+
active from the start of the Python interpreter, you can use the ``-Xperf`` option:
155+
156+
$ python -Xperf my_script.py
157+
158+
There is also support for dynamically activating and deactivating the perf
159+
profiling mode by using the APIs in the :mod:`sys` module:
160+
161+
.. code-block:: python
162+
163+
import sys
164+
sys.activate_stack_trampoline("perf")
165+
166+
# Run some code with Perf profiling
167+
168+
sys.deactivate_stack_trampoline()
169+
170+
# Perf profiling is not active anymore
171+
172+
These APIs can be handy if you want to activate/deactivate profiling mode in
173+
response to a signal or other communication mechanism with your process.
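
As a rough sketch of the signal-driven use mentioned above, the following toggles perf profiling mode each time the process receives a signal. The helper names ``toggle_perf_trampoline`` and ``install_toggle_handler`` and the choice of ``SIGUSR1`` are illustrative assumptions. The ``getattr`` guard and the ``ValueError`` handling assume that an interpreter without these APIs, or a build without trampoline support, should simply leave profiling off.

```python
import signal
import sys


def toggle_perf_trampoline(signum=None, frame=None):
    """Flip perf profiling mode and return the new state (True = active)."""
    activate = getattr(sys, "activate_stack_trampoline", None)
    if activate is None:
        # This interpreter does not provide the trampoline APIs.
        return False
    if sys.is_stack_trampoline_active():
        sys.deactivate_stack_trampoline()
        return False
    try:
        activate("perf")
    except ValueError:
        # Assumed to mean the build has no perf trampoline support.
        return False
    return True


def install_toggle_handler(signum=None):
    """Register the toggle as a signal handler (main thread only)."""
    if signum is None:
        signum = signal.SIGUSR1  # arbitrary choice for this sketch
    signal.signal(signum, toggle_perf_trampoline)


if __name__ == "__main__":
    install_toggle_handler()
```

With the handler installed, sending the chosen signal to the process (for example with ``kill -USR1 <pid>``) switches profiling mode on or off without restarting it.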
174+
175+
176+
177+
Now we can analyze the data with ``perf report``:
178+
179+
$ perf report -g -i perf.data
180+
181+
182+
How to obtain the best results
183+
-------------------------------
184+
185+
For the best results, Python should be compiled with
186+
``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows
187+
profilers to unwind using only the frame pointer, without relying on DWARF debug
188+
information. This matters because the code that is interposed to allow perf
189+
support is dynamically generated, so it does not have any DWARF debugging information
190+
available.
191+
192+
You can check if your interpreter has been compiled with this flag by running:
193+
194+
$ python -m sysconfig | grep 'no-omit-frame-pointer'
195+
196+
If you don't see any output it means that your interpreter has not been compiled with
197+
frame pointers and therefore it may not be able to show Python functions in the output
198+
of ``perf``.
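
The two ``grep`` checks in this document can also be approximated from Python. This is a minimal sketch: the function name ``perf_build_report`` is hypothetical, and it assumes that the build exposes the ``PY_HAVE_PERF_TRAMPOLINE`` variable through :mod:`sysconfig`, as the ``python -m sysconfig`` command above implies.

```python
import sysconfig


def perf_build_report():
    """Summarize whether this build looks suitable for perf profiling."""
    config_vars = sysconfig.get_config_vars()
    trampoline = config_vars.get("PY_HAVE_PERF_TRAMPOLINE")
    # Mirror the grep above: look for the flag in any build variable.
    frame_pointers = any(
        "no-omit-frame-pointer" in str(value)
        for value in config_vars.values()
    )
    return {
        "perf_trampoline": trampoline not in (None, 0, "", "0"),
        "frame_pointers": frame_pointers,
    }


report = perf_build_report()
```

A ``False`` value for either key suggests that Python function names may be missing from the ``perf`` output.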
199+
200+
.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e7dd7/tools/perf/Documentation/jit-interface.txt

Doc/using/cmdline.rst

Lines changed: 13 additions & 0 deletions
@@ -477,6 +477,12 @@ Miscellaneous options
477477
* ``-X warn_default_encoding`` issues a :class:`EncodingWarning` when the
478478
locale-specific default encoding is used for opening files.
479479
See also :envvar:`PYTHONWARNDEFAULTENCODING`.
480+
* ``-X perf`` to activate compatibility mode with the ``perf`` profiler.
481+
When this option is activated, the Linux ``perf`` profiler will be able to
482+
report Python calls. This option is only available on some platforms and
483+
will do nothing if it is not supported on the current system. The default value
484+
is "off". See also :envvar:`PYTHONPERFSUPPORT` and :ref:`perf_profiling`
485+
for more information.
480486

481487
It also allows passing arbitrary values and retrieving them through the
482488
:data:`sys._xoptions` dictionary.
@@ -948,6 +954,13 @@ conflict.
948954

949955
.. versionadded:: 3.10
950956

957+
.. envvar:: PYTHONPERFSUPPORT
958+
959+
If this variable is set to a nonzero value, it activates compatibility mode
960+
with the ``perf`` profiler so Python calls can be detected by it. See the
961+
:ref:`perf_profiling` section for more information.
962+
963+
.. versionadded:: 3.12
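
One way to see the effect of the variable is to start a child interpreter with it set and ask the child whether the trampoline is active. The sketch below does that. The helper name ``trampoline_active_in_child`` is hypothetical, and the ``getattr`` fallback assumes that an interpreter without ``sys.is_stack_trampoline_active`` should be reported as inactive.

```python
import os
import subprocess
import sys

# Code run in the child; reports False when the API is unavailable.
CHILD_CODE = (
    "import sys; "
    "print(getattr(sys, 'is_stack_trampoline_active', lambda: False)())"
)


def trampoline_active_in_child(value="1"):
    """Run a child interpreter with PYTHONPERFSUPPORT set to value."""
    env = dict(os.environ, PYTHONPERFSUPPORT=value)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("active with PYTHONPERFSUPPORT=1:", trampoline_active_in_child("1"))
```

On a platform without perf support the variable has no effect, so the child is expected to report an inactive trampoline there.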
951964

952965
Debug-mode variables
953966
~~~~~~~~~~~~~~~~~~~~

Include/cpython/initconfig.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,7 @@ typedef struct PyConfig {
142142
unsigned long hash_seed;
143143
int faulthandler;
144144
int tracemalloc;
145+
int perf_profiling;
145146
int import_time;
146147
int show_ref_count;
147148
int dump_refs;

Include/internal/pycore_ceval.h

Lines changed: 21 additions & 0 deletions
@@ -41,6 +41,27 @@ extern PyObject *_PyEval_BuiltinsFromGlobals(
4141
PyThreadState *tstate,
4242
PyObject *globals);
4343

44+
// Trampoline API
45+
46+
typedef struct {
47+
// Callback to initialize the trampoline state
48+
void* (*init_state)(void);
49+
// Callback to register every trampoline being created
50+
void (*write_state)(void* state, const void *code_addr,
51+
unsigned int code_size, PyCodeObject* code);
52+
// Callback to free the trampoline state
53+
int (*free_state)(void* state);
54+
} _PyPerf_Callbacks;
55+
56+
extern int _PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *);
57+
extern void _PyPerfTrampoline_GetCallbacks(_PyPerf_Callbacks *);
58+
extern int _PyPerfTrampoline_Init(int activate);
59+
extern int _PyPerfTrampoline_Fini(void);
60+
extern int _PyIsPerfTrampolineActive(void);
61+
extern PyStatus _PyPerfTrampoline_AfterFork_Child(void);
62+
#ifdef PY_HAVE_PERF_TRAMPOLINE
63+
extern _PyPerf_Callbacks _Py_perfmap_callbacks;
64+
#endif
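
The three callbacks above suggest a small lifecycle: create some state, record each trampoline as it is generated, then release the state. The Python sketch below models that contract for illustration only. The class ``PerfMapCallbacks``, its constructor argument, and the method bodies are hypothetical, and the symbol format is assumed from the ``py::foo:/src/script.py`` entries in the documentation. The real callbacks are C functions that are not shown in this excerpt.

```python
import io


class PerfMapCallbacks:
    """Toy model of the init_state / write_state / free_state trio."""

    def __init__(self, open_map):
        # open_map is a hypothetical factory returning a writable text stream.
        self._open_map = open_map

    def init_state(self):
        # Create the state shared by later calls (here, an open stream).
        return self._open_map()

    def write_state(self, state, code_addr, code_size, code):
        # Record one trampoline: hex start, hex size, then a symbol name.
        symbol = f"py::{code.co_name}:{code.co_filename}"
        state.write(f"{code_addr:x} {code_size:x} {symbol}\n")

    def free_state(self, state):
        # Release the state; 0 signals success, as in the C signature.
        state.flush()
        return 0


def foo():
    pass


buffer = io.StringIO()
callbacks = PerfMapCallbacks(lambda: buffer)
state = callbacks.init_state()
# The address and size are made-up values for the example.
callbacks.write_state(state, 0x7F0000001000, 0x20, foo.__code__)
status = callbacks.free_state(state)
```

Keeping the state opaque to the caller, as the ``void*`` in the C struct does, lets a different backend (such as the JIT mentioned in the commit summary) substitute its own writer.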
4465

4566
static inline PyObject*
4667
_PyEval_EvalFrame(PyThreadState *tstate, PyFrameObject *f, int throwflag)

Lib/test/test_embed.py

Lines changed: 5 additions & 0 deletions
@@ -376,6 +376,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
376376
'hash_seed': 0,
377377
'faulthandler': 0,
378378
'tracemalloc': 0,
379+
'perf_profiling': 0,
379380
'import_time': 0,
380381
'show_ref_count': 0,
381382
'dump_refs': 0,
@@ -458,6 +459,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
458459
use_hash_seed=0,
459460
faulthandler=0,
460461
tracemalloc=0,
462+
perf_profiling=0,
461463
pathconfig_warnings=0,
462464
)
463465
if MS_WINDOWS:
@@ -809,6 +811,7 @@ def test_init_from_config(self):
809811
'use_hash_seed': 1,
810812
'hash_seed': 123,
811813
'tracemalloc': 2,
814+
'perf_profiling': 0,
812815
'import_time': 1,
813816
'show_ref_count': 1,
814817
'malloc_stats': 1,
@@ -869,6 +872,7 @@ def test_init_compat_env(self):
869872
'use_hash_seed': 1,
870873
'hash_seed': 42,
871874
'tracemalloc': 2,
875+
'perf_profiling': 0,
872876
'import_time': 1,
873877
'malloc_stats': 1,
874878
'inspect': 1,
@@ -898,6 +902,7 @@ def test_init_python_env(self):
898902
'use_hash_seed': 1,
899903
'hash_seed': 42,
900904
'tracemalloc': 2,
905+
'perf_profiling': 0,
901906
'import_time': 1,
902907
'malloc_stats': 1,
903908
'inspect': 1,
