This repository was archived by the owner on Feb 15, 2023. It is now read-only.

Commit bd15a1d

Merge pull request #257 from nostrademons/asan
Fix obsolete flags in the ctypes bindings, and add some debugging notes.
2 parents 6615e11 + e0fb9b0 commit bd15a1d

2 files changed (+106 / -5 lines)


DEBUGGING.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
These are a few debugging notes that may be helpful for anyone developing
Gumbo or trying to diagnose a tricky problem. Normal clients of this library
probably won't need them: Gumbo is relatively stable, and its bugs tend to be
rare and obscure. Still, they're handy to have as a reference, and may also
provide useful Google fodder for people searching for these tools.

Standard disclaimer: I use all of these techniques on my Ubuntu 14.04 computer
with gcc 4.8.2, clang 3.4, and gtest 1.6.0, but make no warranty that they work
on other systems. In particular, they're almost certain not to work on
Windows.

Debug output
============

Gumbo has a compile-time switch that dumps lots of debug output to stdout.
Compile with the GUMBO_DEBUG define enabled (run `make clean` first if the
library is already built, since make won't rebuild object files just because
CFLAGS changed):

```bash
$ make CFLAGS='-DGUMBO_DEBUG'
```

Note that this spits out *a lot* of debug information and makes the program run
significantly slower, so it's usually helpful to first isolate the specific HTML
file or fragment that triggers the bug. In exchange, it lets you trace the
operation of the tokenizer's and parser's state machines in depth.

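Isolating the failing fragment by hand can be tedious. Below is a minimal sketch (not part of Gumbo) of a greedy, line-based test-case reducer: it repeatedly deletes chunks of the input and keeps each deletion as long as a caller-supplied predicate says the bug still reproduces. The helper name `reduce_lines` and the predicate are hypothetical; in practice the predicate would run your own reproduction, for example parsing the candidate text with the debug build in a subprocess and checking for a crash or a telltale line of debug output.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily shrink `lines` while `still_fails(candidate)` stays True.

    Hypothetical helper: tries deleting chunks of lines, halving the chunk
    size whenever a full pass removes nothing, and stops once no single-line
    deletion preserves the failure.
    """
    if not still_fails(lines):
        raise ValueError("the original input must reproduce the failure")
    chunk = max(1, len(lines) // 2)
    while chunk >= 1:
        removed_any = False
        i = 0
        while i < len(lines):
            # Candidate input with lines[i:i + chunk] deleted.
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate  # Keep the deletion; retry at the same index.
                removed_any = True
            else:
                i += chunk  # This chunk is needed; move past it.
        if not removed_any:
            chunk //= 2
    return lines


if __name__ == "__main__":
    # Toy predicate standing in for "the debug build still misbehaves".
    html = ["<table>", "<tr>", "<td>ok</td>", "<!-- BAD -->", "</tr>", "</table>"]
    print(reduce_lines(html, lambda ls: any("BAD" in line for line in ls)))
    # prints ['<!-- BAD -->']
```

Working on whole lines keeps the sketch simple; for minified HTML you would split on tags or characters instead.
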
Unit tests
==========

As mentioned in the README, Gumbo relies on [googletest][] for unit tests.
Unzip the gtest ZIP distribution inside the Gumbo root directory and rename the
unpacked directory to 'gtest'. `make check` then runs the tests as usual, and
the logs of any failing tests end up in `test-suite.log`:

```bash
$ make check
$ cat test-suite.log
```

If you need to debug a core dump, you'll probably want to run the test binary
directly:

```bash
$ ulimit -c unlimited
$ make check
$ .libs/lt-gumbo_test
$ gdb .libs/lt-gumbo_test core
```

The same applies to core dumps from the other example binaries.

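If you drive the test binary from a Python script rather than an interactive shell, the standard `resource` module can do the equivalent of `ulimit -c` for child processes. This is a sketch with hypothetical helper names, not part of Gumbo: it raises the soft core-size limit to the hard limit (always permitted for an unprivileged process, whereas `ulimit -c unlimited` fails if the hard limit is lower) and then runs the binary. Where the core file lands depends on `/proc/sys/kernel/core_pattern` (on Ubuntu it is usually piped to Apport), so it may not simply be `./core`.

```python
import resource
import subprocess


def enable_core_dumps() -> int:
    """Raise this process's soft core-file limit to its hard limit.

    Child processes inherit the limit, much like `ulimit -c` in a shell.
    Returns the new soft limit (resource.RLIM_INFINITY means unlimited).
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard


def run_test_binary(path: str = ".libs/lt-gumbo_test") -> int:
    """Run the (libtool-built) test binary with core dumps enabled.

    Returns the exit status; a negative value means the process was killed
    by that signal number (e.g. -11 for SIGSEGV on Linux).
    """
    enable_core_dumps()
    return subprocess.run([path]).returncode
```

After a crash, the core can be loaded with `gdb .libs/lt-gumbo_test core` as shown above, adjusting the core file name if your `core_pattern` differs.
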
Assertions
==========

Gumbo relies heavily on assertions. They're enabled at run-time by default; to
compile them out, define NDEBUG:

```bash
$ make CFLAGS='-DNDEBUG'
```

ASAN
====

Google's [address-sanitizer][] is a helpful tool for finding memory errors. Its
overhead is low enough that you can often run it in production. Enabling it for
C/C++ binaries is fairly standard and is described in the ASAN documentation.
It requires Clang >= 3.1 or GCC >= 4.8.

```bash
$ make \
    CFLAGS='-fsanitize=address -fno-omit-frame-pointer -fno-inline' \
    LDFLAGS='-fsanitize=address'
```

ASAN can also be used when Gumbo is compiled as a shared library and loaded into
a scripting language via FFI, but the ASAN authors don't support this use case.
To do it, use LD_PRELOAD to make sure the ASAN runtime is loaded into the
process:

```bash
# problem.html is a placeholder for the input that reproduces the bug.
$ LD_PRELOAD=libasan.so.0 python -c \
    'import gumbo; gumbo.parse(open("problem.html").read())'
```

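When running the Python bindings under ASAN from a script, it can help to build the environment programmatically instead of hard-coding `libasan.so.0`, which is the runtime soname shipped with GCC 4.8; newer GCC releases use a different version suffix. The helper below is hypothetical, not part of Gumbo or its bindings. It looks the runtime up with the standard library's `ctypes.util.find_library("asan")`, which may return `None` depending on how the system's libraries are registered, and sets `LD_PRELOAD` plus, optionally, the symbolizer variables described below.

```python
import ctypes.util
import os
from typing import Dict, Mapping, Optional


def asan_env(runtime: Optional[str] = None,
             symbolizer: Optional[str] = None,
             base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment that preloads the ASAN runtime.

    runtime:    soname or path of libasan; looked up with
                ctypes.util.find_library("asan") when omitted.
    symbolizer: optional path to llvm-symbolizer for readable stack traces.
    base:       environment to start from (defaults to os.environ).
    """
    env = dict(os.environ if base is None else base)
    if runtime is None:
        runtime = ctypes.util.find_library("asan")
    if runtime is None:
        raise RuntimeError("could not locate the ASAN runtime (libasan)")
    env["LD_PRELOAD"] = runtime
    if symbolizer is not None:
        env["ASAN_SYMBOLIZER_PATH"] = symbolizer
        # Note: replaces any ASAN_OPTIONS already present in `base`.
        env["ASAN_OPTIONS"] = "symbolize=1"
    return env
```

The result can be passed as `env=` to `subprocess.run([sys.executable, "-c", ...])` to mirror the shell command above.
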
Getting clean stack traces from this requires the llvm-symbolizer binary, which
is included with clang:

```bash
$ export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.4
$ export ASAN_OPTIONS=symbolize=1
# problem.html is a placeholder for the input that reproduces the bug.
$ LD_PRELOAD=libasan.so.0 python -c \
    'import gumbo; gumbo.parse(open("problem.html").read())' 2>&1 | head -100
# Clean up leftover symbolizer processes; repeat until none remain.
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
```

This use case is even less officially supported than using ASAN with dynamic
shared objects: on my machine, it triggered a recursive ASAN error about a
use-after-free in llvm-symbolizer, effectively fork-bombing the machine. Have
the killalls ready, and don't let the process run for long; for example, avoid
piping its output to `less`.

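Because a runaway symbolizer can take the machine down, it may be safer to drive the repro from a small Python wrapper that enforces a hard time limit. The helper below is hypothetical (not part of Gumbo) and assumes a POSIX system. It starts the command in its own session, and therefore its own process group, so that on timeout it can kill the child together with anything it spawned, such as llvm-symbolizer processes. Like `2>&1 | head -100`, it merges stderr into stdout and keeps only the first lines. A process that moved itself into a different process group would escape the kill.

```python
import os
import signal
import subprocess
from typing import List, Mapping, Optional, Sequence, Tuple


def run_bounded(cmd: Sequence[str],
                env: Optional[Mapping[str, str]] = None,
                timeout: float = 30.0,
                max_lines: int = 100) -> Tuple[List[str], bool]:
    """Run cmd with a hard time limit, killing its whole process group.

    Returns (first max_lines lines of stdout+stderr, timed_out).
    """
    proc = subprocess.Popen(
        list(cmd), env=env,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,  # like 2>&1
        text=True,
        start_new_session=True)  # new session => new process group
    timed_out = False
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        try:
            # The child leads the new group, so its pid is the group id.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # Everything in the group already exited.
        out, _ = proc.communicate()  # Collect the output buffered so far.
    return out.splitlines()[:max_lines], timed_out
```

Combined with an environment that sets `LD_PRELOAD` and the symbolizer variables, this replaces the manual `killall` cleanup with an automatic one.
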
[googletest]: https://code.google.com/p/googletest/
[address-sanitizer]: https://code.google.com/p/address-sanitizer/

python/gumbo/gumboc.py

Lines changed: 2 additions & 5 deletions
```diff
@@ -496,13 +496,10 @@ class Options(ctypes.Structure):
       # function. Right now these are treated as opaque void pointers.
       ('allocator', ctypes.c_void_p),
       ('deallocator', ctypes.c_void_p),
+      ('userdata', ctypes.c_void_p),
       ('tab_stop', ctypes.c_int),
       ('stop_on_first_error', ctypes.c_bool),
-      ('max_utf8_decode_errors', ctypes.c_int),
-      # The following two options will likely be removed from the C API, and
-      # should be removed from the Python API when that happens too.
-      ('verbatim_mode', ctypes.c_bool),
-      ('preserve_entities', ctypes.c_bool),
+      ('max_errors', ctypes.c_int),
       ]
 
 
```

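This fix matters because ctypes lays out a `Structure` purely from its `_fields_` list, so a Python definition that disagrees with the C struct silently shifts every later field. The sketch below illustrates this with two stand-alone structures modeled on the field lists in the diff; they are illustrations, not the real `Options` class or the C definition. Assuming, as the commit message implies, that the new list matches the C struct, the old bindings wrote `tab_stop` at the offset where the C library reads `userdata`.

```python
import ctypes


class OldOptions(ctypes.Structure):
    """Field list before this commit (illustration only)."""
    _fields_ = [
        ('allocator', ctypes.c_void_p),
        ('deallocator', ctypes.c_void_p),
        ('tab_stop', ctypes.c_int),
        ('stop_on_first_error', ctypes.c_bool),
        ('max_utf8_decode_errors', ctypes.c_int),
        ('verbatim_mode', ctypes.c_bool),
        ('preserve_entities', ctypes.c_bool),
    ]


class NewOptions(ctypes.Structure):
    """Field list after this commit (illustration only)."""
    _fields_ = [
        ('allocator', ctypes.c_void_p),
        ('deallocator', ctypes.c_void_p),
        ('userdata', ctypes.c_void_p),
        ('tab_stop', ctypes.c_int),
        ('stop_on_first_error', ctypes.c_bool),
        ('max_errors', ctypes.c_int),
    ]


def show_layout(struct_type) -> None:
    """Print each field's byte offset and the total struct size."""
    for name, _ctype in struct_type._fields_:
        offset = getattr(struct_type, name).offset
        print(f"{struct_type.__name__}.{name}: offset {offset}")
    print(f"sizeof({struct_type.__name__}) = {ctypes.sizeof(struct_type)}")


if __name__ == "__main__":
    show_layout(OldOptions)
    show_layout(NewOptions)
```

On a typical 64-bit Linux build, `OldOptions.tab_stop` sits at offset 16 while `NewOptions.tab_stop` sits at offset 24, so every field after the missing `userdata` pointer was misread by the C side.
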
