Commit b97d58a

Document HTML sanitation policy
1 parent 4669a09 commit b97d58a

1 file changed: +78 -4 lines changed

docs/reference.md

Lines changed: 78 additions & 4 deletions
@@ -25,7 +25,33 @@ instance of the `markdown.Markdown` class and pass multiple documents through
 it. If you do use a single instance though, make sure to call the `reset`
 method appropriately ([see below](#convert)).
 
-### markdown.markdown(text [, **kwargs]) {: #markdown data-toc-label='markdown.markdown' }
+### `markdown.markdown(text [, **kwargs])` {: #markdown data-toc-label='markdown.markdown' }
+
+!!! warning
+
+    The Python-Markdown library does ***not*** sanitize its HTML output. If
+    you are processing Markdown input from an untrusted source, it is your
+    responsibility to ensure that it is properly sanitized. See [Markdown and
+    XSS] for an overview of some of the dangers and [Improper markup
+    sanitization in popular software] for notes on best practices to ensure
+    HTML is properly sanitized.
+
+    The developers of Python-Markdown recommend using [nh3] or [bleach][][^1]
+    as a sanitizer on the output of `markdown.markdown`. However, be
+    aware that those libraries may not be sufficient in themselves and will
+    likely require customization. Some useful lists of allowed tags and
+    attributes can be found in the [bleach-allowlist] library, which should
+    work with either sanitizer.
+
+
+[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
+[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
+[nh3]: https://nh3.readthedocs.io/en/latest/
+[bleach]: http://bleach.readthedocs.org/en/latest/
+[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
+[^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
+However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for
+those who cannot use [nh3] (Python bindings to a Rust library).
 
 The following options are available on the `markdown.markdown` function:
 
@@ -216,7 +242,23 @@ __encoding__{: #encoding }
 meet your specific needs, it is suggested that you write your own code
 to handle your encoding/decoding needs.
 
-### markdown.Markdown([**kwargs]) {: #Markdown data-toc-label='markdown.Markdown' }
+!!! warning
+
+    The Python-Markdown library does ***not*** sanitize its HTML output. If
+    you are processing Markdown input from an untrusted source, it is your
+    responsibility to ensure that it is properly sanitized. See [Markdown and
+    XSS] for an overview of some of the dangers and [Improper markup
+    sanitization in popular software] for notes on best practices to ensure
+    HTML is properly sanitized.
+
+    The developers of Python-Markdown recommend using [nh3] or [bleach][]
+    [^1] as a sanitizer on the output of `markdown.markdownFromFile`.
+    However, be aware that those libraries may not be sufficient in
+    themselves and will likely require customization. Some useful lists of
+    allowed tags and attributes can be found in the
+    [bleach-allowlist] library, which should work with either sanitizer.
+
+### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' }
 
 The same options are available when initializing the `markdown.Markdown` class
 as on the [`markdown.markdown`](#markdown) function, except that the class does
@@ -229,7 +271,7 @@ string must be passed to one of two instance methods.
 the thread they were created in. A single instance should not be accessed
 from multiple threads.
 
-#### Markdown.convert(source) {: #convert data-toc-label='Markdown.convert' }
+#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' }
 
 The `source` text must meet the same requirements as the [`text`](#text)
 argument of the [`markdown.markdown`](#markdown) function.
@@ -258,7 +300,23 @@ To make this easier, you can also chain calls to `reset` together:
 html3 = md.reset().convert(text3)
 ```
 
-#### Markdown.convertFile(**kwargs) {: #convertFile data-toc-label='Markdown.convertFile' }
+!!! warning
+
+    The Python-Markdown library does ***not*** sanitize its HTML output. If
+    you are processing Markdown input from an untrusted source, it is your
+    responsibility to ensure that it is properly sanitized. See [Markdown and
+    XSS] for an overview of some of the dangers and [Improper markup
+    sanitization in popular software] for notes on best practices to ensure
+    HTML is properly sanitized.
+
+    The developers of Python-Markdown recommend using [nh3] or [bleach][]
+    [^1] as a sanitizer on the output of `Markdown.convert`. However, be
+    aware that those libraries may not be sufficient in themselves and will
+    likely require customization. Some useful lists of allowed tags and
+    attributes can be found in the [bleach-allowlist] library, which should
+    work with either sanitizer.
+
+#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }
 
 The arguments of this method are identical to the arguments of the same
 name on the `markdown.markdownFromFile` function ([`input`](#input),
@@ -267,3 +325,19 @@ name on the `markdown.markdownFromFile` function ([`input`](#input),
 process multiple files without creating a new instance of the class for
 each document. State may need to be `reset` between each call to
 `convertFile` as is the case with `convert`.
+
+!!! warning
+
+    The Python-Markdown library does ***not*** sanitize its HTML output. If
+    you are processing Markdown input from an untrusted source, it is your
+    responsibility to ensure that it is properly sanitized. See [Markdown and
+    XSS] for an overview of some of the dangers and [Improper markup
+    sanitization in popular software] for notes on best practices to ensure
+    HTML is properly sanitized.
+
+    The developers of Python-Markdown recommend using [nh3] or [bleach][]
+    [^1] as a sanitizer on the output of `Markdown.convertFile`. However, be
+    aware that those libraries may not be sufficient in themselves and will
+    likely require customization. Some useful lists of allowed tags and
+    attributes can be found in the [bleach-allowlist] library, which should
+    work with either sanitizer.
