
LibTidy does not support all entities! It should!! #643

Closed
@geoffmcl

At present libtidy, in entities.c, supports only some 253 case-sensitive entities... it uses TY_(tmbstrcmp) in the entitiesLookup static function...

Yet W3C pages like these - https://dev.w3.org/html5/html-author/charref - or - https://www.w3.org/TR/xml-entity-names/byalpha.html - and probably others, indicate in excess of 2,000 entities, some, like &AMP; and &amp;, in multiple case forms... not yet exactly counted, and there may be more, yet to be fully explored...
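To illustrate why a case-sensitive lookup needs each case form listed separately, here is a minimal sketch of a sorted table searched with strcmp. It is not the libtidy code: the Entity struct, entity_table, compare_entity and entity_lookup are hypothetical names, and the table holds only a handful of entries from the W3C lists.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: entity name without '&' and ';', plus its code point. */
typedef struct {
    const char *name;
    unsigned int code;
} Entity;

/* Must stay sorted in strcmp order; uppercase ASCII sorts before lowercase,
 * so "AMP" and "amp" are two distinct keys. */
static const Entity entity_table[] = {
    { "AMP",     0x26 },
    { "NewLine", 0x0A },
    { "Tab",     0x09 },
    { "amp",     0x26 },
    { "gt",      0x3E },
    { "lt",      0x3C },
};

/* bsearch comparator: key is the name being looked up. */
static int compare_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Entity *)elem)->name);
}

/* Returns the matching record, or NULL when the name is unknown. */
static const Entity *entity_lookup(const char *name)
{
    return bsearch(name, entity_table,
                   sizeof entity_table / sizeof entity_table[0],
                   sizeof entity_table[0], compare_entity);
}
```

With this table, entity_lookup("Tab") finds the U+0009 record, while entity_lookup("tab") returns NULL because the comparison is case sensitive. Supporting a new case form means adding a new row, not loosening the comparison.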

Here is a live web page - https://support.apple.com/en-us/HT202021 - containing the likes of &Tab;, &NewLine;, pointed out to me by Karl Dahlke, of the edbrowse project, which uses libtidy to load and view web pages - thanks Karl...

So at present, edbrowse sees warnings about unknown entities, and unless preserve-entities: yes is added to the config, they will be rendered/returned as text &Tab, &NewLine, along with the warnings... not good!

I propose libtidy support all known entities, and seek feedback on this, plus any other W3C references... thanks...
