Question/Feature request: dealing with tags that are not self-closing (SGML) #31

@guraltsev

This is more of a question than a bug report. Please tell me if I am using the wrong tool for the job. In SGML, some tags are not allowed to be self-closing. For example, all modern browsers fail if they encounter

<script src="somescript.js"/>

even if the document is of type HTML5. The script must be written as

<script src="somescript.js"></script>

For context:

I use esxml in several filter functions that I execute when exporting Org files to HTML. I do not want to define a new exporter, just to filter some things. I use libxml to parse the output of the Org exporter at different steps, modify it using dom, and then re-output it using esxml. I actually found it very surprising that esxml is not built into Emacs proper.

One example: I define an option (a variable) that decides whether local CSS and JS files should be inlined in the HTML.

I added a hook function to org-export-filter-final-output-functions (see [Advanced Export Configuration](https://orgmode.org/manual/Advanced-Export-Configuration.html)) that parses the output HTML, changes the contents of the script tag if inlining is required, and finally serializes everything back to HTML. However, if I am not inlining the scripts, the round trip HTML -> Emacs XML -> HTML changes <script>[...]</script> pairs into <script/>, making the HTML output incorrect. I temporarily worked around the issue by adding a whitespace string inside empty <script></script> tags.
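For concreteness, the workaround described above could look roughly like the sketch below. The function names `my/pad-empty-script-tags` and `my/org-html-roundtrip-filter` are hypothetical and used only for illustration; the sketch assumes Emacs is built with libxml2 and that `esxml-to-xml` emits a self-closing tag only for nodes without children, as observed above.

```elisp
(require 'dom)
(require 'esxml)
(require 'ox)

;; Hypothetical helper: not part of esxml or dom.el.
(defun my/pad-empty-script-tags (dom)
  "Give every childless <script> node in DOM a single-space text child.
Modifies DOM in place and returns it."
  (dolist (node (dom-by-tag dom 'script))
    (unless (dom-children node)
      (dom-append-child node " ")))
  dom)

;; Hypothetical filter: parse the final HTML, pad empty scripts, re-serialize.
(defun my/org-html-roundtrip-filter (text backend _info)
  "Round-trip TEXT through libxml and esxml for HTML-derived BACKENDs.
Returns nil for other backends, which leaves the output unchanged."
  (when (org-export-derived-backend-p backend 'html)
    (let ((dom (with-temp-buffer
                 (insert text)
                 (libxml-parse-html-region (point-min) (point-max)))))
      (esxml-to-xml (my/pad-empty-script-tags dom)))))

(add-hook 'org-export-filter-final-output-functions
          #'my/org-html-roundtrip-filter)
```

The padding step only touches `script` nodes; other non-void elements that may end up empty (for example `div` or `textarea`) would presumably need the same treatment, which is why handling this in the serializer seems cleaner than patching the DOM.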

Do you have any comments about this? Now that there is a pcase, would it be appropriate to implement such behavior? Technically this is not XML behavior, but de facto I see no other library for programmatically editing HTML documents in Emacs.