Description
From Karl, of edbrowse...
One item that has been on the to-do list for a long time - you muck with the contents of <textarea>, but somewhat like <script>, you should really leave it alone.
That's not just me, it's standard, and done by every other browser.
<body>
<textarea>hello<br>world</textarea>
</body>
Lots of websites put sample html into a textarea, html that you might want to cut & paste into your own web page or whatever, so you shouldn't mess with the tags at all.
https://www.prchecker.info/check_page_rank.php
Look at the stuff in the textarea.
I get around it with a preprocessing routine, html-tidy.c line 17, and it works so well I almost forgot it was happening,
but at some point you may want to address this.
The contents of textarea should be hello<br>world as written, not helloworld.
Thus there are 3 modes.
- Process and clean up the tags.
- Don't touch or change a single byte, as in <script> or <style>. If you change anything at all here, things don't work.
- Don't interpret any html tags but still expand &characters;. The only two I know of here are <textarea> and <title>.
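To make the three modes concrete, here is a minimal sketch in C of how a parser might pick a content mode from the tag name. The enum, its members, and content_mode_for are all made-up names for illustration; nothing here is taken from tidy or edbrowse.

```c
#include <ctype.h>

/* Hypothetical names: none of these exist in tidy or edbrowse. */
typedef enum {
    CONTENT_MARKUP,  /* mode 1: parse and clean up nested tags */
    CONTENT_RAW,     /* mode 2: keep every byte, as in <script> and <style> */
    CONTENT_RCDATA   /* mode 3: no tags, but still expand &characters; */
} ContentMode;

/* Case-insensitive comparison of two tag names; returns 1 when equal. */
static int same_tag(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Pick the content mode for an element, given its tag name. */
ContentMode content_mode_for(const char *tag)
{
    if (same_tag(tag, "script") || same_tag(tag, "style"))
        return CONTENT_RAW;
    if (same_tag(tag, "textarea") || same_tag(tag, "title"))
        return CONTENT_RCDATA;
    return CONTENT_MARKUP;
}
```

Any element not listed falls through to the first mode, which matches the "process and clean up" default described above.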
Hope this helps, and thank you again.
My @geoffmcl reply:
I am reviewing the <textarea> tidy code, and understand what you want changed... ok, yes, the contents of textarea should be hello<br>world as written, not helloworld, and a warning that a <br> was deleted...
I can get that with a small change in the textarea text gathering... it is done in ParseText service...
But we want the text handled as in <script>var txt = 'hello<br>world';</script>, where the text is treated as CDATA, and is not changed in any way... it is the raw text stream until the </textarea> is encountered, if ever... so I change it to ParseScript...
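The "raw text stream until the closing tag, if ever" idea can be sketched in a few lines of C. This is only an illustration, not what ParseScript actually does inside tidy; raw_content_length is a made-up name, and the rule that the tag name must be followed by a non-alphanumeric character is my own simplifying assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of tidy. Returns the number of bytes of raw
 * content at the start of `text`, i.e. everything before the first
 * case-insensitive "</tag". If no closing tag is ever found, the whole
 * string counts as content.
 */
size_t raw_content_length(const char *text, const char *tag)
{
    size_t taglen = strlen(tag);
    const char *p;

    for (p = text; *p != '\0'; p++) {
        size_t i;

        if (p[0] != '<' || p[1] != '/')
            continue;
        /* Compare the tag name; a NUL in the input ends the match early. */
        for (i = 0; i < taglen; i++) {
            if (tolower((unsigned char)p[2 + i]) !=
                tolower((unsigned char)tag[i]))
                break;
        }
        /* Require a name boundary so "</textareas>" does not match. */
        if (i == taglen && !isalnum((unsigned char)p[2 + taglen]))
            return (size_t)(p - text);
    }
    return (size_t)(p - text);
}
```

With the sample from this issue, the content before </textarea> is the 14 bytes hello<br>world, and the <br> inside is never looked at as a tag.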
Now we will get <textarea>hello&lt;br&gt;world</textarea>, which is not bad, but is there an option to skip escaping...
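For reference, the escaping in question is presumably the usual substitution of entities for markup characters when text is written out. A rough sketch in C follows; escape_markup is a hypothetical helper written for this comment, not tidy's pretty printer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of tidy. Copies `src` into `dst`, replacing
 * '&', '<' and '>' with entities. Returns 0 on success, or -1 when `dst`
 * (of size `dstsize`, including room for the terminator) is too small.
 */
int escape_markup(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = one;     break;
        }
        len = strlen(rep);
        if (used + len + 1 > dstsize)
            return -1;          /* no room for this piece plus the NUL */
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dstsize)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

Skipping this step for textarea content is what leaves the <br> in the output exactly as it was written in the source.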
I think this change only happens in the output, pretty print... In there, if I change the output of textarea to use PPrintScriptStyle, then I get -
<textarea>
hello<br>world
</textarea>
Do not know if I can get rid of the newlines, but you do not consume html, just the node list, maybe you will not have this problem...
Added to tidy issues, so I can assign an issue number to the fixes, probably in a branch initially...
The current tidy diff is -
diff --git a/src/pprint.c b/src/pprint.c
index 321045e..25f72bc 100644
--- a/src/pprint.c
+++ b/src/pprint.c
@@ -2130,7 +2130,7 @@ void TY_(PPrintTree)( TidyDocImpl* doc, uint mode, uint indent, Node *node )
node->type = StartTag;
if ( node->tag &&
- (node->tag->parser == TY_(ParsePre) || nodeIsTEXTAREA(node)) )
+ (node->tag->parser == TY_(ParsePre)) )
{
Bool classic = TidyClassicVS; /* #228 - cfgBool( doc, TidyVertSpace ); */
uint indprev = indent;
@@ -2163,7 +2163,7 @@ void TY_(PPrintTree)( TidyDocImpl* doc, uint mode, uint indent, Node *node )
&& node->next != NULL )
TY_(PFlushLineSmart)( doc, indent );
}
- else if ( nodeIsSTYLE(node) || nodeIsSCRIPT(node) )
+ else if ( nodeIsSTYLE(node) || nodeIsSCRIPT(node) || nodeIsTEXTAREA(node) )
{
PPrintScriptStyle( doc, (mode | PREFORMATTED | NOWRAP | CDATA),
indent, node );
diff --git a/src/tags.c b/src/tags.c
index 139db11..518726a 100644
--- a/src/tags.c
+++ b/src/tags.c
@@ -263,7 +263,7 @@ static Dict tag_defs[] =
{ TidyTag_TABLE, "table", VERS_ELEM_TABLE, &TY_(W3CAttrsFor_TABLE)[0], (CM_BLOCK), TY_(ParseTableTag), CheckTABLE },
{ TidyTag_TBODY, "tbody", VERS_ELEM_TBODY, &TY_(W3CAttrsFor_TBODY)[0], (CM_TABLE|CM_ROWGRP|CM_OPT), TY_(ParseRowGroup), NULL },
{ TidyTag_TD, "td", VERS_ELEM_TD, &TY_(W3CAttrsFor_TD)[0], (CM_ROW|CM_OPT|CM_NO_INDENT), TY_(ParseBlock), NULL },
- { TidyTag_TEXTAREA, "textarea", VERS_ELEM_TEXTAREA, &TY_(W3CAttrsFor_TEXTAREA)[0], (CM_INLINE|CM_FIELD), TY_(ParseText), NULL },
+ { TidyTag_TEXTAREA, "textarea", VERS_ELEM_TEXTAREA, &TY_(W3CAttrsFor_TEXTAREA)[0], (CM_INLINE|CM_FIELD), TY_(ParseScript), NULL },
{ TidyTag_TFOOT, "tfoot", VERS_ELEM_TFOOT, &TY_(W3CAttrsFor_TFOOT)[0], (CM_TABLE|CM_ROWGRP|CM_OPT), TY_(ParseRowGroup), NULL },
{ TidyTag_TH, "th", VERS_ELEM_TH, &TY_(W3CAttrsFor_TH)[0], (CM_ROW|CM_OPT|CM_NO_INDENT), TY_(ParseBlock), NULL },
{ TidyTag_THEAD, "thead", VERS_ELEM_THEAD, &TY_(W3CAttrsFor_THEAD)[0], (CM_TABLE|CM_ROWGRP|CM_OPT), TY_(ParseRowGroup), NULL },
And still to test in edbrowse...
Further feedback welcome... thanks...