rendering of textarea tag broken #738

Open
@geoffmcl

From Karl, of edbrowse...

One item that has been on the to-do list for a long time:
you muck with the contents of <textarea>, but, somewhat like <script>, you should really leave them alone.
That's not just me; it's the standard, and every other browser does it.

<body>
<textarea>hello<br>world</textarea>
</body>

Lots of websites put sample HTML into a textarea, HTML that you might want to cut and paste into your own web page or wherever,
so you shouldn't touch the tags at all.
https://www.prchecker.info/check_page_rank.php
Look at the stuff in the textarea.
I get around it with a preprocessing routine (html-tidy.c, line 17), and it works so well I almost forgot it was happening,
but at some point you may want to address this.
The contents of the textarea should be hello<br>world as written, not helloworld.

So there are three modes:

  1. Process and clean up the tags.
  2. Don't touch or change a single byte, as in <script> or <style>.
    If you change anything at all here, things don't work.
  3. Don't interpret any HTML tags, but still expand character references such as &amp;. The only two I know of here are <textarea> and <title>.
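
These three modes correspond to what the HTML5 specification calls normal elements, raw text elements (script, style) and escapable raw text elements (textarea, title). Below is a minimal sketch of a per-tag classifier, assuming tag names arrive as plain C strings. `ContentMode` and `content_mode()` are hypothetical names and are not part of tidy's API. Tidy instead selects the behaviour through the parser field of its tag_defs table, as the diff further down shows.

```c
#include <ctype.h>

/* The three modes listed above, named after the HTML5 spec's terms. */
typedef enum {
    MODE_NORMAL,             /* 1: parse and clean up child tags        */
    MODE_RAW_TEXT,           /* 2: keep every byte (script, style)      */
    MODE_ESCAPABLE_RAW_TEXT  /* 3: no tags, but decode &references;     */
} ContentMode;

/* ASCII case-insensitive string equality (HTML tag names ignore case). */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Hypothetical classifier: map a tag name to its content mode. */
static ContentMode content_mode(const char *tag)
{
    if (ascii_ieq(tag, "script") || ascii_ieq(tag, "style"))
        return MODE_RAW_TEXT;
    if (ascii_ieq(tag, "textarea") || ascii_ieq(tag, "title"))
        return MODE_ESCAPABLE_RAW_TEXT;
    return MODE_NORMAL;
}
```

Every tag not named in the first two groups falls into mode 1. This matches the point that only a handful of elements need special treatment.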

Hope this helps, and thank you again.

My @geoffmcl reply:

I am reviewing the tidy <textarea> code and understand what you want changed... ok, yes, the contents of the textarea should be hello<br>world as written, not helloworld plus a warning that a <br> was deleted...

I can get that with a small change to how the textarea text is gathered, which is done in the ParseText service...

But we want the text handled as in <script>var txt = 'hello<br>world';</script>, where it is treated as CDATA and not changed in any way: it is the raw text stream up to the </textarea>, if one is ever encountered... so I changed the textarea parser to ParseScript...
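
Treating the content as raw text means the parser only has to locate the matching end tag. Below is a minimal sketch, not tidy's ParseScript. `raw_text_length()` is a hypothetical helper. It assumes a NUL-terminated buffer and follows the HTML5 tokenizer rule that `</textarea` in any letter case closes the element only when followed by whitespace, `/` or `>`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the raw text at `text` up to, but not
   including, the end tag "</name". Nothing inside is interpreted, so a
   "<br>" is just five bytes of text. With no end tag, everything left
   is raw text. */
static size_t raw_text_length(const char *text, const char *name)
{
    size_t n = strlen(name);
    for (const char *p = text; *p; ++p) {
        if (p[0] != '<' || p[1] != '/')
            continue;
        /* Compare the tag name case-insensitively, stopping at NUL. */
        size_t i = 0;
        while (i < n && p[2 + i] &&
               tolower((unsigned char)p[2 + i]) == tolower((unsigned char)name[i]))
            ++i;
        if (i == n) {
            char c = p[2 + n];
            /* "</textareax>" is not an end tag; "</textarea >" is. */
            if (c == '>' || c == '/' || c == ' ' || c == '\t' ||
                c == '\n' || c == '\f' || c == '\r')
                return (size_t)(p - text);
        }
    }
    return strlen(text);
}
```

For `hello<br>world</textarea>` it returns 14, the length of `hello<br>world`, and leaves the `<br>` untouched. An unterminated textarea consumes the rest of the input, which covers the "if ever" case.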

Now we get <textarea>hello&lt;br&gt;world</textarea>, which is not bad, but is there an option to skip the escaping...
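
Textarea content is escapable raw text, so browsers decode character references inside it. As a result, `<textarea>hello&lt;br&gt;world</textarea>` and `<textarea>hello<br>world</textarea>` give the textarea the same value, `hello<br>world`. Below is a minimal sketch of that decoding step. It assumes, as a deliberate simplification, that only `&lt;`, `&gt;`, `&amp;` and `&quot;` need handling; HTML defines many more named and numeric references. `decode_basic_entities()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode a small subset of character references
   in place. Any other '&' sequence is copied through unchanged. */
static void decode_basic_entities(char *s)
{
    static const struct { const char *name; char ch; } refs[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' }, { "&quot;", '"' }
    };
    char *out = s;  /* write position never runs ahead of read position */
    while (*s) {
        int matched = 0;
        if (*s == '&') {
            for (size_t i = 0; i < sizeof refs / sizeof refs[0]; ++i) {
                size_t len = strlen(refs[i].name);
                if (strncmp(s, refs[i].name, len) == 0) {
                    *out++ = refs[i].ch;
                    s += len;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *out++ = *s++;
    }
    *out = '\0';
}
```

Decoding happens exactly once, so `&amp;amp;` yields the literal text `&amp;`. A printer that escapes raw, undecoded text would therefore turn a source `&amp;` into `&amp;amp;` and change the value.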

I think the escaping only happens in the output, the pretty printer... In there, if I change the textarea output to use PPrintScriptStyle, then I get -

<textarea>
hello<br>world
</textarea> 

I do not know if I can get rid of the newlines, but edbrowse does not consume the HTML output, just the node list, so maybe you will not have this problem...
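
Whether those newlines matter depends on the consumer. Under the HTML5 parsing rules, a single line feed immediately after the <textarea> start tag is ignored. The newline before </textarea>, however, stays in the value. The pretty-printed form above therefore yields `hello<br>world` followed by a line feed. Below is a minimal sketch of that rule, assuming CR/LF pairs have already been normalized to LF. `textarea_value()` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the raw text between <textarea> and
   </textarea>, return where the value starts and store its length.
   Only one leading LF is dropped; trailing newlines are kept. */
static const char *textarea_value(const char *raw, size_t *len)
{
    if (raw[0] == '\n')
        ++raw;  /* HTML5 ignores one LF right after the start tag */
    *len = strlen(raw);
    return raw;
}
```

A printer can add the newline after `<textarea>` without changing anything. The newline before `</textarea>`, though, appends a line feed to the value.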

I have added this to the tidy issues so I can assign an issue number to the fixes, probably in a branch initially...

The current tidy diff is -

diff --git a/src/pprint.c b/src/pprint.c
index 321045e..25f72bc 100644
--- a/src/pprint.c
+++ b/src/pprint.c
@@ -2130,7 +2130,7 @@ void TY_(PPrintTree)( TidyDocImpl* doc, uint mode, uint indent, Node *node )
             node->type = StartTag;
 
         if ( node->tag && 
-             (node->tag->parser == TY_(ParsePre) || nodeIsTEXTAREA(node)) )
+             (node->tag->parser == TY_(ParsePre)) )
         {
             Bool classic  = TidyClassicVS; /* #228 - cfgBool( doc, TidyVertSpace ); */
             uint indprev = indent;
@@ -2163,7 +2163,7 @@ void TY_(PPrintTree)( TidyDocImpl* doc, uint mode, uint indent, Node *node )
                  && node->next != NULL )
                 TY_(PFlushLineSmart)( doc, indent );
         }
-        else if ( nodeIsSTYLE(node) || nodeIsSCRIPT(node) )
+        else if ( nodeIsSTYLE(node) || nodeIsSCRIPT(node) || nodeIsTEXTAREA(node) )
         {
             PPrintScriptStyle( doc, (mode | PREFORMATTED | NOWRAP | CDATA),
                                indent, node );
diff --git a/src/tags.c b/src/tags.c
index 139db11..518726a 100644
--- a/src/tags.c
+++ b/src/tags.c
@@ -263,7 +263,7 @@ static Dict tag_defs[] =
   { TidyTag_TABLE,      "table",      VERS_ELEM_TABLE,      &TY_(W3CAttrsFor_TABLE)[0],      (CM_BLOCK),                                    TY_(ParseTableTag), CheckTABLE     },
   { TidyTag_TBODY,      "tbody",      VERS_ELEM_TBODY,      &TY_(W3CAttrsFor_TBODY)[0],      (CM_TABLE|CM_ROWGRP|CM_OPT),                   TY_(ParseRowGroup), NULL           },
   { TidyTag_TD,         "td",         VERS_ELEM_TD,         &TY_(W3CAttrsFor_TD)[0],         (CM_ROW|CM_OPT|CM_NO_INDENT),                  TY_(ParseBlock),    NULL           },
-  { TidyTag_TEXTAREA,   "textarea",   VERS_ELEM_TEXTAREA,   &TY_(W3CAttrsFor_TEXTAREA)[0],   (CM_INLINE|CM_FIELD),                          TY_(ParseText),     NULL           },
+  { TidyTag_TEXTAREA,   "textarea",   VERS_ELEM_TEXTAREA,   &TY_(W3CAttrsFor_TEXTAREA)[0],   (CM_INLINE|CM_FIELD),                          TY_(ParseScript),   NULL           },
   { TidyTag_TFOOT,      "tfoot",      VERS_ELEM_TFOOT,      &TY_(W3CAttrsFor_TFOOT)[0],      (CM_TABLE|CM_ROWGRP|CM_OPT),                   TY_(ParseRowGroup), NULL           },
   { TidyTag_TH,         "th",         VERS_ELEM_TH,         &TY_(W3CAttrsFor_TH)[0],         (CM_ROW|CM_OPT|CM_NO_INDENT),                  TY_(ParseBlock),    NULL           },
   { TidyTag_THEAD,      "thead",      VERS_ELEM_THEAD,      &TY_(W3CAttrsFor_THEAD)[0],      (CM_TABLE|CM_ROWGRP|CM_OPT),                   TY_(ParseRowGroup), NULL           },

And it still needs testing in edbrowse...

Further feedback welcome... thanks...
