Webmaster Forum / HTML, CSS, Scripts / HTML / August 2008




innerHTML problem in IE6

Kiran Makam - 26 Aug 2008 12:58 GMT
I am setting the content of a div dynamically using innerHTML
property. If the content contains an ampersand, text after the
ampersand is disappearing in IE6. It works properly in Firefox.

This is my code:
----------------
<body>

<div id='div1'></div>
<script>
var div = document.getElementById('div1');
div.innerHTML = "A&B";
</script>

</body>
---------------

IE6 renders the content of div1 as 'A'
Firefox renders the content properly as 'A&B'

If there is a space after ampersand, IE6 renders it properly. So I
think that IE is assuming anything after ampersand as an HTML entity
( like &nbsp; ).

Is this a bug in IE6? Is there any workaround for this?

Thanks
Kiran Makam
Jonathan N. Little - 26 Aug 2008 13:35 GMT
> I am setting the content of a div dynamically using innerHTML
> property. If the content contains an ampersand, text after the
[quoted text clipped - 9 lines]
> div.innerHTML = "A&B";
> </script>

Try:

div.innerHTML = "A&amp;B";

Signature

Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com

Lars Eighner - 26 Aug 2008 13:38 GMT
In our last episode,
<6ddeff6e-6171-4ab5-bc81-eb4b4a822d6d@w1g2000prk.googlegroups.com>,
the lovely and talented Kiran Makam
broadcast on alt.html:

> I am setting the content of a div dynamically using innerHTML
> property. If the content contains an ampersand, text after the
> ampersand is disappearing in IE6. It works properly in Firefox.

> This is my code:
> ----------------
><body>

><div id='div1'></div>
><script>
> var div = document.getElementById('div1');
> div.innerHTML = "A&B";
></script>

></body>
> ---------------

> IE6 renders the content of div1 as 'A'
> Firefox renders the content properly as 'A&B'

> If there is a space after ampersand, IE6 renders it properly. So I
> think that IE is assuming anything after ampersand as an HTML entity
> ( like &nbsp; ).

> Is this a bug in IE6?

No, it is a bug in your markup.  & should always be &amp;.  The browser is
entitled to suppose any string starting with & is an attempt at a character
entity.  It may be that FF has a better error correction ability, but
you can't blame a browser for how it handles errors.

> Is there any workaround for this?

Yes.  Enter & as &amp;

Signature

Lars Eighner <http://larseighner.com/> usenet@larseighner.com
                   War on Terrorism:  History a Mystery
"He's busy making history, but doesn't look back at his own, or the
  world's.... Bush would rather look forward than backward." --_Newsweek_

Harlan Messinger - 26 Aug 2008 15:58 GMT
> In our last episode,
> <6ddeff6e-6171-4ab5-bc81-eb4b4a822d6d@w1g2000prk.googlegroups.com>,
[quoted text clipped - 35 lines]
>
> Yes.  Enter & as &amp;

To clarify for the original poster: this isn't a workaround, it's the
proper way to escape the ampersand in HTML when it's being used as a
literal instead of in its special role as first character in an entity code.
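
As a minimal sketch of that escaping, assuming the text arrives in a variable rather than as a hand-written constant: the helper name escapeHtml is hypothetical, not something from this thread or from any browser API. It only uses String.prototype.replace with global regular expressions.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
// so the string can be assigned to innerHTML and still display literally.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities below get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// In a browser this would be used roughly as:
//   document.getElementById("div1").innerHTML = escapeHtml("A&B");
var escaped = escapeHtml("A&B");
```

Replacing the ampersand first matters: done last, it would turn an already produced &lt; into &amp;lt;.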
Jukka K. Korpela - 26 Aug 2008 17:37 GMT
>> This is my code:

As so often, a URL would have been needed, even for an apparently trivial
piece of code. Experienced authors know this, and others should just believe
it. :-)

>> <script>
>> var div = document.getElementById('div1');
>> div.innerHTML = "A&B";

The markup is invalid due to lack of required type="..." attribute, but this
is really just a formality. More importantly, we don't know whether this is
supposed to be HTML or XHTML and how it has been served.

>> IE6 renders the content of div1 as 'A'
>> Firefox renders the content properly as 'A&B'
[quoted text clipped - 6 lines]
>
> No, it is a bug in your markup.

Whether the markup is correct depends on whether this is HTML or XHTML. In
HTML, the content model of <script> is CDATA, which means that entity
references are not recognized, so "&B" means just the character "&" followed
by the character "B". In XHTML, the content model is #PCDATA, in which
case...

> & should always be &amp;

... or something equivalent.

> The browser is entitled to suppose any string starting with & is an
> attempt at a character entity.

No, not in HTML when inside <script> (or <style>). Otherwise, it is
_required_ to treat "&" as potentially starting an entity reference or a
character reference. Error processing rules are then different for different
situations and flavors of HTML. In HTML 4.01, "&B" must be parsed as an
entity reference, but since no such entity has been defined, we're in the
error processing area, and treating "&" as a data character is conventional
in browsers in such cases. In XHTML, "&B", when not followed by a semicolon
(possibly after some name characters) is a well-formedness violation and XML
processors should simply report an error and refuse to display the document
at all.

Note: There are no grounds for assuming &B to be a "character entity" in any
flavor of HTML. The pseudo-term "character entity" is, at best, shorthand
for "entity reference that happens to evaluate to a one-character string".
The entity reference &B does not evaluate to anything; it is undefined.

Confused? Fine. Just outsource the script, avoiding the mess!

> It may be that FF has a better error
> correction ability, but
> you can't blame a browser for how it handles errors.

Oh we can, both on practical grounds and, in some cases, on formal grounds.

>> Is there any workaround for this?
>
> Yes.  Enter & as &amp;

The best way to solve the problem is to put the script in an external file
and reference it via <script type="text/javascript" src="foo.js"></script>

Yucca
Ben C - 26 Aug 2008 19:02 GMT
[...]
>>> var div = document.getElementById('div1');
>>> div.innerHTML = "A&B";
[...]
>> & should always be &amp;
>
[quoted text clipped - 6 lines]
> _required_ to treat "&" as potentially starting an entity reference or a
> character reference.

In markup like:

   <script>
       div.innerHTML = "A&B";
   </script>

"A&B" is certainly inside a script element. But is it also inside a
<div> element?

We can imagine that the browser recursively enters its HTML parser to
evaluate innerHTML, so its HTML parser will see something like this:

   <div>
       A&B
   </div>

where it would be required to treat & as potentially starting an entity
reference or a character reference as you say.
Jukka K. Korpela - 27 Aug 2008 18:17 GMT
> In markup like:
>
[quoted text clipped - 4 lines]
> "A&B" is certainly inside a script element. But is it also inside a
> <div> element?

A tricky question, which I tried to avoid. In terms of HTML specifications,
it is not inside any <div> element, since whatever happens via scripting is
outside the scope of those specs.

As http://msdn.microsoft.com/en-us/library/ms533897.aspx says so eloquently,
"There is no public standard that applies to this [innerHTML] property".
That vendor-specific page says:
"When the innerHTML property is set, the given string completely replaces
the existing content of the object. If the string contains HTML tags, the
string is parsed and formatted as it is placed into the document."

I think it is fair to read this so that they promise to parse the content as
HTML. This in turn means that &B would be detected as undefined entity
reference. If, on the other hand, A&amp;B were used, then it would be first
parsed (as <script> element content, assuming HTML 4.01 rules) as such, and
the second parsing would recognize &amp; as a reference that denotes the &
character. But they don't say exactly how the parsing works.

> We can imagine that the browser recursively enters its HTML parser to
> evaluate innerHTML,

Why, oh why, do people speak of recursion when they mean iteration?

> so its HTML parser will see something like this:
>
[quoted text clipped - 4 lines]
> where it would be required to treat & as potentially starting an
> entity reference or a character reference as you say.

No, I don't think it sees any <div> tag. It is parsing the string "A&B", and
I agree with the idea that here "&" should be treated as a special
character, here starting an entity reference. But the widely accepted
fallback for undefined entity references is to treat them "literally", i.e.
as if e.g. "&B" were really defined to mean "&B".
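
A rough sketch of that fallback, under loud assumptions: the names KNOWN_ENTITIES and decodeLeniently are made up for illustration, the table is deliberately tiny, and this is not the algorithm of IE6, Firefox, or any other browser. It only shows the idea of replacing defined references and leaving undefined ones as written.

```javascript
// Hypothetical, deliberately tiny entity table; HTML defines many more.
const KNOWN_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', nbsp: "\u00A0" };

// Replace references to known entities; leave undefined ones untouched.
function decodeLeniently(text) {
  return text.replace(/&([A-Za-z][A-Za-z0-9]*);?/g, (match, name) => {
    return Object.prototype.hasOwnProperty.call(KNOWN_ENTITIES, name)
      ? KNOWN_ENTITIES[name]
      : match; // undefined reference: keep the characters as they were typed
  });
}

const lenient = decodeLeniently("A&B");     // "&B" is undefined, so it is kept
const proper = decodeLeniently("A&amp;B");  // "&amp;" is defined, so it becomes "&"
```

Under this fallback both spellings display the same text, which matches what the original poster saw in Firefox; IE6 evidently recovered differently.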

Yucca
Ben C - 27 Aug 2008 22:03 GMT
>> In markup like:
>>
[quoted text clipped - 18 lines]
> I think it is fair to read this so that they promise to parse the content as
> HTML.

Yes although they may be saying it's only parsed as HTML if it contains
HTML tags. In other words, 'innerHTML = "A&amp;B"' would be treated
differently from 'innerHTML = "<span>A&amp;B</span>"'. But that would be
silly even for Microsoft.

> This in turn means that &B would be detected as undefined entity
> reference. If, on the other hand, A&amp;B were used, then it would be first
[quoted text clipped - 6 lines]
>
> Why, oh why, do people speak of recursion when they mean iteration?

I don't know why people would do that. I only speak of recursion when I
mean recursion.

>> so its HTML parser will see something like this:
>>
[quoted text clipped - 6 lines]
>
> No, I don't think it sees any <div> tag.

Sorry if that was unclear. The <div> tag doesn't come from the
innerHTML, but we are rewriting the contents of a <div>, that's all I
meant.
Jukka K. Korpela - 27 Aug 2008 22:42 GMT
>>> We can imagine that the browser recursively enters its HTML parser
>>> to evaluate innerHTML,
[quoted text clipped - 3 lines]
> I don't know why people would do that. I only speak of recursion when
> I mean recursion.

Which recursion is involved when a browser, having parsed HTML data, starts
interpreting it, finds some client-side script code, executes it, then
starts parsing the data that results from the execution? (In this case, as
so often, the generation of that data is trivial, since it is a string
constant, but that's irrelevant here.) Answer: There is no recursion
involved. The parsing was finished long before the script execution started,
at the logical level at least, and then new parsing was initiated. It's
really not even iteration, except in a trivial sense.

Parsing HTML could itself be recursive (i.e., a parser routine might call
itself), and that would be natural in a sense since HTML is defined
recursively. But tag soup slurpers don't do that, and generally, recursive
parsing is less efficient than non-recursive parsing.

Yucca
Ben C - 27 Aug 2008 23:32 GMT
>>>> We can imagine that the browser recursively enters its HTML parser
>>>> to evaluate innerHTML,
[quoted text clipped - 12 lines]
> at the logical level at least, and then new parsing was initiated. It's
> really not even iteration, except in a trivial sense.

I said we can "imagine" that the browser recursively enters its HTML
parser. I'm not talking about particular implementations, although I see
no reason why they wouldn't use recursion here.

I'm not sure what you mean by "interpreting" HTML data. The basic
operation here is to build a DOM tree out of HTML. That might as well be
done in one step ("parsing") without another intermediate stage.

If that is the case, then script elements need to execute as they are
found, which could be naturally implemented by re-entering the parser.

The actual code would probably be mutually recursive-- the parser calls
the script interpreter which calls the parser.
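
To make the shape of that concrete, here is a toy sketch, not a claim about how any real browser is written. The functions parse and runScript, the depth counter, and the use of a plain function to stand in for a script element are all assumptions for illustration.

```javascript
let maxDepth = 0; // deepest nesting of parse() calls seen so far

// Toy "parser": copies items into a node list; a function item stands in
// for a <script> element and is handed to runScript() when encountered.
function parse(items, depth = 1) {
  maxDepth = Math.max(maxDepth, depth);
  const nodes = [];
  for (const item of items) {
    if (typeof item === "function") {
      runScript(item, nodes, depth);
    } else {
      nodes.push(item);
    }
  }
  return nodes;
}

// Toy "script interpreter": the script returns new markup items (standing
// in for an innerHTML assignment), which are parsed by calling parse() again.
function runScript(script, nodes, depth) {
  const generated = script();
  nodes.push(...parse(generated, depth + 1));
}

const nodes = parse(["A", () => ["B", "C"], "D"]);
```

Here the inner parse() call starts while the outer one is still on the call stack, which is the sense of "recursively" meant above.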

> Parsing HTML could itself be recursive (i.e., a parser routine might call
> itself), and that would be natural in a sense since HTML is defined
> recursively. But tag soup slurpers don't do that

Who cares about tag soup slurpers or knows what the hell they do?

> and generally, recursive parsing is less efficient than non-recursive
> parsing.

How would you parse HTML more efficiently than by using recursive
parsing?
Sherm Pendley - 28 Aug 2008 00:12 GMT
> How would you parse HTML more efficiently than by using recursive
> parsing?

I don't know about other parsers, but Expat uses callback functions
that it calls when it finds an opening tag, closing tag, text node,
comment, etc. It's event driven, not recursive - the parser function
never calls itself.
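
For readers who have not seen that style, a toy sketch in JavaScript follows. It is not Expat's API; the function scan and the handler names onStart, onEnd and onText are invented here, and the regular expression handles only very simple markup.

```javascript
// Toy event-driven scanner: walks the markup once in a loop and reports
// start tags, end tags and text runs to the supplied callbacks.
function scan(markup, handlers) {
  const token = /<\/([A-Za-z][A-Za-z0-9]*)\s*>|<([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)/g;
  let m;
  while ((m = token.exec(markup)) !== null) {
    if (m[1] !== undefined) handlers.onEnd(m[1]);        // closing tag
    else if (m[2] !== undefined) handlers.onStart(m[2]); // opening tag
    else handlers.onText(m[3]);                          // text node
  }
}

const events = [];
scan("<p>Hello <b>world</b></p>", {
  onStart: (name) => events.push("start " + name),
  onEnd: (name) => events.push("end " + name),
  onText: (text) => events.push("text " + JSON.stringify(text)),
});
```

The scanner itself keeps no record of nesting; whatever structure is needed has to be tracked by the code behind the callbacks.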

sherm--

Signature

My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

Ben C - 28 Aug 2008 07:42 GMT
>> How would you parse HTML more efficiently than by using recursive
>> parsing?
[quoted text clipped - 3 lines]
> comment, etc. It's event driven, not recursive - the parser function
> never calls itself.

Indeed, and neither does the tree builder you implement in the
callbacks-- it has to either maintain an explicit stack or use parent
pointers on the tree nodes it is generating.

But none of that is any more efficient than doing it recursively, it's
just one way of trying to separate things.
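
A minimal sketch of the explicit-stack variant, assuming the tokens have already been produced by some scanner. The token shapes ({ type, name, text }) and the function name buildTree are hypothetical, and the handling of mismatched end tags is a placeholder rather than real error recovery.

```javascript
// Builds a tree from a flat token list using an explicit stack of open
// elements; no function here calls itself.
function buildTree(tokens) {
  const root = { name: "#root", children: [] };
  const stack = [root]; // last entry is the element currently open
  for (const t of tokens) {
    const current = stack[stack.length - 1];
    if (t.type === "start") {
      const node = { name: t.name, children: [] };
      current.children.push(node);
      stack.push(node);
    } else if (t.type === "end") {
      // Pop only on a matching end tag; a real parser needs recovery rules here.
      if (stack.length > 1 && current.name === t.name) stack.pop();
    } else {
      current.children.push({ text: t.text });
    }
  }
  return root;
}

const tree = buildTree([
  { type: "start", name: "p" },
  { type: "text", text: "Hello " },
  { type: "start", name: "b" },
  { type: "text", text: "world" },
  { type: "end", name: "b" },
  { type: "end", name: "p" },
]);
```

The stack array plays the role that the call stack would play in a recursive version, and it can be discarded once the tree is built.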
Sherm Pendley - 28 Aug 2008 14:59 GMT
>>> How would you parse HTML more efficiently than by using recursive
>>> parsing?
[quoted text clipped - 10 lines]
> But none of that is any more efficient than doing it recursively, it's
> just one way of trying to separate things.

It's not faster, but I'd say it's more memory-efficient. Instead of a
deep call stack + your data tree, you have just the tree. And it's
easier for a lot of programmers to understand - for some reason, a lot
of people have trouble with recursion.

sherm--

Signature

My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

Ben C - 28 Aug 2008 17:30 GMT
>>>> How would you parse HTML more efficiently than by using recursive
>>>> parsing?
[quoted text clipped - 13 lines]
> It's not faster, but I'd say it's more memory-efficient. Instead of a
> deep call stack + your data tree, you have just the tree.

Typically yes. But if you use recursion (or your own stack) you can
trade the memory of parent pointers in the tree (which stays allocated
for as long as the tree persists) for stack memory (which is given back
to the system as soon as the tree is built).

In practice though you are unlikely not to need those parent pointers
later anyway.

> And it's easier for a lot of programmers to understand - for some
> reason, a lot of people have trouble with recursion.

Recursion can make programs easier to understand in a sort of "divide
and conquer" way. But sometimes it does make them harder to understand,
and it can make profiling difficult.

In this case the Expat way of doing things is fairly nice.
Neredbojias - 28 Aug 2008 02:15 GMT
> I said we can "imagine" that the browser recursively enters its HTML
> parser. I'm not talking about particular implementations, although I see
> no reason why they wouldn't use recursion here.

From the Neredbojias dictionary:

Recursion - The proximate deployment of more than one swear word, any of
which is not phraseologically related to the others.

Iteration - Improper or excessive use of a pronoun.

Hope that clears this up.

Signature

Neredbojias
http://www.neredbojias.net/
Great Sights and Sounds
http://adult.neredbojias.net/ (adult)

Jukka K. Korpela - 28 Aug 2008 05:21 GMT
> I said we can "imagine" that the browser recursively enters its HTML
> parser.

There's no reason to imagine anything more complex than I described.

> I'm not sure what you mean by "interpreting" HTML data.

Processing it by some semantic rules, such as the rule that <script> element
content is script code that needs to be passed to a script interpreter. This
is something that can only be performed after the element has been parsed.

> The basic
> operation here is to build a DOM tree out of HTML.

That's irrelevant. The point is that the HTML markup _has been parsed_, and
then you start doing something else. If you will then start parsing HTML
again, it ain't no recursion. It's just another instance of parsing.

>> Parsing HTML could itself be recursive (i.e., a parser routine might
>> call itself), and that would be natural in a sense since HTML is
>> defined recursively. But tag soup slurpers don't do that
>
> Who cares about tag soup slurpers or knows what the hell they do?

The innerHTML construct is all about tag slurpers, existing browsers, not
ideal browsers as defined in specifications.

> How would you parse HTML more efficiently than by using recursive
> parsing?

Browsers have done that for years. You just look at tags and turn them to
actions. You see <strong>, you start bolding. You see </strong>, you turn
bolding off. There are browser features that resemble structural processing,
and newer browsers might even be good at it, but in fact structural
processing can be performed by using explicit stacks, instead of the
implicit stacking involved in recursion.

I could write a nonrecursive HTML parser for you, but then I would have
to... charge you for it.

Yucca
Ben C - 28 Aug 2008 07:55 GMT
>> I said we can "imagine" that the browser recursively enters its HTML
>> parser.
>
> There's no reason to imagine anything more complex than I described.

What you're describing is more complex than what I described, in my
imagination at least.

>> I'm not sure what you mean by "interpreting" HTML data.
>
> Processing it by some semantic rules, such as the rule that <script> element
> content is script code that needs to be passed to a script interpreter. This
> is something that can only be performed after the element has been parsed.

OK, but the <script> element has to be interpreted before elements after
it in the source are.

>> The basic
>> operation here is to build a DOM tree out of HTML.
>
> That's irrelevant. The point is that the HTML markup _has been parsed_, and
> then you start doing something else. If you will then start parsing HTML
> again, it ain't no recursion. It's just another instance of parsing.

You're presupposing an unnecessarily complicated implementation. You're
saying the program looks something like this:

data = parse(html)

define process(data):
   blah blah which may involve calling parse

while html still present in data:
   process(data)

Well I wouldn't write it like that.

>>> Parsing HTML could itself be recursive (i.e., a parser routine might
>>> call itself), and that would be natural in a sense since HTML is
[quoted text clipped - 4 lines]
> The innerHTML construct is all about tag slurpers, existing browsers, not
> ideal browsers as defined in specifications.

Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari
implement it and they are not tag slurpers.

>> How would you parse HTML more efficiently than by using recursive
>> parsing?
>
> Browsers have done that for years. You just look at tags and turn them to
> actions. You see <strong>, you start bolding. You see </strong>, you turn
> bolding off.

That isn't how the current generation of browsers work. They need a tree
to match CSS selectors to and to apply DOM methods to, and they produce
a tree even out of invalid markup (by bodging it around in various
ways-- different bodging rules for different elements).

> There are browser features that resemble structural processing,
> and newer browsers might even be good at it, but in fact structural
> processing can be performed by using explicit stacks, instead of the
> implicit stacking involved in recursion.

Yes everyone knows that. But it's normal when describing an algorithm to
say it is "recursive" even if when you come to implement it you avoid
actually writing a function that calls itself.

> I could write a nonrecursive HTML parser for you, but then I would have
> to... charge you for it.

I didn't ask if you could, I asked why you thought it would be more
efficient.

Go ahead and write it but I will only pay for it if I can't write a
recursive one that's just as efficient.
Jukka K. Korpela - 28 Aug 2008 08:12 GMT
>> Processing it by some semantic rules, such as the rule that <script>
>> element content is script code that needs to be passed to a script
[quoted text clipped - 3 lines]
> OK, but the <script> element has to be interpreted before elements
> after it in the source are.

Not at all. Actually, it need not be interpreted at all. Browsers may well
ignore the content of <script> elements, and they often do, but they still
need to _parse_ them (if not for anything else, in order to recognize the
end of the element).

>> That's irrelevant. The point is that the HTML markup _has been
>> parsed_, and then you start doing something else. If you will then
>> start parsing HTML again, it ain't no recursion. It's just another
>> instance of parsing.
>
> You're presupposing an unnecessarily complicated implementation.

No, I'm just describing what happens conceptually. A parser is a parser even
if integrated into a grotesquely large program.

> You're saying the program looks something like this:

No, I'm not saying anything about timing, such as processing some part of an
HTML document while the rest is still being parsed. Running a parser and a
script interpreter in parallel does not imply that if the script interpreter
invokes another instance of the parser, it would be some kind of recursion.

So you _are_ confusing recursion with iteration, or actually mere new
invocation - as many people do.

> Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari
> implement it and they are not tag slurpers.

They slurp tags more than you'd think. Check out whether they are still
sensitive to the presence or absence of the "optional" </p> tag as regards
styling. Last time I checked, I was disappointed.

> Yes everyone knows that. But it's normal when describing an algorithm
> to say it is "recursive" even if when you come to implement it you
> avoid actually writing a function that calls itself.

There's no recursive algorithm involved in the handling of innerHTML.

Yucca
Ben C - 28 Aug 2008 09:09 GMT
>>> Processing it by some semantic rules, such as the rule that <script>
>>> element content is script code that needs to be passed to a script
[quoted text clipped - 8 lines]
> need to _parse_ them (if not for anything else, in order to recognize the
> end of the element).

I meant browsers that don't ignore the content of <script> elements, as
was obvious from the context.

Consider this markup:

   <div>
       <div id="foo">
           <p>world</p>
       </div>
       <script type="text/javascript">
           document.getElementById("foo").innerHTML = "<span>hello</span>";
           document.getElementById("bar").innerHTML = "<span>hello</span>";
       </script>
       <div id="bar">
           <p>world</p>
       </div>
   </div>

In browsers that support Javascript and innerHTML, there is a reason why
the first call to getElementById succeeds but the second fails.

A browser that "parsed" div#bar before it interpreted the script would
risk getting this wrong.

Conceptually it's easier to think of the script as being interpreted as
soon as it is encountered.
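
A toy model of that ordering, runnable outside a browser. Everything in it is an assumption for illustration: parseInOrder, the item shapes, and the stand-in getElementById are invented here and only mimic the behaviour described above. Elements are registered as they are met, and a script runs the moment it is met.

```javascript
// Toy model: process items in source order, registering ids as elements
// are created and running each script as soon as it is encountered.
function parseInOrder(items) {
  const byId = new Map();
  const log = [];
  const getElementById = (id) => (byId.has(id) ? byId.get(id) : null);
  for (const item of items) {
    if (item.kind === "element") {
      byId.set(item.id, { id: item.id, innerHTML: item.html });
    } else if (item.kind === "script") {
      item.run(getElementById, log);
    }
  }
  return { byId, log };
}

const result = parseInOrder([
  { kind: "element", id: "foo", html: "<p>world</p>" },
  {
    kind: "script",
    run: (getElementById, log) => {
      for (const id of ["foo", "bar"]) {
        const el = getElementById(id);
        if (el) el.innerHTML = "<span>hello</span>";
        log.push(id + (el ? " found" : " not found"));
      }
    },
  },
  { kind: "element", id: "bar", html: "<p>world</p>" },
]);
```

In this model "bar" does not exist yet when the script runs, so only "foo" gets its content replaced.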

>>> That's irrelevant. The point is that the HTML markup _has been
>>> parsed_, and then you start doing something else. If you will then
[quoted text clipped - 4 lines]
>
> No, I'm just describing what happens conceptually.

That's also all I was doing. If you have a point I'm not seeing it.

> A parser is a parser even
> if integrated into a grotesquely large program.
[quoted text clipped - 5 lines]
> script interpreter in parallel does not imply that if the script interpreter
> invokes another instance of the parser, it would be some kind of recursion.

My pseudocode didn't describe a parallel program.

> So you _are_ confusing recursion with iteration, or actually mere new
> invocation - as many people do.

You actually wrote something once about why all attempts at
communication are doomed.

I don't know if "jumping to the conclusion that the other person doesn't
know what he's talking about as soon as you don't understand him
immediately" was on the list of reasons but perhaps it should be.

>> Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari
>> implement it and they are not tag slurpers.
>
> They slurp tags more than you'd think. Check out whether they are still
> sensitive to the presence or absence of the "optional" </p> tag as regards
> styling. Last time I checked, I was disappointed.

Do you mean they produced the wrong tree from valid HTML? Or that they
produced different trees from invalid HTML depending on whether </p> was
present or not?

In other words:

   <p>Hello <b>world </p>
   <p>foo bar</b></p>

might have produced something different from:

   <p>Hello <b>world
   <p>foo bar</b>

But both are invalid, so anything goes.

I would be interested if you have an example of Firefox/Opera/Konqueror
consistently producing the wrong tree from valid HTML.
 