innerHTML problem in IE6
Kiran Makam - 26 Aug 2008 12:58 GMT I am setting the content of a div dynamically using innerHTML property. If the content contains an ampersand, text after the ampersand is disappearing in IE6. It works properly in Firefox.
This is my code:
----------------
<body>
<div id='div1'></div>
<script>
var div = document.getElementById('div1');
div.innerHTML = "A&B";
</script>
</body>
---------------
IE6 renders the content of div1 as 'A'.
Firefox renders the content properly as 'A&B'.
If there is a space after the ampersand, IE6 renders it properly. So I think that IE is treating whatever follows the ampersand as an HTML entity (like &nbsp;).
Is this a bug in IE6? Is there any workaround for this?
Thanks Kiran Makam
Jonathan N. Little - 26 Aug 2008 13:35 GMT > I am setting the content of a div dynamically using innerHTML > property. If the content contains an ampersand, text after the [quoted text clipped - 9 lines] > div.innerHTML = "A&B"; > </script> Try:
div.innerHTML = "A&amp;B";
 Signature Take care, Jonathan ------------------- LITTLE WORKS STUDIO http://www.LittleWorksStudio.com
Lars Eighner - 26 Aug 2008 13:38 GMT In our last episode, <6ddeff6e-6171-4ab5-bc81-eb4b4a822d6d@w1g2000prk.googlegroups.com>, the lovely and talented Kiran Makam broadcast on alt.html:
> I am setting the content of a div dynamically using innerHTML > property. If the content contains an ampersand, text after the > ampersand is disappearing in IE6. It works properly in Firefox.
> This is my code: > ---------------- ><body>
><div id='div1'></div> ><script> > var div = document.getElementById('div1'); > div.innerHTML = "A&B"; ></script>
></body> > ---------------
> IE6 renders the content of div1 as 'A' > Firefox renders the content properly as 'A&B'
> If there is a space after ampersand, IE6 renders it properly. So I > think that IE is assuming anything after ampersand as an HTML entity > (like &nbsp;).
> Is this a bug in IE6? No, it is a bug in your markup.
& should always be &amp;
The browser is entitled to suppose any string starting with & is an attempt at a character entity. It may be that FF has a better error correction ability, but you can't blame a browser for how it handles errors.
> Is there any workaround for this? Yes. Enter & as &amp;
 Signature Lars Eighner <http://larseighner.com/> usenet@larseighner.com War on Terrorism: History a Mystery "He's busy making history, but doesn't look back at his own, or the world's.... Bush would rather look forward than backward." --_Newsweek_
Harlan Messinger - 26 Aug 2008 15:58 GMT > In our last episode, > <6ddeff6e-6171-4ab5-bc81-eb4b4a822d6d@w1g2000prk.googlegroups.com>, [quoted text clipped - 35 lines] > > Yes. Enter & as &amp;
To clarify for the original poster: this isn't a workaround, it's the proper way to escape the ampersand in HTML when it's being used as a literal instead of in its special role as first character in an entity code.
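When the text comes from a variable rather than a literal, the same escaping can be done in script. Here is a minimal sketch. `escapeHtml` is a hypothetical helper, not a browser built-in, and it assumes the result is used as HTML text or as a quoted attribute value.

```javascript
// Hypothetical helper, not a browser API: escape the characters that are
// special in HTML text and in quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so the "&" of "&lt;" etc. is not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// In a page (assumes the div from the original post exists):
//   div.innerHTML = escapeHtml("A&B");   // markup "A&amp;B", displayed as A&B
// If no markup is needed at all, a text node bypasses HTML parsing:
//   div.appendChild(document.createTextNode("A&B"));
```

The text-node route sidesteps the question of how a particular browser parses an innerHTML string.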
Jukka K. Korpela - 26 Aug 2008 17:37 GMT >> This is my code: As so often, a URL would have been needed, even for an apparently trivial piece of code. Experienced authors know this, and others should just believe it. :-)
>> <script> >> var div = document.getElementById('div1'); >> div.innerHTML = "A&B"; The markup is invalid due to lack of required type="..." attribute, but this is really just a formality. More importantly, we don't know whether this is supposed to be HTML or XHTML and how it has been served.
>> IE6 renders the content of div1 as 'A' >> Firefox renders the content properly as 'A&B' [quoted text clipped - 6 lines] > > No, it is a bug in your markup. Whether the markup is correct depends on whether this is HTML or XHTML. In HTML, the content model of <script> is CDATA, which means that entity references are not recognized, so "&B" means just the character "&" followed by the character "B". In XHTML, the content model is #PCDATA, in which case...
> & should always be &amp; ... or something equivalent.
> The browser is entitled to suppose any string starting with & is an > attempt at a character entity. No, not in HTML when inside <script> (or <style>). Otherwise, it is _required_ to treat "&" as potentially starting an entity reference or a character reference. Error processing rules are then different for different situations and flavors of HTML. In HTML 4.01, "&B" must be parsed as an entity reference, but since no such entity has been defined, we're in the error processing area, and treating "&" as a data character is conventional in browsers in such cases. In XHTML, "&B", when not followed by a semicolon (possibly after some name characters) is a well-formedness violation and XML processors should simply report an error and refuse to display the document at all.
Note: There are no grounds for assuming &B to be a "character entity" in any flavor of HTML. The pseudo-term "character entity" is, at best, shorthand for "entity reference that happens to evaluate to a one-character string". The entity reference &B does not evaluate to anything; it is undefined.
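Here is a toy decoder for character data that makes the two error policies described above concrete. It is only a sketch under simplifying assumptions:

- the entity table has four entries;
- numeric character references are not handled;
- the semicolon is always required, although HTML 4.01 (following SGML) allows it to be omitted in some cases.

`decodeCharData` and its `mode` argument are hypothetical names.

```javascript
// Toy entity table (illustrative only); real HTML 4.01 defines many more,
// and numeric references such as "&#38;" are ignored here.
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"' };

// Hypothetical decoder.
// mode "html": conventional browser fallback, an unrecognized "&..." is data.
// mode "xml":  a well-formedness error, so stop instead of guessing.
function decodeCharData(text, mode) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] !== "&") {
      out += text[i];
      i++;
      continue;
    }
    // In this toy, a complete reference is "&name;".
    const m = /^&([A-Za-z][A-Za-z0-9]*);/.exec(text.slice(i));
    if (m && Object.prototype.hasOwnProperty.call(ENTITIES, m[1])) {
      out += ENTITIES[m[1]];
      i += m[0].length;
    } else if (mode === "xml") {
      throw new SyntaxError("not well-formed: bad reference at offset " + i);
    } else {
      out += "&"; // treat the ampersand as a data character
      i++;
    }
  }
  return out;
}

// decodeCharData("A&amp;B", "html") → "A&B"
// decodeCharData("A&B", "html")     → "A&B"  (fallback)
// decodeCharData("A&B", "xml")      → throws SyntaxError
```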
Confused? Fine. Just outsource the script, avoiding the mess!
> It may be that FF has a better error > correction ability, but > you can't blame a browser for how it handles errors. Oh we can, both on practical grounds and, in some cases, on formal grounds.
>> Is there any workaround for this? > > Yes. Enter & as &amp;
The best way to solve the problem is to put the script in an external file and reference it via <script type="text/javascript" src="foo.js"></script>
Yucca
Ben C - 26 Aug 2008 19:02 GMT [...]
>>> var div = document.getElementById('div1'); >>> div.innerHTML = "A&B"; [...]
>> & should always be &amp; > [quoted text clipped - 6 lines] > _required_ to treat "&" as potentially starting an entity reference or a > character reference. In markup like:
<script> div.innerHTML = "A&B"; </script>
"A&B" is certainly inside a script element. But is it also inside a <div> element?
We can imagine that the browser recursively enters its HTML parser to evaluate innerHTML, so its HTML parser will see something like this:
<div> A&B </div>
where it would be required to treat & as potentially starting an entity reference or a character reference as you say.
Jukka K. Korpela - 27 Aug 2008 18:17 GMT > In markup like: > [quoted text clipped - 4 lines] > "A&B" is certainly inside a script element. But is it also inside a > <div> element? A tricky question, which I tried to avoid. In terms of HTML specifications, it is not inside any <div> element, since whatever happens via scripting is outside the scope of those specs.
As http://msdn.microsoft.com/en-us/library/ms533897.aspx says so eloquently, "There is no public standard that applies to this [innerHTML] property". That vendor-specific page says: "When the innerHTML property is set, the given string completely replaces the existing content of the object. If the string contains HTML tags, the string is parsed and formatted as it is placed into the document."
I think it is fair to read this so that they promise to parse the content as HTML. This in turn means that &B would be detected as an undefined entity reference. If, on the other hand, A&amp;B were used, then it would be first parsed (as <script> element content, assuming HTML 4.01 rules) as such, and the second parsing would recognize &amp; as a reference that denotes the & character. But they don't say exactly how the parsing works.
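The two parsing passes described here can be modeled in a few lines. This is a toy built on assumptions, not a documented browser algorithm.

- Pass 1 is represented by the JavaScript string reaching the engine unchanged, because HTML 4.01 <script> content is CDATA.
- `fragmentText` is a hypothetical stand-in for the innerHTML parser.
- Its `onUnknown` option is also hypothetical. "literal" is the conventional fallback. "drop" merely happens to reproduce the 'A' result reported for IE6 and says nothing about IE6's real behavior.

```javascript
// Hypothetical stand-in for the innerHTML fragment parser's handling of
// references in text. The ";" is optional here, so "&B" is seen as an
// (undefined) reference named "B".
function fragmentText(markup, onUnknown) {
  const known = { amp: "&", lt: "<", gt: ">", quot: '"' };
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);?/g, (ref, name) => {
    if (Object.prototype.hasOwnProperty.call(known, name)) return known[name];
    return onUnknown === "drop" ? "" : ref; // hypothetical error policies
  });
}

// Pass 1: inside an HTML 4.01 <script>, the source is CDATA, so these
// JavaScript strings are exactly what appears between the quotes.
const unescaped = "A&B";
const escaped = "A&amp;B";

// Pass 2: assigning to innerHTML parses the string as HTML.
fragmentText(unescaped, "literal"); // "A&B"
fragmentText(unescaped, "drop");    // "A"
fragmentText(escaped, "literal");   // "A&B"
fragmentText(escaped, "drop");      // "A&B"
fragmentText("A& B", "drop");       // "A& B", since a space cannot start an entity name
```

Only the escaped form gives the same result under both policies, which is the practical argument for writing &amp; in the string.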
> We can imagine that the browser recursively enters its HTML parser to > evaluate innerHTML, Why, oh why, do people speak of recursion when they mean iteration?
> so its HTML parser will see something like this: > [quoted text clipped - 4 lines] > where it would be required to treat & as potentially starting an > entity reference or a character reference as you say. No, I don't think it sees any <div> tag. It is parsing the string "A&B", and I agree with the idea that here "&" should be treated as a special character, here starting an entity reference. But the widely accepted fallback for undefined entity references is to treat them "literally", i.e. as if e.g. "&B" were really defined to mean "&amp;B".
Yucca
Ben C - 27 Aug 2008 22:03 GMT >> In markup like: >> [quoted text clipped - 18 lines] > I think it is fair to read this so that they promise to parse the content as > HTML. Yes, although they may be saying it's only parsed as HTML if it contains HTML tags. In other words, 'innerHTML = "A&B"' would be treated differently from 'innerHTML = "<span>A&B</span>"'. But that would be silly even for Microsoft.
> This in turn means that &B would be detected as undefined entity > reference. If, on the other hand, A&B were used, then it would be first [quoted text clipped - 6 lines] > > Why, oh why, do people speak of recursion when they mean iteration? I don't know why people would do that. I only speak of recursion when I mean recursion.
>> so its HTML parser will see something like this: >> [quoted text clipped - 6 lines] > > No, I don't think it sees any <div> tag. Sorry if that was unclear. The <div> tag doesn't come from the innerHTML, but we are rewriting the contents of a <div>, that's all I meant.
Jukka K. Korpela - 27 Aug 2008 22:42 GMT >>> We can imagine that the browser recursively enters its HTML parser >>> to evaluate innerHTML, [quoted text clipped - 3 lines] > I don't know why people would do that. I only speak of recursion when > I mean recursion. Which recursion is involved when a browser, having parsed HTML data, starts interpreting it, finds some client-side script code, executes it, then starts parsing the data that results from the execution? (In this case, as so often, the generation of that data is trivial, since it is a string constant, but that's irrelevant here.) Answer: There is no recursion involved. The parsing was finished long before the script execution started, at the logical level at least, and then new parsing was initiated. It's really not even iteration, except in a trivial sense.
Parsing HTML could itself be recursive (i.e., a parser routine might call itself), and that would be natural in a sense since HTML is defined recursively. But tag soup slurpers don't do that, and generally, recursive parsing is less efficient than non-recursive parsing.
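For comparison, here is a sketch of a parser routine that calls itself. It works over a toy tag language with no attributes, no comments and no optional end tags, where every tag is closed, so it says nothing about error recovery in real markup. `tokenize`, `parseChildren` and `parseRecursive` are illustrative names.

```javascript
// Split the toy language into start tags, end tags and text runs.
// A stray "<" that does not begin a tag is silently skipped by this regex.
function tokenize(markup) {
  const tokens = [];
  const re = /<(\/?)([a-z][a-z0-9]*)>|([^<]+)/gi;
  let m;
  while ((m = re.exec(markup)) !== null) {
    if (m[3] !== undefined) tokens.push({ type: "text", text: m[3] });
    else tokens.push({ type: m[1] ? "end" : "start", name: m[2].toLowerCase() });
  }
  return tokens;
}

// Recursive descent: each nested element is handled by a nested call, so the
// JavaScript call stack plays the role of the stack of open elements.
function parseChildren(tokens, pos, closingName) {
  const children = [];
  while (pos < tokens.length) {
    const t = tokens[pos];
    if (t.type === "text") {
      children.push({ text: t.text });
      pos++;
    } else if (t.type === "start") {
      const inner = parseChildren(tokens, pos + 1, t.name); // the routine calls itself
      children.push({ name: t.name, children: inner.children });
      pos = inner.pos;
    } else if (t.name === closingName) {
      return { children, pos: pos + 1 };
    } else {
      throw new SyntaxError("unexpected </" + t.name + ">");
    }
  }
  if (closingName !== null) throw new SyntaxError("unclosed <" + closingName + ">");
  return { children, pos };
}

function parseRecursive(markup) {
  return parseChildren(tokenize(markup), 0, null).children;
}
```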
Yucca
Ben C - 27 Aug 2008 23:32 GMT >>>> We can imagine that the browser recursively enters its HTML parser >>>> to evaluate innerHTML, [quoted text clipped - 12 lines] > at the logical level at least, and then new parsing was initiated. It's > really not even iteration, except in a trivial sense. I said we can "imagine" that the browser recursively enters its HTML parser. I'm not talking about particular implementations, although I see no reason why they wouldn't use recursion here.
I'm not sure what you mean by "interpreting" HTML data. The basic operation here is to build a DOM tree out of HTML. That might as well be done in one step ("parsing") without another intermediate stage.
If that is the case, then script elements need to execute as they are found, which could be naturally implemented by re-entering the parser.
The actual code would probably be mutually recursive-- the parser calls the script interpreter which calls the parser.
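The mutual-recursion picture can be made concrete with a toy. Everything here is an assumption for illustration:

- tokens arrive pre-split;
- elements are empty boxes with an id;
- a script token carries a JavaScript function standing in for script source;
- `parse`, `interpret` and `setInnerHTML` are hypothetical names.

The `maxDepth` counter shows that the fragment parse runs while the document-level parse() call is still on the call stack. Whether that deserves the word "recursion" is exactly what the thread argues about.

```javascript
let depth = 0;    // current number of active parse() calls
let maxDepth = 0; // deepest nesting observed

// Hypothetical parser: consumes tokens in source order, appending to `target`.
function parse(tokens, doc, target) {
  depth++;
  maxDepth = Math.max(maxDepth, depth);
  try {
    for (const t of tokens) {
      if (t.type === "element") {
        const el = { id: t.id, children: [] };
        doc.byId.set(t.id, el);
        target.children.push(el);
      } else if (t.type === "text") {
        target.children.push({ text: t.text });
      } else if (t.type === "script") {
        interpret(t.code, doc); // the parser calls the interpreter...
      }
    }
  } finally {
    depth--;
  }
}

// Hypothetical stand-in for the script engine, exposing a tiny DOM-like API.
function interpret(code, doc) {
  code({
    getElementById: (id) => doc.byId.get(id) || null,
    setInnerHTML(el, fragmentTokens) {
      el.children = [];
      parse(fragmentTokens, doc, el); // ...which calls the parser again
    },
  });
}

const doc = { byId: new Map(), root: { children: [] } };
parse(
  [
    { type: "element", id: "foo" },
    {
      type: "script",
      code: (api) =>
        api.setInnerHTML(api.getElementById("foo"), [{ type: "text", text: "hello" }]),
    },
  ],
  doc,
  doc.root
);
// maxDepth is now 2, and doc.byId.get("foo").children is [{ text: "hello" }]
```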
> Parsing HTML could itself be recursive (i.e., a parser routine might call > itself), and that would be natural in a sense since HTML is defined > recursively. But tag soup slurpers don't do that Who cares about tag soup slurpers or knows what the hell they do?
> and generally, recursive parsing is less efficient than non-recursive > parsing. How would you parse HTML more efficiently than by using recursive parsing?
Sherm Pendley - 28 Aug 2008 00:12 GMT > How would you parse HTML more efficiently than by using recursive > parsing? I don't know about other parsers, but Expat uses callback functions that it calls when it finds an opening tag, closing tag, text node, comment, etc. It's event driven, not recursive - the parser function never calls itself.
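The event-driven style described here can be sketched as follows. It only imitates the shape of a callback parser such as Expat, which is a C library with its own API. `scan` and its `start`/`end`/`text` handlers are a hypothetical JavaScript analogue over a toy tag language (no attributes, no comments).

```javascript
// Hypothetical scanner: one flat loop reports each tag or text run to
// caller-supplied handlers. It never calls itself; any notion of nesting
// is left to the handlers.
function scan(markup, handlers) {
  const re = /<(\/?)([a-z][a-z0-9]*)>|([^<]+)/gi;
  let m;
  while ((m = re.exec(markup)) !== null) {
    if (m[3] !== undefined) handlers.text(m[3]);
    else if (m[1]) handlers.end(m[2].toLowerCase());
    else handlers.start(m[2].toLowerCase());
  }
}

// Usage: log the events instead of building a tree.
const events = [];
scan("<p>Hello <b>world</b></p>", {
  start: (name) => events.push("start " + name),
  end: (name) => events.push("end " + name),
  text: (t) => events.push("text " + t),
});
// events: ["start p", "text Hello ", "start b", "text world", "end b", "end p"]
```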
sherm--
 Signature My blog: http://shermspace.blogspot.com Cocoa programming in Perl: http://camelbones.sourceforge.net
Ben C - 28 Aug 2008 07:42 GMT >> How would you parse HTML more efficiently than by using recursive >> parsing? [quoted text clipped - 3 lines] > comment, etc. It's event driven, not recursive - the parser function > never calls itself. Indeed, and neither does the tree builder you implement in the callbacks-- it has to either maintain an explicit stack or use parent pointers on the tree nodes it is generating.
But none of that is any more efficient than doing it recursively, it's just one way of trying to separate things.
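A tree builder driven by such callbacks, keeping the explicit stack mentioned above, might look like this sketch. `createTreeBuilder` is a hypothetical name. Its start/end/text methods have the same shape as the handlers a callback parser would invoke, and here it is driven by hand to stay self-contained.

```javascript
// Hypothetical builder: constructs a tree from start/end/text events using an
// explicit stack of open elements. No function here calls itself.
function createTreeBuilder() {
  const root = { name: "#root", children: [] };
  const stack = [root]; // the last entry is the element currently open
  const top = () => stack[stack.length - 1];
  return {
    start(name) {
      const el = { name, children: [] };
      top().children.push(el);
      stack.push(el);
    },
    end(name) {
      if (top() === root || top().name !== name) {
        throw new SyntaxError("unexpected </" + name + ">");
      }
      stack.pop(); // close the current element
    },
    text(text) {
      top().children.push({ text });
    },
    finish() {
      if (top() !== root) throw new SyntaxError("unclosed <" + top().name + ">");
      return root.children;
    },
  };
}

// Feed it the events a scanner would report for "<p>a<b>c</b></p>".
const builder = createTreeBuilder();
builder.start("p");
builder.text("a");
builder.start("b");
builder.text("c");
builder.end("b");
builder.end("p");
const tree = builder.finish();
```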
Sherm Pendley - 28 Aug 2008 14:59 GMT >>> How would you parse HTML more efficiently than by using recursive >>> parsing? [quoted text clipped - 10 lines] > But none of that is any more efficient than doing it recursively, it's > just one way of trying to separate things. It's not faster, but I'd say it's more memory-efficient. Instead of a deep call stack + your data tree, you have just the tree. And it's easier for a lot of programmers to understand - for some reason, a lot of people have trouble with recursion.
sherm--
Ben C - 28 Aug 2008 17:30 GMT >>>> How would you parse HTML more efficiently than by using recursive >>>> parsing? [quoted text clipped - 13 lines] > It's not faster, but I'd say it's more memory-efficient. Instead of a > deep call stack + your data tree, you have just the tree. Typically yes. But if you use recursion (or your own stack) you can trade the memory of parent pointers in the tree (which stays allocated for as long as the tree persists) for stack memory (which is given back to the system as soon as the tree is built).
In practice, though, you are likely to need those parent pointers later anyway.
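The other side of that trade-off is to store a parent pointer in every node instead of keeping a separate stack. The pointer that closes an element during construction is the same one later code can use to walk upward, for example to test a descendant selector such as "p b". The builder and `ancestorNames` are hypothetical names, and the event stream is fed by hand.

```javascript
// Hypothetical builder with no stack: each node records its parent, and an end
// tag moves `current` up. The parent pointers stay in the tree afterwards.
function createParentPointerBuilder() {
  const root = { name: "#root", parent: null, children: [] };
  let current = root;
  return {
    start(name) {
      const el = { name, parent: current, children: [] };
      current.children.push(el);
      current = el;
    },
    end(name) {
      if (current === root || current.name !== name) {
        throw new SyntaxError("unexpected </" + name + ">");
      }
      current = current.parent;
    },
    text(text) {
      current.children.push({ text, parent: current });
    },
    finish() {
      if (current !== root) throw new SyntaxError("unclosed <" + current.name + ">");
      return root;
    },
  };
}

// Later use of the same pointers: list the ancestors of a node.
function ancestorNames(node) {
  const names = [];
  for (let n = node.parent; n !== null; n = n.parent) names.push(n.name);
  return names;
}

const pb = createParentPointerBuilder();
pb.start("p");
pb.text("a");
pb.start("b");
pb.text("c");
pb.end("b");
pb.end("p");
const doc = pb.finish();
const textC = doc.children[0].children[1].children[0];
// ancestorNames(textC) → ["b", "p", "#root"]
```

The tree is now cyclic, so for example JSON.stringify(doc) throws a TypeError. That is part of the price of keeping the pointers.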
> And it's easier for a lot of programmers to understand - for some > reason, a lot of people have trouble with recursion. Recursion can make programs easier to understand in a sort of "divide and conquer" way. But sometimes it does make them harder to understand, and it can make profiling difficult.
In this case the Expat way of doing things is fairly nice.
Neredbojias - 28 Aug 2008 02:15 GMT > I said we can "imagine" that the browser recursively enters its HTML > parser. I'm not talking about particular implementations, although I see > no reason why they wouldn't use recursion here. From the Neredbojias dictionary:
Recursion - The proximate deployment of more than one swear word, any of which is not phraseologically related to the others.
Iteration - Improper or excessive use of a pronoun.
Hope that clears this up.
 Signature Neredbojias http://www.neredbojias.net/ Great Sights and Sounds http://adult.neredbojias.net/ (adult)
Jukka K. Korpela - 28 Aug 2008 05:21 GMT > I said we can "imagine" that the browser recursively enters its HTML > parser. There's no reason to imagine anything more complex than I described.
> I'm not sure what you mean by "interpreting" HTML data. Processing it by some semantic rules, such as the rule that <script> element content is script code that needs to be passed to a script interpreter. This is something that can only be performed after the element has been parsed.
> The basic > operation here is to build a DOM tree out of HTML. That's irrelevant. The point is that the HTML markup _has been parsed_, and then you start doing something else. If you will then start parsing HTML again, it ain't no recursion. It's just another instance of parsing.
>> Parsing HTML could itself be recursive (i.e., a parser routine might >> call itself), and that would be natural in a sense since HTML is >> defined recursively. But tag soup slurpers don't do that > > Who cares about tag soup slurpers or knows what the hell they do? The innerHTML construct is all about tag slurpers, existing browsers, not ideal browsers as defined in specifications.
> How would you parse HTML more efficiently than by using recursive > parsing? Browsers have done that for years. You just look at tags and turn them to actions. You see <strong>, you start bolding. You see </strong>, you turn bolding off. There are browser features that resemble structural processing, and newer browsers might even be good at it, but in fact structural processing can be performed by using explicit stacks, instead of the implicit stacking involved in recursion.
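The older "tags as actions" model needs no tree at all. This sketch uses a hypothetical `renderRuns` over a toy tag syntax and tracks only bold. Bold is a depth counter rather than a flag, so text stays bold until the outermost bold tag closes. Misnested markup does not trouble it, because nothing has to nest.

```javascript
// Hypothetical tree-less renderer: each tag just switches formatting state.
function renderRuns(markup) {
  const re = /<(\/?)([a-z][a-z0-9]*)>|([^<]+)/gi;
  const runs = [];
  let bold = 0; // how many <strong>/<b> tags are currently open
  let m;
  while ((m = re.exec(markup)) !== null) {
    if (m[3] !== undefined) {
      runs.push({ text: m[3], bold: bold > 0 });
    } else {
      const name = m[2].toLowerCase();
      if (name === "strong" || name === "b") {
        bold = Math.max(0, bold + (m[1] ? -1 : 1)); // ignore stray end tags
      }
      // every other tag is ignored in this sketch
    }
  }
  return runs;
}

// renderRuns("Hello <strong>big</strong> world") →
//   [{ text: "Hello ", bold: false }, { text: "big", bold: true }, { text: " world", bold: false }]
```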
I could write a nonrecursive HTML parser for you, but then I would have to... charge you for it.
Yucca
Ben C - 28 Aug 2008 07:55 GMT >> I said we can "imagine" that the browser recursively enters its HTML >> parser. > > There's no reason to imagine anything more complex than I described. What you're describing is more complex than what I described, in my imagination at least.
>> I'm not sure what you mean by "interpreting" HTML data. > > Processing it by some semantic rules, such as the rule that <script> element > content is script code that needs to be passed to a script interpreter. This > is something that can only be performed after the element has been parsed. OK, but the <script> element has to be interpreted before elements after it in the source are.
>> The basic >> operation here is to build a DOM tree out of HTML. > > That's irrelevant. The point is that the HTML markup _has been parsed_, and > then you start doing something else. If you will then start parsing HTML > again, it ain't no recursion. It's just another instance of parsing. You're presupposing an unnecessarily complicated implementation. You're saying the program looks something like this:
data = parse(html)
define process(data): blah blah which may involve calling parse
until no more html is present in data: process(data)
Well I wouldn't write it like that.
>>> Parsing HTML could itself be recursive (i.e., a parser routine might >>> call itself), and that would be natural in a sense since HTML is [quoted text clipped - 4 lines] > The innerHTML construct is all about tag slurpers, existing browsers, not > ideal browsers as defined in specifications. Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari implement it and they are not tag slurpers.
>> How would you parse HTML more efficiently than by using recursive >> parsing? > > Browsers have done that for years. You just look at tags and turn them to > actions. You see <strong>, you start bolding. You see </strong>, you turn > bolding off. That isn't how the current generation of browsers work. They need a tree to match CSS selectors to and to apply DOM methods to, and they produce a tree even out of invalid markup (by bodging it around in various ways-- different bodging rules for different elements).
> There are browser features that resemble structural processing, > and newer browsers might even be good at it, but in fact structural > processing can be performed by using explicit stacks, instead of the > implicit stacking involved in recursion. Yes everyone knows that. But it's normal when describing an algorithm to say it is "recursive" even if when you come to implement it you avoid actually writing a function that calls itself.
> I could write a nonrecursive HTML parser for you, but then I would have > to... charge you for it. I didn't ask if you could, I asked why you thought it would be more efficient.
Go ahead and write it but I will only pay for it if I can't write a recursive one that's just as efficient.
Jukka K. Korpela - 28 Aug 2008 08:12 GMT >> Processing it by some semantic rules, such as the rule that <script> >> element content is script code that needs to be passed to a script [quoted text clipped - 3 lines] > OK, but the <script> element has to be interpreted before elements > after it in the source are. Not at all. Actually, it need not be interpreted at all. Browsers may well ignore the content of <script> elements, and they often do, but they still need to _parse_ them (if not for anything else, in order to recognize the end of the element).
>> That's irrelevant. The point is that the HTML markup _has been >> parsed_, and then you start doing something else. If you will then >> start parsing HTML again, it ain't no recursion. It's just another >> instance of parsing. > > You're presupposing an unnecessarily complicated implementation. No, I'm just describing what happens conceptually. A parser is a parser even if integrated into a grotesquely large program.
> You're saying the program looks something like this: No, I'm not saying anything about timing, such as processing some part of an HTML document while the rest is still being parsed. Running a parser and a script interpreter in parallel does not imply that if the script interpreter invokes another instance of the parser, it would be some kind of recursion.
So you _are_ confusing recursion with iteration, or actually mere new invocation - as many people do.
> Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari > implement it and they are not tag slurpers. They slurp tags more than you'd think. Check out whether they are still sensitive to the presence or absence of the "optional" </p> tag as regards to styling. Last time I checked, I was disappointed.
> Yes everyone knows that. But it's normal when describing an algorithm > to say it is "recursive" even if when you come to implement it you > avoid actually writing a function that calls itself. There's no recursive algorithm involved in the handling of innerHTML.
Yucca
Ben C - 28 Aug 2008 09:09 GMT >>> Processing it by some semantic rules, such as the rule that <script> >>> element content is script code that needs to be passed to a script [quoted text clipped - 8 lines] > need to _parse_ them (if not for anything else, in order to recognize the > end of the element). I meant browsers that don't ignore the content of <script> elements, as was obvious from the context.
Consider this markup:
<div>
  <div id="foo">
    <p>world</p>
  </div>
  <script type="text/javascript">
    document.getElementById("foo").innerHTML = "<span>hello</span>";
    document.getElementById("bar").innerHTML = "<span>hello</span>";
  </script>
  <div id="bar">
    <p>world</p>
  </div>
</div>
In browsers that support Javascript and innerHTML, there is a reason why the first call to getElementById succeeds but the second fails.
A browser that "parsed" div#bar before it interpreted the script would risk getting this wrong.
Conceptually it's easier to think of the script as being interpreted as soon as it is encountered.
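Here is a toy of the ordering point behind that example. It assumes, as argued here, that a script runs as soon as the builder reaches it. Elements are registered under their id as they are created, so a lookup from the script finds div#foo but not div#bar. The token format and `buildInOrder` are hypothetical. The script is modeled as a JavaScript function rather than source text, and innerHTML is just a stored string.

```javascript
// Hypothetical builder: creates elements in source order and runs each script
// the moment it is reached.
function buildInOrder(tokens) {
  const byId = new Map();
  const scriptResults = [];
  for (const t of tokens) {
    if (t.type === "element") {
      byId.set(t.id, { id: t.id, innerHTML: t.innerHTML });
    } else if (t.type === "script") {
      // The script can only see elements created before this point.
      scriptResults.push(t.run((id) => byId.get(id) || null));
    }
  }
  return { byId, scriptResults };
}

const result = buildInOrder([
  { type: "element", id: "foo", innerHTML: "<p>world</p>" },
  {
    type: "script",
    run: (getElementById) => {
      const foo = getElementById("foo");
      const bar = getElementById("bar");
      if (foo) foo.innerHTML = "<span>hello</span>";
      // In a browser the second assignment would throw, because bar is null here.
      if (bar) bar.innerHTML = "<span>hello</span>";
      return { fooFound: foo !== null, barFound: bar !== null };
    },
  },
  { type: "element", id: "bar", innerHTML: "<p>world</p>" },
]);
// result.scriptResults[0] → { fooFound: true, barFound: false }
// result.byId.get("bar").innerHTML is still "<p>world</p>"
```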
>>> That's irrelevant. The point is that the HTML markup _has been >>> parsed_, and then you start doing something else. If you will then [quoted text clipped - 4 lines] > > No, I'm just describing what happens conceptually. That's also all I was doing. If you have a point I'm not seeing it.
> A parser is a parser even > if integrated into a grotesquely large program. [quoted text clipped - 5 lines] > script interpreter in parallel does not imply that if the script interpreter > invokes another instance of the parser, it would be some kind of recursion. My pseudocode didn't describe a parallel program.
> So you _are_ confusing recursion with iteration, or actually mere new > invocation - as many people do. You actually wrote something once about why all attempts at communication are doomed.
I don't know if "jumping to the conclusion that the other person doesn't know what he's talking about as soon as you don't understand him immediately" was on the list of reasons, but perhaps it should be.
>> Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari >> implement it and they are not tag slurpers. > > They slurp tags more than you'd think. Check out whether they are still > sensitive to the presence or absence of the "optional" </p> tag as regards > to styling. Last time I checked, I was disappointed. Do you mean they produced the wrong tree from valid HTML? Or that they produced different trees from invalid HTML depending on whether </p> was present or not?
In other words:
<p>Hello <b>world </p> <p>foo bar</b></p>
might have produced something different from:
<p>Hello <b>world <p>foo bar</b>
But both are invalid, so anything goes.
I would be interested if you have an example of Firefox/Opera/Konqueror consistently producing the wrong tree from valid HTML.