process XMLHTTP response returning poorly formed html

dandiebolt@yahoo.com - 25 Feb 2005 19:08 GMT
Using xmlhttp I am accessing a document from the web that is not xml
and is in fact not even proper html even though it is supposed to be
(unbalanced tags). Here is the type of code I am using:

var url = "http://www.domain.com/page.html";
var xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
xmlhttp.open("GET", url, false);   // false = synchronous request
xmlhttp.send();
var xmlResp = xmlhttp.responseXML; // usable only when the response is well-formed XML

I want to create an array that holds the contents of every paragraph
<P> tag. The paragraphs are well formed with both opening and closing
tags <P> and </P> but the document as a whole is not valid xml or html.
How can I process the so-called xml response to extract the contents of
each paragraph into a unique element of the array? I would like to use
DOM methods to extract the paragraphs rather than parse up the text
with xmlhttp.responseText. Any help would be appreciated.
strout - 25 Feb 2005 22:25 GMT
It will be easy to parse by regular expression.
dandiebolt@yahoo.com - 26 Feb 2005 00:30 GMT
> It will be easy to parse by regular expression.

How? The only thing I know about the document is that the information I
need is between successive <P> and </P> tags. I was reluctant to
use a regexp because I don't know any more about the structure than what
is described, and I don't control the format of the page source.
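
A minimal sketch of the regular-expression approach suggested above, offered as an illustration rather than as any poster's actual code. It assumes only what the post states: every paragraph has both <P> and </P>, and paragraphs are not nested. The function name extractParagraphs and the sample string are made up for the example.

```javascript
// Hypothetical helper (not from the thread): collect the inner markup of
// every <P>...</P> pair found in a string of HTML.
function extractParagraphs(html) {
  // "<p", optional attributes, ">", then the shortest run of any characters
  // (including newlines) up to the next "</p>".
  // "g" finds every match; "i" accepts <P> as well as <p>.
  // "\b" keeps tags such as <pre> or <param> from matching.
  var pattern = /<p\b[^>]*>([\s\S]*?)<\/p\s*>/gi;
  var paragraphs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    paragraphs.push(match[1]);
  }
  return paragraphs;
}

// Sample input (invented): unbalanced tags around well-formed paragraphs.
var sample = "<html><body><div><P>first</P><b><p class=\"x\">second\nline</p></body>";
var result = extractParagraphs(sample);
```

With the request from the first post, the call would be extractParagraphs(xmlhttp.responseText). The pattern skips any paragraph whose closing tag is omitted, so it is only as reliable as the assumption that every <P> is closed.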
Tim Williams - 26 Feb 2005 04:24 GMT
Why not automate IE to load the page and then grab your content once
IE has done its job? Presumably that will "fix" any irregularities
in the source.

It depends on where you want to do this, but it is an option. Or just use
responseText and MSHTML?

Tim.

> Using xmlhttp I am accessing a document from the web that is not xml
> and is in fact not even proper html even though it is supposed to be
[quoted text clipped - 17 lines]
> DOM methods to extract the paragraphs rather than parse up the text
> with xmlhttp.responseText. Any help would be appreciated.
Jim Ley - 26 Feb 2005 09:31 GMT
>Using xmlhttp I am accessing a document from the web that is not xml
>and is in fact not even proper html even though it is supposed to be
>(unbalanced tags).

There's nothing inherent in unbalanced tags that would make something
invalid HTML: HTML explicitly allows many closing tags to be
omitted.

>How can I process the so-called xml response to extract the contents of
>each paragraph into a unique element of the array? I would like to use
>DOM methods to extract the paragraphs rather than parse up the text
>with xmlhttp.responseText. Any help would be appreciated.

Using the browser to parse the responseText as HTML will give you a
DOM of it; there's no other reasonable solution.

Jim.
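
One way the suggestion above might look in practice, as a sketch under stated assumptions rather than the poster's own code: the response text is assigned to the innerHTML of an element that is never attached to the page, so the browser's lenient HTML parser builds a DOM that can then be queried with getElementsByTagName. The function name paragraphsViaDom and its doc parameter are hypothetical. Markup assigned this way may cause the browser to start loading images it references.

```javascript
// Hypothetical helper (not from the thread). "doc" is the browser's document
// object; it is passed in so the function has no hidden dependencies.
function paragraphsViaDom(html, doc) {
  // A detached element: never added to the page, used only as a parser.
  var container = doc.createElement("div");
  container.innerHTML = html; // the browser's HTML parser runs here
  var nodes = container.getElementsByTagName("p");
  var paragraphs = [];
  for (var i = 0; i < nodes.length; i++) {
    paragraphs.push(nodes[i].innerHTML); // markup inside each paragraph
  }
  return paragraphs;
}

// In a browser page, after the request from the first post has completed:
//   var paragraphs = paragraphsViaDom(xmlhttp.responseText, document);
```

Unlike a regular expression, this relies on the same parser that renders the page, so paragraphs with omitted closing tags are still found.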
Mr.Clean - 28 Feb 2005 15:54 GMT
> >How can I process the so-called xml response to extract the contents of
> >each paragraph into a unique element of the array? I would like to use
[quoted text clipped - 3 lines]
> using the browser to parse the responseText as html will give you a
> DOM of it, there's no other reasonable solution.

You could also do it with just an IHTMLDocument2 implementation; you
don't really need the browser.
 