regex failing

noon - 30 May 2008 13:34 GMT
I'm running an XMLHttpRequest to get the site's source code and then
applying the regex

xhr.responseText.split(/<body[^>]*>((?:.|\n)*)<\/body>/i)[1]

Works for google.com.  Fails on yahoo.com and imdb.com pages (ex:
http://imdb.com/title/tt0482606/ )

Can someone help me tweak this, or give insight as to why it's
failing?  I can't spot it.
Erwin Moller - 30 May 2008 14:45 GMT
noon wrote:
> I'm running an XMLHttpRequest to get the site's source code and then
> applying the regex
[quoted text clipped - 6 lines]
> Can someone help me tweak this, or give insight as to why it's
> failing?  I can't spot it.

Maybe...
You didn't mention what it is you WANT your regex to do.
And you didn't say what 'failing' is. An error? An unexpected result?

Regards,
Erwin Moller
noon - 30 May 2008 15:06 GMT
That information might help, huh.  I want it to strip everything
in between the body tags. The error was that I was receiving either
nothing or the entire HTML, including the head tags etc. I seem to have
since got it working with this code:

xhr.responseText.split(/<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/gi)[1];

Though improvement suggestions are welcome
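
A plausible explanation for the difference between the two patterns, offered as a sketch rather than a confirmed diagnosis: in JavaScript, "." does not match the line terminators \n, \r, \u2028 and \u2029, so (?:.|\n)* stops at the first \r on pages served with CRLF line endings. The sample string and variable names below are made up for illustration; [\s\S] is a common shorthand for "any character at all".

```javascript
// Hypothetical sample input with Windows-style (CRLF) line endings.
var crlfPage = '<html><head></head>\r\n<body class="x">\r\nHello\r\n</body></html>';

// The first pattern from the thread: "." or "\n" only.
var original = /<body[^>]*>((?:.|\n)*)<\/body>/i;

// [\s\S] matches any character, including every line terminator.
var anyChar = /<body[^>]*>([\s\S]*)<\/body>/i;

// "\r" is matched by neither "." nor "\n", so the first pattern cannot
// reach </body> and does not match this input.
var originalMatches = original.test(crlfPage);

// The [\s\S] version matches; group 1 holds the text between the tags.
var anyCharResult = anyChar.exec(crlfPage);
var bodyContent = anyCharResult ? anyCharResult[1] : null;
```

When the pattern does not match, split() returns a one-element array holding the whole input, so indexing it with [1] yields undefined.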
Thomas 'PointedEars' Lahn - 30 May 2008 21:26 GMT
> That information might help, huh.  I want it to strip everything
> in between the body tags. The error was that I was receiving either
[quoted text clipped - 3 lines]
> xhr.responseText.split(/<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/gi)[1];

With

 foo<body>...</body>bar

this would give you

 ...

But you wanted to *strip* everything *in between*, _not_ split.
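
To make the split / match / strip distinction concrete, here is a minimal sketch using the foo<body>...</body>bar input from above. The pattern and variable names are illustrative assumptions, not code from the thread.

```javascript
var sample = 'foo<body>...</body>bar';
var bodyPattern = /<body[^>]*>([\s\S]*)<\/body>/i;

// split(): cuts the string at the match; the capturing group is
// inserted into the result array between the two outer pieces.
var parts = sample.split(bodyPattern);

// match(): returns the match array; index 1 is the first capturing group.
var m = sample.match(bodyPattern);
var inner = m ? m[1] : null;

// replace(): actually removes (strips) the body element from the string.
var stripped = sample.replace(bodyPattern, '');
```

Here parts is ['foo', '...', 'bar'], inner is '...', and stripped is 'foobar'. Some older browsers are reported to omit captured groups from split() results, which is one more reason to prefer match() or exec() for extraction.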

> Though improvement suggestions are welcome

 ... = xhr.responseText.match(/<body(|\s+[^>]*)>((.|\s)*)<\/body>/i)[2];

is largely equivalent to your code in this case and more efficient.
However, IMHO that is still _not_ stripping everything in between but
*matching* everything in between, which is probably what you meant to say.
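
A hedged sketch of the match-based approach, wrapped in a hypothetical helper named extractBody (not part of the thread). It assumes two small variations: the optional-attributes group is non-capturing, so the body content is always capture group 1, and the result is checked for null because exec() returns null when nothing matches.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null if the input has no body element.
function extractBody(html) {
  // (?:\s[^>]*)?  optional attributes; requiring whitespace after "body"
  //               keeps tags such as <bodyguard> from matching.
  // ([\s\S]*)     any characters, including all line terminators.
  var match = /<body(?:\s[^>]*)?>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

var withBody = extractBody('<html><body id="a">\r\nhi</body></html>');
var withoutBody = extractBody('<p>no body here</p>');
```

With xhr.responseText as the argument, a null result signals that the pattern did not match, instead of throwing when the result is indexed.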

Note that (X)HTML is a context-sensitive language which cannot be parsed
with one regular expression (defining a regular language) alone.  In your
case it should work because a Valid (X)HTML document MUST NOT have more
than one `body' element.

PointedEars

var bugRiddenCrashPronePieceOfJunk = (
   navigator.userAgent.indexOf('MSIE 5') != -1
   && navigator.userAgent.indexOf('Mac') != -1
)  // Plone, register_function.js:16

 