hi andy: thanks for the advice. fortunately, search engine inclusion
is not important to me. I am just trying to add the ability to
"search=find" (OCR-equivalent) text in image files of articles. The
user will see the true, authentic representation of the text in the
image itself.
This seems to be harder to do than I thought. I can't even get a
simple "proof of concept" to work. sigh...
regards,
/iaw
> I am just trying to add the ability to
> "search=find" (OCR-equivalent) text in image-files of articles.
Seems like the images' alt text is what you need, or maybe even look at
using longdesc.
You'll have to process the HTML a bit though, either before publishing
(XSLT might be handy) or with some client-side JS.
ivowel@gmail.com wrote:
> [...] I am just trying to add the ability to
> "search=find" (OCR-equivalent) text in image-files of articles. The
> user will see the true authentic representation of the text in the
> image itself.
A possible start would be:
* create a div of fixed width and height (in pixels)
* set the image as background of that div
* split the recognized text into words, or possibly into lines, create one
span per unit and position it inside the div
(using lines instead of words as the unit allows the search to span
several words)
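A minimal sketch of this first approach, written as Python that emits the HTML. It assumes the OCR step yields a list of records (the hypothetical `WordBox`: text plus pixel x, y, width, height) in image coordinates; `build_overlay` is an illustrative name, not an existing library function. Making the span text transparent is one way to leave the scan visible underneath, and the font size is only guessed from the box height. Whether a browser's find-in-page highlight is drawn visibly over transparent text varies between browsers, so that part needs testing.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class WordBox:
    """Hypothetical OCR result: one recognized word (or line) and its pixel box."""
    text: str
    x: int
    y: int
    width: int
    height: int


def build_overlay(image_url: str, img_width: int, img_height: int,
                  boxes: list[WordBox]) -> str:
    """Return an HTML fragment: a fixed-size div showing the scan as its
    background, with one absolutely positioned, invisible span per OCR unit."""
    spans = []
    for b in boxes:
        # The font size is only approximated from the box height; the real
        # typeface of the scan is unknown.
        spans.append(
            f'<span style="position:absolute; left:{b.x}px; top:{b.y}px; '
            f'width:{b.width}px; height:{b.height}px; '
            f'font-size:{b.height}px; line-height:{b.height}px; '
            f'white-space:nowrap; overflow:hidden; color:transparent">'
            f"{escape(b.text)}</span>"
        )
    div_style = (
        f"position:relative; width:{img_width}px; height:{img_height}px; "
        f"background-image:url('{image_url}'); background-repeat:no-repeat"
    )
    # escape(..., quote=True) keeps quotes in the style from ending the attribute.
    return f'<div style="{escape(div_style, quote=True)}">' + "".join(spans) + "</div>"
```

The returned fragment would simply be pasted into the article page in place of a plain img element.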
But then a simple proof of concept shows that a lot of information is
missing: do you have any way to obtain the font used in the image? How
can you make sure the end user has the same font installed, or allows
text to be as small as the text in the image?
I would go another route:
* create a div as above
* put the image in it
* write a simple FORM that lets the user enter a search pattern
* write the server-side code that, given that search pattern,
finds all the matches and, for each match, computes a rectangle
framing the matching text, then creates a span inside the outer
div (simple border, no background, to let the text shine through)
* when this is working, add some Ajax "magic" for interactivity.
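A sketch of the server-side step, assuming (hypothetically) that the OCR output is a list of word boxes in reading order. It lowercases words and strips surrounding punctuation, looks for runs of consecutive words equal to the search terms, and returns one bounding rectangle per run, then emits a bordered, background-free span for each. `find_match_rects` and `highlight_spans` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR result: one recognized word and its pixel box."""
    text: str
    x: int
    y: int
    width: int
    height: int


def normalize(word: str) -> str:
    """Lowercase and strip surrounding punctuation so 'Times,' matches 'times'."""
    return word.strip(".,;:!?\"'()[]").lower()


def find_match_rects(boxes: list[WordBox], pattern: str) -> list[tuple[int, int, int, int]]:
    """Return (left, top, width, height) for every run of consecutive OCR words
    equal to the whitespace-separated search terms."""
    terms = [normalize(t) for t in pattern.split()]
    if not terms:
        return []
    words = [normalize(b.text) for b in boxes]
    n = len(terms)
    rects = []
    for i in range(len(boxes) - n + 1):
        if words[i:i + n] == terms:
            run = boxes[i:i + n]
            left = min(b.x for b in run)
            top = min(b.y for b in run)
            right = max(b.x + b.width for b in run)
            bottom = max(b.y + b.height for b in run)
            # A match wrapping onto the next line gives one tall rectangle;
            # splitting it per line is left out of this sketch.
            rects.append((left, top, right - left, bottom - top))
    return rects


def highlight_spans(rects: list[tuple[int, int, int, int]]) -> str:
    """One bordered span per match (no background, so the scan shows through),
    meant to be placed inside the relatively positioned outer div."""
    return "".join(
        f'<span style="position:absolute; left:{x}px; top:{y}px; width:{w}px; height:{h}px; '
        f'border:2px solid red"></span>'
        for x, y, w, h in rects
    )
```

With words "The New York Times," on one line, a search for "new york" would produce a single rectangle spanning the "New" and "York" boxes.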
Good luck!

Daniel Déchelotte
http://yo.dan.free.fr/
ivowel@gmail.com - 29 Dec 2006 23:44 GMT
hi daniel:
the best analogy may be the desire to display a NYTimes front page
image, and allow a user to search for text on this page.
your first suggestion is also what I was thinking of, but the remaining
issue is how I can make the foreground words invisible while still
getting at least a colored box when a word is selected.
I value speed and convenience [using ordinary browser mechanisms] over
perfect alignment accuracy or even multi-word searches. I can also
assume that my users have the basic Microsoft web fonts, so at least on a
word-by-word basis, I can make sure that I won't be off too much. In
fact, I would even be happy to display an arrow where the searched-for
word starts: the point is to allow the reader to find a word on a long
image page.
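If the OCR output only gives a bounding box per printed line, a rough start position for such an arrow can be interpolated from the character index, on the assumption that all characters are about equally wide (in line with trading accuracy for simplicity). This sketch uses a hypothetical `LineBox` record and illustrative function names; it places a U+25B6 arrow just left of the estimated start of the first hit on each line. It matches substrings, so "new" would also hit "news".

```python
from typing import NamedTuple, Optional


class LineBox(NamedTuple):
    """Hypothetical OCR output: the text of one printed line and its pixel box."""
    text: str
    x: int
    y: int
    width: int
    height: int


def estimate_word_start(line: LineBox, word: str) -> Optional[tuple[int, int]]:
    """Estimate the top-left pixel of the first occurrence of `word` in `line`.

    Only the line's box is known, so the x offset is interpolated from the
    character index, i.e. every character is assumed to be equally wide.
    Matching is a case-insensitive substring search, not whole-word.
    """
    if not word or not line.text:
        return None
    index = line.text.lower().find(word.lower())
    if index < 0:
        return None
    return (line.x + round(line.width * index / len(line.text)), line.y)


def arrow_markers(lines: list[LineBox], word: str, size: int = 16) -> str:
    """One absolutely positioned arrow just left of each line's first hit."""
    markers = []
    for line in lines:
        pos = estimate_word_start(line, word)
        if pos is None:
            continue
        x, y = pos
        markers.append(
            f'<span style="position:absolute; left:{max(x - size, 0)}px; top:{y}px; '
            f'font-size:{size}px; color:red">&#9654;</span>'
        )
    return "".join(markers)
```

For a 170-pixel-wide line "All the news that" starting at x = 100, "news" begins at character 8 of 17, so the estimated start is 100 + 170 * 8 / 17 = 180.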
I have never used Ajax. maybe I need to learn it to do something like
this, though.
Andy: alt tags on images are not searchable, and even if they
were, they would not show the reader where a particular word starts.
regards,
/iaw