
Webmaster Forum / HTML, CSS, Scripts / CSS / June 2008




Choice of format for web publishing

Haines Brown - 19 Jun 2008 14:00 GMT
I'd like to raise an issue that is somewhat outside the focus of this
newsgroup although related, which is the ideal document format for web
publication.

In terms of likely future trends, what is the ideal format for the
publication of technical documents (by "technical", I mean documents
that are paginated and have a bibliography and footnotes)?

The reason for my question is that I've become involved in a project to
develop an on-line journal in the humanities. The publisher intends to
solicit manuscripts in Word and convert them to PDF (using Chicago
Style Sheet, which is another matter).

My instinct is to suggest to him that PDF has disadvantages (including
accessibility and not being machine readable), and that he consider
(X)HTML instead. I'd like to know reasons for choosing one over the
other.

Signature


      Haines Brown, KB1GRM

Garmt de Vries - 19 Jun 2008 14:15 GMT
On Jun 19, 3:00 pm, Haines Brown <bro...@teufel.hartford-hwp.com>
wrote:
> The reason for my question is that I've become involved in a project to
> develop an on-line journal in the humanities. The publisher intends to
[quoted text clipped - 5 lines]
> (X)HTML instead. I'd like to know reasons for choosing one over the
> other.

I'm involved in an online journal that uses the Open Journal System.
Articles are stored on the server as OpenOffice documents, and HTML or
PDF versions are generated on the fly, according to the user's choice.
This method seems to offer the best of both: PDF allows you to
download and store just one file, and makes for better printing; HTML
is (I believe) more accessible, as you say.
David Stone - 19 Jun 2008 18:52 GMT
In article
<26aa761b-d10a-476d-842b-05b442315822@25g2000hsx.googlegroups.com>,

> On Jun 19, 3:00 pm, Haines Brown <bro...@teufel.hartford-hwp.com>
> wrote:
[quoted text clipped - 14 lines]
> download and store just one file, and makes for better printing; HTML
> is (I believe) more accessible, as you say.

I notice that a number of both commercial and professional organization
publishers do the same thing; browsing the journal on the web site
gives you volume, issue, and article listings with short abstracts.
Each entry then allows you to select html or pdf format.

The one thing I would strongly suggest you DON'T do, is try to
emulate the traditional two-column printed page format, whether
presenting the article as html or pdf.  There are very few things
more annoying than having to continuously scroll up and down
individual pages, especially when reading the text at the bottom
of one column that relates to a table or figure at the top of the
adjacent one!  I always end up printing such articles out in order
to read them...
Haines Brown - 19 Jun 2008 19:22 GMT
What I was hoping to see was someone suggest TEI/XML, with an
appropriate schema and style sheet, but since it was not mentioned, I
wonder if there's a problem going in that direction.
Signature


      Haines Brown, KB1GRM

Michael Wojcik - 20 Jun 2008 12:54 GMT
> What I was hoping to see was someone suggest TEI/XML, with an
> appropriate schema and style sheet, but since it was not mentioned, I
> wonder if there's a problem going in that direction.

Ask and ye shall receive - Andy Dingley just suggested TEI, though he
proposed (and I concur) that you store internally in TEI or DocBook
but serve HTML. I'm not sure whether that's what you were proposing
above, or whether you were thinking of serving XML + schema + style
sheet to user agents. The latter won't be handled properly by many
UAs, and will confuse non-technical users if they try to save content,
etc.

You might want to take a look at /Kairos/ [1]. They've been in the
online-humanities-journal biz for a while (about 12 years), so they
have a lot of experience with what works well for their authors and
readers.

They publish most content as HTML, but they also run multimedia
articles and the like. One factor to consider with an online
humanities journal is that authors will want to use the affordances of
the readers' systems, and that means accommodating things like video
and interactive applets. (Obviously not all readers will be able to,
or choose to, view that kind of content; but enough will.)

You can get some nice innovative work if you allow for things like
Karl Stolley's "Lo-Fi Manifesto" [2], for example.

(The current /Kairos/ design is ... aging, shall we say; but they have
a much nicer redesign coming out with the next issue that is prettier,
standards-compliant, and amply supplied with features that degrade
gracefully, like hCard markup on author information.)

[1] http://kairos.technorhetoric.net/
[2]
http://kairos.technorhetoric.net/12.3/binder.html?topoi/stolley/index.htm

Signature

Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University

Haines Brown - 21 Jun 2008 12:01 GMT
> Andy Dingley just suggested TEI, though he proposed (and I concur)
> that you store internally in TEI or DocBook but serve HTML.  I'm not
> sure whether that's what you were proposing above, or whether you were
> thinking of serving XML + schema + style sheet to user agents. The
> latter won't be handled properly by many UAs, and will confuse
> non-technical users if they try to save content, etc.

Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
"UA" I presume you mean the average browser (IE). However, I didn't
realize that browsers have problems with XML + public schema +
stylesheet. Would you be more specific about the kinds of problems and
the likelihood of their occurring? And why would a non-technical user
be confused? Wouldn't the user see on his browser the same thing if the
document were instead served as HTML?

I'm unclear about just what is implied by "store internally". Do you
mean placing TEI or DocBook documents in a database on the server and
then processing them for display as HTML/XHTML for the user?

> You might want to take a look at /Kairos/ [1]. They've been in the
> online-humanities-journal biz for a while (about 12 years), so they
> have a lot of experience with what works well for their authors and
> readers.

I don't understand why you offered this as an example, and probably miss
your point. The document I looked at from the Kairos site is just some
JavaScript that defines a framework and inserts into it an old-fashioned
(using tables for formatting, for example) document. If I were to do this I'd
use SSI, XHTML, and CSS, but in any case, at least for the document I
viewed, the internally stored document is only HTML, not TEI or DocBook.

Signature


      Haines Brown, KB1GRM

Michael Wojcik - 24 Jun 2008 17:22 GMT
>> Andy Dingley just suggested TEI, though he proposed (and I concur)
>> that you store internally in TEI or DocBook but serve HTML.  I'm not
[quoted text clipped - 5 lines]
> Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
> "UA" I presume you mean the average browser (IE).

I mean user agent: whatever is processing the data you send. (That's
standard terminology in the W3C specs, the HTTP RFCs, etc.) Doesn't
particularly matter to me whether it's "average" or exotic, though of
course you may decide not to worry about supporting less-common UAs.
(Do you expect people to read your journal on their iPhones? On other
mobile devices? On browsers embedded in appliances?)

> However, I didn't
> realize that browsers have problems with XML + public schema +
> stylesheet. Would you be more specific about the kinds of problems and
> the likelihood of their occurring?

I was over-hasty with that comment. I assumed that there were many UAs
that won't handle XML + schema + style sheet. (IE, for example,
doesn't even handle XHTML properly.) And I believe I've read more
substantial claims to that effect. But I realized when I read your
response that I had not actually verified that suspicion.

Personally, if I were building this application, I'd be reluctant to
serve XML + schema + style sheet, simply because I'd rather not do the
interoperability testing (or limit my content to a handful of common
UAs), when it's not at all difficult to serve HTML 4.01 Strict instead.

> And why would a non-technical user
> be confused? Wouldn't the user see on his browser the same thing if the
> document were instead served as HTML?

Suppose you are a non-technical user. Suppose you are viewing a page
of this journal and decide to save a copy. You know, from prior
experience, that a saved web page is a file with an extension like
".htm" and possibly a folder containing some images and the like.
What's a ".xml" file? What's a ".xsd" file?

And whether the user sees "the same thing" is hard to say. Browsers
have built-in styles for HTML, which they will fall back on in various
circumstances. Some users have user style sheets, which select HTML
elements.

> I'm unclear about just what is implied by "store internally". Do you
> mean placing TEI or DocBook documents in a database on the server and
> then processing them for display as HTML/XHTML for the user?

You have to store content, and you have to serve it. Sometimes content
is static - that is, the server simply sends the stored representation
(often just by reading a file from a local filesystem). Often it's
dynamic: server-side includes, ASP and JSP and PHP and other sorts of
scriptable pages, CGI scripts, server extensions that execute
application code, etc.

I don't care (well, for these purposes) how you store content. I'm
suggesting that you store it in a form that works well for your
production toolchain and for the applications that use it - so TEI or
DocBook might well be a good choice. And I'm suggesting that you serve
it in a form that the UA is likely to handle well; I'd suggest HTML
4.01 Strict with external CSS 2.1 style sheets.

To go from the stored representation to the presentation
representation, XSLT looks like the obvious mechanism. The server
could do that on the fly, if it has sufficient resources; or it could
cache the generated HTML; or the HTML could be generated whenever the
XML is updated and served statically.
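
A rough sketch of that store-as-XML, serve-as-HTML step, for illustration
only. It is not XSLT: Python's standard library has no XSLT processor, so
ElementTree stands in for the transform here. The element names (article,
title, para), the sample document, and the journal.css file name are all
invented; real TEI or DocBook would need a much fuller mapping. The
lru_cache decorator plays the part of "cache the generated HTML".

```python
import xml.etree.ElementTree as ET
from functools import lru_cache
from html import escape

DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')

# Hypothetical, heavily simplified article markup (not real TEI or DocBook).
SAMPLE = """<article>
  <title>Format &amp; Function</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</article>"""


@lru_cache(maxsize=128)
def render_html(source_xml: str) -> str:
    """Turn stored XML into an HTML 4.01 Strict page, cached per source string."""
    root = ET.fromstring(source_xml)
    title = escape(root.findtext("title", default="Untitled"))
    paras = "\n".join(
        f"<p>{escape(p.text or '')}</p>" for p in root.iter("para")
    )
    return (
        f"{DOCTYPE}\n<html>\n<head>\n<title>{title}</title>\n"
        # Presentation lives in an external style sheet (file name assumed).
        '<link rel="stylesheet" type="text/css" href="journal.css">\n'
        f"</head>\n<body>\n<h1>{title}</h1>\n{paras}\n</body>\n</html>\n"
    )
```

Calling render_html(SAMPLE) twice runs the transform once; the second call
is answered from the cache. Generating the HTML whenever the XML is updated
would amount to calling the same function from the save step instead.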

>> You might want to take a look at /Kairos/ [1]. They've been in the
>> online-humanities-journal biz for a while (about 12 years), so they
[quoted text clipped - 5 lines]
> JavaScript that defines a framework and inserts into it an old-fashioned
> (using tables for formatting, for example) document.

I was unclear. I didn't mean /Kairos/ as an example of an implementation.

I suggested it because it's an online humanities journal of long
standing, relatively wide readership, and good reputation; because
they've had to deal with all of these issues, and these are the
compromises they arrived at; and because it demonstrates my other
point, which is that people writing for an online journal will want to
be able to use all the possible facilities. That means people will
want to submit articles with multimedia components, so you need to
think about how you'll handle non-text materials in your toolchain.
People will want to submit articles with dynamic content and scripting
- even applications, with any luck - so you'll need to handle that.

> If I were to do this I'd
> use SSI, XHTML, and CSS, but in any case, at least for the document I
> viewed, the internally stored document is only HTML, not TEI or DocBook.

How can you tell how the document is stored internally? What you see
is what the server sent you. You don't know what it did in producing
that content.

Signature

Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University

Haines Brown - 25 Jun 2008 01:20 GMT
Michael, thank you for your wise comments and clarifications.

My translation of your "user agent" into the instance of browsers I see
was too restrictive. You are right; I do have to consider iPhones,
etc. Yes, that would be exotic today, but tomorrow perhaps less so. On
the other hand, there is perhaps reason to assume that "exotic" UAs will
at the same time learn to deal with XML.

I know that IE does not do HTML well, and I have to make the appropriate
accommodations. I'm too ignorant about the matter to say whether it would
do any worse with XML.

Your point about the user possibly defining the presentation style is
understood. That suggests serving the pages with a clear separation of
format and marked-up content, which can be either XML or HTML.

Thanks again, you were very helpful.
Signature


      Haines Brown, KB1GRM

Andy Dingley - 20 Jun 2008 11:06 GMT
> The reason for my question is that I've become involved in a project to
> develop an on-line journal in the humanities. The publisher intends to
[quoted text clipped - 5 lines]
> (X)HTML instead. I'd like to know reasons for choosing one over the
> other.

What do you mean by "publish" here?

By all means offer PDFs as one final format that your CMS can offer to
readers.

Don't _store_ your content as PDFs though. Use something else
(anything!) and generate PDFs on demand (with caching and maybe pre-
generation).
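
One possible shape for "generate on demand, with caching and maybe
pre-generation", as a sketch only. The class name, the method names and
both renderers are made up for this example; the placeholder renderers
just return bytes where a real CMS would call its HTML and PDF generators.

```python
from typing import Callable, Dict, Tuple


class RenditionCache:
    """Generate output formats on demand and keep the result per revision."""

    def __init__(self, renderers: Dict[str, Callable[[str], bytes]]):
        self.renderers = renderers  # format name -> render function
        self._cache: Dict[Tuple[str, int, str], bytes] = {}
        self.render_calls = 0       # counts real renders, for inspection

    def get(self, article_id: str, revision: int, source: str, fmt: str) -> bytes:
        key = (article_id, revision, fmt)
        if key not in self._cache:  # render only on a cache miss
            self._cache[key] = self.renderers[fmt](source)
            self.render_calls += 1
        return self._cache[key]

    def pregenerate(self, article_id: str, revision: int, source: str) -> None:
        """Warm the cache for every known format, e.g. right after an edit."""
        for fmt in self.renderers:
            self.get(article_id, revision, source, fmt)


# Placeholder renderers standing in for real HTML and PDF generators.
def fake_html(source: str) -> bytes:
    return f"<p>{source}</p>".encode("utf-8")


def fake_pdf(source: str) -> bytes:
    return b"%PDF-placeholder " + source.encode("utf-8")


cache = RenditionCache({"html": fake_html, "pdf": fake_pdf})
```

Keying the cache on the revision number means an edited article gets fresh
renditions without any explicit invalidation; old entries simply stop being
requested.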

As a storage format, XHTML is one choice, as would be DocBook or
TEI.

I wouldn't use HTML, although I'd publish my XHTML to readers as HTML
(for web-design reasons oft discussed hereabouts). The reason for this
is that whatever XML-based format you choose for internal storage,
it's likely to involve lots of namespacing and composition of overall
schema by importing snippets from both DocBook and Dublin Core (etc.
etc.)  You really need the namespace and processing features that XHTML
gives you easily and HTML doesn't. XML tools will be far more useful than
SGML ones.
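
To make the namespace point concrete, here is a small sketch of reading a
document that mixes DocBook elements with Dublin Core metadata. The two
namespace URIs are the published ones; the sample document itself is
invented for illustration and has not been validated against the DocBook
schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "db": "http://docbook.org/ns/docbook",     # DocBook 5 namespace
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core elements namespace
}

# Hypothetical mixed-vocabulary document, made up for this example.
DOC = """<db:article xmlns:db="http://docbook.org/ns/docbook"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <db:info>
    <dc:creator>A. Author</dc:creator>
    <dc:date>2008-06-20</dc:date>
  </db:info>
  <db:title>Sample article</db:title>
  <db:para>Body text.</db:para>
</db:article>"""

root = ET.fromstring(DOC)

# Prefixes in the paths are resolved through NS, so elements from the two
# vocabularies cannot be confused even if their local names coincide.
metadata = {
    "creator": root.findtext("db:info/dc:creator", namespaces=NS),
    "date": root.findtext("db:info/dc:date", namespaces=NS),
}
title = root.findtext("db:title", namespaces=NS)
```

Plain HTML has no equivalent mechanism, which is the practical reason for
keeping the stored form in an XML vocabulary.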

I would favour DocBook over HTML for any "long" document that needs
structure at a scope greater than heading / para. Neither has much
semantic markup, and neither has any advantage in the quality of
their inline markup. DocBook does win out though for section/chapter/
book level structure.
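
As an illustration of what that larger-scale structure buys you, the sketch
below derives an outline from nested book / chapter / section elements.
The element names follow DocBook, but the document is a made-up,
un-namespaced fragment, and the outline function is only an example, not
part of any DocBook tooling.

```python
import xml.etree.ElementTree as ET

# Invented DocBook-like fragment, for illustration only.
BOOK = """<book><title>Journal Volume</title>
  <chapter><title>First Article</title>
    <section><title>Introduction</title>
      <section><title>Background</title></section>
    </section>
    <section><title>Conclusion</title></section>
  </chapter>
</book>"""

STRUCTURAL = {"book", "chapter", "section"}


def outline(elem, depth=0):
    """Return one indented line per structural element, in document order."""
    lines = []
    if elem.tag in STRUCTURAL:
        lines.append("  " * depth + elem.findtext("title", default="(untitled)"))
        for child in elem:
            lines.extend(outline(child, depth + 1))
    return lines


toc = outline(ET.fromstring(BOOK))
```

With flat HTML headings the nesting has to be inferred from heading levels;
here it falls straight out of the element tree.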
 


