XML Publication: Web document publication through XML
by J.M. Vanel, Copyright © J.M. Vanel - 2001
Last update:
Download current version; User documentation (not up-to-date); French version
Back to main page
What is XML Publication?
XML Publication is a set of tools for generating Web pages from (possibly large)
desktop documents or other structured documents, for instance books with
paragraphs, or tabular data. It cuts big documents into Web pages and creates
customizable multiple indexes. All this is done through a repeatable process
in which data is kept separate from presentation and user settings.
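To make the "customizable multi-index" idea concrete, here is a minimal Java sketch (Java because the tools already need a JVM). It assumes each item has already been reduced to a map from rubric name to plain text, and builds one alphabetical keyword index per rubric, leaving out words from a stopword list. XML Publication does this work in XSLT; the class `MultiIndexSketch`, the method `buildIndexes` and the sample data are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: one keyword index per rubric, with stopwords removed.
public class MultiIndexSketch {

    /**
     * @param items     item id -> (rubric name -> rubric text)
     * @param stopwords lower-case words to leave out of every index
     * @return rubric name -> (keyword -> ids of the items containing it)
     */
    static Map<String, SortedMap<String, SortedSet<String>>> buildIndexes(
            Map<String, Map<String, String>> items, Set<String> stopwords) {
        Map<String, SortedMap<String, SortedSet<String>>> indexes = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> item : items.entrySet()) {
            for (Map.Entry<String, String> rubric : item.getValue().entrySet()) {
                SortedMap<String, SortedSet<String>> index =
                        indexes.computeIfAbsent(rubric.getKey(), k -> new TreeMap<>());
                // Lower-case, then split on runs of characters that are not letters or digits.
                String[] words = rubric.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
                for (String word : words) {
                    if (word.isEmpty() || stopwords.contains(word)) {
                        continue;
                    }
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(item.getKey());
                }
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> items = new LinkedHashMap<>();
        items.put("item1", Map.of("title", "Seed saving", "region", "Africa"));
        items.put("item2", Map.of("title", "Saving the soil", "region", "Europe"));
        System.out.println(buildIndexes(items, Set.of("the", "of", "a")));
    }
}
```

Running `main` prints `{region={africa=[item1], europe=[item2]}, title={saving=[item1, item2], seed=[item1], soil=[item2]}}`. Sorted maps keep each index in alphabetical order, ready to be turned into hyperlinked index pages; changing the stopword list is one way of customizing the result.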
It uses cutting-edge XML techniques, particularly XSLT. It is released under the
GNU General Public License.
History
I used these techniques for industrial catalogs at industrySuppliers.com,
a (now defunct) marketplace, last spring. Then I got involved in Seed to Seed
(www.seed2seed.net), a project to publish information collected worldwide
about sustainable agriculture. With a bit of refactoring, XML Publication
resulted from these projects. Currently I work at Information & Document
(idocw.com), where we offer support on XML Publication (and on many other XML
and non-XML techniques). Someday XML Publication will also be applied to the
Worldwide Botanical Knowledge Base.
XMLPublish data flow
[Figure: XMLPublish data flow diagram]
Samples
Sources: a table, a paragraph-structured text, a poorly structured
word-processor file, etc.
Requirements
Free software
Lightweight and easy to maintain: 1000 lines of XSLT and 100 lines of
Makefile / Ant
Uses mature techniques: XSLT, Jakarta Ant, Makefile
Portable: needs only a JVM, a bash shell or Jakarta Ant, and the Saxon XSLT
processor
Separation of content and presentation through HTML templates and CSS
Works with poorly structured documents
Adapts to authors and their desktop tools
Flexibility:
content organisation: tabular, paragraphs, classification,
indexing
presentation: XHTML template for the site wrapper, CSS
various data sources: Web, word processor, spreadsheet, relational database,
XML
Static pages vs dynamic pages
Static pages
faster HTTP responses and lower use of computing resources
easier to deploy: no need for a servlet container (Tomcat,
etc.)
easily indexed by classical search engines (Google, ...)
Dynamic pages
allow instant user customization: page layout, etc.
more coding needed, but freeware is available
less disk space used
allow complex queries through relational or XML databases
allow instant updates of catalog data
the XSLT formatting of XML documents generated by XMLPublish can be
reused in dynamic servers (Cocoon)
in any case, the "fill cart" functionality needs a dynamic server
Information object model
document
item
rubric
keyword
hierarchy (for HTML navigation)
[Figure: UML diagram of the XMLPublish object model, showing Document, Item, Rubric (with a String name) and Keyword, with two "realize" relationships to xsd:element; originally available as SVG and GIF]
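As a rough reading of this object model, the Java records below (Java 16 or later) show how the pieces nest: a document holds items, an item holds named rubrics, and rubric text yields keywords. Modelling the navigation hierarchy as child items is an assumption of this sketch, and all type names are hypothetical, not the project's API.

```java
import java.util.List;

// Hypothetical Java rendering of the information object model.
public class ObjectModelSketch {

    /** A keyword extracted from rubric text. */
    record Keyword(String word) {}

    /** A named part of an item (a table column, a paragraph title, ...). */
    record Rubric(String name, String content, List<Keyword> keywords) {}

    /** A table row or a paragraph; children model the navigation hierarchy (assumption). */
    record Item(String type, List<Rubric> rubrics, List<Item> children) {}

    /** The whole source document. */
    record Document(String title, List<Item> items) {}

    /** Counts items at every level of the hierarchy. */
    static int countItems(List<Item> items) {
        int count = 0;
        for (Item item : items) {
            count += 1 + countItems(item.children());
        }
        return count;
    }

    public static void main(String[] args) {
        Rubric title = new Rubric("title", "Seed saving", List.of(new Keyword("seed")));
        Item paragraph = new Item("paragraph", List.of(), List.of());
        Item chapter = new Item("chapter", List.of(title), List.of(paragraph));
        Document doc = new Document("Sample", List.of(chapter));
        System.out.println(countItems(doc.items()));
    }
}
```

`main` prints `2`: the chapter and its sub-paragraph.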
XML basic structure
Simple two-level structure reflecting the "Information object model" above:
<root>
  <itemType1>
    <rubricType1>any markup ...</rubricType1>
    <rubricType2>any markup ...</rubricType2>
    ... etc
  </itemType1>
  ... other item types
</root>
In practice, the master.xml file comes in two concrete structures:
table with rows
documents with paragraphs and sub-paragraphs
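A minimal sketch of reading this two-level structure with the JDK's standard DOM parser: element children of the root are taken as items, and their element children as rubrics, whatever their tag names. The class `MasterReader`, its `Item` record and the sample `product`/`name`/`price` tags are hypothetical; rubric names are assumed unique within an item, and `getTextContent()` reduces any inline markup in a rubric to plain text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MasterReader {

    /** An item element: its tag name and its rubrics (tag name -> text). */
    record Item(String type, Map<String, String> rubrics) {}

    /** Reads the root/item/rubric structure; assumes unique rubric names per item. */
    static List<Item> readItems(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed master document", e);
        }
        List<Item> items = new ArrayList<>();
        for (Node item = root.getFirstChild(); item != null; item = item.getNextSibling()) {
            if (item.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between items
            }
            Map<String, String> rubrics = new LinkedHashMap<>();
            for (Node rubric = item.getFirstChild(); rubric != null; rubric = rubric.getNextSibling()) {
                if (rubric.getNodeType() == Node.ELEMENT_NODE) {
                    // getTextContent() flattens inline markup inside the rubric to plain text.
                    rubrics.put(rubric.getNodeName(), rubric.getTextContent().trim());
                }
            }
            items.add(new Item(item.getNodeName(), rubrics));
        }
        return items;
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<product><name>Hoe</name><price>12</price></product>"
                + "<product><name>Rake</name></product>"
                + "</root>";
        for (Item item : readItems(xml)) {
            System.out.println(item.type() + " " + item.rubrics());
        }
    }
}
```

`main` prints `product {name=Hoe, price=12}` and `product {name=Rake}`. For a real master.xml file, `DocumentBuilder.parse(File)` can replace the `StringReader` input.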
Presentation settings
keywords & stopwords file
site-specific XHTML wrapper
document-specific header
XSLT customization:
the framework calls user callbacks for file names, labels, etc.
it is easy to add XSLT template rules to customize the formatting of items and
rubrics
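This kind of customization can rely on XSLT import precedence: a user stylesheet that imports the framework stylesheet wins whenever both have a rule for the same node. The sketch below shows the mechanism through the standard JAXP API (which Saxon also implements), run here on the JDK's built-in XSLT 1.0 processor. Both stylesheets, the `base.xsl` name and the in-memory `URIResolver` are hypothetical stand-ins for the project's actual files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CustomizationSketch {

    // Hypothetical framework stylesheet: each rubric of each item becomes a <p>.
    static final String BASE_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/*'><div><xsl:apply-templates select='*/*'/></div></xsl:template>"
            + "<xsl:template match='*'><p><xsl:value-of select='.'/></p></xsl:template>"
            + "</xsl:stylesheet>";

    // Hypothetical user stylesheet: imports the framework and adds one rule.
    // Rules in the importing stylesheet take precedence over imported ones.
    static final String USER_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:import href='base.xsl'/>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='price'><p class='price'><b><xsl:value-of select='.'/></b></p></xsl:template>"
            + "</xsl:stylesheet>";

    static String format(String xml) {
        Map<String, String> library = Map.of("base.xsl", BASE_XSL);
        TransformerFactory factory = TransformerFactory.newInstance();
        // Resolve xsl:import hrefs from the in-memory map instead of the file system;
        // returning null falls back to the processor's default resolution.
        factory.setURIResolver((href, base) -> {
            String text = library.get(href);
            return text == null ? null : new StreamSource(new StringReader(text));
        });
        try {
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(USER_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("<root><product><name>Hoe</name><price>12</price></product></root>"));
    }
}
```

Only the `price` rubric gets the new formatting; every other rubric keeps the framework's default `<p>` rule, and the framework stylesheet itself is left untouched.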
Implementation
Automatic update and chaining with Jakarta Ant (or GNU make).
Moreover, Ant makes the process portable across platforms and file systems.
The build.xml files are designed so that data can come from URLs, not only
from files (alas, Ant is not yet Web-oriented enough, but we are working on
that too).
Modularity
Each functionality in a separate XSLT transform:
semantic markup
creating word lists by rubric
creating hyperlinked indices and (multi-level) TOC
XHTML formatting
The chaining of transforms (the data flow) is not hard-wired in the XSLT
transforms; the Ant build.xml takes care of it.
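Keeping the chain outside the stylesheets means each transform only reads and writes XML. A minimal Java sketch of the same idea: the pipeline is just a list of stylesheets applied in order, each step's output becoming the next step's input. The two stylesheets (`SEMANTIC_STEP`, `FORMAT_STEP`) are hypothetical stand-ins for the semantic-markup and XHTML-formatting transforms, and in-memory strings replace the files that the Ant build.xml would chain.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PipelineSketch {

    // Hypothetical step 1 ("semantic markup"): each item becomes <item name="..."/>.
    static final String SEMANTIC_STEP =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/*'><items><xsl:apply-templates select='*'/></items></xsl:template>"
            + "<xsl:template match='*'><item name='{name}'/></xsl:template>"
            + "</xsl:stylesheet>";

    // Hypothetical step 2 ("XHTML formatting"): items become a bulleted list.
    static final String FORMAT_STEP =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/items'><ul><xsl:apply-templates select='item'/></ul></xsl:template>"
            + "<xsl:template match='item'><li><xsl:value-of select='@name'/></li></xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheets in order; no stylesheet knows about the others. */
    static String runPipeline(String xml, List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        try {
            for (String xsl : stylesheets) {
                Transformer step = factory.newTransformer(new StreamSource(new StringReader(xsl)));
                StringWriter out = new StringWriter();
                step.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString(); // the output of this step feeds the next one
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("pipeline step failed", e);
        }
        return current;
    }

    public static void main(String[] args) {
        String xml = "<root><product><name>Hoe</name></product>"
                + "<product><name>Rake</name></product></root>";
        System.out.println(runPipeline(xml, List.of(SEMANTIC_STEP, FORMAT_STEP)));
    }
}
```

Re-parsing each intermediate string is the simplest option; for large documents, JAXP's `SAXTransformerFactory` can connect `TransformerHandler`s so that SAX events flow from one step to the next without serializing in between.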
Future
validation and assisted editing for correcting source documents
(Schematron + XED or Emacs)
navigation through a hierarchy of categories (commercial catalogs)
integration of an XML search engine (eXist)
dynamic stylesheets and multiple presentations: table, paragraphs,
tree
WAR packaging for J2EE Web servers, for serving dynamic pages
Web interface and Web services for authoring and publication
connectors for several document types: DocBook, OpenOffice, TEI,
spreadsheets, relational databases, ...
GUI authoring tools for semantic markup, option selection, and
publishing
Detailed list of features and tasks