Worldwide Botanical Knowledge Base
http://wwbota.free.fr/
presentation sponsored by
Jean-Marc Vanel
GNU-Linux <--> operating systems
WWBKB <--> biology
XML: no other way!
How did this project come about?
-
thesis in numerical simulations in fluid dynamics
-
my jobs:
-
fluid simulations: many complex structures (meshes, fluid
networks, boundary conditions)
-
financial instruments
-
for 15 years I had living collections of plants
-
one day a colleague told me that one can find anything on the
Internet; I challenged him to find Cycas, and he got 4 hits on AltaVista!
Biodiversity and nature conservancy
-
thousands of species are disappearing forever,
-
quick identification of specimens from threatened areas is needed,
even where no qualified scientists are available;
-
the knowledge is still on paper, as in Linnaeus' time,
==> an inventory of the biological heritage of our Earth
has to be done.
There is currently no other botanical project with this
scope.
Why ?
-
funding goes to biotechnology rather than to descriptive biology,
-
multidisciplinary work is not easy,
-
laboratories are in competition with one another
-
past efforts by biologists (e.g. Delta, Lucid) have put
software first and data afterwards
Specification
make botanical data available on the Internet:
-
description of species, including 2D and 3D pictures,
-
computer-aided identification of specimens,
-
geographical distribution,
-
distributed knowledge, with links to other disciplines:
-
zoology: pollinators, seed dispersers, parasites, herbivores
-
genetics and biochemistry: e.g. correlations between molecules and taxonomy
-
paleontology
-
ecology (phytosociology, pedology, climate): search for correlations,
analysis of vicarious species
-
agronomy, plant uses, ethnobotany
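Computer-aided identification can be sketched as a multi-access key: the program keeps the set of taxa still compatible with the specimen and, at each step, asks about the character that best splits them. This is a minimal sketch under stated assumptions, not the project's actual method: the table TAXA and its taxa and characters are hypothetical toy data, and the question is chosen with a simple "smallest worst-case group" heuristic.

```python
from collections import Counter

# Hypothetical toy data: taxon name -> {character: state}.
TAXA = {
    "Taxon A": {"leaf_arrangement": "opposite", "flower_colour": "yellow", "carpels": 2},
    "Taxon B": {"leaf_arrangement": "opposite", "flower_colour": "white", "carpels": 5},
    "Taxon C": {"leaf_arrangement": "alternate", "flower_colour": "yellow", "carpels": 5},
    "Taxon D": {"leaf_arrangement": "alternate", "flower_colour": "white", "carpels": 2},
}


def best_question(candidates, asked):
    """Pick the unasked character whose largest answer group is smallest."""
    best, best_worst = None, None
    characters = {c for name in candidates for c in TAXA[name]}
    for character in sorted(characters - asked):
        groups = Counter(TAXA[name].get(character) for name in candidates)
        worst = max(groups.values())
        if worst == len(candidates):
            continue  # this character does not separate the candidates
        if best_worst is None or worst < best_worst:
            best, best_worst = character, worst
    return best


def identify(answer):
    """Ask questions via answer(character) until at most one taxon remains."""
    candidates = set(TAXA)
    asked = set()
    while len(candidates) > 1:
        character = best_question(candidates, asked)
        if character is None:
            break  # the remaining taxa cannot be told apart with these characters
        asked.add(character)
        value = answer(character)
        candidates = {n for n in candidates if TAXA[n].get(character) == value}
    return candidates
```

Passing `TAXA["Taxon D"].get` as `answer` simulates a user examining a Taxon D specimen: the sketch asks about carpels, then flower colour, and returns `{"Taxon D"}`. In a real key the answers would come from the user, as in the field scenario described later.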
A free software / free information project
-
nobody can own nature
-
a great project for humanity, and
-
a great, far-reaching, and enjoyable software project
-
federate existing resources and people
Collaborators
-
Flora of China (http://flora.harvard.edu/china/),
-
Flora of Australia
-
Taxonomic Databases Working Group
-
Kew Botanical Gardens
-
Museum National d'Histoire Naturelle, Paris
The data exists
Initial intuitions:
-
relational DB
-
decouple data (XML) from processing: XML
engines, relational DBs, OO DBs, AI engines
-
Web search engine, e.g.:
+species: +flower:yellow +bark:white +sharp
-
offer easy access to detailed
knowledge both for the layman and the professional botanist
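The search-engine query above can be read as a list of required terms. The sketch below assumes one possible reading, which the slides do not specify: `key:value` must match a property, `key:` alone restricts the kind of record (here `species`), and a bare word such as `sharp` must appear in some property value. The records are hypothetical.

```python
# Hypothetical species records; a real system would query the XML data.
RECORDS = [
    {"name": "Taxon A", "kind": "species", "flower": "yellow", "bark": "white", "spines": "sharp"},
    {"name": "Taxon B", "kind": "species", "flower": "yellow", "bark": "brown"},
    {"name": "Genus X", "kind": "genus", "flower": "yellow", "bark": "white", "spines": "sharp"},
]


def parse_query(query):
    """Split '+species: +flower:yellow +sharp' into (key, value) terms.

    key is None for bare words; in this sketch every term is required.
    """
    terms = []
    for token in query.split():
        token = token.lstrip("+")
        if ":" in token:
            key, value = token.split(":", 1)
            terms.append((key.lower(), value.lower()))
        else:
            terms.append((None, token.lower()))
    return terms


def matches(record, terms):
    for key, value in terms:
        if key is None:
            # bare word: must occur in at least one property value
            if not any(value in str(v).lower() for v in record.values()):
                return False
        elif value == "":
            # 'species:' style term: restrict the kind of record
            if record.get("kind") != key:
                return False
        elif value not in str(record.get(key, "")).lower():
            return False
    return True


def search(records, query):
    terms = parse_query(query)
    return [r["name"] for r in records if matches(r, terms)]
```

With these records, `search(RECORDS, "+species: +flower:yellow +bark:white +sharp")` keeps only Taxon A: Taxon B has brown bark and Genus X is not a species.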
The metadata
-
Why a relational DB is not enough:
-
part-of relations: e.g. bark is a part of stem (and trunk)
-
inheritance of properties from higher taxonomic ranks: e.g.
"opposite leaves"
-
different levels of metadata
-
several sources of metadata: Floras, WordNet,
botanical lexicons
-
Abstract Data Model: UML diagrams
-
uses of metadata: GUI generation,
reasoning (description logic), message specification
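The two relations that a flat relational table handles poorly can be sketched with plain dictionaries: a property is stored once, at the taxonomic rank where it holds, and inherited downwards; part-of is followed transitively. The taxa, properties and part-of pairs below are hypothetical placeholders, and the override rule (lower ranks win) is an assumption of this sketch.

```python
# Hypothetical taxonomy: child -> parent (None at the root).
PARENT = {"Family F": None, "Genus G": "Family F", "Species S": "Genus G"}

# Each property is recorded once, at the rank where it is stated.
PROPERTIES = {
    "Family F": {"leaf_arrangement": "opposite"},
    "Genus G": {"flower_colour": "yellow"},
    "Species S": {"bark_colour": "white"},
}

# Hypothetical part-of relation: part -> structures it belongs to.
PART_OF = {"bark": {"stem", "trunk"}, "stem": {"plant"}, "trunk": {"plant"}}


def inherited_properties(taxon):
    """Merge properties from the root down to taxon; lower ranks override."""
    chain = []
    while taxon is not None:
        chain.append(taxon)
        taxon = PARENT[taxon]
    merged = {}
    for name in reversed(chain):  # root first, so the taxon itself wins
        merged.update(PROPERTIES.get(name, {}))
    return merged


def wholes_of(part):
    """Transitive closure of part-of: every structure that contains part."""
    result, stack = set(), [part]
    while stack:
        for whole in PART_OF.get(stack.pop(), ()):
            if whole not in result:
                result.add(whole)
                stack.append(whole)
    return result
```

Here `inherited_properties("Species S")` returns all three properties although only the bark colour is stored on the species, and `wholes_of("bark")` returns `{"stem", "trunk", "plant"}`.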
New kind of software needed
A multi-everything browser will in fact be an empty shell
that knows only XHTML at startup
-
multi-domain documents wrapped in XHTML
-
calls the appropriate processors when it sees certain XML
namespaces and/or Processing Instructions.
-
book-keeping: management of:
-
drag-and-drop and clipboard with an XML data model (a DOM object
is passed)
-
display space between XML processors (tiling, resize, ...)
-
mapping between raw XML and displayed XML transformed by
XSLT
-
Generic display capabilities are also desirable:
-
collapsible tree/graph views for:
-
document tree
-
inheritance graph
-
the ID/IDREF graph
-
files/directories, URL/hyperlink graphs
-
extended search/query
-
editor with the same multi-domain capabilities.
-
using a standard dictionary (e.g. WordNet) and some AI techniques
will make it possible to handle well-formed XML with natural-language tags:
disambiguation, tagging while typing
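The shell's core job, handing each foreign-namespace subtree of an XHTML document to a registered processor, can be sketched with Python's standard `xml.etree.ElementTree`, which stores tags as `{namespace-uri}local-name`. The processor registry and its placeholder outputs are hypothetical. ElementTree's default parser discards processing instructions, so this sketch dispatches on namespaces only.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical processor registry: namespace URI -> handler for a subtree.
PROCESSORS = {
    SVG: lambda el: f"[SVG viewer: {len(list(el.iter()))} elements]",
    MATHML: lambda el: "[MathML renderer]",
}


def namespace_of(element):
    """Return the namespace URI of an element ('' if it has none)."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


def dispatch(element, out):
    ns = namespace_of(element)
    if ns != XHTML:
        if ns in PROCESSORS:
            out.append(PROCESSORS[ns](element))  # hand over the whole subtree
        else:
            out.append(f"[no processor for {ns or 'no namespace'}]")
        return
    for child in element:  # the shell itself only walks XHTML
        dispatch(child, out)


def render(document):
    out = []
    dispatch(ET.fromstring(document), out)
    return out


DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Leaf outline:</p>
    <svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0 L10 10"/></svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>
  </body>
</html>"""
```

`render(DOC)` yields one entry per embedded domain: the SVG subtree goes to the SVG handler and the MathML subtree to the MathML handler. Supporting a new domain means registering a new processor, not changing the shell.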
A general and modular tool for manipulating data of the
3 main kinds:
-
document-oriented (HTML & word processor)
-
structure-oriented (database type)
-
knowledge-oriented (semantic network, AI, RDF, etc)
The next killer app... A role for Mozilla? Or GNOME? KDE?
Or database vendors?
Or will the next Microsoft wave submerge everything?
WWBKB: areas of work
2D and 3D images
-
vector images (SVG) generated from bitmap images
-
3D models (X3D, etc.) generated from several photographs by
stereoscopic software
-
very compact representations suited to growing organisms with a
recursive structure: L-systems,
MathML, etc.
-
computer vision
techniques
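L-systems are parallel string-rewriting systems: at each generation every symbol is replaced at once according to its rule, and a turtle interpretation turns the resulting string into line segments, which could then be written out as SVG. The sketch uses two well-known textbook examples (Lindenmayer's algae model and a bracketed "fractal plant"), not data from the project; the 25 degree angle and unit step are illustrative choices.

```python
import math

ALGAE = {"A": "AB", "B": "A"}                     # Lindenmayer's algae model
PLANT = {"X": "F+[[X]-X]-F[-FX]+X", "F": "FF"}    # bracketed "fractal plant"


def l_system(axiom, rules, generations):
    """Rewrite every symbol in parallel; symbols without a rule are kept."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


def turtle_segments(commands, angle_deg=25.0, step=1.0):
    """Interpret F (draw forward), +/- (turn), [ ] (save/restore state)."""
    x, y, heading = 0.0, 0.0, 90.0  # start at the origin, pointing up
    stack, segments = [], []
    for ch in commands:
        if ch == "F":
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif ch == "+":
            heading += angle_deg
        elif ch == "-":
            heading -= angle_deg
        elif ch == "[":
            stack.append((x, y, heading))  # start of a branch
        elif ch == "]":
            x, y, heading = stack.pop()    # back to the branching point
        # other symbols (e.g. X) only drive the rewriting and draw nothing
    return segments
```

For example, `l_system("A", ALGAE, 4)` gives `"ABAABABA"`. The two plant rules, a few dozen characters in total, describe a branching structure whose drawing grows rapidly with each generation, which is the compactness the slide refers to.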
Botany, Zoology, Ecology and the Semantic Web
-
express relations between plants and animals in a formal
and flexible way (RDF/XLink)
-
plants (about 250,000 species) are just the beginning: insects alone have
about 1,000,000 species
-
all kinds of properties, and properties about properties
-
XML is not just for cataloguing consumers' preferences and locating
the cheapest merchandise
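Relations between plants and animals, and properties about properties, can be sketched as subject-predicate-object triples in the spirit of RDF. To stay self-contained the sketch uses plain Python tuples rather than an RDF library; the identifiers (pollinatedBy, interactsWith, etc.) are hypothetical stand-ins for real vocabulary terms with full URIs, and the two inference rules are loosely modelled on RDF Schema's subPropertyOf and OWL's inverseOf.

```python
# Hypothetical facts; in real RDF every name would be a URI.
TRIPLES = {
    ("Plant P", "pollinatedBy", "Bee B"),
    ("Plant P", "seedsDispersedBy", "Bird D"),
    ("Insect I", "feedsOn", "Plant P"),
    # properties about properties
    ("pollinatedBy", "inverseOf", "pollinates"),
    ("pollinatedBy", "subPropertyOf", "interactsWith"),
    ("seedsDispersedBy", "subPropertyOf", "interactsWith"),
    ("feedsOn", "subPropertyOf", "interactsWith"),
}


def infer(triples):
    """Apply inverseOf and subPropertyOf until no new triple appears."""
    facts = set(triples)
    while True:
        inverse = {}
        for p, rel, q in facts:
            if rel == "inverseOf":
                inverse[p] = q
                inverse[q] = p
        parents = [(p, q) for p, rel, q in facts if rel == "subPropertyOf"]
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            for sub, sup in parents:
                if p == sub:
                    new.add((s, sup, o))
        if new <= facts:
            return facts
        facts |= new


def objects(facts, subject, predicate):
    """All objects o such that (subject, predicate, o) holds, sorted."""
    return sorted(o for s, p, o in facts if s == subject and p == predicate)
```

After `facts = infer(TRIPLES)`, the store also contains `("Bee B", "pollinates", "Plant P")`, and `objects(facts, "Plant P", "interactsWith")` returns `["Bee B", "Bird D"]`, although neither fact was stated directly.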
Vision: field observations
You have a portable computer running the botanical
database, with a camera, a GPS, and a wireless Internet connection. Suddenly
you come across a remarkable plant; you show it to the computer, which asks you
two questions, about the number of carpels and the shape of the hairs (answering
requires cutting the ovary and using a lens). The computer tells you that
this is a new location for Strasburgeria robusta, which was thought to exist
only in New Caledonia. You are prompted to send e-mails to the specialists
of the Strasburgeriaceae and of the region, and to collect a herbarium
specimen. Meanwhile this discovery, complete with images and geographical
coordinates, is sent to the appropriate server, and the updated distribution
map appears on the screen.
Conclusions
-
need funding ($ £ $ £ ...)
-
need collaborators, need contacts
-
we can do great things with few resources, like the physicists
of the 19th century
-
make standards that last 20 years or more ==> clean & flexible
design; track standards and implementations
-
a new way to make science: GNU spirit
Issues
-
AI techniques, XML exchange formats & protocols for AI
-
new ways of working for taxonomic scientists: new species
created by Web transaction
-
Problems with existing standards
-
overlapping concepts: RDF & XLink, RDF Schema & XML
Schema, XSLT & X-QL
-
non-XML syntaxes: XPath, CSS
-
many existing vocabularies are monolithic (e.g. X3D, XML
Spec)
-
little reuse of Schemas and concepts
-
reinventing the wheel; schemas take longer to read and understand
-
little modularity
___________
structure the presentation as a semantic network