Short-term plan for a WWW botanical database

last update: Jan. 15, 2000

List of Tasks

Following is the List of Tasks, ordered from the most advanced to the least:
XML Schema, Query and GUI, Parsing, Webmaster, Server, Communication and public relations, Images.

I (JMV) can currently carry on with the first 3 tasks (XML Schema, Query and GUI, Parsing) and with the overall coordination, but help is needed on the rest.

XML Schema

[study (see preliminary schemas) and specification stage]
 - try RDF sample
 - write on description vs qualifier
 - general XML vocabulary for biological descriptions (...)
 - convergence Delta-descriptions
 - write XSLT transforms between flat database, Delta, etc.
 - links with other disciplines (zoology, ecology)
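
As a rough illustration of the flat-database-to-XML direction in the list above, here is a minimal sketch in Python (used only for illustration; the plan itself targets XSLT). The element names `species` and `character`, the attribute `name`, and the sample record are all assumptions, not a proposed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat record, as one row of a flat database might look.
flat_record = {
    "genus": "Quercus",
    "species": "robur",
    "leaf": "lobed, deciduous",
    "area": "Europe",
}

def flat_to_xml(record):
    """Turn one flat record into a small XML tree (element names are assumptions)."""
    root = ET.Element("species", name=f"{record['genus']} {record['species']}")
    for field, value in record.items():
        if field in ("genus", "species"):
            continue  # already carried by the name attribute of the root
        character = ET.SubElement(root, "character", name=field)
        character.text = value
    return root

print(ET.tostring(flat_to_xml(flat_record), encoding="unicode"))
```

The reverse direction (XML back to a flat row) would walk the `character` elements and collect them into a dictionary; a real transform would also have to deal with nested or repeated characters, which a flat record cannot hold directly.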

Query and GUI

[study, prototyping and specification stage]
- choose a browser (IE or Mozilla), for the short-term and long-term
- use of applets for XSLT
- use of metadata (containment, generalization, general and specialized dictionaries)
- use of logical computing (inference engines, etc) to enhance query
- use of natural language techniques to formalize further the descriptions

Parsing

[study, implementation (see marked up files) and specification stage]
 - Flora Europaea
 - Flora of China
 - Flora of Australia
 - find out about other floras
 - copyright problems

Webmaster

[existing site]
- architecture of the site
- collect contributions
- upload site content
- answer or forward mail about the site
- find suitable hosts
- manage mailing lists
- manage links
- manage referencing (see that lots of other sites reference ours)
- manage referencing by search engines

Server

[study and specification stage]
- validate choice of XML processor (XT, Xerces, or search engine technique/indexing, or maybe database)
- multi-server
- implement download and query page
- find server machines and/or institutions to host databases

Communication and public relations

[exploratory stage]
 - find sponsors
 - find developers and biologists
 - make our project well-known among
   - scientists in general
   - political world
   - economic world
   - organizations defending nature
   - international organizations

Images

[study and specification stage]
- 3D recognition from 2 or 3 pictures from different angles
- 3D representation (with Bezier mappings, VRML or whatever)
- pattern recognition, vectorization
- L-Systems
- static or growth simulation images
- how to display 3D on the browser
- collect images

Update: Sept. 23, 1999

Data

We want to build complex databases for botanical data, aimed primarily at computer-aided identification of plant samples.
Building on the existing databases of plant name synonyms, the first data to include will be:
But we must specify an extensible software architecture, in order to support:

Existing software

DELTA is a suite of taxonomic programs connected by a common data file format. It is available for download. This is very valuable software, but:

Gathering plant descriptions

Maybe it would be easier, as a first step towards this world flora, to have a few fields in natural language (English and/or Latin), like this:
The legacy of existing floras on paper could be put in this schema, but this would need some treatment of natural language.
Afterwards we could:
Genus and Family would be either a string, or a foreign key to another table with fields: Flower, Leaf, Area, Other Information; the Species inherits the characters of the Genus, and can redefine them.
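
The rule that a Species inherits the characters of its Genus and can redefine them could be sketched as follows. This is only one possible reading, written in Python for illustration; the class names, the `character` method and the sample values are assumptions, not part of the plan.

```python
from dataclasses import dataclass, field

@dataclass
class Genus:
    name: str
    family: str  # kept as a plain string here; it could also be a key to a Family table
    characters: dict = field(default_factory=dict)

@dataclass
class Species:
    name: str
    genus: Genus
    # Only the characters that differ from the genus are stored here.
    characters: dict = field(default_factory=dict)

    def character(self, key):
        """Look up a character on the species first, then fall back to the genus."""
        if key in self.characters:
            return self.characters[key]
        return self.genus.characters.get(key)

# Illustrative values only.
acer = Genus("Acer", "Sapindaceae",
             {"leaf": "opposite, palmately lobed", "flower": "small, in clusters"})
negundo = Species("negundo", acer, {"leaf": "opposite, pinnately compound"})

print(negundo.character("leaf"))    # redefined by the species
print(negundo.character("flower"))  # inherited from the genus
```

Storing only the differences keeps species records short, at the cost of one extra lookup when a character is not redefined.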

Specifying vocabularies

While gathering as many natural language plant descriptions as possible, we must define vocabularies. Whatever technology we choose for our database (object, deductive, document-based, etc.), we will need a common naming scheme for all our data. Here we have 2 possibilities:

Searching state-of-the-art software

A third parallel activity will be the search and evaluation of all kinds of software and techniques.

Image processing

For assisted identification of plant samples, we need image vectorization and pattern recognition. The set of reference patterns (species in the database) can be very large, but the search can use the available non-graphical information, like geographical area. For leaves and their veins, a 2D treatment will be enough, but for flower and twig architecture (phyllotaxy), we need 3D treatment with stereoscopic analysis.
The use of plant growth and plant shape models can help both in pattern recognition and in reducing storage space.
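
To give an idea of why a growth model can save storage, here is a minimal sketch of an L-system, one family of plant growth models, written in Python for illustration. It uses Lindenmayer's classic "algae" rules (A -> AB, B -> A); the function name `expand` is an assumption, and a realistic plant model would need bracketed or parametric rules plus a geometric interpretation, which this sketch does not cover.

```python
def expand(axiom, rules, steps):
    """Rewrite every symbol of the word in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        # Symbols without a rule are copied unchanged.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

# Lindenmayer's "algae" system: two rules and a one-letter axiom.
rules = {"A": "AB", "B": "A"}

for n in range(5):
    print(n, expand("A", rules, n))
```

The stored model is just the axiom and the two rules, while the generated word grows at every step; the same idea applies when the word is interpreted as a branching shape.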

Database, network

Processing of natural language

As said before, a natural language processor could extract a formal description from a natural language description. A formal description is easy to translate into several national languages.
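
The second point, rendering a formal description in several languages, can be sketched as below. This is a minimal Python illustration under strong assumptions: the formal description is taken to be a list of (organ, character, value) triples, the `VOCABULARY` table and its entries are hypothetical, and the output is term-by-term with no grammar or agreement. The extraction step from natural language is not shown.

```python
# Hypothetical multilingual vocabulary: one language-neutral key per term.
VOCABULARY = {
    "flower": {"en": "flower", "fr": "fleur"},
    "leaf":   {"en": "leaf",   "fr": "feuille"},
    "colour": {"en": "colour", "fr": "couleur"},
    "white":  {"en": "white",  "fr": "blanc"},
    "green":  {"en": "green",  "fr": "vert"},
}

# A formal description as (organ, character, value) triples of vocabulary keys.
description = [("flower", "colour", "white"), ("leaf", "colour", "green")]

def render(triples, lang):
    """Render each triple term by term; no grammar or agreement is attempted."""
    return [", ".join(VOCABULARY[key][lang] for key in triple) for triple in triples]

for lang in ("en", "fr"):
    print(lang, render(description, lang))
```

Adding a language then only means adding one entry per vocabulary key, which is the main argument for agreeing on the vocabulary first.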

Expert systems

Help is needed on that matter...
We should also check the knowledge representation (KR) concepts.

First data Model

 Botany alone is a huge project (about 250 000 species of flowering plants), and we want these botanical data to be easily usable by anyone for any kind of use.
To this end, it is not wise to be tied to a particular database system, or file format, or network protocol.

Even an object model in the sense of UML or ODL is hard to achieve, because there are at least 2 different views:

So I think the best way to advance now is: