What's new in WWBDB

What's new in the worldwide botanical knowledge project

J. M. Vanel - See also my diary about computer science

2003-08-16

Properties inheritance for taxonomic and part-of hierarchies

To enhance the current search engine (http://jmvanel.free.fr/protea.html), we need to add some reasoning/inference capabilities:

1. treat upper taxon properties as default values (e.g. if a genus has the property "leaves evergreen" and a species doesn't assert otherwise, the species has the same property)
2. treat properties across the organ containment (part-of) hierarchy as default values (e.g. if "petal color = red", the flower has the same property, at least partially)
3. combine the two rules above

Certainly, rule 2 will not hold for all combinations of feature/property. To get an idea of which properties are in which features, I can run a modified version of my parser with WordNet.
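A minimal sketch of how the two rules and their combination could look, written in Python rather than XQuery or OWL. All names here (the taxa "Exemplum", "Exemplum rubrum", "Exemplum album", the tables and the function names) are hypothetical and chosen only for illustration; they are not taken from the actual database.

```python
# Hypothetical data: (taxon, organ, property) -> asserted value
ASSERTED = {
    ("Exemplum", "leaf", "persistence"): "evergreen",
    ("Exemplum album", "leaf", "persistence"): "deciduous",
    ("Exemplum rubrum", "petal", "color"): "red",
}
# Taxonomic hierarchy: taxon -> upper taxon
PARENT_TAXON = {"Exemplum rubrum": "Exemplum", "Exemplum album": "Exemplum"}
# Part-of hierarchy: organ -> containing organ
PART_OF = {"petal": "flower", "sepal": "flower"}


def inherited(taxon, organ, prop):
    """Rule 1: climb the taxonomic hierarchy until the property is asserted."""
    while taxon is not None:
        if (taxon, organ, prop) in ASSERTED:
            return ASSERTED[(taxon, organ, prop)]
        taxon = PARENT_TAXON.get(taxon)
    return None


def with_parts(taxon, organ, prop):
    """Rules 1 and 2 combined: fall back on the values of contained organs."""
    value = inherited(taxon, organ, prop)
    if value is not None:
        return {value}
    values = set()
    for part, whole in PART_OF.items():
        if whole == organ:
            values |= with_parts(taxon, part, prop)
    return values
```

In this sketch a value inherited from an upper taxon takes precedence over values gathered from parts; the opposite order is just as defensible. The result is a set because several parts may disagree, which is one way to read "at least partially".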

Then I have several possibilities to implement the above reasoning/inference capabilities:

1. write some more or less ad-hoc code in XQuery to implement the reasoning, possibly storing the inherited properties in extra files
2. switch to an AI engine, such as Protégé (based on CLIPS and/or OWL), and write the logic of property inheritance in a language (e.g. OWL) dedicated to this

The advantages of choice 1. are:

the well-understood (at least for me JMV!) computational model offered by eXist database and XQuery language
the format is very near the original Natural Language (NL) plant descriptions, and has a good level of semantic markup

The advantages/drawbacks of choice 2. are:

relying on a regular AI engine makes it easy to add more rules and facts, e.g. about geographical ranges, plant uses, etc.
the performance is unknown

With choice 2, the best exchange format is probably OWL.

WordNet resources

I found this library :
JWNL (Java WordNet Library), written by John Didion, has been released. It is a Java API for accessing WordNet, and provides API-level access to WordNet data. It is pure Java (uses no native code).

There are also other (JNI-based) Java interfaces to WordNet. We can use this to extract the feature/property ontology out of the XML database directly, without having to touch our FloraParse Lex/Yacc C++ parser.

This botanical ontology can then be used in enhanced user forms for the search engine, showing :

relevant features at each containment level
relevant properties for each feature

This botanical ontology can be easily connected to the general WordNet ontology, which can be obtained in RDF (theoretically OWL compatible):

A Resource Description Framework (RDF) representation of WordNet and an ontology defining the terms used to represent the RDF version were developed by Sergey Melnik and Stefan Decker

2003-08-12

I met the leaders of two research groups from INRIA:

Group Atoll - ATelier d'Outils Logiciels pour le Langage naturel
BIOTIM Exploitation de Gisements Texte-Image en Biodiversité. 2003-2006

Nice demo of the French parser at http://graves.inria.fr:8200/perl/parser.pl, with an output graph made with Graphviz.

Since February there is a new version of Link Grammar, the syntactic parser of English, based on link grammar, an original theory of English syntax. Alas, it still doesn't seem to work on sentences without a verb.

2003-08-05

No sponsors yet :-(( .

I found an interesting project to define an ontology for plants:
Plant Ontology Consortium : http://www.plantontology.org

It is defined in a little-known format, a DAG data structure, with an editor called DAG edit. I downloaded the CVS for the ontology and the software from sf.net.

I opened this file:

anatomy/anatomy.ontology

with the Java+Graphviz GUI DAG edit. The tool has a good look and feel. The search utility is all right. I couldn't find how to see in the GUI the human-readable definitions that are in the file. The ontology seems rather complete; see an example graph: all classes whose name contains "leaf". However there are too many classes, as one can see on the "leaf" example. Also general properties like color, size, shape are not reused, so there is (in traits/traits.ontology) a petal color, a sepal color, etc. Obviously this ontology wasn't built with specimen identification in mind ...

I have committed FloraParse to Sourceforge; it is up-to-date with the latest WordNet 2.0 and gcc 3.3.2. To download all this:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/wwbota login

cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/wwbota co floraParse

The latest version of the search engine for the Flora of China is currently on the CVS of the eXist project, in directory webapp/xse . I have to update it to the XQuery language (currently XSP).

2003-03-06

More than 150 hits on the Flora of China search engine! Including someone from the Flora of China project at Harvard. For now, my personal 500 MHz computer can cope with the load, so I give the URL.

New page on Data preparation (Getting XML from various sources, Natural language processing, Generate organ list).

2003-02-16

Now that we have a working application of the Flora of China, I declare open the hunting period for SPONSORS $$$ !!!!

If you want to test the Flora Of China search engine, please mail me; it is currently hosted on my personal 500 MHz computer and I don't want it to be overloaded . By the way we need hosting for the application.

If you want to run the application on your computer, you must download the project/ directory on the CVS at Sourceforge.net (see sourceforge.net/projects/wwbota) . Then you install it on top of the XML database eXist . Finally you download the data:
http://wwbota.free.fr/project/data/flora_text.xml.zip

It is not really complicated; however install documentation is needed, and a release zip file of course.

The application and the project were presented at the TDWG working group ( www.tdwg.org ) Structure of Descriptive Data (SDD) meeting in Paris : http://160.45.63.11/Projects/TDWG-SDD/index.html .

We discussed the design of an XML Schema for descriptive data able to replace the old DELTA format. It will include some ideas of WWBKB combined with the needs of professional taxonomists.

<Gregor_copy_from_here status="not finished">
J.M. Vanel tried to convince the audience to use MathML as a constraint language for the computed characters (e.g. "length of petiole is twice the length of leaf blade"). But this obviously needs some testing. A dialect of Prolog could also be used, or a subset of C/Java/JavaScript returning a boolean. One of the crucial points is how to name plain characters so that they can be referred to in a MathML expression for a computed character. The advantages of MathML are, first, that it is XML (so it is easy to generate programming language code from it), and second, its ability to express set-theoretical assertions like "for all individuals, leaves are either red or green", or "for all individuals, all leaves have the same color".

I also advocated reusing the XML Schema typing system to express as much as possible of the character semantics (a character is an element of description reusable for many descriptions, e.g. "length of petiole"). But this level of abstraction needs some pedagogy. At the extreme, the "terminology" part of the current Schema, i.e. the character definitions, could be a plain XML Schema in the "http://www.w3.org/2001/XMLSchema" namespace, with in some places additional attributes in the "http://www.tdwg.org/2002/SDD" namespace. This trick of adding attributes from another namespace in a Schema is perfectly legal; this is also the design of XLink for instance.

There was also an interesting discussion on "arranging characters in arrays", which I would rather translate as "complexType characters". Clearly, besides simpleType characters, there is a place for non-character independent variables, e.g. medium, temperature and time in the fungi example.
</Gregor_copy_from_here>
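To make the idea of a computed character concrete, here is a small sketch of the "programming language expression returning a boolean" alternative mentioned above, in Python rather than MathML. The character identifiers (petiole_length_mm, leaf_blade_length_mm), the tolerance and the function names are assumptions made for the example, not part of any SDD proposal.

```python
# Plain characters of one specimen, referred to by hypothetical identifiers
specimen = {"petiole_length_mm": 40.0, "leaf_blade_length_mm": 20.0}


def petiole_twice_blade(characters, tolerance=0.1):
    """Computed character: 'length of petiole is twice the length of leaf blade'."""
    ratio = characters["petiole_length_mm"] / characters["leaf_blade_length_mm"]
    return abs(ratio - 2.0) <= tolerance


def all_leaves_same_color(leaves):
    """Set-style assertion: 'all leaves have the same color'."""
    return len({leaf["color"] for leaf in leaves}) <= 1
```

The sketch shows why naming matters: the computed character can only be evaluated if every plain character it depends on has a stable identifier.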

The example "growth diameter of fungal cultures on Petri dishes" is shown below; the cultivation occurs on various media (OA = Oat-Agar, MA = Malt-Agar, SNA = Synth. Nutrient-Poor Medium), at different temperatures (15, 20, 25 °C) and over different times (7, 14, 21 days):

15°C:	OA	MA	SNA
7d	8 mm	10 mm	- mm
14d	18 mm	21 mm	6 mm
21d	22 mm	40 mm	- mm

20°C:	OA	MA	SNA
7d	21 mm	40 mm	- mm
14d	39 mm	80 mm	38 mm
21d	60 mm	- mm	- mm
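One possible way to hold such a "complexType character" in a program is a mapping keyed by the independent variables. The sketch below transcribes the two tables above into a Python dictionary; the names GROWTH_MM and series are hypothetical, and None stands for the "- mm" cells.

```python
# Growth diameter in mm, keyed by (temperature in °C, days, medium).
# None marks a cell given as "- mm" in the tables.
GROWTH_MM = {
    (15, 7, "OA"): 8,   (15, 7, "MA"): 10,  (15, 7, "SNA"): None,
    (15, 14, "OA"): 18, (15, 14, "MA"): 21, (15, 14, "SNA"): 6,
    (15, 21, "OA"): 22, (15, 21, "MA"): 40, (15, 21, "SNA"): None,
    (20, 7, "OA"): 21,  (20, 7, "MA"): 40,  (20, 7, "SNA"): None,
    (20, 14, "OA"): 39, (20, 14, "MA"): 80, (20, 14, "SNA"): 38,
    (20, 21, "OA"): 60, (20, 21, "MA"): None, (20, 21, "SNA"): None,
}


def series(temperature, medium):
    """Return [(days, diameter)] for one temperature and medium, skipping missing cells."""
    return sorted(
        (days, mm)
        for (temp, days, med), mm in GROWTH_MM.items()
        if temp == temperature and med == medium and mm is not None
    )
```

For example, series(20, "OA") gives the growth curve on Oat-Agar at 20 °C as a list of (days, mm) pairs.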

2003-02-02

I have updated the XSLT transforms library and floraParse to put it on Sourceforge CVS.

2003-02-01

Now we have a search engine working on the Flora of China with eXist and Cocoon. We need to find a servlet or, even better, Cocoon hosting. This user interface is in the hands of Cyril.

Now I can go back to structure of data and metadata, and parsing. I have a look again at link-grammar, to see if it still needs a verb in the sentence or if there is a workaround. Well, after some trials on their online "parse a sentence" page, I saw that it's not enough; I try to ping them. On freshmeat.net I looked with the keyword "linguistic" and found this:

Kura Language Database - A system for storing, analyzing, and presenting linguistic data.
paai's Text Utilities - A collection of programs for text processing.

2003-01-15

Yes, the project is more alive (and necessary) than ever! As a side-kick, the XMLPublication has been developed for another project. The XSLT transforms library has been expanded, taking into account the new possibilities of XML Schema. The eXist XML database is more efficient than ever, and the new XML:DB Java API is well established. There is a new specification, more detailed but less demanding, that will be realized before the Taxonomic Database Working Group meeting in Paris on February 13-14th. We welcome Cyril Vidal, a talented young Java and XML developer, who will be in charge of displaying the query results using Cocoon. To ease the group work, a Sourceforge project, WWBKB, has been created.
As a parallel effort, work goes on about Flora text parsing and XML-izing. Here is a sample species with the new XML format. Thanks to WordNet, the nouns and adjectives are marked up. Also numbers with units (dimensions) and pure numbers will be marked up as such e.g. :

<branch>
 <t:f> generally many</t:f>
 <t:dim> internodes 1--10 cm</t:dim>
</branch>
<fruit>
 <t:f> berries</t:f>
 <t:f> distinct</t:f>
 <t:num> 1 - 8 - 12 per flower</t:num>
 <t:f> or coalescent</t:f>
 <t:f> forming syncarps</t:f>
 <t:num> 1 per flower</t:num>
</fruit>

The words missing in WordNet are also marked up:

<wn missing="stipules"/>

and the list will be given to the WordNet project.
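As an illustration of the distinction between dimensions, pure numbers and other phrases, here is a rough sketch using regular expressions. This is a stand-in written for this page, not the FloraParse method (which is a Lex/Yacc C++ parser using WordNet); the unit list, the patterns and the function name mark_up are assumptions.

```python
import re
from xml.sax.saxutils import escape

# Assumed list of length units; a real flora would need more
UNIT = r"(?:mm|cm|dm|m)"
# A number or a range of numbers (1--10, 1 - 8 - 12) followed by a unit
DIM_RE = re.compile(rf"\d+(?:\.\d+)?(?:\s*-+\s*\d+(?:\.\d+)?)*\s*{UNIT}\b")
NUM_RE = re.compile(r"\d")


def mark_up(phrase):
    """Wrap a phrase in t:dim (number with unit), t:num (pure number) or t:f."""
    if DIM_RE.search(phrase):
        tag = "t:dim"
    elif NUM_RE.search(phrase):
        tag = "t:num"
    else:
        tag = "t:f"
    return f"<{tag}> {escape(phrase)}</{tag}>"
```

Applied to the phrases of the sample above, mark_up("internodes 1--10 cm") yields a t:dim element and mark_up("1 - 8 - 12 per flower") a t:num element.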

2001-08-09

Being a developer, I couldn't resist the temptation to make a side-kick, and so here is yet another Identification program and framework in Java: the Open Identification API . It is aimed mainly towards Delta-like data, so it is complementary to other tools (FloraParse) for textual data.

2001-07-13

During the last 6 months I was in charge of R&D and industrial catalogs at IndustrySuppliers.com, a market place for industrial equipment that recently went into bankruptcy. There I developed techniques to manage e-catalogs before publication on the Internet. I had the opportunity to work on useful technologies: Perl, XSLT, Makefile, Cygnus Cygwin bash, WinCVS, Apache Jakarta Tomcat, etc.

During this time lots of things happened outside the WWBKB project: new versions of WordNet and link-grammar; XML protocols (SOAP, Universal Description, Discovery, and Integration (UDDI) at http://www.uddi.org/ , XML Resource Directory Description Language (RDDL) at http://www.openhealth.org/RDDL/ , WSDL); new frameworks for knowledge representation: Topic Maps http://www.topicmaps.org/ , the DARPA Agent Markup Language (DAML) http://www.daml.org/ ; the final version of W3C's XML Schema; new Natural Language software: GROK, Open NL API, etc.

Inside the WWBKB project, work continues on the Flora of China: use of WordNet 1.7, port to the latest gcc compiler, enhancement of XML markup to include BOTH semantic markup and original presentation, development of XSLT transforms to generate schemas, and above all, use of eXist ( http://exist.sourceforge.net/ ), a nice freeware XML database with text indexing.

2000-12-23

I modified the HTML client page (http://wwbota.free.fr/Generic/XMLClient.htm?URL=species.xml ) so that it has a minimal behavior with Mozilla and CSS2. But sometimes Mozilla M17 crashes... Also there is a bug in Netscape 6 preventing it from working...

At last!
The servlet for the Flora of China (FOC) seems to be working all right on my local machine, with requests such as this:
file:///windowsD/jmv/wwbota.free.fr/Generic/XMLClient.htm?URL=http://localhost/servlets/FOC?xpath=//td[.//stigma]

But for the whole subset of the FOC (10 000 species and 15 Mb) such a query takes 6 min 30 s. The current implementation uses a combination of SAX and XSLT ( Saxon 5 from http://users.iclway.co.uk/mhkay/saxon/ ).
So I will just put a small subset online, to try and perfect the site and its ergonomics.
However this Java framework with SAX, DOM, and XSLT is very useful for batch processing such as: restructuring, statistics, generating lists of tags and Schema, adding information from other sources (JDBC ...) ...
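For readers unfamiliar with the XPath expression //td[.//stigma] used in the request above, here is what it selects, sketched with Python's standard xml.etree.ElementTree instead of the servlet's SAX and XSLT combination. The sample document and the function name cells_with are invented for the example; ElementTree does not accept a descendant path inside a predicate, so the filter is written out by hand.

```python
import xml.etree.ElementTree as ET

# Invented sample in the spirit of the FOC table markup
SAMPLE = """<table>
  <tr><td><flower><stigma>capitate</stigma></flower></td></tr>
  <tr><td><leaf>ovate</leaf></td></tr>
</table>"""


def cells_with(root, tag):
    """Equivalent of //td[.//TAG]: every td element that has a TAG descendant."""
    return [td for td in root.iter("td") if td.find(f".//{tag}") is not None]


root = ET.fromstring(SAMPLE)
matches = cells_with(root, "stigma")
```

Here matches contains only the first cell, the one whose description mentions a stigma. This says nothing about the speed of the approach on the full 15 Mb file.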

What remains to do for the FOC site:

- install JDK and JSERV on the server
- install the FOC servlet on the server
- verify that it works with IE5
- add the plant names to the XML data, and change <tr> tags to <species>
- generate a skeleton XML dataset, or another way of generating queries towards the server, and connect the servlet with an appropriate version of the above mentioned HTML client page
- put the XML in small files on the server, for search engines
- try ozone
- try commercial XML databases: Tamino, Excelon

Thanks to the Laboratoire Informatique et Systématique at University of Paris Jussieu, and to its head Régine Vignes for allowing me to use one of their servers.

I tried GMD's ipsi XQL search engine for XML with contains() function: it doesn't work! The ipsi XQL search engine is unable to search for sub-strings in textual content.

2000-12-18

I began a new job at IndustrySuppliers.com, a market place for industrial equipment, in charge of R&D and industrial catalogs. Happily it has strong technical synergy with the WWBKB project.

Cultivated plants, plant uses, varieties

I got involved in an ambitious worldwide project about plant uses, and sharing knowledge, called Seed2seed , from the Ecoropa association in Paris ; more details coming soon.

2000-11-19

Design

I try to make a complete formalization (manually) of a few descriptions from the Flora of China. In parallel I enhance the abstract data model with new abstractions, like Context/Restriction. This is certainly the best way to advance the design. Moreover it provides an accurate introduction to formal plant description for several kinds of most interesting people:

experts in Natural Language (NL) analysis,
experts in Artificial Intelligence (AI),
experts in data-mining, OLAP, etc.

Site

new page summarizing the local software resources
enhancements in Library of XSLT transforms
enhancements in botanical resources on INTERNET

2000-11-15

I come back from the TDWG 2000 meeting in Frankfurt (see http://www.bgbm.fu-berlin.de/TDWG/2000/). Lots of contacts, and a fruitful discussion on Structure of Data Set with R. Pankhurst, N. Lander, and G. Hagedorn.
Also fruitful discussions with Morris on XML protocols, and with J. Lerenard about Computer-assisted Plant Identification.
Among others, interesting presentations about geographical services (alexandria.ucsb.edu/gazeeter), and GEIN ( http://www.gein.de ), a German environment portal federating other sites through XML.

Yesterday I was in Paris at a conference about Internet and Ecology ( http://www.multimania.com/mgiran/ ), where I saw again a presentation about GEIN. Several webmasters were asking for advice about how to use XML in their sites.

2000-11-06

New distribution of FloraParse, a parser for classical Floras generating XML markup.
See Release Notes.
download FloraParse

2000-11-03

More than one year since start of project!
Many hopes created, and not much concrete yet !
However, even without money, even without technically competent collaborations, it advances.

The immediate goal is still to put on a server the Flora of China, with a relational database and requests like :

SELECT * FROM descriptions WHERE petal LIKE '%yellow%'

An alternative implementation is to use SAX (sax.org, Simple API for XML) to make the query.

In both cases, we need to generate column names (= XML element names) out of the existing XML. While doing this, an XML Schema will also be generated, and this XML Schema will be registered in XML repositories:

xml.org
biztalk.org
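The relational approach described above (column names generated from XML element names, then a LIKE query) can be tried out with a small sketch. It uses Python with the built-in sqlite3 module as a stand-in for whatever database will finally be chosen; the sample XML, the species names sp1 and sp2 and the function name load are invented. The query uses % wildcards so that LIKE matches a sub-string.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data; the real FOC markup may differ
SAMPLE = """<flora>
  <species name="sp1"><petal>yellow, obovate</petal><leaf>ovate</leaf></species>
  <species name="sp2"><petal>white</petal><sepal>green</sepal></species>
</flora>"""


def load(xml_text):
    """Create an in-memory table whose columns are the XML element names."""
    root = ET.fromstring(xml_text)
    # Column names = distinct child element names found in the data
    columns = sorted({child.tag for sp in root for child in sp})
    db = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    db.execute(f"CREATE TABLE descriptions (name TEXT, {col_defs})")
    for sp in root:
        row = {child.tag: child.text for child in sp}
        values = [sp.get("name")] + [row.get(c) for c in columns]
        placeholders = ", ".join("?" for _ in values)
        db.execute(f"INSERT INTO descriptions VALUES ({placeholders})", values)
    return db, columns


db, columns = load(SAMPLE)
hits = db.execute(
    "SELECT name FROM descriptions WHERE petal LIKE ?", ("%yellow%",)
).fetchall()
```

The same pass over the element names that produces the column list could also feed the generation of the XML Schema.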

GUI
The Web page for queries will have a pull-down menu for plant organs (e.g. petals) and an input field for the searched sub-string (e.g. yellow). Probably there will be 2 pull-down menus, one for plant organs and one for sub-organs. Then the server will respond with a page like the prototype Web page already published, which allows further refinement queries on the local data, without going back to the server.

News from J.M. Vanel
I am trying to find a consultancy job around XML technologies, that will have a synergy with the WWBKB project. I will probably attend the TDWG 2000 meeting in Frankfurt (see http://www.bgbm.fu-berlin.de/TDWG/2000/).

New on this site
XML local resources have been updated:

prototype Web page already mentioned
page about useful examples of XSLT transforms

2000-07-14

Work with Xalan and Xerces to enhance the structure of the Flora of China XML database ( more details on the mailing list).

2000-06-27

At last I finished the parsing of the 10 000 species of the Flora of China :
http://jmvanel.free.fr/pub/data/
See more details on the mailing list: http://www.egroups.com/group/wwbota/
I hope that it can become a test case for the XML databases.

2000-06-11

A week-end of work with Bryan Thompson at my house on knowledge representation. Bryan is a specialist of the Shruti inference system (www.icsi.berkeley.edu/~shastri/shruti). Minutes of the meeting are coming.

2000-05-27

What I did this week:

lots of mails to WWW9 Conference attendees that I met,
exploratory work: readings about and downloading the link-grammar, a natural language parser (thank you Philippe!) ,
enhancements in the Web page displaying botanical descriptions (WORKS ONLY with the IE5 patch compliant with the XSLT and XPath W3C Recommendations, downloadable at Microsoft ),
continue integrating the Wordnet library in FloraParse,
a roadmap, going more in details about tasks, to foster cooperative design and implementation.

2000-05-24

My (JMV) presentation at WWW9 in Amsterdam on May 19 was successful, and got compliments from Jon Bosak, one of the creators of XML. The crop of business cards is 15, I will tell you what comes out of these contacts.

THANK YOU to all who helped and encouraged in various ways:
Jacques Lolieux, Anthony Brach, Mary Clare HOGAN, Nick Fulton, family Smalbrugge, Guillaume Rousse, Thomas Beale, Henning MUELLER, Dominique Salmon, Olivier Tavignot, Franck Yvetot, Ivodor Atanassov, Pierre Deransart, Rita Lemaire-Smith, Don Kirkup, Nick Lander, Laurent Kiryenko, and I certainly forget several...
Thanks to Reuters for paying my expenses for the 4 days WWW9 Conference.

The FloraParse has been adapted to the Flora of China data, and thanks to Olivier Nouguier will be soon put on the Web using a standard database, probably PostgreSQL.
I ran into small problems in exporting well-formed HTML out of MS Access files, because:

the HTML (family descriptions) is not well-formed in the original file,
a standard export or "save as HTML" in Access treats the HTML as ordinary text, and consequently transforms, e.g., <UL> into &lt;UL&gt;

I tried to use W3C's tidy program on the 18 Mb file, it ran all night without result! I work on integrating the Wordnet library in FloraParse.
Among other technologies, I look currently at :

link-grammar (Natural Language parser)
AI engines:

ontobroker
Classic
Shruti inference system (www.icsi.berkeley.edu/~shastri/shruti)

2000-04-19

I downloaded the Flora of China data, with special permission (from the Flora of China Project), in two big MDB (MS Access) files. It is moving to have access to the accumulated work of hundreds of botanists over centuries.

I installed the WordNet environment on my machine.

First attempt to write an XML Schema for XML All-purpose Protocol, see XAP.xsd .

For the last 2 or 3 weeks there has been a first specification for a client-server application doing specimen identification; I feel that it can give a good idea of how the pieces of the puzzle fit together: data, metadata, protocol, GUI.

2000-04-10

Technique:

first version of an XML Schema derived from the Botanical Glossary of the Flora of Australia; I used the HTMLGlossary2XMLSchema.xslt XSLT transform to generate it from the original file; this is still work in progress. With WordNet I'll be able to distinguish nouns, adjectives, relational adjectives, and among nouns those which are organs, properties, etc. So the metadata described in the Abstract Data Model will be implemented. Also an RDF Schema and an XML DTD will be generated from the XML Schema.

reading 5 papers about WordNet :

Descriptive Adjectives
Descriptive adjectives are what one usually thinks of when adjectives are
mentioned. A descriptive adjective is one that ascribes a value of an attribute to a noun.
That is to say, x is Adj presupposes that there is an attribute A such that A(x)=Adj. To
say The package is heavy presupposes that there is an attribute WEIGHT such that
WEIGHT(package)=heavy. Similarly, low and high are values for the attribute HEIGHT.
WordNet contains pointers between descriptive adjectives and the noun synsets that refer
to the appropriate attributes.

WordNet also indicates Relational Adjectives, so it has all we need!

Contacts

Paolo Nesi, a specialist of image processing, who is currently leading a European project about delivering musical scores on the Internet, a project having similarities with ours.

2000-03-27

Technique:

I (JMV) started a thread on xml-dev about "XML All-purpose Protocol", proposed as a synthesis between SOAP and Corba;
Soon another thread on xml-dev about natural vocabulary and linguistic approach to XML, and turning WordNet into a big XML Schema, or maybe several, one with the part-of relation enforced and one without;
Related to this, work will start on botanical glossary; most of the definitions of nouns start with an article (for the rest, maybe use WordNet).

Communication:

some following for the 3D geometry with MathML proposal on "x3d-contributors@web3d.org" , etc.

2000-03-18

Communication:

The XML Cover Pages, SGML and XML News, by Robin Cover, has written about our project on 2000-03-16 : http://www.oasis-open.org/cover/xml.html ( the authoritative overview of the XML technology, new entries every day. )
the discussion mailing list actually started today on egroups

2000-03-09

Communication:

J.M. Vanel will talk at WWW9 in Amsterdam at Developers' Day on May 19
Brave Georg has written about WWBOTA in his monthly Free Software chronicle: Brave GNU World <column@gnu.org> the monthly GNU forum in English, German, French, Spanish and Japanese. Check it out at http://www.gnu.org/brave-gnu-world/
a discussion mailing list will be started today on egroups
reflections and first trial to find sponsors

Technique:

new page here on 3D geometry representation ; started a thread about that in x3d-contributors@web3d.org

2000-02-24

Abstract Data Model for Taxonomy published
FloraParse sent to D. Kirkup, Kew Gardens
evaluation of VRML/X3D as language for 3D geometry: not convincing
Studying XML-Schema, testing RDF software (DATAX), writing XSLT transforms, readings about Cognitive Science and Semantic Networks, etc.

2000-01-26

Discovered the Flora of North America (www.fna.org) by searching with Altavista sites having the same keywords as this one. It's magnificent!

The worldwide botanical database project has moved to http://wwbota.free.fr. My personal pages are now at http://jmvanel.free.fr .

I subscribed to x3d-contributors@web3d.org.

2000-01-23

Downloaded lots of things for 2D and 3D images:

fds for SVG from Univ. of Tsukuba, a vectorizer for bitmap images (I'll test it on Fairchild Tropical Garden http://www.ftg.fiu.edu/ herbarium images)
Geometra, a stereoscopic software generating VRML files; I have no images right now to test it; there is another free download called Camora :-) or something.

2000-01-22

I provoked a discussion on xml-dev@ic.ac.uk about "XML for 3D geometry and objects".
See archive: http://www.lists.ic.ac.uk/hypermail/xml-dev/
Conclusion: we have to evaluate VML and SVG from the W3C, and X3D of the Web3D CONSORTIUM, the designated successor of the ISO Standard VRML97.
We also need nice editors for botanical 3D shapes: leaves, flowers, fruits, twigs. Note that the 3D structure of a fruit, and its time evolution from a flower has much in common with CAD (Computer Aided Design) models in mechanical engineering with montage sequence.
But the privileged source of 3D data remains of course real specimens, or herbarium specimens, which brings the necessity to have stereoscopic software.

2000-01-20

Having well attacked the parsing, I am now investigating AI and 3D representation subjects. I'm reading "Semantic networks" by Lokendra Shastri, which has an interesting knowledge model, and proven techniques to implement the searches. The model allows for probabilities, e.g. flower 70% rose, 30% white.

2000-01-15

Semantics

I finally found a name for the concept of an adjective defining no property, but narrowing the subject, like "inferior" in "inferior leaves". It's called a relational adjective, or pseudo-adjective.

3D geometry

I'm investigating syntaxes and formats for CAD (Computer Aided Design) data: IGES, STEP, etc. Apparently no XML thing exists in this field. Maybe use MathML vocabulary to express Bezier and NURBS algebra, together with a vocabulary borrowed from STEP, and of course botanical markup.

I'm also investigating algorithms to obtain a CAD description from a 2D volumic mesh.

XML

My XML outfit is growing and includes:

browsers: IE5, Mozilla M11 (soon M12 will display XML)
editors: XED, XML Notepad
transform engines: J. Clark's XT, IBM's LotusXSL (it works as an applet inside IE5 to enable a proper XSLT-compliant transform)
RDF engines: DataX, SiRPAC

Contacts

the neighbouring INRIA at Rocquencourt (French Institut National de Recherche en Informatique et Automatique)
Brave Gnu World, the famous monthly Free Software Foundation chronicle

2000-01-05

Lots of new items at http://jmvanel.free.fr/ , including a few Diospyros descriptions, actually parsed by FloraParse from the Flora of China, with an enhanced sample User Interface for queries.

1999-12-29

The Lex-Yacc-C++ parser for classical Floras is ready for release. Mail me to get the sources.
At last I wrote a list of tasks. See if something is within your competences.

1999-12-18

The Lex-Yacc-C++ parser for classical Floras begins to breathe; it will be released for Christmas under the GNU General Public License. I looked at the Flora Europaea.
I'll ask Bosak for advice.

Contacts

Free Software Foundation (www.gnu.org) has sent an encouraging mail

The Filters project is writing conversion routines from proprietary formats to XML-based file formats ==>TODO: ask for information about this project

France's Museum National d'Histoire Naturelle, at the Arboretum of Chèvreloup
Don Kirkup of Kew Gardens is working in the same direction (markup of floristic texts)
lots of contributions on the TDWG - Structure of Descriptive Data <TDWG-SDD@USOBI.ORG> mailing list; an interesting discussion since Mon, 22 Nov 1999 on the

LISTSERV@USOBI.ORG

http://usobi.org/archives/tdwg-sdd.html

I (JMV) got a password for the very interesting West Australian Florabase database: http://www.calm.wa.gov.au/science/florabase.html
lots of activity on the W3C's mailing lists: www-rdf-interest@w3.org and

mailto:xml-dev@ic.ac.uk

looked at dictionary projects: very promising in that they offer unambiguous semantics for common vocabulary:

wordnet@princeton.edu

www.dict.org

looked at Xdelta (http://www.bath.ac.uk/~ccslrd/delta/index.html), a direct translation of the Delta format into XML; comments: probably better to declare character numbers as ID; characters could be turned into rdf:Property

1999-11-13

A first version of a Lex-Yacc-C++ parser is almost ready; it will be applied on the floras of China and Australia.

1999-10-19

First prototype of a Web page displaying botanical descriptions with several stylesheets. Sorry, it works only with Microsoft Internet Explorer 5, which is freely downloadable. Queries with XSL/XPath are in preparation, also possibility to use any browser, using an applet to transform the XML.

Thinking about new pages:

specifications
botany for computer scientists
computer science for botanists

Contacts with a botanist at The Royal Botanic Gardens Kew, interested in the automated handling of descriptive botanical information.

1999-10-10

Contacts with specialists in image generation (L-systems), and image database.
Posts presenting the project have been made in:
comp.ai, comp.ai.nat-lang, comp.databases, comp.databases.object, comp.lang.xml

1999-10-04

At present we work more with the computer people than with the botanists, because the requirements appear clear enough, and because we must come up with some convincing prototype. The botanists will not believe in the feasibility of the project until they see something concrete.

However contact has been made with the IOPI (International Organization for Plant Information).

I studied the feasibility of an automated translation of a standard flora into an XML file. The example was from the Flora of China at http://flora.harvard.edu/china/search/search.html ; the Guidelines for Contributors is a very interesting document showing that the Flora of China is a very well-defined document, but a parsing using standard tools like LEX and YACC is not applicable. Here is a hand-made parsing of an example species taken from the Guidelines for Contributors. It can be seen that the required tool must have a basic knowledge of nouns and articles, to be able to translate expressions like "leaf blade" and "floral tube". A post was made in comp.ai.nat-lang, to ask for the relevant techniques.

A prototype Web page for a query was made.

The overall architecture will be inspired by the book of S. Mohr (Building Distributed Applications with XML, LDAP, and IE5, at Wrox Press), who envisions servers on Internet, offering their knowledge via HTTP requests, and advertising their services via Metadata.

XML as an exchange format;

URIs (Uniform Resource Identifiers) will be established to point to species;
using these URIs and the RDF (Resource Description Framework) syntax, any other document will be able, by referring to any plant species, to add information to the basic taxonomic data provided;
several regional floras servers can cooperate for a single request; the result XML files will be concatenated, either on the client browser, or on a front server;

on the server side:

possibly also XML with DOM (Document Object Model) and XSL as a query processor on the server side; this solution would need an efficient multi-threaded DOM implementation, capable of managing 600 Mbytes (300 000 species x 2 kbytes) and tens of requests simultaneously;
or an OODBMS;
or an indexing technology similar to the internet search engines;
or a batch file-to-file sequential processor able to treat not-too-complex queries on 64 Mbytes computers;

on the client side, a browser will be enough to manage regional floras or several large families; XSL as a query processor and a few stylesheets and scripts will provide a state of the art User Interface;
of course, replicas on CD's (with only the most representative images) or DVD are possible.