FloraParse V 1.3.1 Release Notes
2000-11-06

FloraParse, a parser for classical Floras generating XML markup.
Copyright (C) 2000 Jean-Marc Vanel & Worldwide Botanical Knowledge Base

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation.

Explanations: see:
  http://wwbota.free.fr/Samples/parsing.htm
  http://wwbota.free.fr/What_s_new.htm

This version is shipped with a sample from the Flora of China (FOC) that
was exported as HTML from a Microsoft Access database.

- NEW for this version: use of Wordnet for generation of tags
- tested on Linux Redhat 6.2; occasionally tested on HP; should work on
  any Unix
- to (re)generate the parser and perform a test run, type:
    gmake clean; gmake
- you must have Wordnet installed (http://www.cogsci.princeton.edu/~wn)
- corrected bad generation of tags from noun+adjective, treated as
  noun+noun ( e.g. ), when the second word is both a noun and an
  adjective; this is in function findTagName(). This solves many
  problems, but some remain, e.g. is not generated, as it should be.

Projects for next version:

- distinguish generation of tags and ( see taxonomy.xsd ); this is in
  function chooseNumberTag()
- treat pseudo-adjectives, by using an auxiliary botanical lexicon
- use an XML/HTML parser (e.g. xml.apache.org/Xerces) to parse the HTML
  and pass each description string to FloraParse
- treat floras of Australia and North America
- have auxiliary programs (probably with SAX and XSLT) that will
  rearrange the generated HTML/XML as in:
  http://wwbota.free.fr/Samples/DisplayDescriptions/species.xml
- provide on the Web the XML version of the FOC descriptions, in a
  single zipped file, and in one-file-per-species form
- provide on the Web XML Schemas and DTDs describing the data
- collect the words unknown to Wordnet and add a module in FloraParse to
  treat them sensibly
- ensure that all the information contained in the original document is
  kept in the XML, so that the XML becomes, with the agreement of the
  authors, the unique source for the Flora; to demonstrate this, make
  some publishing XSL style sheets
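
Illustration of the findTagName() correction listed under this version's
changes: the sketch below shows one way a word pair could be classified
when the second word is both a noun and an adjective. It is not the
FloraParse code. The function name classify_pair, the LEXICON table and
its entries are hypothetical stand-ins for the Wordnet lookups, and the
rule "prefer the adjective reading" is an assumption about how the fix
behaves.

```python
# Hypothetical mini-lexicon standing in for Wordnet part-of-speech lookups.
# The entries are illustrative only; they are not taken from FloraParse.
LEXICON = {
    "leaf": {"noun"},
    "blade": {"noun"},
    "green": {"noun", "adjective"},   # ambiguous: both noun and adjective
    "ovate": {"adjective"},
}


def classify_pair(first, second, lexicon=LEXICON):
    """Classify a two-word phrase as noun+adjective or noun+noun.

    Assumed rule: when the second word can be an adjective, the adjective
    reading wins, even if the word is also listed as a noun.
    """
    first_pos = lexicon.get(first.lower(), set())
    second_pos = lexicon.get(second.lower(), set())

    if "noun" not in first_pos:
        return "unknown"
    if "adjective" in second_pos:
        # Covers the ambiguous case that was previously read as noun+noun.
        return "noun+adjective"
    if "noun" in second_pos:
        return "noun+noun"
    return "unknown"


if __name__ == "__main__":
    for pair in [("leaf", "green"), ("leaf", "blade"), ("leaf", "ovate")]:
        print(pair, classify_pair(*pair))
```

Checking the adjective reading first keeps the ambiguous case and the
plain adjective case on the same path, so only words that are nouns and
nothing else fall through to noun+noun.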