Abstract Data Model for Taxonomy

Version: 0.0
Last update: 2000-11-07    - First version: 2000-02-19
Author: J.M. Vanel

Basically the Data Model includes:

Design principles:

UML diagram




The two diagrams below present the same information as the diagram above, split into two parts, which some readers may find more convenient.




C++ version

Note: this is an abstract model, written here in C++/Java syntax. Be aware that it is an Abstract Data Model, analogous to the XML Information Set, so an actual C++/Java implementation could be quite different. In any case, the first intended implementation is XML. We will also see that, for different purposes, the same information can be mapped to different structures, with or without loss of information.
 
#include <string>
#include <set>

namespace org_UNO_biology_taxonomy {

// string and set live in namespace std
using std::string;
using std::set;

class Named { public:
  string name;
};

class Annotated : public Named { public:
  string text, author;
};

class Organ; class Zone;

class TaxonomicClass : public Annotated
// Description:  classification units like family, genus, species, etc
{ public:
  TaxonomicClass* upperClass /*mandatory*/;
  set<TaxonomicClass*> lowerClasses;
  string referenceAuthor, referencePublication, publicationDate;
  Organ* description;
  Zone* repartition;
};

class Feature;
class Property;
class Value;

class Organ : public Annotated
{ public:
  set<Organ*> parts;
  set<Feature*> features;
  Organ* partOf;
};

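class Value
{ public:
  string value;
  float occurence;
  // An ordering is required to store Value in a set (see Feature);
  // ordering by the value text is an assumption.
  bool operator<(const Value& other) const { return value < other.value; }
};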

class Feature : public Annotated
{ public:
  Property* property;
  set<Value> values;
};

class Property : public Annotated
{ public:
  set<Organ*> appliesTo;
  set<TaxonomicClass*> appliesToClasses;
};

class Zone : public Annotated
{ public:
};

} // end of namespace org_UNO_biology_taxonomy
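
As an illustration of how the classes above fit together, the following minimal sketch restates a reduced subset of them so that it compiles on its own, and adds two helpers that keep the bidirectional links consistent. The namespace `sketch`, the helper functions `addLowerClass` and `addPart`, the default member values and the ordering chosen for `Value` are all assumptions made for this example; they are not part of the model.

```cpp
#include <cassert>
#include <set>
#include <string>

namespace sketch {  // hypothetical namespace, used only for this example

using std::set;
using std::string;

struct Named { string name; };
struct Annotated : Named { string text, author; };

struct Property : Annotated {};

struct Value {
  string value;
  float occurence = 1.0f;  // spelling follows the model above
  // std::set needs an ordering; comparing the value text is an assumption
  bool operator<(const Value& other) const { return value < other.value; }
};

struct Feature : Annotated {
  Property* property = nullptr;
  set<Value> values;
};

struct Organ : Annotated {
  set<Organ*> parts;
  set<Feature*> features;
  Organ* partOf = nullptr;
};

struct TaxonomicClass : Annotated {
  TaxonomicClass* upperClass = nullptr;
  set<TaxonomicClass*> lowerClasses;
  Organ* description = nullptr;
};

// Sets both directions of the classification link in one step.
inline void addLowerClass(TaxonomicClass& upper, TaxonomicClass& lower) {
  assert(&upper != &lower);  // a class cannot be its own upper class
  lower.upperClass = &upper;
  upper.lowerClasses.insert(&lower);
}

// Sets both directions of the part-of link in one step.
inline void addPart(Organ& whole, Organ& part) {
  assert(&whole != &part);  // an organ cannot be a part of itself
  part.partOf = &whole;
  whole.parts.insert(&part);
}

}  // namespace sketch
```

For instance, calling `addLowerClass(genus, species)` sets `species.upperClass` and inserts the species into `genus.lowerClasses` together, so the two links cannot drift apart.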

Notes

Issues

Proposed XML Namespaces

The formal definitions would exist in XML Schema and RDF Schema.
xmlns:bio="http://www.tdwg.org/xml-schema/2000/biology/bio#"
    general vocabulary common to the whole of biology (e.g. color, cell)
xmlns:tax="http://www.tdwg.org/xml-schema/2000/biology/taxonomy"
    vocabulary covered by the present document, that is, the abstract machinery used to express classification, part-of relations, general properties and their relations
xmlns:bot="http://www.tdwg.org/xml-schema/2000/biology/botany"
    vocabulary for botany, excluding concepts belonging to general biology
xmlns:zoo="http://www.tdwg.org/xml-schema/2000/biology/zoology"
    vocabulary for zoology, excluding concepts belonging to general biology
xmlns:mic="http://www.tdwg.org/xml-schema/2000/biology/microbiology"
    vocabulary for microbiology, excluding concepts belonging to general biology
xmlns:geo="http://www.tdwg.org/xml-schema/2000/biology/geography"
    vocabulary for the geographical distribution of taxa

Once the "abstract machinery" has been settled, it will not be difficult to build the other vocabularies, since the concepts are already well defined. Each term simply needs to be classified as one of:

  • Organ, or
  • Property, or
  • Property value.

Using standard XML mechanisms, it is no problem if, say, bryologists want to have their own Namespace and vocabulary, because some terms have a different meaning in bryology than in flowering plants.
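
The classification of terms described above can be sketched in code. The example below is only an illustration: the names `TermKind` and `Vocabulary`, and the choice to key entries by a namespace prefix plus the term, are assumptions for this sketch and not part of the proposal. Keying by prefix is what lets the same word be classified differently in two vocabularies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// The three kinds a vocabulary term can be classified as.
enum class TermKind { Organ, Property, PropertyValue };

// Hypothetical container: terms are keyed by (namespace prefix, term), so the
// same word may be classified independently in two namespaces.
class Vocabulary {
 public:
  // Registers a term. Returns false, leaving the existing entry unchanged,
  // if the term is already classified in that namespace.
  bool classify(const std::string& prefix, const std::string& term,
                TermKind kind) {
    assert(!prefix.empty() && !term.empty());
    return kinds_.emplace(std::make_pair(prefix, term), kind).second;
  }

  // Looks a term up. Returns nullptr when it is unknown in that namespace.
  const TermKind* find(const std::string& prefix,
                       const std::string& term) const {
    auto it = kinds_.find(std::make_pair(prefix, term));
    return it == kinds_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, TermKind> kinds_;
};
```

A term registered under one prefix is not visible under another, so a specialist vocabulary can reuse a word without colliding with the general one.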

XML Examples

    <bio:descriptions
       xmlns:bio="http://www.tdwg.org/rdf-schema/2000/biology/bio#"
      xmlns:tax="http://www.tdwg.org/rdf-schema/2000/biology/taxonomy#"
      xmlns:bot="http://www.tdwg.org/rdf-schema/2000/biology/botany#"
    >

    <rdf:RDF  xml:lang="en"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#" >

    <!-- just to show an example of  locally defined property -->
    <tax:Property rdf:ID="color">
        <rdfs:domain rdf:resource="http://www.tdwg.org/rdf-schema/2000/biology/bio#Organ"/>
        <rdfs:range rdf:resource="#colorValue"/>
       <!-- color is not associated to Organ or a TaxonomicClass through appliesTo and appliesToClasses 
              because color can obviously be applied everywhere.  -->
     </tax:Property>

    <!-- just to show an example of enumerated values  -->
    <rdfs:Class rdf:ID="colorValue"/>
      <colorValue rdf:ID="red"/>
      <colorValue rdf:ID="green"/>

    <!-- just to show an example of  locally defined organ -->
    <rdfs:Class rdf:ID="petal">
       <rdfs:subClassOf
          rdf:resource="http://www.tdwg.org/rdf-schema/2000/biology/bio#Organ" />
       <tax:partOf rdf:resource="http://www.tdwg.org/rdf-schema/2000/biology/botany#flower" />
    </rdfs:Class>

    <!-- character is assumed to be defined as a rdf:Property in http://www.tdwg.org/rdf-schema/2000/biology/taxonomy# 
       (see above xmlns declaration) -->
    <tax:character rdf:ID="petal_color">
       <!-- reference to  locally defined resources -->
       <bio:organ rdf:resource="#petal"/>
      <tax:property rdf:resource="#color"/>
    </tax:character>

    <!-- species is assumed to be defined as a tax:TaxonomicClass in http://www.tdwg.org/rdf-schema/2000/biology/taxonomy#  -->
    <tax:species rdf:ID="S1">
      <name>domestica</name>
      <genus>Prunus</genus>
      <petal_color  rdf:resource="#red" />
    </tax:species>

    </rdf:RDF>

    </bio:descriptions>

Transform between structures and file formats

  • Transform from a structure with nested Features to a structure with a separate list of characters (matrix)
    Algorithm:
  • Transform from a standard Flora to a structure with nested Features
    The general structure is the same, but the Property (e.g. color) does not appear as such; all we have is value(s) like "dark red or reddish purple". However, the Organ is available (see syntactic processing of floristic texts).
  • Transform from Delta to a structure with a separate list of characters
    Both Organ and Property have to be guessed from the label of each character. However, the data type is present.
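
The first transformation listed above, from nested Features to a separate list of characters, could look roughly like the sketch below. The text does not spell out the algorithm, so this is one possible reading rather than the intended one: a character is taken to be an (organ, property) pair, and all type and function names (`Taxon`, `CharacterMatrix`, `toMatrix`) are assumptions for this example. For brevity it flattens only one level of organs and does not recurse into parts.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Nested form: each taxon carries organs, each organ carries features.
struct Feature { std::string property; std::set<std::string> values; };
struct Organ { std::string name; std::vector<Feature> features; };
struct Taxon { std::string name; std::vector<Organ> organs; };

// Matrix form: a character is assumed to be an (organ, property) pair.
using Character = std::pair<std::string, std::string>;

struct CharacterMatrix {
  // Column headers, in order of first appearance.
  std::vector<Character> characters;
  // One row per taxon name; a missing column index means "not recorded".
  std::map<std::string, std::map<std::size_t, std::set<std::string>>> rows;
};

CharacterMatrix toMatrix(const std::vector<Taxon>& taxa) {
  CharacterMatrix matrix;
  std::map<Character, std::size_t> columnOf;  // character -> column index
  for (const Taxon& taxon : taxa) {
    auto& row = matrix.rows[taxon.name];
    for (const Organ& organ : taxon.organs) {
      for (const Feature& feature : organ.features) {
        Character key(organ.name, feature.property);
        auto found = columnOf.find(key);
        if (found == columnOf.end()) {
          // First time this character is seen: give it a new column.
          found = columnOf.emplace(key, matrix.characters.size()).first;
          matrix.characters.push_back(key);
        }
        // Merge the feature's values into the taxon's cell for that column.
        row[found->second].insert(feature.values.begin(),
                                  feature.values.end());
      }
    }
  }
  return matrix;
}
```

The reverse direction would group columns by organ again; no information is lost either way as long as every character can be split back into its organ and its property.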

References