Wednesday, 1 May 2013

ANPS WW data and the ALA

Welcome to the world of acronyms!  For those not familiar with the things I do, the subject of this post refers to the Australian Native Plants Society Canberra Region (ANPS) - particularly the Wednesday Walkers (WW) of that Society and the Atlas of Living Australia (ALA).

Following a suggestion by the President of ANPS, and based on my experience in putting COG GBS data onto the Atlas, I have created a database into which I am copying the plant lists from the Wednesday Walks, and I am uploading the results to the ALA.  The purpose of this post is to serve as a progress report to the WW so that folk know what is going on.  {With computing systems it is usually a good idea to let people know what is going on during development rather than surprising them with a fait accompli!}

There are several objectives to this process:
  1. It should ensure that the data collected by the WWs is not lost if one or two computers go bung;
  2. The data becomes available to a wider audience (all users of ALA) than when the data is only listed on the ANPS website; and
  3. As the database gets updated the data can be used for cross classifications and other analyses.
I'll try to address how those objectives are being (or can be) satisfied a little further on.

The basic record (there are 4843 of these in the database) is of a plant taxon observed at a site in a nominated period of time.  A few definitions might be useful here:
  • Taxon: In terms of submission to ALA the taxon may be a species or a subspecies, while the database also holds records only identified to the genus level.  (We concluded that if we weren't sure about the identification to species level we shouldn't report it to ALA.)  The database currently includes 787 taxa.  Of these:
    • 614 are the usual Genus species bipartite names
    • 50 have an appended subspecies or varietal name; and 
    • 123 are only identified with confidence to the genus level.  This includes both situations where only the Genus is known (ie names of the form Genus sp.) and situations where the species is suspected but with reduced certainty (ie names of the form Genus sp. species - this is easier to analyse than the form Genus ? species used on the ANPS WW website.)  There are 325 records of these types.
    • It is possibly of interest that the 5 commonest families - in the sample of data entered thus far - are Asteraceae, Poaceae, Orchidaceae, Myrtaceae and Fabaceae.  This is likely to be highly biased by the sites processed thus far.  In total I have entered at least one taxon for 102 families (including some non-flowering plants).
  • Site:  This is usually the walk that is taken, georeferenced to the point we start walking.  Where we do multiple stops on a car crawl, the separate sites are listed individually.  The record thus far is a trip to Tallaganda which generated 5 sites.  The image below shows the location of the most recently uploaded set of sites.
  • Period of time:  When the walks started the ALA was not even a concept and databases needed a computer the size of a small house to run.  So things weren't quite set up in an ideal way for this project.  (They did, of course, do what was needed in terms of giving walkers a list of plants seen in the area.)  A key aspect of this was that the walks were recorded by updating the list from the previous visit to a site, and it isn't possible to unravel this.  Fortunately the ALA is able to accept a record spanning several years.  (Atlassing is after all a spatial, rather than temporal, process.)  Since mid 2012 separate records have been kept for each outing.  Some comments about the records entered to date:
    • Including the old one-off visits, 43 of 57 visit records are single day records;
    • Of the 14 multi-day records the longest duration covered by a composite is 4095 days (about 11 years) for Brooks Hill Reserve near Bungendore.
    • Note that where something dramatic occurred to change the vegetation - eg the bush fires in the Tinderries - a new record was commenced.
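The three kinds of taxon name described above could be sorted mechanically along the following lines.  This is only a rough sketch in Python, not the process used in the ACCESS database: the function names are hypothetical, and the rank markers "subsp." and "var." are an assumption about how the appended subspecies or varietal names are written.

```python
def classify_taxon(name: str) -> str:
    """Sort a taxon name into one of the name forms described in the post."""
    parts = name.split()
    # "Genus sp." and "Genus sp. species" are both genus-level identifications.
    if len(parts) >= 2 and parts[1] == "sp.":
        return "genus_only"
    # Assumed convention: a rank marker precedes the subspecies or variety.
    if len(parts) >= 4 and parts[2] in {"subsp.", "ssp.", "var."}:
        return "infraspecific"
    # The usual two-part Genus species name.
    if len(parts) == 2:
        return "species"
    return "unrecognised"


def reportable_to_ala(name: str) -> bool:
    """Only names identified to species level or finer are reported to ALA."""
    return classify_taxon(name) in {"species", "infraspecific"}
```

For example, `reportable_to_ala("Damasonium minus")` is true, while a name of the form "Genus sp. species" is kept in the database but held back from the upload.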
Checking details is an interesting process.  A list of 'good' species is being compiled as I go along.  When loading a new visit the first step is to identify all species not currently in the Master species list (fortunately ACCESS has a predefined process for this). With 57 visits included I am usually finding only 1 or 2 additional species per visit unless a new habitat type is loaded.
  • When a visit to Mount Ginini (elevation 1762m) was entered 26 species were added to the species list which up to that point had related mainly to elevations in the 600 - 900m range.
  • Similarly a visit to the sandstone country at Touga Rd in Morton National Park added approximately 40 species to the list generated from the granite and shale of the Monaro.
Any mismatches are then checked against Plantnet to confirm that there are no typos etc.  I have discussed the naming of plants, and how this is set up to guarantee typos, elsewhere.
  • An issue with Plantnet is that it follows the policies of the Royal Botanic Gardens and where the taxonomists are declaring war on each other (boys will be boys, even if some of them are female) it may be that the names we have accepted do not generate a match.  This is particularly the case with orchids, but fortunately we have - he said with great optimism - an authoritative list of good names for orchid species.
  • I do also glance at the distribution maps in Plantnet to make sure the species isn't restricted to (eg) sandhills around Silverton but basically accept the identifications by the experts in the WW crew.  In some cases it seems that the records we have developed are range extensions - that is a good thing.  By way of example this image shows the Plantnet distribution for Damasonium minus with two red dots added to indicate (roughly) sites where the WWs have found it.
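The first step when loading a visit (finding names that are not yet in the Master species list) amounts to a set difference.  The sketch below shows the idea in Python rather than ACCESS; the function name and the sample lists are made up for illustration, and ignoring case and stray spaces is my assumption about what should count as a match.

```python
def new_species(visit_list, master_list):
    """Return the names in visit_list that are absent from master_list."""
    # Compare case-insensitively and ignore leading or trailing spaces.
    known = {name.strip().lower() for name in master_list}
    unmatched = {name.strip() for name in visit_list
                 if name.strip().lower() not in known}
    return sorted(unmatched)


# Illustrative data only: the second visit name contains a deliberate typo.
master = ["Damasonium minus", "Acacia dealbata"]
visit = ["Acacia dealbata", "Acacia deelbata", "damasonium minus "]
print(new_species(visit, master))
```

Note that a typo turns up in exactly the same way as a genuinely new species, which is why every unmatched name still needs to be looked up by hand.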
The database design is helpful in maintaining quality by a process of redundancy.  Species names appear in 3 tables and these can be readily compared to ensure that all is balanced.  Similarly Site names appear in another 3 tables which can be checked against each other.  
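A balance check of this sort can be sketched as follows.  This is an illustration in Python, not the queries actually used, and the table names in the example are hypothetical since the real ones are not given here.

```python
def names_out_of_balance(tables):
    """For each table, list the names found in some other table but not in it.

    `tables` maps a table name to the collection of names it holds.
    An empty result means all tables agree.
    """
    all_names = set().union(*tables.values())
    report = {}
    for table, names in tables.items():
        missing = all_names - set(names)
        if missing:
            report[table] = sorted(missing)
    return report


# Hypothetical table names and contents, for illustration only.
tables = {
    "master_species": ["Acacia dealbata", "Damasonium minus"],
    "observations": ["Acacia dealbata", "Damasonium minus"],
    "ala_upload": ["Acacia dealbata"],
}
print(names_out_of_balance(tables))
```

The same function would serve for the site names, given the three tables that hold them.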

Possibly the most difficult variable to check is the geocoordinates.  Fortunately the process for loading data to ALA includes a sandbox facility with mapping functions, so that when a minus is inadvertently added to a longitude giving a site in the location indicated by the Eastern dot (about 1700km SE of the Cook Islands) ...
... it can be fixed.  That's a bit far out of the Canberra region even for a Field Trip!
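A sign error like that one could also be caught before the sandbox stage with a simple bounding-box test.  The sketch below is in Python; the box limits are rough values I have assumed for the wider Canberra region (not an official boundary) and the function names are hypothetical.

```python
# Assumed rough limits in decimal degrees; adjust to suit the sites visited.
REGION = {"lat": (-38.0, -33.0), "lon": (147.0, 152.0)}


def in_region(lat, lon, box=REGION):
    """True if the point lies inside the bounding box."""
    return (box["lat"][0] <= lat <= box["lat"][1]
            and box["lon"][0] <= lon <= box["lon"][1])


def check_site(lat, lon):
    """Classify a coordinate pair, hinting at a likely sign error."""
    if in_region(lat, lon):
        return "ok"
    if in_region(lat, -lon):
        return "longitude sign flipped?"
    if in_region(-lat, lon):
        return "latitude sign flipped?"
    return "outside region"
```

With these limits, `check_site(-35.3, -149.1)` reports a probable flipped longitude, which is the Pacific Ocean case described above.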

I have also run the dataset by another pair of eyes before submitting to the Atlas (which resulted in a few nuances being remedied).

So how are we going against the three objectives?
  1. We have about (it is difficult to assess the denominator) 25% of the walks now in a widely recognised public domain, which is a good step towards objectives 1 and 2.
  2. I have used the database in a rudimentary fashion to compile an article to be published in the ANPS (Canberra Region) Journal currently in press.  A first step along the winding path towards objective 3.

Much more data needs to be loaded before complex analyses can be undertaken.  I hope to have completed the data entry to the database by the end of this year.
