Saturday, 24 June 2017

More indignation about Taxonomy!

Warning: this is probably of "special interest" only.  Lots of semantics and pedantry ahead!

I quite often post in a light-hearted and unbiased way about the efforts of taxonomists to add to our understanding of relationships between members of the carbon-based world.  Actually that is not true: I really disagree with quite a bit of what taxonomists do which seems to be more about advancing their careers by boosting their publication count than actually adding to knowledge.  The archetypal model is:
  • Researcher A publishes a paper combining 6 taxa into 3; 
  • Researcher B then comes along and splits the 3 new taxa into 6 (possibly a completely different set to those existing before A wove their magic);
  • Researcher C then reviews the whole lot and changes everything back to the way it was originally.  Neither A nor B accepts this, so all other researchers select whichever model suits them and war breaks out!
The catalyst for this rant is receiving a table from my friend Ian showing the percentage of each family of birds which he has observed.  He did so knowing this would incite me to attempt to replicate his table for my own, far shorter life list.  How could I  resist?

The initial problem was the volume of typing.  Even with only close to 2,000 species, typing an entry such as "Machaerirhynchidae" would be very prone to error (and also to cramp)!   The obvious solution was to:
  1. find an on-line list of species which also showed the family to which they belonged and 
  2. automatically match this to my life list. 
Ian assisted by saying he used the IOC list for this purpose.  Once I had persuaded Google that I was totally uninterested in the International Olympic Committee (a general statement, unrelated to this exercise) I was able to download a spreadsheet with 33,500 rows and 14 columns which looked like this:

The first issue is why there are 33,500 rows when there are only ~11,000 species of birds.  The main answer is that the IOC list includes subspecies, which are of no interest to me (I have enough difficulty identifying to species)!  Some creative editing should fix that, but how?

As an aside I'm really only interested in two columns containing Family name and Species name but can see that the others are useful to other people.  That is why the EXCEL delete column function exists.

Some points arose indicated by the lurid symbols in this image:
As shown by the green arrow the family name only appears for the first row of the family.  OK, a bit of "Copy Down by dragging" will fix that.  Of more difficulty was the fact, illustrated by the red arrows, that the species name was a row below the genus name.  Why on earth this is the case I have no idea: my assumption is that someone in the IOC IT department hasn't been taking their tablets to the required dosage.  It looked like a complete bugger to overcome this until I spotted the column headed "Species English" and highlighted by a purple ring: again vernacular names become more useful than the Scientific/binomial names.
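For anyone who would rather script the "Copy Down by dragging" step, here is a minimal sketch in Python.  It is not what I actually did (I used EXCEL), and the keys "family" and "species_english" are hypothetical stand-ins for the spreadsheet's columns.  It also assumes that any row with a blank English species name is a family, genus or subspecies row that can be dropped.

```python
def fill_down_family(rows):
    """Carry the most recent non-empty family name down onto later rows."""
    current = ""
    filled = []
    for row in rows:
        if row["family"].strip():
            current = row["family"].strip()
        filled.append({**row, "family": current})
    return filled


def species_only(rows):
    """Keep rows with an English species name (assumed blank on other rows)."""
    return [row for row in rows if row["species_english"].strip()]


# Tiny illustrative sample in the assumed layout, not the real download.
rows = [
    {"family": "Struthionidae", "species_english": ""},   # family header row
    {"family": "", "species_english": "Common Ostrich"},
    {"family": "", "species_english": ""},                # e.g. a subspecies row
    {"family": "", "species_english": "Somali Ostrich"},
]

species = species_only(fill_down_family(rows))
for row in species:
    print(row["species_english"], "|", row["family"])
```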

I added a record number field (so that I could if needed cross refer back to the downloaded information) and then reduced the number of records by deleting all the subspecies records, and the number of columns to the English name and the family name.  Then uploaded this to an ACCESS DB.  I also uploaded my life list, and ran a mismatch query, which gave me 284 mismatches.
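The mismatch query is simply "names in my life list with no exact partner in the IOC table".  A rough Python equivalent is sketched below; the function name and the three-bird sample are illustrative only, and the real work was done in ACCESS.

```python
def find_mismatches(life_list, ioc_family_by_name):
    """Return life-list names that have no exact match in the IOC table."""
    return sorted(name for name in life_list if name not in ioc_family_by_name)


# Illustrative sample: IOC English name -> family.
ioc_family_by_name = {
    "Northern Crested Caracara": "Falconidae",
    "Greater Crested Tern": "Laridae",
    "Lesser Crested Tern": "Laridae",
}
life_list = ["Crested Caracara", "Lesser Crested Tern", "Crested Tern"]

mismatches = find_mismatches(life_list, ioc_family_by_name)
print(mismatches)
```

Only names that match character for character drop out of the list, which is why hyphens and qualifiers cause so much grief.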

My basic approach was to go through the list of mismatches and then identify the relevant name in the IOC list.

  1. In a fair proportion of cases it was possible to do this by picking a key word in the eBird name and searching for that (or, if a hyphen was involved, searching for the name omitting the hyphen).  
  2. If that didn't work I would refer to Avibase, which has a great list of synonyms, and try them.
  3. If nothing looked promising in the English synonyms I would search the IOC spreadsheet using the binomial words.
  4. In one case - so far - I had to go back to eBird to find the binomial they have adopted and search on that.
I can't change the eBird taxonomy (and all I really want to do is get a matching name so as to be able to capture the Family name) so I always change the IOC name.  Then I rerun the mismatch query to check that the match occurs.
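The hyphen part of step 1 is the one piece that could be automated.  The sketch below (my own illustration, not something I ran at the time) compares names on a key that ignores case, spaces, hyphens and apostrophes.  It does nothing for genuine renames or splits, which still need the manual steps.

```python
def name_key(name):
    """Lower-case the name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_ignoring_hyphens(ebird_names, ioc_names):
    """Map each eBird name to the IOC name with the same key, or None."""
    ioc_by_key = {name_key(name): name for name in ioc_names}
    return {name: ioc_by_key.get(name_key(name)) for name in ebird_names}


ebird_names = ["Crested Serpent-Eagle", "Crested Shrike-tit", "Crested Tern"]
ioc_names = ["Crested Serpent Eagle", "Crested Shriketit", "Greater Crested Tern"]

matches = match_ignoring_hyphens(ebird_names, ioc_names)
for ebird, ioc in matches.items():
    print(ebird, "->", ioc)
```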

Here are some samples of the mismatches.
  • Crested Caracara - IOC has split into Northern and Southern. I recorded mine in Mexico, so Northern.  OK: accept split with difficulty.   Delete the word "Northern" from IOC table.
  • Crested Serpent-Eagle IOC doesn't have the hyphen.  Does now.
  • Crested Tit: IOC has European Crested Tit. They also have Grey Crested Tit.  Looking at Avibase their listing for "Crested Tit" offers birds in three genera so it would seem that the eBird listing could benefit from the qualifier but I can't change that.
  • Crested Shrike-tit: IOC has Crested Shriketit.  As it is neither a shrike nor a tit, possibly fair enough.
  • Crested Tern  IOC Greater Crested Tern.  Both have Lesser Crested Tern.  Seems sensible to have qualifier in both, but ......
  • Dark Chanting-Goshawk: IOC has Dark Chanting Goshawk.  Dropping the hyphen would seem reasonable if there were some other Goshawks called Dark, Pale or Eastern.  However they seem to be a closely related group, so having them hyphenated is also fair enough.
  • Dark-brown Honeyeater IOC has Grey-eared Honeyeater.  OK Not a trivial difference.
  • Unusually eBird has Eastern Mountain-Greenbul but IOC only has Mountain Greenbul.
  • eBird has Gould's Sunbird while IOC has Mrs Gould's Sunbird.  Looking at "Whose bird" didn't help resolve which of these is correct!  Both versions are sourced as Vigors 1831.  Of course, go with the sexist eBird.
  • The trickiest one thus far has been the eBird species "Greyish Flycatcher".  This isn't listed in Avibase nor in "Birds of East Africa".  Referring back to eBird the binomial is Bradornis microrhyncha and I was able to find that in the IOC spreadsheet (although the genus has been lumped into Melaenornis).  Why do I hate taxonomists?  The name I am after (to change) is African Grey Flycatcher!
  • Initially I recalled only one case in which eBird had a species that was rated a subspecies by the IOC.  This was the Red-billed Gull of New Zealand, which the IOC rate as a subspecies of the Silver Gull.  I have subsequently found that eBird has split Black-shouldered Kite and Black-winged Stilt into Eurasian and Australian species and recognises both African Swift and Black Swift (while IOC only recognises African Black Swift).
I didn't keep a score of the causes of the differences between the 2 taxonomies. It was sufficiently difficult to keep focused on the job of resolving the mismatches.

My overall impressions were that there were only a few cases in which species had been split by IOC.  Of these several appeared to be latitudinal splits (eg a single species now had two species labelled 'Northern' and 'Southern').  A further situation is adding a continental modifier (eg African XXXX): this seems unavoidable where colonial powers use similar names for different species in the various parts of their hegemony.

The situation reported above for Dark-brown Honeyeater seemed reasonably common: how realistic this renaming was is beyond my ability to say in the absence of the papers supporting the change.

The use of hyphens certainly explained a fair number of the mismatches.  This was generally in the "group name" part of the vernacular name (eg Hanging-Parrot vs Hanging Parrot; Sea-Eagle vs Sea-eagle).  I can't recall any in which the "descriptive" part of the name changed hyphenation (eg White-bellied would never become "White bellied").  To my mind the use of hyphens is sensible where a subset of a wider group are all similar: thus "Whistling-Ducks" are a pretty similar subset of "Ducks".

A couple of groups seemed to be very variable in which one of a pair of alternative names was used.  An example here is eBird making much greater use of Francolin while the IOC preferred Spurfowl.  Again I have no idea which is better, but it is a pain in the backside that the "authorities" can't agree which is what.

The haemorrhoidal reference in the preceding paragraph is really the source of much of my annoyance: it is fair enough that change occurs where real knowledge increases.  (I am not sure how much of the DNA stuff is real knowledge: the practitioners of that dark art are not good at revealing confidence intervals.)  However where people seem to base their ideas purely on their preferences rather than any evidence I get very annoyed.

Overall I have members of 179 of the 238 families.  There are:
  • 17 families in which I have seen all the species (mainly families with only one member);
  • 31 families in which I have more than half (but not all) the species;
  • 131 families in which I have seen at least 1 species but less than half of those available, and 
  • 59 families for which I have seen none.
A further nuance is that IOC include several extinct species (including all 5 in the Hawaiian Family Mohoidae).  So in a few cases getting to 100% of the species listed by IOC is impossible.  Thanks to Ian  for pointing that out.
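For the curious, the four-way tally above could be produced along the following lines.  This is only a sketch: apart from Mohoidae (5 species, none seen) the family names and counts are placeholders, and I have assumed that a family where I have seen exactly half the species goes in the lower bucket.

```python
from collections import Counter


def coverage_bucket(seen, total):
    """Classify one family by the share of its species that has been seen."""
    if seen == 0:
        return "none"
    if seen == total:
        return "all"
    if 2 * seen > total:
        return "more than half"
    return "some, up to half"  # assumption: exactly half lands here


def tally(families):
    """Count families per bucket; families maps name -> (seen, total)."""
    return Counter(coverage_bucket(seen, total) for seen, total in families.values())


# Placeholder data, apart from Mohoidae.
families = {
    "Mohoidae": (0, 5),
    "Family A": (1, 1),
    "Family B": (3, 4),
    "Family C": (2, 40),
}

counts = tally(families)
for bucket, number in counts.items():
    print(bucket, number)
```

The four buckets should always add up to the total number of families, which is a handy check on the arithmetic.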

