Saturday, 17 February 2018

Traps - or at least pitfalls - for old players

Reader advisory:  Please note that this is a tale about an error - so read the whole post before trying to emulate this!  As a consequence of my discoveries some of my earlier posts about the 2016 Census have been massaged a little to reflect my new understanding.

When the first tranche of 2016 Census data was released in June 2017 I seized upon the General Profile for Carwoola with glee and did a basic comparison of people and dwellings between 2016 and 2011 Census results.  There was no drama with the people comparison, but the comparison of data from the Profile with information from 2011 Table Builder results suggested a fairly serious undercount of dwellings in 2016.  This is that comparison (it's an image not a table).
I developed various conspiracy theories about how this arose and in February 2018 decided to document the issues with the poor count of dwellings.  As I greatly prefer using Table Builder (TB) to wading through a squillion worksheets of profile data, I created a TB table of the dwellings data to check I was doing the right thing.  A significant difference appeared for the 2016 Census data.

Following an exchange of emails with the ABS contact (thanks Harry for your great service) I have realised where my error(s) lay.  Here is the correct data (again an image):
Rather than a decline in the number of dwellings there is the expected increase.  How did this error arise?

Here is an image of the tables pretty much as they appear in TB and the General Profile G32. (I have removed some 'empty' lines from the Profile to fit the image on to a single page.)
The similarity of the two sets of labels is clearer when the repetitious stuff is removed from the TB version.  I have also emphasised a few crucial words in the Profile version.

First Issue

In many cases the data label in the two sets is identical (eg Separate house; Improvised home, tent, sleepers out) so I believed the data items were the same.  This was an error: in the TB series both those items include unoccupied dwellings, whereas in the Profile the unoccupied dwellings are counted in the separate item "Unoccupied private dwellings".

Second Issue

The second footnote to the Profile table says the data excludes "Other non-classifiable households".  Exactly what was included in that element was difficult to track down but eventually I found the following under Household Composition HHCD in the Glossary component of the Census Dictionary: 
"The 'Other not classifiable' category includes those households which the ABS Field Officer determined were occupied on Census night but where the ABS Field Officer could not make contact; households that contained only persons aged under 15 years; or households which could not be classified elsewhere in this classification because there was insufficient information on the Census form." 
I have emphasised some words there because, when I looked (through TB) at the "Other not classifiable" households, it emerged that they all contained Imputed persons and no other persons.  Thus in the case of Carwoola this item is a measure of Dwelling non-response.

Concordance

Dwellings in Table Builder: 530
Unoccupied dwellings: -41
Occupied dwellings: 489
Visitor only households: -3
Other non-classified households: -27
Profile definition: 459
Profile occupied dwellings: 452
I am relaxed about regarding the residual difference of 7 dwellings as being the cumulative impact of the random perturbations to preserve privacy.
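
For anyone who wants to check my subtraction, the concordance can be sketched in a few lines of Python.  The variable names are my own invention (not ABS terminology) and the figures are simply the Carwoola numbers quoted above, so treat this as an illustration of the arithmetic rather than anything official.

```python
# Carwoola 2016 figures as quoted in the concordance above.
# Variable names are illustrative only, not ABS field names.
tb_all_dwellings = 530          # all dwellings from Table Builder
unoccupied = 41                 # unoccupied dwellings
visitor_only = 3                # visitor only households
other_not_classifiable = 27     # other non-classified households
profile_occupied = 452          # occupied dwellings reported in the Profile

# Step 1: remove unoccupied dwellings from the Table Builder total.
occupied = tb_all_dwellings - unoccupied

# Step 2: remove the households that the Profile table excludes.
profile_definition = occupied - visitor_only - other_not_classifiable

# Step 3: whatever is left over is the unexplained residual.
residual = profile_definition - profile_occupied

print("Occupied dwellings:", occupied)
print("Profile definition:", profile_definition)
print("Residual difference:", residual)
```

The last line gives the 7 dwellings that I am putting down to perturbation.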

Recommendations

I realise that in Census it isn't always possible to satisfy everyone and that many trade-offs have to be made.  In particular, if product A for year 20XX is changed to be compatible with product B for that year, it immediately raises an incompatibility with product B for year 20XX-5.  Further, making the changes suggested below for Table G32 and 2 entries in the Census Dictionary might require other changes to render other tables and classifications consistent in style. (That might not be a bad thing, apart from the workload involved.)

Further, it isn't possible to idiot-proof things (because, as a Canadian friend said, "They keep making bigger idiots." Perhaps I should put my hand up here?).  However, it seems to me that there are a few things that could be done to avoid words having different meanings between platforms.

In writing what follows I am assuming that the Profiles series will continue into the future, as it is used by many folk, even though TB is IMHO far more useful for anyone with on-line access.
  • I suggest that the profile table G32 be adjusted to include an extra column for "All private dwellings" which includes occupied private dwellings, unoccupied private dwellings and other non-classifiable households.  This should thus be comparable with the TB results for STRD.  Retaining the column as currently designed as well will allow consistency between years and a direct relationship between the number of occupied dwellings and the people therein.
  • It would be helpful for Census Dictionary classifications to be supplemented as below:
    • STRD have additional text added to indicate that "not stated includes households which the ABS Field Officer determined were occupied on Census night but where the ABS Field Officer could not make contact" - ie the text from HHCD - and possibly a second item to indicate that it includes dwellings where the person who filled in the form on-line did not indicate the structure of the dwelling.
    • HHCD have additional text summarising the explanation of "not classifiable" currently in the Glossary.
