Tuesday, 16 August 2016

Wash-up part 2: the 2016 Census

My basic view of the 2016 Census is to thank goodness I am no longer working on the project.  I really have no view yet on what the actual wash-up will be, in terms of the impact on the data collected.  It does however seem clear that descriptions of it as the worst Australian Census ever are not too far off the mark.

The two big ticket items which have been covered in the media have been:
  • The retention of names and addresses; and 
  • The failure of the on-line system on Census night.
I will comment a bit on both of them below, but there are also some other matters that support the  view that “Those who fail to learn from history are doomed to repeat it”.   It is worth reading the quote from Churchill in the linked page for an expansion on that idea.

Retention of names and addresses

Probably the best review of the issues under this topic is the fairly lengthy paper by Bill McLennan (former Australian Statistician).  It is a little surprising that the paper appears on the website of The Australian Privacy Foundation, but it is certainly within their area of interest.

My only addition is to marvel at a comment by one of the senior ABS folk to the effect that "We have always collected names and addresses in the Census."  This is disingenuous at best.  Names have indeed always been included on the paper form (it helps some parts of processing the form) but they have never - to my knowledge - been entered in the computer record.  That certainly applied up to and including the 2001 Census.

A change in 2001 was that people were able to opt to have the computer images of the form, including names and addresses, retained for 100 years to satisfy genealogists.  (That change came against the strong advice of Mr McLennan, the then Statistician, because of his fears that it would cause a loss of confidence in the Census and the ABS.)  However the names and addresses were never part of the ABS data holding.  I believe that to also have been the case for 2006 and 2011.

The failure of the on-line system on Census night

Again there has been one outstanding review of this matter.  

The arithmetic in that article about the number of submissions that could be expected is rather good.  It is particularly interesting that the number of simultaneous accesses expected by the ABS was 1,000,000: that is the number of attempts to call the telephone hotline in 1996 (on my watch) when that couldn't handle demand!  So there is history, suggesting that an extra zero should have been added to their trials.  

There seems to be some muttering that IBM failed in this regard, but that is poor thinking.  The specification of what the system should handle would come from the contract issuer (ABS) not the contract fulfiller (IBM).  Unless what was specified was a completely daft number such as 1,000, IBM would be within their rights to say "it could handle what it was required to handle and the fact that this was woefully understated is problem not bilong mi."

If the problem was in fact solely a denial of service (DoS) attack the preceding point is irrelevant, but it seems interesting that the "nearly successful" attack coincided with a point in time at which the peak load of legitimate demand was just beginning.  In fact the ABS website notes the coincidence of the 4th DoS attack with 3 other events:

  • A fourth denial of service attempt
  • A large increase in traffic to the website with thousands of Australians logging on to complete their Census
  • A hardware failure when a router became overloaded
  • Occurrence of a false positive, which is essentially a false alarm in some of the system monitoring information.

I have emphasised the second point in that list.

The article talks about "Big Bang Deployments" as a risk but that doesn't particularly excite me.  All Censuses are Big Bangs as they are pretty much unique and have to work first time.  Despite this the Census is one of the few statistical exercises in which major developments are introduced each cycle.  That is what makes working on them so exciting, and why testing is very important.  Examples of changes introduced in the Australian Censuses I worked on include:
  • 1996:  Computerised mapping processes.  The actual work on developing this started about 1993 and it had to be complete, with about 35,000 maps produced and distributed to every part of Australia, by June 1996.  A particular challenge was that the data required for the maps was held in 9 different systems by the 6 States, 2 Territories and the Commonwealth.  They all had to be integrated to form a single huge dataset.  It happened.
  • 1996:  Computerised field management system.  This is how the field staff were recruited and monitored through the event.  It had to be working by about June 1996 (and it just about was).
  • 2001:  Optical Character Recognition as the basis of data entry.  The dates of release of information were announced 2 years before the Census.  If data entry failed - or even went significantly slower than expected - the data release schedule would have failed too.  The release dates were all met.

Other matters

Indigenous origin

As we were in a caravan park on Census night we completed a hard copy form.  I was very surprised in doing that to find that the question about indigenous origin had changed position from the top of page 4.  The last time that happened the growth in the number of folk identifying as indigenous stopped, causing great alarm when it resumed in the following census.  Another example of "deja vu all over again" is in the offing.

Post Enumeration Survey 

One of the crucial deadlines of the Census used to be to get all the material out of the field by early September to avoid conflict with the Post-Enumeration Survey (PES).  This is the essential measure of coverage of the Census.  My memory is that it was usually undertaken early in the September after the Census.  By way of example the PES in 2011 ran from 4 September to 3 October.  (Obviously the greater the time between the Census and the PES the harder it is to align the results.)  As a result of the chaos on Census night the deadline for completion of Census forms is now 23 September.  I suggest that this is later than used to be the case in an all hard copy Census.  There seem to be two alternatives:
  • Folk will be able to submit forms while the PES is in the field; or
  • The PES will be later than in the past with consequent higher mismatch rates.
In either case the utility of the PES as a measure of the Census undercount will be compromised.  This will give some issues in assessing the reliability of the population estimates, used (inter alia) to allocate Commonwealth Electorates between States.  A very serious problem.
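
Why mismatches matter can be sketched with the textbook dual-system (capture-recapture) estimator that underlies post-enumeration surveys in general.  This is only an illustration: the function names and all the figures below are hypothetical, and the ABS's actual PES estimation is more elaborate than this.

```python
def dual_system_estimate(census_count, pes_count, matched):
    """Textbook Lincoln-Petersen estimate of the true population.

    census_count: people counted by the Census in the PES sample areas
    pes_count:    people found independently by the PES in those areas
    matched:      PES people who could be linked to a Census record
    """
    if matched <= 0:
        raise ValueError("need at least one matched record")
    return census_count * pes_count / matched


def net_undercount_rate(census_count, pes_count, matched):
    """Share of the estimated true population missed by the Census."""
    estimate = dual_system_estimate(census_count, pes_count, matched)
    return 1 - census_count / estimate


# Hypothetical figures, for illustration only.
census_count = 9_500
pes_count = 1_000

# Case 1: the PES follows the Census closely and 950 people are linked.
prompt_rate = net_undercount_rate(census_count, pes_count, matched=950)

# Case 2: the PES is late; 20 people who WERE counted can no longer be
# linked (they have moved, say), so only 930 matches are found.
late_rate = net_undercount_rate(census_count, pes_count, matched=930)

print(f"undercount with prompt PES: {prompt_rate:.1%}")
print(f"undercount with late PES:   {late_rate:.1%}")
```

In this made-up case the Census itself is unchanged, yet losing 20 genuine matches moves the apparent undercount from about 5% to about 7%.  That is the sense in which a later PES, with higher mismatch rates, compromises the measure.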

From a question I asked ABS it seems the PES is going to be later but not as late as the end of the form submission period (although that date now seems to be rubbery).  Oh dear, oh dear.

Cost of the Census

An article in the SMH by Peter Martin raises the issue of the cost of the Census.  Although he doesn't raise it, this caused me to reflect on another statement by ABS in which they talked about having about 40,000 field staff.  Allowing for population growth that is about what used to be the case for collector delivery and pickup.  From my view the whole benefit of a mail out/online back approach was to reduce the number of field staff, and thus their cost (which from memory used to be about 50% of the total cost of the Census).  Certainly, when we looked at introducing a mail-back census for 2001 the modelling suggested that it would significantly reduce costs.  As would a successful on-line operation.  (In the event, in my exercise, the ABS decided the risk of mail-back not achieving savings outweighed the possible benefits if our luck held and expectations were met.)

If they have the same number of field staff plus (for example) all the extra costs of mail out (about 5m letters at $?? each) and the online system (cited as $9.5m paid to IBM alone), no wonder the budget was in trouble.

I suspect that one of the underlying issues was that it was not realised that managing the finances of the Census is a key aspect of the project.  The statistical side is pretty easy: no mucking about with fancy sampling models, just go and count everyone.  In contrast, working out the budget is very demanding: throughput rates have to be estimated, staff payment rates negotiated with Unions, and a very large number of relatively low-paid staff recruited, all while the initial costs are calculated 5 years before the event.  It also requires solid support for the program from senior staff in ABS when the inevitable brawls with external bean-counters emerge.

The views of Gruen

ABC (Australian manifestation) has a hugely popular programme about advertising in which 4 PR/advertising people talk about recent campaigns.  This week (I think 16 August) they paid some attention to the Census and basically tipped a large bucket of poop on the public communications campaign.  It was hilarious, clever and IMHO correct.  Basically what they had to say about what should have been done was exactly what we did (on the advice of our PR people and our contractors).

An interesting sideline of Gruen was the mention of the ad campaign in Queensland being 'different' to the rest of the country.  This seems to have been the case: all the ads we saw featured The Count - we saw none of the boring standard ones!

Of great concern is that two very clever friends - one with a Census background, the other without such experience - both got a big take away message from Gruen that the Census isn't really needed now.  They also recognised the thrust of the program that the current exercise was on the nose, but stressed the "not needed" point as a key message.  This makes me think that the days of the Census are numbered (probably without a PES to measure the undercount).

It should be noted that Gruen had another serve of the Census the following week: for comedians it seems to be the gift that keeps on giving.

Follow up

I don't know about the rest of Australia but the follow-up for us was very ordinary.  Despite us being home all day on 20 August we did not get a visit from the Collector, just a blank form and a pathetic card dropped in our letter box.  I was tempted to complete the form and mail it back, despite the fact that this would cause duplication.  Instead I completed the address and marked the box labelled UO in the Field Staff Only back page.  (I also added a note about where we were on Census night, although I doubt if anyone will read that.)

What is of concern is that because the Collector didn't try to contact us, if I hadn't known how to play the game the place would just be part of the "didn't get a form back" morass rather than being a legitimate "counted elsewhere" record.  I suspect there is going to be a big issue here which I don't think the PES will sort out.

In fact I did meet our Collector on 31 August: he was delivering a reminder card as I took the recycling up.  I explained that the house was unoccupied on Census night and he should mark something up accordingly, which he said he would do.  Again he made no effort to come to the house: whether this is because he was slack or because procedures had changed I have no idea.  Again advice from ABS is that he should have come to the house as before.  I address this matter specifically in a second post.

Some further insight to the continuing cock-up is in this article in the Guardian.   What surprises me is that no-one seems to have thought of the possibility that spammers etc have decided that bogus Census emails are a good way of dumping malware on people's computers.  Thus I predict a future bomb for the ABS when they get accused of sending infected emails to people!

The impact of this fiasco

Currently unknown, and won't be known for quite some time!

An obvious impact is the loss of respect for ABS.  Whatever happens with the ultimate response rate, this mess will have an impact beyond the Census as it will make collecting all data more difficult.  Simply sacking a few folk won't overcome this - nor will some PR platitudes.

A major difficulty facing any review is measuring what has actually gone on in the Census.  This would usually be based upon the PES, but in these circumstances I am unsure about the PES (see above).  How the Census data are built into the population estimates will be very tricky and this affects important issues such as allocation of Commonwealth funds and the distribution of electorates between States. Bill McLennan addresses this issue in the early part of his paper.

Obviously if there is a major response issue (either non-provision of questionnaires or non-response to "sensitive" questions) the data will be less useful as evidence in the enormous range of issues to which the census data is applied by public and private sector planners, analysts and academic researchers.

The final impact comes from a Big Issue which does not seem to have been talked about in the media.  That is the use of the Census in forming the selection framework for population surveys.  In simple terms, selecting a sample requires a list from which the sample can be extracted.  It was always the case that the ABS population surveys started with a list of small areas and the size of their populations.  At a small-area level the Census is the only source of the population size.  If the population sizes in this list are understated - especially in a biased way - the data from the Surveys will also be of reduced quality, so a high proportion of ABS output will be affected until the next Census is run.
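
A toy calculation shows why a biased undercount matters more than a uniform one.  It assumes a deliberately simple scheme in which a survey measures a rate in each small area and the rates are scaled up by the Census-based population sizes.  The area names, rates and populations are all hypothetical, and real ABS survey weighting is considerably more involved.

```python
def scaled_total(rates, populations):
    """Scale each area's surveyed rate by its population size and sum."""
    return sum(rates[area] * populations[area] for area in rates)


# Hypothetical survey result: the rate of some characteristic in each area.
rates = {"area_a": 0.04, "area_b": 0.10}

# Hypothetical true populations, and Census-based sizes in which only
# area_b (the area with the higher rate) is undercounted, by 10%.
true_pop = {"area_a": 6_000, "area_b": 4_000}
census_pop = {"area_a": 6_000, "area_b": 3_600}

true_total = scaled_total(rates, true_pop)
census_total = scaled_total(rates, census_pop)

true_rate = true_total / sum(true_pop.values())
census_rate = census_total / sum(census_pop.values())

print(f"estimated total: true sizes {true_total:.0f}, census sizes {census_total:.0f}")
print(f"overall rate:    true sizes {true_rate:.2%}, census sizes {census_rate:.2%}")
```

With these invented numbers the estimated total falls from 640 to 600 and the overall rate also shifts, even though the survey measured each area's rate perfectly.  Had both areas been undercounted by the same proportion, the total would still be too low but the overall rate would be unchanged.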
