A longer second look at the 2016 Census
Reader advisory!
This post is mainly about pretty technical aspects of the Census.
I think it is however interesting background to the more overtly
interesting stuff about the nature of folk in the area.
Summary
This post is mainly about the quality of the data in the 2016 Census.
Some issues are raised suggesting that there are a few problems with the
data in our small area, particularly for number of dwellings and detailed
characteristics of the population, but they are probably not such that the
results are unusable. It would however be useful to consider the impact
of these findings on more detailed analysis and particularly any comparison
with 2011 Census results.
Background
A couple of weeks ago the ABS released the first batch of 'real'
data from the 2016 Census. (I am very cynical about the stuff they
released earlier in the form of some strange averages. That was for
public relations purposes and not that useful for any analysis.) The data
was released through standard profiles and very helpfully went to quite small
areas.
At that time I posted the outcome of
my analysis, for the State Suburb of Carwoola, of the age distribution, person
counts and dwelling counts. Following some later work my conclusion is that the results are believable, although I suspect that some dwellings may have been missed. Of course, if there is an issue with
the number of dwellings this implies that there are consequent issues with the
people in them!
Last week the ABS released the first tranche of data for Table Builder, an on-line system
that allows users to download information to fit their own table designs rather
than the standard profile tables. I rate this as a fantastic product
since, in addition to the added flexibility it offers, I hate having to
download umpteen standard tables when all I need is a simple tabulation of a
couple of variables.
There is however an additional benefit in that Table Builder includes
some information not included in the Profiles. I am particularly pleased
to see that information appears, for 2016 only, about the imputation of age
(and a few other key variables). Here is the basic 2016 Table Builder
menu showing the Imputation Flag fields.
Very few people don't give an age when they complete the form so in
effect this indicates person non-response: the collector has identified an
Occupied Dwelling but a completed form has not been received. Although I
didn't complete the form on-line (apart from the chaos of the Census night, I'm
not sure that was an option for people in caravan parks) I would assume that
the on-line data entry system would have been intelligent enough not to allow
the form to be submitted without an age. This should further reduce the
(already limited) scope for imputation being required, except for form non-response.
Let's move on to some results.
Results
The first chart shows the age-imputation rates (ie number of records for
which age was imputed as a percentage of the total number of records) for a
hierarchy of areas (the "selected suburbs" are explained below).
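The rate defined above is a simple ratio. A minimal sketch of the calculation follows; the function name and the example counts are my own illustrative assumptions, not figures taken from Table Builder.

```python
def imputation_rate(imputed_records: int, total_records: int) -> float:
    """Return imputed records as a percentage of all person records."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= imputed_records <= total_records:
        raise ValueError("imputed_records must lie between 0 and total_records")
    return 100.0 * imputed_records / total_records


# Hypothetical counts for a small area (NOT real Census data).
example_rate = imputation_rate(imputed_records=60, total_records=1000)
print(f"{example_rate:.2f}%")
```

The same function applies at every level of the hierarchy, provided the imputed count and the total count come from the same area.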
It is interesting that NSW performs slightly better than Australia as a
whole but I will pass over that for now. My set of selected
rural-residential State Suburbs perform a little worse than Queanbeyan-Palerang
LGA as a whole.
It is interesting that if NSW is split into Sydney and the rest the
former has an imputation rate of 5.02% while the latter is at 5.89%. For Victoria the contrast is even more
evident: Melbourne 4.87%, Rest of the State 6.13%. A more rigorous split into urban/rural is not
possible until the full set of information is released later.
The next chart shows the individual State Suburbs of interest to me.
The three components of the Gazette catchment area are shown first, then
Captains Flat (closely linked to the Gazette area) and finally the two more
northerly Suburbs.
At first glance the folk of Hoskinstown are due to visit the Naughty
Corner, while the denizens of Primrose Valley/Urila and Bywong get a large
bouquet.
Discussion
It is important to realise that data is imputed where:
- the Collector assesses that:
  - a dwelling is on a property; and
  - it was occupied on Census Night; and
- a form was not received for that dwelling.
If the Collector doesn't realise that a dwelling exists, or if the
Collector realises that a dwelling exists but considers that it was unoccupied
on Census Night, then data will not be imputed for that dwelling.
This causes particular difficulties in cases such as the
Widgiewa/Whiskers Creek Rds where it seems that the Collector didn't visit
houses but simply left the material in letterboxes. For example:
1. if there is a dwelling but no letterbox no dwelling record will exist.
   (That was the case for our place on Census Night. I am told that a number
   of other houses don't have a letterbox as residents use PO Boxes in town
   close to their work.)
2. if there is a letterbox a dwelling record will be created even if there
   isn't a dwelling and
   1. if the property owner visits the area on (eg) the weekend and removes
      the census material from the box it will probably be recorded as an
      occupied dwelling (and thus records imputed) but
   2. if the census material is not taken away from the box it will probably
      be recorded as an unoccupied dwelling (and person records not imputed)
      even if the dwelling is occupied but the occupier can't be bothered
      cleaning the crud out of the letterbox (because the useful stuff goes
      to their urban PO Box)
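The imputation rule and the letterbox scenarios above can be sketched as a small decision function. This is only my reading of how a letterbox-only drop-off would play out, not a description of the ABS processing system; the function names, parameters and outcome labels are all hypothetical.

```python
def should_impute(dwelling_identified: bool,
                  assessed_occupied: bool,
                  form_received: bool) -> bool:
    """Impute only for an identified, assessed-occupied dwelling with no form."""
    return dwelling_identified and assessed_occupied and not form_received


def letterbox_outcome(has_letterbox: bool,
                      material_removed: bool,
                      form_received: bool = False) -> str:
    """Likely record for a property when the Collector only sees the letterbox.

    Assumption: the Collector treats a letterbox as evidence of a dwelling,
    and removal of the census material as evidence of occupancy.
    """
    dwelling_identified = has_letterbox
    assessed_occupied = has_letterbox and material_removed
    if not dwelling_identified:
        return "no dwelling record"
    if form_received:
        return "form received"
    if should_impute(dwelling_identified, assessed_occupied, form_received):
        return "occupied, persons imputed"
    return "unoccupied, no persons imputed"


for letterbox, removed in [(False, False), (True, True), (True, False)]:
    print(letterbox, removed, "->", letterbox_outcome(letterbox, removed))
```

Note that whether a dwelling really exists, or was really occupied, appears nowhere in the inputs: under these assumptions the outcome depends only on what can be seen at the letterbox.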
The importance of imputation is made explicit in the report of the Census Independent Assurance Panel (CIAP) where it is shown (Table 3.2.2) that
the final under-enumeration rate for the Census is 1.0% being a balance between
a 4.3% gross undercount
of people on Census forms, a 1.3% gross overcount of people on Census forms and a net overcount of 2.1% of
persons imputed (there is obviously a rounding effect in that sum).
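The rounding effect can be seen by redoing the sum with the published one-decimal rates. A minimal sketch, using only the percentages quoted above (the variable names are mine):

```python
# Rates as quoted from CIAP Table 3.2.2, each already rounded to one decimal.
gross_undercount = 4.3        # people missed on Census forms (%)
gross_overcount = 1.3         # people counted in error on Census forms (%)
imputed_net_overcount = 2.1   # net overcount of imputed persons (%)
published_net_undercount = 1.0

# Net under-enumeration = undercount less both sources of overcount.
net_undercount = gross_undercount - gross_overcount - imputed_net_overcount
gap = published_net_undercount - net_undercount

print(f"net undercount from rounded rates: {net_undercount:.1f}%")
print(f"gap to published figure: {gap:.1f} percentage points")
```

The rounded components give 0.9% rather than the published 1.0%, which is consistent with each component having been rounded separately.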
Back in the day (1996, 2001) when I worked on the Census the results of
the PES gave something like 1.6% gross undercount and 0.1% gross overcount.
We used to contrast this with the USA (who ran a Census in those days)
and had something of the order of 8.5% undercount and 6.9% overcount but still
claimed a net underenumeration rate of 1.6%. Obviously Australia has a
way to go to get to the US situation, but it's all downhill.
Table 3.2.2 also shows raw numbers as well as rates. This shows
that the net overcount of imputed
(aka invented) people was 490,174. Now the number of
imputed age records for Australia given in Table Builder is 1,287,265.
Comparing those two values shows that 38% of imputed records were in
error. (A sensation-seeking journalist would add "an amazing"
before the 38%!) What is the problem?
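For anyone wishing to check the 38%, it is the ratio of the two counts just quoted. A minimal sketch (variable names are mine; the comparison assumes, as I do above, that the number of imputed age records is a fair proxy for the number of imputed persons):

```python
net_overcount_imputed = 490_174    # CIAP Table 3.2.2, net overcount of imputed persons
imputed_age_records = 1_287_265    # Table Builder, records with age imputed (Australia)

# Share of imputed records that the net overcount represents.
share_in_error = 100.0 * net_overcount_imputed / imputed_age_records
print(f"{share_in_error:.1f}% of imputed records")
```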
Again the CIAP Report is very helpful. In the
text of section 3.2.2 they give 4 situations which explain the over-imputation:
1. non-responding private dwellings were incorrectly deemed to be occupied on
   Census night; or
2. too many people were imputed into a (non-reporting) private dwelling that
   was correctly deemed to be occupied (the report notes this to be a small
   contribution);
3. People were incorrectly imputed into non-private dwellings on Census
   night, due to either:
   1. an overestimate of the Census night occupancy of non-private
      dwellings, or
   2. because people were counted a second time on a form at their private
      dwelling residence.
Noting the views of
the CIAP regarding case 2, and noting that there are no significant non-private
dwellings in the Stoney Creek area (as far as I am aware) we are left with case
1 as being a possible cause of over-imputation in this area. This would
fit well with the idea of weekenders, where the property was actually
unoccupied on Census night but the form was removed from the letterbox when the
owner visited on the weekend and thus person records were incorrectly
imputed.
However, as far as person counts go, this may simply balance out cases in which the
collector didn’t identify dwellings that were occupied (and in which the
occupants gave up trying to get a form or log-in credentials through the help
line). Thus:
- the number of persons may be not too bad an estimate but
- a higher than expected number of records may have “not stated” for detailed characteristics which are not imputed.
At a more general level the observation that Capital Cities appear to
have a lower imputation rate than the rest of their States is intriguing. In terms of the reasons for over-imputation
offered by CIAP I would have thought that non-private dwellings were generally
more evident in the big cities than the rest of the State (and this will be
checked). It would thus seem that the
impact of incorrectly identifying unoccupied private dwellings as occupied is
largely a rural issue. This suggests to
me that the mail-out, internet-back approach worked well in major urban
centres, but the traditional drop-off, collector follow-up approach has been
less successful in the rural areas.