Census Imputation by type of area
I was looking back through some of the posts I have written about the 2016 Census and found a comment about being able to analyse imputation rates by the Urban-Rural nature of area once that classification was included in Table Builder. That data has now been released, so here is my analysis. As is often the case, investigating that topic led to a number of other interesting topics, so this post has become rather long but is hopefully still coherent.
By way of background, nearly everyone who completes a form provides an age, so an imputed age is an indicator of non-response. In many cases this is where:
- it is believed the dwelling is occupied, but
- a completed questionnaire has not been received.
Aspects of Imputation
There are two aspects to imputation. The first, and simplest, I will term dwelling imputation: no form was received for an occupied private dwelling, so the number of males and females in the dwelling had to be imputed. In effect, all person records for that dwelling are imputed. The second I will term person imputation. It also includes cases where the number of males and females is known but at least some persons within the dwelling have not provided the basic demographic information (age, marital status and place of usual residence).
My previous posts (here and here) have all covered person imputation, and I will return to that below, but first here are some observations about dwelling imputation.
Dwelling Imputation
A variable of particular interest is whether the dwelling is classified as an "Other non-classifiable household" (ONCH), since this excludes it from the crucial General Profile table G32. This category includes:
- households which the ABS Field Officer determined were occupied on Census night but where the Field Officer could not make contact;
- households that contained only persons aged under 15 years; or
- households which could not be classified elsewhere in this classification because there was insufficient information on the Census form.
The dwelling imputation flag would be set for the first of these subsets. Imputed dwellings seem to make up a very high proportion of the dwellings categorised as ONCH, as shown below.
Area | % of ONCH dwellings imputed |
NSW | 98.3 |
QPRC | 98.9 |
Carwoola | 90.0 |
In the case of Carwoola there are 3 (thus probably only 1, due to rounding) ONCH dwellings which were not imputed. So it would seem fair to say that ONCH is roughly equivalent to a non-responding dwelling.
The % of total dwellings imputed is also interesting.
Area | % of total dwellings imputed |
NSW | 4.41 |
QPRC | 5.21 |
Carwoola | 5.09 |
I shall return to this comparison below.
Person Imputation
My starting point for this element is that most people who fill in a Census record will give their age, as it isn't a sensitive item (unlike, say, income). Thus age imputation could be seen as a reasonable proxy for a person not completing a form. To check this, I created a table for Australia cross-classifying Country of Birth of the Person by Age Imputed. In summary:
- 94% of person records for which age was imputed showed Not Stated for Birthplace;
- 75% of records which showed Not Stated for Birthplace had age imputed; and
- where a Birthplace was stated (at the 1-digit level), age imputation rates ranged from 0.2% to 0.4%.
Later, for reasons discussed under Comparison of Dwelling and Person Imputation rates, I looked at tables comparing imputation of age and sex for NSW, Queanbeyan and Carwoola. The proportions of age-imputed records for which sex was also imputed were 93.3%, 96.1% and 96.3% respectively. This again supports age imputation as a simple proxy for dwelling imputation.
The first chart shows age imputation rates by type of area for New South Wales as a whole.
As might be expected, persons with No usual address have the highest imputation rate. The Rural Balance has a higher rate than any of the more urban types of area, with Major Urban performing best.
I then looked at the Queanbeyan Palerang Regional Council (QPRC) area and found a similar pattern, apart from the Migratory etc. and No usual address categories, which were both nil.
While the ranking of the types of area is similar, the difference between Rural and Urban areas is even more pronounced.
I then investigated which areas were in which type of area; these are commented on below. A summary of the definitions of the types of area is given in the Census Dictionary. A key factor is the definition of an Urban Centre as "a cluster of contiguous SA1s with an aggregate population exceeding 1,000 persons contained within SA1s that are of 'urban character'."
- Major Urban: The urban area of Queanbeyan is part of the Urban Centre of Canberra-Queanbeyan, which, with a combined population of over 100,000, is a Major Urban Area. (I am intrigued as to how the areas can be combined, given the farmland along Canberra and Pialligo Avenues, but for now I will just accept it.) The Major Urban area includes the State Suburbs of Queanbeyan, Queanbeyan East, Queanbeyan West, Greenleigh, Karabar and Jerrabomberra.
- Other Urban: These are Urban Centres that are not Major Urban. In the QPRC area they are the urban parts of the State Suburbs of Braidwood, Bungendore and (very surprisingly, given the merger of Queanbeyan and Canberra noted above) Googong. There is more detail about these areas below.
- Bounded Locality: The only Locality in the QPRC area is Captains Flat (population 450). (Driving through, the village of Nerriga has some aspects of a locality, but its population of 73 is too small, so it is part of the Rural Balance.)
- Rural Balance: The rest of the area, including the entire Stoney Creek Gazette catchment.
The three State Suburbs containing Other Urban areas split between Urban and Rural parts as follows:
Part | Braidwood | Bungendore | Googong |
Urban | 1267 | 3323 | 1522 |
Rural | 380 | 861 | 1163 |
The imputation rates for the six elements shown in that table are "interesting":
Yet again there is a marked difference between the Rural and Urban elements.
Comparison of Dwelling and Person Imputation rates
Unfortunately, Table Builder does not allow dwelling and person variables to be compared in a single table. This table compares the imputation rates for dwellings and persons.
In each case the % of persons for whom age was imputed is higher than the % of dwellings that were imputed. Possibly this reflects the fact that some persons whose age was imputed were in dwellings that did return a questionnaire. However, the ratio of the two rates is far above the proportion of age-imputed persons for whom both age and sex were imputed. This leads me to think that too many people are imputed when a dwelling is not contacted. That conclusion is supported by findings in the CIAP report, which showed over-imputation to be a significant component of gross overcount.
Discussion
The above is simply a statement of fact, in that these are the results of the Census. What follows is more normative, as it is based on the author's inferences and opinions. There are three elements to consider:
- What has happened to cause the difference between Rural and Urban elements?
- How does this affect the "fitness for use" of the data?
- What are the implications for the ABS and users of the data?
What has happened?
I have described my observations of the Collector for the area in which I live in other blog posts, including this one. (As they don't actually collect forms any more, the ABS now calls them Field Officers. However, I am a demon for tradition and will stick with Collectors.)
I also contacted some neighbours (in different Collectors' workloads) to try to get a feel for what happened in a somewhat larger area. It seemed to me that our area was similar to many others, but not all, in that Collectors did not actually visit the house but just left the materials in the letter box.
To some extent this is a continuation of problems evident for rural-residential areas in the 1996 Census (when I was the Director of the Field operation). The Collectors for such areas felt hard done by as they:
- had a large number of dwellings in their workload relative to 'pure' rural areas; but
- had more issues of access (long drives, locked gates, territorial dogs) relative to the urban areas.
However, in 1996 their pay was linked (in part) to the number of forms they physically delivered to their supervisor, so they had a fair incentive to maximise response rates. I don't know the pay system for 2016, but the ABS's ambition was to maximise online response, so the incentive of handing in a hard copy form didn't exist. Thus some rural-residential (and possibly some pure rural) Collectors appear to have overcome the access problems in the second point above by simply dropping the forms off in letter boxes.
In the more urban areas the access problems don't arise, so the Collectors would be more likely to walk the 10 metres to the house and make contact. (There are other issues, such as security buildings, but those may be overcome by the mail-out approach, and it is generally possible to determine which residences are occupied.)
In cases where the Collector has, in effect, enumerated letter boxes rather than dwellings, the outcome depends on which of three circumstances applies (more detail is at this post):
1. An occupied dwelling exists but has no letter box: unless the occupants contact the ABS to get a form (or complete the form somewhere else, which was our situation), neither the dwelling nor its residents are counted.
2. A letter box exists but there isn't a permanently occupied dwelling:
   2.1. If material is removed from the box, an occupied dwelling is incorrectly recorded and people are incorrectly imputed;
   2.2. If material is not removed from the box, an unoccupied dwelling is incorrectly recorded but no people are imputed.
Impact on fitness for use
Carrying these three letter box situations through to the data, the likely outcomes would seem to be:
- an understatement of the number of dwellings, with situation 1 outnumbering situations 2.1 and 2.2 combined; and
- an unduly high imputation rate as a consequence of situation 2.1.
From my knowledge of our area, a dwelling with no letter box is the more likely case, as people have mail boxes in town. Over the collection period it is also more likely that, even where there isn't a dwelling (either occupied or vacant), the owner will clear the crud from the mailbox.
It is also the case that this will distort the comparison between rural and more urban areas. This arises because the quality of the Australia Post address lists in Major Urban Areas was thoroughly assessed when deciding to use those lists as the basis for delivery, so there is unlikely to be a problem in those areas.
I have suggested to my Council that a key aspect of its use of Census data should be to compile estimates of the expected number of dwellings by State Suburb, based on the number of dwellings in 2011 plus development applications for dwellings in the period up to 2016. If this estimate is greater than the number of dwellings recorded in 2016, further investigation is needed.
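The check suggested to the Council is simple arithmetic, so a minimal sketch may help apply it consistently across State Suburbs. The Python below assumes hypothetical per-suburb inputs: the 2011 dwelling count, the number of dwellings in development applications up to 2016, and the 2016 Census dwelling count. The class and function names and the example figures are placeholders for illustration, not Census results or an ABS or Council data format. It also assumes every approved dwelling was built by Census night and none were demolished, which will overstate the expected count in practice.

```python
from dataclasses import dataclass


@dataclass
class SuburbDwellings:
    """Hypothetical per-suburb inputs; not an ABS or Council data format."""
    name: str
    dwellings_2011: int          # dwellings recorded in the 2011 Census
    da_dwellings_2011_2016: int  # dwellings in development applications up to 2016
    dwellings_2016: int          # dwellings recorded in the 2016 Census


def expected_dwellings(s: SuburbDwellings) -> int:
    # Assumes every approved dwelling was built by Census night and none were demolished.
    return s.dwellings_2011 + s.da_dwellings_2011_2016


def needs_investigation(s: SuburbDwellings) -> bool:
    # Flag a suburb when the expected count exceeds what the 2016 Census recorded.
    return expected_dwellings(s) > s.dwellings_2016


def shortfall(s: SuburbDwellings) -> int:
    # Positive values are dwellings the Census may have missed.
    return expected_dwellings(s) - s.dwellings_2016


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    suburbs = [
        SuburbDwellings("Suburb A", 1000, 120, 1080),
        SuburbDwellings("Suburb B", 500, 20, 530),
    ]
    for s in suburbs:
        status = "investigate" if needs_investigation(s) else "ok"
        print(f"{s.name}: expected {expected_dwellings(s)}, "
              f"recorded {s.dwellings_2016}, {status}")
```

A positive shortfall is only a prompt for investigation. Approved but unbuilt dwellings would produce the same signal as dwellings missed by the Census.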
If, as seems likely, the number of people has been over-imputed, this may to some extent counterbalance the understatement of dwellings for applications where only the total number of persons is required. However, any analysis of characteristics other than age and sex will be flawed by the unduly high "not stated" rate. It is also probable that the characteristics of people in 'missing' dwellings differ from those in reporting dwellings. I have no idea how significant these problems are because I have no information about:
- the nature of the persons in missed dwellings; or
- the analyses undertaken by people looking at small area Census data.
Conclusion
It is hard to reach a firm conclusion when the discussion ends with a statement about the author's lack of knowledge. However, it seems clear to me that:
- there is scope for ABS to improve the quality of Census data, in particular for Rural areas, by closely monitoring the delivery process; and
- users of small area data need to pay heed to the number of imputed records, both dwelling and person, in the area of interest.
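For the second point, a user could start by computing the imputed shares for their area of interest, along with the ratio of the person rate to the dwelling rate used in the comparison earlier in this post. The following is a minimal Python sketch. It assumes the imputed and total counts have already been extracted from Table Builder. The function names and the example counts are hypothetical placeholders, not Census results.

```python
from typing import NamedTuple


class ImputationSummary(NamedTuple):
    dwelling_pct: float  # % of dwellings imputed
    person_pct: float    # % of persons with age imputed
    ratio: float         # person rate divided by dwelling rate


def pct(part: int, whole: int) -> float:
    # Percentage of records that were imputed; rejects empty areas.
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def summarise(dwellings_imputed: int, dwellings_total: int,
              persons_age_imputed: int, persons_total: int) -> ImputationSummary:
    d = pct(dwellings_imputed, dwellings_total)
    p = pct(persons_age_imputed, persons_total)
    # A ratio well above 1 means persons are imputed at a higher rate than dwellings.
    ratio = p / d if d > 0 else float("inf")
    return ImputationSummary(d, p, ratio)


if __name__ == "__main__":
    # Placeholder counts for illustration only, not Census results.
    s = summarise(dwellings_imputed=50, dwellings_total=1000,
                  persons_age_imputed=150, persons_total=2500)
    print(f"dwellings {s.dwelling_pct:.1f}%, persons {s.person_pct:.1f}%, "
          f"ratio {s.ratio:.2f}")
```

The sketch only reports the figures. Deciding what level of imputation makes an area's data unfit for a given use remains a judgement for the user.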