Saturday, 22 July 2017

A further look at Imputation Rates

Imputation in the Census is a statistical process through which
  • estimates are made of the number of people in dwellings which do not complete a form; and
  • the core demographic attributes (age, sex and marital status) are imputed for each "estimated" person.
In a previous detailed post I concluded that it appeared that

  • problems with non-private dwellings (NPDs), known by the UN Statistics Division as Communal Dwellings (which I think is the better term, though not the traditional one in Australia), were more of an issue in the capital cities; and
  • incorrectly rating non-responding private dwellings (OPDs) as occupied was the core issue in rural areas.  

This post attempts to investigate that issue a little more rigorously.

I think I do that but will note that what follows runs a risk of illustrating two issues with statistical investigations:
  1. It is possible to get sucked into following "interesting trails" well beyond the original focus of the research; and
  2. Analysis can go well beyond the capacity of the data to support it, with the result that one ends up where the sun don't shine (and I am not referring to the UK in that).
A first issue has been to find information about the number of people in NPDs.  At this stage I suspect that data which actually count people by type of dwelling (e.g. private vs non-private) are not available in Table Builder Basic.  However, the NPD count can be closely approximated by the number of people coded to "not applicable" in the variable "Relationship within household", which can be cross-classified according to whether the Age Imputation flag has been set.  That is what I have used below.
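As a minimal sketch of the arithmetic involved, the imputation rate for each dwelling type is just the share of person records with the Age Imputation flag set. The counts below are made-up placeholders, not Census figures, and the function and variable names are my own for illustration; "OPD" here stands in for everyone not coded "not applicable", which is itself an approximation.

```python
def imputation_rate(imputed: int, not_imputed: int) -> float:
    """Percentage of person records whose age was imputed."""
    total = imputed + not_imputed
    if total == 0:
        raise ValueError("no person records in this cell")
    return 100.0 * imputed / total


# Placeholder counts only, NOT Census figures.
# Keys: (proxy dwelling type, Age Imputation flag set?)
# "NPD" = persons coded "not applicable" for Relationship within household.
counts = {
    ("NPD", True): 150,
    ("NPD", False): 850,
    ("OPD", True): 400,
    ("OPD", False): 9600,
}

rates = {
    dwelling: imputation_rate(counts[(dwelling, True)], counts[(dwelling, False)])
    for dwelling in ("NPD", "OPD")
}

for dwelling, rate in rates.items():
    print(f"{dwelling}: {rate:.1f}% imputed")
```

The same calculation would be repeated for each State and for each Capital City or Rest of State area once the real cross-tabulated counts are extracted.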

A second issue is that some of the most useful geographic codes such as Section of State will only be released as part of the final release in October.  That will be poked and prodded then but for the time being I looked at Capital City vs the rest of the State as an approximate indicator.
It is immediately obvious that the imputation rate for NPDs is much higher than for private dwellings, regardless of the State or part of State in which the dwelling is located.  For the 7 State-level areas for which a Capital and Rest split is made, the rate for NPDs is higher in the City in 4 cases (blue stars), and lower in 3 (red stars).  For OPDs the rate is higher in the Rest of State in every case.

It is probably important to note the comment by the Census Independent Assurance Panel that a major issue was over-imputation in NPDs.  It appears (see detailed previous post) that more than one third of imputed records were incorrect.  I don't have any information about whether this effect was even across States or between Metropolitan and ex-Metro areas.  It appears, however, that even allowing for this over-imputation, the real need for imputation in NPDs is unlikely to fall to the level required in OPDs.

At the risk of falling into trap 2 above, I looked at the ratio of imputation rates NPD:OPD for States and the metro/ex-metro split.  (Note that all of the ACT is regarded as part of the metropolitan area, possibly and reasonably reflecting the very small population in rural ACT.)
The low ratios in the NT reflect the relatively high levels of imputation for OPDs in that area.
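
To make the ratio concrete, here is a small sketch with invented rates (not the Census values; the names are hypothetical). Holding the NPD rate fixed, a higher OPD rate pushes the NPD:OPD ratio down, which is the pattern behind the low NT figures.

```python
def npd_opd_ratio(npd_rate: float, opd_rate: float) -> float:
    """Ratio of the NPD imputation rate to the OPD imputation rate."""
    if opd_rate <= 0:
        raise ValueError("OPD rate must be positive to form a ratio")
    return npd_rate / opd_rate


# Invented imputation rates in per cent, for illustration only.
areas = {
    "Area with low OPD imputation": {"NPD": 15.0, "OPD": 3.0},
    "Area with high OPD imputation": {"NPD": 15.0, "OPD": 5.0},
}

ratios = {
    area: npd_opd_ratio(r["NPD"], r["OPD"])
    for area, r in areas.items()
}

for area, ratio in ratios.items():
    print(f"{area}: NPD:OPD = {ratio:.1f}")
```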

Making an heroic assumption that the behaviour of people in NPDs is similar between city and bush, I think that what this shows is that there is more of a response issue in OPDs in the "Rest of State" areas which makes the difference between the two dwelling types less pronounced (ie a smaller ratio).  I can't work out how to demonstrate whether the higher rate for ex-metro OPDs is:
  • A real response deficiency in such dwellings; or 
  • Due to Collectors in rural areas incorrectly identifying vacant dwellings as occupied on Census Night.
but I have a gut feeling, supported by a comment in the CIAP Report, that the latter is (at least) important.  I have emphasised the reference to Census Night because, where a dwelling is a weekender or suchlike, the Collector may have evidence that the dwelling was occupied at some stage during the Collection Period and conclude, wrongly, that this was the case on Census Night.  In theory the property owners should be able to indicate through the on-line system that the dwelling was unoccupied on Census Night, but I don't think that was, in practice, the case.

I have given some consideration to investigating more detail about NPDs but have decided to leave this to another post.

