Sunday, 23 July 2017

2016 Census Age and Ancestry for Gazette area

The only place to begin this is with a discussion of what is meant by Gazette area.  In using Table Builder for the 2011 Census Results I combined 3 State Suburbs (Carwoola (CA), Hoskinstown (HO) and Primrose Valley (PV)) to approximate the catchment area as shown in the graphic below.  I noted that the area for Primrose Valley was not right, as the boundary crossed the Queanbeyan River to include Urila.
 In starting "work" on this post I was surprised to find that the populations of Hoskinstown and Primrose Valley had dropped by about 50% between 2011 and 2016.  This led me to look at the ABS Geography pages which showed that those two suburbs had both been split (which was not at all apparent as the same name had been retained).

Carwoola appears to be the same in 2016 as in 2011.  Primrose Valley has been split with parts now going to Urila (U) and Yarrow (Y) neither of which existed (nor indeed exist) in the State Suburb list for 2011. Hoskinstown has had 3 parts split off going to Forbes Creek (FC); Rossi (R) and Palerang (P).  It is possibly not the fault of ABS as their "Maps" page includes these boundaries (along with Electoral and Postal boundaries) as Non-ABS structures.    However, IMHO they should have a good hard look at how the boundaries are presented in Census output.

Another note for caution: "our" area is Palerang and the State Suburb of Palarang is much further South - close to Dalgety.
The 'recognition' of Yarrow has little effect on analysis as there are no more than 15 people (and perhaps as few as 8) in that area.  With so few people the Census data are intentionally unstable to preserve confidentiality.  Palerang has zero population.  All the other 'new' suburbs have non-trivial populations.

Population size and Age

Having combined the new suburbs to match the old definitions gives this comparison:
Carwoola and Hoskinstown show a slight - but puzzling - decrease since 2011 while Primrose Valley shows some growth.
Using a polynomial to remove the random noise from age profiles shows very similar patterns in 2011 and 2016 for the Gazette area in total.
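For anyone wanting to try this kind of smoothing, here is a minimal sketch in Python of a least-squares polynomial fit to an age profile. The ages and counts are invented for illustration (they are not the Census figures), the degree of 3 is an arbitrary choice for the example, and `polyfit` and `evaluate` are hypothetical helper names.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python).

    Returns the coefficients with the constant term first.
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y]
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Invented counts by five-year age-group midpoint; NOT actual Census data.
ages = [2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77]
counts = [70, 95, 110, 90, 45, 40, 60, 95, 120, 135, 140, 125, 100, 70, 40, 20]

# Rescale ages to 0..1 before fitting so the powers stay well conditioned.
scaled = [a / 100 for a in ages]
coeffs = polyfit(scaled, counts, degree=3)
smoothed = [evaluate(coeffs, s) for s in scaled]

for age, raw, fit in zip(ages, counts, smoothed):
    print(f"{age:>3}  raw={raw:>4}  smoothed={fit:6.1f}")
```

A low degree gives a broad shape that is easy to compare between two Censuses; a higher degree follows the raw counts more closely and so removes less of the noise.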
Getting back into detail here are the State Suburb populations for the Gazette area (plus Urila).

Ancestry

I decided that my second foray into the first tranche of data would be to look at ancestry: if there was a marked change in response rates between 2011 and 2016 I would have expected that to show up here as a major increase in "Not Stated".  In fact the % of records with no ancestry stated has declined slightly since 2011.  Possibly this is an effect of many forms being completed on-line.

Looking at the incidence of Not Stated Ancestry and Imputation of age, in most suburbs it appears that in most cases a response of "Not stated" for ancestry means that the age was imputed (ie no form was received for a household).  In a couple of smaller suburbs an unusual situation appears in which the number of records for which age was imputed is greater than the number with Ancestry not stated.  I can only conclude that this is either:

  • an artefact of the confidentiality provisions; or
  • because a few people completed a hard copy form and the answer for age was not legible; or
  • because a few people are amazingly sensitive about their age!

At the most detailed level, 35 ancestry groups were reported in 2011 and 30 in 2016.  Overall:
  • the number of people reporting each ancestry correlates very well (coefficient of 99.4%) between the years;  
  • the 5 most common "ancestries", English (EN), Australian (AU), Not Stated (N/S), Irish (IRE) and Scottish (SCO), appear in the same order in both years. Italian (now 6th) and German (now 7th) swapped places;  
  • 8 ancestries maintained the same rank, 14 moved up the list and 16 dropped down; 
  • 8 ancestries  disappeared between the Censuses while 3 others were added in 2016.
In view of these fairly marginal changes the ancestry profiles are very similar when shown as pie charts.
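The correlation and rank comparison in the list above can be sketched in Python along the following lines. The counts are invented (only the ordering of the top seven follows the text), treating an ancestry missing from one year as a count of zero is an assumption, and `pearson` and `ranks` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))


def ranks(counts):
    """Rank ancestries 1..n by descending count (ties broken by name)."""
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {name: i + 1 for i, name in enumerate(ordered)}


# Invented counts; NOT the actual Table Builder output.
c2011 = {"English": 900, "Australian": 850, "Not Stated": 300, "Irish": 280,
         "Scottish": 250, "German": 90, "Italian": 80, "Dutch": 12}
c2016 = {"English": 920, "Australian": 840, "Not Stated": 290, "Irish": 285,
         "Scottish": 260, "Italian": 95, "German": 85, "Maori": 5}

# Assumption: an ancestry absent in one year counts as zero in that year.
names = sorted(set(c2011) | set(c2016))
r = pearson([c2011.get(n, 0) for n in names], [c2016.get(n, 0) for n in names])

r11, r16 = ranks(c2011), ranks(c2016)
common = set(c2011) & set(c2016)
same = sorted(n for n in common if r11[n] == r16[n])
up = sorted(n for n in common if r16[n] < r11[n])
down = sorted(n for n in common if r16[n] > r11[n])
gone = sorted(set(c2011) - set(c2016))
new = sorted(set(c2016) - set(c2011))

print(f"correlation = {r:.3f}")
print("same rank:", same, "| up:", up, "| down:", down)
print("disappeared:", gone, "| added:", new)
```

With so much of the population in the top two or three groups, a coefficient close to 1 is almost guaranteed, so the rank movements lower down the list are the more informative part.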

Looking at the individual suburbs tends to give a similar picture to the above, although with the much smaller population sizes for suburbs other than Carwoola the domination of Australian or English is more evident.  It is also possible that a relatively high percentage (eg one ancestry in Rossi is 4% against the value of 1% for the total area) could be attributed to a single family.



My broad conclusion from this exercise is that the information for 2016 is consistent with that from 2011.  My reservation about the quality of the data continues to be a concern that some dwellings were not identified by the Collectors.


Saturday, 22 July 2017

A further look at Imputation Rates

Imputation in the Census is a statistical process through which
  • estimates are made of the number of people in dwellings which do not complete a form; and
  • the core demographic attributes (age, sex and marital status) are imputed for each "estimated" person.
In a previous detailed post I concluded that it appeared that

  • problems with non-private dwellings (NPDs) - known by the UN Statistics Division as Communal Dwellings, which I actually think is a better term but not the traditional one used in Australia -  were more an issue in the capital cities; and 
  • incorrectly rating non-responding private dwellings (OPDs) as occupied was the core issue in rural areas.  

This post attempts to investigate that issue a little more rigorously.

I think I do that but will note that what follows runs a risk of illustrating two issues with statistical investigations:
  1. It is possible to find that one gets sucked in to following "interesting trails" well beyond the original focus of the research; and
  2. Analysis can go well beyond the capacity of the data to support the analysis with the result that one ends up where the sun don't shine (and I am not referring to the UK in that).
A first issue has been to find information about the number of people in NPDs.  At this stage I suspect that the data that actually counts people by type of dwelling (eg private x non private) is not available in Table Builder Basic.  However it can be closely approximated by the number of people coded to "not applicable" in the variable "Relationship within household" which can be cross classified according to whether the Age Imputation flag has been set.  That is what I have used below.

A second issue is that some of the most useful geographic codes such as Section of State will only be released as part of the final release in October.  That will be poked and prodded then but for the time being I looked at Capital City vs the rest of the State as an approximate indicator.
It is immediately obvious that the imputation rate for NPDs is much higher than for private dwellings regardless of the State, or part of State in which the dwelling is located.  For the 7 State-level areas for which a Capital and Rest split is made, the rate for NPDs is higher in the City in 4 cases (blue stars), and lower in 3 (red stars).  For OPDs the rate is higher in the Rest of State in every case.

It is probably important to note the comment by the Census Independent Assurance Panel that a major issue was over-imputation in NPDs.  It appears (see detailed previous post) that more than 1/3rd of imputed records were incorrect.  I don't have any information about whether this effect was even across States or between Metropolitan and ex-Metro areas.  It appears however that it is still unlikely to reduce the real need for imputation in NPDs to the level required in OPDs.

At the risk of falling into risk type 2 above, I looked at the ratio of imputation rates NPD:OPD for states and the metro/ex metro split.  (Note that all the ACT is regarded as part of the metropolitan area - possibly, and reasonably - reflecting the very small population in rural ACT.)
The low ratios in the NT reflect the relatively high levels of imputation for OPDs in that area.
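As a rough illustration of the calculation, here is a minimal Python sketch of the NPD:OPD ratio for a single area. The counts are invented, the NPD group stands for the people coded "not applicable" on Relationship within household as described earlier, and `imputation_rate` is a hypothetical helper name.

```python
def imputation_rate(imputed, total):
    """Age-imputation rate: imputed records as a percentage of all records."""
    return 100.0 * imputed / total


# Invented person counts for one hypothetical area: (age imputed, all records).
counts = {
    "NPD": (300, 1_500),     # relationship "not applicable"
    "OPD": (2_000, 40_000),  # everyone else
}

rates = {kind: imputation_rate(imp, tot) for kind, (imp, tot) in counts.items()}
ratio = rates["NPD"] / rates["OPD"]

print(f"NPD {rates['NPD']:.1f}%  OPD {rates['OPD']:.1f}%  ratio {ratio:.1f}")
```

Because the OPD rate is the denominator, a high OPD rate pulls the ratio down even when the NPD rate is unchanged, which is the effect described for the NT.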

Making an heroic assumption that the behaviour of people in NPDs is similar between city and bush, I think that what this shows is that there is more of a response issue in OPDs in the "Rest of State" areas which makes the difference between the two dwelling types less pronounced (ie a smaller ratio).  I can't work out how to demonstrate whether the higher rate for ex-metro OPDs is:
  • A real response deficiency in such dwellings; or 
  • Due to Collectors in rural areas incorrectly identifying vacant dwellings as occupied on Census Night.
but I have a gut feeling, supported by a comment in the CIAP Report, that the latter is (at least) important.  I have emphasised the reference to Census night since in the case where a dwelling is a weekender or such-like the Collector may have evidence that the dwelling was occupied at some stage during the Collection Period and conclude, wrongly, that this was the case on Census Night.  In theory the property owners should be able to indicate that the dwelling was unoccupied on Census Night through the on-line system but I don't think that was, in practice, the case.

I have given some consideration to investigating more detail about NPDs but have decided to leave this to another post.

Thursday, 20 July 2017

Three days in the life.

May you not live in interesting times! I make that statement having had an interesting 48 hours with the small dog.

We did an 8km walk in Queanbeyan with her on Tuesday 18 July and she was in fine form. On returning home she was put on the lead to put tooth to bone which concluded with her yelping very loudly. This is most unusual for her and when I went to investigate she was caught up in some vegetation. On untangling her she gagged and vomited up a lot of slimy foam.

This was repeated several times over the next couple of hours. She then seemed to come good until evening when she became very unsettled and appeared to be looking for a dark corner to crawl into.  A definite worry.  The unsettled behaviour continued through the night and we didn't get much sleep.   The process of settling her down was to cuddle her and stay very still oneself.   By morning she seemed to be getting very unresponsive. 

As soon as our vet (John Montgomery of Sonza) opened on Wednesday morning we rushed her over there.  (Actually in comparison to when we knew she had been snake-bitten and time was of the essence this wasn't much of a rush - my guess is that it took me about 40% longer this time.)   

We were expecting John's advice to involve a green needle as a result of some cardiac situation.  Instead he diagnosed a spider bite (I suspect red-back) and gave her a shot of cortisone and some antihistamine, noting that she may be sleepy but overall the prognosis was OK. Indeed she was sleepy for most of the day. However by evening on Wednesday she had another drink of water and scoffed some beef and rice.  Then she sat on the bed in a pose which clearly said "Where's my Smacko?" 

A more restful night for all concerned.  She still seemed a bit quiet when she got up on Thursday morning but did explore the kitchen floor looking for any food scraps.  However when Frances went to the wardrobe to get her parka Tammy's ears were pricked and she trotted over to the back door looking very interested in going for a walk.  Off she went and despite having had next to no food for 2 days still managed to park a coil.

We kept the walk short, and on return offered her a couple of chop bones.  As she was still rated as a poorly dog she was allowed them indoors on her mat.
Demolition dog rools!  The bones vanished quickly.

We then went into Canberra and while Frances took care of some business I went to North Weston Ponds with Tammy in the search for a decent bird of the day. The first interesting sighting was 2 Darters.
 A third Darter joined them but it was a fuss getting the camera out so didn't take that snap.  The hoped for species (White-fronted Chat) appeared but got chased off by a bolshie Magpie before the Lumix could be wielded.

Instead I took an image of the snow-free Brindabellas.
The weather wasn't that pleasant.  Some folk would say that 8°C and 35kph winds was somewhere between ordinary and average. So we didn't hang around.  But having a bit of time before I was due to pick Frances up I thought I'd swing in to the RSPCA shop to see what they had in the way of doggie jackets, in an effort to overcome Tammy's spitting the dummy when the road is wet.  They had just what was needed: Dog Hoodies at $5 a go.
I didn't run a client satisfaction survey but have put it on a couple of times without losing any digits, which is far from the worst outcome possible.

Overall a pleasant end to a rather fraught period.

Sunday, 16 July 2017

Drips

It was a tad cool out this way this morning.  We recorded -5.0°C while a nearby resident scored -8.1°C just about the time we passed her house on the dog-walk!

Of course this coincided with my test of the lawn sprinkler system firing up.  As a result some of the trees became a trifle ice encrusted.
 In this image I noticed how I had caught some gravid drops ...
 .. and decided to try and capture one at the point of separation from the twig.  Unfortunately this turned out to be a sort of "Whack-a-Mole" exercise but with scores of places for the target to appear.  So I haven't got a plop-drop but I found these make a nice sequence of capturing more mirror images of our house in the drop.

 This one makes it clear what I am on about!

Saturday, 15 July 2017

Some Winter images

This time last year we were in Atherton, experiencing maximum temperatures around 28°C.  This year it feels, some days, as though the number is similar but the letter has shifted 3 places down the alphabet.  Today was not flash.

The morning started off quite warm (0°C) but quite damp after some miserable drizzle.  This did give some interesting cloud FX on the dog walk.  Looking down on the Plain ...
 .. and back towards Taliesin.
 Later in the morning I took myself off to the Plain to see if I could discover some interesting birds.  Not really, was going to be the answer until coming back along Briars-Sharrow Rd I came across a flock of 318 (yes, I counted them) Sulphur-crested Cockatoos.  They have been feeding in that paddock for weeks.

 After the count I heard the distinctive calls of Double-barred Finches but couldn't get them into sight.  A heard record is good enough for Bird-of-the-Day!  As I climbed back into my car a male Nankeen Kestrel was busy sorting out its feathers.   Initially I thought it was a chestnut-headed female ...
 .. but once it put its wing down the grey head of a male was revealed.
 After a short flight its plumage was back in order.

Wednesday, 12 July 2017

A short look at Census imputation rates.

This post is intended simply to show some information from the 2016 Census Table Builder product.  A more detailed and much longer coverage of this topic, which includes justification for my conclusions, is at this post.  But only go there if you really like detail about statistics.

The release of Table Builder includes some Imputation Flags which are set for the core demographic variables (Age, Sex, Marital Status) when the Collector identifies an occupied dwelling for which no form is received.  Thus it is effectively an indicator of one part of non-response.  

The importance of imputation is made explicit in the report of the Census Independent Assurance Panel (CIAP) where it is shown (Table 3.2.2) that the final under-enumeration rate for the Census is 1.0% being a balance between a 4.3% gross undercount of people on Census forms, a 1.3% gross overcount of people on Census forms and a net overcount of 2.1% of persons imputed (there is obviously a rounding effect in that sum).
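For readers who want to check the sum, here is a minimal Python sketch using the three component rates quoted above from Table 3.2.2. It assumes the net rate is simply the gross undercount less the two overcounts; the variable names are my own.

```python
# Component rates (per cent) as quoted from CIAP Table 3.2.2.
gross_undercount = 4.3        # people missed on Census forms
gross_overcount = 1.3         # people counted more than once on forms
net_overcount_imputed = 2.1   # imputed persons who should not have been

net_rate = gross_undercount - gross_overcount - net_overcount_imputed
print(f"net under-enumeration from rounded components: {net_rate:.1f}%")
```

The rounded components give 0.9% against the published 1.0%, which is the rounding effect mentioned above.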

From Table Builder I have compiled charts of age-imputation rates (number of age records imputed x 100/total number of age records) for a range of areas.  Let us first look at some larger areas.  

The selected suburbs are those I have blogged about in the past and are specified two graphs down.  A point which interested me was the apparent increase in imputation as areas get more "rural".  The ABS has not yet released the data which shows results classified by Major Urban, Other Urban and Rural so I looked at the Capital City and Rest of State values for NSW and Victoria.
In both cases the imputation rates are higher for the ex-Metropolitan area than for the City.  This suggests to me that the mail-out/internet completion approach was not a problem in this regard.  For reasons explained in the longer post I believe a major reason for the issue is unoccupied dwellings being incorrectly classified as occupied.

For the local area, I posted previously about the Stoney Creek Gazette catchment area (Carwoola, Hoskinstown and Primrose Valley), Captains Flat, Wamboin and Bywong.  Here are the rates for them.
At first glance Hoskinstown needs to spend some time in the Naughty Corner while Primrose Valley/Urila and Bywong get a gold star!  However this may simply reflect there being a few weekender residences in Hoskinstown.

My overall conclusion is that the Census data for our area is useful.  The number of people and the age distribution are probably pretty good, but I still suspect the number of dwellings is somewhat low and there will be quite high not-stated rates for the variables for which values are not imputed.

A longer second look at the 2016 Census

Reader advisory!  
This post is mainly about pretty technical aspects of the Census.  I think it is however interesting background to the more overtly interesting stuff about the nature of folk in the area.

Summary

This post is mainly about the quality of the data in the 2016 Census.  Some issues are raised suggesting that there are a few problems with the data in our small area, particularly for number of dwellings and detailed characteristics of the population, but they are probably not such that the results are unusable.  It would however be useful to consider the impact of these findings on more detailed analysis and particularly any comparison with 2011 Census results.

Background

A couple of weeks ago the ABS released the first batch of  'real' data from the 2016 Census.  (I am very cynical about the stuff they released earlier in the form of some strange averages.  That was for public relations purposes and not that useful for any analysis.)  The data was released through standard profiles and very helpfully went to quite small areas.

At that time I posted the outcome of my analysis, for the State Suburb of Carwoola, of the age distribution, person counts and dwelling counts.  My conclusion was that the first two were believable but the latter seemed to be rather low considering the area seems to be growing rather than shrinking.  Of course, if there is an issue with number of dwellings this implies that there are consequent issues with the people in them!

Last week the ABS released the first tranche of data for Table Builder, an on-line system that allows users to download information to fit their own table designs rather than the standard profile tables.  I rate this as a fantastic product since, in addition to the added flexibility it offers, I hate having to download umpteen standard tables when all I need is a simple tabulation of a couple of variables.

There is however an additional benefit in that Table Builder includes some information not included in the Profiles.  I am particularly pleased to see that information appears, for 2016 only, about the imputation of age (and a few other key variables).   Here is the basic 2016 Table Builder menu showing the Imputation Flag fields.


Very few people don't give an age when they complete the form so in effect this indicates person non-response: the collector has identified an Occupied Dwelling but a completed form has not been received.  Although I didn't complete the form on-line (apart from the chaos of the Census night, I'm not sure that was an option for people in caravan parks) I would assume that the on-line data entry system would have been intelligent enough not to allow the form to be submitted without an age.  This should further reduce the (already limited) scope for imputation being required, except for form non-response.

Let's move on to some results.

Results

The first chart shows the age-imputation rates (ie number of records for which age was imputed as a percentage of the total number of records) for a hierarchy of areas (the "selected suburbs" are explained below).
It is interesting that NSW performs slightly better than Australia as a whole but I will pass over that for now.  My set of selected rural-residential State Suburbs performs a little worse than Queanbeyan-Palerang LGA as a whole.
It is interesting that if NSW is split into Sydney and the rest, the former has an imputation rate of 5.02% while the latter is at 5.89%.  For Victoria the contrast is even more evident: Melbourne 4.87%, Rest of the State 6.13%.  A more rigorous split into urban/rural is not possible until the full set of information is released later.
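To put numbers on the contrast, here is a small Python sketch using only the four rates quoted above. Expressing the difference both as a gap in percentage points and as a ratio is my own choice of presentation.

```python
# Age-imputation rates (per cent) as quoted in the text.
rates = {
    "NSW": {"capital": 5.02, "rest": 5.89},
    "Vic": {"capital": 4.87, "rest": 6.13},
}

comparison = {}
for state, r in rates.items():
    comparison[state] = {
        "gap": r["rest"] - r["capital"],    # percentage points
        "ratio": r["rest"] / r["capital"],  # rest of State relative to capital
    }
    print(f"{state}: gap {comparison[state]['gap']:.2f} points, "
          f"ratio {comparison[state]['ratio']:.2f}")
```

On these figures the rest-of-State rate is roughly a sixth higher than Sydney's in NSW and about a quarter higher than Melbourne's in Victoria.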

The next chart shows the individual State Suburbs of interest to me.  The three components of the Gazette catchment area are shown first, then Captains Flat (closely linked to the Gazette area) and finally the two more northerly Suburbs.
At first glance the folk of Hoskinstown are due to visit the Naughty Corner, while the denizens of Primrose Valley/Urila and Bywong get a large bouquet.

Discussion

It is important to realise that data is imputed where:
  • the Collector assesses that :
    • a dwelling is on a property and 
    • was occupied on Census Night; and
  • a form was not received for that dwelling.
If the Collector doesn't realise that a dwelling exists, or if the Collector realises that a dwelling exists but considers that it was unoccupied on Census Night, then data will not be imputed for that dwelling.  This causes particular difficulties in cases such as the Widgiewa/Whiskers Creek Rds where it seems that the Collector didn't visit houses but simply left the material in letterboxes.  For example:
  1. if there is a dwelling but no letterbox, no dwelling record will exist.  (That was the case for our place on Census Night.  I am told that a number of other houses don't have a letterbox as residents use PO Boxes in town close to their work.)
  2. if there is a letterbox, a dwelling record will be created even if there isn't a dwelling, and
    a. if the property owner visits the area on (eg) the weekend and removes the census material from the box it will probably be recorded as an occupied dwelling (and thus records imputed), but
    b. if the census material is not taken away from the box it will probably be recorded as an unoccupied dwelling (and person records not imputed) even if the dwelling is occupied but the occupier can't be bothered cleaning the crud out of the letterbox (because the useful stuff goes to their urban PO Box).
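The rules and the letterbox examples above can be written down as a small Python sketch. This is my own simplification rather than any ABS procedure: it treats each "probably" in the list as a certainty, and the function names and outcome labels are hypothetical.

```python
def records_imputed(dwelling_identified, deemed_occupied, form_received):
    """Person records are imputed only when all three conditions line up."""
    return dwelling_identified and deemed_occupied and not form_received


def letterbox_outcome(has_letterbox, material_removed, form_received=False):
    """Hypothetical letterbox-only drop-off: what would end up on the record.

    Whether anyone actually lives at the property is deliberately not an input.
    """
    dwelling_identified = has_letterbox   # no letterbox, no dwelling record
    deemed_occupied = material_removed    # material gone is read as "someone is home"
    if not dwelling_identified:
        return "no dwelling record"
    if records_imputed(dwelling_identified, deemed_occupied, form_received):
        return "occupied, person records imputed"
    if form_received:
        return "form received, nothing imputed"
    return "unoccupied, nothing imputed"


scenarios = {
    "house with no letterbox": (False, False),
    "weekender, owner clears the box": (True, True),
    "occupied, box never cleared": (True, False),
}
for label, args in scenarios.items():
    print(f"{label}: {letterbox_outcome(*args)}")
```

The point of the sketch is that actual occupancy on Census Night never enters the calculation: the outcome depends entirely on the letterbox and on what happens to the material left in it.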
The importance of imputation is made explicit in the report of the Census Independent Assurance Panel (CIAP) where it is shown (Table 3.2.2) that the final under-enumeration rate for the Census is 1.0% being a balance between a 4.3% gross undercount of people on Census forms, a 1.3% gross overcount of people on Census forms and a net overcount of 2.1% of persons imputed (there is obviously a rounding effect in that sum).

Back in the day (1996, 2001) when I worked on the Census the results of the PES gave something like 1.6% gross undercount and 0.1% gross overcount.  We used to contrast this with the USA (who ran a Census in those days) and had something of the order of 8.5% undercount and 6.9% overcount but still claimed a net underenumeration rate of 1.6%.  Obviously Australia has a way to go to get to the US situation, but it's all downhill.

Table 3.2.2 also shows raw numbers as well as rates.  This shows that the net overcount of imputed (aka invented) people was 490,174.  Now the number of imputed age records for Australia given in Table Builder is 1,287,265.  Comparing those two values shows that 38% of imputed records were in error.  (A sensation-seeking journalist would add "an amazing" before the 38%!)  What is the problem?  
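The 38% comes straight from the two counts just quoted, as this short Python check shows (the variable names are my own):

```python
net_overcount_imputed = 490_174   # CIAP Table 3.2.2, as quoted above
imputed_age_records = 1_287_265   # Table Builder, Australia, as quoted above

share = 100 * net_overcount_imputed / imputed_age_records
print(f"{share:.1f}% of imputed records were in error")  # 38.1%
```

Strictly this is a net figure: it compares a net overcount with the gross number of imputed records, so it says nothing about which individual records were wrong.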

Again the CIAP Report is very helpful.  In the text of section 3.2.2 they give 4 situations which explain the over-imputation:
  1. non-responding private dwellings were incorrectly deemed to be occupied on Census night; or
  2. too many people were imputed into a (non-reporting) private dwelling that was correctly deemed to be occupied (the report notes this to be a small contribution); or
  3. people were incorrectly imputed into non-private dwellings on Census night, due to either:
    a. an overestimate of the Census night occupancy of non-private dwellings; or
    b. people being counted a second time on a form at their private dwelling residence.
Noting the views of the CIAP regarding case 2, and noting that there are no significant non-private dwellings in the Stoney Creek area (as far as I am aware) we are left with case 1 as being a possible cause of over-imputation in this area.   This would fit well with the idea of weekenders, where the property was actually unoccupied on Census night but the form was removed from the letterbox when the owner visited on the weekend and thus person records were incorrectly imputed. 
However this may simply balance out (for persons) cases in which the collector didn't identify dwellings that were occupied (and in which the occupants gave up trying to get a form or log-in credentials through the help line).  Thus:
  • the number of persons may be not too bad an estimate; but
  • a higher than expected number of records may have "not stated" for detailed characteristics which are not imputed; and
  • as indicated in my discussion of the profile data in my earlier post, it still looks as though the number of dwellings is somewhat understated.

At a more general level the observation that Capital Cities appear to have a lower imputation rate than the rest of their States is intriguing.  In terms of the reasons for over-imputation offered by CIAP I would have thought that non-private dwellings were generally more evident in the big cities than the rest of the State (and this will be checked).  It would thus seem that the impact of incorrectly identifying unoccupied private dwellings as occupied is largely a rural issue.  This suggests to me that the mail-out/internet-back approach worked well in major urban centres, but the traditional drop-off/collector follow-up approach has been less successful in the rural areas.