CAPDU Member wins ICPSR award

CAPDU member Wendy Watkins was named a 2013 winner of the William H. Flanigan Award, together with Daniel C. Tsang. Read the full ICPSR announcement here.

Congratulations to both of them, and the other award winners!

Axed sites not being archived

Wanted to highlight a post from the site economicjustice.ca, Axed federal advisory web sites available on Archive.org, but not Library and Archives Canada. The vanished sites include the First Nations Statistical Institute, National Council of Welfare, and the National Roundtable on the Environment and the Economy. See the economicjustice post for links and more details.

I’m quite unnerved by the increasingly ephemeral nature of what had been public government information. In the days when print information was routinely deposited in public and academic libraries, it would have been unthinkable for the government to later recall and shred those deposited items. Government document librarians would have been outraged.

Librarians, your documents are being shredded.

Edit: see also the datalibre.ca post Silencing the Archivists? Who does that? Canada does! for more on what is going on with Library and Archives Canada and why we need more outrage.

Toronto gives up on NHS

Interesting article on how the Toronto city government has decided not to use data from the NHS for planning purposes, deeming it too unreliable.

Favourite quote: on being asked if not being able to use census data will pose a problem, the head of the Social Research and Analysis unit responded “Only in the sense that the true indication of a neighbourhood outcome is how things change over time”.

Only?

Thanks to Tracey Lauriault for posting this to the CAPDU list!

Open Data: revamped portal and revised license

The Canadian government relaunched the data.gc.ca Open Data portal on Tuesday, together with a new version of the Open Government Licence. The licence is an improvement; among other things they have removed some inapplicable language inherited from the British licence it was based on. (Teresa Scassa’s post here has more information on the legal aspects.) Canada also signed the G8 Open Data Charter.

A striking amount of main page space seems to be taken up by government promotional material – that’s become a feature on most sites under this government, but the promotional items here seem especially large. The search function may have improved, though I didn’t use the old one heavily enough to be certain. Main page navigation / browsing is still lacking, though you can do a reasonable approximation of a subject browse using search filters. Tracey Lauriault of datalibre.ca spent some time looking around, so I’ll link you to her evaluation here.

Overall, I think the existence of an improved open data portal and licence constitutes a positive step, looked at separately from all this government’s other actions relating to data – something which I have trouble doing. (This article from the CBC has quotes from open government veterans Tracey Lauriault and David Eaves that encapsulate this unease better than I could.)

I think the G8 Charter has promise, however. I have doubts about what our current government will choose to release (or collect, or keep), and the Charter doesn’t really alleviate them, though the commitment to release “high value data” (Annex 6.2) is worth noting. (Among other things, there is no apparent provision for archiving.) What it does do is lay out some guidelines for how data should be released, once the decision has been made to release them. The Technical Annex has some good language on open, machine-readable formats, APIs, documentation and metadata mapping.

This post from David Eaves goes into some more detail on the potential impact of those high value release commitments, and is definitely worth a read.

IASSIST, the Census, and a lot of imaginary Germans

The 2013 IASSIST conference was held a couple of weeks ago in Cologne, Germany, with a number of CAPDU members in attendance. The Canadian census situation was a theme running through all of last year’s conference in Washington – the opening plenary was devoted to it, and it came up repeatedly in talks, in comments on sessions, during coffee breaks – like a sore point that we data people just couldn’t stop poking. This year talk had of course died down, though I noticed a few pointed comments on governments that interfere with data collection.

However, the closing plenary was of surprising relevance – perhaps more so than the speaker intended. The talk was on record linkage in the social sciences, and the speaker made some comments suggesting that censuses are obsolete (or soon will be), with administrative data collection and linkage being the future of social science analysis. (This was suggested last year as a possible direction for Canada to consider.) He pointed out that our host country Germany relied largely on administrative data – in 2011 they had conducted their first census since reunification. (Privacy issues can be a political minefield in Germany, for reasons which it should not be too difficult to extrapolate.)

Coincidentally, that same day the results of the 2011 German census were released. They found that administrative data had over-counted the country’s population by 1.5 million, or around 1.5 percent. They over-counted the population of foreign residents by 15 percent. The foreign residents accounted for a large share of the population drop, but there were still close to 500,000 mysteriously nonexistent German citizens.

I’ll admit I’m surprised by this myself – I’d assumed that administrative data would be at least as accurate as Census survey data.

Immigrant counts are off – by quite a bit

So far my favourite snippet from the reference material on the National Household Survey (pointed out by this Toronto Star article) is this one, taken from here:

It is impossible to definitively determine how much the NHS may be affected by non-response bias. However, based on information from other data sources, evidence of non-response bias does exist for certain populations and for certain geographic areas…
(B)ased on the estimates and trends from the sources mentioned above, evidence suggests that the NHS estimate for the population born in the Philippines is overestimated at the national level. According to population estimates, the number of immigrants from the Philippines who entered Canada from January 2006 until June 2011 is 141,502, while the NHS estimate of the population born in the Philippines who immigrated between January 2006 and the survey date, May 10, 2011 is larger (152,270). As well, the population born in Pakistan is suggested to be underestimated… 
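
To put a number on that gap, here is a quick back-of-the-envelope check in Python. It uses only the two figures quoted above; the variable names are my own, and this is just arithmetic on the published numbers, not anything Statistics Canada provides.

```python
# Figures quoted in the Statistics Canada passage above.
population_estimate = 141_502  # immigrants from the Philippines, January 2006 to June 2011
nhs_estimate = 152_270         # NHS estimate, January 2006 to May 10, 2011

# Absolute and relative size of the gap between the two sources.
gap = nhs_estimate - population_estimate
gap_percent = 100 * gap / population_estimate

print(f"NHS exceeds the population estimate by {gap:,} people ({gap_percent:.1f}%)")
```

That works out to 10,768 people, or about 7.6 percent. Note that the NHS figure covers a slightly shorter period (to May 10 rather than June), so if anything the comparison understates the discrepancy.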

Also, that list of townships for which no data will be released? Includes 1814 places. That’s… quite a few.

The Star article quotes Industry Minister Christian Paradis saying the following:

“This was the first time a voluntary National Household Survey was undertaken,” Paradis said. “Our government will be looking at options to improve the quality and reliability of the data generated by the 2016 census cycle.”

Anyone have any advice for him?

National Household Survey and data suppression

Statistics Canada has started releasing data from the National Household Survey (the infamous Census long form replacement), along with some information about what they won’t be releasing. This release includes data on Aboriginal Peoples and on Immigration and Ethnocultural Diversity.

Data was suppressed for areas with less than a 50% response rate. No data at the neighbourhood (census tract and census dissemination area) level has been released yet, but there is a list of Census subdivisions (towns) that are being suppressed. Some of these have populations in excess of 2500, about the size of a census tract (and much larger than a dissemination area), making me wonder how many holes there will be in the neighbourhood data, and even whether they will release it at all.
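
For anyone who wants to see the rule mechanically, here is a minimal Python sketch of a suppression filter as I have described it (below 50% response means no release). The place names and response rates are invented purely for illustration, and the exact cutoff Statistics Canada applies may be defined differently; treat the threshold here as my paraphrase, not their specification.

```python
# Hypothetical response rates, invented purely for illustration.
response_rates = {
    "Town A": 0.72,
    "Town B": 0.49,
    "Town C": 0.50,
}

# Rule as described in this post: areas below a 50% response rate are suppressed.
THRESHOLD = 0.50


def is_suppressed(rate, threshold=THRESHOLD):
    """Return True if an area's response rate falls below the threshold."""
    return rate < threshold


suppressed = sorted(name for name, rate in response_rates.items() if is_suppressed(rate))
released = sorted(name for name, rate in response_rates.items() if not is_suppressed(rate))

print("Suppressed:", suppressed)
print("Released:", released)
```

The same filter applied to much smaller areas, where a handful of non-responding households can swing the rate, is what makes me worry about the neighbourhood data.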

Here’s the list of suppressed towns:
http://www12.statcan.gc.ca/nhs-enm/2011/ref/sup_CSD-SDR-eng.cfm

Update: a researcher tells me that contacts at Statistics Canada have told him that dissemination area data will be available “at least as a custom tabulation.” Hmm.

Update #2: Tracey Lauriault at datalibre.ca has a nice summation of news stories on and reactions to the NHS release, including some very cogent criticisms.

Well, we all knew it could be bad. Now we’re seeing how bad.