Earlier this week, researchers from the University of Melbourne released a report on the successful re-identification of Australian patient medical data that formed part of a de-identified open dataset.
In September 2016, the researchers were able to re-identify the longitudinal medical billing records of 10% of Australians, which equates to about 2.9 million people. The report outlines the techniques the researchers used to re-identify the data and the ease with which this can be done with the right know-how and skill set (i.e. someone with an undergraduate computing degree could re-identify the data).
At first glance, the report exposes the poor handling of the dataset by the Department of Health. This brings into focus the need for adequate contractual obligations regarding the use and handling of personal information, and the need to ensure adequate liability protections are in place even where the parties intend for all personal information to be de-identified. The commercial risk of de-identified data has been shown to be the equivalent of a dormant volcano.
In a digital world, where big data and open datasets can be easily accessed or created, careful consideration should be given to the privacy implications that arise when these datasets are published online.
On second glance, the report also raises deeper and more complex issues concerning:
- the tension between the benefits of open data and the protection of personal information;
- the release and use of data in a digital world; and
- the limitations of de-identification techniques.
Stay tuned in the new year for a deeper dive into the University of Melbourne’s report and an exploration of these issues for businesses compiling and handling big data!