Four years ago, Netflix came up with what seemed like an ingenious plan to improve its movie recommendation algorithm: crowdsource the problem and award the best solution a $1 million prize. But the video rental and streaming company found out that anonymizing data isn't easy.
For the first edition of the "Netflix Prize" in 2006, the company released 100 million supposedly anonymized movie ratings. Each included a unique subscriber ID, the movie title, year of release and the date on which the subscriber rated the movie. Contestants were asked to develop an algorithm that was 10% better than Netflix's existing one at predicting how subscribers would rate other movies.
Just 16 days later, two University of Texas researchers announced that they had identified some of the Netflix users in the data set. In some cases, Arvind Narayanan and Vitaly Shmatikov were able to identify targets by matching their Netflix reviews with data from other sites like IMDb. More damningly, they found that if you knew a few movies a Netflix subscriber had rented in a given time period, you could reverse-engineer the data and find out the rest of their viewing history.
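To see why a few known titles and dates can be enough, consider a toy sketch of this kind of matching. It is an illustration only, not the researchers' actual method: the subscriber IDs, movie titles, dates, the three-day tolerance and the function names `match_score` and `best_match` are all hypothetical.

```python
from datetime import date

# Hypothetical "anonymized" release: subscriber ID -> set of (title, rating date).
anonymized = {
    101: {("Movie A", date(2005, 3, 1)),
          ("Movie B", date(2005, 3, 4)),
          ("Movie C", date(2005, 5, 9))},
    102: {("Movie A", date(2005, 7, 20)),
          ("Movie D", date(2005, 3, 3))},
}

# Hypothetical public reviews posted elsewhere under a real name.
public_reviews = {("Movie A", date(2005, 3, 2)),
                  ("Movie B", date(2005, 3, 4))}


def match_score(known, record, max_days=3):
    """Count known (title, date) pairs found in record within max_days."""
    score = 0
    for title, when in known:
        if any(t == title and abs((d - when).days) <= max_days
               for t, d in record):
            score += 1
    return score


def best_match(known, dataset, max_days=3):
    """Return the subscriber ID with the most overlapping ratings, and its score."""
    scores = {sid: match_score(known, rec, max_days)
              for sid, rec in dataset.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


subscriber, overlap = best_match(public_reviews, anonymized)
# Every other rating stored under the matched ID is now tied to the named reviewer.
exposed = anonymized[subscriber] - {
    (t, d) for t, d in anonymized[subscriber]
    if any(t == kt for kt, _ in public_reviews)
}
```

In this toy data, the two public reviews line up with only one subscriber ID, and the rest of that subscriber's ratings come along with the match.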
Despite the UT findings, Netflix continued the contest and named a $1 million winner. But when Netflix tried to launch another contest in 2009 -- with subscriber data including gender, zip code, and age -- the smackdown came in the form of a lawsuit. One plaintiff said she would be "irreparably harmed by Netflix's disclosure of her information."
That woman is a lesbian mother who is not open about her sexual orientation. She filed the suit as Jane Doe. As her legal filing put it: "To some, renting a movie such as Brokeback Mountain or even The Passion of the Christ can be a personal issue that they would not want published to the world."
After months of back-and-forth, Netflix called off the second contest and settled the lawsuit. Lead counsel Scott Kamber, of KamberLaw LLC, says he believes Netflix genuinely tried -- but failed -- to protect its users' privacy.
"The contest was clever, but they overlooked an aspect of privacy that I think we were able to get them to focus on," Kamber said. "Netflix was in its infancy, so really, it made a mistake while trying to do the right thing."