Why we need data minimization safeguards now (and how to do it)


Online entities have largely been allowed to collect any data they want about any person and use it for any purpose, so long as they are transparent about those practices. These practices, however, can cause extensive harm to people, including data-driven discrimination. Access Now's new report, Data minimization: Key to protecting privacy and reducing harm, explores ways to combat these types of abuses by limiting the amount of information entities collect online.

Data minimization is the concept that companies should collect only the data necessary to provide their product or service, and nothing more. Unfortunately, that is not what typically happens. Data mining has become commonplace as storage has become cheaper and the internet has become more pervasive. Without safeguards, organizations will collect more data than they need, and the potential for causing real harm to people will increase. Data minimization requirements seek to address this problem at its core: data not collected cannot be used to harm people.

“Data minimization is fundamental to the right to privacy,” said Eric Null, U.S. Policy Manager at Access Now. “For too long, we have allowed those harvesting data to set the rules about how, when, and where to collect and how to use deeply personal information. Particularly with regard to behavioral advertising and training machine learning systems, online entities essentially do whatever they want with little regard for protecting human rights. It’s time to address the abuse by passing strong data minimization requirements in a federal privacy law.”

The report includes the following recommendations for lawmakers, software developers, and others engaged in data minimization policies:

  • In the context of a strong data protection framework, allow organizations to collect data on protected classes for the purpose of civil rights auditing or to benefit underrepresented populations.
  • Regulators that do not ban behavioral advertising should, at minimum, limit the data that can be collected for that purpose.
  • Machine learning (ML) developers should adopt methods for performing data minimization on ML training data that safeguard privacy rights while minimizing the impact on model performance.

“Race-based data has been used to undermine Black people’s health, character, and right to equal opportunity in the United States,” said Willmary Escoto, U.S. Policy Analyst at Access Now. “In the absence of a comprehensive law protecting people from exploitative data collection, marginalized communities will continue to suffer the consequences of algorithmic racism. Data regarding protected classes should be collected only in limited circumstances where the intended use is to benefit the protected class or to audit systems for racial bias.”

Read the full report.