This opinion piece was originally posted on Tech At Bloomberg. It was contributed by Lucy C. Erickson, Natalie Evans Harris, and Meredith M. Lee, three members of Bloomberg's Data for Good Exchange (D4GX) community. All three authors contributed equally to this piece.

Don't forget to join Natalie Evans Harris on April 10 for our webinar with General Assembly, "Why a Data Science Code of Ethics is Good for (Your) Business." Register today. 

With 2.5 quintillion records of data created every day, people are being defined by how they travel, surf the Internet, eat, and live their lives. We are in the midst of a “data revolution,” where individuals and organizations can store and analyze massive amounts of information. Leveraging data can allow for surprising discoveries and innovations with the power to fundamentally alter society: from applying machine learning to cancer research to harnessing data to create “smart” cities, data science efforts are increasingly surfacing new insights — and new questions.

Working with large databases, new analytical tools, and data-enabled methods promises to bring many benefits to society. However, “data-driven technologies also challenge the fundamental assumptions upon which our societies are built,” says Margo Boenig-Liptsin, co-instructor of UC Berkeley’s “Human Contexts and Ethics of Data” course. Boenig-Liptsin notes, “In this time of rapid social and technological change, concepts like ‘privacy,’ ‘fairness,’ and ‘representation’ are reconstituted.” Indeed, bias in algorithms may favor some groups over others, as evidenced by notorious cases such as MIT researcher Joy Buolamwini’s finding that certain facial recognition software fails to work for people with dark skin tones. Moreover, a lack of transparency and the misuse of data at ever-larger scales have prompted calls for greater scrutiny on behalf of more than 50 million Facebook users.

In an era where fake news travels faster than the truth, our communities are at a critical juncture, and we need to be having difficult conversations about our individual and collective responsibility to handle data ethically. These conversations, and the principles and outcomes that emerge as a result, will benefit from being intentionally inclusive.

What does responsible data sharing and use look like — for a data scientist, a parent, or a business? How are our socioeconomic structures and methods of interaction shaping behavior? How might we ensure that our technologies and practices are fair and unbiased?  

One idea that has gained traction is the need for a ‘Hippocratic Oath’ for data scientists. Just as medical professionals pledge to “do no harm,” individuals working with data should sign and abide by one or more pledges, manifestos, principles, or codes of conduct. At D4GX in New York City on Sunday, September 24, 2017, Bloomberg announced a partnership with Data for Democracy and BrightHive to bring the data science community together to explore this very topic. More than 100 volunteers from universities, nonprofits, local and federal government agencies, and tech companies participated, drafting a set of guiding principles that could be adopted as a code of ethics. Notably, this is an ongoing and iterative process that must be community-driven, respecting and recognizing the value of diverse thoughts and experiences.

The group reconvened on February 6, 2018, at the inaugural D4GX event in San Francisco, which was again open to the public. Notable attendees included DataScience.com advisory board member DJ Patil, who served as Chief Data Scientist of the United States from 2015 to 2017; Doug Cutting, co-creator of Hadoop and an advocate for open source; and representatives from the National Science Foundation-funded Regional Big Data Innovation Hubs and Spokes program. At this event, participants reviewed more than 75 draft ethics principles formulated by several working groups, with the goal of distilling them into a streamlined set of principles for an ethics code.

Efforts such as Bloomberg’s D4GX are part of a growing movement of interest in the ethical aspects of technology, particularly as they relate to advances in data science and artificial intelligence (AI) systems. For example, AI Now 2017, IEEE, The Future of Life Institute, Metro Lab Network, the ACM, and the Oxford Internet Institute have all issued reports on these topics. In addition, Microsoft’s Brad Smith and Harry Shum published a book entitled “The Future Computed: Artificial Intelligence and its role in society” earlier this year.

A recent National Science Foundation grant focusing on the responsible use of big data was awarded to a group of researchers led by Julia Stoyanovich, an assistant professor at Drexel University, who has participated regularly in D4GX events in New York and San Francisco. The goal of this research project is to understand how legal and ethical norms can be embedded into technology, and to create tools that enable responsible collection, sharing, and analysis of data. These issues have also been a topic of discussion at multiple recent workshops. Earlier this month, a workshop at the National Academy of Sciences focused on ethics and data in the context of international research collaborations. Similarly, another recent workshop on fairness in machine learning aimed to identify the key challenges to achieving fairness, and the open questions that remain, both in theory and in practice.

As noted in the AI Now 2017 report, there are powerful incentives for the commercial sector to disregard these initiatives in favor of business as usual. It is not clear how compliance and accountability could be incentivized, monitored, or enforced in either the public or the private sector, although the European Union’s new data privacy rules, the General Data Protection Regulation (GDPR), will affect organizations globally beginning in May 2018. Both “top-down” regulations and “grassroots” efforts are increasingly raising questions about how we might define fairness, combat bias, and create ethics guidelines in data science and AI.

Yet widespread adoption of ethical data collection and analysis practices requires more than business penalties and awareness of these issues among data science practitioners and the general public. Ultimately, data scientists and our broader community of data users must be equipped with the right tools and methodologies, and must help each other apply guidance effectively. Boenig-Liptsin notes, “We need to understand how our values shape our data tools and, reciprocally, how our data tools inform our values.” Successful efforts will require thoughtful and sustainable collaboration to apply insights and refine solutions.

We are seeing an increasing number of data practitioners and leaders stand up and speak out about the questionable and often outright illegal collection, sharing, and use of sensitive data. For their voices to drive change, and for our society to truly harness the positive impacts of data innovation while mitigating unintended consequences, we will need a collective effort. This effort needs to reach beyond academia and policymakers to anyone in the public and private sectors who can contribute.

Our community needs to collectively voice expectations for responsible data use, bringing data practitioners together to examine existing research and evidence. By creating environments, curricula, and tools that support community dialogue around ethics challenges, we can hope to translate findings into actionable principles and to hold each other accountable. Alongside work with regulatory bodies, shaping social norms can help transform these principles into enforceable standards for the responsible use of data. As Barbara C. Jordan, a former member of the U.S. House of Representatives from Texas and professor of ethics at the University of Texas at Austin, eloquently stated in 1976:

“There is no executive order; there is no law that can require the American people to form a national community. This we must do as individuals, and if we do it as individuals, there is no President of the United States who can veto that decision… we must define the ‘common good’ and begin again to shape a common future.”

Fully harnessing the data revolution requires not only that we explore what can be done with data, but also that we understand how any individual’s or organization’s contribution affects others. We should be having these conversations early and often, bringing in a diverse range of perspectives. We should be having these conversations not just at academic conferences and in tech and ethics courses, but around dinner tables and everywhere else.

Join the data ethics conversation on April 10 by registering for our webinar with General Assembly and Natalie Evans Harris, "Why a Data Science Code of Ethics is Good for (Your) Business."

Lucy C. Erickson is an AAAS Science & Technology Policy Fellow.

Natalie Evans Harris is the Co-Founder and Chief Operating Officer of BrightHive.io.

Meredith M. Lee is the Executive Director of the West Big Data Innovation Hub, based at the University of California at Berkeley. 

Note: The views expressed in this piece do not necessarily reflect the position or policy of the American Association for the Advancement of Science, the National Science Foundation, the University of California at Berkeley, or the federal government.