Press Release

November 6, 2025

Mozilla Data Collective Redefines How AI Data Is Created, Shared, and Who Benefits

  • Mozilla Data Collective is a new platform for responsible data exchange—building an equitable AI ecosystem rooted in community, choice, and autonomy.
  • It’s the first social enterprise incubated by Mozilla Foundation, pioneering a radical alternative to how AI data is created and shared.
  • Communities can sell their datasets, define the terms of use, and have direct input on which values-aligned initiatives benefit from their data.
  • The platform is an evolution of Mozilla’s Common Voice platform, the world’s largest public participation dataset, enabling anyone to build voice-enabled technologies.


BARCELONA, NOVEMBER 6, 2025–
Today marks the official launch of Mozilla Data Collective (MDC), a groundbreaking new platform for responsible AI data exchange that puts power back into the hands of people and communities. Backed by Mozilla Foundation, Mozilla Data Collective redefines how AI data is created, shared, and governed, introducing a new model for equitable data ownership and transparent value exchange in the AI era. The beta version of the platform debuts during the 15th Mozilla Festival, in Barcelona, marking a new chapter in Mozilla’s long-standing commitment to openness, trust, and community-driven technology.

“Data is power, and that power should belong to people and organizations who are creating that data,” said E.M. Lewis-Jong, Founder of Mozilla Data Collective. “We are building a future where data contributors are valued, where communities can choose who benefits from their data, and where data becomes a lever for collective good rather than entrenching power.”

This initiative comes at a critical moment for the AI industry, which continues to grapple with the challenge of accessing high-quality data sourced through responsible and equitable means. The lack of proper infrastructure for fair data sharing hinders a healthy innovation ecosystem. Even more severely, it undercuts the crucial role data contributors play in the AI value chain. More often than not, data profits organizations far removed from the communities that generate it.

MDC aims to change that. As the destination platform for community-led AI data, MDC lets contributors directly control where their data is used, whether it’s to support research or innovative solutions that benefit their community.

“AI can help our languages, and we welcome its benefits, but data sovereignty matters. In Sub-Saharan Africa, this means fair collaboration between those who provide data and those who build services,” says Emmanuel Ngué Um, Université Yaoundé I, Basaa speaker.

The platform’s operational model prioritizes equity and fairness: data contributors retain full ownership of their datasets and can set a fee for their data, and the full fee goes to the data owner. They can also choose to share the dataset freely under any open license. If a dataset is released by a community for compensation, MDC applies a small 5% fee, charged to the data downloader on top of the dataset fee, which is reinvested to keep the platform free for contributors. This is a radical shift in AI data governance, one that champions choice, equity, and autonomy.
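
To illustrate the fee arithmetic described above, the short Python sketch below works through one hypothetical purchase. It is not MDC's implementation: the function name `split_payment`, the 200.00 example price, and the choice to round the platform fee to the nearest cent are assumptions made for illustration only. The only figures taken from this release are the 5% fee charged to the downloader and the 100% payout to the contributor.

```python
from decimal import Decimal, ROUND_HALF_UP

# The 5% platform fee stated in this release, charged to the downloader.
PLATFORM_FEE_RATE = Decimal("0.05")
# Assumption: amounts are rounded to two decimal places (cents).
CENT = Decimal("0.01")


def split_payment(dataset_fee):
    """Return (downloader_total, contributor_payout, platform_fee).

    Hypothetical helper: the contributor receives the full dataset fee,
    and the platform fee is added on top for the downloader.
    """
    fee = Decimal(str(dataset_fee))
    if fee < 0:
        raise ValueError("dataset fee cannot be negative")
    platform_fee = (fee * PLATFORM_FEE_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return fee + platform_fee, fee, platform_fee


# Example: a community prices its dataset at 200.00 (currency is illustrative).
total, payout, platform_fee = split_payment("200.00")
print(total, payout, platform_fee)  # 210.00 200.00 10.00
```

In this example the downloader pays 210.00, the community receives the full 200.00, and 10.00 goes to the platform. A dataset shared freely under an open license corresponds to a fee of 0, for which no platform fee is due.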

Backed by multi-million-dollar seed funding, MDC is the first social enterprise incubated by Mozilla Foundation. The platform challenges extractive data practices by giving contributors full control over how their data is used and who benefits from it. This won’t be a one-size-fits-all solution. “There are groups who wish to make their data available only for specific, values-aligned uses—such as research, education, or accountability—and those who seek fair compensation to reinvest into their communities. MDC is designed to support all of these approaches, offering flexibility while upholding a shared commitment to equity, transparency, and consent in AI data,” says Lewis-Jong.

For Nabiha Syed, Mozilla Foundation’s Executive Director, the launch of MDC reflects the Foundation’s long-standing commitment to further openness. “To reclaim the curiosity, creativity, and joy of the open web in the AI era, we need to build infrastructures that support and protect data agency. We believe that Mozilla Data Collective will play a significant role in the AI ecosystem by inspiring communities to utilize their data for good and reject the inevitable reality presented by a few tech companies who seek to monopolize the industry,” Syed says.

Building on the learnings from Common Voice

Mozilla Data Collective is more than a platform; it’s a movement toward equitable AI, where transparency, trust, and community benefit define the foundation of innovation. By shifting from extraction to collaboration, MDC envisions a world where communities are not just data sources but active partners in shaping the digital future.

Mozilla Data Collective’s approach builds on Mozilla’s Common Voice success as the world’s largest open, public participation speech dataset, with 30,000+ hours of speech data in 300 languages. Common Voice datasets have been used to build audio deepfake detection software, secure patient-doctor healthcare translation software, and a voice-enabled chatbot helping women in the Democratic Republic of Congo better understand and exercise their legal land rights.

“For too long, low-resource language communities were treated as data sources, not stakeholders. MDC is changing the game by putting us in the pilot’s seat to share our voices on our own terms,” says Meesum Alam, Saraiki speaker.

With more than 1 million contributors over the past eight years, Common Voice has demonstrated that communities want to be part of building inclusive datasets, but it has also highlighted the lack of available choices. This is why MDC offers communities and institutions a more flexible, secure, and customizable infrastructure to share their data, beyond the “open or closed” binary often presented.

“Communities shouldn’t have to choose between erasure or exploitation,” says Lewis-Jong. “With MDC, we’re making it easy to do the right thing: share data responsibly, transparently, and for a society-wide benefit.”


Notes to the Editor

Mozilla Data Collective Core Features

  • The alpha version of Mozilla Data Collective launched on September 17, 2025, and the platform features over 300 datasets in 286 languages.
  • All users are required to set up an account and agree to the data terms before access is granted.
    • Every dataset includes a guided onboarding datasheet that explains intended uses (for example, research or non-profit use), geographical and industry specificity (for example, finance, agriculture, or education in Africa or South America), and ways to attribute or acknowledge the data source.
  • Data contributors and communities set the terms for how their datasets are utilized, in which contexts, and by whom. This means:
    • Data contributors get 100% of the paid fee for their data. No middle-actors, vendors, or brokers involved.
    • Data contributors retain the rights over the data.
    • Data downloaders pay a 5% charge in addition to the dataset fee to support the platform’s growth.
  • Mozilla Data Collective combines legal and technical safeguards to protect datasets from misuse and extractive practices.
    • This includes: custom data licenses, robust terms of service, slow consent UIs, authentication checks, time-based frictions, trust-based frictions, data poisoning, watermarking, log slide IDs, user-specific tagging, community moderation, legal remedies, and data release scans.

About E.M. Lewis-Jong, Founder, Mozilla Data Collective

E.M. Lewis-Jong is the Founder and Vice President of Mozilla Data Collective, the data platform that’s rebuilding the AI ecosystem with community needs at the center. They were previously the Director of Common Voice, the world’s largest open crowdsourced speech corpus, spanning 300+ languages and 750,000 open community contributors. E.M. has led language data research and community governance programs backed by the US National Science Foundation, NVIDIA, Gates Foundation, and GIZ. Before Mozilla, they were a founding executive in CivTech, working on sharing information, data, and knowledge across borders. They have been on the NLP TAP for Lacuna Fund, an Industry Board Advisor for Speech Technologies at the University of Groningen, and an Advisor for the Caribbean Parliaments’ digitization project. Their first degree was from the University of Oxford (BA, MA), and their doctoral research on AI agent configuration is based at the University of Sussex’s Creative Technology Lab.

About Mozilla Foundation: Mozilla Foundation is a global nonprofit dedicated to ensuring the internet remains open, inclusive, and equitable. Founded in 2003, it supports people-first technology through funding, advocacy, education, and research.

Rooted in the open-source movement and guided by the Mozilla Manifesto, Mozilla Foundation focuses on critical issue areas like ethical data practices, healthy digital ecosystems, and shifting digital power toward individuals and communities. Its work connects technologists, researchers, policymakers, and activists to reimagine and rebuild systems to serve the public good.

With over two decades of global impact, Mozilla Foundation continues to lead the movement for a better technology future—powered by people, and open by design. Learn more at mozillafoundation.org.