Library
- Towards Best Practices for Open Datasets for LLM Training
  Jan. 13, 2025
  Stefan Baack, Stella Biderman, Aviya Skowron, Kasia Odrozek
  Openness and AI / AI fairness, accountability, and transparency
  Building on community insights from 30 AI dataset experts, this research paper distills best practices for creating open datasets for LLM training. The paper is a collaboration between Mozilla and EleutherAI.
- From Skin to Screen: Bodily Integrity in the Digital Age
  Nov. 20, 2024
  Júlia Keserű
  Research examining the collection of “body-centric data”, which has surged dramatically since the COVID-19 pandemic and the rise of sophisticated AI tools.
- Incomplete Chronicles: Unveiling Data Bias in Maternal Health
  Nov. 18, 2024
  Min’enhle Ncube
  Ethnographic research into the datasets powering the maternal healthcare app “DawaMom”, used across Zambia and other Southern African countries.
- Public AI
  Sept. 30, 2024
  Mark Surman, Nik Marda, Jasmine Sun
  Responsible technology
  A vision for a robust ecosystem of initiatives that promote public goods, public orientation, and public use throughout every step of AI development and deployment.
- Unveiling the Potential of a Conversational Agent in Developer Support: Insights from Mozilla’s PDF.js Project
  July 10, 2024
  Marco Castelluccio
  Open source
  This paper presents an investigation into whether AI can be leveraged to assist developers and guide open source community members. It introduces DevMentorAI, an LLM-based tool that uses a RAG approach to answer developer questions, which is then evaluated with a case study on Mozilla's PDF.js proje
- Strengthening Data Ecosystems in Indian Schools
  June 10, 2024
  Manvi Parashar, Poorvi Yerrapureddy, Astha Kapoor
  Educational data collection and analysis can both improve and introduce risks to India’s sprawling education system. Findings follow a nine-month investigation by Mozilla and Aapti Institute, supported by USAID.
- Towards a Framework for Openness in Foundation Models
  May 21, 2024
  Mozilla, Columbia University
  AI fairness, accountability, and transparency
  This paper presents a framework for grappling with openness across the AI stack. The paper surveys existing approaches to defining openness in AI models and systems, and then proposes a descriptive framework to understand how each component of the foundation model stack contributes to openness.
- Mind the Gap: What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps
  April 24, 2024
  Christian Holler, Jason Kratzer, Andy Zaidman, Alberto Bacchelli, Marco Castelluccio, Carolin Brandt
  Privacy, security & tracking
- AI Intersections Database
  April 18, 2024
  Mozilla Insights, Kenrya Rankin
  AI bias & discrimination / AI fairness, accountability, and transparency / Community building / Digital inclusion / LGBTQIA+
  The AI Intersections Database maps intersections between the key social justice and human rights areas of our time and documented AI impacts and their manifestations in society.
- Full Disclosure: Stress testing tech platforms’ ad repositories
  April 16, 2024
  Claire Pershan, Amaury Lesplingart
  Ads transparency / Elections
  Mozilla called on Check First to test the ad transparency tools maintained by 11 of the world's largest tech companies. These tools aim to let researchers, watchdogs, and members of the public understand how commercial communications influence the information space and affect society.