Browse authors

Towards Best Practices for Open Datasets for LLM Training

January 13, 2025

Openness and AI / AI fairness, accountability, and transparency

Building on community insights from 30 AI dataset experts, this research paper distills best practices for creating open datasets for LLM training. The paper is a collaboration between Mozilla and EleutherAI.

Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI

February 6, 2024

AI bias & discrimination / AI fairness, accountability, and transparency

Mozilla finds that Common Crawl's outsized role in the generative AI boom has improved transparency and competition, but is also contributing to biased and opaque generative AI models.

Internet Health Report 2022

July 18, 2022

Internet Health / Internet Health Report / AI fairness, accountability, and transparency

An annual compilation of research and stories explaining what is key to a healthier internet. In this edition, we narrow our focus to artificial intelligence.

Stefan Baack

Latest research

Towards Best Practices for Open Datasets for LLM Training

Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI

Internet Health Report 2022
