HuggingFaceFW/fineweb-edu
Featured dataset on Hugging Face: 411.9K downloads. 📚 FineWeb-Edu: 1.3 trillion tokens of the finest educational data the 🌐 web has to offer. Paper: https://arxiv.org/abs/2406.17557
The HuggingFaceFW/fineweb-edu dataset is among the featured datasets on Hugging Face, the data used to train and evaluate current models.
- 411.9K downloads
- 1.2K likes
About the dataset
📚 FineWeb-Edu: 1.3 trillion tokens of the finest educational data the 🌐 web has to offer.
Paper: https://arxiv.org/abs/2406.17557
What is it? The 📚 FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineWeb-Edu-score-2) of educational web pages filtered from the 🍷 FineWeb dataset. This is the 1.3 trillion version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We…
See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.
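The classifier-based filtering described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each record already carries a numeric educational-quality score in a field named `score`, and it uses cutoffs of 3 and 2 to stand in for the stricter 1.3T set and the looser FineWeb-Edu-score-2 set. The field name, the sample records, and both cutoffs are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


def filter_by_edu_score(
    records: Iterable[Dict], min_score: float = 3.0
) -> Iterator[Dict]:
    """Yield only records whose (assumed) 'score' field meets the cutoff."""
    for record in records:
        # Records without a score are treated as non-educational.
        if record.get("score", 0.0) >= min_score:
            yield record


# Hypothetical sample records; the real dataset's rows are not reproduced here.
sample = [
    {"text": "Photosynthesis converts light into chemical energy.", "score": 3.6},
    {"text": "Buy cheap watches now!!!", "score": 0.4},
    {"text": "A short history of the printing press.", "score": 2.3},
]

# Stricter cutoff (assumed 3) versus looser cutoff (assumed 2).
strict = list(filter_by_edu_score(sample, min_score=3.0))
lenient = list(filter_by_edu_score(sample, min_score=2.0))

print(len(strict), len(lenient))
```

A lower cutoff keeps more pages at the cost of average quality, which is consistent with the larger token count reported for the score-2 variant.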
text-generation