Dataset · LLMs & Text

HuggingFaceFW/fineweb-edu

Hugging Face · Datasets · HuggingFaceFW · ↓ 411,911 · ♥ 1,161

The HuggingFaceFW/fineweb-edu dataset is among the featured datasets on Hugging Face: data used to train and evaluate current models.

  • 411.9K downloads
  • 1.2K likes

About the dataset

📚 FineWeb-Edu: 1.3 trillion tokens of the finest educational data the 🌐 web has to offer.

Paper: https://arxiv.org/abs/2406.17557

What is it? The 📚 FineWeb-Edu dataset consists of educational web pages filtered from the 🍷 FineWeb dataset. It comes in two sizes: 1.3T tokens, and 5.4T tokens (FineWeb-Edu-score-2). This is the 1.3 trillion token version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.
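The classifier-based filtering described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this listing: that each record carries an integer `int_score` field with the classifier's rounded score, that the 1.3T version keeps pages scoring 3 or higher while FineWeb-Edu-score-2 keeps pages scoring 2 or higher, and that a `sample-10BT` configuration is available for streaming. The helper names `filter_by_edu_score` and `stream_fineweb_edu` are hypothetical, and the sample records are invented for illustration only.

```python
from typing import Iterable, Iterator


def filter_by_edu_score(records: Iterable[dict], min_score: int = 3) -> Iterator[dict]:
    """Yield only records whose educational score reaches min_score.

    Assumes each record has an integer "int_score" field (an assumption
    about the dataset schema, not something stated in this listing).
    """
    for record in records:
        if record["int_score"] >= min_score:
            yield record


def stream_fineweb_edu(config: str = "sample-10BT"):
    """Stream the dataset lazily instead of downloading it in full.

    Requires the `datasets` package and network access; the config name
    is an assumption. Not called below, so the sketch runs offline.
    """
    from datasets import load_dataset

    return load_dataset(
        "HuggingFaceFW/fineweb-edu",
        name=config,
        split="train",
        streaming=True,
    )


# Invented records standing in for real dataset rows.
sample = [
    {"text": "Buy cheap watches now", "int_score": 1},
    {"text": "A short history of a town", "int_score": 2},
    {"text": "How photosynthesis works", "int_score": 3},
    {"text": "Introduction to linear algebra", "int_score": 4},
]

kept_strict = list(filter_by_edu_score(sample, min_score=3))
kept_loose = list(filter_by_edu_score(sample, min_score=2))

print(len(kept_strict), len(kept_loose))
```

Lowering the threshold trades quality for volume, which is consistent with the score-2 variant being several times larger than the 1.3T version.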

text-generation

Explore the dataset on Hugging Face →
