Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Subham Ghosh, Shubham Tiwari, Mohammad Ibrahim, Abhishek Tewari
- 7 community upvotes
- Topics: end-to-end pipeline, large language model, materials science taxonomy, image-text pairs, vision-language learning, YOLO12-m detector
Summary
Original abstract, extracted from the paper:
A novel pipeline called MatMMExtract is introduced that processes compound scientific figures into individual panels and generates structured annotations using large language models, creating a comprehensive dataset for vision-language learning in materials science.