
ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

arXiv:2606.23835v1. Abstract: ABACUS is a unified vision-language model that handles object counting, crowd counting, referring-expression counting, and count-faithful image generation without any benchmark-specific training. Our model is built on an existing 3B-parameter unified foundation model and is adapted for object localization tasks using three key innovations: density-aware adaptive zooming with objectness maps for spatial grounding; a boundary-aware count policy...

arXiv cs.CV · Anindya Mondal, Sauradip Nag, Anjan Dutta · January 24, 2026


