// AI radar

Audio & Voice

Papers, models, and datasets trending on Hugging Face, plus the official blog, with editorial commentary in Portuguese.

Blog Audio & Voice

How to burst the AI bubble: Strike at its roots

Sci-fi author/tech journalist Cory Doctorow on his new book, The Reverse Centaur's Guide to Life After AI.

23.06.2026

Blog LLMs & Text

MindAlign: Decoding Inner Speech from fMRI Signals via Multimodal Embedding Alignment under Limited Data

arXiv:2606.20696v1. Abstract: Decoding inner speech from non-invasive brain signals remains a fundamental challenge due to the absence of overt linguistic output, limited training data, and large inter-subject variability. Existing brain-to-text approaches often rely on task-specific decoder fine-tuning, which restricts scalability and complicates adaptation to new participants. We propose MindAlign, a decoupled two-stage brain-to-language framework that enables open-ended text...

23.06.2026

Blog LLMs & Text

EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional Speech Synthesis

arXiv:2606.20650v1. Abstract: Instruction-based controllable speech synthesis enables users to specify emotions through natural language. However, existing approaches often rely on coarse emotion labels and lack explicit modeling of fine-grained intensity. We propose EmoInstruct-TTS, a dual-path instruction-guided framework for emotional speech synthesis. We introduce Emotion2embed, a supervised semantic-acoustic emotion embedding covering 48 emotional states, including fine-gr...

23.06.2026

Blog LLMs & Text

LLM-Based Multi-Reference Evaluation for Efficient and Robust Assessment of Phrase Break Annotations

arXiv:2606.21098v1. Abstract: Reliable evaluation of phrase break annotations is crucial, as subtle variations in prosodic boundaries directly affect the clarity and naturalness of speech. However, existing approaches exhibit major limitations: single-reference evaluation assumes a unique gold phrasing for an utterance despite multiple valid phrasings, while human judgment, though flexible, is labor-intensive and unscalable. To address these, we propose LLM-based Multi-Referenc...

23.06.2026

Editorial Audio & Voice

TADA: the speech synthesis model that eliminates hallucinations by design, and is 11 times faster

Hume AI releases TADA (Text-Acoustic Dual Alignment), an open-source TTS model with 1:1 alignment between text tokens and speech tokens, an architectural choice that makes omitting or adding words impossible by construction and produces audio in real time.

22.06.2026

Blog Robotics & RL

Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI’s Biggest Machines

Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA’s newest AI servers can run their cooling liquid even hotter — up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient. […]

22.06.2026

Paper LLMs & Text

Unlimited OCR Works

Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption during long-sequence OCR tasks, enabling efficient transcription of multiple pag…

22.06.2026 ·▲ 9

Editorial Audio & Voice

Nemotron 3.5 ASR: NVIDIA bets on a small model for live transcription

With 0.6 billion parameters and a streaming-oriented design, NVIDIA's model tries to solve the most thankless problem in automatic transcription: making each word appear while it is still being spoken.

21.06.2026

Dataset Data & Embeddings

GenAI4ELab/papercli-papers

Featured dataset on Hugging Face, with 10.6k downloads. AI Conference & Journal Papers: searchable metadata and full-text PDF mirrors for papers from top-tier AI venues (NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV…

20.06.2026 ·↓ 10551

Model Audio & Voice

owensong/Inflect-Nano-v1

Speech synthesis model trending on Hugging Face, with 0 downloads and 165 community likes.

19.06.2026

Dataset Image Generation

HKUSTAudio/ISCSLP2026-CoT-TTS

Featured dataset on Hugging Face, with 3.4k downloads. ISCSLP 2026 CoT-TTS Dataset. Dataset overview: This dataset is prepared for the ISCSLP 2026 CoT-TTS Challenge and is designed to support research on con…

19.06.2026 ·↓ 3438

Paper Audio & Voice

Improving Text-to-Music Generation with Human Preference Rewards

A text-to-music generation system uses reward conditioning, expert iteration, and preference tuning to improve audio quality while maintaining efficiency within a 120M-parameter mo…

19.06.2026

27 items on the radar