How to burst the AI bubble: Strike at its roots
Sci-fi author and tech journalist Cory Doctorow on his new book, The Reverse Centaur's Guide to Life After AI.
Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.
arXiv:2606.20696v1. Decoding inner speech from non-invasive brain signals remains a fundamental challenge due to the absence of overt linguistic output, limited training data, and large inter-subject variability. Existing brain-to-text approaches often rely on task-specific decoder fine-tuning, which restricts scalability and complicates adaptation to new participants. We propose MindAlign, a decoupled two-stage brain-to-language framework that enables open-ended text...
arXiv:2606.20650v1. Instruction-based controllable speech synthesis enables users to specify emotions through natural language. However, existing approaches often rely on coarse emotion labels and lack explicit modeling of fine-grained intensity. We propose EmoInstruct-TTS, a dual-path instruction-guided framework for emotional speech synthesis. We introduce Emotion2embed, a supervised semantic-acoustic emotion embedding covering 48 emotional states, including fine-gr...
arXiv:2606.21098v1. Reliable evaluation of phrase break annotations is crucial, as subtle variations in prosodic boundaries directly affect the clarity and naturalness of speech. However, existing approaches exhibit major limitations: single-reference evaluation assumes a unique gold phrasing for an utterance despite multiple valid phrasings, while human judgment, though flexible, is labor-intensive and unscalable. To address these, we propose LLM-based Multi-Referenc...
Hume AI releases TADA (Text-Acoustic Dual Alignment), an open-source TTS model with 1:1 alignment between text tokens and speech tokens, an architectural choice that makes it physically impossible to omit or add words and that produces audio in real time.
Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA’s newest AI servers can run their cooling liquid even hotter — up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient. […]
Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption during long-sequence OCR tasks, enabling efficient transcription of multiple pag…
With 0.6 billion parameters and a design built for streaming, NVIDIA's model tries to solve the most thankless problem in automatic transcription: making each word appear while it is still being spoken.
Featured dataset on Hugging Face, with 10.6K downloads. AI Conference & Journal Papers: Searchable metadata and full-text PDF mirrors for papers from top-tier AI venues (NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV…
Trending speech synthesis model on Hugging Face, with 0 downloads and 165 community likes.
Featured dataset on Hugging Face, with 3.4K downloads. ISCSLP 2026 CoT-TTS Dataset. Dataset Overview: This dataset is prepared for the ISCSLP 2026 CoT-TTS Challenge and is designed to support research on con…
A text-to-music generation system uses reward conditioning, expert iteration, and preference tuning to improve audio quality while maintaining efficiency within a 120M-parameter mo…