// radar de ia

Robótica & RL

Papers, modelos e datasets em alta no Hugging Face, além do blog oficial — com leitura editorial em português.

Blog LLMs & Texto

Nous Research's NousCoder-14B is an open-source coding model landing right in the Claude Code moment

Nous Research , the open-source artificial intelligence startup backed by crypto venture firm Paradigm , released a new competitive programming model on Monday that it says matches or exceeds several larger proprietary systems — trained in just four days using 48 of Nvidia's latest B200 graphics processors . The model, called NousCoder-14B , is another entry in a crowded field of AI coding assistants, but arrives at a particularly charged moment: Claude Code , the agentic programming tool from...

07.01.2026
RL without TD learning
Blog Robótica & RL

RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer . Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalability challenges ), and scales well to long-horizon tasks. We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning. Problem setting: off-policy RL Our problem setting is off-policy RL . Let’s briefly ...

01.11.2025
DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design
Blog LLMs & Texto

DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design

A newly released 14-page technical paper from the team behind DeepSeek-V3, with DeepSeek CEO Wenfeng Liang as a co-author, sheds light on the “Scaling Challenges and Reflections on Hardware for AI Architectures.” The post DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design first appeared on Synced .

15.05.2025
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Blog LLMs & Texto

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated applications, where an LLM input contains a trusted prompt (instruction) and an untrusted data. The data may contain injected instructions to arbitrarily manipulate the LLM. As an example, to unfairly promote “Restaurant A”, its owner could use prompt injecti...

11.04.2025
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Blog Robótica & RL

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and-go" waves , those frustrating slowdowns and speedups that usually have no clear cause but lead to congestion and significant energy waste. To train efficient flow-smoothing controllers, we built fast, data-driven simulations that RL agents interact with, learn...

25.03.2025
← anterior 15 / 15
175 itens no radar