Agentica Releases: Timeline of Every Launch

February 2026

Agentica Feb 26, 2026

Benchmarks & Evals

ARC-AGI-3 public set result

Agentica claims to solve all public ARC-AGI-3 tasks

Agentica published a claim that it had solved all public ARC-AGI-3 tasks, adding to the week's theme of benchmark saturation. The panel discussed the claim alongside METR and ARC-AGI-2 results, weighing how much of each headline benchmark leap is signal and how much is noise.

Agentica claim on X ↗

🎙️ Hear our coverage →

#benchmarks #reasoning

July 2025

Agentica Jul 3, 2025

New Models · Open weights

DeepSWE-Preview

DeepSWE-Preview hits 59% SWE-Bench Verified with pure RL on Qwen3-32B

Agentica and collaborators (with guest Michael Luo of UC Berkeley) released DeepSWE-Preview, a fully open-source coding agent trained with reinforcement learning on top of Qwen3-32B. It reached 59% on SWE-Bench Verified, a top open result on a benchmark dominated by closed systems. The team published its training methodology and weights, emphasizing reproducible reward design and verification over sealed benchmark numbers.

59% SWE-Bench Verified

Training write-up (Notion) ↗ · Hugging Face model ↗

🎙️ Hear our coverage →

#open-source #coding #agents
