Private beta

An agentic workspace for everything you follow online.

Filter ingests every source you follow — RSS, newsletters, X, papers, podcasts, PDFs. It scores, tags, and summarizes so you can triage in minutes and engage with what matters.

Join waitlist
No credit card required
Feed
Anthropic·just now
Computer use 2.0: stronger DOM grounding, lower error rates
The new release improves browser grounding accuracy and cuts cascading errors on long action chains…
Stratechery·just now
Anthropic's interactive tools and the unbundling of agent platforms
What computer-use 2.0 implies for the agent runtime layer — and why frameworks are scrambling to integrate…
@karpathy·just now
Thread: tool-use latency, not model size, is the new bottleneck
The interesting frontier in agent perf is no longer parameter count — it's the round-trip cost of a single tool call…
Anthropic·9m ago
Demo: Claude using a browser end-to-end
Walkthrough of Computer Use 2.0 booking a flight, filling a form, and recovering from an unexpected modal…
Latent Space·2h ago
The state of agent evals: nobody has a good answer
WebArena, SWE-bench, OSWorld — every public benchmark has gaps. What labs use internally and why it matters…
#Agents #Evals
Import AI·10h ago
Issue 412 — agents, retrieval, and the eval gap
This week: Anthropic ships Computer Use 2.0, a new long-context retrieval paper from DeepMind, and notes on…
@swyx·16h ago
Why every agent framework lost the runtime war
The interesting takeaway from Computer Use 2.0 isn't the demo — it's that the runtime layer just got absorbed by the model provider…
kentonv·22h ago
Show HN: I built an MCP server for my homelab in a weekend
Six tools: SSH, log tail, container restart, DNS edit, status check, deploy. Claude can now operate the box conversationally…
#MCP #Agents
@swyx·16h ago

Why every agent framework lost the runtime war

Thread

1/ The interesting takeaway from Anthropic's Computer Use 2.0 isn't the demo. It's that the runtime layer just got absorbed by the model provider.

2/ The frameworks built last year (LangGraph, CrewAI, AutoGen) all assumed agents needed a wrapper. The wrapper handled tool calls, retries, planning, recovery. That was the moat.

3/ Computer Use 2.0 collapses most of that into the API. DOM grounding latency dropped 3.2× on the WebArena suite — but more importantly, recovery from unexpected modals/popups is now native.

4/ What's left for frameworks: orchestration across multiple models, eval harnesses, observability. The "agent runtime" itself is becoming a commodity layer.

5/ If you're building a vertical agent product, you almost certainly want to drop your runtime dependency and call the API directly. The abstraction was useful when capability was missing — now it's just latency.

Copilot
Why every agent framework…
What's actually new in Computer Use 2.0 vs. the prior release?
DOM grounding latency dropped 3.2× on WebArena, and recovery from unexpected modals is now native — both removing common failure modes that frameworks were patching around.
[1] DOM grounding latency dropped 3.2×… [2] recovery from unexpected modals…
Find me more on agent runtime trends.
searched 14 sources · added 3 items to your feed

Engage with what matters.

Private beta availability is limited.

Join waitlist