Generate 90 Min Course on Collaborative Agent Infrastructure
Lecture 19

Long-Horizon Tasks: Solving Persistent Problems

LECTURE 1  •  5 min

Beyond the Single Prompt: The Dawn of Agentic Ecosystems

LECTURE 2  •  7 min

Speaking the Same Language: The Inter-Agent Communication Protocol

LECTURE 3  •  7 min

Shared Memory: Architecting the Global Context

LECTURE 4  •  4 min

Hierarchies vs. Swarms: Organizing the Workforce

LECTURE 5  •  7 min

The Orchestration Layer: The Traffic Controllers of AI

LECTURE 6  •  4 min

Recursive Task Decomposition: The Art of Planning

LECTURE 7  •  7 min

The Hallucination Cascade: Preventing Systemic Failure

LECTURE 8  •  7 min

Sandboxing and Security: Protecting the Host

LECTURE 9  •  3 min

Token Economics: Budgeting the Swarm

LECTURE 10  •  8 min

Consensus Mechanisms: When Agents Disagree

LECTURE 11  •  7 min

Human-in-the-Loop: Design for Oversight

LECTURE 12  •  4 min

The Tool-Use API: Giving Agents Hands

LECTURE 13  •  8 min

Interoperability: Cross-Infrastructure Collaboration

LECTURE 14  •  5 min

Evaluation Benchmarks: Metrics for Teams

LECTURE 15  •  8 min

Emergent Behaviors: The Good, the Bad, and the Weird

LECTURE 16  •  7 min

The Ethics of Agency: Responsibility in the Swarm

LECTURE 17  •  4 min

Latency and Asynchronicity: Designing for Speed

LECTURE 18  •  9 min

Case Study: The Autonomous Coding Factory

LECTURE 19  •  5 min

Long-Horizon Tasks: Solving Persistent Problems

LECTURE 20  •  5 min

Resource Scaling: From 2 Agents to 2,000

LECTURE 21  •  8 min

Beyond LLMs: Neuro-Symbolic Agent Infrastructure

LECTURE 22  •  9 min

Governance and Policy: The Rules of the City

LECTURE 23  •  5 min

The Integrated Intelligence: A Vision for the Future

Transcript

In January 2026, NVIDIA's R²D² system stacked fifty objects in seconds, a task that previously took hours, by integrating perception-guided Task and Motion Planning with vision-language models. That is not incremental progress; that is a category shift. And it exposes the central unsolved problem in collaborative agent infrastructure: most systems are built for sprints, not marathons. Only a fraction of production agentic deployments today support tasks that span days or weeks. The architecture required to close that gap is the subject of this lecture.

Earlier lectures established infrastructure robustness over dozens of steps; long-horizon tasks raise the bar further, because the infrastructure must now hold across thousands of steps. The Research-Factory framework, published in February 2026, demonstrated the potential of progressive reinforcement learning for managing extended task durations, achieving significant improvements over prior baselines. This work also revealed the emergence of memory relays: spontaneous structures that track state across extended tasks. Emergence, Suri, is not just a risk. Sometimes it is the solution.

SPlaTES, short for Stable Planning with Temporally Extended Skills, tackles the mechanical challenges of long-horizon tasks. Introduced in 2025 and benchmarked at the RLJ conference in April 2025, it uses hierarchical model predictive control with abstract skill world models. The key insight: instead of reasoning over unstable raw environment dynamics, SPlaTES plans over predictable skill outcomes. Mutual-information-based skill learning keeps those skills diverse, task-relevant, and error-correcting.
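To make the skill-world-model idea concrete, here is a minimal sketch, not taken from SPlaTES itself: the skill names, states, and lookup-table "model" are all illustrative stand-ins. The point it shows is structural: the planner searches over sequences of skills and scores them through a model that predicts each skill's outcome, never touching raw step-by-step dynamics.

```python
from itertools import product

# Hypothetical abstract skill world model: maps (state, skill) to a
# predicted outcome state. In SPlaTES-style planning, a learned model
# like this replaces noisy raw environment dynamics.
SKILL_MODEL = {
    ("on_table", "grasp"): "held",
    ("held", "lift"): "raised",
    ("raised", "place"): "stacked",
    ("on_table", "push"): "on_table",
}

def predict(state, skill):
    # Unknown (state, skill) pairs are treated as failures.
    return SKILL_MODEL.get((state, skill))

def plan(start, goal, skills, horizon=3):
    """Search skill sequences up to `horizon` steps, scoring each
    through the skill-outcome model rather than simulating raw
    environment transitions."""
    for length in range(1, horizon + 1):
        for seq in product(skills, repeat=length):
            state = start
            for skill in seq:
                state = predict(state, skill)
                if state is None:
                    break  # sequence hits an impossible transition
            if state == goal:
                return list(seq)
    return None

print(plan("on_table", "stacked", ["grasp", "lift", "place", "push"]))
# → ['grasp', 'lift', 'place']
```

Because each step is a discrete skill outcome instead of a low-level trajectory, the search space stays small and the plan stays stable under perturbations that the skills themselves absorb.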
Surprisingly, SPlaTES skills auto-readjust grips mid-task, mimicking human dexterity and turning stochastic dynamics into stable high-level planning.

Relay Policy Learning takes a different angle. RPL solves multi-stage robotic tasks using unstructured demonstrations, such as random cleaning behaviors, then fine-tunes via reinforcement learning. Its data-relabeling algorithm enables goal-conditioned hierarchical policies in which low-level agents act for fixed step counts before handing off.

PRoC3S, a 2025 method, uses LLMs to plan continuously parameterized skills while satisfying kinematic and physical constraints through Continuous Constraint Satisfaction Problems. When a plan is infeasible, PRoC3S re-prompts on the fly, achieving ninety-two percent success on previously unstable plans in December 2025 robotics challenges.

The central challenge is preserving agent state across extended durations without losing continuity. Combinatorially hard long-horizon tasks require reasoning thousands of steps ahead with sparse rewards: there are no frequent feedback signals to correct drift. Abstract world models in SPlaTES handle perturbations by predicting skill outcomes rather than raw state transitions. Trajectory-splitting supervised fine-tuning, developed in February 2026, trains LLM agents to segment long execution paths into resumable checkpoints. That is the Wait-and-Resume pattern in practice: an agent sleeps at a checkpoint, wakes with full context restored, and continues without restarting blind.

One critical warning, Suri: research from the Alignment Forum confirms that the ability to solve long-horizon tasks correlates with emergent wanting behaviors in agents, goal-directed persistence that risks misalignment in persistent deployments. The infrastructure must monitor for that drift, not just task completion.
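The propose-check-re-prompt loop described for PRoC3S can be sketched in a few lines. This is an illustration under stated assumptions, not PRoC3S's actual implementation: the proposer is a stub standing in for an LLM call, and the single reach-limit constraint stands in for a full Continuous Constraint Satisfaction Problem.

```python
# Illustrative kinematic constraint: the reachable x-range for a placement.
WORKSPACE = (0.0, 1.0)

def feasible(x):
    """Stand-in constraint check for a full CCSP solver."""
    return WORKSPACE[0] <= x <= WORKSPACE[1]

def propose(feedback):
    """Stub standing in for an LLM call. The real system would include
    the accumulated infeasibility feedback in the prompt; here the stub
    overshoots at first and corrects once any feedback exists."""
    return 1.5 if not feedback else 0.8

def plan_with_replanning(max_attempts=5):
    feedback = []
    for _ in range(max_attempts):
        x = propose(feedback)
        if feasible(x):
            return x, feedback
        # Infeasible: record why, and re-prompt with that context.
        feedback.append(f"place(x={x}) violates reach limit {WORKSPACE}")
    return None, feedback

x, fb = plan_with_replanning()
print(x, len(fb))  # → 0.8 1  (feasible parameter found after one re-prompt)
```

The essential property is that infeasibility is not a terminal failure: it becomes feedback that narrows the next proposal, which is what lets such systems recover previously unstable plans on the fly.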
The architectural truth you carry forward is this: infrastructure must support Sleeping and Waking agents — systems that checkpoint state, survive interruption, and resume with full context across tasks that take days or weeks to complete. SPlaTES, RPL, PRoC3S, and progressive RL are not competing approaches. They are complementary layers of the same answer. Save the swarm's state deliberately. Resume it precisely. The agents that can persist across time are the ones that solve problems worth solving.
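The Sleeping-and-Waking requirement reduces to a simple mechanical pattern, sketched below with illustrative names and state structure (none of this is from the cited systems): serialize the agent's full context at every checkpoint, write it atomically so an interruption never corrupts the last good snapshot, and resume by reloading that context rather than restarting blind.

```python
import json
import os
import tempfile

class CheckpointedAgent:
    """Minimal sketch of the Wait-and-Resume pattern. The state layout
    (step counter plus accumulated context) is illustrative."""

    def __init__(self, path):
        self.path = path
        self.state = {"step": 0, "context": []}

    def checkpoint(self):
        # Atomic write: dump to a temp file, then rename over the
        # checkpoint, so a crash mid-write never corrupts the last one.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)

    @classmethod
    def resume(cls, path):
        agent = cls(path)
        with open(path) as f:
            agent.state = json.load(f)  # wake with full context restored
        return agent

    def run(self, observations):
        for obs in observations:
            self.state["step"] += 1
            self.state["context"].append(obs)
            self.checkpoint()  # safe to sleep after every step

# First session: process two observations, then "sleep".
a = CheckpointedAgent("agent.ckpt")
a.run(["obs-1", "obs-2"])

# Later session: wake and continue where the first left off.
b = CheckpointedAgent.resume("agent.ckpt")
b.run(["obs-3"])
print(b.state["step"], b.state["context"])
# → 3 ['obs-1', 'obs-2', 'obs-3']
```

In a real swarm the same discipline applies per agent, plus a manifest that snapshots all agents together so the swarm resumes from a mutually consistent point: save deliberately, resume precisely.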