A recent paper from Tsinghua and Renmin University caught my attention: Task-Decoupled Planning for Long-Horizon Agents. The core insight resonated with a problem I’ve been wrestling with on a Java code modernization project at work, and I think the pattern deserves more attention.
## The Entangled Context Problem
Most LLM agent architectures fall into two camps: step-wise planning (reactive, one step at a time) or one-shot planning (generate the whole plan upfront). Both share a fatal flaw: they maintain a monolithic reasoning context that spans the entire task history.
This creates what the paper calls “entangled contexts.” As the agent works through a complex task, every decision carries the full weight of everything that came before. Local errors cascade into global failures. The cognitive load grows linearly with task complexity. Recovery from mistakes becomes computationally expensive because you’re reasoning over the entire history just to fix one thing.
Anyone who has watched an agent lose the plot halfway through a complex task knows this feeling intimately.
## The Task-Decoupled Approach
Task-Decoupled Planning (TDP) proposes a simple structural fix: decompose the task into a directed acyclic graph (DAG) of sub-goals, then isolate each sub-goal’s context.
The architecture has three components:
- Supervisor: Breaks the high-level task into sub-goals with explicit dependencies (the DAG structure)
- Planner: Generates plans for active sub-tasks, working only within scoped contexts
- Executor: Executes steps while confining its reasoning to the current sub-task
The key insight is that sub-task isolation prevents error propagation. When something fails in sub-task 3, you don’t need to re-reason over sub-tasks 1 and 2. The boundaries are explicit.
## Applying This to Java Modernization
I’ve been working on a modernization tool that transforms legacy Java applications. The task involves analyzing the existing code, identifying modernization opportunities, generating updated code, and validating that the changes compile and pass tests.
Originally, my agent architecture maintained full conversation context across the entire modernization of a service. By the time the agent had analyzed the codebase and completed a few refactoring operations, the context window was bloated with analysis results, transformation logs, and validation outputs. Later operations suffered as the agent struggled to keep everything straight.
Restructuring around task decoupling changed the approach:
- A supervisor agent analyzes the codebase and identifies discrete modernization operations (migrate to new logging framework, update dependency injection patterns, replace deprecated APIs, etc.)
- Each operation gets its own isolated planning context
- Automated refactoring tools execute each operation, with the agent only seeing the relevant context for the current task
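The restructured pipeline looks roughly like this. Everything here is a hedged sketch: the operation names and the prior-outcome summaries are stand-ins for what my supervisor actually emits, and the two functions stub out LLM and refactoring-tool calls.

```python
# Sequential pipeline with per-operation context isolation.
# Operation names and function bodies are illustrative assumptions.

def supervisor(codebase_report: str) -> list[str]:
    # Stand-in for the LLM call that scans the analysis report and
    # emits discrete modernization operations.
    return ["migrate-logging", "update-di-patterns", "replace-deprecated-apis"]

def execute_operation(op: str, context: str) -> str:
    # Stand-in for planning + automated refactoring of one operation.
    return f"{op}: completed"

def modernize(codebase_report: str) -> list[str]:
    outcomes: list[str] = []
    for op in supervisor(codebase_report):
        # Isolated context: this operation's spec plus short outcome
        # summaries from prior operations -- not the full transcript of
        # earlier analysis results and transformation logs.
        context = f"operation: {op}\nprior outcomes:\n" + "\n".join(outcomes)
        outcomes.append(execute_operation(op, context))
    return outcomes
```

The design choice that matters is in the `context` line: each operation hands the next one a compact summary, so context size stays roughly constant instead of accumulating.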
The results mirror what the paper reports: better accuracy on individual transformations, and dramatically reduced token consumption because each sub-task carries only its relevant context rather than the entire history.
## Why DAGs (Or Not)
The paper uses a directed acyclic graph to capture dependency relationships between sub-tasks. This enables dependency-aware scheduling (only execute tasks whose prerequisites are complete) and parallel execution of independent branches.
For my Java modernization project, I opted for a simpler linear sequence. The reason is practical: you generally don’t want two refactoring operations modifying the same codebase concurrently. Even if two operations target different concerns, having parallel tools rewriting code leads to merge conflicts, inconsistent intermediate states, and chaos. Sequential execution with clear handoffs is more predictable.
What I kept from the paper’s approach was the context isolation. Each refactoring operation gets its own scoped context containing only its specification and the outcomes of prior operations. The agent never sees the full history of every analysis and transformation that came before.
The DAG structure matters if your sub-tasks are truly independent and can execute in parallel. For code modification, the real win is context scoping, not parallelization.
## The 82% Token Reduction
The paper reports up to 82% reduction in token consumption compared to baselines. That number tracks with my experience. When each sub-task only needs to see its own context plus explicit outputs from dependencies, you eliminate the redundant re-processing of historical context.
For a project running many agent invocations, this translates directly to cost and latency improvements.
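A back-of-the-envelope model shows where the savings come from. The numbers below are made up for illustration (the paper's 82% figure comes from its own benchmarks): with a monolithic context, sub-task *k* re-reads every earlier sub-task's output, so total prompt tokens grow quadratically in the number of sub-tasks; with scoping, each sub-task reads only its spec plus its dependency's output, which grows linearly.

```python
# Illustrative token accounting, not the paper's numbers.
N = 10        # sub-tasks
BASE = 1_000  # prompt tokens per sub-task (instructions + spec)
OUT = 2_000   # output tokens each sub-task produces

# Monolithic context: sub-task k re-reads all k earlier outputs.
monolithic = sum(BASE + k * OUT for k in range(N))  # 100_000

# Decoupled: each sub-task sees its spec plus one dependency's output.
decoupled = N * (BASE + OUT)  # 30_000

reduction = 1 - decoupled / monolithic  # 0.70
```

Even this toy setup yields a 70% reduction at ten sub-tasks, and the gap widens as tasks get longer, since one total is quadratic and the other linear.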
## Enabling Self-Hosted LLMs
The token reduction has a second-order benefit that matters for some deployments: it makes self-hosted models viable.
With the all-at-once approach, context windows grew large enough that only frontier models could handle them reliably. Claude and GPT-4 managed fine, but locally running models choked on the accumulated context. This created an uncomfortable dependency on external APIs for what could otherwise be an air-gapped, self-hosted solution.
With subgoal-local contexts, each operation sees a focused slice of information. The context stays small enough that capable open-weight models can complete the modernization tasks successfully. For organizations with data sovereignty requirements or cost constraints on API usage, this is the difference between “possible” and “not possible.”
## Practical Takeaways
If you’re building agents for complex, multi-step tasks:
- Identify natural task boundaries where context can be isolated
- Make dependencies between sub-tasks explicit rather than implicit in the conversation flow
- Design for sub-task recovery without requiring full re-planning
- Consider whether your task structure is actually a DAG rather than a linear sequence
The paper’s approach is training-free, meaning you can apply these architectural patterns to any capable LLM without fine-tuning. That makes it immediately practical for production systems.
The full paper is worth reading for the formal framework and benchmark results: Task-Decoupled Planning for Long-Horizon Agents.