ReAcTree consistently outperforms the baselines across diverse LLMs. Notably, on the WAH-NL dataset, ReAcTree + WM achieves a 61% Goal Success Rate (GSR) with Qwen 2.5 72B, nearly doubling the 31% GSR of ReAct + WM.
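The "nearly doubling" figure can be checked directly against the GSR columns of Table 1. The short sketch below copies the ReAct + WM and ReAcTree + WM GSR values for each model from that table and computes their ratio; the variable names are illustrative only.

```python
# Goal Success Rate (%) on WAH-NL, copied from Table 1:
# model -> (ReAct + WM, ReAcTree + WM)
gsr = {
    "LLaMA 3.1 8B": (16.00, 30.00),
    "LLaMA 3.1 70B": (33.00, 58.00),
    "Qwen 2.5 7B": (13.00, 37.00),
    "Qwen 2.5 72B": (31.00, 61.00),
    "Mistral 7B": (9.00, 15.00),
    "Gemma 2 9B": (11.00, 38.00),
    "Phi-4-RP 14B": (33.00, 49.00),
}

# Ratio of ReAcTree + WM to ReAct + WM; a value near 2.0 means "nearly doubled".
ratios = {model: new / old for model, (old, new) in gsr.items()}

for model, ratio in ratios.items():
    print(f"{model:14s} {ratio:.2f}x")
```

With these numbers the ratio for Qwen 2.5 72B is about 1.97, and it exceeds 1 for every model, although it varies widely, from about 1.48 (Phi-4-RP 14B) to about 3.45 (Gemma 2 9B).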
| Method | LLaMA 3.1 8B GSR | LLaMA 3.1 8B SSR | LLaMA 3.1 70B GSR | LLaMA 3.1 70B SSR | Qwen 2.5 7B GSR | Qwen 2.5 7B SSR | Qwen 2.5 72B GSR | Qwen 2.5 72B SSR | Mistral 7B GSR | Mistral 7B SSR | Gemma 2 9B GSR | Gemma 2 9B SSR | Phi-4-RP 14B GSR | Phi-4-RP 14B SSR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ZSP [19] | 1.00 | 13.03 | 0.00 | 14.42 | 0.00 | 8.98 | 0.00 | 14.22 | 0.00 | 11.65 | 1.00 | 13.87 | 0.00 | 17.90 |
| Tree-Planner (N=25) | 1.00 | 17.00 | 2.00 | 16.72 | 6.00 | 22.23 | 6.00 | 32.41 | 1.00 | 20.43 | 2.00 | 17.58 | 3.00 | 17.52 |
| Tree-Planner (N=50) | 4.00 | 21.85 | 4.00 | 23.43 | 8.00 | 28.10 | 9.00 | 36.03 | 6.00 | 23.63 | 3.00 | 23.30 | 4.00 | 20.40 |
| ReAct [56] | 8.00 | 34.25 | 30.00 | 57.05 | 10.00 | 31.82 | 26.00 | 51.38 | 6.00 | 28.18 | 9.00 | 37.20 | 33.00 | 48.13 |
| ReAct + WM | 16.00 | 42.65 | 33.00 | 63.15 | 13.00 | 39.73 | 31.00 | 54.05 | 9.00 | 31.95 | 11.00 | 39.93 | 33.00 | 51.28 |
| ReAcTree (Ours) | 21.00 | 51.98 | 32.00 | 60.58 | 18.00 | 50.20 | 48.00 | 75.13 | 11.00 | 37.92 | 26.00 | 60.43 | **49.00** | 67.47 |
| ReAcTree + WM (Ours) | **30.00** | **60.77** | **58.00** | **79.27** | **37.00** | **59.63** | **61.00** | **79.58** | **15.00** | **49.57** | **38.00** | **67.08** | **49.00** | **69.30** |

Table 1. Goal Success Rate (GSR) and Subgoal Success Rate (SSR) on the WAH-NL dataset (%). Bold indicates the best result in each column.
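The caption names the two metrics without defining them. As an illustration only, the sketch below assumes a common reading: an episode counts toward GSR only if all of its subgoals are satisfied, and SSR averages the fraction of satisfied subgoals per episode. The `Episode` record and both functions are hypothetical, and the paper's exact averaging may differ (for example, pooling subgoals across all episodes).

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical record: one boolean per subgoal, True if satisfied at episode end.
    # Assumes every episode has at least one subgoal.
    subgoals_done: list[bool]


def goal_success_rate(episodes: list[Episode]) -> float:
    """Percentage of episodes in which every subgoal was satisfied."""
    successes = sum(all(ep.subgoals_done) for ep in episodes)
    return 100.0 * successes / len(episodes)


def subgoal_success_rate(episodes: list[Episode]) -> float:
    """Mean per-episode fraction of satisfied subgoals, as a percentage (assumed averaging)."""
    fractions = [sum(ep.subgoals_done) / len(ep.subgoals_done) for ep in episodes]
    return 100.0 * sum(fractions) / len(fractions)


episodes = [
    Episode([True, True, True]),         # full success
    Episode([True, False, True, True]),  # partial: 3 of 4 subgoals
    Episode([False, False]),             # failure
]
print(f"GSR = {goal_success_rate(episodes):.2f}%")
print(f"SSR = {subgoal_success_rate(episodes):.2f}%")
```

Under this reading SSR can never be lower than GSR, because every fully successful episode also contributes a subgoal fraction of 1.0.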
| Split | Method | LLaMA 3.1 70B | Qwen 2.5 72B |
|---|---|---|---|
| Valid-Seen | ReAct + WM | 33.31 | 37.07 |
| Valid-Seen | ReAcTree + WM | 40.00 | 40.85 |
| Valid-Unseen | ReAct + WM | 32.40 | 39.10 |
| Valid-Unseen | ReAcTree + WM | 37.03 | 39.83 |
Table 2. Goal Success Rate (GSR) on the ALFRED dataset (%). ReAcTree + WM outperforms ReAct + WM on both splits with both LLMs, including the Valid-Unseen split, indicating that its gains generalize to unseen environments.
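To make the size of the ALFRED gains concrete, the following sketch computes the absolute GSR improvement of ReAcTree + WM over ReAct + WM for each split and model, using only the numbers in Table 2.

```python
# GSR (%) on ALFRED, copied from Table 2:
# (split, model) -> (ReAct + WM, ReAcTree + WM)
alfred_gsr = {
    ("Valid-Seen", "LLaMA 3.1 70B"): (33.31, 40.00),
    ("Valid-Seen", "Qwen 2.5 72B"): (37.07, 40.85),
    ("Valid-Unseen", "LLaMA 3.1 70B"): (32.40, 37.03),
    ("Valid-Unseen", "Qwen 2.5 72B"): (39.10, 39.83),
}

# Absolute improvement in percentage points.
gains = {key: new - old for key, (old, new) in alfred_gsr.items()}

for (split, model), gain in gains.items():
    print(f"{split:12s} {model:14s} +{gain:.2f} points")
```

The gains range from about 0.73 points (Qwen 2.5 72B, Valid-Unseen) to about 6.69 points (LLaMA 3.1 70B, Valid-Seen).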
@misc{choi2025reactree,
title={ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning},
author={Jae-Woo Choi and Hyungmin Kim and Hyobin Ong and Youngwoo Yoon and Minsu Jang and Dohyung Kim and Jaehong Kim},
year={2025},
eprint={2511.02424},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2511.02424},
}