Implementing Auto-Retry for Agent CLIs like Claude Code and Codex
Auto-retry for Agent CLIs such as Claude Code and Codex needs a layered design, one that handles exceptions properly and does not fall into endless retry loops. An agent task can break halfway through, so a retry decision has to account for the content already output, the current context, and whether the failure is worth retrying at all. A simple retry-on-error approach gets this wrong in both directions: it can treat transient errors as final failures, and it can replay errors that will never succeed. HagiCode's experience integrating multiple Agent CLIs has shown that auto-retry is not just a button but a design problem that has to weigh all of these factors.
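To make the layering concrete, here is a minimal Python sketch of one possible retry decision. It is not HagiCode's implementation and does not call any real Claude Code or Codex API. All names (FailureKind, AttemptState, RetryDecision, decide_retry), the attempt cap of 3, and the exponential backoff are assumptions chosen for illustration. So is the policy of stopping when output has already been emitted but the context cannot be resumed.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    # Hypothetical classification; a real integration would map CLI errors to these.
    TRANSIENT = "transient"          # e.g. a dropped connection or a timeout
    NON_RETRYABLE = "non_retryable"  # e.g. invalid input or rejected credentials


@dataclass
class AttemptState:
    attempt: int             # attempts made so far, including the one that failed
    failure: FailureKind     # how the failure was classified
    partial_output: str      # content the agent emitted before it failed
    context_resumable: bool  # whether the session context can be continued


@dataclass
class RetryDecision:
    retry: bool
    mode: str             # "restart", "resume", or "stop"
    delay_seconds: float


def decide_retry(state: AttemptState, max_attempts: int = 3,
                 base_delay: float = 1.0) -> RetryDecision:
    stop = RetryDecision(False, "stop", 0.0)

    # Layer 1: never replay a failure that would fail the same way again.
    if state.failure is FailureKind.NON_RETRYABLE:
        return stop

    # Layer 2: a hard cap on attempts prevents endless retry loops.
    if state.attempt >= max_attempts:
        return stop

    # Exponential backoff: 1x, 2x, 4x ... the base delay.
    delay = base_delay * 2 ** (state.attempt - 1)

    # Layer 3: if content was already output, only continue from the existing
    # context; restarting from scratch could duplicate what the user has seen.
    if state.partial_output.strip():
        if state.context_resumable:
            return RetryDecision(True, "resume", delay)
        return stop

    # Nothing was emitted yet, so a clean restart is safe.
    return RetryDecision(True, "restart", delay)
```

A caller would build an AttemptState after each failed run and act on the returned mode. Keeping the decision in a pure function, separate from the code that actually invokes the CLI, makes each layer easy to test on its own.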