Small edits

pfrommerd · pfrommerd · commit 3ebe76af1b7c · 2025-12-01T10:14:58.000-05:00
diff --git a/src/content/action-chunking.mdx b/src/content/action-chunking.mdx
@@ -15,6 +15,7 @@ Action-chunking is a popular practice in modern sequential modeling pipelines, w
  3. Improved representation learning via multi-step prediction.
  4. Simulating Model-Predictive Control.
 
+We show a different mechanism, one not described in prior literature: action-chunking can leverage the open-loop stability of the system to stabilize the learned policy.
 
 <Block title="Chunking Policy" type="Definition">
 A chunking policy is specified by a chunk-length $\ell$, and a chunking policy $\text{chunk}[\pi]: \mathcal{X} \to \mathcal{U}^{\ell}$ such that $\pi(\mathbf{x}_{1:t},\mathbf{u}_{1:t-1},t) = \text{chunk}[\pi](\mathbf{x}_{\ell k})_{t - \ell k}$ where $k = \lfloor \frac{t}{\ell}\rfloor$, i.e. we predict $\ell$-length sequences which are then executed "open-loop" without feedback from $\mathbf{x}$ until the chunk has been exhausted.
@@ -25,6 +26,8 @@ $$
 J_{\text{demo}}(\hat{\pi}_{\text{chunk}}) = \mathbb{E}_{\pi^\star}\left[ \sum_{k=1}^{(T-1)/\ell} \|\mathbf{u}^*_{1+(k-1)\ell:k\ell} - \text{chunk}[\hat{\pi}_{\text{chunk}}](\mathbf{x}^*_{(k-1)\ell})\|^2\right].
 $$
 
+We now formalize action-chunking for imitating deterministic expert policies:
+
 <Block type="Intervention 1" title="Learning over Chunked Policies">
 We sample $S_n$ i.i.d. trajectories drawn from the expert distribution $\mathcal{P}_{\text{demo}}$. Instead of learning $\hat{\pi}: \mathcal{X} \to \mathcal{U}$, we learn an $\ell$-chunked policy $\text{chunk}[\hat{\pi}_{\text{chunk}}]: \mathcal{X} \to \mathcal{U}^\ell$ that attains low **on-expert error** $J_{\text{demo}}(\hat{\pi}_{\text{chunk}})$, e.g., by empirical risk minimization.
 </Block>
@@ -51,3 +54,5 @@ $$
 
 This implies that when the ambient dynamics $f$ are EISS, then a sufficiently chunked imitator policy will accrue limited compounding errors&mdash;**horizon-free**&mdash;relative to the on-expert error it sees.
 </Block>
+
+Our result follows from the following fact: under natural assumptions, the learner's chunked policies are all closed-loop EISS. This circumvents the lower bound given earlier, under which it is hard for the learner to find policies that stabilize the dynamics if those policies must predict a single action at a time.
diff --git a/src/content/introduction.mdx b/src/content/introduction.mdx
@@ -49,7 +49,7 @@ We validate this finding in simulated robotic manipulation tasks from RoboMimic,
     <Figure src={`${BASE_URL}/figs/robomimic_traj100.svg`} alt="RoboMimic trajectory results"/>
     <Figure src={`${BASE_URL}/figs/robomimic_clean_vs_noise.svg`} alt="RoboMimic clean vs noise comparison"/>
   </HStack>
-  RoboMimic tool-hang task success, as a function of both prediction horizon and evaluated chunk length.
+  RoboMimic tool-hang task success, as a function of both prediction horizon and evaluated chunk length. Center: Chunk length ablation, 100 training trajectories. Right: Ablation on noise injection vs no noise injection, 50 training trajectories.
 </FigureEnv>
 
 
@@ -70,7 +70,7 @@ The effect of noise injection during demonstration collection for unstable envir
     <Figure src={`${BASE_URL}/figs/noise_inj_sweep_sigma.png`} alt="Noise injection sweep: sigma parameter" height="10rem"/>
     <Figure src={`${BASE_URL}/figs/noise_inj_sweep_alpha_sigma1.png`} alt="Noise injection sweep: alpha and sigma1" height="10rem"/>
   </HStack>
-  Mean accumulated reward for Half-Cheetah environment by timestep, with differing levels of noise injection.
+  Mean accumulated reward for the Half-Cheetah environment by timestep, with differing levels of noise injection, using either clean or noised expert actions as the training labels.
 </FigureEnv>
 
 For the adventurous reader, we will now introduce the general framework we use to make precise these fuzzy notions of stability and performance. This requires elements from Control Theory with which many Roboticists and RL theorists may be unfamiliar. We build up our analytical framework in a notation-light and broadly informal manner.
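
Note on the "Chunking Policy" definition in `action-chunking.mdx`: the index arithmetic ($k = \lfloor t/\ell \rfloor$, offset $t - \ell k$) may be easier to follow as a rollout loop. The following is a minimal sketch, not code from the repository. It assumes 0-based time, scalar states and actions, and two hypothetical callables: `chunk_policy` (maps a state to a length-`ell` action sequence) and `step` (the dynamics $f$).

```python
from typing import Callable, List, Sequence, Tuple


def rollout_chunked(
    chunk_policy: Callable[[float], Sequence[float]],  # hypothetical: x -> ell actions
    step: Callable[[float, float], float],             # hypothetical dynamics f(x, u)
    x0: float,
    ell: int,
    horizon: int,
) -> Tuple[List[float], List[float], List[int]]:
    """Roll out a chunked policy for `horizon` steps.

    Returns the visited states, the executed actions, and the times at
    which the policy was queried (the chunk boundaries).
    """
    states: List[float] = [x0]
    actions: List[float] = []
    replan_times: List[int] = []
    chunk: Sequence[float] = ()

    for t in range(horizon):
        k = t // ell              # index of the current chunk
        offset = t - ell * k      # position inside the current chunk
        if offset == 0:
            # Chunk exhausted: query the policy on the state at time ell * k.
            chunk = chunk_policy(states[t])
            if len(chunk) != ell:
                raise ValueError("chunk_policy must return exactly ell actions")
            replan_times.append(t)
        # Inside a chunk the action ignores the current state (open loop).
        u = chunk[offset]
        actions.append(u)
        states.append(step(states[t], u))

    return states, actions, replan_times


if __name__ == "__main__":
    # Toy example: open-loop stable scalar dynamics x' = 0.5 * x + u,
    # with a chunk policy that always outputs zero actions.
    xs, us, replans = rollout_chunked(
        chunk_policy=lambda x: [0.0, 0.0, 0.0],
        step=lambda x, u: 0.5 * x + u,
        x0=8.0,
        ell=3,
        horizon=7,
    )
    print(replans)
    print(xs)
```

The policy only observes the state at multiples of `ell`; between those times the system evolves under its own open-loop dynamics, which is the regime the added paragraphs describe.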