
Commit d437c3d: Rewrote noise injection
1 parent: 1fefbed

8 files changed: 77 additions & 49 deletions


src/components/Block.astro

Lines changed: 0 additions & 3 deletions
@@ -29,8 +29,5 @@ function capitalize(string) {
   color: var(--color-info);
   font-weight: bold;
 }
-.inline-title + :global(p) {
-  display: inline;
-}
 </style>

src/content/action-chunking.mdx

Lines changed: 4 additions & 2 deletions
@@ -16,15 +16,17 @@ Action-chunking is a popular practice in modern sequential modeling pipelines, w
 
 
 <Block title="Chunking Policy" type="Definition">
-A chunking policy is specified by a chunk-length $\ell$, and a chunking policy $\text{chunk}[\pi]: \mathcal{X} \to \mathcal{U}^{\ell}$ such that $\pi(\mathbf{x}_{1:t},\mathbf{u}_{1:t-1},t) = \text{chunk}[\pi](\mathbf{x}_{\ell\lfloor \frac{t}{\ell}\rfloor})_{t - \ell\lfloor \frac{t}{\ell}\rfloor}$, i.e. we predict $\ell$-length sequences which are then executed "open-loop" without feedback from $\mathbf{x}$ until the chunk has been exhausted.
+A chunking policy is specified by a chunk length $\ell$ and a map $\text{chunk}[\pi]: \mathcal{X} \to \mathcal{U}^{\ell}$ such that $\pi(\mathbf{x}_{1:t},\mathbf{u}_{1:t-1},t) = \text{chunk}[\pi](\mathbf{x}_{\ell k})_{t - \ell k}$ where $k = \lfloor \frac{t}{\ell}\rfloor$, i.e. we predict $\ell$-length action sequences which are then executed "open-loop", without feedback from $\mathbf{x}$, until the chunk has been exhausted.
 </Block>
 For convenience we also write $\text{chunk}[\pi](\mathbf{x}) = (\text{chunk}_1(\mathbf{x}),\dots, \text{chunk}_{\ell}(\mathbf{x}))$ and denote a chunking policy as $\hat{\pi}_{\text{chunk}}$. For chunked policies, our demonstration loss becomes:
 
 $$
 J_{\text{demo}}(\hat{\pi}_{\text{chunk}}) = \mathbb{E}_{\pi^\star}\left[ \sum_{k=1}^{(T-1)/\ell} \|\mathbf{u}^*_{1+(k-1)\ell:k\ell} - \text{chunk}[\hat{\pi}_{\text{chunk}}](\mathbf{x}^*_{(k-1)\ell})\|^2\right].
 $$
 
-**Intervention 1: Learning over Chunked Policies.** We sample $S_n$ as denote $n$ i.i.d. trajectories drawn from the expert distribution $\mathcal{P}_{\text{demo}}$. We aim to find $\hat{\pi}_{\text{chunk}}$ from a class of length-$\ell$ chunked policies, $\Pi_{\text{chunk}}$, that attains low **on-expert error** $J_{\text{demo}}(\hat{\pi}_{\text{chunk}})$, e.g., by empirical risk minimization.
+<Block type="Intervention 1" title="Learning over Chunked Policies">
+We sample a dataset $S_n$ of $n$ i.i.d. trajectories drawn from the expert distribution $\mathcal{P}_{\text{demo}}$. Instead of learning $\hat{\pi}: \mathcal{X} \to \mathcal{U}$, we learn an $\ell$-chunked policy $\text{chunk}[\hat{\pi}_{\text{chunk}}]: \mathcal{X} \to \mathcal{U}^\ell$ that attains low **on-expert error** $J_{\text{demo}}(\hat{\pi}_{\text{chunk}})$, e.g., by empirical risk minimization.
+</Block>
 
 The Control-Theoretic intuition behind this intervention is that, by making the chunk length long enough, the learned policy $\hat{\pi}_{\text{chunk}}$ **inherits the open-loop stability of the dynamics $f$**.
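As an editorial aside, the open-loop execution scheme formalized in the `Chunking Policy` definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from this repository; `chunk_policy`, `f`, and `rollout_chunked` are hypothetical names.

```python
import numpy as np

def rollout_chunked(chunk_policy, f, x1, T, ell):
    """Roll out an ell-chunked policy: observe the state only at chunk
    boundaries, then execute the predicted ell actions open-loop.

    chunk_policy(x) -> (ell, d_u) array of actions; f(x, u) -> next state.
    Both are illustrative stand-ins for chunk[pi-hat] and the dynamics f.
    """
    x, xs, us = x1, [x1], []
    chunk = None
    for t in range(T):
        if t % ell == 0:           # chunk boundary: re-plan from the current state
            chunk = chunk_policy(x)
        u = chunk[t % ell]         # index within the current chunk (no feedback)
        x = f(x, u)
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)
```

Between boundaries the state `x` keeps evolving but is never fed back into the policy, which is exactly the "open-loop until the chunk is exhausted" behavior the definition describes.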

src/content/discussion.mdx

Lines changed: 4 additions & 5 deletions
@@ -6,13 +6,11 @@ import Refs from '../components/Refs.astro';
 
 ## Discussion and Limitations
 
-Our action-chunking guarantees rely on a structural assumption of $(\hat{\pi}, \hat{f}) \in \mathcal{P}$ being an EISS pair. We believe either explicitly enforcing this, e.g., via regularization or hierarchy, or attaining it indirectly via implicit biases, are interesting directions of inquiry.
+Our combined action-chunking and noise-injection procedure relies on the structural assumption that either $f$ or $f^{\pi^\star}$ is EISS.
 
-We assume smoothness in the noise injection section, which is not strictly satisfied in some applications, such as in model-predictive control. We remark our lower bound depends on smoothness in $C_\pi$, which implies it is in some sense a fundamental aspect of noise-injection. However, we believe our results should extend to piece-wise notions, and note ongoing research exploring **smoothing** for learning in dynamical systems.
+Without either of these assumptions, if $f^{\pi^\star}$ is unstable, errors may always compound in the worst case [@simchowitz2025pitfalls]. This setting is, to some degree, uninteresting for Imitation Learning, as it means that the expert is inherently bad and cannot recover from failure.
 
-In general, we leave a sharp characterization of the role of smoothness and control-theoretic quantities in IL as an open problem. We also note though our theory suggests isotropic noise injection suffices, this may not be desirable in certain practical contexts, such as highly dexterous robotics. In light of our findings elucidating the precise role of noising, we leave designing robust practical recipes for perturbative data collection for future inquiry.
-
-Lastly, we leave investigating the marginal benefit of **iterative** interaction as future work.
+For settings where an external oracle can stabilize the dynamics (e.g. a low-level position-based control loop), the dynamics can be reformulated such that $f$ is open-loop EISS. As such, we believe our results cover the full spectrum of situations where learning is reasonable.
 
 {/*
 Include fake references (not shown) and then generate the whole
@@ -21,6 +19,7 @@ bibliography here.
 
 <Refs show={false}>
 [@pomerleau1988alvinn]
+[@block2023provable]
 [@zhao2023learning]
 [@chi2023diffusion]
 [@simchowitz2025pitfalls]

src/content/experiments.mdx

Lines changed: 16 additions & 14 deletions
@@ -6,14 +6,6 @@ export const BASE_URL = import.meta.env.BASE_URL.replace(/\/+$/, '');
 
 ## Experimental Validation
 
-<FigureEnv>
-<HStack>
-<Figure src={`${BASE_URL}/figs/halfcheetah_chunk_sweep.png`} alt="Action chunking sweep results" />
-<Figure src={`${BASE_URL}/figs/halfcheetah_prop_sweep.png`} alt="Proportional sweep results" />
-<Figure src={`${BASE_URL}/figs/halfcheetah_sweep.png`} alt="HalfCheetah sweep results" />
-<Figure src={`${BASE_URL}/figs/humanoid_sweep.png`} alt="Humanoid sweep results"/>
-</HStack>
-</FigureEnv>
 
 ### Action Chunking
 
@@ -23,19 +15,29 @@ To validate our predictions about the **stability-theoretic** benefits of action
 - The merits of action-chunking remain showcased in **deterministic, state-based control**. This reveals that action-chunking still improves performance independently of partial observability or compatibility with generative control policies.
 - **End-effector control** enables the benefits of action-chunking. This is because end-effector control renders the closed-loop between system state and end-effector prediction incrementally stable. Hence, the low-level end-effector controller transforms imitating the position policy to taking place in an open-loop stable dynamical system, precisely the regime where we prescribe our AC guarantees.
 
-### Noise Injection
-
 <FigureEnv>
 <HStack>
-<Figure src={`${BASE_URL}/figs/noise_inj_sweep_noisy.png`} alt="Noise injection sweep: noisy trajectories"/>
-<Figure src={`${BASE_URL}/figs/noise_inj_sweep_sigma.png`} alt="Noise injection sweep: sigma parameter"/>
-<Figure src={`${BASE_URL}/figs/noise_inj_sweep_alpha_sigma1.png`} alt="Noise injection sweep: alpha and sigma1"/>
+<Figure src={`${BASE_URL}/figs/halfcheetah_chunk_sweep.png`} alt="Action chunking sweep results" />
+<Figure src={`${BASE_URL}/figs/halfcheetah_prop_sweep.png`} alt="Proportional sweep results" />
+<Figure src={`${BASE_URL}/figs/halfcheetah_sweep.png`} alt="HalfCheetah sweep results" />
+<Figure src={`${BASE_URL}/figs/humanoid_sweep.png`} alt="Humanoid sweep results"/>
 </HStack>
+We visualize performance as a function of noise injection and chunk length for the MuJoCo HalfCheetah environment, and show performance relative to both DAgger and DART on HalfCheetah and Humanoid.
 </FigureEnv>
 
+### Noise Injection
+
+
 We seek to validate our hypotheses about the exploratory benefits of noise-injection. We propose experiments on MuJoCo continuous control environments, where we seek to imitate pre-trained expert policies. To summarize:
 
 - **Noise injection as in Intervention 2 provides the exploration necessary to mitigate compounding errors**, increasing performance on par with iteratively interactive methods such as DAgger and DART. We note Intervention 2 collects data in one shot, without ever observing learned policy rollouts.
-- **Larger noise scales $\sigma_u$ (within tolerance) improve performance**, in contrast to prior understanding which necessitates $\sigma_u$ set proportional to $J_{\text{demo}}^T(\hat{\pi}; \mathcal{P}_{\text{demo}})$, i.e. very small for policies with low on-expert error.
+- **Larger noise scales $\sigma_u$ (within tolerance) improve performance**, in contrast to prior understanding, which necessitates $\sigma_u$ set proportional to $J_{\text{demo}}(\hat{\pi}; \mathcal{P}_{\text{demo}})$, i.e. very small for policies with low on-expert error.
 - **A mixture of noise-injected and clean expert trajectories is beneficial**, and the difference is small when provided more data. This matches the theoretical intuition that noise-injection is necessary up until $\hat{\pi}$ is "locally stabilized" sufficiently well around $\mathbf{x}^*$, and thus only enters the trajectory error as a higher-order term.
 
+<FigureEnv>
+<HStack>
+<Figure src={`${BASE_URL}/figs/noise_inj_sweep_noisy.png`} alt="Noise injection sweep: noisy trajectories"/>
+<Figure src={`${BASE_URL}/figs/noise_inj_sweep_sigma.png`} alt="Noise injection sweep: sigma parameter"/>
+<Figure src={`${BASE_URL}/figs/noise_inj_sweep_alpha_sigma1.png`} alt="Noise injection sweep: alpha and sigma1"/>
+</HStack>
+</FigureEnv>

src/content/noise-injection.mdx

Lines changed: 30 additions & 10 deletions
@@ -1,7 +1,12 @@
+---
+bibliography: ./src/content/references.bib
+---
+
 import Block from '../components/Block.astro';
 import Figure from '../components/Figure.astro';
 import FigureEnv from '../components/FigureEnv.astro';
 import HStack from '../components/HStack.astro';
+import Refs from '../components/Refs.astro';
 
 export const BASE_URL = import.meta.env.BASE_URL.replace(/\/+$/, '');
 
@@ -13,32 +18,47 @@ We now consider the difficult setting where the ambient dynamics $f$ may not be
 We define the **expert distribution under noise injection** as the distribution $\mathcal{P}_{\text{exp},\sigma}$ over trajectories $(\tilde{\mathbf{x}}_t, \tilde{\mathbf{u}}_t)_{t\geq 1}$ with $\tilde{\mathbf{x}}_1 \sim D$, and $\tilde{\mathbf{u}}_t = \pi^*(\tilde{\mathbf{x}}_t),\;\tilde{\mathbf{x}}_{t+1} = f(\tilde{\mathbf{x}}_t, \tilde{\mathbf{u}}_t + \sigma_u \mathbf{z}_t)$ for $t \geq 1$, where $\mathbf{z}_t \sim \text{Unif}(\mathbb{B}^{d_u}(1))$ is drawn uniformly over the unit ball.
 </Block>
 
+Our key innovation over prior algorithms such as DAgger or DART is that we learn using a weighted **mixture** of both the noise-injected $\mathcal{P}_{\text{exp},\sigma}$ and the "vanilla" expert data distribution $\mathcal{P}_{\text{exp}}$.
+
+Using a mixture is *provably better*, particularly in the high-data regime with large $n$. This is an intuitive result: when $J_{\text{imitation}}$ is already low, demonstrations with the fixed noise level $\sigma$, i.e. from $\mathcal{P}_{\text{exp},\sigma}$, may explore *too* much and have low coverage on $\mathcal{P}_{\text{exp}}$.
+
+
 <Block title="Exploratory Data Collection" type="Intervention">
 For the noise-injected distribution $\mathcal{P}_{\text{exp},\sigma}$ defined above, provide a sample $S_{n,\sigma,\alpha}$ of trajectories, where for $1 \le i \le \lfloor\alpha n\rfloor$ the trajectories are i.i.d. from $\mathcal{P}_{\text{exp}}$, and the remaining trajectories are drawn i.i.d. from $\mathcal{P}_{\text{exp},\sigma}$. Define the corresponding mixture distribution $\mathcal{P}_{\text{exp},\sigma,\alpha} \triangleq \alpha \mathcal{P}_{\text{exp}} + (1-\alpha)\mathcal{P}_{\text{exp},\sigma}$. We then find $\hat{\pi}$ that attains low $J_{\text{demo}}^T(\hat{\pi}; \mathcal{P}_{\text{exp},\sigma,\alpha})$, e.g., by empirical risk minimization.
 </Block>
 
+
 <FigureEnv>
 <HStack>
 <Figure src={`${BASE_URL}/figs/exploration_diagram.svg`} alt="Exploratory data collection via noise injection"/>
 </HStack>
+We can think of the data mixture as ensuring coverage both on-expert and in a "tube" around the expert trajectories. Using only one or the other is suboptimal, due to a lack of either on-expert or off-expert data.
 </FigureEnv>
 
-<Block type="Key Result">
-Let the expert policy and true dynamics $(\pi^*, f)$ be $(C_\pi,C_{\text{smooth}})$-smooth, respectively, and all policies $\pi$ are $L_\pi$-Lipschitz. The closed-loop system induced by $(\pi^*, f)$ is $(C_{\text{ISS}}, \rho)$-EISS. Let $\hat{\pi}$ be a $L_\pi$-Lipschitz, $C_\pi$-smooth policy. Then, for $\sigma_u \lesssim O^*[\text{poly}(1/C_\pi, 1/C_{\text{smooth}})] = O^*(1)$, we have:
+Our results in this domain make extensive use of the analysis tools introduced in @pfrommer2022tasil, which provide strong guarantees when imitating a closed-loop EISS expert in an adversarial manner.
 
+There are many technical subtleties that we gloss over here but explore in detail in our full manuscript. Namely, our analysis is carefully constructed to consider coverage only on the manifold of reachable states. Performing this analysis in a technically rigorous manner requires careful Control-Theoretic reasoning involving concepts such as the Controllability Gramian. We additionally make several simplifying assumptions regarding first-order smoothness (i.e. that $f, \pi^\star$ are differentiable with $C_{\text{smooth}}$- and $C_{\pi}$-Lipschitz derivatives, respectively).
+
+<Block type="Key Result">
+Let the dynamics and expert policy $(f, \pi^*)$ be $(C_{\text{smooth}}, C_{\pi})$-smooth, respectively, and let all policies $\pi$ be $L_\pi$-Lipschitz. Assume that the closed-loop system induced by $(f, \pi^*)$, $f^{\pi^\star}$, is $(C_{\text{ISS}}, \rho)$-EISS. Let $\hat{\pi}$ be an $L_\pi$-Lipschitz, $C_\pi$-smooth policy. Then, for any $n, T$ and
+$$
+\sigma_u \lesssim O^*[\text{poly}(1/C_\pi, 1/C_{\text{smooth}})] = O^*(1),
+$$
+we have:
 $$
 J_{\mathrm{imitation}}(\hat{\pi}) \lesssim O^*(T) \sigma_u^{-2} J_{\text{demo}}^T(\hat{\pi}; \mathcal{P}_{\text{exp},\sigma,\alpha}[0.5]).
 $$
-
-In particular, setting $\sigma_u = O^*(1)$, we have:
-
+In particular, setting $\sigma_u = O^*(1)$ (i.e. some $n,T$-independent constant), we have:
 $$
 J_{\mathrm{imitation}}(\hat{\pi}) \lesssim O^*(T) J_{\text{demo}}^T(\hat{\pi}; \mathcal{P}_{\text{exp},\sigma,\alpha}[0.5]).
 $$
 </Block>
 
-<FigureEnv>
-<HStack>
-<Figure src={`${BASE_URL}/figs/noising_manifold.svg`} alt="Effect of noise injection for controllable versus uncontrollable subspaces"/>
-</HStack>
-</FigureEnv>
+
+{/*
+ * Capture and hide the auto-generated bibliography
+ * for this markdown fragment.
+ */}
+<Refs show={false}>
+[^ref]
+</Refs>
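The data-collection procedure this file defines (expert actions perturbed by uniform unit-ball noise in the executed control, mixed with clean demonstrations) can be sketched as follows. This is a minimal illustration under the definitions above; all function names are hypothetical, not from the repository.

```python
import numpy as np

def sample_unit_ball(d, rng):
    """Draw z ~ Unif(B^d(1)): uniform direction scaled by radius^(1/d)."""
    z = rng.normal(size=d)
    z /= np.linalg.norm(z)
    return z * rng.uniform() ** (1.0 / d)

def rollout_noisy(pi_star, f, x1, T, sigma_u, rng):
    """One trajectory from P_{exp,sigma}: the expert's clean action is the
    supervision label, but the *executed* control is perturbed by sigma_u * z_t."""
    x, traj = x1, []
    for _ in range(T):
        u = pi_star(x)
        traj.append((x, u))            # supervise on the clean expert action
        z = sample_unit_ball(u.shape[0], rng)
        x = f(x, u + sigma_u * z)      # noise enters only the executed dynamics
    return traj

def collect_mixture(pi_star, f, x1_sampler, n, T, sigma_u, alpha, rng):
    """S_{n,sigma,alpha}: floor(alpha*n) clean trajectories from P_exp,
    the remaining n - floor(alpha*n) drawn from the noise-injected P_{exp,sigma}."""
    n_clean = int(np.floor(alpha * n))
    clean = [rollout_noisy(pi_star, f, x1_sampler(), T, 0.0, rng)
             for _ in range(n_clean)]
    noisy = [rollout_noisy(pi_star, f, x1_sampler(), T, sigma_u, rng)
             for _ in range(n - n_clean)]
    return clean + noisy
```

The Key Result above evaluates the mixture at $\alpha = 0.5$, i.e. `collect_mixture(..., alpha=0.5, ...)` in this sketch.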
