124 changes: 102 additions & 22 deletions learning/imitation/iil-dagger/README.md
@@ -1,39 +1,119 @@
# Imitation Learning using Dataset Aggregation
# Imitation Learning

## Introduction
In this baseline we train a small squeezenet model on expert trajectories to simply clone the behaviour of the expert.
Using only the expert trajectories would result in a model unable to recover from non-optimal positions. Hence, we use DAgger, a dataset aggregation technique with mixed policies between expert and model.
This technique of random mixing would help the model learn a more general trajectory than the optimal one provided by the expert alone.

## Quickstart
1) Clone this [repo](https://github.com/duckietown/gym-duckietown):
In this baseline we train a small SqueezeNet model on expert trajectories to clone the behavior of the expert.
Using only the expert trajectories would result in a model unable to recover from non-optimal positions. Instead, we use DAgger, a dataset aggregation technique that mixes the expert and model policies while collecting training data.
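
The mixing idea can be sketched with a toy example. This is a minimal illustration, not the code in this repository: the names `expert_probability` and `run_episode`, the one-dimensional toy dynamics, and the exponential schedule `decay ** episode` are all assumptions made for the sketch. The key point it shows is that the executed action may come from either policy, while the stored label always comes from the expert.

```python
import random


def expert_probability(decay, episode):
    """Probability of executing the expert's action (assumed exponential schedule)."""
    return decay ** episode


def run_episode(expert, learner, start_state, horizon, beta, rng, dataset):
    """Roll out one episode with a mixed policy and aggregate expert labels."""
    state = start_state
    for _ in range(horizon):
        expert_action = expert(state)
        # The dataset always stores the expert's label for the visited state.
        dataset.append((state, expert_action))
        # The action actually executed is the expert's with probability beta.
        if rng.random() < beta:
            action = expert_action
        else:
            action = learner(state)
        state = state + action  # hypothetical toy dynamics
    return state


def expert(state):
    # Toy expert: steer the lateral offset back towards zero.
    return -0.5 * state


def learner(state):
    # Toy untrained learner: never corrects.
    return 0.0


rng = random.Random(0)
dataset = []
for episode in range(3):
    beta = expert_probability(0.7, episode)
    run_episode(expert, learner, 1.0, 5, beta, rng, dataset)
```

Because the learner is allowed to drive part of the time, the aggregated dataset contains states the expert alone would never visit, each paired with the expert's corrective action.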

$ git clone https://github.com/duckietown/gym-duckietown.git
## Quick start

2) Change into the directory:
Use the Jupyter notebook notebook.ipynb to quickly start training and testing the DAgger imitation learning baseline.

$ cd gym-duckietown
## Detailed Steps

3) Install the package:
### Clone the repo

$ pip3 install -e .
Clone this [repo](https://github.com/duckietown/gym-duckietown):

4) Start training:
$ git clone https://github.com/duckietown/gym-duckietown.git

$ python -m learning.imitation.iil-dagger.train
$ cd gym-duckietown

5) Test the trained agent specifying the saved model:
### Installing Packages

$ python -m learning.imitation.pytorch-v2.test --model-path ![path]
$ pip3 install -e .

## Training

## Acknowledgement
- We started from previous work done by Manfred Díaz as a boilerplate and we would like to thank him for his full support with code and answering our questions
$ python -m learning.imitation.iil-dagger.train

### Arguments

* --episode: number of episodes
* --horizon: number of steps per episode
* --learning-rate: index of learning rate from list [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
* --decay: mixing decay between expert and learner [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
* --save-path: directory used to save output model
* --map-name: name of the map used during the training
* --num-outputs: number of outputs from the learner model: 1 to predict only the angular velocity at a fixed speed, 2 to predict both speed and angular velocity
* --domain-rand: flag to enable domain randomization so that the trained model can be transferred to the real world
* --randomize-map: randomize training maps on reset
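
Note that `--learning-rate` takes an index into the list rather than the rate itself. The sketch below illustrates that mapping with `argparse`. It is a hypothetical parser written for this explanation, not the one in train.py: the default values and the helper name `resolve_learning_rate` are assumptions.

```python
import argparse

# List quoted in the argument description above.
LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]


def build_parser():
    """Hypothetical parser covering a few of the documented flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--episode", type=int, default=10)   # assumed default
    parser.add_argument("--horizon", type=int, default=64)   # assumed default
    parser.add_argument(
        "--learning-rate",
        type=int,
        default=2,  # assumed default
        choices=range(len(LEARNING_RATES)),
        help="index into LEARNING_RATES, not the rate itself",
    )
    return parser


def resolve_learning_rate(index):
    """Translate the command-line index into the actual learning rate."""
    return LEARNING_RATES[index]


args = build_parser().parse_args(["--learning-rate", "3"])
learning_rate = resolve_learning_rate(args.learning_rate)
```

With this reading, passing `--learning-rate 3` selects a rate of 1e-4.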

## Testing

$ python -m learning.imitation.iil-dagger.test

### Arguments

* --model-path: path of the model to be tested
* --episode: number of episodes
* --horizon: number of steps per episode

## Submitting
Use the [Pytorch RL Template](https://github.com/duckietown/challenge-aido_LF-template-pytorch) and replace its model with the trained model from model/squeezenet.py.
Then use the following code snippet to convert speed and angular velocity to left and right PWM commands.
``` Python
velocity, omega = self.compute_action(self.current_image)

# assuming same motor constants k for both motors
k_r = 27.0
k_l = 27.0
gain = 1.0
trim = 0.0

# adjusting k by gain and trim
k_r_inv = (gain + trim) / k_r
k_l_inv = (gain - trim) / k_l
wheel_dist = 0.102
radius = 0.0318

omega_r = (velocity + 0.5 * omega * wheel_dist) / radius
omega_l = (velocity - 0.5 * omega * wheel_dist) / radius

# conversion from motor rotation rate to duty cycle
u_r = omega_r * k_r_inv
u_l = omega_l * k_l_inv

# clip the duty cycle to the limit, which is 1.0 for the Duckiebot
pwm_right = max(min(u_r, 1), -1)
pwm_left = max(min(u_l, 1), -1)

```
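
The snippet above depends on `self.compute_action`, so it cannot be run on its own. As a sketch, the same arithmetic can be wrapped in a standalone function for quick checks. The function name `wheel_pwm` and its keyword parameters are assumptions introduced here; the constants are the ones from the snippet.

```python
def wheel_pwm(velocity, omega, gain=1.0, trim=0.0, k=27.0,
              wheel_dist=0.102, radius=0.0318, limit=1.0):
    """Convert linear and angular velocity to (pwm_left, pwm_right)."""
    # Adjust the motor constant by gain and trim (same k assumed for both motors).
    k_r_inv = (gain + trim) / k
    k_l_inv = (gain - trim) / k

    # Wheel rotation rates from differential-drive kinematics.
    omega_r = (velocity + 0.5 * omega * wheel_dist) / radius
    omega_l = (velocity - 0.5 * omega * wheel_dist) / radius

    # Convert rotation rate to duty cycle and clip to [-limit, limit].
    u_r = omega_r * k_r_inv
    u_l = omega_l * k_l_inv
    pwm_right = max(min(u_r, limit), -limit)
    pwm_left = max(min(u_l, limit), -limit)
    return pwm_left, pwm_right


straight = wheel_pwm(0.3, 0.0)   # equal commands on both wheels
turning = wheel_pwm(0.3, 1.0)    # positive omega: right wheel faster than left
saturated = wheel_pwm(10.0, 0.0) # large speed is clipped to the limit
```

Driving straight yields identical left and right commands, a positive `omega` speeds up the right wheel relative to the left, and very large requested speeds saturate at the limit.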

## Acknowledgment

* We started from previous work done by Manfred Díaz as a boilerplate, and we would like to thank him for his full support with code and answering our questions.

## Authors
- [Mostafa ElAraby ](https://www.mostafaelaraby.com/)
- [Linkedin](https://linkedin.com/in/mostafaelaraby)
- Ramon Emiliani
- [Linkedin](https://www.linkedin.com/in/ramonemiliani)

* [Mostafa ElAraby ](https://www.mostafaelaraby.com/)
+ [Linkedin](https://linkedin.com/in/mostafaelaraby)
* Ramon Emiliani
+ [Linkedin](https://www.linkedin.com/in/ramonemiliani)

## References
- Implementation idea and code skeleton based on Diaz Cabrera, Manfred Ramon (2018). Interactive and Uncertainty-aware Imitation Learning: Theory and Applications. Master's thesis, Concordia University.

```

@phdthesis{diaz2018interactive,
title={Interactive and Uncertainty-aware Imitation Learning: Theory and Applications},
author={Diaz Cabrera, Manfred Ramon},
year={2018},
school={Concordia University}
}

@inproceedings{ross2011reduction,
title={A reduction of imitation learning and structured prediction to no-regret online learning},
author={Ross, St{\'e}phane and Gordon, Geoffrey and Bagnell, Drew},
booktitle={Proceedings of the fourteenth international conference on artificial intelligence and statistics},
pages={627--635},
year={2011}
}

@article{iandola2016squeezenet,
title={SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size},
author={Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt},
journal={arXiv preprint arXiv:1602.07360},
year={2016}
}
```
@@ -148,7 +148,7 @@ def _transform(self, observations, expert_actions):
]
)

observations = [compose_obs(observation).numpy() for observation in observations]
observations = [compose_obs(observation).cpu().numpy() for observation in observations]
try:
# scaling velocity to become in 0-1 range which is multiplied by max speed to get actual vel
# also scaling steering angle to become in range -1 to 1 to make it easier to regress
@@ -158,7 +158,7 @@ def _transform(self, observations, expert_actions):
]
except:
pass
expert_actions = [torch.tensor(expert_action).numpy() for expert_action in expert_actions]
expert_actions = [torch.tensor(expert_action).cpu().numpy() for expert_action in expert_actions]

return observations, expert_actions

6 changes: 3 additions & 3 deletions learning/imitation/iil-dagger/model/squeezenet.py
@@ -38,7 +38,7 @@ def __init__(self, num_outputs=2, max_velocity=0.7, max_steering=np.pi / 2):
self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model = models.squeezenet1_1()
self.num_outputs = num_outputs
self.max_velocity_tensor = torch.tensor(max_velocity).to(self._device)
self.max_velocity_tensor = torch.tensor([max_velocity]).to(self._device)
self.max_steering = max_steering

# using a subset of full squeezenet for input image features
@@ -117,12 +117,12 @@ def predict(self, *args):
output = self.model(images)
if self.num_outputs == 1:
omega = output
v_tensor = self.max_velocity_tensor.clone()
v_tensor = self.max_velocity_tensor.clone().unsqueeze(1)
else:
v_tensor = output[:, 0].unsqueeze(1)
omega = output[:, 1].unsqueeze(1) * self.max_steering
output = torch.cat((v_tensor, omega), 1).squeeze().detach()
return output
return output.cpu().numpy()


if __name__ == "__main__":