duckietown · mostafaelaraby · Feb 13, 2020 · Feb 13, 2020 · Feb 14, 2020 · Oct 31, 2020
diff --git a/learning/imitation/iil-dagger/README.md b/learning/imitation/iil-dagger/README.md
@@ -1,39 +1,119 @@
-# Imitation Learning using Dataset Aggregation
+# Imitation Learning
 
 ## Introduction
-In this baseline we train a small squeezenet model on expert trajectories to simply clone the behaviour of the expert.
-Using only the expert trajectories would result in a model unable to recover from non-optimal positions ,Hence we use a technique called DAgger a dataset aggregation technique with mixed policies between expert and model.
-This technique of random mixing would help the model learn a more general trajectory than the optimal one provided by the expert alone.
 
-## Quickstart
-1) Clone this [repo](https://github.com/duckietown/gym-duckietown):
+In this baseline we train a small SqueezeNet model on expert trajectories to simply clone the behavior of the expert.
+Using only the expert trajectories would result in a model unable to recover from non-optimal positions; instead, we use DAgger, a dataset aggregation technique that mixes the expert and learner policies while collecting data.
 
-    $ git clone https://github.com/duckietown/gym-duckietown.git
+## Quick start
 
-2) Change into the directory:
+Use the Jupyter notebook `notebook.ipynb` to quickly start training and testing the DAgger imitation learning baseline.
 
-    $ cd gym-duckietown
+## Detailed Steps
 
-3) Install the package:
+### Clone the repo
 
-    $ pip3 install -e .
+Clone this [repo](https://github.com/duckietown/gym-duckietown):
 
-4) Start training:
+    $ git clone https://github.com/duckietown/gym-duckietown.git
+
+    $ cd gym-duckietown
 
-5) Test the trained agent specifying the saved model:
+### Installing Packages
 
-    $ python -m learning.imitation.pytorch-v2.test --model-path ![path]
+    $ pip3 install -e .
 
+## Training
 
-## Acknowledgement
-- We started from previous work done by Manfred Díaz as a boilerplate and we would like to thank him for his full support with code and answering our questions
+    $ python -m learning.imitation.iil-dagger.train
+
+### Arguments
+
+* --episode: number of episodes
+* --horizon: number of steps per episode
+* --learning-rate: index of learning rate from list [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
+* --decay: mixing decay between expert and learner [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
+* --save-path: directory used to save output model
+* --map-name: name of the map used during the training
+* --num-outputs: number of outputs from the learner model: 1 to predict only angular velocity (with fixed speed), 2 to predict both speed and angular velocity
+* --domain-rand: flag to enable domain randomization so that the trained model can transfer to the real world
+* --randomize-map: randomize training maps on reset
+
+## Testing
+
+    $ python -m learning.imitation.iil-dagger.test
+
+### Arguments
+
+* --model-path: path of the model to be tested
+* --episode: number of episodes
+* --horizon: number of steps per episode
+
+## Submitting
+
+Use the [Pytorch RL Template](https://github.com/duckietown/challenge-aido_LF-template-pytorch), replace its model with the trained model from model/squeezenet.py,
+and use the following code snippet to convert speed and angular velocity to left and right PWM commands.
+``` Python
+velocity, omega = self.compute_action(self.current_image) 
+
+# assuming same motor constants k for both motors
+k_r = 27.0
+k_l = 27.0
+gain = 1.0
+trim = 0.0
+
+# adjusting k by gain and trim
+k_r_inv = (gain + trim) / k_r
+k_l_inv = (gain - trim) / k_l
+wheel_dist = 0.102
+radius = 0.0318
+
+omega_r = (velocity + 0.5 * omega * wheel_dist) / radius
+omega_l = (velocity - 0.5 * omega * wheel_dist) / radius
+
+# conversion from motor rotation rate to duty cycle
+u_r = omega_r * k_r_inv
+u_l = omega_l * k_l_inv
+
+# limiting output to limit, which is 1.0 for the duckiebot
+pwm_right = max(min(u_r, 1), -1)
+pwm_left = max(min(u_l, 1), -1)
+
+```
+
+## Acknowledgment
+
+* We started from previous work done by Manfred Díaz as a boilerplate, and we would like to thank him for his full support with code and answering our questions.
 
 ## Authors
-- [Mostafa ElAraby ](https://www.mostafaelaraby.com/)
-	- [Linkedin](https://linkedin.com/in/mostafaelaraby)
--  Ramon Emiliani
-	- [Linkedin](https://www.linkedin.com/in/ramonemiliani)
+
+* [Mostafa ElAraby](https://www.mostafaelaraby.com/)
+  + [LinkedIn](https://linkedin.com/in/mostafaelaraby)
+* Ramon Emiliani
+  + [LinkedIn](https://www.linkedin.com/in/ramonemiliani)
+
 ## References
-- Implementation idea and code skeleton based on Diaz Cabrera, Manfred Ramon (2018)Interactive and Uncertainty-aware Imitation Learning: Theory and Applications. Masters thesis, Concordia University.
+
+``` 
+
+@mastersthesis{diaz2018interactive,
+  title={Interactive and Uncertainty-aware Imitation Learning: Theory and Applications},
+  author={Diaz Cabrera, Manfred Ramon},
+  year={2018},
+  school={Concordia University}
+}
+
+@inproceedings{ross2011reduction,
+  title={A reduction of imitation learning and structured prediction to no-regret online learning},
+  author={Ross, St{\'e}phane and Gordon, Geoffrey and Bagnell, Drew},
+  booktitle={Proceedings of the fourteenth international conference on artificial intelligence and statistics},
+  pages={627--635},
+  year={2011}
+}
+
+@article{iandola2016squeezenet,
+  title={SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size},
+  author={Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt},
+  journal={arXiv preprint arXiv:1602.07360},
+  year={2016}
+}
+```
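
Note on the conversion snippet in the README's Submitting section: the sketch below wraps the same arithmetic in a reusable function so it can be checked in isolation. The function name `velocity_to_pwm` and its keyword arguments are hypothetical and are not part of this repository or the submission template; the default constants are the ones used in the README snippet, and a single motor constant `k` is assumed for both motors, as the snippet's comment states.

```python
def velocity_to_pwm(velocity, omega, gain=1.0, trim=0.0, k=27.0,
                    wheel_dist=0.102, radius=0.0318, limit=1.0):
    """Convert linear velocity and angular velocity to (pwm_left, pwm_right).

    Hypothetical helper: mirrors the README snippet, with the constants
    exposed as keyword arguments for illustration only.
    """
    # Adjust the (shared) motor constant by gain and trim.
    k_r_inv = (gain + trim) / k
    k_l_inv = (gain - trim) / k

    # Differential-drive kinematics: wheel rotation rates.
    omega_r = (velocity + 0.5 * omega * wheel_dist) / radius
    omega_l = (velocity - 0.5 * omega * wheel_dist) / radius

    # Convert motor rotation rate to duty cycle.
    u_r = omega_r * k_r_inv
    u_l = omega_l * k_l_inv

    # Clamp the duty cycles to [-limit, limit].
    pwm_right = max(min(u_r, limit), -limit)
    pwm_left = max(min(u_l, limit), -limit)
    return pwm_left, pwm_right
```

With these defaults, a positive `omega` yields a larger right-wheel command than left-wheel command, and large velocities saturate at the limit of 1.0.
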
diff --git a/learning/imitation/iil-dagger/learner/neural_network_policy.py b/learning/imitation/iil-dagger/learner/neural_network_policy.py
@@ -148,7 +148,7 @@ def _transform(self, observations, expert_actions):
             ]
         )
 
-        observations = [compose_obs(observation).numpy() for observation in observations]
+        observations = [compose_obs(observation).cpu().numpy() for observation in observations]
         try:
             # scaling velocity to become in 0-1 range which is multiplied by max speed to get actual vel
             # also scaling steering angle to become in range -1 to 1 to make it easier to regress
@@ -158,7 +158,7 @@ def _transform(self, observations, expert_actions):
             ]
         except:
             pass
-        expert_actions = [torch.tensor(expert_action).numpy() for expert_action in expert_actions]
+        expert_actions = [torch.tensor(expert_action).cpu().numpy() for expert_action in expert_actions]
 
         return observations, expert_actions
 

diff --git a/learning/imitation/iil-dagger/model/squeezenet.py b/learning/imitation/iil-dagger/model/squeezenet.py
@@ -38,7 +38,7 @@ def __init__(self, num_outputs=2, max_velocity=0.7, max_steering=np.pi / 2):
         self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         self.model = models.squeezenet1_1()
         self.num_outputs = num_outputs
-        self.max_velocity_tensor = torch.tensor(max_velocity).to(self._device)
+        self.max_velocity_tensor = torch.tensor([max_velocity]).to(self._device)
         self.max_steering = max_steering
 
         # using a subset of full squeezenet for input image features
@@ -117,12 +117,12 @@ def predict(self, *args):
         output = self.model(images)
         if self.num_outputs == 1:
             omega = output
-            v_tensor = self.max_velocity_tensor.clone()
+            v_tensor = self.max_velocity_tensor.clone().unsqueeze(1)
         else:
             v_tensor = output[:, 0].unsqueeze(1)
             omega = output[:, 1].unsqueeze(1) * self.max_steering
         output = torch.cat((v_tensor, omega), 1).squeeze().detach()
-        return output
+        return output.cpu().numpy()
 
 
 if __name__ == "__main__":
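
For context on the `--decay` argument documented in the README: DAgger (Ross et al., 2011, cited in the README) lets the expert and the learner share control while data is collected, and always records the expert's action as the training label. The sketch below is a minimal illustration, assuming the probability of the expert being in control during episode `i` is `decay ** i`. The schedule actually used in this repository may differ, and every name here (`expert_probability`, `choose_controller`, `collect_episode`) is hypothetical.

```python
import random


def expert_probability(decay, episode):
    """Assumed schedule: probability that the expert controls the robot
    in a given episode (0-indexed)."""
    return decay ** episode


def choose_controller(decay, episode, rng):
    """Pick who controls the next step: 'expert' or 'learner'."""
    if rng.random() < expert_probability(decay, episode):
        return "expert"
    return "learner"


def collect_episode(decay, episode, horizon, rng):
    """Return one (step, controller) record per step of an episode.

    In a real DAgger loop each record would also store the observation
    and the expert's action, which is used as the label regardless of
    which policy was in control.
    """
    return [(step, choose_controller(decay, episode, rng))
            for step in range(horizon)]


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(4):
        records = collect_episode(decay=0.5, episode=episode, horizon=100, rng=rng)
        expert_steps = sum(1 for _, who in records if who == "expert")
        print(episode, expert_probability(0.5, episode), expert_steps)
```

Under this assumed schedule the first episode is driven entirely by the expert, and the learner takes over a growing share of the steps in later episodes.
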