21-Reinforcement-Learning #62

Closed
redblobgames opened this issue Mar 14, 2017 · 20 comments
Labels
chapter discussion (Archived): Discussion of the design of a chapter

Comments

@redblobgames
Contributor

Visualizations and code for chapter 21

@alireza-a
Contributor

I am starting with the Passive-ADP-Agent (Fig 21.2), and I want to visualize it evaluating a fixed policy in the 3x4 world (Fig 21.1). I have decided to use D3.js to visualize the environment, and I want to use the Python code as the base for my implementation.

@Ghost---Shadow
Contributor

Our original plan was to stick to JavaScript, as we wanted to make the whole thing statically hostable. I will wait for @redblobgames for any revisions on that.

@redblobgames
Contributor Author

@alireza-a Sounds like a good start. We have been using two.js for visualization, so before you use D3.js, check whether two.js works (to stay consistent with the existing code base). Translating the Python code to JavaScript seems like a good approach.

@alireza-a
Contributor

Sure, I will only depend on two.js. I ported MDP, GridWorld, and PassiveADPAgent (only the parts that I need) from the Python implementation to JavaScript. I could not find anything about testing in the repo. Do we rely only on the visualizations since it's a static site? If so, should I make a pull request only after the visualizations have been added?

@redblobgames
Contributor Author

Testing would be nice. I'm not sure how to best approach it. Yes, we could have automated tests for the algorithms, but the main focus of the work is the visualizations, and those we'll have to test by asking people to try it out (user testing instead of unit testing).

Yes, make a pull request after you've made a visualization or two. For some visualizations you may have to write the algorithm in a non-standard way to make the visualization work.

A lot of this project will be experimental. Implementing the algorithms is work but it's work we know how to do. Interactive/animated visualizations for algorithms is something where we don't know which visualizations will be good, so we may have to make some and throw some away.

Some visualizations will be for the problems, some will show concepts, and other visualizations will show the solutions (algorithms). Some visualizations can ask the reader to interact; look at this example. Some visualizations may be animated without the reader providing much input; look at these examples. We'll figure this out by experimenting :-) and I think it may be different in each chapter.

@alireza-a
Contributor

So far, I have mapped over the MDP, GridWorld, and Policy Evaluation from the Python Implementation and I have visualized the Policy Evaluation on the GridWorld with a fixed policy (Fig 21.1). Here is the Policy Evaluation Demo.
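For readers following along, here is a minimal sketch of what iterative policy evaluation on the 4x3 grid world might look like in JavaScript. It is not the code from the demo or the repo: every name (`POLICY`, `move`, `transitions`, `evaluatePolicy`) is hypothetical, and it assumes the textbook setup of a step reward of -0.04, terminal states worth +1 and -1, a wall at (2,2), and a motion model that goes the intended way with probability 0.8 and sideways with probability 0.1 each. Note that this evaluates the policy with a known model; the Passive-ADP-Agent would additionally have to learn the model from trials.

```javascript
// Hypothetical sketch: states are 'x,y' strings, x = 1..4 (columns), y = 1..3 (rows).
const COLS = 4, ROWS = 3;
const WALL = '2,2';
const TERMINALS = { '4,3': 1, '4,2': -1 };
const STEP_REWARD = -0.04; // assumed reward for every non-terminal state
const MOVES = { U: [0, 1], D: [0, -1], L: [-1, 0], R: [1, 0] };
const PERPENDICULAR = { U: ['L', 'R'], D: ['L', 'R'], L: ['U', 'D'], R: ['U', 'D'] };

// The fixed policy being evaluated (one action per non-terminal state).
const POLICY = {
  '1,3': 'R', '2,3': 'R', '3,3': 'R',
  '1,2': 'U',             '3,2': 'U',
  '1,1': 'U', '2,1': 'L', '3,1': 'L', '4,1': 'L',
};

// Deterministic result of an action; bumping into the wall or the edge stays put.
function move(state, action) {
  const [x, y] = state.split(',').map(Number);
  const [dx, dy] = MOVES[action];
  const nx = x + dx, ny = y + dy;
  const next = `${nx},${ny}`;
  const blocked = nx < 1 || nx > COLS || ny < 1 || ny > ROWS || next === WALL;
  return blocked ? state : next;
}

// [probability, nextState] pairs: 0.8 intended direction, 0.1 for each side.
function transitions(state, action) {
  const [a, b] = PERPENDICULAR[action];
  return [[0.8, move(state, action)], [0.1, move(state, a)], [0.1, move(state, b)]];
}

// Repeated synchronous sweeps of U(s) = R + gamma * sum_s' P(s'|s, pi(s)) U(s').
// Returns the final values plus a snapshot after every sweep for visualization.
function evaluatePolicy(policy, sweeps, gamma = 1) {
  const U = Object.assign({}, TERMINALS);
  for (const s of Object.keys(policy)) U[s] = 0;
  const history = [];
  for (let i = 0; i < sweeps; i++) {
    const next = Object.assign({}, U);
    for (const s of Object.keys(policy)) {
      let expected = 0;
      for (const [p, s2] of transitions(s, policy[s])) expected += p * U[s2];
      next[s] = STEP_REWARD + gamma * expected;
    }
    Object.assign(U, next);
    history.push(Object.assign({}, U));
  }
  return { U, history };
}
```

Keeping a snapshot per sweep in `history` is a design choice made with the visualization in mind: the drawing code can replay the snapshots without rerunning the algorithm.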

The MDP and GridWorld are shared between chapters 17 and 21. Would this mean a reimplementation of MDP and GridWorld by whoever takes on chapter 17, or do you prefer a single implementation shared between the two chapters?

@redblobgames
Contributor Author

@alireza-a cool

The main goal is the visualizations, so feel free to reuse between chapters if it works better, or reimplement if that works better. With the experimental nature of this project, sometimes it's easier to have two copies of the code while rapidly iterating on the visualizations, and then go back and merge them after we figure out which visualizations worked well.

@alireza-a
Contributor

My next steps:

  1. Visualize Passive Agents (on Grid World)
  2. Explore adding a graph to show the delta in state values over iterations (to compare how fast the information propagates in different algorithms)
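
As a rough illustration of step 2, the per-iteration delta could be computed from recorded snapshots of the state values. This is only a sketch: `maxDeltas` and the snapshot format (one `{state: value}` object per iteration) are assumptions, not something taken from the demo code.

```javascript
// Hypothetical helper: given one {state: value} snapshot per iteration,
// return the largest absolute change in any state's value between
// consecutive iterations. Plotting this series shows how fast values settle.
function maxDeltas(snapshots) {
  const deltas = [];
  for (let i = 1; i < snapshots.length; i++) {
    let maxDelta = 0;
    for (const s of Object.keys(snapshots[i])) {
      const change = Math.abs(snapshots[i][s] - snapshots[i - 1][s]);
      maxDelta = Math.max(maxDelta, change);
    }
    deltas.push(maxDelta);
  }
  return deltas;
}

// Example with made-up values for two states:
const exampleSnapshots = [
  { a: 0, b: 0 },
  { a: 0.5, b: 0.25 },
  { a: 0.75, b: 0.5 },
];
console.log(maxDeltas(exampleSnapshots)); // [ 0.5, 0.25 ]
```

Running two algorithms on the same world and plotting their `maxDeltas` series on one chart would be one way to compare how quickly information propagates in each.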

@alireza-a
Contributor

State Value Graph Demo

comments & questions

  1. I changed the width of the content to fit in both the graph and the GridWorld side by side. This is inconsistent with the other chapters.
    Is it acceptable to have a different content width for each chapter?
  2. I used NVD3 to create the graph. NVD3 is built on top of D3.js. However, it is not yet recommended to use NVD3 with D3 version 4.X (I used D3 version 3.5.17).
    Is NVD3 a reasonable choice?
  3. The graph progresses after an iteration completes. In the demo, I tap quickly to reach the equilibrium state values. I think it is better to just click on a start button and let the environment evolve compared with tapping over and over to see the changes.
    What do you think?

@redblobgames is this close to what you had in mind?

@redblobgames
Contributor Author

@alireza-a pretty cool to see the convergence! It's not quite what I had in mind but it works :-)

  1. Yes, it's acceptable and expected to have different formatting for each chapter. Chapters are so different from one another that I don't think we can expect consistency at the beginning. We want to focus on finding the most useful visualizations (experimentation).
  2. NVD3 is fine. I think we can re-evaluate later to see if we end up using lots of charts or not. It's unclear at this early stage.
  3. I agree, the tapping seems excessive, and either an animation or slider might be nicer. Tapping might be more useful in an algorithm where the reader can make a choice.

I also wonder about how to show the individual steps. Right now you tap to see one number improve at a time, but it's unclear how that estimate is actually improving. This might be something that we can think about once we get to the later algorithms — we can show how the algorithms differ. Or maybe the individual steps aren't what we want to focus on, and there's some other aspect of the algorithm we want the reader to focus on.

@alireza-a
Contributor

> I agree, the tapping seems excessive, and either an animation or slider might be nicer. Tapping might be more useful in an algorithm where the reader can make a choice.

My original implementation used animations. To show what happened at every step, I slowed the animations down, but then it took far too long to converge, so I switched to tapping. I will try a slider this time.

Also, I will be preoccupied with final exams until April 23rd.

@redblobgames
Contributor Author

Ah, makes sense. A slider can either control animation speed or the simulation time.

  • If you're doing animation speed then instead of setInterval you can use setTimeout each time through the loop, and then the slider would control the timeout parameter.
  • If you're doing simulation time then one way to implement it is to run the entire algorithm to the end, and record the state along the way. Then the slider will display the recording.
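
Both options can be sketched in a few lines. The names here (`startLoop`, `recordRun`, `frameAt`) are hypothetical, and the sketch assumes the algorithm exposes a `step` function that returns a new state instead of mutating the old one, which may not match how the demo is written.

```javascript
// Option 1 (animation speed): re-arm setTimeout on every pass and read the
// delay fresh each time, so moving the slider takes effect on the next tick.
// `schedule` defaults to setTimeout and is a parameter only to ease testing.
function startLoop(tick, getDelayMs, schedule = setTimeout) {
  let running = true;
  function loop() {
    if (!running) return;
    tick();
    schedule(loop, getDelayMs());
  }
  loop();
  return function stop() { running = false; };
}

// Option 2 (simulation time): run the whole algorithm up front and keep
// every intermediate state, so the slider only has to pick a frame.
function recordRun(initialState, step, numSteps) {
  const frames = [initialState];
  let state = initialState;
  for (let i = 0; i < numSteps; i++) {
    state = step(state); // assumed to return a new state object or value
    frames.push(state);
  }
  return frames;
}

// Map a slider position t in [0, 1] to the nearest recorded frame.
function frameAt(frames, t) {
  const index = Math.round(t * (frames.length - 1));
  return frames[Math.min(frames.length - 1, Math.max(0, index))];
}
```

In a page, `getDelayMs` might be something like `() => Number(slider.value)` for a hypothetical `<input type="range">` element, and the slider's `input` event handler would call `frameAt` and redraw. Recording everything costs memory, but it makes stepping backward as easy as stepping forward.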

@alireza-a
Contributor

@redblobgames I like the progress bar, start/pause, and next/back interface you used in your A* search article. I will create a similar interface.

@alireza-a
Contributor

I reimplemented all the visualizations with D3.js and created a similar interface to the A* search article. Here is a demo for what I have so far.

  • I need to play with the color scheme and transitions further
  • I am considering other graphing libraries for the state-values

Is there anything else I should try before moving on to the next agent?

@redblobgames
Contributor Author

Looks cool!

The diagram right now shows how states are evaluated and values updated. Can you put something in there describing what the colors mean? What is dark green vs light green?

BTW if you want arrows in d3, I have some code (adapted from the SVG documentation) to produce an arrowhead marker:

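// Run this once on the page: define a reusable arrowhead marker.
var defs = d3.select('svg').insert('defs', ':first-child');
var marker = defs.append('marker')
    .attr('id', 'arrowhead')            // referenced later as url(#arrowhead)
    .attr('viewBox', '0 0 10 10')
    .attr('refX', 7)                    // marker point that is placed on the line's end
    .attr('refY', 5)
    .attr('markerUnits', 'strokeWidth') // marker size scales with the stroke width
    .attr('markerWidth', 4)
    .attr('markerHeight', 3)
    .attr('orient', 'auto');            // rotate to follow the line's direction
// A triangle pointing right, drawn inside the 10x10 viewBox.
var path = marker.append('path')
    .attr('fill', 'green')
    .attr('d', 'M 0 0 L 10 5 L 0 10 z');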

Once you create the marker (just once on the page) you can attach the marker to any <path> or <line> by using .attr('marker-end', 'url(#arrowhead)'). Arrows might be useful for showing how the states are connected.
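
A small sketch of how that could look for connecting two grid cells, assuming cells are addressed by zero-based `{col, row}` and drawn as squares of `size` pixels. The helper `arrowAttrs` and the `inset` parameter are hypothetical; only `selection.append` and `selection.attr(name, value)` are used from D3.

```javascript
// Hypothetical helper: compute <line> attributes for an arrow from the
// center of one cell toward the center of another. `inset` trims that
// fraction of the length off both ends so the arrow clears the centers.
function arrowAttrs(from, to, size, inset = 0.25) {
  const cx = (cell) => (cell.col + 0.5) * size;
  const cy = (cell) => (cell.row + 0.5) * size;
  const dx = cx(to) - cx(from);
  const dy = cy(to) - cy(from);
  return {
    x1: cx(from) + dx * inset,
    y1: cy(from) + dy * inset,
    x2: cx(to) - dx * inset,
    y2: cy(to) - dy * inset,
    'marker-end': 'url(#arrowhead)', // the marker defined once on the page
  };
}

// Only runs in a page where D3 is loaded and an <svg> with the marker exists.
if (typeof d3 !== 'undefined' && typeof document !== 'undefined') {
  const attrs = arrowAttrs({ col: 0, row: 0 }, { col: 1, row: 0 }, 100);
  const line = d3.select('svg').append('line')
    .attr('stroke', 'green')
    .attr('stroke-width', 2);
  Object.keys(attrs).forEach((name) => line.attr(name, attrs[name]));
}
```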

I think the main thing before moving on to the next agent/diagram is to make a list of the concepts you wanted to show in this diagram, and which you want in the next diagram(s). Sometimes it's useful to have a concept introduced in one and then used in the next without having to explain it again, and sometimes it's useful to have two diagrams side by side to show a concept that can't be shown by itself.

@alireza-a
Contributor

I am thinking of redesigning the states (squares) to make the changes in state-value more pronounced. I want to remove the text representing the state value and replace it with a bar in the square. This design is what I'm currently considering.

@redblobgames
Contributor Author

@alireza-a I think that's a good idea. When there are lots of things changing on the page it may be difficult for the reader to read all the numbers. The visual representations would make it easier to see at a glance.

Your proposed design images 2 and 3 could be used for another design too: instead of showing all the lines in one chart on the right, you could show one chart inside each state's square. Maybe the line chart could instead be a bar chart with light gray bars (not prominent) but the current state (the rightmost bar) would be the green/red like in your proposed design. That way when you are looking at the current state you could also see the history of the value.
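
To make the idea concrete, here is a sketch of the geometry for such a per-state history chart. The function name `historyBars`, the zero line through the middle of the square, and the choice of green for non-negative and red for negative values are all assumptions; adjust them to match the proposed design.

```javascript
// Hypothetical helper: turn a state's value history into rectangle specs
// for a small bar chart drawn inside that state's square. Past values are
// light gray; only the latest (rightmost) bar is colored.
// Values are clamped to [-maxAbs, maxAbs]; the zero line sits at mid-height.
function historyBars(values, width, height, maxAbs = 1) {
  const barWidth = width / values.length;
  const mid = height / 2;
  return values.map((v, i) => {
    const h = (Math.min(Math.abs(v), maxAbs) / maxAbs) * mid;
    const isCurrent = i === values.length - 1;
    return {
      x: i * barWidth,
      y: v >= 0 ? mid - h : mid, // positive bars grow up, negative bars grow down
      width: barWidth,
      height: h,
      fill: isCurrent ? (v >= 0 ? 'green' : 'red') : 'lightgray',
    };
  });
}
```

Each returned object maps directly onto the `x`, `y`, `width`, `height`, and `fill` attributes of an SVG `<rect>`, so the same function could feed a D3 data join for every square.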

@alireza-a
Contributor

This is fun! I'll create a history bar chart for each state and update the animations to match.
I refactored my code and created an MVC-like architecture (I might have gone too far with this). Here the state values have been replaced with bars.

@redblobgames
Contributor Author

Looks nice!

@alireza-a
Contributor

I'm occupied with final exams until Aug 17th. I'll get back to this once I'm free.

@redblobgames added the "chapter discussion (Archived)" label (Discussion of the design of a chapter) on Sep 10, 2017
3 participants