| import dacbench.envs.toysgd | ||
| importlib.reload(dacbench.envs.toysgd) | ||
| HISTORY_LENGTH = 40 |
Is the history length used somewhere I don't see right now?
oops forgot to delete, not used right now
| { | ||
| "action_space_class": "Box", | ||
| "action_space_args": [-np.inf * np.ones((2,)), np.inf * np.ones((2,))], | ||
| "observation_space_class": "Dict", |
I won't say you shouldn't or can't do that, but Dict spaces are impractical without the dict wrapper. If you don't care, feel free to ignore this.
I just took inspiration from the sgd benchmark. If that's suboptimal, feel free to remodel it :) Or I could do it tomorrow.
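For context on the Dict space point, here is a minimal sketch of what a flattening step for dict observations might look like. It is not the wrapper mentioned above and does not use any DACBench or gym API; `flatten_observation` and the example keys are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence, Union

Number = Union[int, float]
ObsValue = Union[Number, Sequence[Number]]


def flatten_observation(obs: Dict[str, ObsValue]) -> List[float]:
    """Flatten a dict observation into one flat list of floats.

    Keys are visited in sorted order so the layout stays the same
    from step to step.
    """
    flat: List[float] = []
    for key in sorted(obs):
        value = obs[key]
        if isinstance(value, (int, float)):
            # Scalar entry: append as a single float.
            flat.append(float(value))
        else:
            # Sequence entry: append every element in order.
            flat.extend(float(v) for v in value)
    return flat


# Hypothetical observation; these keys are assumptions, not the benchmark's.
example_obs = {"learning_rate": 0.01, "gradient": [0.5, -1.0], "step": 3}
flat_obs = flatten_observation(example_obs)
```

Most agents expect a flat vector, which is why a dict observation usually needs a step like this before it can be consumed.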
|
Initial x as 0 makes sense, I think. Without having ever run it, I can't really say whether the coefficient range is good or not, but it's super easy to adapt, so I think that's fine.
|
Any updates here, @benjamc? I found another thing to fix in the meantime: reset doesn't seem to return a state. I missed that, but it's definitely not desired behavior ;D
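To illustrate the reset issue, here is a minimal sketch of an environment whose `reset` returns the initial state. `ToyQuadraticEnv` is a hypothetical stand-in, not the ToySGD code from this PR; the objective, the state keys, and the four-value `step` return are all assumptions.

```python
from typing import Dict, Tuple


class ToyQuadraticEnv:
    """Hypothetical stand-in: gradient descent on f(x) = a * x**2."""

    def __init__(self, a: float = 1.0, initial_x: float = 1.0, n_steps: int = 10):
        self.a = a
        self.initial_x = initial_x
        self.n_steps = n_steps
        self.x = initial_x
        self.step_count = 0

    def _get_state(self) -> Dict[str, float]:
        # State is built in one place so reset() and step() stay consistent.
        return {
            "x": self.x,
            "gradient": 2.0 * self.a * self.x,
            "step": self.step_count,
        }

    def reset(self) -> Dict[str, float]:
        self.x = self.initial_x
        self.step_count = 0
        # The point under discussion: reset must hand back the initial state.
        return self._get_state()

    def step(self, lr: float) -> Tuple[Dict[str, float], float, bool, dict]:
        gradient = 2.0 * self.a * self.x
        self.x -= lr * gradient
        self.step_count += 1
        reward = -(self.a * self.x ** 2)  # negative objective value
        done = self.step_count >= self.n_steps
        return self._get_state(), reward, done, {}


env = ToyQuadraticEnv()
state = env.reset()
next_state, reward, done, info = env.step(0.25)
```

Without the `return` in `reset`, the caller gets `None` and the first action has to be chosen without an observation.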
|
Also, we may want to set an upper limit for the values. The lr gets huge pretty fast with actions > 1.
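One way to bound the values would be to clamp the learning rate after each update. The sketch below assumes the action multiplies the current lr, which may not match how the benchmark actually applies actions; `update_learning_rate`, `lr_min`, and `lr_max` are hypothetical names and the bounds are placeholders.

```python
def update_learning_rate(
    lr: float, action: float, lr_min: float = 1e-10, lr_max: float = 1.0
) -> float:
    """Scale lr by the action, then clamp the result to [lr_min, lr_max].

    Assumption: the action acts as a multiplicative factor on the lr.
    """
    proposed = lr * action
    return min(max(proposed, lr_min), lr_max)


# Repeated actions > 1 would grow the lr without bound if left unclamped.
lr = 0.1
history = []
for _ in range(5):
    lr = update_learning_rate(lr, action=10.0)
    history.append(lr)
```

With the clamp in place the lr stops at `lr_max` instead of growing by a factor of ten every step.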
|
Changes from my side:
Things where input would be nice:
As soon as we have settled those two, we can merge.
|
Please have a look. :-)
Especially on the bounds for the coefficients and the initial x (not sure about that).