added toysgd benchmark #103

Merged
TheEimer merged 4 commits into main from AndySGD on Sep 9, 2021

Conversation

@benjamc
Contributor

benjamc commented Aug 9, 2021

Please have a look. :-)
Especially on the bounds for the coefficients and the initial x (not sure about that).

benjamc requested a review from TheEimer on August 9, 2021 at 15:27
import importlib

import dacbench.envs.toysgd
importlib.reload(dacbench.envs.toysgd)

HISTORY_LENGTH = 40
Contributor

Is the history length used somewhere I don't see right now?

Contributor Author

oops forgot to delete, not used right now

{
"action_space_class": "Box",
"action_space_args": [-np.inf * np.ones((2,)), np.inf * np.ones((2,))],
"observation_space_class": "Dict",
Contributor

I'll not say you shouldn't or can't do that, but Dict spaces are impractical without the dict wrapper. If you don't care, feel free to ignore
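For context, a minimal sketch of why Dict observation spaces are awkward without a flattening wrapper; the sub-space names below are made up for illustration and are not the benchmark's actual keys:

import numpy as np
from gym import spaces

# Dict spaces keep named sub-observations; most agents expect a single flat vector.
obs_space = spaces.Dict({
    "remaining_budget": spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
    "gradient": spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32),
})

sample = obs_space.sample()                # an OrderedDict of arrays
flat = spaces.flatten(obs_space, sample)   # 1-D array of length 3, usable by standard agents
print(flat.shape)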

Contributor Author

I just took inspiration from the sgd benchmark. If that's suboptimal, feel free to remodel it :) or I could also do it tomorrow

@TheEimer
Contributor

TheEimer commented Aug 9, 2021

Initial x as 0 makes sense, I think. Without having ever run it, I can't really say if the coefficient range is good or not, but it's super easy to adapt, so I think that's good.
So thanks for the benchmarks, I'll merge as soon as you've addressed my comments (the second one is optional).
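For reference, a minimal sketch of the kind of toy objective under discussion; the quadratic form, coefficient range, and log-learning-rate parameterization are illustrative assumptions, not the PR's actual defaults:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: f(x) = a*x^2 + b*x + c with coefficients drawn from a bounded range.
a, b, c = rng.uniform(-1.0, 1.0, size=3)
a = abs(a) + 1e-3            # keep the quadratic convex so gradient steps make progress

x = 0.0                      # initial x = 0, as suggested above
log_lr = -2.0                # the agent's action, interpreted here as a log10 learning rate

for step in range(5):
    grad = 2 * a * x + b                 # df/dx
    x = x - 10 ** log_lr * grad          # one SGD step with lr = 10**log_lr
    print(step, x, a * x ** 2 + b * x + c)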

@TheEimer
Contributor

TheEimer commented Sep 8, 2021

Any updates here, @benjamc? I found another thing to fix in the meantime: the reset doesn't seem to return a state. I missed that, but it's definitely not desired behavior ;D
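For reference, a minimal sketch of the gym reset contract the fix targets; the state contents here are placeholders:

import numpy as np

class ToyEnvSketch:
    def reset(self):
        self.x = 0.0
        # reset() has to return the initial observation, otherwise the agent's
        # first action is conditioned on None instead of a real state.
        return np.array([self.x, 0.0], dtype=np.float32)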

@TheEimer
Contributor

TheEimer commented Sep 8, 2021

Also, we may want to set an upper limit on the action values. The lr gets huge pretty fast with actions > 1.
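To see the scale problem, assuming the action is interpreted as a base-10 log learning rate (the log parameterization is mentioned later in this thread; the base is an assumption):

# lr = 10 ** action blows up quickly once the action exceeds 1:
for action in (0.5, 1.0, 2.0, 5.0):
    print(action, 10 ** action)    # 3.16..., 10.0, 100.0, 100000.0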

@TheEimer
Contributor

TheEimer commented Sep 8, 2021

Changes from my side:

  • limited actions to stay below 1
  • made reset return a state
  • initialized lr with 0 (might not be smart at all, but None is worse)

Things where input would be nice:

  • how can we initialize the lr without giving the agent a nan in the reset state?
  • is 1 a good limit for the actions?

As soon as we've settled those two, we can merge.

@benjamc
Contributor Author

benjamc commented Sep 8, 2021

  • [lr] Setting the learning rate to 0 as a starting value should be fine; that way the agent is definitely forced to do something :D
    Note that the agent does not set the learning rate itself but the log learning rate. I am not sure if this is really good, or if we should instead initialize the learning rate with a really small value, because with the log parameterization the agent can never actually reach a learning rate of 0 (sketched below).
  • [actions] Do not limit the actions.
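A minimal sketch of the nan/inf problem and the small-value workaround discussed above, assuming the reset state exposes the log10 of the learning rate; the exact state contents are not taken from this PR:

import numpy as np

# If the reset state contains log10(lr), initializing lr = 0 poisons it:
lr = 0.0
print(np.log10(lr))      # -inf (with a divide-by-zero RuntimeWarning)

# A tiny positive initial value keeps the log finite, as suggested above:
lr = 1e-10
print(np.log10(lr))      # -10.0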

TheEimer merged commit ac31683 into main on Sep 9, 2021
TheEimer deleted the AndySGD branch on March 21, 2023