Fractal-RL is a lightweight and easy-to-use reinforcement learning library for TypeScript and JavaScript. It provides implementations of popular deep reinforcement learning algorithms, including Deep Q-Network (DQN) and Proximal Policy Optimization (PPO).
To install the library, use npm or bun:

```bash
npm install fractal-rl
# or
bun add fractal-rl
```
The DQN agent is a value-based reinforcement learning algorithm that is well-suited for environments with discrete action spaces. The implementation in Fractal-RL is designed to be easy to use and configure.
Here is a basic example of how to use the `DQNAgent` to solve the CartPole environment:

```typescript
import { DQNAgent } from 'fractal-rl';

// Define the environment
class CartPoleEnvironment {
  // ... (environment implementation)
}

// Configure the DQN agent
const config = {
  stateDim: 4,
  actionDim: 2,
  learningRate: 0.01,
  gamma: 0.99,
  epsilonStart: 1.0,
  epsilonEnd: 0.01,
  epsilonDecay: 0.9995,
  batchSize: 64,
  targetUpdateFrequency: 50,
  replayBufferSize: 10000,
  dropoutRate: 0.0,
  l2RegularizationRate: 0.0001,
  clipNorm: 1.0,
  hiddenActivation: 'leakyRelu',
};

const agent = new DQNAgent(config);
const env = new CartPoleEnvironment();

// Training loop
for (let episode = 0; episode < 5000; episode++) {
  let state = env.reset();
  let done = false;
  while (!done) {
    const action = agent.selectAction(state);
    const { state: nextState, reward, done: envDone } = env.step(action);
    agent.remember(state, action, reward, nextState, envDone);
    if (agent.replayBuffer.size() > config.batchSize) {
      agent.learn();
    }
    state = nextState;
    done = envDone;
  }
}
```
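The environment implementation itself is omitted above, but the training loop only relies on two methods: `reset()`, which returns the initial state, and `step(action)`, which returns the next state, reward, and done flag. The sketch below spells out that assumed shape; the `Environment` interface name and the `ToyEnvironment` dynamics are illustrative stand-ins, not part of the library.

```typescript
// Minimal environment shape assumed by the training loop above.
// The interface name and the toy dynamics are illustrative only.
interface Environment {
  reset(): number[];
  step(action: number): { state: number[]; reward: number; done: boolean };
}

// A trivial stand-in environment: the episode ends after 10 steps,
// and each step pays a reward of 1.
class ToyEnvironment implements Environment {
  private t = 0;

  reset(): number[] {
    this.t = 0;
    return [0, 0, 0, 0]; // stateDim = 4, matching the config above
  }

  step(action: number): { state: number[]; reward: number; done: boolean } {
    this.t += 1;
    return { state: [this.t, 0, 0, action], reward: 1, done: this.t >= 10 };
  }
}
```

Any object with this shape can be dropped into the training loop in place of `CartPoleEnvironment`.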
Configuration parameters:

- `stateDim`: The dimension of the state space.
- `actionDim`: The dimension of the action space.
- `learningRate`: The learning rate for the optimizer.
- `gamma`: The discount factor for future rewards.
- `epsilonStart`: The initial value of epsilon for the epsilon-greedy exploration strategy.
- `epsilonEnd`: The final value of epsilon.
- `epsilonDecay`: The decay rate of epsilon.
- `batchSize`: The size of the mini-batch for training.
- `targetUpdateFrequency`: The frequency (in steps) at which the target network is updated.
- `replayBufferSize`: The maximum size of the replay buffer.
- `dropoutRate`: The dropout rate for regularization.
- `l2RegularizationRate`: The L2 regularization rate.
- `clipNorm`: The maximum norm for gradient clipping.
- `hiddenActivation`: The activation function for the hidden layers.
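To see how `epsilonStart`, `epsilonEnd`, and `epsilonDecay` interact, here is a sketch of a multiplicative epsilon-greedy schedule. This is a common way such a schedule is implemented; the library's exact internal update may differ.

```typescript
// Multiplicative epsilon decay: exploration probability shrinks by a
// constant factor each step, floored at epsilonEnd.
function decayedEpsilon(
  step: number,
  epsilonStart: number,
  epsilonEnd: number,
  epsilonDecay: number
): number {
  return Math.max(epsilonEnd, epsilonStart * Math.pow(epsilonDecay, step));
}

// With epsilonDecay = 0.9995 (as in the config above), epsilon halves
// roughly every 1386 steps, since ln(0.5) / ln(0.9995) ≈ 1386.
```

A faster decay (smaller `epsilonDecay`) shifts the agent from exploration to exploitation sooner, which can hurt learning in environments that need extended exploration.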
The PPO agent is a policy-based reinforcement learning algorithm that is well-suited for environments with continuous or discrete action spaces. The implementation in Fractal-RL is designed to be easy to use and configure.
Here is a basic example of how to use the `PPOAgent` to solve the CartPole environment:

```typescript
import { PPOAgent } from 'fractal-rl';

// Define the environment
class CartPoleEnvironment {
  // ... (environment implementation)
}

// Configure the PPO agent
const config = {
  stateDim: 4,
  actionDim: 2,
  learningRateActor: 0.0003,
  learningRateCritic: 0.0001,
  gamma: 0.95,
  lambda: 0.9,
  epsilon: 0.02,
  ppoEpochs: 5,
  miniBatchSize: 64,
  dropoutRate: 0.0,
  l2RegularizationRate: 0.000001,
  clipNorm: 1.0,
  hiddenActivation: 'leakyRelu',
};

const agent = new PPOAgent(config);
const env = new CartPoleEnvironment();

// Training loop
for (let episode = 0; episode < 5000; episode++) {
  let state = env.reset();
  let done = false;
  while (!done) {
    const { action, logProb, value } = agent.selectAction(state);
    const { state: nextState, reward, done: envDone } = env.step(action);
    agent.remember(state, action, reward, nextState, envDone, logProb, value);
    state = nextState;
    done = envDone;
  }
  agent.update();
}
```
Configuration parameters:

- `stateDim`: The dimension of the state space.
- `actionDim`: The dimension of the action space.
- `learningRateActor`: The learning rate for the actor network.
- `learningRateCritic`: The learning rate for the critic network.
- `gamma`: The discount factor for future rewards.
- `lambda`: The lambda parameter for Generalized Advantage Estimation (GAE).
- `epsilon`: The clipping parameter for the PPO objective function.
- `ppoEpochs`: The number of epochs to run when updating the policy.
- `miniBatchSize`: The size of the mini-batch for training.
- `dropoutRate`: The dropout rate for regularization.
- `l2RegularizationRate`: The L2 regularization rate.
- `clipNorm`: The maximum norm for gradient clipping.
- `hiddenActivation`: The activation function for the hidden layers.
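The `gamma` and `lambda` parameters work together through Generalized Advantage Estimation. The sketch below shows the standard GAE recursion as typically implemented in PPO agents; it is an illustration of the technique, not necessarily the library's exact internal code.

```typescript
// Generalized Advantage Estimation (GAE):
//   delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
//   A_t     = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
function computeGAE(
  rewards: number[],
  values: number[],   // critic estimates V(s_0) .. V(s_{T-1})
  nextValue: number,  // bootstrap value V(s_T) after the last step
  dones: boolean[],
  gamma: number,
  lambda: number
): number[] {
  const advantages = new Array<number>(rewards.length).fill(0);
  let gae = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    const mask = dones[t] ? 0 : 1; // no bootstrapping past episode ends
    const nextV = t === rewards.length - 1 ? nextValue : values[t + 1];
    const delta = rewards[t] + gamma * nextV * mask - values[t];
    gae = delta + gamma * lambda * mask * gae;
    advantages[t] = gae;
  }
  return advantages;
}
```

With `lambda` near 0 the estimate reduces to the one-step TD error (low variance, more bias); with `lambda` near 1 it approaches the full Monte Carlo return (high variance, less bias). The config above (`gamma: 0.95`, `lambda: 0.9`) sits toward the low-variance end.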
To run the PPO CartPole GUI in your browser, go to:
https://fractal-solutions.github.io/fractal-RL/