The idea for this project was to build a bot that can learn to play Pokémon, specifically to battle other trainers. The bot would learn the different mechanics of the game, from choosing the optimal move each turn to forming long-term strategies to win matches.
The most practical platform for developing such a bot is Pokémon Showdown, an online battle simulator that is lightweight, free to play, and very accessible for this purpose. Previous work exists on similar projects, notably with the Poke-env library, which provides easy access to all the battle data the bot needs and eliminates much of the technical implementation a classic Pokémon game would require.
The goal was to build a bot for the online game Pokemon Showdown using reinforcement learning methods such as:
- DDQN
- PPO
- REINFORCE
The bot would be hosted on the online Pokemon Showdown server, allowing players to battle against it with the help of Poke-env.
REINFORCE is a policy gradient method that directly optimizes the agent's policy through trial and error by adjusting action probabilities based on rewards. It relies solely on the return from the environment to update the policy, without the need for a value function. While simple, it can be slow and less stable due to the high variance of its updates, especially in complex environments with delayed rewards.
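As a minimal illustration (not the project's training code), the core REINFORCE update can be sketched in PyTorch as follows; `log_probs` is assumed to hold the log-probability of each action taken during one episode, and `rewards` the per-step rewards:

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE loss for one episode: -sum_t log pi(a_t|s_t) * G_t,
    where G_t is the discounted return from step t onward."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # compute returns back-to-front
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```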
Proximal Policy Optimization (PPO) is a policy gradient method that improves on REINFORCE by using a clipped objective to prevent large, destabilizing policy updates. Unlike REINFORCE, PPO often pairs with an Actor-Critic architecture, where the critic estimates the value function to stabilize learning. Its stability and efficiency make it a more robust choice, especially for continuous and large-scale tasks.
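The clipped surrogate objective can be sketched as follows, assuming advantages and the behavior policy's log-probabilities have already been computed:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: keeps the probability ratio within [1-eps, 1+eps]
    so a single update cannot move the policy too far."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```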
Double Deep Q-Network (DDQN) is a value-based method that refines the original DQN by separating action selection and evaluation to avoid overestimating Q-values. Unlike PPO and REINFORCE, which focus on learning a policy, DDQN learns the value of state-action pairs and uses these values to guide decision-making. This method is particularly effective in environments where learning precise action values is crucial for long-term success.
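A sketch of the Double DQN target computation; `online_net` and `target_net` are assumed to map a batch of states to per-action Q-values, and `rewards` and `dones` to be float tensors:

```python
import torch

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network *selects* the next action,
    the target network *evaluates* it, which curbs Q-value overestimation."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation
        return rewards + gamma * (1.0 - dones) * next_q
```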
The Actor-Critic method combines two networks: the actor, which selects actions according to the current policy, and the critic, which estimates the value of the current state to guide the actor's updates. This architecture reduces the high variance typically seen in pure policy gradient methods like REINFORCE by incorporating value estimates. By leveraging the critic's feedback, the actor improves its policy more efficiently, making Actor-Critic well suited to continuous action spaces and complex environments.

The final model we used was an Actor-Critic trained with the PPO objective. The architecture consists of an actor network and a critic network, with the following layers:
**Actor network:**
- Input Layer: Takes in the state of the environment (`state_dim` features).
- 1st Hidden Layer: Fully connected layer with 64 units and Tanh activation.
- 2nd Hidden Layer: Fully connected layer with 128 units and Tanh activation.
- 3rd Hidden Layer: Another fully connected layer with 128 units and Tanh activation.
- Output Layer: Fully connected layer with `action_dim` units, using Softmax activation to output probabilities for each action.
**Critic network:**
- Input Layer: Same as the actor network, takes in the state (`state_dim` features).
- 1st Hidden Layer: Fully connected layer with 64 units and Tanh activation.
- 2nd Hidden Layer: Fully connected layer with 128 units and Tanh activation.
- 3rd Hidden Layer: Another fully connected layer with 128 units and Tanh activation.
- Output Layer: A single unit (scalar output), representing the estimated value of the input state (used for value prediction).
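Expressed in PyTorch, the two networks look roughly as follows (a sketch reconstructed from the layer description above; the actual code in `PPO2.py` may differ in details). Here `state_dim` is 12 and `action_dim` is 9, matching the state and action spaces described below.

```python
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: maps a state to a probability distribution over actions."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, action_dim), nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Value network: maps a state to a scalar value estimate."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, state):
        return self.net(state)
```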
The state space *S* consists of all possible states in the environment. Each state *s* is built once per turn by concatenating 12 battle elements (a construction sketch follows the list), which correspond to:
- [0] Our Active Pokémon index
- [1] Opponent Active Pokémon index
- [2-5] Active Pokémon move base powers (default to -1 if a move has no base power)
- [6-9] Active Pokémon move damage multipliers
- [10] Our remaining Pokémon
- [11] Opponent remaining Pokémon
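A sketch of how this vector can be assembled from a Poke-env `Battle` object. The slot-position encoding of the two "index" elements and the base-power scaling are assumptions, and the exact `damage_multiplier` signature varies between Poke-env versions:

```python
import numpy as np

def embed_battle(battle):
    """Build the 12-element state vector described above (a sketch)."""
    # [2-5] base power of the active Pokémon's moves, -1 when unavailable
    moves_base_power = -np.ones(4)
    # [6-9] type-effectiveness multipliers against the opponent's active Pokémon
    moves_dmg_multiplier = np.ones(4)
    for i, move in enumerate(battle.available_moves[:4]):
        if move.base_power:
            moves_base_power[i] = move.base_power / 100  # /100 normalization is an assumption
        if move.type:
            moves_dmg_multiplier[i] = move.type.damage_multiplier(
                battle.opponent_active_pokemon.type_1,
                battle.opponent_active_pokemon.type_2,
            )
    # [0-1] active Pokémon "indices" (here: slot in each team, an assumption)
    own_idx = list(battle.team.values()).index(battle.active_pokemon)
    opp_idx = list(battle.opponent_team.values()).index(battle.opponent_active_pokemon)
    # [10-11] non-fainted Pokémon counts
    own_left = len([p for p in battle.team.values() if not p.fainted])
    opp_left = len([p for p in battle.opponent_team.values() if not p.fainted])
    return np.concatenate(
        [[own_idx, opp_idx], moves_base_power, moves_dmg_multiplier, [own_left, opp_left]]
    )
```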
The action space *A* consists of all possible actions the agent can take. It is the integer range [0, 8], for a total of 9 actions. Each action *a* in *A* corresponds to one of the following choices (a mapping sketch follows the list):
- [0] Use 1st Active Pokémon move
- [1] Use 2nd Active Pokémon move
- [2] Use 3rd Active Pokémon move
- [3] Use 4th Active Pokémon move
- [4] Switch to 1st next Pokémon
- [5] Switch to 2nd next Pokémon
- [6] Switch to 3rd next Pokémon
- [7] Switch to 4th next Pokémon
- [8] Switch to 5th next Pokémon
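A sketch of how these indices can be turned into battle orders with Poke-env (its environment players implement an equivalent mapping internally); an illegal choice falls back to a random legal move here:

```python
def action_to_order(player, action, battle):
    """Map an action index in [0, 8] to a Showdown battle order (a sketch)."""
    if action < 4 and action < len(battle.available_moves):
        # Actions 0-3: use one of the active Pokémon's moves.
        return player.create_order(battle.available_moves[action])
    if action >= 4 and action - 4 < len(battle.available_switches):
        # Actions 4-8: switch to one of the remaining Pokémon.
        return player.create_order(battle.available_switches[action - 4])
    # Fall back to a random legal move when the chosen action is unavailable.
    return player.choose_random_move(battle)
```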
To set up the bot:

- Ensure Python 3.8 or later and `torch` are installed on your system.
- Install the required Python dependencies using pip:

  ```
  pip install -r requirements.txt
  ```
Demo video: `part1.mp4`
To battle the bot, follow these steps:
1. Create Two Pokémon Showdown Accounts:
   - You need two accounts: one to host the bot and another for yourself.
   - Create these accounts at Pokémon Showdown.
2. Prepare the Account Information:
   - Create a file named `Account.txt` in the same directory as your `PPO2.py` script.
   - This file should contain the username and password of the account you will use to host the bot (see the example below).
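   The exact layout of `Account.txt` depends on how `PPO2.py` parses it; one field per line, as shown here, is an assumption:

   ```
   YourBotUsername
   YourBotPassword
   ```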
3. Run the PPO Script:
   - Ensure the `PPO2.py` script, the model weights file, and `Account.txt` are all in the same folder.
   - Execute the script with the following command:

     ```
     python PPO2.py
     ```
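   Under the hood, hosting a bot on the official server with Poke-env looks roughly like the sketch below. The class names are from recent Poke-env versions, `RandomPlayer` merely stands in for the trained agent, and the battle format is an assumption:

   ```python
   import asyncio

   from poke_env import AccountConfiguration, ShowdownServerConfiguration
   from poke_env.player import RandomPlayer

   async def main():
       # RandomPlayer is only a stand-in; PPO2.py wires in the trained policy instead.
       bot = RandomPlayer(
           account_configuration=AccountConfiguration("YourBotUsername", "YourBotPassword"),
           server_configuration=ShowdownServerConfiguration,
           battle_format="gen9randombattle",  # assumption: use the format you battle in
       )
       await bot.accept_challenges(None, 1)  # accept one incoming challenge

   asyncio.run(main())
   ```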
4. Set Up Your Team:
   - Go to the Pokémon Showdown team builder and create a team using the following string. Copy and paste it into the team builder:

     ```
     Qwilfish (Qwilfish-Hisui) @ Eviolite
     Ability: Intimidate
     Level: 83
     Tera Type: Flying
     EVs: 85 HP / 85 Atk / 85 Def / 85 SpA / 85 SpD / 85 Spe
     - Toxic Spikes
     - Crunch
     - Gunk Shot
     - Spikes

     Medicham @ Choice Band
     Ability: Pure Power
     Level: 86
     Tera Type: Fighting
     EVs: 85 HP / 85 Atk / 85 Def / 85 SpA / 85 SpD / 85 Spe
     - Zen Headbutt
     - Ice Punch
     - Poison Jab
     - Close Combat

     Orthworm @ Chesto Berry
     Ability: Earth Eater
     Level: 88
     Tera Type: Electric
     EVs: 85 HP / 85 Atk / 85 Def / 85 SpA / 85 SpD / 85 Spe
     - Body Press
     - Coil
     - Rest
     - Iron Tail

     Chandelure @ Choice Scarf
     Ability: Flash Fire
     Level: 83
     Tera Type: Fire
     EVs: 85 HP / 85 Def / 85 SpA / 85 SpD / 85 Spe
     IVs: 0 Atk
     - Trick
     - Shadow Ball
     - Energy Ball
     - Fire Blast

     Floatzel @ Leftovers
     Ability: Water Veil
     Level: 85
     Tera Type: Dark
     EVs: 85 HP / 85 Atk / 85 Def / 85 SpA / 85 SpD / 85 Spe
     - Crunch
     - Low Kick
     - Wave Crash
     - Bulk Up

     Spiritomb @ Leftovers
     Ability: Infiltrator
     Level: 90
     Tera Type: Dark
     EVs: 85 HP / 85 Atk / 85 Def / 85 SpA / 85 SpD / 85 Spe
     - Poltergeist
     - Toxic
     - Foul Play
     - Sucker Punch
     ```
5. Challenge the Bot:
   - In Pokémon Showdown, use the search feature to find the username associated with the bot.
   - Challenge the bot. It should automatically accept the challenge.
6. Enjoy the Battle:
   - Have fun battling the bot!