WebSeer is a reinforcement learning framework for training intelligent web-based search agents capable of deeper reasoning, longer tool-use chains, and self-reflective correction.
Unlike traditional Retrieval-Augmented Generation (RAG) systems, WebSeer integrates self-reflection into every stage of reasoning, enabling agents to backtrack, reformulate queries, and iteratively improve answers in real-world web environments.
| Model Name | Hugging Face Checkpoint | Size |
|---|---|---|
| WebSeer-14B | 🤗 WebSeer-14B | 14B |
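To pull the checkpoint to a local directory before serving it, you can use `huggingface_hub`. This is a minimal sketch only; the repo id and target directory below are placeholders, so substitute the actual checkpoint linked in the table above.

```python
# Sketch: download the WebSeer checkpoint locally before serving.
# The repo id and target directory are placeholders, not confirmed names.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ORG/WebSeer-14B",            # placeholder; use the checkpoint linked above
    local_dir="./checkpoints/WebSeer-14B",
)
print(f"Checkpoint downloaded to {local_dir}")
```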
We recommend using uv for environment management:

```bash
uv venv test_inf --python=3.10
source test_inf/bin/activate
uv pip install flask elasticsearch requests-cache requests urllib3 google-cloud-discoveryengine fanoutqa gunicorn openai jsonlines regex multiprocess pebble
uv pip install vllm --torch-backend=auto
```

We use Serper to retrieve Google search results. You need to add your Serper API key in the server_w_ws.py file.
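The retrieval server is a Flask app served with gunicorn below. Conceptually, the Serper call inside it looks roughly like the sketch that follows; this is an illustration only, assuming a `/search` route and a module-level API key constant, and the actual `server_w_ws.py` may be organized differently.

```python
# Minimal sketch of a Serper-backed retrieval endpoint.
# Route name, key constant, and response fields are assumptions, not the
# actual layout of server_w_ws.py.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
SERPER_API_KEY = "YOUR_SERPER_API_KEY"  # put your Serper API key here

@app.route("/search", methods=["POST"])
def search():
    query = request.json["query"]
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
        json={"q": query},
        timeout=30,
    )
    resp.raise_for_status()
    # Return only the organic results to keep the payload small.
    return jsonify(resp.json().get("organic", []))
```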
Start the retrieval server:

```bash
gunicorn -w 4 -b 0.0.0.0:21021 server_w_ws:app --timeout 120
```

Replace PATH_TO_MODEL with your local or remote model path/checkpoint and serve it with vLLM:

```bash
vllm serve --host 0.0.0.0 --port 20090 PATH_TO_MODEL --served-model-name 'WebSeer-14b' --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1 --tensor-parallel-size 1 --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```

Then run the inference demo:

```bash
python demo_inference.py
```
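At its core, querying the served model is a standard OpenAI-compatible chat call against the vLLM endpoint, roughly like the sketch below. The tool schema and prompt here are illustrative placeholders, not the actual ones used by `demo_inference.py`.

```python
# Sketch: query the vLLM OpenAI-compatible endpoint started above.
# The tool definition and prompt are placeholders, not the repo's schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:20090/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",                       # hypothetical tool name
        "description": "Search the web for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="WebSeer-14b",                            # matches --served-model-name
    messages=[{"role": "user", "content": "Who won the 2024 Nobel Prize in Physics?"}],
    tools=tools,
)
print(response.choices[0].message)
```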
To prepare for training, download the SFT and RL datasets:

```bash
git clone https://huggingface.co/datasets/99hgz/WebSeer-sft-dataset ~/data/re_rag/
git clone https://huggingface.co/datasets/99hgz/WebSeer-dataset ~/data/re_rag_rl/
```
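To quickly inspect what was downloaded, something like the following works. This is a sketch only; the file extensions and record fields under `~/data/re_rag/` are assumptions about the dataset contents.

```python
# Sketch: peek at the first record of a downloaded data file.
# File names and record layout are assumptions, not confirmed.
import glob
import os
import jsonlines

files = sorted(glob.glob(os.path.expanduser("~/data/re_rag/**/*.jsonl"), recursive=True))
print("data files:", files)

if files:
    with jsonlines.open(files[0]) as reader:
        first = next(iter(reader))
        print("keys in first record:", list(first.keys()))
```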
Set up the training environment:

```bash
uv venv webseer --python=3.10
source webseer/bin/activate
git clone https://github.com/GAIR-NLP/DeepResearcher.git
cd DeepResearcher
uv pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
uv pip install flash-attn --no-build-isolation
uv pip install -e .
```
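An optional sanity check that the CUDA build of torch and flash-attn installed correctly:

```python
# Optional sanity check for the training environment.
import torch
import flash_attn  # noqa: F401 -- import check only

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```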
Run SFT training:

```bash
bash ./sft/recipe/retool/run_qwen2.5_14b_sp4.sh
```

For RL training, start the retrieval server and then launch the training script:

```bash
gunicorn -w 4 -b 0.0.0.0:21021 server_w_ws:app --timeout 120  # start retrieval server
bash ./tests/e2e/run_re_rag.sh
```

We provide model outputs in the `outputs` directory. Complete evaluation scripts will be released later.
This training implementation is based on verl. The base model is Qwen2.5.
If you find this work useful, please cite it as follows:

