WebSeer is a reinforcement learning framework for training intelligent web-based search agents capable of deeper reasoning, longer tool-use chains, and self-reflective correction.
Unlike traditional Retrieval-Augmented Generation (RAG) systems, WebSeer integrates self-reflection into every stage of reasoning, enabling agents to backtrack, reformulate queries, and iteratively improve answers in real-world web environments.
| Model Name | Hugging Face Checkpoint | Size |
|---|---|---|
| WebSeer-14B | 🤗 WebSeer-14B | 14B |
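To pull the checkpoint to a local directory before serving it, you can use `huggingface_hub`. This is a minimal sketch only; the repo id and target directory below are placeholders, so substitute the actual checkpoint linked in the table above.

```python
# Sketch: download the WebSeer checkpoint locally before serving.
# The repo id and target directory are placeholders, not confirmed names.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ORG/WebSeer-14B",            # placeholder; use the checkpoint linked above
    local_dir="./checkpoints/WebSeer-14B",
)
print(f"Checkpoint downloaded to {local_dir}")
```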
We recommend using uv for environment management:

```bash
uv venv test_inf --python=3.10
source test_inf/bin/activate
uv pip install flask elasticsearch requests-cache requests urllib3 google-cloud-discoveryengine fanoutqa gunicorn openai jsonlines regex multiprocess pebble
uv pip install vllm --torch-backend=auto
```

We use Serper to retrieve Google search results. You need to add your Serper API key in the server_w_ws.py file.
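The retrieval server is a Flask app served with gunicorn below. Conceptually, the Serper call inside it looks roughly like the sketch that follows; this is an illustration only, assuming a `/search` route and a module-level API key constant, and the actual `server_w_ws.py` may be organized differently.

```python
# Minimal sketch of a Serper-backed retrieval endpoint.
# Route name, key constant, and response fields are assumptions, not the
# actual layout of server_w_ws.py.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
SERPER_API_KEY = "YOUR_SERPER_API_KEY"  # put your Serper API key here

@app.route("/search", methods=["POST"])
def search():
    query = request.json["query"]
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
        json={"q": query},
        timeout=30,
    )
    resp.raise_for_status()
    # Return only the organic results to keep the payload small.
    return jsonify(resp.json().get("organic", []))
```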
Start the retrieval server:

```bash
gunicorn -w 4 -b 0.0.0.0:21021 server_w_ws:app --timeout 120
```

Replace PATH_TO_MODEL with your local or remote model path/checkpoint and serve it with vLLM:

```bash
vllm serve --host 0.0.0.0 --port 20090 PATH_TO_MODEL --served-model-name 'WebSeer-14b' --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1 --tensor-parallel-size 1 --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```

Then run the inference demo:

```bash
python demo_inference.py
```
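At its core, querying the served model is a standard OpenAI-compatible chat call against the vLLM endpoint, roughly like the sketch below. The tool schema and prompt here are illustrative placeholders, not the actual ones used by `demo_inference.py`.

```python
# Sketch: query the vLLM OpenAI-compatible endpoint started above.
# The tool definition and prompt are placeholders, not the repo's schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:20090/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",                       # hypothetical tool name
        "description": "Search the web for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="WebSeer-14b",                            # matches --served-model-name
    messages=[{"role": "user", "content": "Who won the 2024 Nobel Prize in Physics?"}],
    tools=tools,
)
print(response.choices[0].message)
```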
To prepare for training, download the SFT and RL datasets:

```bash
git clone https://huggingface.co/datasets/99hgz/WebSeer-sft-dataset ~/data/re_rag/
git clone https://huggingface.co/datasets/99hgz/WebSeer-dataset ~/data/re_rag_rl/
```
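To quickly inspect what was downloaded, something like the following works. This is a sketch only; the file extensions and record fields under `~/data/re_rag/` are assumptions about the dataset contents.

```python
# Sketch: peek at the first record of a downloaded data file.
# File names and record layout are assumptions, not confirmed.
import glob
import os
import jsonlines

files = sorted(glob.glob(os.path.expanduser("~/data/re_rag/**/*.jsonl"), recursive=True))
print("data files:", files)

if files:
    with jsonlines.open(files[0]) as reader:
        first = next(iter(reader))
        print("keys in first record:", list(first.keys()))
```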
Set up the training environment:

```bash
uv venv webseer --python=3.10
source webseer/bin/activate
git clone https://github.com/GAIR-NLP/DeepResearcher.git
cd DeepResearcher
uv pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
uv pip install flash-attn --no-build-isolation
uv pip install -e .
```
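An optional sanity check that the CUDA build of torch and flash-attn installed correctly:

```python
# Optional sanity check for the training environment.
import torch
import flash_attn  # noqa: F401 -- import check only

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```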
Run SFT training:

```bash
bash ./sft/recipe/retool/run_qwen2.5_14b_sp4.sh
```

For RL training, start the retrieval server and then launch the training script:

```bash
gunicorn -w 4 -b 0.0.0.0:21021 server_w_ws:app --timeout 120  # start retrieval server
bash ./tests/e2e/run_re_rag.sh
```

We provide model outputs in the `outputs` directory. Complete evaluation scripts will be released later.
This training implementation is based on verl. The base model is Qwen2.5.
If you find this work useful, please cite it as follows:

