Commit fdb171f

Merge pull request #14 from SalesforceAIResearch/zhiwei/dev
Zhiwei/dev
2 parents b3be9d8 + 016d021 commit fdb171f

13 files changed (+885, -725 lines)

ROADMAP.md

Lines changed: 22 additions & 9 deletions
@@ -3,6 +3,8 @@
 ### MCP Server 🖥️
 - ✅ Python stdio server support
 - ✅ node.js stdio server support
+- ✅ http mcp server support
+- 🔲 connecting mcp servers with json file as a standard way

 ### MCP Client 🤖
 - ✅ Stdio client implementation
@@ -19,20 +21,14 @@
 ### Evaluation 📊
 - ✅ Implement core evaluation metrics (accuracy, latency)
 - ✅ Create automated testing framework
+- 🔲 Automatic deep evaluation
+- 🔲 Evaluating the implementation of MCP servers

 ### Data Pipeline 🔄
 - ✅ Design unified data schema for all benchmarks
 - ✅ Implement data preprocessing tools
 - ✅ Add support for multiple data formats

-### Benchmarks 🧪
-- ✅ Airbnb MCP benchmark
-- ✅ Healthcare MCP benchmark
-- ✅ yahoo finance MCP benchmark
-- ✅ Sports benchmark
-- ✅ travel_assistant benchmark
-- ✅ File System benchmark
-
 ### LLM Provider 🧠
 - ✅ OpenAI API integration (used for data generation and testing)
 - ✅ local vllm-based model
@@ -43,4 +39,21 @@
 - ✅ Data converter
 - ✅ Model evaluator
 - ✅ Report generator
-- ✅ Auto end-to-end evaluation
+- ✅ Auto end-to-end evaluation
+
+### Front-end 🎨
+- ✅ React application setup with TypeScript
+- ✅ Core navigation and routing
+- ✅ MCP server configuration interface
+- ✅ Chat client for MCP interactions
+- ✅ Task generation and verification UI
+- ✅ Model evaluation dashboard
+- ✅ Results and analytics pages
+- ✅ Data management interfaces
+- 🔲 Unifying the model config for all the pages and sharing the same component
+- 🔲 Saving any existing model config as a config file and supporting loading it again
+
+## Issues
+- Evaluating multiple models does not work
+- Analyze feature does not support skipping AI report generation
+- Selecting Judge Rubrics does not generate a report
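The open roadmap item about connecting MCP servers through a JSON file does not say what that file looks like. As a hypothetical sketch only, assuming the `mcpServers` layout used by several MCP clients (each key names a server; stdio servers give `command`, `args`, and `env`; an HTTP server gives a `url`), a loader could turn the file into launch specs and reject entries that pick neither or both transports. `ServerSpec`, `load_server_specs`, `weather_server.py`, and the example URL are illustrative names, not part of this repository.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServerSpec:
    """How to reach one MCP server (hypothetical structure, not this repo's API)."""
    name: str
    command: Optional[str] = None  # stdio: executable to spawn
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)
    url: Optional[str] = None  # HTTP: endpoint of an already running server

    @property
    def transport(self) -> str:
        # A spec carries exactly one of command/url (enforced by the loader).
        return "stdio" if self.command is not None else "http"


def load_server_specs(text: str) -> List[ServerSpec]:
    """Parse an assumed {"mcpServers": {name: {...}}} JSON document."""
    servers = json.loads(text).get("mcpServers", {})
    specs = []
    for name, cfg in servers.items():
        # Each entry must choose exactly one transport.
        if ("command" in cfg) == ("url" in cfg):
            raise ValueError(f"server {name!r} must set exactly one of 'command' or 'url'")
        specs.append(
            ServerSpec(
                name=name,
                command=cfg.get("command"),
                args=list(cfg.get("args", [])),
                env=dict(cfg.get("env", {})),
                url=cfg.get("url"),
            )
        )
    return specs


# Hypothetical example file: two stdio servers and one HTTP server.
EXAMPLE = """
{
  "mcpServers": {
    "weather": {"command": "python", "args": ["weather_server.py"]},
    "files": {"command": "npx",
              "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]},
    "remote": {"url": "http://localhost:8000/mcp"}
  }
}
"""

if __name__ == "__main__":
    for spec in load_server_specs(EXAMPLE):
        print(spec.name, spec.transport)
    # prints:
    # weather stdio
    # files stdio
    # remote http
```

Validating the transport choice at load time means a malformed entry fails when the file is read, rather than later when an evaluation run tries to start that server.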

frontend/src/components/MCPChatServerConfiguration.tsx

Lines changed: 0 additions & 141 deletions
This file was deleted.

frontend/src/components/MCPServerConfiguration.tsx

Lines changed: 0 additions & 179 deletions
This file was deleted.
