Skip to content

Commit 5dcc2d5

Browse files
RolandMinruipeteryang1Xupeteryangms
authored
feat: new exp gen v2 implementation (#725)
* first framework commit * idea proposal v2 Co-authored-by: Roland Minrui <RolandMinrui@users.noreply.github.com> * fix a small bug in v1 * fix a small bug * add problem to DShypothesis * use exp gen as unified interface * merge yuante's code into pr * fix a small bug in draft * update all minrui's code * small update * fix small bug & remove useless code * fix return type * fix CI --------- Co-authored-by: Xu Yang <peteryang@vip.qq.com> Co-authored-by: Roland Minrui <RolandMinrui@users.noreply.github.com> Co-authored-by: Xu <v-xuminrui@microsoft.com> Co-authored-by: Xu Yang <xuyang1@microsoft.com>
1 parent 86f8bbf commit 5dcc2d5

13 files changed

Lines changed: 945 additions & 489 deletions

File tree

rdagent/app/data_science/conf.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,7 @@ class DataScienceBasePropSetting(KaggleBasePropSetting):
2424
#### enable specification
2525
spec_enabled: bool = True
2626

27+
proposal_version: str = "v1"
28+
2729

2830
DS_RD_SETTING = DataScienceBasePropSetting()

rdagent/oai/backend/litellm.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,7 @@ def _create_chat_completion_inner_function( # type: ignore[no-untyped-def] # no
8080
if json_mode and supports_response_schema(model=LITELLM_SETTINGS.chat_model):
8181
kwargs["response_format"] = {"type": "json_object"}
8282

83+
logger.info(self._build_log_messages(messages), tag="llm_messages")
8384
# Call LiteLLM completion
8485
response = completion(
8586
model=LITELLM_SETTINGS.chat_model,
@@ -93,7 +94,6 @@ def _create_chat_completion_inner_function( # type: ignore[no-untyped-def] # no
9394
f"{LogColors.GREEN}Using chat model{LogColors.END} {LITELLM_SETTINGS.chat_model}", tag="llm_messages"
9495
)
9596

96-
logger.info(self._build_log_messages(messages), tag="llm_messages")
9797
if LITELLM_SETTINGS.chat_stream:
9898
logger.info(f"{LogColors.BLUE}assistant:{LogColors.END}", tag="llm_messages")
9999
content = ""

rdagent/scenarios/data_science/dev/feedback.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,8 @@ def generate_feedback(self, exp: DSExperiment, trace: DSTrace) -> ExperimentFeed
7777
)
7878
)
7979

80+
# Currently, we do not use `observations`, `hypothesis_evaluation`, and `new_hypothesis` in the framework.
81+
# `new_hypothesis` should not exist in the feedback.
8082
return HypothesisFeedback(
8183
observations=resp_dict.get("Observations", "No observations provided"),
8284
hypothesis_evaluation=resp_dict.get("Feedback for Hypothesis", "No feedback provided"),

rdagent/scenarios/data_science/dev/prompts.yaml

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -15,16 +15,14 @@ exp_feedback:
1515
Your feedback should:
1616
1. Confirm if the current result supports or refutes the hypothesis.
1717
2. Compare with previous best results.
18-
3. Suggest improvements or new directions. Stay innovative and adaptive.
19-
4. SOTA results are the best outcomes we have achieved in this scenario. If we do not have complete experiment available (i.e., results that are runnable and can generate evaluation outcomes), **please replace it as the best result/SOTA**.
18+
3. SOTA results are the best outcomes we have achieved in this scenario.
2019
2120
Please provide detailed and constructive feedback.
2221
Example JSON Structure for Result Analysis:
2322
{
24-
"Observations": "Your overall observations here",
25-
"Feedback for Hypothesis": "Observations related to the hypothesis",
26-
"New Hypothesis": "Your new hypothesis here",
27-
"Reasoning": "Reasoning for the new hypothesis",
23+
"Observations": "A detailed summary of the experimental results, including the description and scores for both SOTA and the current experiment. Limit this field to no more than three sentences, focusing on concrete data rather than general statements.",
24+
"Feedback for Hypothesis": "A brief evaluation of the original hypothesis that highlights specific data points or trends which support or contradict it. Limit this field to two sentences.",
25+
"Reasoning": "A clear explanation of why the current result performs better or worse than SOTA. This should reference both the SOTA description score and the current experiment's description score, providing insight into the factors contributing to the observed differences. Limit this field to one to three sentences.",
2826
"Replace Best Result": "yes or no"
2927
}
3028

0 commit comments

Comments
 (0)