Finetuning on concatenated datasets #165
base: master
There are several other changes:
- Upgrade beaker-py to 2.x (queues are not supported in older versions).
- Update helios.Dockerfile to use pytorch 2.7.0 to reduce build time.
- Update one_off_projects/convert_satlas_webmercator_to_rslearn/lib/__init__.py for the new rslearn VectorFormat.encode_vector API.
- Remove the manage_scratch_dir_on_data_disk option since it's no longer needed (the Docker volumes are now on the big /data disk across all Beaker nodes).
I also updated the launcher code to accept a list of configs. This reduces duplication between some of the config files, although it makes starting experiments more complicated, since you need to specify the right list of configs to get the intended combination (this is documented in the README files within each task dir in data/helios though).
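To make the "list of configs" idea concrete, here is a minimal sketch of one common way to combine several configs: a recursive dictionary merge in which later configs override earlier ones. This is an assumption for illustration only; the PR does not say how the launcher combines configs, and `merge_configs` and the keys below are hypothetical names.

```python
from copy import deepcopy
from typing import Any


def _merge_into(base: dict[str, Any], override: dict[str, Any]) -> None:
    """Merge `override` into `base` in place, recursing into nested dicts."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            _merge_into(base[key], value)
        else:
            # Scalars, lists, and new keys: the later config wins.
            base[key] = deepcopy(value)


def merge_configs(configs: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine configs in order; later entries override earlier ones."""
    merged: dict[str, Any] = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


# Hypothetical example: a shared base config plus a task-specific override.
base = {"model": {"lr": 0.001, "encoder": "a"}, "data": {"batch_size": 8}}
task = {"model": {"lr": 0.0001}, "trainer": {"max_epochs": 10}}
merged = merge_configs([base, task])
```

With this kind of merge, the order of the list matters, which is why the README for each task needs to spell out the exact combination.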
+ "_"
+ model_path_parts[-1].replace(".ckpt", "")
)
eval_task = "__".join(config_paths[0].split(os.path.sep)[-2:]).strip(
Is it possible to reuse the rslp_project and rslp_experiment here instead of extracting a model name and task name from the provided files?
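For context, here is a small sketch of what deriving names from file paths looks like, using only the two expressions visible in the diff above. The paths and the `names_from_paths` helper are hypothetical, and the truncated `.strip(` call from the diff is deliberately left out because its argument is not shown.

```python
import os


def names_from_paths(config_path: str, checkpoint_path: str) -> tuple[str, str]:
    """Derive a task name and a checkpoint stem from file paths."""
    # Last two path components of the config joined with "__", as in the diff.
    eval_task = "__".join(config_path.split(os.path.sep)[-2:])
    # Checkpoint file name without the ".ckpt" suffix, as in the diff.
    checkpoint_stem = checkpoint_path.split(os.path.sep)[-1].replace(".ckpt", "")
    return eval_task, checkpoint_stem


# Hypothetical paths, built with os.path.join so the separator matches the OS.
config_path = os.path.join("data", "helios", "some_task", "finetune.yaml")
checkpoint_path = os.path.join("checkpoints", "some_model", "last.ckpt")
eval_task, checkpoint_stem = names_from_paths(config_path, checkpoint_path)
```

Names derived this way change whenever a file is moved or renamed, which may be one reason to prefer explicit values such as rslp_project and rslp_experiment.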
rslp/helios/launch_finetune.py
Outdated
from rslp.log_utils import get_logger

DEFAULT_RSLP_PREFIX = "project_data/"
I don't think this script should have a default for RSLP_PREFIX, since the assumption everywhere else in this codebase is that the user is responsible for setting the environment variable. It can be set in a .env file.
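A minimal sketch of the alternative suggested here: read RSLP_PREFIX from the environment and fail with a clear message when it is missing, rather than falling back to a default. The `get_rslp_prefix` helper and its error text are hypothetical, not existing code in this repository.

```python
import os
from collections.abc import Mapping
from typing import Optional


def get_rslp_prefix(env: Optional[Mapping[str, str]] = None) -> str:
    """Return RSLP_PREFIX, raising if the user has not set it."""
    # Accepting a mapping makes the function easy to test without
    # touching the real process environment.
    env = os.environ if env is None else env
    prefix = env.get("RSLP_PREFIX")
    if not prefix:
        raise RuntimeError(
            "RSLP_PREFIX is not set; export it or add it to your .env file."
        )
    return prefix


# Hypothetical value, for illustration only.
prefix = get_rslp_prefix({"RSLP_PREFIX": "project_data/"})
```

Failing early keeps the script consistent with the rest of the codebase, where the user owns this setting.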
Mostly just applying changes from rslearn (allenai/rslearn#207).

New configs in `data/helios/v3_multitask` and `data/helios/v3_perf_benchmark` are for multi-dataset training. These are the majority of the edits, plus a bunch of scripts for early evals I did (kind of messy, I don't mind deleting them since they're not really useful for anyone else). Only `make_multidataset_config.py` should really be used with any frequency, to create multi-dataset training configs.

Also, made some improvements to `launch_finetune`. It now supports a `profiler` flag, a `do_eval` flag (run on validation set and save metrics, useful for eval sweeps), and a `local` flag (useful for debugging finetuning in the current Beaker session). See `data/helios/v3_multitask/README.md` for docs on how to run a multi-dataset job.

BREAKING: When building docker images, please place `rslearn` and `helios` in `./docker_build/rslearn` and `./docker_build/helios`, instead of at the repository root. This is to avoid linter issues, where the linter thinks that `rslearn` and `helios` are local packages instead of standard `pip` installs.
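Since the new directory layout is a breaking change, a quick pre-build check can catch a stale checkout. The sketch below is a hypothetical helper, not part of this PR; it only tests that the two directories named above exist under `docker_build`.

```python
from pathlib import Path

# Packages the PR description says must live under ./docker_build.
REQUIRED_PACKAGES = ("rslearn", "helios")


def missing_docker_build_dirs(repo_root: str) -> list[str]:
    """Return the required package dirs absent from <repo_root>/docker_build."""
    build_dir = Path(repo_root) / "docker_build"
    return [name for name in REQUIRED_PACKAGES if not (build_dir / name).is_dir()]
```

Calling `missing_docker_build_dirs(".")` from the repository root returns an empty list when both directories are in place.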