Skip to content

Conversation

@JBWilkie
Copy link
Contributor

@JBWilkie JBWilkie commented Aug 21, 2024

Problem

Currently when pulling a dataset release with darwin-py, if:

  • 1: Any item is a multi-slotted item, or:
  • 2: Any item has a slot with multiple source files

Then every item is named and structured into directories in the multi-slotted way. That is the paths include the slot name: {prefix}/{item_name}/{slot_name}/{file_name}

This is because if any item meets either condition 1 or 2 above, we set force_slots=True for all items

If a user passes force_slots=False to pull() (which is the default behaviour), we should pass force_slots=False for all single-slotted, single-source-file items, but as True for all other items, since in that case it only makes sense for the local paths of multi-slotted / multi-source-file items to include the slot names

Solution

Instead of passing the same value of force_slots for all items, determine & record the value of force_slots for every item to be downloaded, then propagate these values into the download functions

Changelog

When downloading dataset releases containing a mixture of multi-slotted & single-slotted items, only represent the slots of the multi-slotted items locally

@linear
Copy link

linear bot commented Aug 21, 2024

Copy link
Contributor

@dorfmanrobert dorfmanrobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice lgtm

@JBWilkie JBWilkie merged commit ad35478 into master Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants