[AL-5784] Remove backports because it failed to install on Windows #1119

Merged
vbrodsky merged 8 commits into develop from VB/remove-datetime-backport
Jun 6, 2023

Conversation

vbrodsky
Contributor

@vbrodsky vbrodsky commented May 30, 2023

BACKGROUND
Prior to 3.11, Python's datetime.fromisoformat() function did not handle some valid ISO string formats that we would like to support in the SDK. Python has addressed this in newer versions, beginning with 3.11.
In SDK v3.44.0 we installed a backport to use the 3.11 version of fromisoformat(). Unfortunately, our users had problems installing the backport on Windows. Consequently, they are blocked from upgrading our SDK.

Just removing the backport is not enough: it would revert to the older datetime.fromisoformat() and break time string parsing again. So we have switched to dateutil for datetime string parsing, and it works.
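
As a rough illustration of the switch described above, here is a minimal sketch that parses the four sample strings used in this PR's test scripts with dateutil. It assumes python-dateutil is installed and uses `parser.isoparse`; whether `format_iso_from_string` calls `isoparse` or another dateutil entry point is an assumption, not something shown in this thread.

```python
from datetime import timedelta

from dateutil import parser  # third-party: pip install python-dateutil

# The same sample strings used in the test scripts in this PR.
samples = [
    "2011-11-04T00:05:23Z",
    "2011-11-04T00:05:23+00:00",
    "2011-11-04T00:05:23+05:00",
    "2011-11-04T00:05:23",
]

# isoparse accepts all four forms, including the trailing "Z".
parsed = {text: parser.isoparse(text) for text in samples}

for text, dt in parsed.items():
    # utcoffset() is None when the input carried no offset (naive datetime).
    print(f"{text!r} -> offset={dt.utcoffset()}")
```

The offset from the input is kept on the parsed value, and a string without an offset comes back as a naive datetime.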

I have created 3 scripts, one each for the pre-backport, with-backport and post-backport (current) states. The scripts and their results will be attached.

  • Using the dateutil library (https://github.com/dateutil/dateutil/) to parse ISO strings as datetimes. It replaces backports.datetime_fromisoformat, which did not install well on Windows. The new library is Apache2 licensed (compliant with our open source licensing policy) and passed on Snyk, so all good for us to use
  • Minor cleanup:
  • NOTE: typical usage of this code does not do any timezone conversion. The function I have added, format_iso_from_string, preserves timezones; however, its sister function format_iso_datetime just takes the current value of the datetime and formats it as UTC. Our use case is to first call format_iso_from_string to make sure the string is a valid date, then call format_iso_datetime to return a string again (see def _validate_parse_datetime). If the string '2011-11-04T00:05:23+05:00' is passed, we get back '2011-11-04T00:05:23Z'. I think we should not be doing 'symmetrical conversion' here but I did not want to go outside of the scope of my story to address this (comments are welcome)
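
To make the NOTE above concrete, here is a small standard-library sketch of the difference between labeling a datetime with "Z" and actually converting it to UTC. The value of `ISO_DATETIME_FORMAT` is an assumption (a format string ending in a literal "Z"); the real constant in the SDK may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumed value of the SDK's ISO_DATETIME_FORMAT; the "Z" is a literal character.
ISO_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Equivalent to a parsed '2011-11-04T00:05:23+05:00'.
dt = datetime(2011, 11, 4, 0, 5, 23, tzinfo=timezone(timedelta(hours=5)))

# strftime ignores tzinfo for these directives, so the wall-clock time is kept
# and only the "Z" label is appended.
label_only = dt.strftime(ISO_DATETIME_FORMAT)

# Converting to UTC first shifts the wall-clock time by the offset.
converted = dt.astimezone(timezone.utc).strftime(ISO_DATETIME_FORMAT)

print(label_only)  # 2011-11-04T00:05:23Z
print(converted)   # 2011-11-03T19:05:23Z
```

The first output matches the behavior described in the NOTE; the second is what a converting formatter would return for the same instant.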

@vbrodsky vbrodsky force-pushed the VB/remove-datetime-backport branch from 4b645af to 529668c Compare May 30, 2023 21:33
@vbrodsky vbrodsky requested review from whistler, kkim-labelbox and yamini-labelbox and removed request for whistler May 30, 2023 21:44
@vbrodsky vbrodsky force-pushed the VB/remove-datetime-backport branch from 529668c to 4a8c474 Compare June 1, 2023 18:09
@yamini-labelbox
Contributor

yamini-labelbox commented Jun 1, 2023

Hi @vbrodsky, dateutil looks like a promising choice for sure! But it looks like https://docs.python.org/3/library/datetime.html is used for date conversions consistently throughout the repository. It probably fits your use case too. Suggesting this just to avoid one extra dependency. Sorry.

return dt.strftime(ISO_DATETIME_FORMAT)
Contributor

Does this work if we're not backporting anymore? I see format_iso_datetime is still being used

Contributor Author

A very good point, and I am glad you asked.
This function uses strftime. I tested it and found it works the same with and without the backport, so I decided to keep it, given the immediate goal of this story.

HOWEVER, as I raised in this PR, this function does not convert timezones to UTC correctly... I was hesitant to address it as part of this story since no one has flagged it as an issue before.

LMK, happy to address it here or in a separate ticket.

('2011-11-04T00:05:23Z', '2011-11-04T00:05:23Z'),
('2011-11-04T00:05:23+00:00', '2011-11-04T00:05:23Z'),
('2011-11-04T00:05:23+05:00', '2011-11-04T00:05:23Z'
), #NOTE not converting with timezone... this is compatible with out current implementation
Contributor

Seeing some typos around the comments. Since it's a public repository, let's make sure we don't have too many of these :)
Could you proofread all the comments?

@kopreschko kopreschko self-requested a review June 2, 2023 21:42
@vbrodsky
Contributor Author

vbrodsky commented Jun 5, 2023

Hi @vbrodsky , dateutil looks like a promising choice for sure! But looks like https://docs.python.org/3/library/datetime.html is being used for date conversions consistently throughout the repository. It probably fits your usecase too. Suggesting just to reduce one extra dependency. Sorry.

@yamini-labelbox are you saying that the dateutil library under the hood uses python datetime?

@vbrodsky No. I meant that in the Python SDK repo, the datetime module I linked above is used in various places to handle date formatting (example: https://github.com/Labelbox/labelbox-python/blob/develop/labelbox/orm/db_object.py#LL1), so why don't we use the same module to handle date formatting in the context of this PR? I just wanted to hear what you think about that idea.

Hi @yamini-labelbox (dunno why I can't just use a GitHub Reply... in our thread), but here it is. The reason I am not using the Python datetime library is exactly that its fromisoformat method did not cover all the cases we want to cover and returned an error for some of them on all versions of Python prior to 3.11 (and we need to support EXACTLY the 3.7-3.9 versions :(

I am going to update the PR description to outline not just WHAT I did, but WHY I did it

@vbrodsky vbrodsky force-pushed the VB/remove-datetime-backport branch from 4a8c474 to b2f348d Compare June 5, 2023 16:40
Make comments read better
@vbrodsky
Contributor Author

vbrodsky commented Jun 5, 2023

Testing - prebackport
This is the script

from datetime import datetime
from labelbox.schema.data_row_metadata import DataRowMetadataField, _validate_parse_datetime

datestrings = ['2011-11-04T00:05:23Z',
    '2011-11-04T00:05:23+00:00',
    '2011-11-04T00:05:23+05:00',
    '2011-11-04T00:05:23'
]
for datestring in datestrings:
    print(f"parsing {datestring}")

    try:
        field = DataRowMetadataField(
            schema_id="clii8u95s00jc072s11xvflrd",
            value=datetime.fromisoformat(datestring[:-1] + "+00:00"))
    except ValueError as e:
        print(f"failed to create DataRowMetadataField for {datestring}")
        # print exception message
        print(f"Exception message: {e}")
        continue
    _validate_parse_datetime(field)
    print(f"Sucessffully created and validated DataRowMetadataField for {datestring}")

Here are the results:
parsing 2011-11-04T00:05:23Z
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23Z
parsing 2011-11-04T00:05:23+00:00
failed to create DataRowMetadataField for 2011-11-04T00:05:23+00:00
Exception message: Invalid isoformat string: '2011-11-04T00:05:23+00:0+00:00'
parsing 2011-11-04T00:05:23+05:00
failed to create DataRowMetadataField for 2011-11-04T00:05:23+05:00
Exception message: Invalid isoformat string: '2011-11-04T00:05:23+05:0+00:00'
parsing 2011-11-04T00:05:23
failed to create DataRowMetadataField for 2011-11-04T00:05:23
Exception message: Invalid isoformat string: '2011-11-04T00:05:2+00:00'
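
The exception messages above show why only the "Z" input passes: `datestring[:-1] + "+00:00"` removes the last character whatever it is. The sketch below reproduces those strings and, purely for illustration, adds a hypothetical `replace_z_suffix` helper that rewrites only a trailing "Z". That helper is not what this PR does; it happens to cover these four samples, while dateutil is used here to cover other ISO forms as well.

```python
from datetime import datetime

samples = [
    "2011-11-04T00:05:23Z",
    "2011-11-04T00:05:23+00:00",
    "2011-11-04T00:05:23+05:00",
    "2011-11-04T00:05:23",
]


def append_utc_blindly(text):
    """Mimic the pre-backport script: drop the last character, then append an offset."""
    return text[:-1] + "+00:00"


def replace_z_suffix(text):
    """Hypothetical variant: rewrite only a trailing 'Z' and leave other strings untouched."""
    return text[:-1] + "+00:00" if text.endswith("Z") else text


for text in samples:
    print(f"{text!r}: blind={append_utc_blindly(text)!r}, guarded={replace_z_suffix(text)!r}")

# The guarded strings are in the isoformat() shape that fromisoformat accepts.
guarded = [datetime.fromisoformat(replace_z_suffix(text)) for text in samples]
```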

@vbrodsky
Contributor Author

vbrodsky commented Jun 5, 2023

Testing with backport. Here is the script

from datetime import datetime
from labelbox.schema.data_row_metadata import DataRowMetadataField, _validate_parse_datetime

datestrings = ['2011-11-04T00:05:23Z',
    '2011-11-04T00:05:23+00:00',
    '2011-11-04T00:05:23+05:00',
    '2011-11-04T00:05:23'
]
for datestring in datestrings:
    print(f"parsing {datestring}")

    try:
        field = DataRowMetadataField(schema_id="clii8u95s00jc072s11xvflrd",
                                        value=datetime.fromisoformat(
                                            datestring))

    except ValueError as e:
        print(f"failed to create DataRowMetadataField for {datestring}")
        print(f"Exception message: {e}")
        continue  # skip validation when the field could not be created
    _validate_parse_datetime(field)
    print(f"Sucessffully created and validated DataRowMetadataField for {datestring}")

src/labelbox-python - (v.3.46.0^2~31) > python ./test_date_with_backports.py
parsing 2011-11-04T00:05:23Z
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23Z
parsing 2011-11-04T00:05:23+00:00
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23+00:00
parsing 2011-11-04T00:05:23+05:00
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23+05:00
parsing 2011-11-04T00:05:23
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23

@vbrodsky
Contributor Author

vbrodsky commented Jun 5, 2023

Post backport:

from labelbox.schema.data_row_metadata import DataRowMetadataField, _validate_parse_datetime
from labelbox.utils import format_iso_from_string

datestrings = ['2011-11-04T00:05:23Z',
    '2011-11-04T00:05:23+00:00',
    '2011-11-04T00:05:23+05:00',
    '2011-11-04T00:05:23'
]
for datestring in datestrings:
    print(f"parsing {datestring}")

    try:
        field = DataRowMetadataField(schema_id="clii8u95s00jc072s11xvflrd",
                                        value=format_iso_from_string(
                                            datestring))

    except ValueError as e:
        print(f"failed to create DataRowMetadataField for {datestring}")
        print(f"Exception message: {e}")
        continue  # skip validation when the field could not be created
    _validate_parse_datetime(field)
    print(f"Sucessffully created and validated DataRowMetadataField for {datestring}")

parsing 2011-11-04T00:05:23Z
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23Z
parsing 2011-11-04T00:05:23+00:00
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23+00:00
parsing 2011-11-04T00:05:23+05:00
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23+05:00
parsing 2011-11-04T00:05:23
Sucessffully created and validated DataRowMetadataField for 2011-11-04T00:05:23

@@ -838,7 +838,7 @@ def _validate_parse_number(
 def _validate_parse_datetime(
         field: DataRowMetadataField) -> List[Dict[str, Union[SchemaId, str]]]:
     if isinstance(field.value, str):
-        field.value = datetime.fromisoformat(field.value)
+        field.value = format_iso_from_string(field.value)
Contributor

From line 849, we take the field.value and convert it into string from the datetime using format_iso_datetime. Since format_iso_datetime does not take into consideration the timezone, does that mean that the timezone information will get lost here?

For instance, if user inputs 2011-11-04T00:05:23+05:00 as the date metadata, it looks like we'll be upserting 2011-11-04T00:05:23 and the timezone value gets lost? Let me know if I'm not understanding this correctly

Contributor Author

Time zone does get lost! This behavior existed before this PR... that was my question: should I fix it?

Contributor

Yes. I know the original issue was to make sure the time string gets properly parsed if a timezone is passed in, but we should also make this handle the timezone properly, because otherwise datetimes in non-UTC timezones will not be stored accurately.

Contributor Author

OK... to be clear, format_iso_from_string is not the problem. As far as I remember from my testing, format_iso_datetime is the problem: it just formats the string with a Z timezone, with no conversion. Those two functions are used together in the code (to validate that a string is a valid datetime AND convert it back to a string).
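
One possible shape for a fix, sketched with the standard library only. The function name `format_iso_datetime_utc` and the format string are hypothetical, and treating naive datetimes as already being in UTC is an assumption that would need to be confirmed against how the SDK is used.

```python
from datetime import datetime, timedelta, timezone

# Assumed format string with a literal "Z"; the SDK's constant may differ.
ISO_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_iso_datetime_utc(dt):
    """Hypothetical formatter: convert aware datetimes to UTC before adding 'Z'."""
    if dt.utcoffset() is None:
        # Assumption: a naive datetime is already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime(ISO_DATETIME_FORMAT)


plus_five = timezone(timedelta(hours=5))
print(format_iso_datetime_utc(datetime(2011, 11, 4, 0, 5, 23, tzinfo=plus_five)))
print(format_iso_datetime_utc(datetime(2011, 11, 4, 0, 5, 23, tzinfo=timezone.utc)))
print(format_iso_datetime_utc(datetime(2011, 11, 4, 0, 5, 23)))
```

With this approach only the "+05:00" input changes its wall-clock time in the output; the UTC and naive inputs format the same way as before.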

@vbrodsky vbrodsky merged commit 5b61ad9 into develop Jun 6, 2023
@vbrodsky vbrodsky deleted the VB/remove-datetime-backport branch June 6, 2023 18:24
@yamini-labelbox
Contributor

Thank you, Val!!!! 👍

4 participants