Description
Hi!
I played around a bit with libpostal and pypostal, and I am quite impressed. Kudos!
My country is
Germany
Here's how I'm using libpostal
I'm thinking about integrating it into the Odoo instance at my workplace, but I'm still in a (private) exploratory stage. The use case would be deduplicating purchased leads.
Here's what I did
I tried using expand_address (from pypostal, but this is not a pypostal issue).
This is my unittest:
def test_normalize_address__abbreviations(self):
    address = Address(
        id=0,
        name=Keeper(value="H. P. Lovecraft", keep=True),
        email=Keeper(value="[email protected]", keep=True),
        company=Keeper(value="Arkham House", keep=True),
        street="Town Sqr.",
        building_number="5",
        postcode="01938",
        city="Innsmouth",
        state="MA",
        country="U.S.A.",
    )
    normalized_address = normalize_address(address, languages=["en"])
    self.assertEqual("town square", normalized_address.normalized_street)
    self.assertEqual("massachusetts", normalized_address.normalized_state)
    self.assertEqual("united states of america", normalized_address.normalized_country)
This is the normalize_address function:
def normalize_address(address: Address, languages: list[str]) -> NormalizedAddress:
    """Normalizes the fields of an address that make sense to be normalized,
    adds fields to the dict of the address with normalized values"""
    normalized_address = NormalizedAddress(
        id=address.id,
        name=address.name,
        email=address.email,
        company=address.company,
        street=address.street,
        building_number=address.building_number,
        postcode=address.postcode,
        city=address.city,
        state=address.state,
        country=address.country,
    )
    normalized_address.normalized_name = normalize_string(address.name.value)
    normalized_address.normalized_email = normalize_email_address(address.email.value)
    normalized_address.normalized_company = normalize_string(address.company.value)
    normalized_address.normalized_street = normalize_address_string(
        address.street,
        languages=languages
    )
    normalized_address.normalized_building_number = normalize_string(address.building_number)
    normalized_address.normalized_postcode = normalize_string(address.postcode)
    normalized_address.normalized_city = normalize_string(address.city)
    normalized_address.normalized_state = normalize_state_string(address.state, languages=languages)
    normalized_address.normalized_country = normalize_country_string(
        address.country,
        languages=languages
    )
    return normalized_address
And this is the normalize_country_string function:
import postal.expand
import postal.parser


def normalize_country_string(country: str, languages: list[str]) -> str:
    """Normalize a country string, e.g. "U.S.A.", by expanding it with pypostal"""
    # parse_address returns (value, label) pairs; use the value of the first component
    parsed_country = postal.parser.parse_address(
        country,
        language=languages[0],
    )
    expanded_country = postal.expand.expand_address(parsed_country[0][0], languages=languages)
    return expanded_country[0]
Here's what I got
It worked well for street abbreviations, e.g. sqr was expanded to Square.
But for U.S.A. I got back usa.
Here's what I was expecting
I expected the abbreviation U.S.A. to be expanded to e.g. United States of America or another representation, but I was not able to get that. Maybe the library is not intended to do so, which would be completely fine with me. I was just wondering whether I made an error or whether this is intentional.
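If this behaviour is intentional, one possible workaround would be a small lookup table applied to the country field after normalization. The following is only a minimal sketch under that assumption: COUNTRY_ALIASES and canonical_country are hypothetical names, not part of libpostal or pypostal, and the alias table would have to be maintained by hand.

```python
import re

# Hypothetical, hand-maintained alias table; not part of libpostal or pypostal.
COUNTRY_ALIASES = {
    "usa": "united states of america",
    "us": "united states of america",
    "united states": "united states of america",
}


def canonical_country(raw: str, aliases: dict[str, str] = COUNTRY_ALIASES) -> str:
    """Map a country string to a canonical lowercase name.

    Falls back to the cleaned input when no alias is known.
    """
    # Drop punctuation ("U.S.A." -> "USA"), collapse whitespace, then lowercase.
    cleaned = re.sub(r"[^\w\s]", "", raw)
    key = " ".join(cleaned.split()).casefold()
    return aliases.get(key, key)
```

The same function could also be fed the usa string that expand_address returned, since that value is already a key in the table.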
Here's what I think could be improved
More documentation, and maybe reStructuredText docstrings instead of the Doxygen-like ones in the Python parts, because Python tools (e.g. PyCharm) can parse them better.
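To illustrate the docstring style meant here, this is a sketch of reStructuredText field lists on a stand-in function. normalize_stub is a hypothetical name and only lowercases its input; it is not how pypostal implements or documents anything.

```python
def normalize_stub(address: str, languages: list[str]) -> list[str]:
    """Return normalized variants of an address string.

    Stand-in used only to show the docstring layout; it merely lowercases.

    :param address: Free-form address text, e.g. ``"Town Sqr."``.
    :type address: str
    :param languages: Language codes to consider, e.g. ``["en"]``.
    :type languages: list[str]
    :returns: Candidate normalizations of ``address``.
    :rtype: list[str]
    """
    # The languages argument is unused; it is here to show a documented parameter.
    return [address.casefold()]
```

The `:param:`, `:type:`, `:returns:` and `:rtype:` fields are the parts that Sphinx and IDEs pick up for rendered documentation and type hints.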