Description
I installed the latest version as of today
(ee7aa9a, 5 days ago)
with the Senzing model and tested it on some Romanian company addresses.
The results contain several issues:
Șos Mihai Bravu, 1, Bl:2, Sc:c, Et:12, Ap:129, -, București, Sect 2
road: șos mihai bravu
house_number: 1 bl 2 sc:c
level: et 12
unit: ap 129
road: bucurești sect 2 - should be city (bucurești) and suburb (sect 2)
Șos Mihai Bravu, 1, Bl:2, Sc:c, Et:12, Ap:129, Compartiment 2, București, Sect 2
road: șos mihai bravu
house_number: 1 bl 2 sc:c
level: et 12
unit: ap 129
house: compartiment 2 bucurești - bucurești should be city
road: sect 2 - sect 2 should be suburb
P-ța Emanuil Gojdu, 37, Bl:a5, Parter, Oradea
house: p-ța - p-ța is a street type and should be part of road (see https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/ro/street_types.txt)
road: emanuil gojdu
house_number: 37
road: bl:a - should be bl:a5
house_number: 5 - should be part of the previous component (bl:a5)
level: parter
city: oradea
B-dul Pipera, 1/i, Et:7, Constructia C2, Biroul Nr.10, Compartiment 59, Oraș Voluntari
road: b-dul pipera
house_number: 1/i
level: et 7
house: constructia c2 biroul nr.10 compartiment 59 oraș - oraș just means "city" and should not be part of house
city: voluntari
The same address, written in a different order:
Oraş Voluntari, B-dul PIPERA, Nr. 1/I, CONSTRUCTIA C2, BIROUL NR.10, COMPARTIMENT 59, Etaj 7, Județ Ilfov, Cod poștal 77190
house: oraş voluntari - "Oraş Voluntari" means "City of Voluntari", so this should be city
road: b-dul pipera
house_number: nr. 1/i
house: constructia c2 biroul nr.10 compartiment
house_number: 59
level: etaj 7
house: județ ilfov cod poștal - județ is suburb, "cod poștal" should not be here
postcode: 77190
The following two are parsed fairly well:
Str. Eufrosina Popescu, 46, -, București, Sect 3
road: str. eufrosina popescu
house_number: 46
city: bucurești
suburb: sect 3
Str. Balta Albina, 4, Et:1, Inedit Building, București, Sect 3
road: str. balta albina
house_number: 4
level: et 1
house: inedit building
city: bucurești
suburb: sect 3
The most confusing part is that the parsed data contains several "house_number", "road", or "house" items for a single address, which makes it very difficult to tell them apart later.
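To make the repeated labels easier to spot, here is a minimal sketch that groups parser output by label and reports the labels that occur more than once. It assumes the output is a list of (value, label) pairs, which is the shape the Python bindings return from `parse_address`; the helper names `group_by_label` and `repeated_labels` are hypothetical, and the sample data is copied by hand from the "P-ța Emanuil Gojdu" result above rather than produced by running the parser.

```python
from collections import defaultdict


def group_by_label(parsed):
    """Collect (value, label) pairs into {label: [values, ...]}, keeping order."""
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    return dict(grouped)


def repeated_labels(parsed):
    """Return only the labels that appear more than once in the parse."""
    grouped = group_by_label(parsed)
    return {label: values for label, values in grouped.items() if len(values) > 1}


# Hand-copied from the "P-ța Emanuil Gojdu, 37, Bl:a5, Parter, Oradea" output above.
sample = [
    ("p-ța", "house"),
    ("emanuil gojdu", "road"),
    ("37", "house_number"),
    ("bl:a", "road"),
    ("5", "house_number"),
    ("parter", "level"),
    ("oradea", "city"),
]

print(repeated_labels(sample))
# {'road': ['emanuil gojdu', 'bl:a'], 'house_number': ['37', '5']}
```

This only detects the duplicates; it cannot decide which of the repeated values is the correct one, which is why the parse itself needs to improve.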
The Romanian company database is available as open data.
Active companies:
https://data.gov.ro/dataset/firme-inregistrate-la-registrul-comertului-pana-la-data-de-18-decembrie-2024/resource/3043787a-832a-4ccc-9712-f10da0092e14?inner_span=True
It contains a large number of addresses that could be used to train the model.
Looking forward to your support.
Thanks