Commit 9eb9c98
feat(dedupe): Handle Geonames 'City of' prefixes
A common cause of deduplication errors is Geonames locality/localadmin
records that start with 'City of'.
Our name comparison logic is fairly conservative: it only ignores
differences in things like punctuation and diacritics. Otherwise, we
have to treat differing names as meaning that the underlying records
represent genuinely different places.
Getting too far away from this general stance could be dangerous, but we
can handle specific outliers just fine.
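
As a rough illustration of what "conservative" means here, the sketch
below treats two names as equal only when they differ in diacritics,
punctuation, case, or spacing. It is written in Python for brevity and is
not the project's implementation; `normalize_name` and `names_match` are
hypothetical helpers, and ignoring case and whitespace is an assumption
that the commit message does not state.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Reduce a name to a form that ignores only superficial differences."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every Unicode punctuation character with a space.
    no_punct = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in no_marks
    )
    # Assumption: case and runs of whitespace are also ignored.
    return " ".join(no_punct.casefold().split())


def names_match(a: str, b: str) -> bool:
    """True only when the names are equal after conservative normalization."""
    return normalize_name(a) == normalize_name(b)
```

Under this sketch 'Montréal' matches 'Montreal', but 'City of New York'
does not match 'New York', which is the gap this commit addresses.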
Geonames records that start with 'City of' are one of these cases.
Often, there is a Geonames `locality` record with just the name (like
'New York'), and then a Geonames `localadmin` record with the 'City of'
prefix. Usually only one of those records will have a WOF concordance,
so this is still helpful even combined with
#1606.
1 parent 6aa997d commit 9eb9c98
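
The 'City of' handling described in the commit message could be sketched
roughly as follows. This is an illustration in Python, not the code from
this commit: the record fields (`source`, `layer`, `name`) and the helper
names are assumptions, as is the case-insensitive match on the prefix.

```python
import re

# Assumption: the prefix is matched case-insensitively at the start of the name.
CITY_OF_PREFIX = re.compile(r"^city of\s+", re.IGNORECASE)

# The commit message names these two Geonames layers.
PREFIX_LAYERS = {"locality", "localadmin"}


def comparable_name(record: dict) -> str:
    """Return the name used for comparison, without a Geonames 'City of' prefix."""
    name = record["name"].strip()
    # Only Geonames locality/localadmin records get the special case;
    # every other record keeps its name unchanged.
    if record.get("source") == "geonames" and record.get("layer") in PREFIX_LAYERS:
        name = CITY_OF_PREFIX.sub("", name, count=1)
    return name.casefold()


def same_place_name(a: dict, b: dict) -> bool:
    """True when two records have the same comparable name."""
    return comparable_name(a) == comparable_name(b)
```

Restricting the rule to one source and two layers keeps the general
comparison conservative: a 'City of' prefix on any other kind of record
still counts as a real difference in name.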
2 files changed, 44 additions and 2 deletions

First changed file: 27 lines added after line 102 (new lines 103-129); original lines 108-109 replaced by new lines 135-136.
Second changed file: 15 lines added after line 541 (new lines 542-556).
(Diff line contents not captured.)