Summary
In the 2026-03-18.0 release, several countries listed with 3 address_levels have a variable number of populated levels. When consumers assume the finest level (index [3]) always contains the city/municipality, they lose significant coverage — up to 62% for Latvia and 55% for Slovakia.
This is similar to #367 (US addresses with NULL address_levels[2]), but affects the 3-level countries more severely.
Affected Countries
3-level countries with variable depth
| Country | Total addresses | level3 populated | level3 NULL (city must come from level2 or level1) | % lost if only checking level3 |
| --- | --- | --- | --- | --- |
| LV (Latvia) | 548,712 | 208,256 | 340,456 | 62.0% |
| SK (Slovakia) | 1,697,528 | 757,325 | 940,203 | 55.4% |
| EE (Estonia) | 2,228,661 | 2,076,759 | 151,902 | 6.8% |
| IT (Italy) | 25,914,431 | 25,912,438 | 1,993 | <0.01% |
| TW (Taiwan) | 9,630,602 | 9,630,597 | 5 | <0.01% |
Latvia — 3 distinct hierarchy patterns
Pattern 1 (111K): Major cities — only level1 populated
level1=Rīga, level2=NULL, level3=NULL
→ City IS level1 (Rīga)
Pattern 2 (229K): Novads + town — level1 and level2 populated
level1=Jēkabpils nov., level2=Jēkabpils, level3=NULL
→ City IS level2 (Jēkabpils)
Pattern 3 (208K): Novads + pagasts + village — all 3 levels
level1=Olaines nov., level2=Olaines pag., level3=Jāņupe
→ City IS level3 (Jāņupe)
Slovakia — 2 patterns
Pattern 1 (940K): District — level3 NULL
level1=Bratislavský, level2=Bratislava-Ružinov, level3=NULL
→ City IS level2 (Bratislava-Ružinov)
Pattern 2 (757K): Municipality — all 3 levels
level1=Prešovský, level2=Spišská Nová Ves, level3=Spišská Nová Ves
→ City IS level3 (Spišská Nová Ves)
Estonia — 2 patterns
Pattern 1 (152K): Linn (town) — level3 NULL
level1=Ida-Viru maakond, level2=Narva linn, level3=NULL
→ City IS level2 (Narva linn)
Pattern 2 (2.1M): Village/district — all 3 levels
level1=Harju maakond, level2=Tallinna linn, level3=Kesklinn
→ City IS level3 (Kesklinn)
Also: US (related to #367)
The 2-level US data still has 37.6M addresses (30%) with address_levels[2] = NULL. Of those, 85% (32.1M) have postal_city as a fallback, but 5.5M US addresses have no city information at all — no level2 AND no postal_city. Top states affected: TX (1.7M), MS (852K), CA (575K), FL (455K).
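The per-state breakdown above could be reproduced with something like the following. This is a sketch, not the exact query behind the numbers: it assumes `postal_city` is a top-level column and that `address_levels[1]` holds the state for US rows.

```sql
-- US addresses with neither a level2 value nor a postal_city, grouped by state.
-- Assumption: address_levels[1].value is the state for country = 'US'.
SELECT
    address_levels[1].value AS state,
    count(*) AS no_city_cnt
FROM read_parquet(
    's3://overturemaps-us-west-2/release/2026-03-18.0/theme=addresses/type=address/*',
    hive_partitioning=0
)
WHERE country = 'US'
  AND address_levels[2].value IS NULL
  AND postal_city IS NULL
GROUP BY state
ORDER BY no_city_cnt DESC
LIMIT 10;
```

If the assumptions hold, the counts across all states should add up to roughly the 5.5M figure quoted above.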
Query to Reproduce
-- Shows all country × depth combinations
SELECT
    country,
    len(address_levels) AS levels_count,
    CASE
        WHEN address_levels[3].value IS NOT NULL THEN 'level3'
        WHEN address_levels[2].value IS NOT NULL THEN 'level2'
        WHEN address_levels[1].value IS NOT NULL THEN 'level1'
        ELSE 'none'
    END AS finest_populated_level,
    count(*) AS cnt
FROM read_parquet(
    's3://overturemaps-us-west-2/release/2026-03-18.0/theme=addresses/type=address/*',
    hive_partitioning=0
)
GROUP BY country, levels_count, finest_populated_level
ORDER BY country, levels_count, finest_populated_level;
Suggestion
It would help consumers if the documentation clarified that:
- address_levels depth is variable within a country: the array length doesn't guarantee all values are populated
- The recommended city extraction pattern is a COALESCE cascade (finest → coarsest):
COALESCE(address_levels[3].value, address_levels[2].value, address_levels[1].value)
- For US addresses without level2, postal_city is the intended fallback (and covers 85%)
This would prevent other consumers from hitting the same issue we did when building a geocoder on top of this data.
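As a rough illustration of the cascade suggested above, the query below derives a `city` value per row and reports how many addresses still end up without one. It is a sketch under assumptions: the US is handled separately because its `address_levels[1]` is assumed to be the state rather than a city, and `postal_city` is assumed to be a top-level column. The output column names are made up for the example.

```sql
-- Coverage check for a COALESCE-based city extraction (finest -> coarsest).
WITH cities AS (
    SELECT
        country,
        CASE
            -- Assumption: for the US, level1 is the state, so it is skipped
            -- and postal_city is used as the fallback instead.
            WHEN country = 'US'
                THEN COALESCE(address_levels[2].value, postal_city)
            ELSE COALESCE(
                address_levels[3].value,
                address_levels[2].value,
                address_levels[1].value
            )
        END AS city
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2026-03-18.0/theme=addresses/type=address/*',
        hive_partitioning=0
    )
    WHERE country IN ('LV', 'SK', 'EE', 'US')
)
SELECT
    country,
    count(*) AS total,
    count(city) AS with_city,            -- count() ignores NULLs
    count(*) - count(city) AS without_city
FROM cities
GROUP BY country
ORDER BY country;
```

The country filter is deliberate: for countries where level1 is a region rather than a city, falling back all the way to level1 may return something that is not a city, so the cascade depth probably needs to be chosen per country.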
Environment
- Release: 2026-03-18.0
- Queried via DuckDB 1.5 + MotherDuck
- 39 countries, 469M addresses total