Parse Japanese addresses with regular expressions

Background

Japanese addresses published on the web usually follow a fixed format: they run from the postal code to the prefecture, city, ward, and town. There are many cases, such as passing an address to Google Geocoding, where you want to pull out the postal code or the prefecture name with a regular expression.

Regular expressions

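The thing to be careful about here is that you cannot use a greedy quantifier to extract the prefecture name or city name. Write (.*?県), not (.*県), where 県 is the prefecture suffix. The greedy form matches up to the last 県 in the line, so whenever that character appears again later in the address, the group captures everything up to that later occurrence instead of stopping at the prefecture. With the lazy form, 愛知県名古屋市 (Nagoya City, Aichi Prefecture) yields 愛知県 (Aichi Prefecture) for the prefecture, as intended.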
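The difference between the two quantifiers can be checked with a small sketch like the one below. The class name `GreedyVsLazy`, the helper `firstGroup`, and the sample address are made up for illustration; the address is chosen only because 県 appears twice in it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match at all.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative input: 県 occurs in the prefecture name and again in the building name.
        String address = "愛知県名古屋市中区三の丸3-1-2 愛知県庁";

        // Greedy: runs to the last 県 on the line.
        System.out.println("greedy: " + firstGroup("(.*県)", address));
        // prints "greedy: 愛知県名古屋市中区三の丸3-1-2 愛知県"

        // Lazy: stops at the first 県.
        System.out.println("lazy:   " + firstGroup("(.*?県)", address));
        // prints "lazy:   愛知県"
    }
}
```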

Source code


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressParser {
    public static void main(String[] args) {
        // Bibi, Chitose City, Hokkaido; New Chitose Airport Domestic Terminal Building 2F
        String matchString = "〒066-0012\n" +
                "北海道千歳市美々 新千歳空港国内線ターミナルビル2F";

        // Each suffix (都, 道, 府, 県, 市, 区) gets its own optional group with a lazy quantifier.
        Matcher matcher = Pattern.compile("\\s*〒(\\d{3}-\\d{4})[\\s ]*(.*?都)?(.*?道)?(.*?府)?(.*?県)?(.*?市)?(.*?区)?").matcher(matchString);

        while (matcher.find()) {
            System.out.println("Address:" + matcher.group(0));
            System.out.println();
            System.out.println("Postal code:" + matcher.group(1));
            System.out.println();
            System.out.println("To (都):" + matcher.group(2));
            System.out.println();
            System.out.println("Do (道):" + matcher.group(3));
            System.out.println();
            System.out.println("Fu (府):" + matcher.group(4));
            System.out.println();
            System.out.println("Ken (県):" + matcher.group(5));
            System.out.println();
            System.out.println("City (市):" + matcher.group(6));
            System.out.println();
            System.out.println("Ward (区):" + matcher.group(7));
        }
    }
}

Result

Address:〒066-0012
北海道千歳市

Postal code:066-0012

To (都):null

Do (道):北海道

Fu (府):null

Ken (県):null

City (市):千歳市

Ward (区):null
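As a variation on the code above, the groups can be given names so callers do not have to count parentheses. The sketch below is not the pattern from this post: it assumes a single prefecture group that lists the irregular names explicitly, because the pattern above gives each prefecture suffix its own group and would split a name such as 京都府 at 都. The class name `AddressParts`, the group names, and the Kyoto sample address are all illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressParts {
    // Assumed pattern (not from the original post): named groups, with the
    // prefecture matched as one unit so that 京都府 is not cut at 都.
    static final Pattern ADDRESS = Pattern.compile(
            "〒(?<zip>\\d{3}-\\d{4})\\s*"
            + "(?<prefecture>東京都|北海道|(?:京都|大阪)府|.{2,3}県)"
            + "(?<city>.+?市)?");

    // Returns {zip, prefecture, city}, or null when the input has no match.
    // The city element is null when the address has no 市 part.
    static String[] parse(String input) {
        Matcher m = ADDRESS.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("zip"), m.group("prefecture"), m.group("city") };
    }

    public static void main(String[] args) {
        String[] parts = parse("〒066-0012\n北海道千歳市美々 新千歳空港国内線ターミナルビル2F");
        System.out.println("Postal code:" + parts[0]);
        System.out.println("Prefecture:" + parts[1]);
        System.out.println("City:" + parts[2]);
    }
}
```

The city group still uses a lazy quantifier, so it has the same limits as the original pattern for city names that themselves contain 市.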

It's a simple technique, but I'm posting it because I expect it to come up often.
