[PYTHON] Aim for GIS Master # 3 I investigated a mesh-like Geocoding method

I will try to become a GIS master after a long time. I'm not my main job, so I feel like I'll screw it in if I get the chance.

I thought Geohash was the one and only strongest

Since I learned about the existence of Geohash, I have been using it by taking advantage of its useful features.

■ Convenient reason

--The range of latitude and longitude can be represented by one character string. --If you geohash-encode the latitude and longitude of your current location, you are in that mesh. → You do not have to search the range in DB --The number of characters in Geohash represents the level, and the smaller the number of characters, the larger the mesh. --Child level contains parent Geohash string → In other words, when searching with a Geohash level 4 string The first 4 characters of the Geohash string to be searched should match. --Library for Python, Ruby, etc. is provided --There is a function in that library that can take adjacent meshes ――It feels good to enter a value using a Geohash string as a key in a DB such as a KeyValueStore.

Create a Geohash character string group that covers Japan, and collect data for each mesh. I've done anything. If the amount of data is large, you have to fetch the data in some range, so It was too convenient to use Geohash as the standard.

It has also been proposed to take advantage of these characteristics and use them for anonymization. With k-anonymization + Geohash, it is said to use a Geohash mesh at a level that satisfies k under certain conditions. Something like. → I don't dare to put a link.

■ Of course there are weaknesses

--The shape of the mesh is rectangular, and depending on the level, it changes from portrait to landscape to portrait. → Because it is Base32 encoded with 5bit Every time the level goes up (vertical 2bit, horizontal 3bit) → (vertical 3bit, horizontal 2bit) → (vertical 2bit, horizontal 3bit) This is because the number of divisions is different in the vertical and horizontal directions. --It is a little rough that the child mesh is divided into 32 (depending on the application, of course)

■ Actually, there are many similar things

Recently, there has been talk about whether to use something like a mesh. Then I was thinking about pushing Geohash http://qiita.com/kochizufan/items/2fe5f4c9f74636d22ddb Quadkey is better than Geohash! I found an article to the effect.

So, in this article, I will talk about Quadkey and the Open Location Code described later. While actually using the library Describe from the viewpoint of "whether it can be used like a mesh".

Quadkey, is there such a thing?

Goro Inogashira style It seems to be used for the tile index of Bing Maps. This is the official appearance of MS. https://msdn.microsoft.com/en-us/library/bb259689.aspx

Although it is a tile index, if you take the positions of the four corners of the tile, it becomes a mesh. Like Geohash, you can generate one key per mesh, One key character is one level. I see. It seems that it can be used in almost the same way. As you can see on the above site, the key is generated from the tile index. In other words, the key is generated by converting the position information to the tile index once. Similarly, the range can be taken by converting the key to the tile index. Therefore, even users who say, "Tile? I don't know, what you are looking for is a mesh" are once made aware of the tile index. It's a little annoying.

But Quadkey is square and divided into four

This is fantastic. Since it is divided into four, it is expressed in quaternary numbers. The key is also made up of 0-3. Max level is 20. I think it's not a bad particle size. As a Geohash believer, I thought, "Oh, I lost." In terms of versatility, I got the impression that it was completely Quadkey.

■ Actually used in Bing Maps

The URL of this image is 13300211231022.jfif @2017 Microsoft Below. https://t0.ssl.ak.dynamic.tiles.virtualearth.net/comp/ch/13300211231022?mkt=ja-JP&it=G,BX,L,LA&shading=hill&og=94&n=z&c4w=1 13300211231022 This is the Quad key for this tile. It is also firmly embedded in the URL.

The Imperial Palace has no deep meaning. If so, congratulations on your engagement with Mako. That's it.

■ Try to get the latitude / longitude range from Quadkey

Python has a package called quadkey, but it doesn't work in my environment, probably because of Python3. I used this because it works with the mercantile package.

I will write it briefly.

#Import mercantile
import mercantile

#Get the position in 4 directions from the quad key
def make_bounds_by_quadkey(t_quadkey):
    #Get tile index from quadkey
    t_tile = mercantile.quadkey_to_tile(t_quadkey)
    #Get endpoints in 4 directions from tile index
    t_box = mercantile.bounds(t_tile.x, t_tile.y,t_tile.z)
    
    return t_box

if __name__ == "__main__":
    out_str=[]
    #Header line
    out_str.append("south,east,north,west,Quadkey,zoom")
    
    #Quadkey parent array to retrieve range
    #Child is zoom_append_Automatically generate and add with the length of range
    t_quadkeys=[
        "13300211231022"
    ]
    
    #Calculate latitude / longitude range from Quadkey
    for t_quadkey in t_quadkeys:
        t_box = make_bounds_by_quadkey(t_quadkey)
        print([t_quadkey,t_box])
        #You can take it like this
        #['13300211231022',
        # LngLatBbox(west=139.74609375, south=35.67514743608467,
        #            east=139.76806640625, north=35.6929946320988)]
        
        #Because it is a principle of writing in an easy-to-understand manner
        bounds_str = str(t_box.south) + "," + str(t_box.east) + "," + str(t_box.north) + ","+ str(t_box.west)

        #Zoom level=Number of characters in Quadkey
        t_zoom=len(t_quadkey)
    
        out_str.append(bounds_str+","+t_quadkey+","+str(t_zoom))
    
    #output
    f = open("quadkey_bounds.csv", 'w')
    f.write("\n".join(out_str) + "\n")
    f.close()

Let's visualize the CSV that came out. You can visualize it by making 4 points from south, east, north, and west. → The visualization method is omitted. I made my own OSM + Openlayers tool.

The target is the code 13300211231022 mentioned above.

BingMap Convert from Quadkey and visualize
13300211231022.jfif@2017 Microsoft qiita.png @OpenStreetMap

The left is for one tile, and the right is the range converted from Quad key by the green frame. There is no doubt that the ranges are the same when compared. BingMaps, Quadkey level 14. I'm using it ~~.

■ See also the level of the child

This is what happens when you set the level to 15. Rewrite the t_quadkeys = ~ part of the above source.

    t_quadkeys=[
        "133002112310220",
        "133002112310221",
        "133002112310222",
        "133002112310223"
    ]

Just add 0 to 3 at the end. The figure below is visualized. qiita2.png

In this way, it is divided into four. When raising the level with Geohash, it is difficult to raise it manually, If you divide the Quadkey into 4 parts, you can afford it manually. There are cases like "I want another level of mesh and this part of the mesh!" Because it is a possible request for data analysis I think that being able to do it intuitively is not a small merit.

Quadkey, this is good. I was happy, but Google is also developing a GeoCoding method, It turns out that it cannot be ignored.

Google's Open Location Code, cheeks, so come

Goro Inogashira style

Open Location Code (OLC), a code that pinpoints every point on the globe plus+codes https://plus.codes/ → Let's turn on the grid from the menu And Google Map. https://japan.cnet.com/article/35068908/

■ What is it?

The position can be coded by specifying it at the door position level. The code hierarchy is determined below.

hierarchy range
region code Identify a range of about 100km square
city code Identify a 5km square area
neighbourhood code Identify a 250m square area
building cod Identify a 14m square area
door Smaller range

5 stages. Coarse than the 20 stages of Quadkey, If you deepen one layer, it will be divided into 400, so it will be less versatile. But

It may be better to have decided so quickly if you are worried about what level to cut.

How to use this depends on the nature of the service. When Google says "city code in this range", it may be "OK".

■ Try using it anyway.

The library of OLC is open to the public on Github. https://github.com/google/open-location-code

Let's check the operation easily using the python library. Get openlocationcode.py from the above github and Create and execute the following script. → A mesh of one city near the Imperial Palace and its child neighborhood This is the process to acquire.

import openlocationcode
if __name__ == "__main__":
    #Location near the Imperial Palace
    latitude=35.68407104
    longitude=139.7570801

    PAIR_CODE_LENGTH_=10
    #Used when generating child code
    CODE_ALPHABET_ = '23456789CFGHJMPQRVWX'

    out_str=[]
    out_str.append("south,east,north,west,Quadkey,zoom")
    
    #Less than 10 will result in an error
    length_list = [10]
    code_list = []
    for ll in length_list:
        tcode = openlocationcode.encode(latitude, longitude, ll)
        print(tcode)
        code_list.append(tcode)
    
    #decode target code list
    neighbourhood_list= []
    use_child = True
    for cl in code_list:
        #Cut up to the building
        split_tcode = cl.split("+")
        
        #If you cut with city, set the last two characters to 00
        city_code = split_tcode[0][0:6]
        neighbourhood_list.append(city_code +"00+")

        #CODE to get all neighborhood_ALPHABET_ * CODE_ALPHABET_To generate and add
        if use_child:
            for ca1 in range(len(CODE_ALPHABET_)):
                for ca2 in range(len(CODE_ALPHABET_)):
                    neighbourhoodcode=city_code +CODE_ALPHABET_[ca1]+CODE_ALPHABET_[ca2]
                    neighbourhood_list.append(neighbourhoodcode+"+")
    
    #decode
    for nc in neighbourhood_list:
        decode = openlocationcode.decode(nc)
        print(decode)
    
        #Takes the necessary data from the CodeArea class and outputs it
        bounds_str = str(decode.latitudeLo) + "," + str(decode.longitudeLo) + "," + str(decode.latitudeHi) + ","+ str(decode.longitudeHi)
        t_zoom=decode.codeLength
    
        out_str.append(bounds_str+","+nc+","+str(t_zoom))
    
    ##output
    f = open("quadkey_bounds_olc.csv", 'w')
    f.write("\n".join(out_str) + "\n")
    f.close()
city neighbourhood
qiita3.png qiita4.png

Is it difficult to use with this size? For example, when searching data in the range of city, it seems that a considerable number of hits will be made, The neighborhood level is too small, so you need to search in multiple ranges.

Summary (Be careful because it is super subjective)

Everyone is different and everyone is good. I would like to say that Quadkey's versatility seems to be good due to its nature. Even if it is applied to k-anonymization as a privacy measure, with Geohash it is a mesh of 32 times each Compared to evaluation, it may be possible to express with a smaller mesh by evaluating with 4 times each mesh. I think that the users of the data are also smiling. At the time of search, character string matching may be used, but numerical matching may be used. Performance at the time of search may come out ... Conversely, coding involves Mercator-like calculations, so It seems to be slower than Geohash, which is simply derived from latitude and longitude. After all, there is a part to convert to tile index once, you can wrap it, but it feels useless ...

Geohash has become so popular that it has more information and libraries than others. That's about it.

OLC is difficult to use, but if you want to finely code the position at the door level. I wonder if I can use it as a mesh. But compared to the top two, there are worries like "Which level should I use for this service?" Since it is almost nonexistent, it is a good choice depending on the application.

There is also a hexagonal mesh called Geohex, which I haven't been able to introduce. http://geogames.net/geohex/v3 It looks beautiful. But I can't have a complete parent-child relationship, so Depending on the level, it will be a different mesh. In addition, even if it is visualized or data is acquired with a kettle A square is convenient because it is flexible. You can visualize Polygon with WKT format, or search for GIS. It costs money If you want to use Polygon anyway, you can use Polygon in prefectures or Polygon in blocks. I want to use detailed ones. It's a very promising option if you use it for positioning games.

so

I understand the greatness of Quadkey. I will use it from now on. But I'm also interested in OLC, and there is no explanation of conversion logic, so let's look at the source. I think.

Recommended Posts

Aim for GIS Master # 3 I investigated a mesh-like Geocoding method
Aim for GIS Master # 2 R-tree for fast search of nearby roads
I touched PyAutoIt for a moment
I made a dash docset for Holoviews
I made a library for actuarial science