Basic map information using Python Geotiff conversion of numerical elevation data

Introduction

The data in the title is distributed on this site. Very detailed and accurate elevation data (DEM) in Japan is available. What is downloaded is a zipped file for each secondary division area mesh defined by the Statistics Bureau of the Ministry of Internal Affairs and Communications. It contains elevation data in xml format. The regional mesh will be described later.

The code is at the end of this article. An example is as follows.

** (The file name is also used as meta information, so please apply it to the downloaded one) **

command


python zdem2tif.py FG-GML-1234-56-DEM10B.zip FG-GML-6543-21-DEM5A.zip ...

One geotiff image is output for each zip file (= secondary partition area mesh). Supports both 5m and 10m resolutions. We have confirmed the operation with python3 series. The gdal package must be installed.

This time, I made a lot of reference to the code placed in this site. However, if it is left as it is, a bug will occur in the case of data loss, so this code has been fixed.

There seem to be various other tools, but I came up with this code because it can be executed on commands, it can be shared with foreigners because it is written in English, and there is no need to extract the zip. ..

Also, as I noticed after posting, qiita already has Article on basic map information and python. The explanation of the file format is more detailed here.

About regional mesh

The detailed definition is explained in this document. Regional meshes are defined in the 1st to 3rd order partitions according to the spatial resolution. The primary mesh is assigned a partition code of 4 digits, the secondary mesh is assigned an additional 2 digits, and the tertiary mesh is assigned an additional 2 digits. The latitude and longitude of the four corners of the compartment can be known using the number of the compartment code.

Source code description

In the future, there may be changes in the specifications of the distributed data, and there is a possibility that it will not be supported in areas that the author has not confirmed, so I will briefly explain the code.

The whole flow (convert) is

--Get area mesh information and DEM resolution from zip file name --Define an array of output files (so far DEMOutArray, mesh_info_from_zipname) --Loop about xml file in zip file (for filename in filelist) --Get the required information from the xml file (XMLContents, read_xml) --Generate DEM array for each xml file (DEMInArray) --Overwrite output array (ʻupdate_raster) --Save in Tiff format (save_raster`)

Get area mesh information and DEM resolution from zip file name

Calculate the latitude and longitude of the four corners by reading the 4-digit primary partition code and the 2-digit secondary partition code. Furthermore, the resolution is interpreted by whether the file name is written as DEM5X or DEM10X. For each resolution, the array size of one xml file is fixed (for example, in the case of 10mDEM, it is assumed that there is only one xml in the zip).

Define an array of output files

Create in advance the size of one secondary compartment. In the case of 5mDEM, it is assumed that a maximum of 10 x 10 xml (tertiary partition) is contained in one zip (secondary partition). However, it seems that there are many areas that have not been observed, and there are cases where only about 10 xml exist. An outlier of -9999. Was used for the location corresponding to the area where xml does not exist.

Loop about xml file in zip file

I first learned about the module that is included by default called zipfile. By using this, you can read the files in the zip without decompressing each one.

Get the required information from the xml file

For 5mDEM, read the last two digits of the tertiary partition code in the file name to determine the insertion position in the output array. In the case of 10mDEM, as mentioned above, it is assumed that there is no description of the tertiary partition code in the first place and there is only one xml. Get the necessary information in text format for the time being. The rules of tags in xml etc. are assumed as in the code. If you read xml with different rules, it should appear as a bug immediately.

Generate DEM array for each xml file

Generate an array using the information described in the xml file. File Specifications has a definition related to DEM data. According to it, the land division for each pixel is either {ground surface, surface surface, sea level, inland water surface, no data, etc.}, and the number of -9999 is entered for places where the value is undefined. ..

In the code originally referred to, DEM information is acquired depending on whether it is "ground surface" or not, but in this code, it is divided according to whether the DEM value in xml is -9999 or not (indicated by "Other"). Because there was a DEM value in the area. The reason for saying "other" is unknown). For DEM undefined locations, we applied outliers -1 and separated the use of outliers from cases where the xml file mentioned above did not exist in the first place.

Overwrite output array

Regarding the orientation of the array, it is assumed that the latitude direction is defined as northward. There is a data called gml: sequenceRule in the data obtained from xml, and the fact that it is written as'+ x-y' supports it. If this is '+ x + y' etc., it's probably facing south, but I haven't divided the conditions around that.

Save in Tiff format

There is nothing special to say.

Drawing of output tiff

Converted files downloaded with 5mDEM on the top and 10mDEM on the bottom for the same secondary plot area (almost unedited axis values, etc.). Negative values are masked. It can be seen that the abundance rate of data at each resolution differs depending on the region. I'd be happy if I knew it before I downloaded it, but isn't there any information missing? .. ..

sample05.png

sample10.png

Whole source code

Finally, the whole code.

zdem2tif.py


import os
import sys
import zipfile
import numpy as np
import xml.etree.ElementTree as et
import osgeo.gdal
import osgeo.osr


class XMLContents(object):
    def __init__(self):
        self.ns = {
                   'ns': 'http://fgd.gsi.go.jp/spec/2008/FGD_GMLSchema',
                   'gml': 'http://www.opengis.net/gml/3.2',
                   'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
                   'xlink': 'http://www.w3.org/1999/xlink'
                   }

    def read_xml(self, zf, filename, dem_mode):
        if dem_mode == 'DEM10':
            self.posix = 0
            self.posiy = 0
        elif dem_mode == 'DEM5':
            _split = os.path.basename(filename).split('-')
            self.posix = int(_split[4][1])
            self.posiy = int(_split[4][0])

        with zf.open(filename, 'r') as file:
            self.text = file.read().decode('utf_8')
            self.root = et.fromstring(self.text)
            self.dem = self.root.find('ns:DEM', self.ns)
            self.coverage = self.dem.find('ns:coverage', self.ns)
            self.envelope = self.coverage.find(
                'gml:boundedBy//gml:Envelope', self.ns)
            self.lower = self.envelope.find('gml:lowerCorner', self.ns).text
            self.upper = self.envelope.find('gml:upperCorner', self.ns).text
            self.grid = self.coverage.find(
                'gml:gridDomain//gml:Grid//gml:limits//gml:GridEnvelope', self.ns)
            self.low = self.grid.find('gml:low', self.ns).text
            self.high = self.grid.find('gml:high', self.ns).text
            self.tuplelist = self.coverage.find(
                'gml:rangeSet//gml:DataBlock//gml:tupleList', self.ns).text
            self.gridfunc = self.coverage.find(
                'gml:coverageFunction//gml:GridFunction', self.ns)
            self.rule = self.gridfunc.find('gml:sequenceRule', self.ns)
            self.start = self.gridfunc.find('gml:startPoint', self.ns).text

            if self.rule.attrib['order'] != '+x-y':
                print('warning sequence order not +x-y')
            if self.rule.text != 'Linear':
                print('warning sequence rule not Linear')

            file.close()


class DEMInArray(object):
    def __init__(self, contents):
        self._noValue = -1.
        self.llat, self.llon = np.array(
            contents.lower.split(' '), dtype=np.float64)
        self.ulat, self.ulon = np.array(
            contents.upper.split(' '), dtype=np.float64)
        self.lowx, self.lowy = np.array(
            contents.low.split(' '), dtype=np.int)
        self.highx, self.highy = np.array(
            contents.high.split(' '), dtype=np.int)
        self.sizex, self.sizey = self.get_size()
        self.x_init, self.y_init = np.array(
            contents.start.split(' '), dtype=np.int)
        self.datapoints = contents.tuplelist.splitlines()
        self.raster = self.get_raster()

    def get_size(self):
        sizex = self.highx - self.lowx + 1
        sizey = self.highy - self.lowy + 1
        return sizex, sizey

    def get_raster(self):
        _x, _y = self.x_init, self.y_init
        raster = np.zeros([self.sizey, self.sizex])
        raster.fill(self._noValue)
        for dp in self.datapoints:
            if _y > self.highy:
                print('DEM datapoints are unexpectedly too long')
                break
            s = dp.split(',')
            if len(s) == 1:
                continue
            desc, value = s[0], np.float64(s[1])
            # if desc == 'Ground surface':
            raster[_y, _x] = value if value > -9998 else self._noValue
            _x += 1
            if _x > self.highx:
                _x = 0
                _y += 1
        return raster


class DEMOutArray(object):
    def __init__(self):
        self._fillValue = -9999.

    def initialize_raster(self):
        raster = np.zeros([self.sizey, self.sizex])
        raster.fill(self._fillValue)
        return raster

    def get_trans(self):
        # +x-y
        return [self.llon, self.resx, 0, self.ulat, 0, -self.resy]

    def update_raster(self, tile, posiy, posix):
        xmin = posix * tile.shape[1]
        ymin = posiy * tile.shape[0]
        xmax = xmin + tile.shape[1]
        ymax = ymin + tile.shape[0]
        self.raster[ymin:ymax, xmin:xmax] = tile[::-1]

    def get_size(self):
        sizex = (self.highx - self.lowx + 1) * self.tilex
        sizey = (self.highy - self.lowy + 1) * self.tiley
        return sizex, sizey

    def mesh_info_from_zipname(self, zipname):
        '''
        Ref: Table 4 of https://www.stat.go.jp/data/mesh/pdf/gaiyo1.pdf
        '''
        _split = os.path.basename(zipname).split('-')
        _lat1 = int(_split[2][:2])
        _lon1 = int(_split[2][2:])
        _lat2 = int(_split[3][0])
        _lon2 = int(_split[3][1])
        self.llat = _lat1 * 0.66666666666 + _lat2 * 0.08333333333
        self.llon = _lon1 + 100. + _lon2 * 0.125
        self.ulat = self.llat + 0.08333333333
        self.ulon = self.llon + 0.125

        if _split[4][:5] == 'DEM10':
            self.mode = 'DEM10'
            self.tilex, self.tiley = 1, 1
            self.lowx, self.lowy = 0, 0
            self.highx, self.highy = 1124, 749
            self.sizex, self.sizey = self.get_size()
            self.resx, self.resy = 1.1111111111e-04, 1.1111111111e-04
        elif _split[4][:4] == 'DEM5':
            self.mode = 'DEM5'
            self.tilex, self.tiley = 10, 10
            self.lowx, self.lowy = 0, 0
            self.highx, self.highy = 224, 149
            self.sizex, self.sizey = self.get_size()
            self.resx, self.resy = 5.5555555555e-05, 5.5555555555e-05
        else:
            raise ValueError('Unexpected definition of DEM:', _split[4])

        self.raster = self.initialize_raster()

    def save_raster(self, out_filename):
        trans = self.get_trans()
        srs = osgeo.osr.SpatialReference()
        srs.ImportFromEPSG(4612)
        driver = osgeo.gdal.GetDriverByName('GTiff')
        output = driver.Create(out_filename, self.sizex, self.sizey,
                               1, osgeo.gdal.GDT_Float32)
        output.GetRasterBand(1).WriteArray(self.raster)
        output.GetRasterBand(1).SetNoDataValue(self._fillValue)
        output.SetGeoTransform(trans)
        output.SetProjection(srs.ExportToWkt())
        output.FlushCache()


def convert(in_filename, out_filename):
    output_array = DEMOutArray()
    output_array.mesh_info_from_zipname(in_filename)
    dem_mode = output_array.mode

    with zipfile.ZipFile(in_filename, 'r') as zf:
        filelist = zf.namelist()
        for filename in filelist:
            print('loading', filename)
            contents = XMLContents()
            contents.read_xml(zf, filename, dem_mode)
            posix, posiy = contents.posix, contents.posiy
            dem_tile = DEMInArray(contents)
            output_array.update_raster(dem_tile.raster, posiy, posix)
        zf.close()
    print('save', out_filename)
    output_array.save_raster(out_filename)


def main(argv):
    for i in range(1, len(argv)):
        convert(argv[i], argv[i].replace('.zip', '.tif'))


if __name__ == '__main__':
    main(sys.argv)

Recommended Posts

Basic map information using Python Geotiff conversion of numerical elevation data
[Python] Conversion memo between time data and numerical data
Python introductory study-output of sales data using tuples-
Numerical summary of data
[Python] [Word] [python-docx] Simple analysis of diff data using python
Data analysis using Python 0
[Question] About API conversion of chat bot using Python
Data cleaning using Python
Basic knowledge of Python
Extract the band information of raster data with python
Let's utilize the railway data of national land numerical information
A memo of writing a basic function in Python using recursion
[Python] I tried collecting data using the API of wikipedia
Basic summary of data manipulation with Python Pandas-First half: Data creation & manipulation
[Python] Correct usage of map
[Python] Using OpenCV with Python (Basic)
python: Basics of using scikit-learn ①
Image data type conversion [Python]
Basic usage of Python f-string
Data analysis using python pandas
[Python] I wrote the route of the typhoon on the map using folium
Make multiple numerical elevation data into a single picture with Python
[Python] LINE notification of the latest information using Twitter automatic search
Collect product information and process data using Rakuten product search API [Python]
[Python] I tried to get various information using YouTube Data API!
Image capture of firefox using python
Data acquisition using python googlemap api
Basic map information-Use digital elevation model
Removal of haze using Python detailEnhanceFilter
Memory-saving matrix conversion of log data
Implementation of desktop notifications using Python
Basic grammar of Python3 system (dictionary)
Recommendation of data analysis using MessagePack
Python application: data visualization part 1: basic
Basic study of OpenCV with Python
Collect video information of "Singing with XX people" [Python] [Youtube Data API]
Try to image the elevation data of the Geographical Survey Institute with Python
[Python] Summary of conversion between character strings and numerical values (ascii code)