Importing National Land Numerical Information ShapeFiles into a database with Python

A ShapeFile is a data format that stores the position and shape of spatial features together with their attribute information. The files with the .shp extension included in downloads from the National Land Numerical Information (and similar sources) are ShapeFiles.

The goal this time is to read these ShapeFiles with Python and store them in SpatiaLite.

ShapeFile details

A ShapeFile consists of three files: a main file, an index file, and an attribute file. All three share the same name and differ only in their extension.

■ Main file: counties.shp
■ Index file: counties.shx
■ Attribute file: counties.dbf

The main file (.shp) contains the geometries. The index file (.shx) records where each geometry starts in the main file, so any record can be accessed directly. The attribute file (.dbf, in dBASE format) stores the attribute values, one row per shape.
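The index file's layout is simple enough to read directly. The following is a minimal sketch that uses only the standard library. It assumes the layout from the Esri shapefile specification. There is a 100-byte header, in which the file code 9994 sits at byte 0 and the file length in 16-bit words sits at byte 24, both big-endian. After the header comes one 8-byte entry per record, holding that record's offset and content length, both big-endian and counted in 16-bit words. The name `read_shx_index` is a hypothetical helper, not part of pyshp.

```python
import struct


def read_shx_index(data):
    """Parse the bytes of a .shx file into (offset, content_length) pairs, in bytes.

    Hypothetical helper; the layout follows the Esri shapefile specification.
    """
    # File code 9994 (big-endian) identifies a shapefile main or index file
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile index (file code %d)" % file_code)
    # Total file length, stored big-endian in 16-bit words at byte 24
    (length_words,) = struct.unpack(">i", data[24:28])
    file_length = length_words * 2
    entries = []
    # Index entries follow the 100-byte header, 8 bytes each
    for pos in range(100, file_length, 8):
        offset_words, content_words = struct.unpack(">ii", data[pos:pos + 8])
        # Convert 16-bit word counts into byte counts
        entries.append((offset_words * 2, content_words * 2))
    return entries
```

To use it on a real file, pass the whole file contents, e.g. `read_shx_index(open("counties.shx", "rb").read())`. Each returned offset points at a record header in counties.shp.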

For the detailed specifications of these files, see:

**Shapefile technical information** http://www.esrij.com/cgi-bin/wp/wp-content/uploads/documents/shapefile_j.pdf

Working with ShapeFiles in Python

To work with ShapeFiles in Python, I recommend the following library (pyshp):

https://github.com/GeospatialPython/pyshp

**Installation method**: place shapefile.py in any folder, make sure that folder is on the import path (for example by appending it to sys.path), and import it.

At the time of writing, this library supported Python 2.4 through 3.x.

Example: reading National Land Numerical Information data

In this example, we read N02-05-g_RailroadSection.shp from the railway dataset of the National Land Numerical Information.

**National Land Numerical Information: Railway data** http://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-N02-v2_2.html

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() output on Python 2 and 3
import os
import sys
# Use the copy of pyshp placed in a "pyshp" folder next to this script
sys.path.append(os.path.dirname(os.path.abspath(__file__)) + '/pyshp')
import shapefile

sf = shapefile.Reader('original_data\\N02-05_GML\\N02-05\\N02-05-g_RailroadSection.shp')
# Read the .shp file sequentially from the beginning, one record at a time
for sr in sf.iterShapeRecords():
    # Attribute values (from the .dbf file)
    print('attribute:', sr.record)

    # Shape type code:
    # NULL = 0
    # POINT = 1
    # POLYLINE = 3
    # POLYGON = 5
    # MULTIPOINT = 8
    # POINTZ = 11
    # POLYLINEZ = 13
    # POLYGONZ = 15
    # MULTIPOINTZ = 18
    # POINTM = 21
    # POLYLINEM = 23
    # POLYGONM = 25
    # MULTIPOINTM = 28
    # MULTIPATCH = 31
    print('shapeType:', sr.shape.shapeType)

    # List of coordinate points
    print('points:', sr.shape.points)

    # Index in points where each part starts (multi-part polylines and polygons)
    print('parts:', sr.shape.parts)
```

iterShapeRecords() parses the .shp file sequentially from the beginning and keeps only one record in memory at a time, which makes it suitable for processing large files.

However, iterShapeRecords assumes that each record starts immediately after the previous one ends, that is, exactly where the content length in the previous record header says it should. This assumption holds for most files, but not for all of them. For example, running the code above on A31-12_17_GML.shp from the following dataset results in an error.
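To illustrate that assumption, here is a minimal standard-library sketch of a sequential reader. It is not pyshp's actual implementation. Each record header holds a record number and a content length, both big-endian and counted in 16-bit words. The next record is assumed to start 8 + 2 * (content length) bytes after the current header begins. `walk_records` and `null_record` are hypothetical helpers. The test data uses Null shapes (type 0), whose content is just the 4-byte shape type.

```python
import struct


def walk_records(shp):
    """Walk .shp records sequentially, trusting only the record-header lengths.

    Returns (record_number, shape_type) pairs. Hypothetical helper for illustration.
    """
    results = []
    pos = 100  # records start right after the 100-byte file header
    while pos + 12 <= len(shp):
        # Record header: record number and content length (16-bit words), big-endian
        rec_num, content_words = struct.unpack(">ii", shp[pos:pos + 8])
        if content_words < 0:
            raise ValueError("corrupt record header at byte %d" % pos)
        # The content begins with the shape type (little-endian)
        (shape_type,) = struct.unpack("<i", shp[pos + 8:pos + 12])
        results.append((rec_num, shape_type))
        # The assumption: the next record starts right after this record's content
        pos += 8 + content_words * 2
    return results


def null_record(rec_num):
    """Build a Null-shape record: 8-byte header plus a 4-byte shape type of 0."""
    return struct.pack(">ii", rec_num, 2) + struct.pack("<i", 0)
```

When four stray bytes sit between two records, the walk reads the second header from the wrong position. It then returns nonsense such as record number -1, which is the kind of failure that garbage between records causes.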

**National Land Numerical Information: Estimated inundation areas (Ishikawa Prefecture)** http://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-A31.html

This file has garbage bytes between its records, so it cannot be parsed from the .shp file alone. The record positions have to be taken from the index file (.shx).
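The index-based approach can be sketched with only the standard library. Each record's byte offset comes from the .shx entries, which are 8 bytes each after a 100-byte header and store the offset big-endian in 16-bit words. Because of this, any bytes between records in the .shp file are never read. `read_by_index` and `null_record` are hypothetical helpers for illustration. pyshp's `Reader.shape(i)` performs a comparable .shx lookup internally.

```python
import struct


def read_by_index(shp, shx):
    """Read (record_number, shape_type) pairs, locating each record via the .shx index.

    Hypothetical helper; bytes between records in the .shp data are skipped.
    """
    results = []
    # .shx entries follow the 100-byte header, 8 bytes each
    for pos in range(100, len(shx) - 7, 8):
        offset_words, _content_words = struct.unpack(">ii", shx[pos:pos + 8])
        offset = offset_words * 2  # byte offset of the record header in the .shp
        rec_num, _ = struct.unpack(">ii", shp[offset:offset + 8])
        (shape_type,) = struct.unpack("<i", shp[offset + 8:offset + 12])
        results.append((rec_num, shape_type))
    return results


def null_record(rec_num):
    """Build a Null-shape .shp record: 8-byte header plus a 4-byte shape type of 0."""
    return struct.pack(">ii", rec_num, 2) + struct.pack("<i", 0)
```

Even with four stray bytes after the first record, both records are found, because their positions (bytes 100 and 116) come from the index rather than from the content lengths.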

In that case, avoid iterShapeRecords and read each record by its index, as shown below.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() output on Python 2 and 3
import os
import sys
# Use the copy of pyshp placed in a "pyshp" folder next to this script
sys.path.append(os.path.dirname(os.path.abspath(__file__)) + '/pyshp')
import shapefile

sf = shapefile.Reader('original_data\\A31-12\\output\\A31-12_17_GML\\old\\A31-12_17.shp')
try:
    # In this National Land Numerical Information file, the content lengths in the
    # .shp record headers do not match the actual record layout, so each record
    # has to be located through the .shx index instead.
    i = 0
    while True:
        shape = sf.shape(i)  # raises IndexError once i is past the last record
        rec = sf.record(i)
        # Attribute values (from the .dbf file)
        print('attribute:', rec)

        # Shape type code:
        # NULL = 0
        # POINT = 1
        # POLYLINE = 3
        # POLYGON = 5
        # MULTIPOINT = 8
        # POINTZ = 11
        # POLYLINEZ = 13
        # POLYGONZ = 15
        # MULTIPOINTZ = 18
        # POINTM = 21
        # POLYLINEM = 23
        # POLYGONM = 25
        # MULTIPOINTM = 28
        # MULTIPATCH = 31
        print('shapeType:', shape.shapeType)

        # List of coordinate points
        print('points:', shape.points)

        # Index in points where each part starts (multi-part polylines and polygons)
        print('parts:', shape.parts)

        i += 1
except IndexError:
    # All records have been read
    pass
```

With this approach, you can parse ShapeFiles in Python and import them into SpatiaLite.
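One way to get a geometry into SpatiaLite is to convert a shape's `points` and `parts` into WKT and pass the string to SpatiaLite's `GeomFromText()` function. The following is a minimal sketch of that conversion for polylines only. The helper name `polyline_to_wkt` is hypothetical, and the kokudo_db.py program linked below may do this differently.

```python
def polyline_to_wkt(points, parts):
    """Convert pyshp-style polyline data to a WKT string.

    points: list of [x, y] coordinates; parts: start index of each part in points.
    Hypothetical helper for illustration (polylines only).
    """
    if not parts:
        raise ValueError("shape has no parts")
    # Append the end of the point list so each part becomes a [start, end) slice
    bounds = list(parts) + [len(points)]
    lines = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        coords = ", ".join("%r %r" % (p[0], p[1]) for p in points[start:end])
        lines.append("(" + coords + ")")
    if len(lines) == 1:
        # A single part is a plain LINESTRING
        return "LINESTRING" + lines[0]
    # Several parts become one MULTILINESTRING
    return "MULTILINESTRING(" + ", ".join(lines) + ")"
```

The resulting string could then be inserted with a statement such as `INSERT INTO railroad (geom) VALUES (GeomFromText(?, 4612))`. Here the table name `railroad` is a placeholder, and 4612 (JGD2000) must be replaced by the SRID that actually matches your data.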

The following program reads National Land Numerical Information ShapeFiles, such as sediment disaster hazard locations, estimated inundation areas, and gust data (tornadoes and the like), and stores them in SpatiaLite: https://github.com/mima3/kokudo/blob/master/kokudo_db.py

Demo: http://needtec.sakura.ne.jp/kokudo/

For how to use SpatiaLite, see the following article: http://qiita.com/mima_ita/items/64f6c2b8bb47c4b5b391