[PYTHON] That's why I'll be able to search for single-seat constituencies by location

A year after the ban on online election campaigning was lifted, the online posts of official party figures have turned into an extreme game that keeps shaving away supporters' SAN value (sanity) like dried bonito flakes. How is everyone doing? This time, let's consider how to get single-seat constituency information from your current location.

Result

http://needtec.sakura.ne.jp/analyze_election/page/ElectionArea/shuin_47

On this page, candidates for single-seat constituencies are displayed by selecting a prefecture or getting your current location. Furthermore, by selecting a single-seat constituency, the approximate location of the constituency and a list of candidates will be displayed.

Source code

https://github.com/mima3/analyze_election

Dependent libraries

lxml-3.4.0-py2.7-freebsd-9.1-RELEASE-p15-amd64.egg
rdp-0.5-py2.7.egg
numpy-1.9.1-py2.7-freebsd-9.1-RELEASE-p15-amd64.egg
sympy-0.7.5-py2.7.egg
Beaker-1.6.4-py2.7.egg

Note: only sympy 0.7.5 works. The reason is explained at the end of this article.

Data used

** National Land Numerical Information: administrative area data ** http://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-N03.html

** Electoral districts of members elected to the House of Representatives single-member constituencies (by prefecture) ** http://www.soumu.go.jp/senkyo/senkyo_s/news/senkyo/shu_kuwari/

Data obtained by manually converting the above to CSV https://github.com/mima3/analyze_election/blob/master/election_area.csv

** Asahi Shimbun Digital> 2014 House of Representatives Election> Candidate ** http://www.asahi.com/senkyo/sousenkyo47/kouho/

Data converted from the above to CSV https://github.com/mima3/analyze_election/blob/master/script/candidate_shuin_47.csv

Data creation procedure

# Create the database
python create_db.py election.sqlite

# Import the National Land Numerical Information administrative area data. This takes a few hours!
python import_administrative_boundary.py election.sqlite area\N03-14_140401.xml

# Convert the administrative area data to sympy Polygons. This takes about 24 hours!
python convert_poly.py election.sqlite

# Register the single-seat constituency information
python import_election_area.py election.sqlite election_area.csv

# Register the candidate information for the single-seat constituencies
python import_candidate.py election.sqlite shuin_47 script\candidate_shuin_47.csv
 

Commentary

Administrative area data

The administrative area data is provided in XML as part of the National Land Numerical Information. With it, you can display administrative boundaries on Google Maps. However, because the data is huge, some care is needed when handling it.

Parsing large XML files

If you read a large XML file into a string and parse it in one go, memory usage grows dramatically and the processing cannot finish.

Therefore, use lxml.etree.iterparse to process the file sequentially. Let's look at the actual processing.

election_db.py


    def ImportAdministrativeBoundary(self, xml):
        f = None
        contents = None
        namespaces = {
            'ksj': 'http://nlftp.mlit.go.jp/ksj/schemas/ksj-app',
            'gml': 'http://www.opengis.net/gml/3.2',
            'xlink': 'http://www.w3.org/1999/xlink',
            'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
        }
        self._conn.execute('begin')

        print ('admins....')
        context = etree.iterparse(xml, events=('end',), tag='{http://nlftp.mlit.go.jp/ksj/schemas/ksj-app}AdministrativeBoundary')
        for event, admin in context:
            adminId = admin.get('{http://www.opengis.net/gml/3.2}id')
            print (adminId)
            bounds = admin.find('ksj:bounds', namespaces=namespaces).get('{http://www.w3.org/1999/xlink}href')[1:]
            prefectureName = admin.find('ksj:prefectureName', namespaces=namespaces).text
            subPrefectureName = admin.find('ksj:subPrefectureName', namespaces=namespaces).text
            countyName = admin.find('ksj:countyName', namespaces=namespaces).text
            cityName = admin.find('ksj:cityName', namespaces=namespaces).text
            areaCode = admin.find('ksj:administrativeAreaCode', namespaces=namespaces).text
            sql = '''INSERT INTO administrative_boundary
                     (gml_id, bounds, prefecture_name, sub_prefecture_name, county_name, city_name, area_code)
                     VALUES(?, ?, ?, ?, ?, ?, ?);'''
            self._conn.execute(sql, [adminId, bounds, prefectureName, subPrefectureName, countyName, cityName, areaCode ])

            admin.clear()
            # Also drop the already-processed preceding siblings that the root node still references
            while admin.getprevious() is not None:
                del admin.getparent()[0]
        del context

The above code is part of the process that parses the National Land Numerical Information and stores it in the DB. In this example, etree.iterparse is used so that each element is processed as soon as the end of an AdministrativeBoundary tag is detected.

For details, refer to the following page. ** Use lxml to get high performance XML parsing in Python ** http://www.ibm.com/developerworks/jp/xml/library/x-hiperfparse/
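
As a small runnable illustration of the same streaming pattern, here is a sketch that uses the standard library's xml.etree.ElementTree.iterparse as a stand-in for lxml. The standard-library version has no tag argument, so the tag is filtered by hand. The XML snippet, the city names in it, and the function name iter_city_names are made up for this example and are not taken from the real data.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hypothetical document; the real file is far larger and has more fields.
XML = b"""<root xmlns:ksj="http://nlftp.mlit.go.jp/ksj/schemas/ksj-app">
  <ksj:AdministrativeBoundary><ksj:cityName>Morioka</ksj:cityName></ksj:AdministrativeBoundary>
  <ksj:AdministrativeBoundary><ksj:cityName>Sapporo</ksj:cityName></ksj:AdministrativeBoundary>
</root>"""

KSJ = '{http://nlftp.mlit.go.jp/ksj/schemas/ksj-app}'


def iter_city_names(source):
    """Yield city names one AdministrativeBoundary element at a time."""
    for _event, elem in ET.iterparse(source, events=('end',)):
        # 'end' fires for every element, so skip everything but the target tag.
        if elem.tag != KSJ + 'AdministrativeBoundary':
            continue
        name = elem.findtext(KSJ + 'cityName')
        # Release the element's children once its values have been read.
        elem.clear()
        yield name


names = list(iter_city_names(io.BytesIO(XML)))
print(names)
```

Because each element is cleared right after it is read, the subtree of a processed element does not stay in memory while the rest of the file is parsed.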

Thinning out coordinate information

The National Land Numerical Information data is large because it stores precise coordinates. This time we only need rough coordinates, so we thin that information out.

The Ramer-Douglas-Peucker algorithm is a well-known algorithm for thinning out the points of a line. See the page below for a description.

** [Mathematica] Thin out polygonal lines ** http://www.330k.info/essay/oresenwomabiku

A Python implementation of the Ramer-Douglas-Peucker algorithm is available as the following module. https://pypi.python.org/pypi/rdp
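
To make the idea concrete, here is a minimal pure-Python sketch of Ramer-Douglas-Peucker. It is not the code of the rdp module and not the code used in this project. The function names, the sample line, and the epsilon value are all chosen only for illustration.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # start and end coincide, so fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def simplify(points, epsilon):
    """Drop points that lie within epsilon of the chord between the endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord first -> last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Everything in between is close enough: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
simplified = simplify(line, 0.5)
print(simplified)  # 4 of the 6 points remain
```

A larger epsilon removes more points, so it is the knob that trades boundary accuracy for data size.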

Information registration for single-seat constituencies

As usual, the Ministry of Internal Affairs and Communications publishes the single-seat constituency information only in PDF format. So you have to grit your teeth and enter it manually, or find some other way.

Also, the definitions of the single-seat constituencies are surprisingly fuzzy. For example, take a look at the 1st and 2nd districts of Iwate prefecture. http://www.soumu.go.jp/main_content/000240041.pdf

1st district: Morioka City (main office jurisdiction, Morioka City Hall Aoyama branch jurisdiction, Morioka City Hall Yanagawa branch jurisdiction, Morioka City Hall Ota branch jurisdiction, Morioka City Hall branch jurisdiction, Morioka City Tonan General Branch jurisdiction)

2nd district: Morioka City (areas that do not belong to the 1st district)

That is how they are defined. Naturally, no information is provided about which areas each office's jurisdiction covers. Therefore, this time, for Morioka City both the 1st and 2nd districts are listed as candidate single-seat constituencies.

The CSV I created, while cursing the Ministry of Internal Affairs and Communications for its apparently strong determination not to let anyone analyze its data, is here: https://github.com/mima3/analyze_election/blob/master/election_area.csv

Also, since I was worn out by this point, I pull the candidate information from the Asahi Shimbun. The script below extracts most of that data, and I correct it by hand where necessary.

https://github.com/mima3/analyze_election/blob/master/script/analyze_asahi.py

Get a single-seat constituency from your current location

Get your current location in the browser

To use your current location in your browser, use navigator.geolocation. You can get the longitude and latitude with the following code.

if (!navigator.geolocation) {
  // Handling for environments where the Geolocation API cannot be used
  alert('Geolocation API is not available');
} else {
  navigator.geolocation.getCurrentPosition(function(position) {
    // position.coords.latitude and position.coords.longitude hold the location
    console.log(position);
  });
}

In IE, the reported current position is badly off. This is probably because the location database Microsoft uses is less accurate than the one used by Chrome and Firefox. (It's not my fault! It's not my fault!) https://social.technet.microsoft.com/Forums/ie/en-US/aea4db4e-0720-44fe-a9b8-09917e345080/geolocation-wrong-in-ie9-but-not-other-browsers

Find out if a particular coordinate is included in an administrative area

Finding out whether a particular coordinate lies in an administrative area is the same problem as checking whether a point is contained in a polygon. In Python, you can ** basically ** do this point-in-polygon test with SymPy.

** [Python] Use SymPy to determine inside / outside points ** http://d.hatena.ne.jp/factal/20121013

I say "basically" because this test is very slow. Running it against tens of thousands of administrative areas one by one would take far too long.

For this reason, we have taken two measures here.

First, narrow down the target polygons before running the point-in-polygon test. A polygon is tested only if some of its vertices lie within a certain range of the specified point. The range starts small and is doubled until candidates are found or an upper limit is reached.

Specifically, the code is as follows.

election_db.py


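    def GetPos(self, lat, long):
        """
        Get the curve data associated with the given latitude and longitude
        """
        # Start with a narrow search range and widen it until candidates are found
        m = 0.005
        while True:
            rows = self._getCurveId(lat, long, m).fetchall()
            if rows:
                break
            m = m * 2
            if m > 0.1:
                # Give up once the search range becomes too wide
                return None

        pt = Point(lat, long)

        # Run the slow inside/outside test only on the narrowed-down candidates
        for r in rows:
            key = r[0]
            ret = self._isCrossCurveId(pt, key)
            if ret:
                return ret
        return None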
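
The narrowing-down idea can be tried without a database or SymPy. The following is only a sketch under stated assumptions: the polygons are held in memory, the pre-filter is a simple per-axis comparison against each vertex (how _getCurveId actually filters is not shown in this article), and the inside/outside test is a plain ray-casting test swapped in for SymPy. All function names, district labels, and coordinates are hypothetical.

```python
def near_any_vertex(point, polygon, margin):
    """Cheap pre-filter: is some vertex within margin of the point on both axes?"""
    px, py = point
    return any(abs(vx - px) <= margin and abs(vy - py) <= margin
               for vx, vy in polygon)


def contains(point, polygon):
    """Ray-casting point-in-polygon test (boundary points get no special care)."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line
            cross_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < cross_x:
                inside = not inside
    return inside


def find_area(point, areas, margin=0.005, max_margin=0.1):
    """Widen the margin until candidates appear, then test only those."""
    while margin <= max_margin:
        candidates = [(name, poly) for name, poly in areas
                      if near_any_vertex(point, poly, margin)]
        if candidates:
            for name, poly in candidates:
                if contains(point, poly):
                    return name
            return None
        margin *= 2
    return None


# Two adjacent squares as made-up (lat, lng) polygons.
areas = [
    ('district-A', [(35.00, 139.00), (35.00, 139.02), (35.02, 139.02), (35.02, 139.00)]),
    ('district-B', [(35.02, 139.00), (35.02, 139.02), (35.04, 139.02), (35.04, 139.00)]),
]
print(find_area((35.012, 139.013), areas))
```

Like GetPos, this sketch stops at the first margin that yields any candidate, so a point whose enclosing polygon has no vertex within that margin is reported as not found.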

The second measure is to create the Polygon objects in advance and register them in the database. Specifically, the implementation is as follows.

    def ConvertPoly(self):
        """
        Create polygons from the curve table
        """
        #gc.enable()
        #gc.set_debug(gc.DEBUG_LEAK)
        sql = '''DELETE FROM polygon'''
        self._conn.execute(sql)

        sql = '''select curve_id,lat,lng from curve order by curve_id'''
        rows = self._conn.execute(sql)
        dict = {}
        for r in rows:
            key = r[0]
            if dict.get(key) is None:
                dict[key] = []
            dict[key].append((r[1], r[2]))
        i = 0
        self.Commit()

        self._conn.execute('begin')
        for key in dict:
            print (key + ":" + str(i))
            #b = len(gc.get_objects())
            self._createPoly(key, dict[key])
            i = i + 1
            #gc.collect()
            #print str(b) + ":" + str(len(gc.get_objects()))
            if i % 100 == 0:
                clear_cache()
                self.Commit()

    def _createPoly(self, key, list):
        poly = Polygon(*list)
        obj = pickle.dumps(poly)
        sql = '''INSERT INTO polygon
                       (curve_id, object)
                     VALUES(?, ?);'''
        self._conn.execute(sql, [key, obj ])
        del poly
        del obj

pickle.dumps is used to serialize the object. Also, sympy has a caching mechanism, so creating a large number of objects quickly consumes several gigabytes of memory. To avoid that, clear_cache is called to clear the cache every 100 polygons.
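
The store-once, load-later pattern can be sketched with only the standard library. This is an illustration, not the project's code: a plain dict stands in for the sympy Polygon, the table layout is simplified, and the bytes are wrapped in sqlite3.Binary, whereas _createPoly passes the pickle output to the INSERT directly.

```python
import pickle
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE polygon (curve_id TEXT PRIMARY KEY, object BLOB)')

# Stand-in for an object that is expensive to build (hypothetical data).
shape = {'curve_id': 'cv1',
         'vertices': [(35.0, 139.0), (35.0, 139.02), (35.02, 139.02)]}

# Serialize once at import time and store the bytes as a BLOB.
blob = sqlite3.Binary(pickle.dumps(shape, pickle.HIGHEST_PROTOCOL))
conn.execute('INSERT INTO polygon (curve_id, object) VALUES (?, ?)',
             ('cv1', blob))
conn.commit()

# At query time, read the bytes back and unpickle instead of rebuilding.
row = conn.execute('SELECT object FROM polygon WHERE curve_id = ?',
                   ('cv1',)).fetchone()
restored = pickle.loads(bytes(row[0]))
print(restored == shape)
```

Since unpickling can run arbitrary code, this approach is only appropriate for a database that you populate yourself.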

Also, the conditions under which sympy's Polygon raises the exception "Polygon has intersecting sides." differ between 0.7.5 and 0.7.6. In 0.7.6 the check is stricter, and the polygons cannot be created properly from this data. Therefore, use sympy 0.7.5.
