[PYTHON] A story about kindergartens, nursery schools, and children's gardens

I am active in Code for Kobe, but the slots at the regular meetups are, thankfully, always full. So I decided to write for an Advent calendar once in a while to get my own material out. This is an article for the Civic Tech Advent Calendar 2016.

Recently, I have been looking into the state of facilities for preschoolers. The topic comes up in the news a lot, yet I feel the data behind it is hard to see. What is actually going on? It's a complicated system, but I'm personally affected right now, so I started collecting data as an experiment.

The new system for supporting children and child-rearing is described in detail on the Cabinet Office page. The general idea is to bring what used to be kindergartens and nursery schools under local governments by placing them in a framework called "children's gardens", so that they can be run in a more integrated way. Benefits were set up for facility operation. In addition, local governments gained control over quotas by "certifying" the children who use the system.

For parents, the system shows up abruptly, for example in this form:

"From 2015, certification is required for the use of kindergartens and nursery schools."

Roughly speaking, No. 1 certification = full-time homemaker household = kindergarten, and No. 2 and No. 3 certification = double-income household = nursery school. The difference between No. 2 and No. 3 is the child's age. Operators want the benefits, while local governments take on more administrative work because they pay the money and therefore also have a say, so in practice many places seem to run things quite rigidly for now. You might say No. 2 = first class and No. 1 = economy, or maybe not. If there is no open slot you cannot get in, so if you have several children, for example, it can be practically impossible to move from No. 1 to No. 2 certification. Once children reach elementary school, it is compulsory education and they are guaranteed a place at a school in their district, so the gap between the two systems is startling.

Now then. Because the system is so new, many local governments are still in the middle of the transition. In fact, kindergartens turn into children's gardens quite often. In other words, the list of kindergartens, nursery schools, and children's gardens is updated more frequently than I expected. That makes it hard to keep up, even though daily life depends on it.

Perhaps because of this situation, nursery school maps are being actively created. There are several forks of the nursery map project made by Code for Sapporo, and they seem to be actively maintained. Osaka City, for example, also seems to maintain its own map. Data creation events are apparently being held too, which sounds like fun. I'm jealous.

But, well, how should I put it. Even so, I really want the original data to be published as open data, rather than having everyone work hard at maintaining the data by hand on a regular basis. I think that is how it should be. Sustainability matters, and weeding out and looking after outdated information is costly and wasteful.

Just recently, at the Barcelona Workshop in Kobe City, I [maintained the data for Kobe City](https://github.com/hkwi/our-data/blob/master/shinseido.json) and [made a piece of work](http://hkwi.github.io/kobe-barcelona/) with it, so I widened the scope and tried collecting data for all of Hyogo Prefecture. Sustainability matters, so I'm doing my best to automate it.

First, I targeted the facility lists. I pulled the data and took the approach of "extract whatever looks like a table". The repository is U5. The data sources are listed in `u5/task28.py`, and what could be detected is in [all.ttl](https://hkwi.github.io/U5/all.ttl). RDF is good at this kind of data, where you don't know whether a value exists or how many there are. That is quite hard with `csv` and `json`.

So, how did it go? It's interesting, because all sorts of habits show up in the data.

- "・" is often used. However, it is a character that is not allowed in a Turtle prefixed name.
- Half-width numbers are often included in headings.
- Line breaks may act as data delimiters (within a cell).
- The appearance may be adjusted with full-width spaces.
- Address parts such as prefecture and city are easily omitted when clear from context.
- Area codes are easily omitted when clear from context.
- Comments are often embedded.
- The parentheses "(" and ")" are also used a lot.
- Notation varies loosely (starting with full-width versus half-width characters).
- BOOLEAN values are expressed in many different ways.
- Data that is in a table is still the better case.
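Several of the habits above (full-width characters and spaces, line breaks used as delimiters inside a cell, varied BOOLEAN spellings) can be smoothed out mechanically. The following is only a minimal sketch using the Python standard library, not the code in U5; the function names and the yes/no word lists are assumptions made up for illustration and would need to be adapted to the actual data.

```python
import unicodedata

# Hypothetical yes/no spellings; the real data may use others.
TRUE_WORDS = {"○", "〇", "有", "あり", "yes", "true"}
FALSE_WORDS = {"×", "無", "なし", "no", "false"}


def normalize_text(value):
    """Fold full-width/half-width variants and collapse layout whitespace."""
    # NFKC maps full-width digits, letters and parentheses, and the
    # ideographic space (U+3000), to their ASCII counterparts.
    folded = unicodedata.normalize("NFKC", value)
    return " ".join(folded.split())


def split_cell(value):
    """Treat line breaks inside one cell as delimiters between values."""
    parts = (normalize_text(part) for part in value.splitlines())
    return [part for part in parts if part]


def parse_bool(value):
    """Map the known yes/no spellings to True/False, anything else to None."""
    word = normalize_text(value).lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None  # unknown spelling: leave it for a human to check
```

Returning `None` for unknown spellings is a deliberate choice in this sketch: silently guessing would hide exactly the notation variations that need a look.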

Generally, when we say "I want data", we mean "structured data" or "semi-structured data". A table at least gives you one structure, so it is somewhat better. For anything that is not even in a table, you are lucky if you can infer more from the document structure and somehow turn it into structured data. PDF is very hard to use because it destroys even the document structure.

Also, even a table is troublesome if it is not structured data. "Kami Excel" (ネ申Excel, spreadsheets used as a print layout grid) is of course out of the question, but it is also bad to express data structure inside table cells with line breaks, parentheses, and other string conventions. "One cell = one datum" is what you want. Entering space characters to adjust the appearance should stop too. It is exactly the same story as not using `table` in HTML to adjust the layout. Clearing just these two points would make the data far more usable. Even in HTML it took a considerable amount of time for table layouts to be driven out, so this will probably take a while as well. In any case, appearance and content are currently distributed as one, and it is true that an environment that does not separate them is the easy one to maintain. Perhaps this is a topic for Data Academy.

The following were extracted as things that look like field names, mainly from files. There are more kinds than I expected.

- Admission timing for 0-year-old children
- 3-year-old children
- 4-year-old children
- Capacity for 4-year-old children
- 5-year-old children
- Capacity for 5-year-old children
- Fax number

It looks like a pain to consolidate these while keeping the notation variations in check. A more pressing worry is how to organize the information when the current list and the planned future list are published at the same time. To put the facilities on a map, I also have to geocode them. And I want something whose license has been cleared.

I hope the data starts circulating soon, and in good shape.

The "5-star open data" figure

Digression #1. When you get started with open data, you often see the 5-star open data figure. To be honest, I'm fed up with it. What is written in the text is sensible, and it is worth reading carefully. The picture, however, is propaganda that "LOD is the best". It is a statement from a vested position, so you shouldn't swallow it whole.

If you want to operate open data today while keeping an eye on the future, you should look at reality. Here are the characteristics of each format as I see them from using them.

LOD

- Reliability of the access method is essential
- Implementations are still developing. What does an efficient implementation look like?
- Inherits the properties of RDF

RDF

- Must be handled in the form of triples
- Null can be represented by the absence of a triple
- Data types can be expressed
- The tools are immature, and you can't edit tabular data directly
- Sometimes it handles data well that is hard to put into a table
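To make the triple model concrete, here is a small sketch that uses plain Python tuples rather than an RDF library. The facility identifiers, predicate names, and phone numbers are invented for illustration. It shows how "no value" and "several values" both fall out of the model without any special marker.

```python
# Each fact is one (subject, predicate, object) triple.
# All identifiers and values below are made up for illustration.
triples = {
    ("facility:1", "name", "Example Kindergarten"),
    ("facility:1", "phone", "078-000-0001"),
    ("facility:1", "phone", "078-000-0002"),  # two values: simply two triples
    ("facility:2", "name", "Example Nursery"),
    # facility:2 has no known phone number: no triple at all, no NULL marker
}


def values(subject, predicate):
    """Return every object recorded for (subject, predicate), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)
```

In a table, the same data would need either a second phone column, a delimiter convention inside one cell, or an empty cell whose meaning has to be guessed.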

CSV

- No comments
- colspan and rowspan cannot be used
- There is a tacit understanding that the first line is the header, and a multi-index cannot be used
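The first and last points can be seen with Python's standard `csv` module. This is a small sketch on made-up sample data: `csv.DictReader` follows the convention that the first line is the header, and a line that its author meant as a comment is read as an ordinary data row.

```python
import csv
import io

# Made-up sample: the "#" line is meant as a comment by whoever wrote it.
raw = "name,capacity\n# updated by hand\nExample Kindergarten,30\n"

# DictReader takes the first line as field names by convention.
rows = list(csv.DictReader(io.StringIO(raw)))

# The "comment" comes back as data: it fills the first column,
# and the missing second column is filled with None.
comment_row = rows[0]
data_row = rows[1]
```

Any comment convention therefore has to be agreed on and filtered outside the format itself.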

Excel

- The data region has to be guessed
- OpenDocument and its relative Office Open XML are open formats, which avoids lock-in
- One cell can be used as one unit of data
- Can hold multiple sheets
- The tools are readily available

PDF

- Document structure is lost
- Difficult to extract meaningful data

At this point, isn't Excel the reasonable choice for manual updates? You can read it with `pandas.read_excel`, for example. It would be cool to extend the Open Packaging Convention to bundle schemas. CSV is by no means always better than Excel.

Also, tabular data should be maintained as a table as long as the table represents it properly. **Maintenance cost** should be given weight. There are things that only become possible with triples (data distribution and repositories), but that conversion is, if anything, easy to automate.

HTML tables are also easy to work with.
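As a rough illustration of why HTML tables are approachable, here is a minimal sketch that pulls cell text out of `<table>` markup using only the standard library's `html.parser`. It is not the extraction code used in U5; the class and function names are hypothetical, and it deliberately ignores colspan, rowspan, and nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup


def extract_rows(html):
    """Return the table content as a list of rows, each a list of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The result is a plain list of rows, which can then go through whatever normalization the cell contents need.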

Social issue?

Digression #2. Looking at civic tech, I see phrases like "solving social issues" and "creating business". I consider what I do to be civic tech, but neither really applies to me. While working as an office worker, I just want to at least get my own affairs organized in a modern way.

In fact, the thing I currently find most "surprisingly useful" is turning the school lunch menu into a calendar.

Of course, solving social issues and creating businesses are important, but I hope that activities that simply make people's lives more comfortable will also be recognized as civic tech. People ask me about the kindergarten and nursery data maintenance too: "What will it solve?", "What will you do after that?" and "Is it profitable?", and I find those hard to answer.

Code for Kobe

Digression #3. Code for Kobe welcomes your participation. There are no eligibility requirements. If you have any questions, just ask! → Facebook page
