([Addition] As noted in the comments, with gc373's help the problem data has since been corrected to proper JSON.) Data is important. Good data provides real value to society. Good data is not only about containing valuable information but also about being easy to handle, which means you can easily access and read it. It's a nuisance when data comes in a format that needs special software, or with extra notes that people have to read before they can read the data itself. SHIFT_JIS is also a problem.
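On the SHIFT_JIS point: here is a minimal sketch of the extra step a consumer ends up doing. It decodes the bytes with Python's `cp932` codec (Microsoft's Shift_JIS variant, which Japanese files often actually use; `shift_jis` is the strict alternative) and re-saves the text as UTF-8. The function and file names are hypothetical, for illustration only.

```python
from pathlib import Path


def shift_jis_to_utf8(src: str, dst: str, codec: str = "cp932") -> str:
    """Re-encode a Shift_JIS text file as UTF-8 and return the decoded text.

    cp932 is Microsoft's Shift_JIS superset; pass codec="shift_jis" for strict JIS.
    """
    text = Path(src).read_bytes().decode(codec)
    Path(dst).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Hypothetical file names: create a small Shift_JIS file, then convert it.
    Path("sample_sjis.csv").write_bytes("名前,値\n東京,1\n".encode("cp932"))
    print(shift_jis_to_utf8("sample_sjis.csv", "sample_utf8.csv"))
```

If the data were published as UTF-8 in the first place, none of this would be needed.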
Still, nothing can happen without the data in the first place, and we should appreciate that the government has started publishing open data. What's lacking is the know-how about what kind of data should be released.
By the way, there is a site called http://www.data.go.jp/ where a vast amount of data seems to be available. Surely some of it is data that could improve our country, buried in there somewhere. With that hope, I poked around for a bit. For starters, I found the metadata list for July 2016 and wondered what it contained, so I downloaded the JSON version.
import json
# Read the whole file, then parse it as JSON.
with open("hoge.json") as f:
    text = f.read()
data = json.loads(text)
That should load it. Or so I thought. It doesn't work. With no other choice, I looked inside the file, and what did I find?
[{u'license_title': None, u'maintainer': None,・ ・ ・
JSON is a data format based on JavaScript syntax. But there is no None in JavaScript (JSON uses null), and there is no syntax like u'hoge' either (JSON strings must be in double quotes). In other words, this isn't JSON.
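To make the difference concrete, here is a small sketch that feeds the start of the file (shortened and closed off by hand, so the sample string is an assumption) to `json.loads`, and then shows what the same record looks like when written out by `json.dumps`.

```python
import json

# Start of the downloaded file, truncated and closed off for illustration.
sample = "[{u'license_title': None, u'maintainer': None}]"

try:
    json.loads(sample)
except json.JSONDecodeError as err:
    # u'' prefixes, single quotes and None are Python syntax, not JSON.
    print("not JSON:", err)

# The same record written as real JSON: double quotes, null instead of None.
record = [{"license_title": None, "maintainer": None}]
print(json.dumps(record))  # [{"license_title": null, "maintainer": null}]
```

The `u'...'` prefix in particular is how Python 2 prints unicode strings, which is a strong hint about how the file was produced.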
None, on the other hand, is Python. I suspected that someone had written out the Python data as-is instead of converting it to JSON. So, ignoring the danger, I tried this:
data = eval(text)
It loaded without any errors. Of course, this method is very dangerous: eval executes whatever code is in the file, so if the file contained anything malicious we would be completely defenseless. Just because it's government data doesn't make it safe, but I decided to blame the government if anything went wrong. Good kids, don't try this at home. In fact, CPU usage hit 100% and all 16 GB of memory was used up.
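A safer option than `eval` (not what I used above, just a suggestion) is the standard library's `ast.literal_eval`, which only accepts Python literals and refuses to run function calls. A minimal sketch on a shortened, made-up sample of the file's format:

```python
import ast
import json

# Same kind of Python-literal text as the downloaded file (made-up short sample).
text = "[{u'license_title': None, u'maintainer': None, u'num_tags': 3}]"

# ast.literal_eval only accepts literals (strings, numbers, tuples, lists,
# dicts, sets, booleans, None), so an expression like a function call
# raises ValueError instead of being executed.
data = ast.literal_eval(text)
print(json.dumps(data))

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```

This removes the code-execution risk, but it will not necessarily fix the memory usage: it still parses the whole file into a syntax tree before building the data.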
Now, let's save the loaded data as actual JSON this time.
# Write the Python object back out as real JSON.
with open("out.json", "w") as f:
    json.dump(data, f)
It saved successfully as JSON. I'm happy.
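Putting the conversion together, here is a sketch of a helper (the name `repr_file_to_json` and the file names are hypothetical) that reads a file of Python-literal data with `ast.literal_eval` instead of `eval`, writes JSON, and then re-reads the output with the strict `json` parser to confirm it really is valid JSON. `ensure_ascii=False` keeps Japanese text human-readable in the output. The equality check assumes the data contains only lists, dicts with string keys, strings, numbers and None; tuples or non-string keys would not round-trip unchanged.

```python
import ast
import json
from pathlib import Path


def repr_file_to_json(src: str, dst: str) -> int:
    """Convert a file of Python-literal data to real JSON and verify it.

    Returns the number of top-level items written.
    """
    data = ast.literal_eval(Path(src).read_text(encoding="utf-8"))
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    # Re-read with a strict JSON parser to confirm the output is valid JSON
    # and round-trips to the same Python object.
    with open(dst, encoding="utf-8") as f:
        reloaded = json.load(f)
    if reloaded != data:
        raise ValueError("round trip through JSON changed the data")
    return len(data)


if __name__ == "__main__":
    # Hypothetical file names standing in for hoge.json / out.json.
    Path("metadata_repr.txt").write_text("[{u'title': u'データ', u'tags': None}]", encoding="utf-8")
    print(repr_file_to_json("metadata_repr.txt", "metadata.json"))  # 1
```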
Afterwards, I meant to report the broken data through the feedback form, but when I followed the feedback link, the browser warned "This connection is not secure," so I couldn't submit a correction request. That's why I've written it up here instead.
Isn't there any good data somewhere ...