It is a challenge record of Language processing 100 knock 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 : : Anaconda 4.1.1 (64-bit). Click here for a list of past knocks (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
artist.json.gz is a file in the open music database MusicBrainz that is converted to JSON format and compressed in gzip format. In this file, information about one artist is stored in JSON format on one line. The outline of JSON format is as follows.
field Mold Contents Example id Unique identifier integer 20660 gid Global identifier String "ecf9f3a3-35e9-4c58-acaa-e707fba45060" name artist name String "Oasis" sort_name Artist name (for dictionary order) String "Oasis" area Place of activity String "United Kingdom" aliases alias List of dictionary objects aliases[].name alias String "oasis" aliases[].sort_name Alias (for alignment) String "oasis" begin Activity start date dictionary begin.year Activity start year integer 1991 begin.month Activity start month integer begin.date Activity start date integer end Activity end date dictionary end.year End of activity year integer 2009 end.month Activity end month integer 8 end.date Activity end date integer 28 tags tag List of dictionary objects tags[].count Number of times tagged integer 1 tags[].value Tag content String "rock" rating Rating Dictionary object rating.count Rating votes integer 13 rating.value Rating value (average value) integer 86 Consider storing and retrieving artist.json.gz data in a key-value-store (KVS) and document-oriented database. Use LevelDB, Redis, Kyoto Cabinet, etc. as KVS. MongoDB was adopted as the document-oriented database, but CouchDB, RethinkDB, etc. may also be used.
Find the number of artists whose activity location is "Japan" using the database constructed in> 60.
main.py
# coding: utf-8
import leveldb
fname_db = 'test_db'
#LevelDB open
db = leveldb.LevelDB(fname_db)
#value is'Japan'Enumerate
clue = 'Japan'.encode()
result = [value[0].decode() for value in db.RangeIter() if value[1] == clue]
#Number display
print('{}Case'.format(len(result)))
Execution result
22821
The registration contents were enumerated by acquiring the iterator with LevelDB.RangeIter ()
.
That's all for the 63rd knock. If you have any mistakes, I would appreciate it if you could point them out.
Recommended Posts