[Completed version] Try to find out the number of residents in the town from the address list with Python

Introduction

Qiita is amazing, isn't it?

Previous article (https://qiita.com/kkhouse/items/d8ff2c67cef686a508f8) So, if I was an amateur but published the code, Mr. konandoiruasa (https://qiita.com/konandoiruasa) I received the correction code from.

I am grateful to the amateurs who have just started to do this.

That's why I tried to upload the modified one!

Completion code

`main.py`


import csv
import openpyxl
import sys
import os
import openpyxl as px


aggregate_list =["Toda-gun","Shiogama City","Shiogama City","Oshika-gun","Kami-gun","Katta-gun","Iwanuma City","Shichigahama Town","Rifu Town","Kurihara City","Kurokawa-gun","Shibata-gun","Ishinomaki City","Miyagino Ward","Wakabayashi Ward","Aoba Ward","Izumi Ward","Taichi Ward","Tagajo","Osaki City","Tome City","Higashi Matsushima City","Tomiya City","Motoyoshi-gun","Natori City","Watari-gun","Other"]

# ~~csv → xlsx conversion start~~~
wb = openpyxl.Workbook()
ws = wb.active
query_word = sys.argv[1] 

f = open("./data_csv/{}.csv".format(query_word))
reader = csv.reader(f, delimiter=":")
for row in reader:
  ws.append(row)
f.close()

if not os.path.exists("./data_xlsx"):
  os.makedirs("./data_xlsx")
wb.save("./data_xlsx/{0}{1}.xlsx".format(query_word,"_Aggregate"))
# ~~csv→xlsx convert end~~~

# ~~xlsx Extraction start~~
folder = "data_xlsx"

filePath = []
for root,dirs,files in os.walk(folder):
  for fname in files:
    fnamePath = os.path.join(root,fname)
    if fnamePath.find(".xlsx")!= -1:
      filePath.append(fnamePath)
    if filePath ==[]:
      print("there is no data!")
xlsx_files_list = [s for s in filePath if query_word in s]



#Get the created xlsx
wb = px.load_workbook(xlsx_files_list[0])
ws = wb[wb.sheetnames[0]]

#List all addresses in xlsx
adress_list = []

table = [0] * len(aggregate_list)

for i in range(ws.max_row):
  temp = ws.cell(row=i+1, column=1).value #Because it comes out many times
  adress_list.append(temp)

  for j, name in enumerate(aggregate_list):
    if name in temp:
      table[j] += 1 #Count to the corresponding place
      break
  else:
    table[26] += 1 #Other

#Put together Shiogama and Shiogama
table[1] = table[1]+table[2]
table.pop(2)


for i,j in zip(list(range(1,3)),aggregate_list):
  ws.cell(row=i, column=7).value = j
for i,j in zip(list(range(3,27)),aggregate_list[3:]):
  ws.cell(row=i, column=7).value = j
for i,g in zip(list(range(1,28)),table):
  ws.cell(row=i, column=8).value = g

wb.save("./data_xlsx/{0}{1}.xlsx".format(query_word,"_Aggregate"))

I think the code is more applicable than last time.

By the way

In this example, the code below is needed to count Shiogama and Shiogama in the same column,

`main.py`


#Put together Shiogama and Shiogama
table[1] = table[1]+table[2]
table.pop(2)
・
・
for i,j in zip(list(range(1,3)),aggregate_list):
  ws.cell(row=i, column=7).value = j
for i,j in zip(list(range(3,27)),aggregate_list[3:]):
  ws.cell(row=i, column=7).value = j

If there are no duplicate city / ward names in aggregate_list

`main.py`


for i,j in zip(list(range(1,len(aggregate_list)),aggregate_list):
  ws.cell(row=i, column=7).value = j

You can also add up by fixing it to. (Maybe if you do your best, you can write the code so that the difference in kanji of the city and ward can be counted by default, but there are other things I want to make so far)

What I learned personally

I will omit how to use OpenPyXL because it will come out if you check it. I personally learned the code below.

`main.py`


folder = "data_xlsx"

filePath = []
for root,dirs,files in os.walk(folder):
  for fname in files:
    fnamePath = os.path.join(root,fname)
    if fnamePath.find(".xlsx")!= -1:
      filePath.append(fnamePath)
    if filePath ==[]:
      print("there is no data!")
xlsx_files_list = [s for s in filePath if query_word in s]

os.walk （）

How to use https://www.sejuku.net/blog/63816 This site is easy to understand.

Get the xlsx file by batch fetching the folder structure from the directory. When the program is executed, if the xlsx file is included in the arguments entered in the terminal, the following code will be executed. os.walk is convenient.

Finally

I thought it was written, but it may be better to use input and enter the file name so that the aggregation is completed. I will do it soon. .. Thanks to this, I have learned a lot about how to use Openpyxl. I'm still an amateur, but let's do our best.

References: ・ Https://www.sejuku.net/blog/63816 ・ From comments by konandoiruasa (https://qiita.com/konandoiruasa)

Recommended Posts

[Completed version] Try to find out the number of residents in the town from the address list with Python

Python --Find out number of groups in the regex expression

How to identify the element with the smallest number of characters in a Python list?

How to count the number of occurrences of each element in the list in Python with weight

[Cloudian # 5] Try to list the objects stored in the bucket with Python (boto3)

How to get a list of files in the same directory with python

I tried to predict the number of people infected with coronavirus in consideration of the effect of refraining from going out

Get the value of a specific key in a list from the dictionary type in the list with Python

Try scraping the data of COVID-19 in Tokyo with Python

How to find the optimal number of clusters in k-means