I decided to keep a record of what I learn about Python, so I started posting on Qiita. I work at a non-IT company, so Python is really just a hobby... or rather, something I'm studying out of interest.
My current work involves food hygiene, so I wondered whether I could do something with Python in that area, and decided to try analyzing food recall data.
As a first step, I scraped food recall information and put it into a pandas DataFrame. The data comes from a site called Recall Plus.
food_recall_info.py
from bs4 import BeautifulSoup
import requests
import re
import time
import pandas as pd

# Response keywords to look for in each row. They must match the label text
# exactly as it appears on the page. Longer alternatives come first because
# re tries alternatives left to right, so a bare 'Recovery' listed first
# would shadow 'Recovery&Refund'.
keyword = re.compile(
    r'Recovery&Refund/Exchange|Recovery&Refund|Recovery&Exchange|Recovery'
    r'|Refund/Exchange|Refund|Inspection&Exchange|Exchange|Notice'
    r'|Violation of the prize labeling law|apology|Send')

def recalls(url):
    """Scrape one listing page and return four parallel lists."""
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    # Each recall entry is a <tr> whose class is one of these three values
    recall_soup = soup.find_all("tr", {"class": ["return", "info", "apology"]})
    company_list = []
    recall_list = []
    action_list = []
    recall_date = []
    for row in recall_soup:
        # Company name
        company_list.append(row.find("a", href=re.compile("/company/*")).get_text())
        # Recall details
        recall_list.append(row.find("a", {"style": "float:left"}).get_text())
        # Response taken (None if no keyword is found in the row)
        match = keyword.search(str(row))
        action_list.append(match.group() if match else None)
        # Date: replace the "\n " prefix with "20" to get a four-digit year (e.g. 2020/03/17)
        recall_date.append(row.find("td", {"class": "day"}).get_text().replace("\n ", "20"))
    return company_list, recall_list, action_list, recall_date

company_lists = []
recall_lists = []
action_lists = []
recall_dates = []
# Pages 1 to 19 of the category listing
for i in range(1, 20):
    resl = recalls("https://www.recall-plus.jp/category/1?page={}".format(i))
    company_lists.extend(resl[0])
    recall_lists.extend(resl[1])
    action_lists.extend(resl[2])
    recall_dates.extend(resl[3])
    time.sleep(1)  # pause between requests to go easy on the server

recall_df = pd.DataFrame({'Company name': company_lists, 'Recall details': recall_lists, 'Correspondence': action_lists, 'Accrual date': recall_dates})
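A note on the keyword pattern used in recalls(): Python's re module tries the alternatives in a pattern from left to right and stops at the first one that matches, so the order of the keywords matters whenever one keyword is the beginning of another. Here is a minimal sketch with a made-up label (the variable names are just for illustration):

```python
import re

# Shorter keyword first: 'Recovery' wins even when the text says more.
short_first = re.compile(r'Recovery|Recovery&Refund')
# Longer keyword first: the full label is captured.
long_first = re.compile(r'Recovery&Refund|Recovery')

label = 'Recovery&Refund'  # made-up sample label

short_match = short_first.search(label).group()
long_match = long_first.search(label).group()

print(short_match)  # Recovery
print(long_match)   # Recovery&Refund
```

So if the combined labels should be kept distinct, the longer keywords need to come before the shorter ones.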
Execution result
recall_df.head()
Company name Recall details Correspondence date
0 Kobe Bussan Business Supermarket Imo Stick Some products are mixed with resin pieces Recovery 2020/03/17
1 Marubun Marubun Domestic soybeans used Yose tofu Expiration date mislabeled Collection 2020/03/17
2 AEON Hitachiomiya store...Allergen for pork loin seasoned tonteki(milk)Missing display Apology 2020/03/18
3 Tsuruya Karuizawa store Delicious white fish Fry Allergen Milk ingredient display missing Recovery 2020/03/16
4 Hatanaka Koiya Hatanaka Koiya Koiya Sweet boiled allergen "wheat" missing display Recovery 2020/03/13
It looks like the data was stored in a pandas DataFrame without problems.
At first I thought about writing each listing into the DataFrame as I scraped it, but I gave up because I didn't know how. Instead, I first built a list for each column and then loaded those lists into pandas.
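For reference, one common way to get that row-by-row feel is to collect one dict per listing and build the DataFrame once at the end. This is only a sketch with made-up rows, not what the script above does; the names rows and rows_df are hypothetical:

```python
import pandas as pd

rows = []
# Made-up listings standing in for the scraped rows
for company, action in [('Kobe Bussan', 'Recovery'), ('AEON', 'Apology')]:
    # One dict per listing; the keys become the column names
    rows.append({'Company name': company, 'Correspondence': action})

# Build the DataFrame once, after the loop
rows_df = pd.DataFrame(rows)
print(rows_df.shape)  # (2, 2)
```

Building the DataFrame once at the end also avoids growing it inside the loop, which copies the data on every iteration.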
I'm a beginner, so at first I fell back on the one method I knew and added each page's values with append(). However, that added each page's result as a list inside the list, and I couldn't load it into pandas properly.
After a lot of searching, I found that extend() adds only the values inside the list.
I learned something new.
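The difference is easy to see with a small example. The list below is a made-up stand-in for the company names returned for one page:

```python
page_result = ['Kobe Bussan', 'Marubun']  # stand-in for one page of company names

appended = []
appended.append(page_result)   # adds the list itself as a single element
print(appended)                # [['Kobe Bussan', 'Marubun']]

extended = []
extended.extend(page_result)   # adds each value in the list
print(extended)                # ['Kobe Bussan', 'Marubun']
```

With append(), 19 pages would give a list of 19 lists; with extend(), they give one flat list of values, which is what a DataFrame column needs.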
Now that the data is ready, I'd like to analyze it. With this data, I think I can look at (1) the share of each type of response (recovery, refund, and so on) when a recall occurs, and (2) whether there are times of the year when recalls are more likely to occur.
I'll try various things by trial and error.
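As a starting point, here is a rough sketch of how those two questions might be approached in pandas. It uses a few made-up rows with the same 'Correspondence' and 'Accrual date' column names as recall_df, and the names sample_df, action_share and recalls_per_month are hypothetical:

```python
import pandas as pd

# Made-up rows with the same column names as recall_df
sample_df = pd.DataFrame({
    'Correspondence': ['Recovery', 'Recovery', 'Apology', 'Refund'],
    'Accrual date': ['2020/03/17', '2020/03/16', '2020/02/18', '2020/02/13'],
})

# (1) Share of each response type (fractions that sum to 1)
action_share = sample_df['Correspondence'].value_counts(normalize=True)

# (2) Number of recalls per calendar month
dates = pd.to_datetime(sample_df['Accrual date'], format='%Y/%m/%d')
recalls_per_month = dates.dt.month.value_counts().sort_index()

print(action_share)
print(recalls_per_month)
```

On the real data, the same two lines could be run on recall_df directly, assuming every date string follows the YYYY/MM/DD form shown in the execution result.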