[PYTHON] Convert the PDF list of products containing surfactants effective against the new coronavirus to CSV

This post converts the PDF list of products containing surfactants effective against the new coronavirus, published by the National Institute of Technology and Evaluation (NITE), to CSV.

apt install python3-tk ghostscript
pip install "camelot-py[cv]"

Scraping

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://www.nite.go.jp/information/osirasedetergentlist.html"

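# Fetch the listing page and stop early on an HTTP error.
r = requests.get(url)
r.raise_for_status()

soup = BeautifulSoup(r.content, "html.parser")

# Take the first link in the list inside the main content area.
tag = soup.select_one("div.main div.cf ul > li > a")

# Resolve the href against the page URL to get an absolute PDF link.
link = urljoin(url, tag.get("href"))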
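The `urljoin` call above is what turns whatever the `href` attribute contains into a full URL that can be downloaded. As a minimal sketch of how it resolves different kinds of `href` values, the snippet below uses made-up paths (`example.pdf`, `/data/example.pdf`, and the `example.com` URL are hypothetical and do not come from the actual page):

```python
from urllib.parse import urljoin

page_url = "https://www.nite.go.jp/information/osirasedetergentlist.html"

# Hypothetical href values, for illustration only.
# A bare file name is resolved against the directory of the page.
relative = urljoin(page_url, "example.pdf")

# A leading slash is resolved against the site root.
root_relative = urljoin(page_url, "/data/example.pdf")

# An absolute URL is returned unchanged.
absolute = urljoin(page_url, "https://example.com/example.pdf")

print(relative)
print(root_relative)
print(absolute)
```

Because all three forms end up as absolute URLs, the scraping code does not need to know which style the page happens to use.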

Data wrangling

import camelot
import pandas as pd

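# split_text: split text that spans several cells
# line_scale: raise from the default so shorter ruled lines are detected
# copy_text=["v"]: copy text of vertically spanning cells into each cell
tables = camelot.read_pdf(
    link, pages="all", split_text=True, line_scale=40, copy_text=["v"]
)

# Every table except the last one belongs to the first product list.
df_tmp = pd.concat([table.df for table in tables[:-1]])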

# Detergents for housing and furniture, etc.

# Use the first row as the column headers and number the rows from 1.
df1 = df_tmp.iloc[1:].set_axis(df_tmp.iloc[0].to_list(), axis=1).reset_index(drop=True)
df1.index += 1
df1.to_csv("housing.csv", encoding="utf_8_sig")

# Synthetic detergents for kitchen use, etc.

# The last table is the kitchen list; use its first row as the column headers.
df2 = tables[-1].df.iloc[1:].set_axis(tables[-1].df.iloc[0].to_list(), axis=1)
df2.to_csv("kitchen.csv", encoding="utf_8_sig")
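Both `to_csv` calls use `encoding="utf_8_sig"`, which writes a UTF-8 byte order mark (BOM) at the start of the file. The article does not state the reason, but a common motivation is that spreadsheet applications such as Excel use the BOM to recognize UTF-8, so Japanese text is less likely to appear garbled. The sketch below illustrates the effect with the standard library only; the sample rows and the `sample.csv` file name are hypothetical stand-ins for the converted tables:

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for a converted table.
rows = [["No", "Product"], ["1", "サンプル洗剤"]]

# Write the rows with the same encoding the article passes to to_csv.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="utf_8_sig", newline="") as f:
    csv.writer(f).writerows(rows)

# The file starts with the three-byte UTF-8 BOM (EF BB BF).
with open(path, "rb") as f:
    head = f.read(3)

# Reading the file back with utf_8_sig strips the BOM again.
with open(path, encoding="utf_8_sig", newline="") as f:
    restored = list(csv.reader(f))

print(head)
print(restored)
```

When reading such a file back with pandas or the `csv` module, passing the same `utf_8_sig` encoding keeps the BOM from ending up inside the first column name.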
