[Python] Creating a scraping tool Memo

Mainly memos for myself.

Goal

Create a scraping tool to get pachinko data

Design

Unit data acquisition [OK through testing]

This program gets the G count (number of games) and the number of big hits for each unit. It is almost complete: the test run acquired data for 5 units, so acquiring all units should work without problems. The acquisition flow is as follows.

  1. Create a URL list
  2. Access the URL list in order
  3. On each page, get the information you want and append it to a list → For example, to get the number of BBs and the number of Gs, keep one list for each.
  4. Convert the list to a data frame
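The four steps above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the URL pattern, the page layout (a table whose th cell holds a label such as BB or G and whose td cell holds the value), and all function and column names are hypothetical, not taken from the real site or the actual tool.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

import pandas as pd


class LabelValueParser(HTMLParser):
    """Pair each <th> label with the <td> value that follows it (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._tag = None
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "th":
            self._label = text
        elif self._tag == "td" and self._label is not None:
            self.pairs[self._label] = text
            self._label = None


def make_urls(base_url, unit_numbers):
    # Step 1: build the URL list (hypothetical pattern: one page per unit number).
    return [f"{base_url}/{n}" for n in unit_numbers]


def fetch(url):
    # Download one page and return its HTML as text.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_unit_data(urls, fetch_page=fetch):
    # Steps 2-3: visit each URL in order and keep one list per value.
    unit_urls, bb_counts, g_counts = [], [], []
    for url in urls:
        parser = LabelValueParser()
        parser.feed(fetch_page(url))
        unit_urls.append(url)
        bb_counts.append(parser.pairs.get("BB"))
        g_counts.append(parser.pairs.get("G"))
    # Step 4: convert the lists to a data frame (values are still strings here).
    return pd.DataFrame({"url": unit_urls, "bb": bb_counts, "g": g_counts})
```

Passing the page fetcher as an argument makes it possible to try the parsing on saved HTML before touching the site, for example collect_unit_data(urls, fetch_page=saved_pages.get) with a dict of saved pages.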

I also tried reading the tables with read_html, but the resulting data frames did not join well. So instead I collected only the information I wanted into lists, then converted those to data frames and joined them.

After that, adjust the data types of the acquired columns.
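As an illustration of the type adjustment, here is a small sketch. It assumes the scraped values arrive as strings, possibly with thousands separators or a placeholder such as "-" for missing data; the column names and the helper name are hypothetical.

```python
import pandas as pd


def to_int_columns(df, columns):
    """Return a copy of df with the given string columns converted to nullable integers."""
    out = df.copy()
    for col in columns:
        # Drop thousands separators such as "3,450" before parsing.
        cleaned = out[col].astype(str).str.replace(",", "", regex=False)
        # Values that cannot be parsed become missing instead of raising.
        out[col] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")
    return out
```

The nullable Int64 dtype keeps the counts as integers even when some cells are missing, which a plain int column cannot do.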

Acquisition of slump graph [OK through testing]

This program acquires the slump graph image of each unit.

The acquisition flow I have in mind is as follows.

  1. Create a URL list (the list is the same as the unit data)
  2. Create an image URL list
  3. Get the image URL and append it to the list → Note that there is no need to loop over the images with a for statement, because only one image is wanted from each page.
  4. Create a function to download the image
  5. Run the download → Changed so that errors are printed
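Steps 4 and 5 might look something like the sketch below. It uses only the standard library; the function names, the timeout, and the choice to name each file after the last part of the URL path are assumptions for illustration, not the actual tool.

```python
import os
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def download_image(url, dest_dir):
    """Save one image into dest_dir. Return the saved path, or None on failure."""
    # Assumption: the last part of the URL path is a usable file name.
    filename = os.path.basename(urlparse(url).path) or "image"
    dest_path = os.path.join(dest_dir, filename)
    try:
        with urlopen(url, timeout=10) as resp:
            data = resp.read()
        with open(dest_path, "wb") as f:
            f.write(data)
    except (URLError, OSError, ValueError) as exc:
        # Print the error and carry on, so one bad URL does not stop the run.
        print(f"download failed: {url}: {exc}")
        return None
    return dest_path


def download_all(image_urls, dest_dir):
    """Download every URL in order and return one result per URL."""
    os.makedirs(dest_dir, exist_ok=True)
    return [download_image(url, dest_dir) for url in image_urls]
```

Returning None for a failed download keeps the result list aligned with the URL list, so the failed units can be picked out afterwards. If two units could share a file name, the name would need to include the unit number.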

Two things need attention with the images: the SRC is a relative path in some cases, and the site may be taking some kind of measure for the current day's data. I can't find any regularity in which models use a relative path. Therefore, the data to acquire is basically that of the previous day. I still need to investigate at what time the site switches over.
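One way to cope with the mix of relative and absolute SRC values is to resolve every SRC against the URL of the page it came from. The standard library function urljoin leaves absolute URLs unchanged and completes relative ones. The wrapper name and the example.com URLs are placeholders.

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img> SRC against the page URL; absolute SRCs pass through as-is."""
    return urljoin(page_url, src)


page = "https://example.com/hall/unit/101"
print(absolute_image_url(page, "/img/graph/101.png"))
print(absolute_image_url(page, "https://cdn.example.com/graph/101.png"))
```

Because the same call handles both cases, there is no need to know in advance which models use a relative path.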

Data conversion of graph [OK through testing]

Next is a program that analyzes the slump graph image and converts it into data.

The flow I have in mind is as follows.

  1. Analysis based on the acquired image
  2. Add analysis information to the list
  3. Recalculate the analysis information in the list
  4. Convert to data frame
  5. Merge with the first data frame
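The memo does not say how the image is analyzed, so the following is only one possible sketch of the five steps. It assumes the graph line is drawn in a single known color, that the row of the zero line and the value represented by one pixel are known, and that both data frames share a unit column to merge on. All of these, and every name below, are hypothetical. The pixel array would normally come from an image library; here it is just a NumPy array of shape (height, width, 3).

```python
import numpy as np
import pandas as pd


def trace_line(pixels, line_rgb):
    """For each pixel column, return the first row matching line_rgb, or -1 if none."""
    mask = np.all(pixels == np.array(line_rgb), axis=-1)  # shape (height, width)
    rows = mask.argmax(axis=0)
    rows[~mask.any(axis=0)] = -1
    return rows


def rows_to_values(rows, zero_row, value_per_pixel):
    """Step 3: recalculate pixel rows into values; rows above zero_row are positive."""
    values = (zero_row - rows.astype(float)) * value_per_pixel
    values[rows < 0] = np.nan
    return values


def summarize_graph(unit, pixels, line_rgb, zero_row, value_per_pixel):
    """Steps 1-3 for one image: return the final and maximum value of the line."""
    values = rows_to_values(trace_line(pixels, line_rgb), zero_row, value_per_pixel)
    valid = values[~np.isnan(values)]
    return {
        "unit": unit,
        "final_diff": valid[-1] if valid.size else np.nan,
        "max_diff": valid.max() if valid.size else np.nan,
    }


def merge_graph_data(unit_df, summaries):
    """Steps 4-5: turn the summaries into a data frame and merge on the unit column."""
    graph_df = pd.DataFrame(summaries)
    return unit_df.merge(graph_df, on="unit", how="left")
```

A left merge keeps every unit from the first data frame even when its graph could not be downloaded or analyzed; those rows simply get missing values in the graph columns.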

Summary

Since this is close to a personal memo, I don't think it will be helpful to anyone else.
