Do you sell things you no longer need on Mercari? I sell books I no longer need there, but since most of them are old reference and study books, they are hard to sell...
Mercari suggests a "sellable price" when you list an item. However, if you set the price too high the item will not sell, and if it is too cheap you feel like you are losing out.
Before setting a price, I search for the item once to find out what the actual market price is. (I'm probably not the only one...)
This work was quite tedious, though, and I wondered whether it could be automated somehow. So I used Python to scrape Mercari and find out how much an item would sell for!
The code is on GitHub (linked at the end of this article), so feel free to play with it! Below is a graph produced by actually scraping Mercari. From a result like this, you can tell that when selling this particular item, a price of around 600 yen would be a good choice. The rest of this article explains how to scrape Mercari.
I didn't want to pollute my local environment, so I created a venv virtual environment. Python 3.8.2 is installed locally and is on my PATH.

python -m venv venv

The command above creates a venv directory inside the current directory. Next, from the directory that contains venv, enter:

venv\Scripts\activate

You are now inside the virtual environment if the terminal prompt starts with (venv). To leave the virtual environment later, run:

deactivate
The following modules are required to run this program, so install them in advance with pip (inside the virtual environment):

pip install pandas matplotlib
pip install selenium
Next, prepare the ChromeDriver that Selenium will use: ChromeDriver - WebDriver for Chrome
Select the driver that matches your version of Chrome. My Chrome version was 80.0.3987.132, so I chose the Windows build of the closest release, ChromeDriver 80.0.3987.106. (I would have preferred a 64-bit version, but only a 32-bit build was available, so I used that.)
By the way, you can check your Chrome version under Google Chrome Settings -> About Chrome (the bottom item after clicking the three-dot menu).
After downloading and unzipping it, place chromedriver.exe in the same directory as your Python file.
The URL for searching products on Mercari looks like the following.
Example 1: searching for "computer":
https://www.mercari.com/jp/search/?keyword=computer
Example 2: searching for "computer" and "second hand":
https://www.mercari.com/jp/search/?keyword=computer+second+hand
When searching with multiple words, a + is inserted between the search words.
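The URL construction above can be sketched as a small helper (the function name `build_search_url` is mine, not from the article; `quote_plus` also takes care of percent-encoding keywords that are not URL-safe):

```python
from urllib.parse import quote_plus

def build_search_url(*keywords):
    # Join multiple keywords with "+"; quote_plus percent-encodes
    # characters (e.g. Japanese text) that are not URL-safe
    query = "+".join(quote_plus(word) for word in keywords)
    return "https://www.mercari.com/jp/search/?keyword=" + query

print(build_search_url("computer", "second hand"))
# https://www.mercari.com/jp/search/?keyword=computer+second+hand
```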
Go to a Mercari page and inspect the HTML source with the developer tools (press F12 on the web page to open them). Below is the HTML displayed for each product in Mercari search results; this is the information we will extract by scraping.
<section class="items-box">
  <a href="https://item.mercari.com/jp/~~~~~~~~~~~~~~~~~~~~~~~~~~">
    <figure class="items-box-photo">
      <img
        class="lazyloaded"
        data-src="https://static.mercdn.net/c!/w=240/thumb/photos/~~~~~~~~~~~~"
        alt="computer"
        src="https://static.mercdn.net/c!/w=240/thumb/photos/~~~~~~~~~~~~~~~~"
      />
      <figcaption>
        <div class="item-sold-out-badge">
          <div>SOLD</div>
        </div>
      </figcaption>
    </figure>
    <div class="items-box-body">
      <h3 class="items-box-name font-2">
        computer
      </h3>
      <div class="items-box-num">
        <div class="items-box-price font-5">¥19,800</div>
      </div>
    </div>
  </a>
</section>
The product name and price live in these tags:

<h3 class="items-box-name font-2">
  computer
</h3>
<div class="items-box-price font-5">¥19,800</div>
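The price in the items-box-price element is a display string like ¥19,800, so it has to be cleaned up before it can be used as a number. A minimal sketch (the helper name `parse_price` is mine):

```python
def parse_price(price_text):
    # Strip the yen sign and the thousands separator, then convert to int
    return int(price_text.replace("¥", "").replace(",", ""))

print(parse_price("¥19,800"))  # 19800
```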
The following tags have been added for items that have already been sold.
<figcaption>
  <div class="item-sold-out-badge">
    <div>SOLD</div>
  </div>
</figcaption>
While scraping I also needed the information for the next-page button, so here is its markup:

<ul class="pager">
  <li class="pager-num">{page numbers 1, 2, 3, 4, 5, ...}</li>
  <li class="pager-next visible-pc">
    <ul>
      <li class="pager-cell">
        <a href="/jp/search/?page=~~~~~~~~~~~~~~~~~~~~~">
          <i class="icon-arrow-right"></i>
        </a>
      </li>
      <li class="pager-cell">{button that jumps to the last page}</li>
    </ul>
  </li>
</ul>

Of this, the next-page button itself is the following part:

<li class="pager-next visible-pc">
  <ul>
    <li class="pager-cell">
      <a href="/jp/search/?page=~~~~~~~~~~~~~~~~~~~~~">
        <i class="icon-arrow-right"></i>
      </a>
    </li>
  </ul>
</li>
This time I implemented three things:

1. Scraping
2. Graphing
3. Batch processing

The following shows how I implemented them. The source code is brute-forced just to make it work.
.
├── chromedriver.exe
├── mercari_search.csv
├── scraping_batch.py
└── venv
This source is roughly divided into three parts.

search_mercari(search_words)
The function that does the scraping. Its argument is the search word.

make_graph(search_words, except_words, max_price, bins)
The function that draws a graph from the scraped data. Its arguments are the search word, the words to exclude, the maximum price to plot, and the histogram bin width.

read_csv()
Loads the CSV file containing the prepared search list.
scraping_batch.py
import csv
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


def search_mercari(search_words):
    # Keep the original search words, since they are also used as the directory name
    org_search_words = search_words

    # If there are multiple search words, join them with "+"
    words = search_words.split("_")
    search_words = words[0]
    for i in range(1, len(words)):
        search_words = search_words + "+" + words[i]

    # Mercari search URL
    url = "https://www.mercari.com/jp/search/?keyword=" + search_words

    # Open the browser
    # If chromedriver.exe is in the same directory as this Python file,
    # the argument can be left empty
    browser = webdriver.Chrome()

    # Sleep for 5 seconds because startup takes a while
    time.sleep(5)

    # Current page number
    page = 1

    # Column names for the collected data
    columns = ["Name", "Price", "Sold", "Url"]
    df = pd.DataFrame(columns=columns)

    # Run
    try:
        while True:
            # Open the search results in the browser
            browser.get(url)

            # Get the HTML element of each product
            posts = browser.find_elements_by_css_selector(".items-box")

            # Show which page is being fetched
            print("Getting page " + str(page))

            # For each product, get the name, the price, whether it has
            # been purchased, and the URL
            for post in posts:
                # Product name
                title = post.find_element_by_css_selector(
                    "h3.items-box-name").text

                # Price (strip the yen sign and the thousands separator)
                price = post.find_element_by_css_selector(
                    ".items-box-price").text
                price = price.replace("¥", "")
                price = price.replace(",", "")

                # 1 if the item has been purchased, 0 if not
                sold = 0
                if len(post.find_elements_by_css_selector(".item-sold-out-badge")) > 0:
                    sold = 1

                # Product URL
                item_url = post.find_element_by_css_selector(
                    "a").get_attribute("href")

                # Append the scraped information to the DataFrame
                se = pd.Series([title, price, sold, item_url], columns)
                df = df.append(se, ignore_index=True)

            # Increment the page number
            page += 1

            # Get the URL of the next page
            url = browser.find_element_by_css_selector(
                "li.pager-next .pager-cell a").get_attribute("href")
            print("Moving to next page ...")
    except NoSuchElementException:
        # Raised when there is no next-page button
        print("Next page is nothing.")

    # Save the collected data as CSV
    filename = "mercari_scraping_" + org_search_words + ".csv"
    df.to_csv(org_search_words + "/" + filename, encoding="utf-8-sig")
    print("Finish!")


def make_graph(search_words, except_words, max_price, bins):
    # Open the CSV file
    df = pd.read_csv(search_words + "/" +
                     "mercari_scraping_" + search_words + ".csv")

    # Drop rows whose "Name" contains any of the excluded words
    if len(except_words) != 0:
        exc_words = except_words.split("_")
        for i in range(len(exc_words)):
            df = df[df["Name"].str.contains(exc_words[i]) == False]

    # Keep only purchased products (Sold == 1)
    dfSold = df[df["Sold"] == 1]

    # Keep only products priced below max_price
    dfSold = dfSold[dfSold["Price"] < max_price]

    # Column names: price, cumulative count at that price, cumulative percentage
    columns = ["Price", "Num", "Percent"]
    all_num = len(dfSold)
    num = 0
    dfPercent = pd.DataFrame(columns=columns)
    for i in range(int(max_price / bins)):
        MIN = i * bins - 1
        MAX = (i + 1) * bins

        # Count the items priced between MIN and MAX
        df0 = dfSold[dfSold["Price"] > MIN]
        df0 = df0[df0["Price"] < MAX]
        sold = len(df0)

        # Accumulate the count so that the percentage is cumulative
        num += sold
        percent = num / all_num * 100

        # Use the midpoint of MIN and MAX as the price of the bin
        price = (MIN + MAX + 1) / 2
        se = pd.Series([price, num, percent], columns)
        dfPercent = dfPercent.append(se, ignore_index=True)

    # Save to CSV
    filename = "mercari_histgram_" + search_words + ".csv"
    dfPercent.to_csv(search_words + "/" + filename, encoding="utf-8-sig")

    # Draw the graph
    """
    :param kind: graph type
    :param y: y-axis values
    :param bins: histogram bin width
    :param alpha: graph transparency (0: transparent - 1: opaque)
    :param figsize: figure size
    :param color: graph color
    :param secondary_y: use a secondary y-axis (if True)
    """
    ax1 = dfSold.plot(kind="hist", y="Price", bins=25,
                      secondary_y=True, alpha=0.9)
    dfPercent.plot(kind="area", x="Price", y=[
                   "Percent"], alpha=0.5, ax=ax1, figsize=(20, 10), color="k")
    plt.savefig(search_words + "/" + "mercari_histgram_" +
                search_words + ".jpg")


def read_csv():
    # Read the CSV file containing the Mercari search list
    with open("mercari_search.csv", encoding="utf-8") as f:
        # Empty list to store the search settings
        csv_lists = []

        # Counter to report which line of the CSV file has a problem
        counter = 0

        # Read the CSV file line by line
        reader = csv.reader(f)
        for row in reader:
            counter += 1
            csv_lists.append(row)
            try:
                # Check the search word;
                # if it is empty, print an error message and stop
                if len(row[0]) == 0:
                    print("File Error: no search word -> " +
                          "mercari_search.csv line " + str(counter))
                    break
            except IndexError:
                # The line itself is empty
                print("File Error: there is a problem with the CSV file. "
                      "Please remove empty lines.")
                break
            try:
                if len(row[2]) == 0:
                    # Check the maximum amount used when drawing the graph;
                    # if it is empty, print an error message and stop
                    print("File Error: no maximum amount set -> " +
                          "mercari_search.csv line " + str(counter))
                    break
                else:
                    try:
                        int(row[2])
                    except ValueError:
                        # The amount is not a number
                        print("File Error: please enter a number for the amount -> " +
                              "mercari_search.csv line " + str(counter))
                        break
            except IndexError:
                # The amount column is missing entirely
                print("File Error: no maximum amount set -> " +
                      "mercari_search.csv line " + str(counter))
                break
            try:
                if len(row[3]) == 0:
                    # Check the graph (bin) width;
                    # if it is empty, print an error message and stop
                    print("File Error: graph width is not set -> " +
                          "mercari_search.csv line " + str(counter))
                    break
                else:
                    try:
                        int(row[3])
                    except ValueError:
                        # The graph width is not a number
                        print("File Error: please enter a number for the graph width -> " +
                              "mercari_search.csv line " + str(counter))
                        break
            except IndexError:
                # The graph width column is missing entirely
                print("File Error: graph width is not set -> " +
                      "mercari_search.csv line " + str(counter))
                break
    return csv_lists


# ------------------------------------------------------ #
# 0. Read the search list from the Mercari search CSV file
"""
:param csv_lists[i][0]: search word
:param csv_lists[i][1]: words to exclude from the search results
:param csv_lists[i][2]: maximum amount shown on the graph
:param csv_lists[i][3]: graph (bin) width
"""
csv_lists = read_csv()

# Batch processing
for i in range(len(csv_lists)):
    # 1. Create the directory
    os.mkdir(csv_lists[i][0])
    # 2. Scrape
    search_mercari(csv_lists[i][0])
    # 3. Draw the graph
    make_graph(csv_lists[i][0], csv_lists[i][1],
               int(csv_lists[i][2]), int(csv_lists[i][3]))
Enter the search words, the words to exclude, the maximum amount, and the graph width in mercari_search.csv.
--Search word (required): the word to search for on Mercari
--If there are multiple search words, connect them with half-width underscores (_)
--Example: Pokemon_game
--Be careful not to put spaces around the search words (operation is not guaranteed).
--Exclusion words (optional): enter these when you want to exclude products containing those words from the graph.
--If there are no exclusion words, leave the field empty.
--Maximum amount (required): the maximum amount on the horizontal axis of the graph.
--Enter a half-width integer.
--Graph width (required): the bin width used when drawing the graph.
--Enter a half-width integer.
Since this is a CSV file, separate the fields with commas (,).
Example:
clock,Digital,10000,100
wallet,Cow,3000,100
Pokémon_game,card_CD,3000,100
computer,,15000,500
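The per-row checks described above can be condensed into a small sketch (the helper name `validate_row` is mine; the real script prints a "File Error:" message and stops instead of returning a string):

```python
def validate_row(row):
    # Return an error message for a bad row, or None if the row is usable.
    # Expected fields: search word, exclude words, maximum amount, graph width
    if len(row) < 4:
        return "row must have 4 comma-separated fields"
    if len(row[0]) == 0:
        return "no search word"
    for value, name in ((row[2], "maximum amount"), (row[3], "graph width")):
        if len(value) == 0:
            return name + " is not set"
        try:
            int(value)
        except ValueError:
            return name + " must be an integer"
    return None

print(validate_row(["computer", "", "15000", "500"]))   # None -> row is valid
print(validate_row(["clock", "Digital", "abc", "100"]))  # maximum amount must be an integer
```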
Make sure that chromedriver.exe and mercari_search.csv are in the same directory as this source file (scraping_batch.py), then run the following command:

python scraping_batch.py
During execution, the scraping progress is displayed as shown below.

Getting page 1
Moving to next page ...
Getting page 2
Moving to next page ...
Getting page 3
・ ・ ・
Getting page 22
Moving to next page ...
Getting page 23
Next page is nothing.
Finish!
Running the Python script creates a directory for each search word, and the resulting graph is saved in that directory, so check the results there. If you are not satisfied with the result, adjust the CSV file and scrape again!
Each search-word directory is created next to the Python file (this prevents a large number of files from piling up in the directory containing the Python file).
If a directory with the same name as a search word already exists, or the same search word appears more than once in the search CSV (mercari_search.csv), the directory creation (os.mkdir()) fails and scraping stops partway through.
Therefore, before starting a scrape, make sure no directory with the same name as a search word exists and that the CSV file contains no duplicate search words.
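One way to avoid that crash would be to check for the directory first and skip the search word if it already exists; this is a sketch, not the article's actual code (the name `make_result_dir` is made up):

```python
import os

def make_result_dir(search_words):
    # Skip the search word if its directory already exists, instead of
    # letting os.mkdir() raise FileExistsError and abort the whole batch
    if os.path.exists(search_words):
        print("Skip: directory already exists -> " + search_words)
        return False
    os.mkdir(search_words)
    return True
```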
As a real test, I scraped for "Linga Metallica" (an English vocabulary book known to those in the know), which I am currently selling. The words and parameters used for the search were:

Linga Metallica,CD,1500,50

(Since I want to sell the vocabulary book itself, I exclude listings that contain the word "CD".)
Running the script on this produces the following graph. Looking at it:

--It sells well at around 600 yen
--About 80% of the sold items are 800 yen or less

From this graph, you can see that if you want to sell "Linga Metallica", about 600 yen is a reasonable price.
When writing this article, I actually listed a copy of "Linga Metallica" in "no noticeable scratches or stains" condition. The market price Mercari suggested at that time was 640 yen (with an easy-to-sell range of 460 to 790 yen).
Perhaps Mercari was suggesting a reasonable price all along, without my scraping and checking for myself...
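A reading like "80% of sold items are 800 yen or less" is just a cumulative share over the sold prices; a minimal sketch with made-up numbers (the prices below are illustrative, not the scraped data):

```python
import pandas as pd

# Hypothetical sold prices in yen, standing in for the scraped CSV
prices = pd.Series([500, 550, 600, 600, 650, 700, 750, 800, 900, 1200])

# Share of sold items at or below a given price
share_leq_800 = (prices <= 800).mean() * 100
print(share_leq_800)  # 80.0
```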
There are five things I am thinking about now, and I would like to work on them when I have time:
--Since these products are also listed on other flea market sites, I would like to scrape them in the same way to obtain their sales information.
--The value of a product probably changes with its release date and the season, so I would like to break prices down by time series or by season.
--The price also depends on the condition of the product, so I would like to scrape the "condition" as an element as well.
--Some parts of the source are redundant, so I would like to refactor it (if you are familiar with Python, I would appreciate a review!).
--I would like to set prices based on the scraped information and check whether sales actually improve.
That's all for this time. Thank you for reading to the end.
kewpie134134/fleamarket_app_scraping
--Get the lowest price from Mercari by scraping with Python
--[Python] Mercari scraping