I want to sell on Mercari, so I scraped it with Python

Motivation

Do you sell things you no longer need on Mercari? I sell books I no longer need there, but since they are all old reference books and study guides, they are hard to sell...

Mercari suggests a "likely selling price" when you list an item. However, if you set the price too high the item won't sell, and if you set it too low you feel like you're losing out.

Before setting a price, I always do a search first to find out the actual going rate. (Maybe I'm not the only one who does this...)

However, this was quite tedious, and I wondered whether it could be automated somehow. So I used Python to scrape Mercari and find out how much things actually sell for!

Image of the result

The following graph was actually produced by scraping Mercari. It suggests that, for this particular item, a listing price of around 600 yen would be about right. (Image: mercari_histgram_リンガメタリカ-cd.jpg) The rest of this article describes how to do this kind of scraping on Mercari.

Environment

Environment preparation

Python virtual environment preparation

I didn't want to clutter my local environment, so I created a venv virtual environment. Python 3.8.2 is installed locally and is on my PATH.

python -m venv venv

Running the above command creates a venv directory in the directory where it was executed.

How to activate the virtual environment

Run the following command from the directory that contains the venv directory.

venv\Scripts\activate

If the terminal prompt now starts with (venv), you are inside the virtual environment.

To leave the virtual environment, run:

deactivate

Python module preparation

The following modules are required to run this program, so install them in advance.

pip install pandas matplotlib

Selenium environment preparation

I installed it with pip inside the Python virtual environment.

pip install selenium
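Note that the code in this article uses the Selenium 3 style API (find_element_by_css_selector and friends), which was removed in Selenium 4. If you are reproducing this setup today, one option (a suggestion on my part, not something the original environment required) is to pin an older release:

pip install "selenium<4"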

Chrome Driver environment preparation

Download the ChromeDriver that Selenium will use from the following page: ChromeDriver - WebDriver for Chrome

Select the driver that matches your version of Chrome. My Chrome version was 80.0.3987.132, so I chose the closest one, ChromeDriver 80.0.3987.106, for Windows. (I would have preferred the 64-bit version, but only a 32-bit build was available, so I used that.)

By the way, you can check your Chrome version from Google Chrome Settings -> About Chrome (the bottom item after clicking the three-line menu on the left).

After downloading and unzipping it, place chromedriver.exe in the same directory as your Python file.
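If you would rather keep chromedriver.exe somewhere other than the script's directory, you can also point Selenium at it explicitly. A minimal sketch, assuming the Selenium 3 API used in the rest of this article (the path below is just an example; adjust it to wherever you unzipped the driver):

from selenium import webdriver

# Explicit path to the driver instead of relying on the current directory
browser = webdriver.Chrome(executable_path=r"C:\tools\chromedriver.exe")
browser.get("https://www.mercari.com/jp/")
browser.quit()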

Preparing for scraping

Analyzing the Mercari search URL

The URL for searching for products on Mercari looks like this. Example 1: searching for "PC".

https://www.mercari.com/jp/search/?keyword=computer

Example 2: searching for "PC" and "used".

https://www.mercari.com/jp/search/?keyword=computer+second hand

When searching with multiple words, a + is inserted between the search terms.
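As a quick sanity check, the same search URL can be built programmatically. A minimal sketch (not part of the original script) using urllib.parse, which handles both the "+" joining and the URL encoding of non-ASCII keywords:

from urllib.parse import quote_plus

def build_search_url(*keywords):
    # quote_plus turns spaces into "+" and escapes non-ASCII characters
    query = quote_plus(" ".join(keywords))
    return "https://www.mercari.com/jp/search/?keyword=" + query

print(build_search_url("computer"))                  # one keyword
print(build_search_url("computer", "second hand"))   # multiple keywords joined with "+"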

HTML parsing

Go to the Mercari page and check the HTML source with the developer tools. You can open the developer tools on a web page with the F12 key.

Product information

Below is the HTML for each product shown in Mercari search results. This is the markup we will scrape the desired information from.

<section class="items-box">
  <a href="https://item.mercari.com/jp/~~~~~~~~~~~~~~~~~~~~~~~~~~">
    <figure class="items-box-photo">
      <img
        class="lazyloaded"
        data-src="https://static.mercdn.net/c!/w=240/thumb/photos/~~~~~~~~~~~~"
        alt="computer"
        src="https://static.mercdn.cet/c!/w=240/thumb/photos/~~~~~~~~~~~~~~~~"
      />
      <figcaption>
        <div class="item-sold-out-badge">
          <div>SOLD</div>
        </div>
      </figcaption>
    </figure>
    <div class="itmes-box-body">
      <h3 class="items-box-name font-2">
computer
      </h3>
      <div class="items-box-num">
        <div class="items-box-price font-5">¥19,800</div>
      </div>
    </div>
  </a>
</section>

Product name

<h3 class="items-box-name font-2">
computer
</h3>

Price

<div class="items-box-price font-5">¥19,800</div>

Sold

Items that have already been sold have the following additional tags.

<figcaption>
  <div class="item-sold-out-badge">
    <div>SOLD</div>
  </div>
</figcaption>
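Putting the three selectors above together, the fields of one product card can be pulled out roughly as follows. This is a minimal sketch using the same Selenium 3 API as the full script later in this article, where post is one ".items-box" element returned by find_elements_by_css_selector:

# posts = browser.find_elements_by_css_selector(".items-box")
for post in posts:
    # Product name from <h3 class="items-box-name">
    name = post.find_element_by_css_selector("h3.items-box-name").text

    # Price from <div class="items-box-price">, e.g. "¥19,800" -> 19800
    price_text = post.find_element_by_css_selector(".items-box-price").text
    price = int(price_text.replace("¥", "").replace(",", ""))

    # The sold-out badge exists only on items that have already been sold
    sold = len(post.find_elements_by_css_selector(".item-sold-out-badge")) > 0

    print(name, price, sold)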

Next page transition button

While scraping, I also needed information about the next-page button, so I describe it here.

<ul class="pager">
  <li class="pager-num">{Page number 1,2,3,4,5 etc.}</li>
  <li class="pager-next visible-pc">
    <ul>
      <li class="pager-cell">
        <a href="/jp/search/?page=~~~~~~~~~~~~~~~~~~~~~">
          <i class="icon-arrow-right"></i>
        </a>
      </li>
      <li class="pager-cell">{Button to move to the last page}</li>
    </ul>
  </li>
</ul>

Next page button

<li class="pager-next visible-pc">
  <ul>
    <li class="pager-cell">
      <a href="/jp/search/?page=~~~~~~~~~~~~~~~~~~~~~">
        <i class="icon-arrow-right"></i>
      </a>
    </li>
  </ul>
</li>
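In other words, the URL of the next page can be read from the anchor inside li.pager-next, and paging ends when that element no longer exists. A minimal sketch of the page transition (the full script below achieves the same thing by catching the exception with a bare except):

from selenium.common.exceptions import NoSuchElementException

try:
    # href of the right-arrow link inside the pager
    next_url = browser.find_element_by_css_selector(
        "li.pager-next .pager-cell a").get_attribute("href")
    browser.get(next_url)
except NoSuchElementException:
    # No next-page button, so this was the last page of results
    print("No more pages.")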

Before implementation

What I wanted to do

  1. Get the prices of sold items for a given search word on Mercari (scraping)
  2. Make the scraped data visually easy to understand (graphing)
  3. Repeat steps 1 and 2 for a large number of search words (batch processing)

Design overview

Here is roughly what I did to implement the above. (The source code is simply brute-forced to make it work.)

  1. List the products to search for on Mercari in a csv file
  2. Read the csv file from step 1 with Python
  3. Scrape Mercari using each search word that was read
  4. If the search results span multiple pages, scrape all of the pages
  5. When scraping is complete, write the results to another csv file and save it, then create a graph based on that csv file
  6. Save the csv file used for graphing and the graph (.jpg) file
  7. Repeat the above until every entry in the product-list csv has been scraped

Implementation

Directory structure

.
├── chromedriver.exe
├── mercari_search.csv
├── scraping_batch.py
└── venv

Overview of the code

This source is roughly divided into three parts.

search_mercari(search_words)

The function that does the scraping. Its argument is the search word.

make_graph(search_words, except_words, max_price, bins)

A function that draws a graph from the scraped data. Its arguments are the search word, the words to exclude, the maximum price to plot, and the bin width of the graph.

read_csv()

Loads the csv file containing the search list prepared in advance.

Implementation

scraping_batch.py


import pandas as pd
from selenium import webdriver
import matplotlib.pyplot as plt
import time
import csv
import os


def search_mercari(search_words):

    # Keep the original search words, since they are used as the output directory name
    org_search_words = search_words

    # If there are multiple search words, join them with "+"
    words = search_words.split("_")
    search_words = words[0]
    for i in range(1, len(words)):
        search_words = search_words + "+" + words[i]

    # URL for searching Mercari
    url = "https://www.mercari.com/jp/search/?keyword=" + search_words

    # Open the browser
    # If chromedriver.exe is in the same directory as this Python file,
    # the argument can be left empty
    browser = webdriver.Chrome()

    # Sleep for 5 seconds because the browser takes a while to start
    time.sleep(5)

    # Page counter (used for progress display)
    page = 1
    # Column names for the results
    columns = ["Name", "Price", "Sold", "Url"]
    # Create an empty DataFrame with those columns
    df = pd.DataFrame(columns=columns)

    # Run
    try:
        while(True):
            # Open the search page in the browser
            browser.get(url)
            # Get the HTML element of every product on the page
            posts = browser.find_elements_by_css_selector(".items-box")
            # Show which page is currently being fetched
            print("Getting page " + str(page))

            # For each product, get its name, price, whether it has sold, and its URL
            for post in posts:
                # Product name
                title = post.find_element_by_css_selector(
                    "h3.items-box-name").text

                # Price
                price = post.find_element_by_css_selector(
                    ".items-box-price").text
                # Strip the currency symbol and thousands separators
                price = price.replace("¥", "")
                price = price.replace(",", "")

                # 1 if the item has been sold, 0 otherwise
                sold = 0
                if (len(post.find_elements_by_css_selector(".item-sold-out-badge")) > 0):
                    sold = 1

                # Product URL
                Url = post.find_element_by_css_selector(
                    "a").get_attribute("href")

                # Append the scraped information to the DataFrame
                se = pd.Series([title, price, sold, Url], columns)
                df = df.append(se, ignore_index=True)

            # Increment the page counter
            page += 1
            # Get the URL of the next page
            url = browser.find_element_by_css_selector(
                "li.pager-next .pager-cell a").get_attribute("href")
            print("Moving to next page ...")
    except:
        # Raised when there is no next-page link, i.e. the last page has been reached
        print("No more pages.")

    # Save the collected data as CSV
    filename = "mercari_scraping_" + org_search_words + ".csv"
    df.to_csv(org_search_words + "/" + filename, encoding="utf-8-sig")
    print("Finish!")


def make_graph(search_words, except_words, max_price, bins):
    # Open the CSV file
    df = pd.read_csv(search_words + "/" +
                     "mercari_scraping_" + search_words + ".csv")

    # Drop rows whose "Name" contains any of the except_words
    if(len(except_words) != 0):
        exc_words = except_words.split("_")
        for i in range(len(exc_words)):
            df = df[df["Name"].str.contains(exc_words[i]) == False]
    else:
        pass

    # Keep only items that have been sold (Sold == 1)
    dfSold = df[df["Sold"] == 1]

    # Keep only items priced below max_price
    dfSold = dfSold[dfSold["Price"] < max_price]

    # Column names: price, cumulative number sold up to that price, and cumulative percentage
    columns = ["Price",  "Num", "Percent"]

    # Prepare the DataFrame for the cumulative distribution
    all_num = len(dfSold)
    num = 0
    dfPercent = pd.DataFrame(columns=columns)

    for i in range(int(max_price/bins)):

        MIN = i * bins - 1
        MAX = (i + 1) * bins

        # Keep only items whose price lies between MIN and MAX, and count them with len()
        df0 = dfSold[dfSold["Price"] > MIN]
        df0 = df0[df0["Price"] < MAX]
        sold = len(df0)

        # Add the count to num to make it cumulative
        num += sold

        # Cumulative percentage of sold items
        percent = num / all_num * 100

        # Use the midpoint of MIN and MAX as the representative price of the bin
        price = (MIN + MAX + 1) / 2
        se = pd.Series([price, num, percent], columns)
        dfPercent = dfPercent.append(se, ignore_index=True)

    # Save to CSV
    filename = "mercari_histgram_" + search_words + ".csv"
    dfPercent.to_csv(search_words + "/" + filename, encoding="utf-8-sig")

    # Draw the graph
    """
    :param kind: type of plot
    :param y: values for the y-axis
    :param bins: number of histogram bins
    :param alpha: transparency of the plot (0: transparent - 1: opaque)
    :param figsize: size of the figure
    :param color: plot color
    :param secondary_y: use a secondary y-axis (if True)
    """
    ax1 = dfSold.plot(kind="hist", y="Price", bins=25,
                      secondary_y=True, alpha=0.9)
    dfPercent.plot(kind="area", x="Price", y=[
        "Percent"], alpha=0.5, ax=ax1, figsize=(20, 10), color="k")
    plt.savefig(search_words + "/" + "mercari_histgram_" +
                search_words + ".jpg")


def read_csv():
    # Read the csv file containing the Mercari search list
    with open("mercari_search.csv", encoding="utf-8") as f:

        # Empty list for storing the search rows
        csv_lists = []
        # Counter that tracks which line of the csv file is being read
        counter = 0

        # Read the csv file line by line
        reader = csv.reader(f)
        for row in reader:
            counter += 1
            csv_lists.append(row)
            try:
                # Check the search word
                # If it is empty, print an error message and stop
                if(len(row[0]) == 0):
                    print("File Error: no search word -> " +
                          "mercari_search.csv line " + str(counter))
                    break
            except IndexError:
                # If the line itself is empty, print an error message and stop
                print("File Error: there is a problem with the CSV file. Please remove any blank lines.")
                break
            try:
                if(len(row[2]) == 0):
                    # Check the maximum price used when drawing the graph
                    # If it is empty, print an error message and stop
                    print("File Error: no maximum amount set -> " +
                          "mercari_search.csv line " + str(counter))
                    break
                else:
                    try:
                        int(row[2])
                    except ValueError:
                        # If it is not a number, print an error message and stop
                        print("File Error: please enter a number for the amount -> " +
                              "mercari_search.csv line " + str(counter))
                        break
            except IndexError:
                # If the amount column is missing entirely, print an error message and stop
                print("File Error: no maximum amount set -> " +
                      "mercari_search.csv line " + str(counter))
                break
            try:
                if(len(row[3]) == 0):
                    # Check the bin width used when drawing the graph
                    # If it is empty, print an error message and stop
                    print("File Error: graph width is not set -> " +
                          "mercari_search.csv line " + str(counter))
                    break
                else:
                    try:
                        int(row[3])
                    except ValueError:
                        # If it is not a number, print an error message and stop
                        print("File Error: please enter a number for the graph width -> " +
                              "mercari_search.csv line " + str(counter))
                        break
            except IndexError:
                # If the graph-width column is missing entirely, print an error message and stop
                print("File Error: graph width is not set -> " +
                      "mercari_search.csv line " + str(counter))
                break
        return csv_lists

# ------------------------------------------------------ #


# 0. Read the search list from the Mercari search CSV file
"""
Structure of the list read from the search CSV file
:param csv_lists[i][0]: search word
:param csv_lists[i][1]: words to exclude from the search results
:param csv_lists[i][2]: maximum amount shown on the graph
:param csv_lists[i][3]: graph bin width
"""
csv_lists = read_csv()

#Batch processing
for i in range(len(csv_lists)):
    # 1.Directory creation
    os.mkdir(csv_lists[i][0])
    # 2.Scraping process
    search_mercari(csv_lists[i][0])
    # 3.Graph drawing
    make_graph(csv_lists[i][0], csv_lists[i][1],
               int(csv_lists[i][2]), int(csv_lists[i][3]))

How to use

1. Preparation of search word list

Enter the words you want to search, the words you want to exclude, the maximum amount, and the graph width in mercari_search.csv.

- Search word (required): the word you want to search for on Mercari
  - If there are multiple search words, join them with a half-width underscore (_). Example: Pokemon_game
  - Be careful not to put spaces around the search words (behavior is not guaranteed).
- Exclusion word (optional): a word used to exclude matching products when drawing the graph.
  - If there is no exclusion word, leave the field empty.
- Maximum amount (required): the maximum value of the horizontal axis when drawing the graph.
  - Enter a half-width integer.
- Graph width (required): the bin width used when drawing the graph.
  - Enter a half-width integer.

Since this is a csv file, separate the fields with commas (,).

Example:

clock,Digital,10000,100
wallet,Cow,3000,100
Pokémon_game,card_CD,3000,100
computer,,15000,500

2. Perform scraping

Make sure that chromedriver.exe and mercari_search.csv are in the same directory as this source file (scraping_batch.py), then run the following command.

python scraping_batch.py

During execution, the page currently being scraped is displayed as shown below.

Getting page 1
Moving to next page ...
Getting page 2
Moving to next page ...
Getting page 3
...
Getting page 22
Moving to next page ...
Getting page 23
No more pages.
Finish!

3. Check the result

Running the script in step 2 creates a directory named after each search term. The graph produced by scraping Mercari is saved in that directory, so check the results there.

If you are not satisfied with the result, review the structure of the csv file and try scraping again!

Caution

A directory named after each search word is created in the same directory as the Python file (to avoid a large number of files piling up in the directory that contains the Python file). If a directory with the same name as a search word already exists, or if the same search word appears more than once in the search csv (mercari_search.csv), the directory-creation step (`os.mkdir()`) will fail and scraping will stop partway through. Therefore, before starting a scraping run, make sure that no directory with the same name as a search word already exists and that no search word is entered twice in the csv file.
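If you want to guard against this failure mode, one option (not part of the original script) is to check for duplicates up front and create directories in a way that tolerates existing ones. A minimal sketch, assuming the csv_lists structure returned by read_csv():

import os

# Detect duplicate search words before starting a long scraping run
words = [row[0] for row in csv_lists]
duplicates = {w for w in words if words.count(w) > 1}
if duplicates:
    print("Duplicate search words in mercari_search.csv:", duplicates)

# os.makedirs with exist_ok=True does not raise if the directory already exists
for word in words:
    os.makedirs(word, exist_ok=True)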

Trying it out

I actually tried scraping for "Linga Metallica" (an English vocabulary book known to those in the know), a copy of which I currently have up for sale. The search words and parameters were as follows (since I want to sell the vocabulary book itself, I exclude listings whose names contain "CD"):

Linga Metallica,CD,1500,50

Running the script with these settings produces the following graph. (Image: mercari_histgram_リンガメタリカ-cd.jpg)

Looking at the graph:

- It sells well at around 600 yen
- About 80% of the items sold went for 800 yen or less

From this graph, you can see that if you want to sell "Linga Metallica", around 600 yen is a reasonable asking price.

By the way

Around the time I wrote this article, I actually listed a copy of "Linga Metallica" in "no noticeable scratches or stains" condition. The market price Mercari suggested at that time was 640 yen (with an "easy to sell" range of 460 to 790 yen).

Perhaps Mercari was already suggesting a reasonable price without my having to scrape and check for myself...

Future work

There are five things I'm thinking about right now. I'd like to work on them when I find some free time.

- Products are also listed on other flea-market sites, so I would like to scrape them in the same way to collect sales information.
- The value of a product probably changes with its release date and the season, so I would like to break the prices down by time series or by season.
- Prices also depend on the condition of the product, so I would like to scrape the "item condition" field as well.
- Some parts of the source have become redundant, so I would like to refactor it (if you are familiar with Python, I would appreciate a review!).
- I would like to set prices based on the scraped information and check whether sales actually improve.

That's all for this time. Thank you for reading to the end.

Link to GitHub

kewpie134134/fleamarket_app_scraping

References

- Get the lowest price from Mercari by scraping with Python
- [Python] Mercari scraping
