Get property information by scraping with Python

Introduction

Since I may move at some point, I wanted to find out what kinds of properties are available and whether there are any bargains. Checking by hand every time is tedious, so I decided to use the scraping I did last time to collect the property information automatically.

Ultimately I would like to plot the collected information on a map, but I will start by acquiring the property data itself.

What is scraping?

To put it simply, scraping means "using a program to collect information from the Internet." It is done in the following two steps.

① Get the HTML → ② Extract the necessary data

First, regarding ①: a web page is written in a language called HTML. In Google Chrome, if you click the menu icon in the upper right corner and choose "More tools" -> "Developer tools", the page's code is displayed on the right side of the screen. That is the code the browser uses to render the page, and scraping pulls this code down to your own computer. As for ②, HTML has a nested structure in which each element is distinguished by its tag and labeled with attributes such as classes and IDs. You can therefore extract just the data you need from the whole page by selecting the right tag or label.
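As a minimal illustration of these two steps (a sketch assuming the requests and beautifulsoup4 libraries are installed; https://example.com and the title tag just stand in for a real target page and element):

import requests
from bs4 import BeautifulSoup

# ① Get the HTML of the page
html = requests.get('https://example.com').content

# ② Parse it and extract the element we need by its tag
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('title').text)  # -> Example Domain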

Execution environment

The execution environment is as follows.
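The code below depends on the third-party libraries requests and beautifulsoup4; assuming pip is available, they can be installed with:

pip install requests beautifulsoup4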

Implementation

For the implementation, I referred to other articles. The code in those articles produces the scraping results as-is, but it takes a long time to run, so I rewrote it a little.

Whole code

The whole code is here.

output.py


from bs4 import BeautifulSoup
import requests
import csv
import time

#URL (please enter the URL here)
url = 'https://suumo.jp/jj/chintai/ichiran/FR301FC001/?ar=030&bs=040&ta=13&sc=13101&cb=0.0&ct=9999999&mb=0&mt=9999999&et=9999999&cn=9999999&shkr1=03&shkr2=03&shkr3=03&shkr4=03&sngz=&po1=09&pc=50'

result = requests.get(url)
c = result.content

soup = BeautifulSoup(c, 'html.parser')

# Read the total number of result pages out of the pagination block
body = soup.find("body")
pages = body.find_all("div", {'class': 'pagination pagination_set-nav'})
pages_text = str(pages)
pages_split = pages_text.split('</a></li>\n</ol>')
pages_split0 = pages_split[0]
pages_split1 = pages_split0[-3:]  # the last page number sits at the end of this block
pages_split2 = pages_split1.replace('>', '')
pages_split3 = int(pages_split2)

# Build the URL of every result page (page 2 onward takes a &page= parameter)
urls = [url]

print('get all url...')
for i in range(pages_split3 - 1):
    pg = str(i + 2)
    url_page = url + '&page=' + pg
    urls.append(url_page)
print('num all urls is {}'.format(len(urls)))

# Open in append mode; newline='' avoids blank rows from the csv module on Windows
with open('output.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for url in urls:
        print('get data of url({})'.format(url))
        result = requests.get(url)
        c = result.content
        soup = BeautifulSoup(c, "html.parser")
        # The property list lives in the div with id js-bukkenList
        summary = soup.find("div", {'id': 'js-bukkenList'})
        apartments = summary.find_all("div", {'class': 'cassetteitem'})
        for apart in apartments:
            # One tbody per room listed under the building
            room_number = len(apart.find_all('tbody'))
            name = apart.find("div", {'class': 'cassetteitem_content-title'}).text
            address = apart.find("li", {'class': 'cassetteitem_detail-col1'}).text
            age_and_height = apart.find('li', class_='cassetteitem_detail-col3')
            age = age_and_height('div')[0].text
            height = age_and_height('div')[1].text

            money = apart.find_all("span", {'class': 'cassetteitem_other-emphasis ui-text--bold'})
            madori = apart.find_all("span", {'class': 'cassetteitem_madori'})
            menseki = apart.find_all("span", {'class': 'cassetteitem_menseki'})
            floor = apart.find_all("td")
            for i in range(room_number):
                # Each room row has 9 td cells; the third (index 2) holds the floor
                write_list = [name, address, age, height,
                              money[i].text, madori[i].text, menseki[i].text,
                              floor[2 + i * 9].text.replace('\t', '').replace('\r', '').replace('\n', '')]
                writer.writerow(write_list)
        time.sleep(10)  # be polite: wait between requests
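Incidentally, the page-count extraction above leans on string slicing, so it breaks if SUUMO ever reformats its HTML. Below is a slightly more robust sketch, assuming the page numbers appear as the text of the <a> tags inside the same pagination div (an assumption on my part):

def last_page_number(soup):
    # Collect the numeric pagination links and take the largest one
    pagination = soup.find("div", {'class': 'pagination pagination_set-nav'})
    numbers = [int(a.text) for a in pagination.find_all('a') if a.text.strip().isdigit()]
    return max(numbers) if numbers else 1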

Execution method / execution result

In the following part of the above code:

#URL (please enter the URL here)
url = 'https://suumo.jp/jj/chintai/ichiran/FR301FC001/?ar=030&bs=040&ta=13&sc=13101&cb=0.0&ct=9999999&mb=0&mt=9999999&et=9999999&cn=9999999&shkr1=03&shkr2=03&shkr3=03&shkr4=03&sngz=&po1=09&pc=50'

enter the URL of the SUUMO property listing page you want to scrape as the url value. Then run the script; if output.csv is produced, the scrape succeeded.
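For example, assuming the script is saved as output.py, it can be run from a terminal:

python output.py

Note that output.csv is opened in append mode ('a'), so rerunning the script adds new rows after any existing ones; delete or rename the old file first if you want a clean result.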

The output should look like the following in output.csv.

output.csv(part)


Tokyo Metro Hanzomon Line Jimbocho Station 7 stories 16 years old,2 Kanda Jimbocho, Chiyoda-ku, Tokyo,16 years old,7 stories,69,000 yen,Studio,13.04m2,4th floor
Tokyo Metro Hanzomon Line Jimbocho Station 7 stories 16 years old,2 Kanda Jimbocho, Chiyoda-ku, Tokyo,16 years old,7 stories,77,000 yen,Studio,16.64m2,4th floor
Kudan Flower Home,4 Kudankita, Chiyoda-ku, Tokyo,42 years old,9 stories,75,000 yen,Studio,21.07m2,5th floor
Villa Royal Sanbancho,Sanbancho, Chiyoda-ku, Tokyo,44 years old,8 stories,85,000 yen,Studio,23.16m2,4th floor
Villa Royal Sanbancho,Sanbancho, Chiyoda-ku, Tokyo,44 years old,8 stories,85,000 yen,Studio,23.16m2,4th floor

The fields are comma-separated and correspond to the following items:

[Building name],[Address],[Building age],[Building height],[Rent],[Floor plan],[Floor area],[Room floor]

I have confirmed that information for Chiyoda Ward and Setagaya Ward can be retrieved from SUUMO this way.
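Since the goal is to analyze this data later (for example, the mapping mentioned in the summary), it is handy to read the CSV back into a table. A minimal sketch, assuming pandas is installed; the column names are labels of my own, since the script writes no header row:

import pandas as pd

# My own column labels, matching the field order above (the CSV has no header)
columns = ['name', 'address', 'age', 'height', 'rent', 'floor_plan', 'area', 'room_floor']
df = pd.read_csv('output.csv', names=columns)
print(df.head())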

Summary

I scraped SUUMO and obtained the property information. It was a lot of fun to work on something unrelated to what I usually do. Ultimately, I think it will be even more interesting to plot this property information on a map and run various analyses, so I will try that next.
