Study on Tokyo Rent Using Python (3-2)

Result excerpt

Simultaneous posting on the blog: https://leoluistudio.com/blog/24/python%e3%82%92%e4%bd%bf%e3%81%a3%e3%81%a6%e6%9d%b1%e4%ba%ac%e9%83%bd%e5%ae%b6%e8%b3%83%e3%81%ab%e3%81%a4%e3%81%84%e3%81%a6%e3%81%ae%e7%a0%94%e7%a9%b6-4%e3%81%ae2/

Statistical method 1

Data collection

--Data source: https://suumo.jp --Library used: requests, BeautifulSoup, re, time, sqlite3 --The code is omitted because the collection part is just a request and put it in the database.


Database

Database structure

info
id 1 (PRIMARY KEY)
municipal Chiyoda Ward
train 7 (Walk to the station)
type apartment
date 1983
structure Steel rebar
floor 8
carpark 0 (0 is none, 1 is yes)
price
id 1 (PRIMARY KEY)
pid 160000 (Circle)
1 (=info id) 7 (Walk to the station)
area 42.9 (Square meter)
date east

First 5th row for each table db.png


Programs and results

First install the necessary libraries

import sqlite3
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import plot

from sklearn.decomposition import PCA

Connect the database

#If the result is output only
conn = sqlite3.connect(‘info.db’)
c = conn.cursor()
cursor = c.execute(“SQL code”)
for row in cursor:
    print(result)
conn.close()

#For processing by saving the result
conn = sqlite3.connect(‘info.db’)
df = pd.read_sql_query(“SQL code”, conn)
conn.close()

First look at the distribution of the data

#number of info lines
SELECT COUNT(id) FROM info
#Result: 82,812

#Number of lines in price
SELECT COUNT(id) FROM price
#Result: 624,499

#Number of "apartments" and "apartments"
SELECT type,COUNT(type) FROM info GROUP BY type
#Result: Apartment 33,110 /Mansion 49,702

Distribution of completion dates

# SQL
SELECT date AS year,COUNT(date) AS count FROM info WHERE date > 0 GROUP BY date

#Graph
fig = px.bar(df, x=’year’, y=’count’, height=500, width=1000)

L1no4.png


Distribution of building structure

# SQL
SELECT structure,COUNT(structure) AS count FROM info WHERE structure != 0 GROUP BY structure

#Graph
fig = px.bar(df, x=structure, y=’count’, height=500, width=1000)

L1no5.png


Rank distribution

# SQL
SELECT floor,COUNT(floor) AS count FROM info WHERE floor != 0 GROUP BY floor

#Graph
fig = px.bar(df, x=floor, y=’count’, height=500, width=1000)

L1no6A.png

In the table above, the distribution on the 20th floor and above cannot be seen at all, so let's take a closer look here.

# SQL
SELECT floor,COUNT(floor) AS count FROM info WHERE floor > 20 GROUP BY floor

#Graph
fig = px.bar(df, x=floor, y=’count’, height=500, width=1000)

L1no6B.png


You must group the continuous variables (e.g. price) before you analyze them.

def pricegroup(df):
    if df[‘price’] < 30000:
      return ‘<30,000’
    elif df[‘price’] < 60000:
      return ‘30,000-60,000’
    ……
    else:
      return ‘>270,000’

pricegroup_list = [‘<30,000’,
    ‘30,000-60,000’,
    ‘60,000-90,000’,
    ……
    ‘240,000-270,000’,
    ‘>270,000’]

Price distribution

# SQL
SELECT price FROM price

#Dataframe processing df[‘pricegroup’] = df.apply(pricegroup, axis=1)
dfcount = df.groupby([‘pricegroup’]).count()

#Graph
fig = px.bar(dfcount, x=dfcount.index, y=’price’, height=500, width=1000)
fig.update_layout(xaxis={‘categoryorder’:’array’, ‘categoryarray’:pricegroup_list}, yaxis_title=’count’)

L1no7V2.png


The area distribution is the same

def pricegroup(df):
    if df[‘area’] < 5:
      return ‘<5’
    elif df[‘area’] < 10:
      return ‘5-10’
    ……
    else:
      return ‘>45’

pricegroup_list = [‘<5′, ’5-10′, ’10-15′, ’15-20′,
    ’20-25′, ’25-30′, ’30-35′, ’35-40′,
    ’40-45′,’>45′]

# SQL
SELECT area FROM price

#Dataframe processing
df[‘areagroup’] = df.apply(areagroup, axis=1)
dfcount = df.groupby([‘areagroup’]).count()

#Graph
fig = px.bar(dfcount, x=dfcount.index, y=’area’, height=500, width=1000)
fig.update_layout(xaxis={‘categoryorder’:’array’, ‘categoryarray’:areagroup_list}, yaxis_title=’count’)

L1no8V2.png


Directional distribution

# SQL
SELECT direction,COUNT(direction) AS count FROM price WHERE direction != ‘-‘ GROUP BY direction

#Graph
fig = px.bar(df, x=direction, y=’count’, height=500, width=1000)

L1no9.png


Before analyzing the wards, cities, towns and villages, divide them into 23 wards and cities.

m23_list = [‘Chiyoda Ward’,'Chuo-ku','Minato-ku','Shinjuku ward',‘Bunkyo Ward’,'Taito',‘Sumida Ward’,
‘Koto Ward’,‘Shinagawa Ward’,'Meguro','Ota-ku','Setagaya','Shibuya Ward','Nakano',
'Suginami','Toshima ward',‘Kita Ward’,'Arakawa',‘Itabashi Ward’,‘Nerima Ward’,'Adachi Ward',
'Katsushika',‘Edogawa Ward’]

municipal_dict = {}
conn = sqlite3.connect(‘info.db’)
c = conn.cursor()
cursor = c.execute(“SELECT id,municipal FROM info”)
for row in cursor:
    municipal_dict.update({row[0]:row[1]})
conn.close()

def municipal(df):
    return municipal_dict[df[‘pid’]]

def municipal23(df):
    if df[‘municipal’] in m23_list:
      return ‘Special Wards’
    else:
      return ‘Non Special Wards’

Relationship between price and area (divided into 23 wards and city)

# SQL
SELECT pid,price,area FROM price

#Dataframe processing
df[‘municipal’] = df.apply(municipal, axis=1)
df[‘municipal23’] = df.apply(municipal23, axis=1)
dfmedian = df.groupby([‘pid’, ‘municipal23’])[‘price’, ‘area’].median()
dfmedian_reset = dfmedian.reset_index(level=’municipal23′)

#Graph
fig = px.scatter(dfmedian_reset, x=’area’, y=’price’, color=’municipal23′, labels={‘municipal23’: ‘Special Wards’}, height=500, width=1000)

L2no1.png

Recommended Posts

Study on Tokyo Rent Using Python (3-2)
Study on Tokyo Rent Using Python (3-3)
Study on Tokyo Rent Using Python (3-1 of 3)
Broadcast on LINE using python
Introducing Python using pyenv on Ubuntu 20.04
Preparing python using vscode on ubuntu
Notes on installing Python using PyEnv
Notes on using rstrip with python.
Install Python on CentOS using Pyenv
Install Python on CentOS using pyenv
Notes for using OpenCV on Windows10 Python 3.8.3.
Execute Python code on C ++ (using Boost.Python)
Install python library on Lambda using [/ tmp]
Run servomotor on Raspberry Pi 3 using python
Detect temperature using python on Raspberry Pi 3!
Video processing using Python + OpenCV on Mac
Notes on using code formatter in Python
Python study note_002
Python study notes _000
Python on Windows
twitter on python3
Python study notes_006
Start using Python
python on mac
Python study note_003
Python study notes _005
Python on Windbg
Scraping using Python
Python study notes_001
Python study day 1
Build Python3.5 + matplotlib environment on Ubuntu 12 using Anaconda
Detect slide switches using python on Raspberry Pi 3!
Notes on installing Python3 and using pip on Windows7
Install Python 3.8.6 on macOS Big Sur using pyenv
Detect magnet switches using python on Raspberry Pi 3!
Notes on using dict in python [Competition Pro]
Sound the buzzer using python on Raspberry Pi 3!
Generate points only in Tokyo using python + shapely
[Python] Notes on accelerating genetic algorithms using multiprocessing
Initial settings for using Python3.8 and pip on CentOS8
Python conda on cygwin
Install python on WSL
PyOpenGL setup on Python 3
Install Python on Pidora.
Install Scrapy on python3
Fibonacci sequence using Python
Data analysis using Python 0
[Heroku] Memo for deploying Python apps using Heroku on Windows [Python]
Dry-run sql query using psycopg2 on Redshift in Python
Install Python on Mac
Install Python 3 on Mac
Install Python3.4 on CentOS 6.6
Data cleaning using Python
Check types_map when using mimetypes on AWS Lambda (Python)
Installing pandas on python2.6
Using Python #external packages
python basic on windows ②
WiringPi-SPI communication using Python
Build a Python environment on your Mac using pyenv
Install python on windows
Age calculation using python