[PYTHON] Preparing to try "Data Science 100 Knock (Structured Data Processing)"

Summary

The Data Scientist Association https://www.datascientist.or.jp/ has released "Data Science 100 Knock (Structured Data Processing)". This post records how I tried the Python version in Jupyter Notebook on a Windows 10 Home PC with Anaconda (2020.02) installed, without installing Docker.

What is "Data Science 100 Knock (Structured Data Processing)"?

It is a collection of exercises with data (three languages, with answers), introduced at the link below. https://digitalpr.jp/r/39499

Docker is required

My environment
- The PC runs Windows 10 Home (Docker Toolbox does support it, though).
- Memory is 8 GB.
- Anaconda is already installed, so Python and Jupyter Notebook work.

So first of all, I decided to see what would happen if I did not install Docker.

Download the Zip file from GitHub

Go to the top page of the project. For example, the top page of this "Data Science 100 Knock" is below. https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess

Now click the green "Code" button on the right.

Then click "Download ZIP".

100knocks-preprocess-master.zip is downloaded to the PC's Downloads folder.

Unzipping the file gives a 100knocks-preprocess-master folder.

All I need are the notebook with the questions, the data set, and the answer code.

The Jupyter Notebook code is, for example, preprocess_knock_Python.ipynb, and the data is a set of CSV files such as customer.csv.

Move the entire unzipped folder into MyPython (the folder where I keep my Python code).
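The unzip-and-move step can also be done from Python instead of Explorer. The following is only a minimal sketch using the standard zipfile module; the function name extract_archive is made up for illustration, and the paths in the usage comment are assumptions that depend on where your Downloads and MyPython folders actually are.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Member names inside a zip archive always use forward slashes.
        top_level = {name.split("/")[0] for name in archive.namelist()}
    return sorted(top_level)


# Hypothetical usage (adjust both paths to your own PC):
# zip_file = Path.home() / "Downloads" / "100knocks-preprocess-master.zip"
# print(extract_archive(zip_file, Path.home() / "MyPython"))
```

Extracting directly into MyPython makes the separate move step unnecessary.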

Launch Jupyter Notebook

Go to the folder MyPython → 100knocks-preprocess-master → docker → work.

Click preprocess_knock_Python.ipynb to open it.
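The loading code in this post reads the CSV files through relative paths such as data/customer.csv, so the notebook has to run from the work folder. As a rough check, the sketch below lists which of the expected files cannot be found under a data subfolder. The helper name missing_csv_files is hypothetical; the file names are the ones used in the loading code.

```python
from pathlib import Path

# File names taken from the loading code in this post.
EXPECTED_CSV = ["customer.csv", "category.csv", "product.csv",
                "receipt.csv", "store.csv", "geocode.csv"]


def missing_csv_files(work_dir=".", names=EXPECTED_CSV):
    """Return the names that are not found under work_dir/data."""
    data_dir = Path(work_dir) / "data"
    return [name for name in names if not (data_dir / name).is_file()]


# In a notebook cell: an empty list means every file was found.
# print(missing_csv_files())
```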

Click the first input cell, then click Run.

Checking the imports, the cell fails at psycopg2.

Looking closely, some libraries are not installed. There are two options here. (1) Install them (though it is doubtful whether I will ever use them). (2) Define the DataFrames myself (I have a feeling that I will be in trouble later if I cannot work from the CSV data).
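Rather than running the import cell repeatedly, the missing libraries can be listed in one go. This is a small sketch using importlib.util.find_spec from the standard library; the helper name missing_modules is made up for illustration, and the list in the usage comment only contains names mentioned in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not available".
            found = False
        if not found:
            missing.append(name)
    return missing


# Example call with modules mentioned in this post:
# print(missing_modules(["pandas", "numpy", "sklearn", "psycopg2"]))
```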

I decided to define the dataframe myself.

The imports are used as they are (libraries that are not installed are left out for the time being). geocode.csv contains blank fields, so I specified how those columns are read.

import os
import pandas as pd
import numpy as np
from datetime import datetime, date
from dateutil.relativedelta import relativedelta
import math

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Paths are relative to the notebook's folder (docker/work).
df_customer = pd.read_csv('data/customer.csv')
df_category = pd.read_csv('data/category.csv')
df_product = pd.read_csv('data/product.csv')
df_receipt = pd.read_csv('data/receipt.csv')
df_store = pd.read_csv('data/store.csv')
# geocode.csv has blank fields, so these columns are read as strings.
df_geocode = pd.read_csv(
    'data/geocode.csv',
    converters={'prefecture': str, 'city': str, 'town': str,
                'street': str, 'address': str})
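To see which columns of a CSV file actually contain blank fields before deciding how to read them, something like the following can be used. It is only a sketch with the standard csv module; the function name blank_counts is hypothetical, and UTF-8 encoding is an assumption about the data files.

```python
import csv


def blank_counts(csv_path, encoding="utf-8"):
    """Count empty fields per column in a CSV file with a header row."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        counts = {name: 0 for name in (reader.fieldnames or [])}
        for row in reader:
            for name in counts:
                value = row.get(name)
                # A short row yields None, which is also counted as blank.
                if value is None or value.strip() == "":
                    counts[name] += 1
    return counts


# Example call from the work folder:
# print(blank_counts("data/geocode.csv"))
```

Columns with a non-zero count are the candidates for an explicit type or converter.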

With this, the exercises can be worked through reasonably well.

Referenced site

Practical learning environment for data science beginners "Data Science 100 Knock (Structured Data Processing)" is released for free on GitHub: https://digitalpr.jp/r/39499
