Summary

The Data Scientist Association (https://www.datascientist.or.jp/) has released "Data Science 100 Knock (Structured Data Processing)". This post records my attempt to run the Python version in Jupyter Notebook on a Windows 10 Home PC with Anaconda (2020.02) installed, without installing Docker.

What is "Data Science 100 Knock (Structured Data Processing)"?

It is a collection of exercises with accompanying data, introduced at the link below (three languages, answers included). https://digitalpr.jp/r/39499

It needs Docker

My environment
- The PC runs Windows 10 Home. Docker Toolbox is said to support it, though.
- Memory is 8 GB.
- Anaconda is already installed, so Python and Jupyter Notebook work.

So first of all, I decided to see what would happen if I did not install Docker.

Download the Zip file from GitHub

Go to the top page of the project on GitHub. For "Data Science 100 Knock", the top page is: https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess

Now click the green "Code" button on the right.

Then click Download ZIP.

100knocks-preprocess-master.zip is downloaded to the Downloads folder on the PC.

When you unzip the zip, the contents are as follows.
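If you would rather unzip from Python than from Explorer, the step can be sketched roughly as follows. This is only a sketch: the helper name `unzip_and_list` is made up for illustration, and the Downloads path is an assumption about where the browser saved the file.

```python
import zipfile
from pathlib import Path


def unzip_and_list(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level entry names."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        names = archive.namelist()
    # Keep only the first path component of every entry in the archive.
    return sorted({name.split("/")[0] for name in names if name})


# Assumption: the ZIP was saved to the current user's default Downloads folder.
zip_file = Path.home() / "Downloads" / "100knocks-preprocess-master.zip"
if zip_file.exists():
    print(unzip_and_list(zip_file, zip_file.parent))
else:
    print(f"{zip_file} not found; adjust the path to where the ZIP was saved.")
```

The function returns the top-level names only, which is usually enough to confirm that the archive unpacked into a single project folder.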

All I need are the notebook with the questions, the data set, and the answer code.

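For example, the code for Jupyter Notebook is preprocess_knock_Python.ipynb, under docker → work.

The data are the CSV files in the data folder next to it (customer.csv, category.csv, product.csv, receipt.csv, store.csv, geocode.csv).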

Move the entire folder under MyPython (the folder that contains the Python code).

Launch Jupyter Notebook

Go to the folder MyPython → 100knocks-preprocess-master → docker → work.
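Before opening the notebook, it may be worth confirming that the work folder really contains what the later steps rely on. The sketch below assumes it is run from the work folder; the file list is taken from the notebook name and the read_csv calls in this article, and the helper name `missing_files` is made up for illustration.

```python
from pathlib import Path

# File names taken from this article (notebook name and read_csv paths).
EXPECTED = [
    "preprocess_knock_Python.ipynb",
    "data/customer.csv",
    "data/category.csv",
    "data/product.csv",
    "data/receipt.csv",
    "data/store.csv",
    "data/geocode.csv",
]


def missing_files(work_dir, expected=EXPECTED):
    """Return the entries of expected that do not exist as files under work_dir."""
    work_dir = Path(work_dir)
    return [name for name in expected if not (work_dir / name).is_file()]


# Assumption: the current directory is ...\docker\work.
print(missing_files(Path.cwd()))
```

An empty list means every expected file was found. Anything listed is either missing or the current directory is not the work folder.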

Click preprocess_knock_Python.ipynb to open it.

Click the first In [ ] cell and click Run.

Looking at the imports, the cell fails at psycopg2.

Looking closely, some libraries are not installed. There are two options here. (1) Install them (although I doubt I would ever use them). (2) Define the dataframes myself (I have a feeling I will be in trouble later if I cannot work from the CSV data).
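One way to see which libraries are missing without running the imports one by one is to ask Python whether each module can be found. This is a minimal sketch: the helper name `find_missing` is made up, and the candidate list only covers the third-party modules mentioned in this article, not necessarily everything the notebook imports.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Third-party modules named in this article; extend the list as needed.
candidates = ["pandas", "numpy", "dateutil", "sklearn", "psycopg2"]
print(find_missing(candidates))
```

`importlib.util.find_spec` only looks a top-level module up and does not import it, so the check is safe to run even when a package would fail on import.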

I decided to define the dataframe myself.

I kept the imports as they are, leaving out the libraries that are not installed for the time being. Because geocode.csv contains blank fields, I specified the types of its text columns explicitly.

import os
import pandas as pd
import numpy as np
from datetime import datetime, date
from dateutil.relativedelta import relativedelta
import math

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

df_customer = pd.read_csv('data/customer.csv')
df_category = pd.read_csv('data/category.csv')
df_product = pd.read_csv('data/product.csv')
df_receipt = pd.read_csv('data/receipt.csv')
df_store = pd.read_csv('data/store.csv')
df_geocode = pd.read_csv(
    'data/geocode.csv',
    converters={'prefecture': str, 'city': str, 'town': str, 'street': str, 'address': str})
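To illustrate why the converters argument matters for blank fields, here is a small sketch on made-up data. The column names are borrowed from geocode.csv, but the rows are hypothetical. By default pandas reads an empty field as NaN, whereas a column passed through a str converter should keep it as an empty string.

```python
import io

import pandas as pd

# Hypothetical sample rows; the second row has an empty 'street' field.
csv_text = "prefecture,city,street\nTokyo,Chiyoda,1-1\nTokyo,Chiyoda,\n"

# Default behaviour: the empty field becomes a missing value (NaN).
default_df = pd.read_csv(io.StringIO(csv_text))

# With a str converter the raw text of the field is kept as it is.
typed_df = pd.read_csv(io.StringIO(csv_text), converters={"street": str})

print(default_df["street"].isna().tolist())
print(typed_df["street"].tolist())
```

Whether an empty string or NaN is the better representation depends on the exercise, so this is a design choice rather than a requirement.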

With this setup, the exercises can be worked through reasonably well.

Referenced site

Practical learning environment for data science beginners "Data Science 100 Knock (Structured Data Processing)" is released for free on GitHub: https://digitalpr.jp/r/39499
