[PYTHON] Try "100 knocks on data science" ①

Practical learning environment for data science beginners "Data Science 100 Knock (Structured Data Processing)" released on GitHub for free https://digitalpr.jp/r/39499

I'm going to give this a try. I don't know R at all, though, so I'll probably skip the R version.

Environment

・ Windows 10
・ Docker Desktop
・ Git

Environment setup

Start Docker

 wsl -e docker-desktop

Clone from Git

git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess.git

Build Docker

cd 100knocks-preprocess
docker-compose up -d --build

This takes a while... it seems to be downloading something...

Precautions for environment construction

If you are using Docker Toolbox, the URL to access changes to http://192.168.99.100:8888. The host to specify when connecting to the DB from a client tool changes in the same way.

Confirmation of 100 knock environment

Open the following URL: http://localhost:8888

image.png

A screen like this should open
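If you would rather check from a script than from the browser, the two addresses mentioned in this article can be wrapped in a small helper. This is only a minimal sketch: the function names `jupyter_url` and `is_reachable` are made up for illustration, and it only tests whether something answers HTTP on that address, not whether the notebooks themselves work.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Hosts and port taken from the article; everything else here is hypothetical.
DEFAULT_HOST = "localhost"
TOOLBOX_HOST = "192.168.99.100"
JUPYTER_PORT = 8888


def jupyter_url(use_docker_toolbox: bool = False) -> str:
    """Return the notebook URL for Docker Desktop or Docker Toolbox."""
    host = TOOLBOX_HOST if use_docker_toolbox else DEFAULT_HOST
    return f"http://{host}:{JUPYTER_PORT}"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at the URL without an error status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as "not up".
        return False
```

Calling `is_reachable(jupyter_url())` after `docker-compose up` finishes should return `True` once the container is ready; pass `use_docker_toolbox=True` if you are on Docker Toolbox.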

Open work/preprocess_knock_SQL.ipynb:

image.png

You'll find SQL problems like this. The Python and R versions are set up the same way, and the answers appear to be in the answer folder. The DB is PostgreSQL 12.

The problems aren't that difficult, but when one asks me to normalize something, I find myself thinking "wait, how did that go...?" and googling for a moment, lol.
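For my own notes, here is what "normalize" usually means. I am assuming the two common variants, min-max scaling to the range 0 to 1 and standardization to mean 0 and standard deviation 1; which one a given problem wants depends on its wording. The function names and the sample numbers are made up for this sketch and are not taken from the exercises.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale values linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample data, not from the 100 knocks dataset.
amounts = [100, 200, 300, 400, 500]
print(min_max_normalize(amounts))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(amounts))
```

In the actual exercises you would do the same thing on a table column with pandas or SQL, but the arithmetic is the same.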

Bonus

This environment uses Jupyter Notebook. I had seen the name a lot but didn't really know what it was, so I looked it up.

Jupyter Notebook is one of the Jupyter projects and is an open-source web application. Project Jupyter develops services and open-source software for interactive computing across multiple programming languages. A notebook lets you keep a program and its execution results together in one place.

These screens are apparently built from files with the .ipynb extension. When I opened one, it turned out to be a JSON file like the one below.

Entity_Relationship.ipynb


{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DB logical design (ER diagram)\n",
    "- Some data that is FK may have data that does not exist in the master table.\n",
    "- Example) Non-member customer ID is not included in the customer table\n",
    "- Therefore, the FK information does not meet the external reference constraints in a typical database.\n",
    "- Please use it as reference information when combining data"
   ]
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![ER diagram](data/100knocks_ER.png \"sample\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
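Since the file is plain JSON, it can be read with Python's standard `json` module. The following is a minimal sketch that counts the cell types and picks out the kernel name and format version. The function name `summarize_notebook` is made up here, and the sample is a cut-down stand-in for the file above so that the snippet runs on its own.

```python
import json
from collections import Counter


def summarize_notebook(notebook: dict) -> dict:
    """Collect a few top-level facts from a parsed .ipynb document."""
    cells = notebook.get("cells", [])
    kernelspec = notebook.get("metadata", {}).get("kernelspec", {})
    return {
        "nbformat": f'{notebook.get("nbformat")}.{notebook.get("nbformat_minor")}',
        "kernel": kernelspec.get("name"),
        "cell_types": dict(Counter(cell["cell_type"] for cell in cells)),
    }


# Shortened stand-in for Entity_Relationship.ipynb (same keys, less text).
sample_text = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# DB logical design\n"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["![ER diagram](data/100knocks_ER.png)"]},
        {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [], "source": []},
    ],
    "metadata": {
        "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}
    },
    "nbformat": 4,
    "nbformat_minor": 4,
})

summary = summarize_notebook(json.loads(sample_text))
print(summary)
```

To try it on a real notebook, replace `json.loads(sample_text)` with `json.load(f)` on a file opened with `open(path, encoding="utf-8")`.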

Reference

Jupyter notebook https://qiita.com/szk3/items/920fd3e905ed16469780
