[PYTHON] I want to work through the Data Science 100 Knocks with Colaboratory

The Japan DataScientist Society has released "Data Science 100 Knocks (Structured Data Processing)", a set of data analysis practice exercises. Running it requires working with Docker, so for those who just want an easy first look, here is a way to run it with Colaboratory.

1. Download the data

First, create a new notebook in Colaboratory. Once it is open, run the following commands to download the data to Google Drive.

from google.colab import drive
drive.mount('/content/drive')

!git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess.git 'drive/My Drive/100knocks-preprocess'

If you are mounting Drive for the first time, an authorization prompt with a URL appears below the cell you ran. Click the URL and grant Google Colaboratory access to your Drive. At the end, the message "Please copy this code, switch to the application and paste it." is displayed. Paste the copied code into the "Enter your authorization code:" field in the cell output and press Enter. When you go back to My Drive, you will see a folder called "100knocks-preprocess". Once that has worked, this notebook is no longer needed.

2. Open Jupyter Notebook from My Drive

The notebook file is stored inside the cloned 100knocks-preprocess folder. Open preprocess_knock_Python.ipynb with Google Colaboratory.
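If the notebook is hard to find in the Drive file browser, a small search from the first notebook can help. The following is only a sketch: `find_notebooks` is a hypothetical helper (not part of the repository), and `REPO_DIR` assumes the clone target used in step 1 and Colaboratory's default working directory.

```python
import os

# Repository folder created by the git clone in step 1 (path from this article).
REPO_DIR = 'drive/My Drive/100knocks-preprocess'


def find_notebooks(repo_dir):
    """Return the relative paths of all .ipynb files under repo_dir, sorted."""
    found = []
    for root, dirs, files in os.walk(repo_dir):
        # Do not descend into hidden folders such as .git.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.endswith('.ipynb'):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, repo_dir))
    return sorted(found)


if os.path.isdir(REPO_DIR):
    for notebook in find_notebooks(REPO_DIR):
        print(notebook)
else:
    print('Repository folder not found; check that Drive is mounted and the clone finished.')
```

The printed paths are relative to the repository folder, so they can be followed directly in My Drive.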

3. Try running it

Running the first cell as it is raises an error, so run only the library imports from it, and then load the data with the following code.

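# Harmless to repeat if the notebook's import cell has already been run
import os
import pandas as pd

def get_df(filename):
  # Folder inside the cloned repository that holds the CSV files
  path = 'drive/My Drive/100knocks-preprocess/docker/work/data'
  return pd.read_csv(os.path.join(path, filename))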

df_customer = get_df('customer.csv')
df_category = get_df('category.csv')
df_geocode = get_df('geocode.csv')
df_product = get_df('product.csv')
df_receipt = get_df('receipt.csv')
df_store = get_df('store.csv')
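The relative path above only resolves while the working directory is Colaboratory's default. As a hedged alternative, the sketch below builds an absolute path and fails with a clearer message when a file is missing. `resolve_csv` and `DATA_DIR` are hypothetical names introduced here, and the absolute path assumes Drive is mounted at /content/drive as in step 1.

```python
import os

# Absolute form of the data folder used above; assumes the mount point
# /content/drive from step 1.
DATA_DIR = '/content/drive/My Drive/100knocks-preprocess/docker/work/data'


def resolve_csv(filename, data_dir=DATA_DIR):
    """Return the full path to a CSV file, or raise a descriptive error."""
    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f'{path} not found: check that Drive is mounted and the clone finished'
        )
    return path
```

Inside get_df, the result can be passed to pd.read_csv in place of the os.path.join call.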

By the way, a PDF file explaining the aims of this content is in the following folder, so it is worth reading before you start. 100knocks-preprocess/docker/dock

Now you are ready. If you leave the notebook for a while before running it, the connection to Drive may be lost (probably). In that case, run the following code again, or mount Drive from the sidebar, and then load the data again.

from google.colab import drive
drive.mount('/content/drive')
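To avoid remounting when it is not needed, the mount can be checked first. This is a minimal sketch, assuming that a visible "My Drive" folder under the mount point means the connection is still alive; `drive_is_mounted` is a hypothetical helper, not part of google.colab.

```python
import os


def drive_is_mounted(mount_point='/content/drive'):
    """Return True if the My Drive folder is visible under mount_point."""
    return os.path.isdir(os.path.join(mount_point, 'My Drive'))


if not drive_is_mounted():
    try:
        # google.colab is only importable inside Colaboratory.
        from google.colab import drive
        drive.mount('/content/drive')
    except ImportError:
        print('Not running in Colaboratory; nothing to mount.')
```

After a remount, the get_df calls still need to be run again to reload the data.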

That's all.

Having written this article, I should add that building the environment with Docker is not that difficult, and being able to do it is often useful, so I think this is a good opportunity to give it a try. The article here looks like a good guide for setting it up on a Mac. Once you have built the environment, you can practice SQL as well!
