[PYTHON] [Colab] How to copy a huge dataset

Background

PyTorch's DCGAN Tutorial requires a huge dataset (about 1 GB, roughly 200,000 images)
↓
Training is slow and runs out of memory on a local Jupyter Lab
↓
So let's train on Google Colaboratory (hereinafter Colab), which also provides a GPU
↓
A problem came up while working through it

How to copy a dataset to Colab?

Problems

- Can Colab refer to Google Drive files?
- Can ZIP be decompressed on Colab?

Can Colab refer to Google Drive files?

Yes, you can, by mounting Google Drive. Create a new notebook and run the following code.

from google.colab import drive
drive.mount('/content/drive')

The output contains a link for generating an authorization code, so open it. On the account selection screen, choose the account you use with Colab. Google Drive File Stream will ask for access, so allow it. An authorization code is then issued: copy it, paste it into the input field in the notebook, and press Enter.

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=xxx
Enter your authorization code:

Can ZIP be decompressed on Colab?

Yes, the unzip command is available on Colab. First, upload the dataset's ZIP file to Google Drive, then copy it from the mounted Drive into Colab's working directory.

!cp "./drive/My Drive/Colab Notebooks/data/celeba/img_align_celeba.zip" "."
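
The same copy can also be done from Python, which makes it easy to fail early with a clear message when the Drive is not mounted or the path is misspelled. The following is only a minimal sketch; the helper name copy_to_local is hypothetical and not part of Colab.

```python
import shutil
from pathlib import Path


def copy_to_local(src, dest_dir="."):
    """Copy one file (e.g. from the mounted Drive) into dest_dir.

    copy_to_local is a hypothetical helper for illustration only.
    Returns the path of the copied file.
    """
    src = Path(src)
    if not src.is_file():
        # Typical causes: Drive not mounted yet, or a different folder name.
        raise FileNotFoundError(f"Not found: {src}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(src, dest_dir))
```

In the notebook it would be called with the same path as the cp command, for example copy_to_local("./drive/My Drive/Colab Notebooks/data/celeba/img_align_celeba.zip").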

After that, use the unzip command to unzip it on Colab.

!unzip "img_align_celeba.zip"
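
If you would rather stay in Python than call the shell command, the standard zipfile module can extract the archive as well. This is a sketch under the assumption that the ZIP file is already in the working directory; extract_zip is a hypothetical helper name.

```python
import zipfile


def extract_zip(zip_path, dest_dir="."):
    """Extract all entries of zip_path into dest_dir.

    extract_zip is a hypothetical helper for illustration only.
    Returns the number of entries listed in the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return len(zf.namelist())
```

For the dataset in this article the call would be extract_zip("img_align_celeba.zip"). Unlike !unzip, this prints nothing while it runs, which keeps the notebook output short for an archive with this many files.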

Try displaying one of the unzipped images.

from PIL import Image
Image.open('img_align_celeba/000001.jpg')

The image was displayed without any problems.
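
Opening a single image only confirms that one file arrived. To check that the whole archive was extracted, you could also count the files in the folder and compare the result with the number of images the dataset is supposed to contain. A minimal sketch, with count_images as a hypothetical helper name:

```python
from pathlib import Path


def count_images(folder, pattern="*.jpg"):
    """Count the files directly inside folder whose names match pattern.

    count_images is a hypothetical helper for illustration only.
    """
    return sum(1 for p in Path(folder).glob(pattern) if p.is_file())
```

In the notebook this would be count_images("img_align_celeba").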

Summary

  1. Upload your dataset to Google Drive as a ZIP file
  2. Mount Google Drive on Colab
  3. Copy the ZIP file onto Colab
  4. Unzip the ZIP file on Colab

Postscript

If you just want to upload a file to Colab, you can select a local file with the following code. However, in my experience a large file takes a long time to upload this way, and going through Google Drive as described in this article felt faster.

from google.colab import files
files.upload()
