[PYTHON] A memo about using Colab Pro for about 2 months (good points / bad points)

What is Google Colaboratory?

A free Jupyter Notebook environment where you can write and run Python code in your browser. There are usage limits, but you can also use a GPU.

Colab Pro is the paid version of Google Colaboratory ($9.99/month). It has various advantages over the regular version. It is currently available only in the US (signing up requires a US address and credit card).

Since I live in the United States, I signed up for Colab Pro as soon as it launched. After about 2 months of use I have a feel for it, so I am writing down what I have learned. For each point, I quote the official text and then describe what I actually experienced (N = 1).

**Information as of May 13, 2020.** **This service does not guarantee resources, so the resources allocated to users change dynamically. This information may therefore be completely unhelpful at a different time or in a different region (for example, once the service launches in Japan). Please keep that in mind.**

Good points

A high-performance GPU is assigned almost all the time

Subscribing to Colab Pro gives you priority access to the fastest GPUs. For example, when a non-subscriber is assigned a K80 GPU, a subscriber may get a T4 or P100 GPU. TPUs are also available with priority.

In my case, a Tesla P100 was assigned by default (I have not been given a T4 even once in 2 months), and no matter how heavily I used the GPU, I was never assigned anything other than a Tesla P100. Specifically, I ran the GPU almost nonstop (on one notebook) for machine learning for a month without any problem. (Screenshot: colab3_s.jpg)
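To see which GPU a session actually received, one option is to query `nvidia-smi` from a notebook cell. The following is a minimal sketch, not an official Colab method. It assumes `nvidia-smi` is on the PATH of a GPU runtime, and the helper names `parse_gpu_lines` and `query_gpus` are made up for illustration.

```python
import subprocess


def parse_gpu_lines(output):
    """Turn 'name, memory' CSV lines from nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only; the name itself contains no comma.
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def query_gpus():
    """Ask nvidia-smi which GPUs are attached; return [] if none can be queried."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No GPU runtime selected, or the driver tool is unavailable.
        return []
    return parse_gpu_lines(result.stdout)


print(query_gpus())
```

An empty list usually means the runtime type is not set to GPU in the notebook settings.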

You don't have to worry much about the usage limit, and you can run the GPU on multiple notebooks at the same time

The free version of Colab has significantly restricted access to faster GPUs and has a much lower usage limit than Colab Pro. Colab Pro is not without usage caps. Also, the types of GPUs and TPUs available in Colab Pro are subject to change in the future.

In the regular version, I hit the usage limit as soon as I used the GPU to any real extent, but with Pro that has almost disappeared. However, I also confirmed that usage is not unlimited. After running the GPU on 4 notebooks at the same time for about half a day, the GPU became unusable, with the warning below saying the limit had been reached. (Screenshot: colab_0420.jpg) (I was advised to join Colab Pro even though I was already a Colab Pro subscriber.)

Without such extreme usage, though, I never reached the limit, even when running notebooks in parallel. And even after I hit the limit, the GPU became usable again after a few hours, and I was again assigned a Tesla P100. So there is no need to be too afraid of it.

You can run continuously for 24 hours, and timeouts are less likely

With Colab Pro, your notebooks can stay connected for up to 24 hours, and idle timeouts are relatively lenient. However, the connection time is not guaranteed and the idle timeout behavior may change. With the free version of Colab, notebooks can run for at most 12 hours, and idle timeouts are much stricter than in Colab Pro.

The continuous run time, which was 12 hours in the regular version, has been extended to 24 hours. This matters a lot when running somewhat heavy jobs. For training that takes about a day to finish, you can of course resume a stopped run each time, but having to do that every time is discouraging. So far, as long as a process was running, it never stopped before the 24 hours had elapsed. I recall it stopping fairly often in the regular version.

As for the idle timeout, that is, the time until the runtime disconnects when the notebook is left untouched (including when the notebook is closed), I feel it has become longer, but it still worries me. Even with Pro, it seems to get cut off fairly often after about 30 minutes. I feel it was cut off sooner in the regular version, but that may be my imagination. (I recall hearing that Colab has a 90-minute rule, but my impression is that it almost never lasts 90 minutes.)

High-memory VMs are available

Colab Pro gives you priority access to high memory VMs. High-memory VMs typically have twice as much memory and CPU as standard Colab VMs. Colab Pro users can enable the use of high memory VMs from their notebook settings. High memory VMs may also be automatically assigned if Colab deems it necessary. However, resources are not guaranteed and high memory VMs have usage limits. High memory settings are not available in the free version of Colab, and users are rarely automatically assigned high memory VMs.

Sure enough, I was able to set up a high-memory VM (27 GB) with a single click. It's easy (note that, as with the GPU, it is not assigned unless you select it in the settings). So far, I have not reached a usage limit for memory. (Screenshot: image.png)
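To confirm how much RAM the current VM really has, you can read `/proc/meminfo`, which exists on Linux hosts such as Colab VMs. This is only a sketch, and the function names are hypothetical. Note that the `kB` unit in that file means 1024 bytes, so the value is reported here in GiB and will come out a little lower than a nominal "GB" figure.

```python
def parse_mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text in GiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # e.g. "MemTotal:  13333556 kB"
            return kib / (1024 ** 2)
    return None


def read_mem_total_gib(path="/proc/meminfo"):
    """Read total RAM of the current machine; None if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_mem_total_gib(f.read())
    except OSError:
        return None


total = read_mem_total_gib()
print("RAM: unknown" if total is None else f"RAM: {total:.1f} GiB")
```

Running this before and after switching the notebook setting makes it obvious whether the high-memory VM was actually assigned.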

Low price

You get reasonably heavy use of a high-performance GPU for $9.99 a month. I think the cost performance is amazing, because building a VM with a similar configuration on GCP was quite expensive.

Bad points

Colab Pro is not worse than the regular version in any respect, so these points are mainly about Colab itself.

Poor editor and debug environment

The only ways to edit files other than Colab notebooks (.py files, etc.) are Colab's built-in editor, which is almost a plain notepad, or some dubious cloud editor that integrates with Google services. It's painful. For debugging, the options are more or less [doing what you can with magic commands](https://ja.stackoverflow.com/questions/62955/google-colaboratory%E3%81%A7%E3%81%AE%E3%83%87%E3%83%90%E3%83%83%E3%82%B0%E3%81%AE%E3%82%84%E3%82%8A%E6%96%B9) or splitting cells into small pieces and printing. Large programs that need a decent debugger may not be suited to Colab in the first place.

As a rule, I try not to edit module-level code on Colab. I write and lightly debug the code locally, git push before running on Colab, and run !git pull in an execution cell of the Colab notebook every time, so that it always runs the latest version of the code.
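The pull-before-run habit described above can also be wrapped in a small Python helper, which handles the first run (clone) and later runs (pull) in the same cell. This is a sketch under assumptions: `git` is installed on the VM, and the repository URL and destination path in the usage comment are hypothetical placeholders.

```python
import os
import subprocess


def build_sync_command(repo_url, dest):
    """Return the git command that brings `dest` up to date with `repo_url`."""
    if os.path.isdir(os.path.join(dest, ".git")):
        # Already cloned in this runtime: fast-forward to the latest commit.
        return ["git", "-C", dest, "pull", "--ff-only"]
    # Fresh runtime: the working copy does not exist yet.
    return ["git", "clone", repo_url, dest]


def sync_repo(repo_url, dest):
    """Clone or pull, raising CalledProcessError if git fails."""
    subprocess.run(build_sync_command(repo_url, dest), check=True)


# Usage in the first cell of a notebook (hypothetical URL and path):
# sync_repo("https://github.com/your-name/your-project.git", "/content/your-project")
```

Using `--ff-only` makes the cell fail loudly if the copy on the VM has diverged, instead of silently creating a merge.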

Sync with Google Drive may be delayed

This is less a Colab problem than a problem of integration between services, but files on Google Drive and the copies Colab sees can get out of sync. Specifically, when you edit a script such as a .py file and then run it, the sync may not have finished, so your edits are not reflected. This is dangerous because it easily leads to confusing bugs. It is another disadvantage of writing code on Colab or Google Drive.
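One way to notice a stale copy is to print a short fingerprint of the script from inside the notebook just before running it, and compare it with the same fingerprint computed where you edited the file. The sketch below is an illustration only. `file_fingerprint` is a made-up helper and the path in the usage comment is a hypothetical placeholder.

```python
import hashlib
import os
import time


def file_fingerprint(path):
    """Return (first 12 hex chars of SHA-256, last-modified time) for a file."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(os.path.getmtime(path)))
    return digest[:12], mtime


# Usage (hypothetical path on a mounted Drive):
# print(file_fingerprint("/content/drive/your-folder/train.py"))
```

If the hash differs from the one on the editing side, the edit has not reached the VM yet and running the script would use the old code.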

Generating large files on Google Drive in a short time can cause problems

This is entirely a problem on the Google Drive side, but when I generated a large amount of data in a short period (machine learning checkpoints, for example), I frequently became unable to create files in Google Drive for a while (saving and uploading were also impossible). It seemed to happen once I exceeded roughly 100 MB per minute. Once files could no longer be created, the condition lasted for several hours and got in the way of my work. So I had to avoid generating large amounts of data in a short time.
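One possible way to stay under such a write rate is to save checkpoints to the VM's local disk and copy only the latest one to Drive at a limited frequency. The sketch below assumes that approach. The class name, the 600-second default interval, and the paths in the usage note are illustrative choices, not measured thresholds.

```python
import os
import shutil
import time


class ThrottledCopier:
    """Copy a file into dest_dir at most once per min_interval_sec."""

    def __init__(self, dest_dir, min_interval_sec=600.0, clock=time.monotonic):
        self.dest_dir = dest_dir
        self.min_interval_sec = min_interval_sec
        self.clock = clock  # injectable so the timing can be tested
        self._last_copy = None

    def maybe_copy(self, src_path, force=False):
        """Copy src_path if the interval has elapsed (or force=True). Return True if copied."""
        now = self.clock()
        due = self._last_copy is None or now - self._last_copy >= self.min_interval_sec
        if not (due or force):
            return False
        os.makedirs(self.dest_dir, exist_ok=True)
        # Same file name every time, so the destination holds only the latest copy.
        shutil.copyfile(src_path, os.path.join(self.dest_dir, os.path.basename(src_path)))
        self._last_copy = now
        return True
```

In a training loop you would save to a local path after each epoch, call `maybe_copy` on it with the Drive folder as `dest_dir`, and call it once more with `force=True` when training ends.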

It is cut off after 24 hours, no questions asked

Compared with 12 hours the experience is quite different, but being cut off partway through is still annoying. It is easier to just build in a mechanism that resumes training so you don't have to think about it.
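A resume mechanism of this kind can be sketched with the standard library alone: load the last saved state if it exists, continue from there, and save periodically with an atomic rename so a cutoff mid-write cannot corrupt the file. The "training step" below is a stand-in, and all names and the JSON state layout are assumptions for illustration. A real job would save model and optimizer state with its framework's own tools.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, save_every=10, max_steps_this_session=None):
    """Run until total_steps, resuming from the checkpoint at `path` if present."""
    state = load_checkpoint(path)
    done = 0
    while state["step"] < total_steps:
        if max_steps_this_session is not None and done >= max_steps_this_session:
            break  # simulates the runtime being cut off
        state["total"] += state["step"]  # stand-in for one real training step
        state["step"] += 1
        done += 1
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            save_checkpoint(path, state)
    return state
```

With this shape, restarting after a cutoff is just rerunning the same cell. Work done since the last save is repeated, so `save_every` trades lost progress against write volume.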

Processing may stop if the network is unstable

This is purely a problem on my side, but if the network drops and the notebook is disconnected, processing stops after a while. As mentioned above, the idle timeout is said to be longer than in the regular version, but the difference is not that noticeable. My impression is that unless I notice and reconnect right away, the run has already stopped. My home connection is unstable, so before going to bed I remote into a PC with a stable connection and open the notebook there (which makes me wonder what I am doing).

Only available in the US

I would really like the service to launch in Japan (strong request).

Summary

That is my memo on Colab Pro. I will update it if there is additional information or anything changes. The situation may differ from user to user, so I would appreciate any information you can share (there is too little out there). Please also let me know if you have ideas on how to use Colab effectively.
