[PYTHON] Download the VGG Face2 dataset directly to the server

Introduction

The VGG Face2 dataset requires you to log in before you can download it. The full dataset is about 40GB, so it is more practical to download it directly to a server on AWS than to a local machine. However, the AWS server only has a command-line interface (CUI), so you cannot open a browser there, log in, and download. This article explains how to download with wget as a logged-in user, even in a CUI environment.

Check cookies in your local environment

First, open the following site in your local environment and log in. http://zeus.robots.ox.ac.uk/vgg_face2/ After you log in, the site issues a token and stores it in a cookie. If you reuse those cookies, you can download the data from a CUI as well. To view the cookies, open Chrome Developer Tools, go to the "Application" tab, and click Cookies to see the list of cookies the site uses.

Download cookies.txt

To use cookies with wget, you need to save the cookie information as a text file in the format wget expects (the Netscape cookies.txt format). You can write the file by hand, but the Chrome extension "get cookies.txt" exports it for you, so let's use that. https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/related Once you have downloaded cookies.txt, copy it to your server.
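If you would rather write the file by hand, the format is one cookie per line with seven tab-separated fields: domain, subdomain flag, path, secure flag, expiry (Unix time), name, and value. The following is a minimal Python sketch of that format, not the extension's actual output. The helper names and the cookie name `example_token` are hypothetical; use the names and values you saw in Developer Tools.

```python
import time
from http.cookiejar import MozillaCookieJar


def cookie_line(domain, name, value, path="/", secure=False, ttl_seconds=3600):
    """Build one tab-separated line in Netscape cookies.txt format."""
    # The second field must be TRUE only when the domain starts with a dot.
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    expires = int(time.time()) + ttl_seconds  # assumed lifetime, adjust as needed
    return "\t".join([
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(expires),
        name,
        value,
    ])


def write_cookies_txt(path, lines):
    """Write the header line followed by one cookie per line."""
    with open(path, "w") as f:
        f.write("# Netscape HTTP Cookie File\n")
        for line in lines:
            f.write(line + "\n")


def load_cookie_names(path):
    """Parse the file with the standard library to check that it is well formed."""
    jar = MozillaCookieJar(path)
    jar.load()
    return sorted(cookie.name for cookie in jar)
```

For example, `write_cookies_txt("cookies.txt", [cookie_line("zeus.robots.ox.ac.uk", "example_token", "abc123")])` writes a file with a single cookie, and `load_cookie_names("cookies.txt")` confirms that it parses.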

Download dataset with wget

Now that the cookies with the login information are ready, all that remains is to download the dataset with wget. The download links for the VGG Face2 dataset are below. I got them by right-clicking each link on the dataset download page and copying the link address.

Train Data_v1: http://zeus.robots.ox.ac.uk/vgg_face2/get_file?fname=vggface2_train.tar.gz
Test Data_v1: http://zeus.robots.ox.ac.uk/vgg_face2/get_file?fname=vggface2_test.tar.gz
Train_Images_v1: http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/meta/train_list.txt
Test_Images_v1: http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/meta/test_list.txt

The wget command that uses cookies.txt is below. Run it from the folder that contains cookies.txt, and quote the URL so that the shell does not try to interpret the ? in it.

wget --load-cookies cookies.txt -r -k -E "url"

For example, download the 36GB of train data directly to your server as follows. Because of -r, wget saves the file under a directory named after the host.

wget --load-cookies cookies.txt -r -k -E "http://zeus.robots.ox.ac.uk/vgg_face2/get_file?fname=vggface2_train.tar.gz"
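If wget is not available on the server, roughly the same thing can be done from Python with only the standard library. This is a sketch, not a tested replacement for the command above: it assumes the site needs nothing beyond the cookies in cookies.txt, and the function name `download_with_cookies` and the output file name are my own.

```python
import shutil
import urllib.request
from http.cookiejar import MozillaCookieJar


def download_with_cookies(url, dest_path, cookies_path="cookies.txt"):
    """Download url to dest_path, sending the cookies stored in cookies_path."""
    jar = MozillaCookieJar(cookies_path)
    # Keep session cookies and expired entries too, so the file is used as exported.
    jar.load(ignore_discard=True, ignore_expires=True)

    # The cookie processor attaches matching cookies to every HTTP(S) request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Copy in 1 MiB chunks so the 36GB archive is never held in memory.
    with opener.open(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)
    return dest_path
```

A call such as `download_with_cookies("http://zeus.robots.ox.ac.uk/vgg_face2/get_file?fname=vggface2_train.tar.gz", "vggface2_train.tar.gz")` would then play the role of the wget command. Unlike wget, this sketch does not retry or resume an interrupted download.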
