NLP 100 Knock 2020 (Rev 2) often has you download a file and then process it. There are several ways to do the download, so I collect them here. The file used throughout is jawiki-country.json.gz from Chapter 3: Regular Expressions.
Everything was run on Google Colaboratory.
requests
requests.py
import requests

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename='jawiki-country.json.gz'
urlData = requests.get(url).content
with open(filename, mode='wb') as f:  # open in 'wb' mode so the raw bytes can be written
    f.write(urlData)
This is how I usually do it. requests plays the same role here as wget, and writing the response body to a file is the standard pattern. Large files may be better handled as described in "Download using Python, Requests":
requests2.py
import requests
import os

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename=os.path.basename(url)
r = requests.get(url, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()
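As an alternative sketch of the same streaming idea, the raw response can be copied straight into the file with shutil.copyfileobj instead of a manual chunk loop. The tiny local HTTP server below is only a stand-in so the snippet runs without fetching the real URL; swap in the real one in practice.

```python
import gzip
import http.server
import os
import shutil
import tempfile
import threading
from functools import partial

import requests

# Stand-in setup: serve a small gzip file from a temporary directory
# instead of downloading the real jawiki-country.json.gz.
tmpdir = tempfile.mkdtemp()
with gzip.open(os.path.join(tmpdir, 'sample.json.gz'), 'wt', encoding='utf-8') as f:
    f.write('{"title": "test"}\n')

handler = partial(http.server.SimpleHTTPRequestHandler, directory=tmpdir)
server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f'http://127.0.0.1:{server.server_address[1]}/sample.json.gz'
out = os.path.join(tmpdir, 'downloaded.json.gz')

# stream=True keeps the body unread; copyfileobj then moves it to disk
# in fixed-size chunks, so memory use stays bounded for large files.
r = requests.get(url, stream=True)
r.raise_for_status()
with open(out, 'wb') as f:
    shutil.copyfileobj(r.raw, f)
server.shutdown()

print(os.path.getsize(out) > 0)  # True
```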
This version writes each chunk to the file directly as it arrives.
urllib.request
urllib_request.py
import urllib.request
url='https://nlp100.github.io/data/jawiki-country.json.gz'
save_name='jawiki-country.json.gz'
urllib.request.urlretrieve(url, save_name)
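Whichever method downloads the file, it can then be read line by line with the gzip module, without unpacking it to disk first. The snippet below uses a small locally created stand-in file; the 'title'/'text' keys are assumed to mirror the layout of the real jawiki-country.json.gz.

```python
import gzip
import json
import os
import tempfile

# Stand-in for jawiki-country.json.gz: one JSON object per line.
path = os.path.join(tempfile.mkdtemp(), 'jawiki-country.json.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write(json.dumps({'title': 'イギリス', 'text': '...'}, ensure_ascii=False) + '\n')

# 'rt' mode decompresses transparently and yields text lines.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    articles = [json.loads(line) for line in f]

print(articles[0]['title'])  # イギリス
```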
I found this one while looking around for this post, in "Download files from the web with python". urlretrieve is handy because it fetches and saves the file in a single call.
pandas.read_X
read_X.py
import pandas as pd
url='https://nlp100.github.io/data/jawiki-country.json.gz'
df=pd.read_json(url, lines=True)
pandas already did great work in Chapter 2. Each of its input/output functions can read directly from a URL, and the compression format is detected automatically. Since the result comes back as a DataFrame, this is the way to go when you want to continue processing the data in that form.
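As a sketch of that auto-detection, read_json accepts a JSON Lines file (or URL) and infers gzip compression from the .gz suffix. The file here is a locally created stand-in, not the real data.

```python
import gzip
import os
import tempfile

import pandas as pd

# Stand-in JSON Lines file, gzip-compressed like jawiki-country.json.gz.
path = os.path.join(tempfile.mkdtemp(), 'countries.json.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('{"title": "A", "text": "a"}\n{"title": "B", "text": "b"}\n')

# lines=True parses one JSON object per line; compression='infer'
# (the default) detects gzip from the filename.
df = pd.read_json(path, lines=True)
print(df.shape)  # (2, 2)
```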
With a command-line tool like wget there is nothing to think about, but doing it programmatically means choosing among options like these. Any of the methods above works, so pick whichever you prefer.