[LINUX] [For beginners] Scripts in 10 lines or fewer (3. Data acquisition / csv conversion with datareader)

Working with Python's many libraries, I realized that a little code goes a long way: a handy little script often takes only about 5 steps. So I have started collecting short scripts built from Python and other commands. I will post these scripts of 10 steps or fewer on an irregular basis, as I come up with them.

For the **3rd** installment, I will get data with datareader and write it to csv.

csv and excel files are often passed around in-house, and data published on the net is frequently offered as csv alongside formats such as json.

The data to be acquired is the Nikkei 225 series from FRED, starting from January 1, 2016. This data is stored in a pandas data frame and then written to csv.

Finally, as a supplement, I plot the data with the plot function.

[Environment] Linux: Debian 10; Python: 3.7.3; pandas: 1.0.3; pandas-datareader: 0.8.1

1. Data acquisition / storage in data frame

To get the Nikkei 225 data with datareader, the syntax is as follows, where pdr is the alias under which pandas_datareader's data module is imported: pdr.DataReader('NIKKEI225', 'fred', start)

Writing to csv is done with pandas. The syntax is dataframe.to_csv('filename.csv')

I ran the code in jupyter.

`datareader`



# Get Nikkei225 data from fred with datareader and write to csv
# outfile = './nikkei225_20200428.csv'
import datetime

import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader.data as pdr  # provides DataReader

# Data acquisition / storage in data frame
start = datetime.datetime(2016, 1, 1)
df_nikkei225 = pdr.DataReader('NIKKEI225', 'fred', start)

# Write to csv
df_nikkei225.to_csv('./nikkei225_20200428.csv')

In the "storage in data frame" part of the script above, the first line specifies the start date, and the second line gets the Nikkei 225 from fred from that date onward and stores it in a data frame.

start = datetime.datetime(2016, 1, 1)
df_nikkei225 = pdr.DataReader('NIKKEI225', 'fred', start)

The data frame is named "df_nikkei225" here, but any name will do.

The csv was written to the current directory as "nikkei225_20200428.csv".
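To check what to_csv actually writes, here is a minimal sketch that needs no network access. The three sample rows and the in-memory buffer are my own assumptions standing in for the downloaded data and the real file; only to_csv and read_csv are the point.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the data fetched from fred
idx = pd.to_datetime(["2016-01-01", "2016-01-04", "2016-01-05"])
df = pd.DataFrame({"NIKKEI225": [float("nan"), 18450.98, 18374.00]}, index=idx)
df.index.name = "DATE"

# Write to an in-memory buffer instead of a file (assumption, for illustration)
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)  # the index is written as the first column, NaN as an empty field

# Read the csv back, restoring the DATE column as a datetime index
restored = pd.read_csv(io.StringIO(csv_text), index_col="DATE", parse_dates=True)
print(restored)
```

Passing a file name such as './nikkei225_20200428.csv' instead of the buffer works the same way for both to_csv and read_csv.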

2. Contents of the acquired data

Let's take a look at the acquired data. "df_nikkei225" is the data frame used in the previous code.

`Data frame`



df_nikkei225
#Contents of the stored data frame
 	NIKKEI225
DATE 	
2016-01-01 	NaN
2016-01-04 	18450.98
2016-01-05 	18374.00
2016-01-06 	18191.32
2016-01-07 	17767.34
... 	...
2020-04-22 	19137.95
2020-04-23 	19429.44
2020-04-24 	19262.00
2020-04-27 	19783.22
2020-04-28 	19771.19

1128 rows × 1 columns

As a supplement, I will draw a graph and output it as a jpg. The easiest way to plot the above data takes two lines of code. matplotlib was already imported at the top of the script with "import matplotlib.pyplot as plt".

With that in place, call "dataframe.plot()"; if the data frame is named "df", you can plot with df.plot(). To output the graph as a jpg, for example, use plt.savefig("output file name.jpg").

`plot`



#Graph drawing / output
df_nikkei225.plot()
plt.savefig("nikkei225_20200428.jpg")
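The same two steps can also be tried without downloading anything. The following is only a sketch under my own assumptions: the sample values, the title, the output path in the temp directory, and the png format are all made up for illustration. It relies on DataFrame.plot() returning a matplotlib Axes object, whose figure can then be saved.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows standing in for the downloaded data
idx = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"])
df = pd.DataFrame({"NIKKEI225": [18450.98, 18374.00, 18191.32, 17767.34]}, index=idx)
df.index.name = "DATE"

# DataFrame.plot() returns the matplotlib Axes it drew on
ax = df.plot(title="NIKKEI225 (sample)")
ax.set_ylabel("Index value")

# The file extension selects the image format (png here, as an assumption)
out_path = os.path.join(tempfile.gettempdir(), "nikkei225_sample.png")
fig = ax.get_figure()
fig.savefig(out_path)
plt.close(fig)  # release the figure once it has been saved
```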

`Data information`



df_nikkei225.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1128 entries, 2016-01-01 to 2020-04-28
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   NIKKEI225  1056 non-null   float64
dtypes: float64(1)
memory usage: 17.6 KB

df_nikkei225.isnull().sum()

NIKKEI225    72
dtype: int64

df_nikkei225.head()

 	NIKKEI225
DATE 	
2016-01-01 	NaN
2016-01-04 	18450.98
2016-01-05 	18374.00
2016-01-06 	18191.32
2016-01-07 	17767.34

There is no data for days without trading. Since the acquired data is plotted as it is, I think the graph above simply relies on the default handling of missing values for the time being. I will touch on missing values in a separate article.
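As a small preview, here is a minimal sketch of two common ways pandas can deal with such gaps. The sample values (including where the gaps fall) are hypothetical, and which method suits the Nikkei 225 data is a judgment call I am not making here.

```python
import pandas as pd

# Hypothetical sample with two missing days
idx = pd.to_datetime(["2016-01-01", "2016-01-04", "2016-01-05", "2016-01-06"])
df = pd.DataFrame(
    {"NIKKEI225": [float("nan"), 18450.98, float("nan"), 18191.32]},
    index=idx,
)

# Count the missing values, as isnull().sum() did above
n_missing = int(df["NIKKEI225"].isnull().sum())

# Option 1: drop the rows that have no value
dropped = df.dropna()

# Option 2: carry the last known value forward
# (a leading NaN stays NaN because there is nothing before it to carry)
filled = df.ffill()

print(n_missing)
print(dropped)
print(filled)
```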

**That covers data acquisition and csv conversion with datareader.**