[PYTHON] Parsing CSV format data using SQL

You can use q to issue SQL directly to CSV or TSV format files.

Installation procedure

Example of data analysis using q

When you have the following CSV (file name is orders.csv) of order data.

OrderId,OrderDate,ProductID,UserID,Quantity,TotalPrice
10000005,2015/2/13,100,10000,1,10000
10000006,2015/2/13,101,10001,2,2000
10000007,2015/2/13,102,10002,3,6000
10000008,2015/2/13,100,10003,1,10000
10000009,2015/2/13,101,10004,2,2000
10000010,2015/2/14,100,10005,3,30000
10000011,2015/2/14,101,10006,1,2000
10000012,2015/2/14,102,10007,2,4000
10000013,2015/2/14,100,10008,3,30000
10000014,2015/2/14,101,10009,1,1000

I want to extract data with sales (Total Price) of 10,000 yen or more

$ q -H -d',' "SELECT * FROM ./orders.csv WHERE TotalPrice >= 10000"

10000005,2015/2/13,100,10000,1,10000
10000008,2015/2/13,100,10003,1,10000
10000010,2015/2/14,100,10005,3,30000
10000013,2015/2/14,100,10008,3,30000

-H means skip header lines. Specify the delimiter with -d (specify "," because it is CSV)

I want to aggregate daily sales

$ q -H -d',' "SELECT OrderDate, SUM(TotalPrice) FROM ./orders.csv GROUP BY OrderDate"

2015/2/13,30000
2015/2/14,67000

I want to put out a product sales ranking

q -H -d',' "SELECT ProductId, SUM(Quantity), SUM(TotalPrice) AS total FROM ./orders.csv GROUP BY ProductId Order BY total DESC"

100,8,80000
102,5,10000
101,6,7000

Data analysis is quick.

Recommended Posts

Parsing CSV format data using SQL
Try using django-import-export to add csv data to django
Convert json format data to txt (using yolo)
Data analysis using xarray
Get Amazon RDS (PostgreSQL) data using SQL with pandas
Data analysis using Python 0
Data cleansing 2 Data cleansing using DataFrame
Data cleaning using Python
Aggregate event data into one-user, one-line format using pandas
Process csv data with python (count processing using pandas)
SQL format with sqlparse
Extract Twitter data with CSV
Select features using text data
Write data in HDF format
Data visualization method using matplotlib (1)
Data visualization method using matplotlib (2)
SQL connection method using pyodbc
Data analysis using python pandas
Extract csv data and calculate
Hachinai's grade management using SQL
[Python] Read a csv file with a large data size using a generator