Extract database tables to CSV [ODBC connection from R and Python]

What do you do when you want to extract database data to CSV? If you have a better way, please tell me! Mine looks like this ↓↓↓

I want to convert the database table to CSV

I want to write a query, run it against the database, and get the extracted result as CSV.

In-house data analysts write SQL and create files for analysis. Connecting to ODBC from Excel is fine, but I would rather load the data straight into the statistical software, and keep the numerical calculations and value-replacement logic on the analysis-software side.

Connect with ODBC

ODBC acts as the intermediary for connecting to the database. I will leave the ODBC setup details to other sites. If you are not the one who set up the database, someone else should already be using it, so ask them. For the time being, just get the IP address of the database server and the driver for the connection.

For the setup procedure itself, refer to guides elsewhere on configuring an ODBC data source.

Set the "Data Source Name" at the time of setting. Make a note of this.

First, the R language edition

You can get R from CRAN; download it, install it, and type the following scripts into the R editor.

Table assumptions, and a script that doesn't trip up

Suppose you have a table called user_log stuffed with 6 million records. Pulling everything out at once with an ordinary database client takes a tremendous amount of time, and if other people are connected too, the CPU load on the server gets scary and everyone gets angry, so that must be avoided.

As for software to connect to databases, my favorite is CSE. It can also connect as long as you know the ODBC data source name, ID, and password.

The basic structure is this:

library(RODBC)

# Open the ODBC connection: data source name, ID, and the password issued with the ID
conn_DB <- odbcConnect("Data source name", "Assigned ID", "Password issued with the ID")
# First just count the rows so we know how much we are dealing with
query <- paste0("select count(*) from user_log")
tbl <- sqlQuery(conn_DB, query)
odbcClose(conn_DB)
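By the way, the same row-count check can be done from Python with pyodbc (which the Python section below uses). A minimal sketch, using the same placeholder DSN, ID, and password:

import pyodbc

# Connect through the same ODBC data source and count the rows
cnx = pyodbc.connect('DSN=Data source name;UID=Enter ID;PWD=Insert pass')
cursor = cnx.cursor()
cursor.execute("SELECT COUNT(*) FROM user_log")
print(cursor.fetchone()[0])
cnx.close()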

Wrap the R snippet above in a for loop.

The snippet above connects to the database and tells you how many rows the table has; suppose you confirm it is 6 million. If you try to pull everything out in one go, the server's CPU will stop thinking, so extract in small chunks instead. On the analysis-software side, too, memory can fill up, so I recommend writing the chunks out as separate files and combining them at the end. (At 5,000 rows per chunk, 6,000,000 rows comes to 1,200 files.)

SEQ <- seq(0, 6000000, 5000)   # offsets 0, 5000, 10000, ... so the first row is not skipped
LIM <- length(SEQ) - 1

conn_DB <- odbcConnect("Data source name", "Assigned ID", "Password issued with the ID")
for(i in 1:LIM){
  # Assemble the chunked query; the trailing spaces keep the SQL words apart
  query <- paste0(
    "select * from user_log ",
    "limit 5000 offset ",
    as.numeric(SEQ[i])
  )
  tbl <- sqlQuery(conn_DB, query)
  # Write each 5000-row chunk to its own CSV; they are combined later
  write.csv(tbl, paste0("this_is_",i,"th.csv"), row.names=F)
  # Pause so the database server gets a breather between chunks
  Sys.sleep(10)
}
odbcClose(conn_DB)

Detailed explanation and caution

With this, the data gets pulled out 5,000 records at a time. (Depending on the database, you may also want an ORDER BY so the chunks come back in a stable order.) SQL queries can be assembled with paste0. If the SQL contains a single quotation mark and the R string holding it is itself delimited with single quotes, escape the quote with a backslash:

where colum = '10'
where colum = \'10\'
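As an aside, in the Python version later in this article the same quoting headache can be sidestepped with pyodbc's query parameters. A minimal sketch, reusing the placeholder DSN and the colum column from the example above:

import pyodbc

cnx = pyodbc.connect('DSN=Data source name;UID=Enter ID;PWD=Insert pass')
cursor = cnx.cursor()
# The ? placeholder lets the driver handle the quoting, so no backslash escaping is needed
cursor.execute("SELECT * FROM user_log WHERE colum = ?", '10')
rows = cursor.fetchall()
cnx.close()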

When you join pieces with paste0, there is no separator between them, so a missing space (or line break) can make the SQL words run together. Be careful when writing:

paste0("select * from user_log", "where colum = '10'")
select * from user_logwhere colum = '10'

It's stuck together.

Combine the extracted data

library(data.table)   # fread() comes from data.table

setwd("path of the folder where the CSVs were written")
lf <- list.files(pattern="csv")

data <- data.frame()

# Read each chunk file and stack it onto the bottom of data
for(i in 1:length(lf)){
  add <- fread(lf[i])
  data <- rbind(data, add)
}

This puts all 6 million rows into data. Note that the process will freeze if the PC does not have enough memory.

If you want to do machine learning, why not treat each CSV as one mini-batch?
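A rough sketch of that idea in Python (pandas assumed; the folder and file pattern are hypothetical and should match whatever names you wrote out):

import glob
import pandas as pd

def csv_minibatches(folder):
    # Yield one chunk file at a time instead of loading all 6 million rows at once
    for path in sorted(glob.glob(folder + "/*.csv")):
        yield pd.read_csv(path)

for batch in csv_minibatches("csv_data"):
    pass  # train on one mini-batch here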

Python language edition

Since the approach has already been explained, I will keep the commentary brief from here on.

import pyodbc
import numpy as np
import pandas as pd

cnx = pyodbc.connect('DSN=Data source name;UID=Enter ID;PWD=Insert pass')
cursor = cnx.cursor()

# Offsets 0, 5000, 10000, ... covering the 6,000,000 rows confirmed earlier
list_for = np.arange(0, 6000000, 5000)
list_for = list_for.tolist()

for i in range(len(list_for)):
    # Note the trailing space after the table name so the pieces don't run together
    made_que = "SELECT * FROM user_log " + "LIMIT 5000 OFFSET " + str(list_for[i])
    cursor.execute(made_que)
    tbl = cursor.fetchall()
    # Keep the column names from the cursor so each CSV gets a proper header
    cols = [col[0] for col in cursor.description]
    dataframe_format = pd.DataFrame([list(row) for row in tbl], columns=cols)
    # The csv_data/ folder must exist beforehand
    dataframe_format.to_csv('csv_data/' + str(i) + '.csv', index=False)

cnx.close()
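The combining step was only shown in R above; the pandas equivalent is short. A minimal sketch, assuming the chunk files were written to csv_data/ as in the loop above (the output file name is just an example):

import glob
import pandas as pd

# Stack every chunk back into one DataFrame (glob order is lexicographic, which is fine
# as long as row order does not matter)
files = sorted(glob.glob('csv_data/*.csv'))
data = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
data.to_csv('user_log_all.csv', index=False)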

That's all.

Once the data is extracted, the real analysis begins.
