[PYTHON] Extract the Azure SQL Server data table with pyodbc and try to make it numpy array / pandas dataframe

From the place where you can pull out the data table in Azure SQL Server with the python library called pyodbc, you can convert the table to numpy's ʻarray format or pandas's DataFrame` format. This is a memo.

→ Definition of library, environment and function is written here. ,, So please refer to that first.

Preparation for array / DataFrame format

## libraries
import pyodbc
import numpy as np
import pandas as pd

## initial setting
##Please change the set value as appropriate
server = 'abc_server.database.windows.net'  
database = 'abc_database'  
username = 'abc_user'  
password = 'abc_password'  

##Function definition
###Define DB connection
def db_connection(sv=server, db=database, un=username, pw=password):    
    cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+sv+';DATABASE='+db+';UID='+un+';PWD='+ pw)
    return cnxn.cursor()

###Issue SQL
def query_output(sql):
    cursor.execute(sql)
    row = cursor.fetchone()
    while row:  
        print row[0]  
        row = cursor.fetchone()

→ For the meaning of the settings, see Trial Premise Etc.

Case 2: Push the contents of the table into the pandas data frame

###DB connection
cursor = db_connection()

###SQL settings
sql = 'select * from SalesLT.Customer;'

cursor.execute(sql)
rows = cursor.fetchall()

Check the contents of rows around here:

>>> rows

[(1, False, u'Mr.', u'Orlando', u'N.', u'Gee', None, u'A Bike Store', u'adventure-works\\pamela0', u'[email protected]', u'245-555-0173', u'L/Rlwxzp4w7RWmEgXX+/A7cXaePEPcp+KwQhl2fJL7w=', u'1KjXYs4=', u'3F5AE95E-B87D-4AED-95B4-C3797AFCB74F', datetime.datetime(2005, 8, 1, 0, 0)),
 (2, False, u'Mr.', u'Keith', None, u'Harris', None, u'Progressive Sports', u'adventure-works\\david8', u'[email protected]', u'170-555-0127', u'YPdtRdvqeAhj6wyxEsFdshBDNXxkCXn+CRgbvJItknw=', u'fs1ZGhY=', u'E552F657-A9AF-4A7D-A645-C429D6E02491', datetime.datetime(2006, 8, 1, 0, 0)),
 (3, False, u'Ms.', u'Donna', u'F.', u'Carreras', None, u'Advanced Bike Components', u'adventure-works\\jillian0', u'[email protected]', u'279-555-0130', u'LNoK27abGQo48gGue3EBV/UrlYSToV0/s87dCRV7uJk=', u'YTNH5Rw=', u'130774B1-DB21-4EF3-98C8-C104BCD6ED6D', datetime.datetime(2005, 9, 1, 0, 0)),
 :

It's like an array inside an array (likely, in the exact format pyodbc.Row), so I will try plunging into numpy's array

array_format = np.array(rows)
dataframe_format = pd.DataFrame(array_format)

Then the numpy ʻarray` format is ...


>>>array_format
array([[1, False, u'Mr.', ..., u'1KjXYs4=',
        u'3F5AE95E-B87D-4AED-95B4-C3797AFCB74F',
        datetime.datetime(2005, 8, 1, 0, 0)],
       [2, False, u'Mr.', ..., u'fs1ZGhY=',
        u'E552F657-A9AF-4A7D-A645-C429D6E02491',
        datetime.datetime(2006, 8, 1, 0, 0)],
       [3, False, u'Ms.', ..., u'YTNH5Rw=',
        u'130774B1-DB21-4EF3-98C8-C104BCD6ED6D',
        datetime.datetime(2005, 9, 1, 0, 0)],
       ..., 
       [30116, False, u'Ms.', ..., u'HDCU1Bk=',
        u'EC409609-D25D-41B8-9D15-A1AA6E89FC77',
        datetime.datetime(2007, 7, 1, 0, 0)],
       [30117, False, u'Mr.', ..., u'iES3IZA=',
        u'6F08E2FB-1CD3-4F6E-A2E6-385669598B19',
        datetime.datetime(2005, 8, 1, 0, 0)],
       [30118, False, u'Ms.', ..., u'QhHP+y8=',
        u'2495B4EB-FE8B-459E-A1B6-DBA25C04E626',
        datetime.datetime(2006, 9, 1, 0, 0)]], dtype=object)
>>> array_format.shape
(847L, 15L)

Also, the DataFrame of pandas


>>>dataframe_format.head()
  0      1    2        3     4           5     6                           7   \
0  1  False  Mr.  Orlando    N.         Gee  None                A Bike Store   
1  2  False  Mr.    Keith  None      Harris  None          Progressive Sports   
2  3  False  Ms.    Donna    F.    Carreras  None    Advanced Bike Components   
3  4  False  Ms.    Janet    M.       Gates  None       Modular Cycle Systems   
4  5  False  Mr.     Lucy  None  Harrington  None  Metropolitan Sports Supply   

                         8                             9             10  \
0   adventure-works\pamela0  [email protected]  245-555-0173   
1    adventure-works\david8    [email protected]  170-555-0127   
2  adventure-works\jillian0    [email protected]  279-555-0130   
3  adventure-works\jillian0    [email protected]  710-555-0173   
4      adventure-works\shu0     [email protected]  828-555-0186   

                                             11        12  \
0  L/Rlwxzp4w7RWmEgXX+/A7cXaePEPcp+KwQhl2fJL7w=  1KjXYs4=   
1  YPdtRdvqeAhj6wyxEsFdshBDNXxkCXn+CRgbvJItknw=  fs1ZGhY=   
2  LNoK27abGQo48gGue3EBV/UrlYSToV0/s87dCRV7uJk=  YTNH5Rw=   
3  ElzTpSNbUW1Ut+L5cWlfR7MF6nBZia8WpmGaQPjLOJA=  nm7D5e4=   
4  KJqV15wsX3PG8TS5GSddp6LFFVdd3CoRftZM/tP0+R4=  cNFKU4w=   

                                     13                   14  
0  3F5AE95E-B87D-4AED-95B4-C3797AFCB74F  2005-08-01 00:00:00  
1  E552F657-A9AF-4A7D-A645-C429D6E02491  2006-08-01 00:00:00  
2  130774B1-DB21-4EF3-98C8-C104BCD6ED6D  2005-09-01 00:00:00  
3  FF862851-1DAA-4044-BE7C-3E85583C054D  2006-07-01 00:00:00  
4  83905BDC-6F5E-4F71-B162-C98DA069F38A  2006-09-01 00:00:00  

>>> dataframe_format.shape
(847, 15)

It looks like it's okay.

Reference -Try to extract Azure SQL Server data table with pyodbc