[PYTHON] Database search with db.py

If you have a database with lots of tables and columns, or if you are new to that database, it can be a daunting task to find where you want the data.

I found a tool called `` `db.py``` that seems to be useful in such a case, so I tried it.

See the following article for how to search directly with SQL.

What you can do with db.py

db.With py you can easily do the following:



 * Search for table names and column names
 * Execute query
 * Data sampling
 * Save connection information for frequently used databases
 * Processing query results using pandas features

 It is useful to get a complete picture of the data stored in the database and to find out where the information you want is.

## Installation

### Install db.py

 db.py is written in ``` Python``` and can be installed with` `` pip```.

```bash
$ pip install db.py

Driver installation

Install the drivers needed to connect to the database.

See the README in db.py for the drivers available for each database. For the installation method of each driver, refer to the document of each driver.

The MySQL driver is only listed in the README for MySQLdb, but you can also use `` `pymysql. If both are available, MySQLdb``` will be used.

This time we will use SQLite, so no driver installation is required. (Pre-installed.)

Install bpython, ipython

Use db.py from the interpreter. You can use it with a normal interpreter, but using `` `bpythonorIPythonis convenient because you can use functions such as completion. Both can be installed with pip```.

Personally, I recommend `` `bpython```, which automatically provides help and completion candidates.

$ pip install bpython
$ pip install ipython

Connect to database

Start the interpreter and import the `` `DB``` class.

You can connect to the database by giving it the information you need to connect.

>>> from db import DB
>>> db = DB(filename="chinook.sql", dbtype="sqlite")

chinook.sql is db.This is the database that py provides for the sample.


 You can also read it using the `` `DemoDB``` as described in the README.

```pycon
>>> from db import DemoDB
>>> db = DemoDB()

Display a list of tables

db.The tables are stored in tables.


 You can also check what kind of columns there are.

```pycon
>>> db.tables
+---------------+----------------------------------------------------------------------------------+
| Table         | Columns                                                                          |
+---------------+----------------------------------------------------------------------------------+
| Album         | AlbumId, Title, ArtistId                                                         |
| Artist        | ArtistId, Name                                                                   |
| Customer      | CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalC |
|               | ode, Phone, Fax, Email, SupportRepId                                             |
| Employee      | EmployeeId, LastName, FirstName, Title, ReportsTo, BirthDate, HireDate, Address, |
|               |  City, State, Country, PostalCode, Phone, Fax, Email                             |
| Genre         | GenreId, Name                                                                    |
| Invoice       | InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, B |
|               | illingCountry, BillingPostalCode, Total                                          |
| InvoiceLine   | InvoiceLineId, InvoiceId, TrackId, UnitPrice, Quantity                           |
| MediaType     | MediaTypeId, Name                                                                |
| Playlist      | PlaylistId, Name                                                                 |
| PlaylistTrack | PlaylistId, TrackId                                                              |
| Track         | TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, Uni |
|               | tPrice                                                                           |
+---------------+----------------------------------------------------------------------------------+

If you look at the information about each table, you can see information such as column types.

>>> db.tables.Customer
+------------------------------------------------------------------------+
|                                Customer                                |
+--------------+--------------+---------------------+--------------------+
| Column       | Type         | Foreign Keys        | Reference Keys     |
+--------------+--------------+---------------------+--------------------+
| CustomerId   | INTEGER      |                     | Invoice.CustomerId |
| FirstName    | NVARCHAR(40) |                     |                    |
| LastName     | NVARCHAR(20) |                     |                    |
| Company      | NVARCHAR(80) |                     |                    |
| Address      | NVARCHAR(70) |                     |                    |
| City         | NVARCHAR(40) |                     |                    |
| State        | NVARCHAR(40) |                     |                    |
| Country      | NVARCHAR(40) |                     |                    |
| PostalCode   | NVARCHAR(10) |                     |                    |
| Phone        | NVARCHAR(24) |                     |                    |
| Fax          | NVARCHAR(24) |                     |                    |
| Email        | NVARCHAR(60) |                     |                    |
| SupportRepId | INTEGER      | Employee.EmployeeId |                    |
+--------------+--------------+---------------------+--------------------+

Fetch data from table

all, select, unique

allYou can use the method to retrieve all the data in the table. select * from table_nameCorresponds to.

lenIt can also be used in combination with to count the number of lines.

>>> len(db.tables.Customer.all())
59

selectYou can specify which column to use. select co1, col2, col3 from table_nameCorresponds to.

>>> db.tables.Customer.select("CustomerId", "FirstName", "LastName")[:5]
   CustomerId  FirstName     LastName
0           1       Luís    Gonçalves
1           2     Leonie       Köhler
2           3   François     Tremblay
3           4      Bjørn       Hansen
4           5  František  Wichterlová

uniqueIsselectIs almost the same asselect distinctTo do. select distinct co1, col2, col3 from table_nameCorresponds to.

>>> len(db.tables.Customer.select("SupportRepId"))
59
>>> len(db.tables.Customer.unique("SupportRepId"))
3

If you specify only one column as above, you can also write as follows.

>>> len(db.tables.Customer.SupportRepId.unique())
3

head, sample

headYou can retrieve the first data withsampleYou can sample the data with.

>>> db.tables.Customer.head(1)
   CustomerId FirstName   LastName  \
0           1      Luís  Gonçalves   

                                            Company  \
0  Embraer - Empresa Brasileira de Aeronáutica S.A.   

                           Address                 City State Country  \
0  Av. Brigadeiro Faria Lima, 2170  São José dos Campos    SP  Brazil   

  PostalCode               Phone                 Fax                 Email  \
0  12227-000  +55 (12) 3923-5555  +55 (12) 3923-5566  [email protected]   

   SupportRepId  
0             3 

>>> db.tables.Customer.sample(5)
   CustomerId FirstName    LastName Company                 Address  \
0          42     Wyatt      Girard    None  9, Place Louis Barthou   
1          59      Puja  Srivastava    None       3,Raj Bhavan Road   
2          51    Joakim   Johansson    None             Celsiusg. 9   
3          33     Ellie    Sullivan    None          5112 48 Street   
4          52      Emma       Jones    None       202 Hoxton Street   

          City State         Country PostalCode               Phone   Fax  \
0     Bordeaux  None          France      33000  +33 05 56 96 96 96  None   
1    Bangalore  None           India     560001    +91 080 22289999  None   
2    Stockholm  None          Sweden      11230    +46 08-651 52 52  None   
3  Yellowknife    NT          Canada    X1A 1N6   +1 (867) 920-2233  None   
4       London  None  United Kingdom     N1 5LH   +44 020 7707 0707  None   

                       Email  SupportRepId  
0      [email protected]             3  
1   [email protected]             3  
2  [email protected]             5  
3     [email protected]             3  
4     [email protected]             3 

Operation using pandas functions

Since the retrieved data is `DataFrame``` of `` pandas```, you can perform operations such as cutting out only the necessary columns or sorting by a specific column.

>>> db.tables.Customer.sample(5)[["CustomerId", "FirstName", "LastName", "Email"]].sort("CustomerId")
   CustomerId FirstName   LastName                      Email
1          36    Hannah  Schneider  [email protected]
0          46      Hugh   O'Reilly       [email protected]
4          51    Joakim  Johansson  [email protected]
2          52      Emma      Jones     [email protected]
3          54     Steve     Murray      [email protected]

Search table / column

find_tablesYou can use the method to find the table name. If you want to find a table that contains the string list in the table, do the following:

>>> db.find_table("*list*")
+---------------+---------------------+
| Table         | Columns             |
+---------------+---------------------+
| Playlist      | PlaylistId, Name    |
| PlaylistTrack | PlaylistId, TrackId |
+---------------+---------------------+

You can search for column names as well as table names.

>>> db.find_column("*Name")
+-----------+-------------+---------------+
| Table     | Column Name | Type          |
+-----------+-------------+---------------+
| Artist    |     Name    | NVARCHAR(120) |
| Customer  |  FirstName  | NVARCHAR(40)  |
| Customer  |   LastName  | NVARCHAR(20)  |
| Employee  |  FirstName  | NVARCHAR(20)  |
| Employee  |   LastName  | NVARCHAR(20)  |
| Genre     |     Name    | NVARCHAR(120) |
| MediaType |     Name    | NVARCHAR(120) |
| Playlist  |     Name    | NVARCHAR(120) |
| Track     |     Name    | NVARCHAR(200) |
+-----------+-------------+---------------+

You can also specify the column type.

>>> db.find_column("*Name", data_type="NVARCHAR(20)")
+----------+-------------+--------------+
| Table    | Column Name | Type         |
+----------+-------------+--------------+
| Customer |   LastName  | NVARCHAR(20) |
| Employee |  FirstName  | NVARCHAR(20) |
| Employee |   LastName  | NVARCHAR(20) |
+----------+-------------+--------------+

Execute query

queryWhen using the methodsqlCan be executed.

>>> db.query("SELECT CustomerId, FirstName, LastName, Email FROM Customer LIMIT 5")
   CustomerId  FirstName     LastName                     Email
0           1       Luís    Gonçalves      [email protected]
1           2     Leonie       Köhler     [email protected]
2           3   François     Tremblay       [email protected]
3           4      Bjørn       Hansen     [email protected]
4           5  František  Wichterlová  [email protected]

Of course you can also join.

>>> db.query("SELECT c.CustomerId, c.FirstName, e.EmployeeId, e.FirstName FROM Customer c JOIN Employee e ON c.SupportRepId = e.EmployeeId LIMIT 5")
   CustomerId  FirstName  EmployeeId FirstName
0           1       Luís           3      Jane
1           2     Leonie           5     Steve
2           3   François           3      Jane
3           4      Bjørn           4  Margaret
4           5  František           4  Margaret

Use of profile

If you save the connection information of the frequently used database in the profile, you can easily connect using the profile name.

View existing profiles

>>> from db import list_profiles
>>> list_profiles()
{}

Save profile

You can save the profile by using the save_credentials method while connected to the database. If you omit the name, it will be `` `default```.

>>> db.save_profile
>>> db.save_credentials("demodb")
>>> list_profiles()
{'demodb': {u'username': None, u'dbtype': u'sqlite' ......

Connecting to a database using a profile

You can easily connect using the profile name.

>>> from db import DB
>>> db = DB(profile="demodb")

Delete profile

remove_profileYou can delete the profile with.

>>> from db import remove_profile
>>> from db import remove_profile
>>> remove_profile("demodb")
>>> db = DB(profile="demodb")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/N1212A001/virtualenv/lib/python2.7/site-packages/db/db.py", line 730, in __init__
    self.load_credentials(profile)
  File "/Users/N1212A001/virtualenv/lib/python2.7/site-packages/db/db.py", line 840, in load_credentials
    raise Exception("Credentials not configured!")
Exception: Credentials not configured!

Summary

Even using a simple function seems to save a lot of work compared to connecting directly to the database with the mysql command. I think it would be even more convenient to create a function that executes frequently used queries and graphs the results.

Recommended Posts

Database search with db.py
Sequential search with Python
Fix database with pytest-docker
Binary search with python
Binary search with Python3
Create / search / create table with PynamoDB
Bit full search with Go
Full bit search with Python
I can't search with # google-map. ..
Search engine work with python
Search twitter tweets with python
Streamline web search with python
Database search (verification of processing speed with or without index)
Use Azure SQL Database with SQLAlchemy
Grid search of hyperparameters with Scikit-learn
ROS Lecture 108 Using Database (mongo) with ROS
Learn search with Python # 2bit search, permutation search
First OSMnx ~ With shortest path search ~
Search for homeomorphic idioms with opencv
Twitter search client made with bottle
Algorithm learned with Python 10th: Binary search
Algorithm learned with Python 9th: Linear search
Browse an existing external database with Django
Search for files with the specified extension
Search the maze with the python A * algorithm
Heat Map for Grid Search with Matplotlib
How to search Google Drive with Google Colaboratory
Easily implement ItemView incremental search with PySide
Algorithm learned with Python 12th: Maze search