If you have a database with lots of tables and columns, or if you are new to that database, it can be a daunting task to find where you want the data.
I found a tool called `` `db.py``` that seems to be useful in such a case, so I tried it.
See the following article for how to search directly with SQL.
db.With py you can easily do the following:
* Search for table names and column names
* Execute query
* Data sampling
* Save connection information for frequently used databases
* Processing query results using pandas features
It is useful to get a complete picture of the data stored in the database and to find out where the information you want is.
## Installation
### Install db.py
db.py is written in ``` Python``` and can be installed with` `` pip```.
```bash
$ pip install db.py
Install the drivers needed to connect to the database.
See the README in db.py for the drivers available for each database. For the installation method of each driver, refer to the document of each driver.
The MySQL driver is only listed in the README for MySQLdb
, but you can also use `` `pymysql. If both are available,
MySQLdb``` will be used.
This time we will use SQLite
, so no driver installation is required. (Pre-installed.)
Use db.py from the interpreter. You can use it with a normal interpreter, but using `` `bpythonor
IPythonis convenient because you can use functions such as completion. Both can be installed with
pip```.
Personally, I recommend `` `bpython```, which automatically provides help and completion candidates.
$ pip install bpython
$ pip install ipython
Start the interpreter and import the `` `DB``` class.
You can connect to the database by giving it the information you need to connect.
>>> from db import DB
>>> db = DB(filename="chinook.sql", dbtype="sqlite")
chinook.sql is db.This is the database that py provides for the sample.
You can also read it using the `` `DemoDB``` as described in the README.
```pycon
>>> from db import DemoDB
>>> db = DemoDB()
db.The tables are stored in tables.
You can also check what kind of columns there are.
```pycon
>>> db.tables
+---------------+----------------------------------------------------------------------------------+
| Table | Columns |
+---------------+----------------------------------------------------------------------------------+
| Album | AlbumId, Title, ArtistId |
| Artist | ArtistId, Name |
| Customer | CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalC |
| | ode, Phone, Fax, Email, SupportRepId |
| Employee | EmployeeId, LastName, FirstName, Title, ReportsTo, BirthDate, HireDate, Address, |
| | City, State, Country, PostalCode, Phone, Fax, Email |
| Genre | GenreId, Name |
| Invoice | InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, B |
| | illingCountry, BillingPostalCode, Total |
| InvoiceLine | InvoiceLineId, InvoiceId, TrackId, UnitPrice, Quantity |
| MediaType | MediaTypeId, Name |
| Playlist | PlaylistId, Name |
| PlaylistTrack | PlaylistId, TrackId |
| Track | TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, Uni |
| | tPrice |
+---------------+----------------------------------------------------------------------------------+
If you look at the information about each table, you can see information such as column types.
>>> db.tables.Customer
+------------------------------------------------------------------------+
| Customer |
+--------------+--------------+---------------------+--------------------+
| Column | Type | Foreign Keys | Reference Keys |
+--------------+--------------+---------------------+--------------------+
| CustomerId | INTEGER | | Invoice.CustomerId |
| FirstName | NVARCHAR(40) | | |
| LastName | NVARCHAR(20) | | |
| Company | NVARCHAR(80) | | |
| Address | NVARCHAR(70) | | |
| City | NVARCHAR(40) | | |
| State | NVARCHAR(40) | | |
| Country | NVARCHAR(40) | | |
| PostalCode | NVARCHAR(10) | | |
| Phone | NVARCHAR(24) | | |
| Fax | NVARCHAR(24) | | |
| Email | NVARCHAR(60) | | |
| SupportRepId | INTEGER | Employee.EmployeeId | |
+--------------+--------------+---------------------+--------------------+
all, select, unique
all
You can use the method to retrieve all the data in the table.
select * from table_name
Corresponds to.
len
It can also be used in combination with to count the number of lines.
>>> len(db.tables.Customer.all())
59
select
You can specify which column to use.
select co1, col2, col3 from table_name
Corresponds to.
>>> db.tables.Customer.select("CustomerId", "FirstName", "LastName")[:5]
CustomerId FirstName LastName
0 1 Luís Gonçalves
1 2 Leonie Köhler
2 3 François Tremblay
3 4 Bjørn Hansen
4 5 František Wichterlová
unique
Isselect
Is almost the same asselect distinct
To do.
select distinct co1, col2, col3 from table_name
Corresponds to.
>>> len(db.tables.Customer.select("SupportRepId"))
59
>>> len(db.tables.Customer.unique("SupportRepId"))
3
If you specify only one column as above, you can also write as follows.
>>> len(db.tables.Customer.SupportRepId.unique())
3
head, sample
head
You can retrieve the first data withsample
You can sample the data with.
>>> db.tables.Customer.head(1)
CustomerId FirstName LastName \
0 1 Luís Gonçalves
Company \
0 Embraer - Empresa Brasileira de Aeronáutica S.A.
Address City State Country \
0 Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil
PostalCode Phone Fax Email \
0 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 [email protected]
SupportRepId
0 3
>>> db.tables.Customer.sample(5)
CustomerId FirstName LastName Company Address \
0 42 Wyatt Girard None 9, Place Louis Barthou
1 59 Puja Srivastava None 3,Raj Bhavan Road
2 51 Joakim Johansson None Celsiusg. 9
3 33 Ellie Sullivan None 5112 48 Street
4 52 Emma Jones None 202 Hoxton Street
City State Country PostalCode Phone Fax \
0 Bordeaux None France 33000 +33 05 56 96 96 96 None
1 Bangalore None India 560001 +91 080 22289999 None
2 Stockholm None Sweden 11230 +46 08-651 52 52 None
3 Yellowknife NT Canada X1A 1N6 +1 (867) 920-2233 None
4 London None United Kingdom N1 5LH +44 020 7707 0707 None
Email SupportRepId
0 [email protected] 3
1 [email protected] 3
2 [email protected] 5
3 [email protected] 3
4 [email protected] 3
Since the retrieved data is `DataFrame``` of
`` pandas```, you can perform operations such as cutting out only the necessary columns or sorting by a specific column.
>>> db.tables.Customer.sample(5)[["CustomerId", "FirstName", "LastName", "Email"]].sort("CustomerId")
CustomerId FirstName LastName Email
1 36 Hannah Schneider [email protected]
0 46 Hugh O'Reilly [email protected]
4 51 Joakim Johansson [email protected]
2 52 Emma Jones [email protected]
3 54 Steve Murray [email protected]
find_tables
You can use the method to find the table name.
If you want to find a table that contains the string list in the table, do the following:
>>> db.find_table("*list*")
+---------------+---------------------+
| Table | Columns |
+---------------+---------------------+
| Playlist | PlaylistId, Name |
| PlaylistTrack | PlaylistId, TrackId |
+---------------+---------------------+
You can search for column names as well as table names.
>>> db.find_column("*Name")
+-----------+-------------+---------------+
| Table | Column Name | Type |
+-----------+-------------+---------------+
| Artist | Name | NVARCHAR(120) |
| Customer | FirstName | NVARCHAR(40) |
| Customer | LastName | NVARCHAR(20) |
| Employee | FirstName | NVARCHAR(20) |
| Employee | LastName | NVARCHAR(20) |
| Genre | Name | NVARCHAR(120) |
| MediaType | Name | NVARCHAR(120) |
| Playlist | Name | NVARCHAR(120) |
| Track | Name | NVARCHAR(200) |
+-----------+-------------+---------------+
You can also specify the column type.
>>> db.find_column("*Name", data_type="NVARCHAR(20)")
+----------+-------------+--------------+
| Table | Column Name | Type |
+----------+-------------+--------------+
| Customer | LastName | NVARCHAR(20) |
| Employee | FirstName | NVARCHAR(20) |
| Employee | LastName | NVARCHAR(20) |
+----------+-------------+--------------+
query
When using the methodsql
Can be executed.
>>> db.query("SELECT CustomerId, FirstName, LastName, Email FROM Customer LIMIT 5")
CustomerId FirstName LastName Email
0 1 Luís Gonçalves [email protected]
1 2 Leonie Köhler [email protected]
2 3 François Tremblay [email protected]
3 4 Bjørn Hansen [email protected]
4 5 František Wichterlová [email protected]
Of course you can also join.
>>> db.query("SELECT c.CustomerId, c.FirstName, e.EmployeeId, e.FirstName FROM Customer c JOIN Employee e ON c.SupportRepId = e.EmployeeId LIMIT 5")
CustomerId FirstName EmployeeId FirstName
0 1 Luís 3 Jane
1 2 Leonie 5 Steve
2 3 François 3 Jane
3 4 Bjørn 4 Margaret
4 5 František 4 Margaret
If you save the connection information of the frequently used database in the profile, you can easily connect using the profile name.
>>> from db import list_profiles
>>> list_profiles()
{}
You can save the profile by using the save_credentials
method while connected to the database. If you omit the name, it will be `` `default```.
>>> db.save_profile
>>> db.save_credentials("demodb")
>>> list_profiles()
{'demodb': {u'username': None, u'dbtype': u'sqlite' ......
You can easily connect using the profile name.
>>> from db import DB
>>> db = DB(profile="demodb")
remove_profile
You can delete the profile with.
>>> from db import remove_profile
>>> from db import remove_profile
>>> remove_profile("demodb")
>>> db = DB(profile="demodb")
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/N1212A001/virtualenv/lib/python2.7/site-packages/db/db.py", line 730, in __init__
self.load_credentials(profile)
File "/Users/N1212A001/virtualenv/lib/python2.7/site-packages/db/db.py", line 840, in load_credentials
raise Exception("Credentials not configured!")
Exception: Credentials not configured!
Even using a simple function seems to save a lot of work compared to connecting directly to the database with the mysql command. I think it would be even more convenient to create a function that executes frequently used queries and graphs the results.
Recommended Posts