BigQuery is very convenient, isn't it? For the time being, you can put tables in, aggregate the put tables at high speed, and even machine learning using BigQuery ML, etc., can be done only with BigQuery.
I use it conveniently, and when I notice it, various Datasets and Tables have been created. It seems that the meta information can be seen using INFORMATION_SCHEMA.
This time, we will organize how to use INFORMATION_SCHEMA and other meta information.
The first is to specify the project and get the list of datasets inside.
SELECT * FROM `myproject.INFORMATION_SCHEMA.SCHEMATA`
The following items are output.
--catalog_name: project name --schema_name: dataset name --schema_owner: Owner name? (All were null) --creation_time: Creation date and time --last_modified_time: Modified date and time --location: storage location
It was a pity that the owner name was Null. This would be very easy to manage if you knew who created the dataset. .. .. Is there any way to put it in?
Next, project: How to specify the dataset and get the list of tables inside.
SELECT * FROM `myproject.mydataset.INFORMATION_SCHEMA.TABLES`
The following items are output.
--table_catalog: Project name --table_schema: Dataset name --table_name: table name --table_type: table or VIEW or xref table --is_insertable_into: Can you INSERT? --is_typed: Unknown (is there anything other than No?) --creation_time: Creation date and time
I don't understand the meaning of is_insertable_into. There seems to be no element other than Yes for the table and No for VIEW. In that case, table_type is enough. .. ..
Similarly, Project: A method to specify a dataset and get a list of tables inside. The items that are output are slightly different
SELECT * FROM `myproject.mydataset.__TABLES__`
--project_id: project name --dataset_id: dataset name --table_id: table name --creation_time: Creation date and time --last_modified_time: Modified date and time --row_count: number of rows --size_bytes: Data size --Type: 1 for Table, 2 for VIEW
It's unpleasant that the name is slightly different depending on the meta information you see. It may be catalog_name, table_catalog, or project_id to refer to the same project name. Is it used properly in BigQuery?
You can also see additional information about the table in INFORMATION_SCHEMA.
SELECT * FROM `myproject.mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
The following items are output.
--table_catalog: Project name --table_schema: Dataset name --table_name: table name --table_type: table or VIEW or xref table
It is difficult to understand what the option is saying for a moment, but option_name contains "expiration_timestamp", option_type contains "TIMESTAMP", and option_value contains the date and time, and these three are a set.
There were also descriptions and labels.
The more tables you have, the more output you have.
SELECT * FROM `myproject.mydataset.INFORMATION_SCHEMA.COLUMNS`
--table_catalog: Project name --table_schema: Dataset name --table_name: table name --column_name: column name --ordinal_position: column number --is_nullable: Is Null OK? --data_type: data type
The ones that don't have a commentary are the columns that only contain NEVER or Null. I didn't understand the meaning of the columns.
I would like to calculate the billing amount for each dataset in a project by combining meta information.
import pandas as pd
query="SELECT schema_name FROM `myproject.INFORMATION_SCHEMA.SCHEMATA`"
df = pd.read_gbq(query, project_id='myproject', dialect="standard")
df_output = pd.DataFrame()
query="""
SELECT
"{dataset}" AS dataset,
SUM(size_bytes) / 1000000000 AS DataSize,
0.020 * SUM(size_bytes) / 1000000000 AS Cost
FROM `myproject.{dataset}.__TABLES__`
GROUP BY dataset"""
for i, dataset in df.iterrows():
d = pd.read_gbq(query.format(dataset=dataset[0]), project_id='myproject',dialect="standard")
df_output=df_output.append(d)
BQ storage costs are $ 0.02 per GB from "BigQuery Fees". For long-term storage, it costs $ 0.01, so the actual bill may be a little cheaper than the result.
I summarized the meta information of BigQuery. Since there are various meta information, there seems to be other ones, but for the time being, I will put out the ones that I often use. However, there are some mysterious items, so I need to study a little more.
Recommended Posts