[PYTHON] MemSQL processing capacity verification (application 2)

This time ...

In Application 1, we focused on setting up the environment used in this verification, and since the final SQL query processing worked as expected, this round verifies the processing that follows. The ultimate goal is to confirm whether it is possible to (1) work at the same time as the data-generating side (the field), (2) without imposing new load on existing data systems that have been sized tightly for their current operation, and (3) realize real-time data utilization beyond the silo walls.

The keyword is **change the present and create the future**. Insights that could not be seen with the traditional "store it all up, then process it in one go" style are instead prepared automatically, in real time, on your own terms, and creative data utilization is realized on demand ... perhaps a new form of data-driven work becomes possible.

First, simply issue the SQL query repeatedly

It is fairly basic as SQL goes, but first I will try applying aggregate functions to data that meets certain conditions. (1) Perform aggregation per category on the automatically generated "Nanchatte" (mock) sales information. (2) The target prefectures are Osaka, Tokyo, Fukuoka, Aichi, and Hokkaido. (3) Issue the query against the table that the generator is continuously inserting into, and check the results.

Create the process that fires the query. The SQL statement is intentionally written to be easy to read (assuming the pattern will be changed later), though it could certainly be made a bit more efficient. From the next round onward, we also plan to select the extracted columns into another new table (assuming the results will be used for analysis), so this process is written with that operation in mind.

# coding: utf-8

# Execute tasks in Python at regular intervals (enhanced version)
# Version 2.7


import sys
stdout = sys.stdout
reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = stdout

# Modules to import
import schedule
import time
import pymysql.cursors


# Set the process to run at regular intervals here


def job():

    # Aggregation SQL to be used this time.
    # Intentionally split into fragments so that variations can be tried later ...
    SQL1 = "SELECT SUM(Units) AS Sum_of_Units, AVG(Payment) AS Average_of_Payment, "
    SQL2 = "MAX(Payment) AS Max_of_Payment, MIN(Payment) AS Min_of_Payment, Category "
    SQL3 = "FROM Qiita_Test "
    SQL4 = "WHERE Category IN ('Alcoholic beverages','Miscellaneous goods','Books','Home appliances','DVD / CD') "
    SQL5 = "AND Prefecture IN ('Osaka Prefecture','Tokyo','Fukuoka Prefecture','Aichi Prefecture','Hokkaido') "
    SQL6 = "GROUP BY Category"
    SQL_Data = SQL1 + SQL2 + SQL3 + SQL4 + SQL5 + SQL6

    # Start processing
    from datetime import datetime
    print("JOB execution date and time: " + datetime.now().strftime("%Y/%m/%d %H:%M:%S"))
    print
           
    # Connect to MemSQL
    db = pymysql.connect(host='xxx.xxx.xxx.xxx',
                         port=3306,
                         user='qiita',
                         password='adminqiita',
                         db='Test',
                         charset='utf8',
                         cursorclass=pymysql.cursors.DictCursor)

    with db.cursor() as cursor:

        # Send the query and commit
        cursor.execute(SQL_Data)
        db.commit()

        # Get the query results
        rows = cursor.fetchall()

        # The columns in the query will grow in the future, so be prepared for that ...
        # Access each column by name: the order of dict values is not guaranteed,
        # so index-based access could mislabel the results.
        for Query_Data in rows:

            print("Product category: " + str(Query_Data['Category']))
            print("Minimum selling price: " + str(Query_Data['Min_of_Payment']))
            print("Maximum selling price: " + str(Query_Data['Max_of_Payment']))
            print("Average selling price: " + str(Query_Data['Average_of_Payment']))
            print("Total number of sales: " + str(Query_Data['Sum_of_Units']))
            print

    # Close the database connection
    db.close()


# Main part from here


def main():

    # Variables to use
    Loop_Count = 3
    Count = 0

    Interval_Time = 60

    # Start time of the whole process
    from datetime import datetime
    print("Program start date and time: " + datetime.now().strftime("%Y/%m/%d %H:%M:%S"))
    print

    # Every 10 minutes
    # schedule.every(10).minutes.do(job)
    # Every 2 hours
    # schedule.every(2).hours.do(job)
    # At 10:00 every day
    # schedule.every().day.at("10:00").do(job)
    # Every Monday
    # schedule.every().monday.do(job)

    schedule.every(Interval_Time).seconds.do(job)

    # Run the jobs in a loop
    while True:

        schedule.run_pending()

        # Mysterious spell ... ww (wait until the next scheduled run)
        time.sleep(Interval_Time)

        # Stop after the specified number of runs
        if (Count >= Loop_Count):
            break
        else:
            Count += 1

    print(str(Count) + " times: The specified number of jobs have been completed!")
    print
    from datetime import datetime
    print("Program end date and time: " + datetime.now().strftime("%Y/%m/%d %H:%M:%S"))


if __name__ == "__main__":
    main()
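Two details of the script above are worth calling out: concatenating SQL fragments is fragile (a missing trailing space glues keywords together), and DictCursor rows should be read by column name rather than by position. The following is my own sketch of both ideas, not part of the original script; the helper names are hypothetical, while the table, column, and alias names follow the article.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers: assemble the fragmented SQL with explicit separators,
# and format a DictCursor row by key instead of relying on dict value order.

def build_aggregate_sql(categories, prefectures):
    """Join the query fragments with single spaces so concatenation
    can never glue two keywords together. Values are taken from a
    fixed in-code list, not user input, so plain interpolation is used."""
    fragments = [
        "SELECT SUM(Units) AS Sum_of_Units,",
        "AVG(Payment) AS Average_of_Payment,",
        "MAX(Payment) AS Max_of_Payment,",
        "MIN(Payment) AS Min_of_Payment, Category",
        "FROM Qiita_Test",
        "WHERE Category IN (%s)" % ", ".join("'%s'" % c for c in categories),
        "AND Prefecture IN (%s)" % ", ".join("'%s'" % p for p in prefectures),
        "GROUP BY Category",
    ]
    return " ".join(fragments)

def format_row(row):
    """Format one aggregation result row by column name."""
    return "%s: min=%s max=%s avg=%s units=%s" % (
        row['Category'], row['Min_of_Payment'], row['Max_of_Payment'],
        row['Average_of_Payment'], row['Sum_of_Units'])
```

With a live connection, the job body would reduce to something like `cursor.execute(build_aggregate_sql([...], [...]))` followed by `print(format_row(r))` per fetched row.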

The execution result ...

The following is the result of running the script above while continuous generation and insertion of random data into MemSQL was started automatically. This time I wanted to process after a certain amount of data had accumulated, so the execution interval was set to 60 seconds and the job was run 3 times. (In practice, 30 seconds would also be fine.)


 Program start date and time: 2020/10/06 16:46:54

 JOB execution date and time: 2020/10/06 16:47:54


 Product category: Home appliances
 Minimum selling price: 35800
 Maximum selling price: 88000
 Average selling price: 51840.0000
 Total number of sales: 5


 Product category: Books
 Minimum selling price: 1710
 Maximum selling price: 5940
 Average selling price: 3825.0000
 Total number of sales: 6


 Product category: Miscellaneous goods
 Minimum selling price: 1592
 Maximum selling price: 3580
 Average selling price: 2586.0000
 Total number of sales: 5


 Product category: Alcoholic beverages
 Minimum selling price: 9000
 Maximum selling price: 18000
 Average selling price: 12750.0000
 Total number of sales: 23


 Product category: DVD / CD
 Minimum selling price: 4400
 Maximum selling price: 5960
 Average selling price: 5246.6667
 Total number of sales: 6

 JOB execution date and time: 2020/10/06 16:48:54


 Product category: Home appliances
 Minimum selling price: 198
 Maximum selling price: 88000
 Average selling price: 38177.8000
 Total number of sales: 10


 Product category: Books
 Minimum selling price: 280
 Maximum selling price: 5940
 Average selling price: 2601.6667
 Total number of sales: 14


 Product category: Miscellaneous goods
 Minimum selling price: 1592
 Maximum selling price: 3580
 Average selling price: 2254.6667
 Total number of sales: 9


 Product category: Alcoholic beverages
 Minimum selling price: 3960
 Maximum selling price: 50000
 Average selling price: 16456.0000
 Total number of sales: 64


 Product category: DVD / CD
 Minimum selling price: 4400
 Maximum selling price: 7600
 Average selling price: 6188.0000
 Total number of sales: 10

 JOB execution date and time: 2020/10/06 16:49:54


 Product category: Home appliances
 Minimum selling price: 198
 Maximum selling price: 88000
 Average selling price: 44548.1667
 Total number of sales: 12


 Product category: Books
 Minimum selling price: 280
 Maximum selling price: 5940
 Average selling price: 2601.6667
 Total number of sales: 14


 Product category: Miscellaneous goods
 Minimum selling price: 396
 Maximum selling price: 3580
 Average selling price: 1790.0000
 Total number of sales: 11


 Product category: Alcoholic beverages
 Minimum selling price: 2450
 Maximum selling price: 50000
 Average selling price: 13355.0000
 Total number of sales: 79


 Product category: DVD / CD
 Minimum selling price: 1980
 Maximum selling price: 7600
 Average selling price: 5633.3333
 Total number of sales: 17

 3 times: The specified number of jobs have been completed!

 Program end date and time: 2020/10/06 16:50:54

By the way, here is the verification result of running the same query in MemSQL's SQL editor while data was being inserted in the same way.

(Screenshot: the same query executed in the MemSQL SQL editor)

The query was executed at various points in time (roughly, while watching the clock), and the execution time stayed stable in the few-millisecond range. Previously, I ran this same SQL query (in that case, against a static table) over about 100,000 rows of similar data, and it still delivered good results in the 10 ms range. So by scaling the in-memory area to match the actual data growth and making use of distributed cluster processing, we can expect dramatic performance even for much larger-scale data processing.
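The millisecond-range figures above were read off the SQL editor; if you want the script itself to record latency, a small wrapper like the following could time each execution. This is my own sketch, not part of the original script, and works for any callable:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_ms).
    A hypothetical helper for wrapping calls such as cursor.execute(...)."""
    start = time.time()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.time() - start) * 1000.0
    return result, elapsed_ms

# Against a live connection, usage might look like:
#   _, ms = timed_call(cursor.execute, SQL_Data)
#   print("query took %.1f ms" % ms)
```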

Next time ...

Next time, I would like to modify this SQL statement slightly so that snapshot-like table creation can be performed automatically at short intervals. The aim is to verify that, by exploiting MemSQL's in-memory performance while collecting data in real time via Equalum, the necessary SQL processing can be executed on demand without imposing transaction load on the upstream side. Also, if time allows, I would like to intentionally create multiple tables (imagine data extracted from siloed systems ...) and run SQL across them to see the results.
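As a preview of that snapshot idea, a timestamped CREATE TABLE ... AS SELECT statement could be generated roughly as follows. This is a sketch under my own assumptions: the function name is hypothetical, the column list is illustrative, and MemSQL's exact CTAS behavior should be checked against its documentation.

```python
from datetime import datetime

def snapshot_sql(source_table, columns, when=None):
    """Build a CREATE TABLE ... AS SELECT statement whose target name
    embeds a timestamp, so each scheduled run materializes a fresh snapshot."""
    when = when or datetime.now()
    target = "%s_snap_%s" % (source_table, when.strftime("%Y%m%d_%H%M%S"))
    return "CREATE TABLE %s AS SELECT %s FROM %s" % (
        target, ", ".join(columns), source_table)
```

Hooking this into the existing `schedule` loop would just mean having `job()` execute the generated statement instead of (or alongside) the aggregation query.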

Acknowledgments

This verification was carried out using the official Free version (V6) of MemSQL.

** About the Free version of MemSQL ... **

We would like to thank MemSQL for providing this valuable opportunity. If this content differs from what is published on MemSQL's official website, MemSQL's information takes precedence; please bear this in mind.
