Data Analytics Automation Scripts with SQL Stored Procedures

0
11


Image by Editor

 

Introduction

 
Data has become an easier commodity to store in the current digital era. With the advantage of having abundant data for business, analyzing data to help companies gain insight has become more critical than ever.

In most businesses, data is stored within a structured database, and SQL is used to acquire it. With SQL, we can query data in the form we want, as long as the script is valid.

The problem is that, sometimes, the query to acquire the data we want is complex and not dynamic. In this case, we can use SQL stored procedures to streamline tedious scripts into simple callables.

This article discusses creating data analytics automation scripts with SQL stored procedures.

Curious? Here’s how.

 

SQL Stored Procedures

 
SQL stored procedures are a collection of SQL queries stored directly within the database. If you are adept in Python, you can think of them as functions: they encapsulate a series of operations into a single executable unit that we can call anytime. It’s beneficial because we can make it dynamic.

That’s why it’s helpful to understand SQL stored procedures, which let us simplify code and automate repetitive tasks.

Let’s try it out with an example. In this tutorial, I will use MySQL for the database and stock data from Kaggle for the table example. Set up MySQL Workbench on your local machine and create a schema where we can store the table. In my example, I created a database called finance_db with a table called stock_data.

We can query the data using something like the following.

USE finance_db;

SELECT * FROM stock_data;

 

In general, a stored procedure has the following structure.

DELIMITER $$
CREATE PROCEDURE procedure_name(param_1, param_2, . . ., param_n)
BEGIN
    instruct_1;
    instruct_2;
    . . .
    instruct_n;
END $$
DELIMITER ;

 

As you can see, the stored procedure can receive parameters that are passed into our query.

Let’s examine an actual implementation. For example, we can create a stored procedure to aggregate stock metrics for a specific date range.

USE finance_db;
DELIMITER $$
CREATE PROCEDURE AggregateStockMetrics(
    IN p_StartDate DATE,
    IN p_EndDate DATE
)
BEGIN
    SELECT
        COUNT(*) AS TradingDays,
        AVG(Close) AS AvgClose,
        MIN(Low) AS MinLow,
        MAX(High) AS MaxHigh,
        SUM(Volume) AS TotalVolume
    FROM stock_data
    WHERE 
        (p_StartDate IS NULL OR Date >= p_StartDate)
      AND (p_EndDate IS NULL OR Date <= p_EndDate);
END $$
DELIMITER ;

 

In the query above, we created the stored procedure named AggregateStockMetrics. This procedure accepts a start date and end date as parameters. The parameters are then used as conditions to filter the data.

You can call the stored procedure like this:

CALL AggregateStockMetrics('2015-01-01', '2015-12-31');

 

The procedure will execute with the parameters we pass. Since the stored procedure is saved in the database, you can use it from any script that connects to the database containing the procedure.

With stored procedures, we can easily reuse logic in other environments. For example, I will call the procedure from Python using the MySQL connector.

To do that, first install the library:

pip install mysql-connector-python

 

Then, create a function that connects to the database, calls the stored procedure, retrieves the result, and closes the connection.

import mysql.connector

def call_aggregate_stock_metrics(start_date, end_date):
    cnx = mysql.connector.connect(
        user="your_username",
        password='your_password',
        host="localhost",
        database="finance_db"
    )
    cursor = cnx.cursor()
    try:
        cursor.callproc('AggregateStockMetrics', [start_date, end_date])
        results = []
        for result in cursor.stored_results():
            results.extend(result.fetchall())
        return results
    finally:
        cursor.close()
        cnx.close()

 

The result will be similar to the output below.

[(39, 2058.875660431691, 1993.260009765625, 2104.27001953125, 140137260000.0)]

 

That’s all you need to know about SQL stored procedures. You can extend this further for automation using a scheduler in your pipeline.

 

Wrapping Up

 
SQL stored procedures provide a method to encapsulate complex queries into dynamic, single-unit functions that can be reused for repetitive data analytics tasks. The procedures are stored within the database and are easy to use from different scripts or applications such as Python.

I hope this has helped!
 
 

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.