10 Python Math & Statistical Analysis One-Liners


Python’s expressive syntax, along with its built-in modules and external libraries, makes it possible to perform complex mathematical and statistical operations in remarkably concise code.

In this article, we’ll go over some useful one-liners for math and statistical analysis. These one-liners show how to extract meaningful info from data with minimal code while maintaining readability and efficiency.

🔗 Link to the code on GitHub

 

Sample Data

 
Before coding our one-liners, let’s create some sample datasets to work with:

import numpy as np
import pandas as pd
from collections import Counter
import statistics

# Sample datasets
numbers = [12, 45, 7, 23, 56, 89, 34, 67, 21, 78, 43, 65, 32, 54, 76]
grades = [78, 79, 82, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96]
sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]
temperatures = [55.2, 62.1, 58.3, 64.7, 60.0, 61.8, 59.4, 63.5, 57.9, 56.6]

 

Please note: In the code snippets that follow, I’ve excluded the print statements.

 

1. Calculate Mean, Median, and Mode

 
When analyzing datasets, you often need multiple measures of central tendency to understand your data’s distribution. This one-liner computes all three key statistics in a single expression, providing a comprehensive overview of your data’s central characteristics.

stats = (statistics.mean(grades), statistics.median(grades), statistics.mode(grades))

 

This expression uses Python’s statistics module to calculate the arithmetic mean, middle value, and most frequent value in one tuple assignment.
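As a quick sanity check, here are the values this produces for the grades list above, unpacked into named variables (note that when every value is unique, statistics.mode() on Python 3.8+ simply returns the first value encountered rather than raising an error):

```python
import statistics

grades = [78, 79, 82, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96]

mean, median, mode = (statistics.mean(grades),
                      statistics.median(grades),
                      statistics.mode(grades))

# mean is 87.6 (1314 / 15); median is the 8th of 15 sorted values, 88.
# All grades here are unique, so mode() falls back to the first value, 78.
```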

 

2. Find Outliers Using Interquartile Range

 
Identifying outliers is necessary for data quality assessment and anomaly detection. This one-liner implements the standard IQR method to flag values that fall significantly outside the typical range, helping you spot potential data entry errors or genuinely unusual observations.

outliers = [x for x in sales_data if x < np.percentile(sales_data, 25) - 1.5 * (np.percentile(sales_data, 75) - np.percentile(sales_data, 25)) or x > np.percentile(sales_data, 75) + 1.5 * (np.percentile(sales_data, 75) - np.percentile(sales_data, 25))]

 

This list comprehension calculates the first and third quartiles, determines the IQR, and identifies values beyond 1.5 times the IQR from the quartile boundaries. The boolean logic filters the original dataset to return only the outlying values.
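If you can spare a second line, computing the quartiles once makes the same logic much easier to read (and avoids calling np.percentile six times):

```python
import numpy as np

sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]

# Compute both quartiles in a single call, then reuse them
q1, q3 = np.percentile(sales_data, [25, 75])
iqr = q3 - q1
outliers = [x for x in sales_data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
# With this dataset, the only value outside the fences is 3400
```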

 

3. Calculate Correlation Between Two Variables

 
Sometimes, we need to understand relationships between variables. This one-liner computes the Pearson correlation coefficient, quantifying the linear relationship strength between two datasets and providing immediate insight into their association.

correlation = np.corrcoef(temperatures, grades[:len(temperatures)])[0, 1]

 

The numpy corrcoef function returns a correlation matrix, and we extract the off-diagonal element representing the correlation between our two variables. The slicing ensures both arrays have matching dimensions for proper correlation calculation.
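A weak correlation like the one above is expected, since the two datasets are unrelated. As a sketch of the other extreme, any exact linear transformation of a dataset correlates with it at 1.0:

```python
import numpy as np

temperatures = [55.2, 62.1, 58.3, 64.7, 60.0, 61.8, 59.4, 63.5, 57.9, 56.6]

# A perfectly linear transformation has a Pearson correlation of 1.0
scaled = [2 * t + 5 for t in temperatures]
r = np.corrcoef(temperatures, scaled)[0, 1]
```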

np.float64(0.062360807968294615)

 

4. Generate Descriptive Statistics Summary

 
A comprehensive statistical summary provides essential insights about your data’s distribution characteristics. This one-liner creates a dictionary containing key descriptive statistics, offering a complete picture of your dataset’s properties in a single expression.

summary = {stat: getattr(np, stat)(numbers) for stat in ['mean', 'std', 'min', 'max', 'var']}

 

This dictionary comprehension uses the built-in getattr() to look up each NumPy function by name, creating a clean mapping of statistic names to their calculated values.
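For comparison, pandas offers a similar one-line summary via describe(). One subtlety worth knowing: pandas reports the sample standard deviation (ddof=1), while np.std defaults to the population standard deviation (ddof=0), so the two 'std' values will differ slightly:

```python
import numpy as np
import pandas as pd

numbers = [12, 45, 7, 23, 56, 89, 34, 67, 21, 78, 43, 65, 32, 54, 76]

summary = {stat: getattr(np, stat)(numbers) for stat in ['mean', 'std', 'min', 'max', 'var']}

# describe() adds quartiles, but its 'std' uses ddof=1, not ddof=0
pandas_summary = pd.Series(numbers).describe()
```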

{'mean': np.float64(46.8),
 'std': np.float64(24.372662281061267),
 'min': np.int64(7),
 'max': np.int64(89),
 'var': np.float64(594.0266666666666)}

 

5. Normalize Data to Z-Scores

 
Standardizing data to z-scores enables meaningful comparisons across different scales and distributions. This one-liner transforms your raw data into standardized units, expressing each value as the number of standard deviations from the mean.

z_scores = [(x - np.mean(numbers)) / np.std(numbers) for x in numbers]

 

The list comprehension applies the z-score formula to each element, subtracting the mean and dividing by the standard deviation.
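With NumPy you can also express this as a single vectorized operation, which avoids recomputing the mean and standard deviation for every element. Either way, a quick property check confirms the result: standardized data always has mean 0 and standard deviation 1.

```python
import numpy as np

numbers = [12, 45, 7, 23, 56, 89, 34, 67, 21, 78, 43, 65, 32, 54, 76]

arr = np.array(numbers)
z_scores = (arr - arr.mean()) / arr.std()  # one vectorized pass
```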

[np.float64(-1.4278292456807755),
 np.float64(-0.07385323684555724),
 np.float64(-1.6329771258073238),
 np.float64(-0.9765039094023694),
 np.float64(0.3774720994328488),
...
 np.float64(0.29541294738222956),
 np.float64(1.1980636199390418)]

 

6. Calculate Moving Average

 
Smoothing time series data helps reduce short-term fluctuations and noise. This one-liner computes a rolling average over a specified window, providing a cleaner view of your data’s directional movement.

moving_avg = [np.mean(sales_data[i:i+3]) for i in range(len(sales_data)-2)]

 

The list comprehension creates overlapping windows of three consecutive values, calculating the mean for each window. This technique is particularly useful for financial data, sensor readings, and any sequential measurements where trend identification is important.
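The same moving average can be computed with np.convolve; using mode='valid' keeps only windows that fit entirely inside the data, which matches the slicing version above:

```python
import numpy as np

sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]

window = 3
# Convolving with a uniform kernel of 1/window averages each window
moving_avg = np.convolve(sales_data, np.ones(window) / window, mode='valid')
```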

[np.float64(1166.6666666666667),
 np.float64(1466.6666666666667),
 np.float64(1566.6666666666667),
 np.float64(1616.6666666666667),
 np.float64(1450.0),
 np.float64(1583.3333333333333),
 np.float64(1733.3333333333333),
 np.float64(1783.3333333333333),
 np.float64(2183.3333333333335)]

 

7. Find the Most Frequent Value Range

 
Understanding data distribution patterns often requires identifying concentration areas within your dataset. This one-liner bins your data into ranges and finds the most populated interval, revealing where your values cluster most densely.

most_frequent_range = Counter([int(x//10)*10 for x in numbers]).most_common(1)[0]

 

The expression bins values into decades, creates a frequency count using Counter, and extracts the most common range. This approach is valuable for histogram analysis and understanding data distribution characteristics without complex plotting.
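A small variation makes the output self-describing by labeling each bin with its actual range instead of just its lower bound (the label format here is my own choice, not part of the original one-liner):

```python
from collections import Counter

numbers = [12, 45, 7, 23, 56, 89, 34, 67, 21, 78, 43, 65, 32, 54, 76]

# Same decade binning, but with human-readable range labels like "40-49"
histogram = Counter(f"{(x // 10) * 10}-{(x // 10) * 10 + 9}" for x in numbers)
top_range, count = histogram.most_common(1)[0]
```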

 

8. Calculate Compound Annual Growth Rate

 
Financial and business analysis often requires understanding growth trajectories over time. This one-liner computes the compound annual growth rate, providing a standardized measure of investment or business performance across different time periods.

cagr = (sales_data[-1] / sales_data[0]) ** (1 / (len(sales_data) - 1)) - 1

 

The formula takes the ratio of final to initial values, raises it to the power of the reciprocal of the time period, and subtracts one to get the growth rate. This calculation assumes each data point represents one time period in your analysis.
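Plugging in the sample data: sales grow from 1200 to 3400 over 10 periods, which works out to roughly 11% compound growth per period:

```python
sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]

periods = len(sales_data) - 1          # 11 data points span 10 growth periods
cagr = (sales_data[-1] / sales_data[0]) ** (1 / periods) - 1
# (3400 / 1200) ** 0.1 - 1 is approximately 0.11, i.e. ~11% per period
```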

 

9. Compute Running Totals

 
Cumulative calculations help track progressive changes and identify inflection points in your data. This one-liner generates running totals, showing how values accumulate over time.

running_totals = [sum(sales_data[:i+1]) for i in range(len(sales_data))]

 

The list comprehension progressively extends the slice from the beginning to each position, calculating cumulative sums.
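One caveat: because each slice re-sums the prefix from the start, the comprehension is O(n²). For longer series, itertools.accumulate produces the same result in a single O(n) pass:

```python
from itertools import accumulate

sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]

# accumulate yields each intermediate sum as it walks the data once
running_totals = list(accumulate(sales_data))
```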

[1200, 2700, 3500, 5600, 7400, 8350, 9950, 12150, 13550, 15300, 18700]

 

10. Calculate Coefficient of Variation

 
Comparing variability across datasets with different scales requires relative measures of dispersion. This one-liner computes the coefficient of variation, expressing standard deviation as a percentage of the mean for meaningful comparisons across different measurement units.

cv = (np.std(temperatures) / np.mean(temperatures)) * 100

 

The calculation divides the standard deviation by the mean and multiplies by 100 to express the result as a percentage. This standardized measure of variability is particularly useful when comparing datasets with different units or scales.
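To see why the relative measure matters, compare the temperatures against the sales figures: the raw standard deviations live on completely different scales, but the CVs are directly comparable and show that sales vary far more relative to their mean:

```python
import numpy as np

temperatures = [55.2, 62.1, 58.3, 64.7, 60.0, 61.8, 59.4, 63.5, 57.9, 56.6]
sales_data = [1200, 1500, 800, 2100, 1800, 950, 1600, 2200, 1400, 1750, 3400]

def cv(data):
    """Coefficient of variation as a percentage of the mean."""
    return (np.std(data) / np.mean(data)) * 100

cv_temps, cv_sales = cv(temperatures), cv(sales_data)
# cv_temps is ~4.8%, while cv_sales is roughly an order of magnitude larger
```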

np.float64(4.840958085381635)

 

Conclusion

 
These Python one-liners show how to perform mathematical and statistical operations with minimal code. The key to writing effective one-liners lies in balancing conciseness with readability, ensuring your code remains maintainable while maximizing efficiency.

Remember that while one-liners are powerful, complex analyses may benefit from breaking operations into multiple steps for easier debugging.
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.