What Does Python’s __slots__ Actually Do?




Image by Author | Canva

 

What if there were a simple way to make your Python code faster and leaner? __slots__ is easy to add to a class, and it can speed up attribute access while reducing memory usage.

In this article, we will walk through how it works using a real-world data science project that Allegro uses as a challenge in its data science recruitment process. Before we get into the project, let’s build a solid understanding of what __slots__ does.

 

What is __slots__ in Python?

 
In Python, instances of ordinary classes keep a dictionary (__dict__) of their attributes. This allows you to add, change, or delete attributes at any time, but it also comes at a cost: extra memory and slightly slower attribute access.
The __slots__ declaration tells Python that the listed names are the only attributes an object of this class will ever need. It is a limitation, but it saves memory and time. Let’s see an example.

class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age

 

In the second class, __slots__ tells Python not to create a dictionary for each object. Instead, it reserves a fixed spot in memory for the name and age values, which makes attribute access faster and reduces memory usage.
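To make the difference concrete, here is a minimal sketch that reuses the two classes above. The sample values ("Ada", 36) and the misspelled attribute name agee are made up for illustration. It checks whether each instance carries a __dict__ and what happens when a new attribute is assigned.

```python
class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age


regular = WithoutSlots("Ada", 36)
slotted = WithSlots("Ada", 36)

# A regular instance stores its attributes in a per-instance dictionary.
print(regular.__dict__)              # {'name': 'Ada', 'age': 36}

# A slotted instance has no __dict__ at all.
print(hasattr(slotted, "__dict__"))  # False

# A regular class silently accepts a misspelled attribute name...
regular.agee = 37

# ...while a slotted class rejects any name not listed in __slots__.
try:
    slotted.agee = 37
except AttributeError as exc:
    print("AttributeError:", exc)
```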

 

Why Use __slots__?

 
Now, before starting the data project, let’s name the reasons to use __slots__.

  • Memory: Objects take up less space when Python skips creating a dictionary.
  • Speed: Accessing values is quicker because Python knows where each value is stored.
  • Bugs: Only the declared attributes are allowed, so assigning to a misspelled name raises an AttributeError instead of silently creating a new attribute.
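The memory point is easiest to see when many objects are alive at once. The sketch below is one way to check it with the standard library’s tracemalloc module. The helper name allocated_bytes, the object count, and the sample values are assumptions chosen for illustration, and the exact numbers vary by Python version.

```python
import tracemalloc


class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age


def allocated_bytes(cls, n=50_000):
    """Return the bytes traced while n instances of cls are alive."""
    tracemalloc.start()
    objs = [cls("laptop", 3) for _ in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


regular_bytes = allocated_bytes(WithoutSlots)
slotted_bytes = allocated_bytes(WithSlots)

print(f"Regular instances: {regular_bytes:,} bytes")
print(f"Slotted instances: {slotted_bytes:,} bytes")
```

Both runs include the list that holds the objects, so the difference between the two figures comes from the instances themselves.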

 

Using Allegro’s Data Science Challenge as an Example

 
In this data project, Allegro asked data science candidates to predict laptop prices by building machine learning models.

 
 

Link to this data project: https://platform.stratascratch.com/data-projects/laptop-price-prediction

There are three different datasets:

  • train_dataset.json
  • val_dataset.json
  • test_dataset.json

Good. Let’s continue with the data exploration process.

 

Data Exploration

Now let’s load one of them to see the dataset’s structure.

import json
import pandas as pd

# Load the training set and drop rows with missing values
with open('train_dataset.json', 'r') as f:
    train_data = json.load(f)
df = pd.DataFrame(train_data).dropna().reset_index(drop=True)
df.head()

 

Here is the output.

 
Python slots example
 

Good, let’s see the columns.

 

Here is the output.

 
Python slots example
 

Now, let’s check the numerical columns.

 

Here is the output.

 
Python slots example
 

Data Exploration with __slots__ vs Regular Classes

Let’s create a class called SlottedDataExploration, which will use the __slots__ attribute. It allows only one attribute called df. Let’s see the code.

class SlottedDataExploration:
    __slots__ = ['df']

    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")

 

Now let’s see the same implementation as a regular class, without __slots__.

class DataExploration:
    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")

 

You can read more about how class methods work in this Python Class Methods guide.

 

Performance Comparison: Time Benchmark

Now let’s compare the two classes by measuring run time and memory.

import time
from pympler import asizeof  # memory measurement

start_normal = time.time()
de = DataExploration(df)
_ = de.head()
_ = de.tail()
_ = de.describe()
_ = de.info()
end_normal = time.time()
normal_duration = end_normal - start_normal
normal_memory = asizeof.asizeof(de)

start_slotted = time.time()
sde = SlottedDataExploration(df)
_ = sde.head()
_ = sde.tail()
_ = sde.describe()
_ = sde.info()
end_slotted = time.time()
slotted_duration = end_slotted - start_slotted
slotted_memory = asizeof.asizeof(sde)

print(f"⏱️ Normal class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted class duration: {slotted_duration:.4f} seconds")

print(f"📦 Normal class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted class memory usage: {slotted_memory:.2f} bytes")

 

Now let’s see the result.
 
Python slots example
 

In this run, the slotted class was 46.45% faster, but the memory usage was the same for both classes.
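In the benchmark above, both objects reference the same DataFrame and most of the time is spent inside pandas calls. As a hedged sketch, the overhead of the instances themselves can be looked at separately with sys.getsizeof and timeit from the standard library. The classes RegularHolder and SlottedHolder and the placeholder payload are hypothetical stand-ins for the two exploration classes and their DataFrame.

```python
import sys
import timeit


class RegularHolder:
    def __init__(self, df):
        self.df = df


class SlottedHolder:
    __slots__ = ['df']

    def __init__(self, df):
        self.df = df


payload = list(range(1000))  # placeholder standing in for the DataFrame
regular = RegularHolder(payload)
slotted = SlottedHolder(payload)

# Shallow size of each instance, excluding the shared payload.
# The regular instance also owns a __dict__, so its size is added.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)
print(f"Regular instance + __dict__: {regular_size} bytes")
print(f"Slotted instance: {slotted_size} bytes")

# Time one million attribute reads on each instance.
regular_time = timeit.timeit("obj.df", globals={"obj": regular}, number=1_000_000)
slotted_time = timeit.timeit("obj.df", globals={"obj": slotted}, number=1_000_000)
print(f"Regular attribute access: {regular_time:.4f} seconds")
print(f"Slotted attribute access: {slotted_time:.4f} seconds")
```

The timing figures depend on the machine and Python version, so they are best read as a rough comparison rather than a fixed ratio.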

 

Machine Learning in Action

 
In this section, let’s move on to machine learning. Before doing so, we need a train and test split.

 

Train and Test Split

Now we have three different datasets, train, val, and test, so let’s first find their indices.

train_indeces = train_df.dropna().index
val_indeces = val_df.dropna().index
test_indeces = test_df.dropna().index

 

Now it’s time to assign those indices to select those datasets easily in the next step.

train_df = new_df.loc[train_indeces]
val_df = new_df.loc[val_indeces]
test_df = new_df.loc[test_indeces]

 

Great, now let’s convert these data frames to NumPy arrays. scikit-learn expects the target in the flat (n,) shape instead of (n, 1), so we need to call .ravel() after to_numpy().

X_train = train_df[selected_features].to_numpy()
X_val = val_df[selected_features].to_numpy()
X_test = test_df[selected_features].to_numpy()
y_train = df.loc[train_indeces][label_col].to_numpy().ravel()
y_val = df.loc[val_indeces][label_col].to_numpy().ravel()
y_test = df.loc[test_indeces][label_col].to_numpy().ravel()
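The effect of .ravel() is easiest to see on a tiny array. This is a minimal sketch with made-up price values, not the project’s data. It assumes the label was selected as a single-column table, which is what produces the (n, 1) shape.

```python
import numpy as np

# A single-column selection converts to a 2-D array of shape (n, 1).
column = np.array([[1200.0], [850.0], [999.0]])
print(column.shape)  # (3, 1)

# ravel() flattens it to the 1-D shape (n,) used for the target.
flat = column.ravel()
print(flat.shape)    # (3,)
```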

 

Applying Machine Learning Models

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error 
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import VotingRegressor
from sklearn import linear_model
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
import matplotlib.pyplot as plt
from sklearn import tree
import seaborn as sns
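
def rmse(y_true, y_pred):
    # Root mean squared error. np.sqrt works across scikit-learn versions;
    # the squared=False argument was removed in newer releases.
    return np.sqrt(mean_squared_error(y_true, y_pred))


def regression(regressor_name, regressor):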
    pipe = make_pipeline(MaxAbsScaler(), regressor)
    pipe.fit(X_train, y_train) 
    predicted = pipe.predict(X_test)
    rmse_val = rmse(y_test, predicted)
    print(regressor_name, ':', rmse_val)
    pred_df[regressor_name+'_Pred'] = predicted
    plt.figure(regressor_name)
    plt.title(regressor_name)
    plt.xlabel('predicted')
    plt.ylabel('actual')
    sns.regplot(y=y_test,x=predicted)
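The rmse helper above reports the root mean squared error: the square root of the average squared difference between the actual and predicted values. As a small worked sketch with made-up prices (the function name rmse_manual is hypothetical), the same quantity can be computed by hand:

```python
import math


def rmse_manual(y_true, y_pred):
    # Average the squared errors, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [1000.0, 1500.0, 2000.0]
predicted = [1100.0, 1400.0, 2000.0]

# The errors are -100, 100 and 0, so the mean squared error is 20000 / 3.
print(rmse_manual(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.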

 

Next, we will define a dictionary of regressors and run each model.

regressors = {
    'Linear' : LinearRegression(),
    'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
    'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
    'RandomForest': RandomForestRegressor(random_state=42),
    'GradientBoosting': GradientBoostingRegressor(random_state=42, criterion='squared_error',
                                                  loss="squared_error",learning_rate=0.6, warm_start=True),
    'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42),
}
pred_df = pd.DataFrame(columns =["Actual"])
pred_df["Actual"] = y_test
for key in regressors.keys():
    regression(key, regressors[key])

 

Here are the results.

 
Python slots example
 

Now, let’s implement this with both a slotted class and a regular class.

 

Machine Learning with __slots__ vs Regular Classes

Now let’s check the code with slots.

class SlottedMachineLearning:
    __slots__ = ['X_train', 'y_train', 'X_test', 'y_test', 'pred_df']

    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # np.sqrt works across scikit-learn versions (squared=False was removed)
        return np.sqrt(mean_squared_error(y_true, y_pred))

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted

        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)

        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }

        for name, model in models.items():
            self.regression(name, model)

 

Here is the regular class application.

class MachineLearning:
    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # np.sqrt works across scikit-learn versions (squared=False was removed)
        return np.sqrt(mean_squared_error(y_true, y_pred))

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted

        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)

        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }

        for name, model in models.items():
            self.regression(name, model)

 

Performance Comparison: Time Benchmark

Now let’s compare the two classes the same way we did in the previous section.

import time

start_normal = time.time()
ml = MachineLearning(X_train, y_train, X_test, y_test)
ml.run_all()
end_normal = time.time()
normal_duration = end_normal - start_normal
normal_memory = (
    ml.X_train.nbytes +
    ml.X_test.nbytes +
    ml.y_train.nbytes +
    ml.y_test.nbytes
)

start_slotted = time.time()
sml = SlottedMachineLearning(X_train, y_train, X_test, y_test)
sml.run_all()
end_slotted = time.time()
slotted_duration = end_slotted - start_slotted
slotted_memory = (
    sml.X_train.nbytes +
    sml.X_test.nbytes +
    sml.y_train.nbytes +
    sml.y_test.nbytes
)

print(f"⏱️ Normal ML class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted ML class duration: {slotted_duration:.4f} seconds")

print(f"📦 Normal ML class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted ML class memory usage: {slotted_memory:.2f} bytes")

time_diff = normal_duration - slotted_duration
percent_faster = (time_diff / normal_duration) * 100
if percent_faster > 0:
    print(f"✅ Slotted ML class is {percent_faster:.2f}% faster than the regular ML class.")
else:
    print(f"ℹ️ No speed improvement with slots in this run.")

memory_diff = normal_memory - slotted_memory
percent_smaller = (memory_diff / normal_memory) * 100
if percent_smaller > 0:
    print(f"✅ Slotted ML class uses {percent_smaller:.2f}% less memory than the regular ML class.")
else:
    print(f"ℹ️ No memory savings with slots in this run.")
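A single time.time() measurement can be swayed by whichever class happens to run first or by background load on the machine. One possible refinement, sketched here with a hypothetical median_runtime helper and a placeholder workload instead of the actual model training, is to repeat each run and compare medians using time.perf_counter:

```python
import statistics
import time


def median_runtime(func, repeats=5):
    """Call func several times and return the median wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def placeholder_workload():
    # Stand-in for something like MachineLearning(...).run_all()
    return sum(i * i for i in range(100_000))


print(f"Median duration: {median_runtime(placeholder_workload):.4f} seconds")
```

In the benchmark above, the same helper could wrap a function that builds each class and calls run_all(), at the cost of training the models several times.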

 

Here is the output.

 
Python slots example
 

Conclusion

 
By preventing the creation of a dynamic __dict__ for each instance, __slots__ reduces memory usage and speeds up attribute access. You saw how it works in practice through both data exploration and machine learning tasks using Allegro’s real recruitment project.

On small datasets with only a handful of objects, the improvements might be minor. But as the number of instances grows, the benefits become more noticeable, especially in memory-bound or performance-critical applications.
 
 

Nate Rosidi is a data scientist who also works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.