Collecting Real-Time Data with APIs: A Hands-On Guide Using Python

October 29, 2025

Image by Author

# Introduction

The ability to collect high-quality, relevant information is still a core skill for any data professional. While there are several ways to gather data, one of the most powerful and dependable methods is through APIs (application programming interfaces). They serve as bridges, allowing different software systems to communicate and share data seamlessly.

In this article, we’ll break down the essentials of using APIs for data collection — why they matter, how they work, and how to get started with them in Python.

# What is an API?

An API (application programming interface) is a set of rules and protocols that allows different software systems to communicate and exchange data efficiently.
Think of it like dining at a restaurant. Instead of speaking directly to the chef, you place your order with a waiter. The waiter checks if the ingredients are available, passes the request to the kitchen, and brings your meal back once it’s ready.
An API works the same way: it receives your request for specific data, checks if that data exists, and returns it if available — serving as the messenger between you and the data source.
When using an API, interactions typically involve the following components:

Client: The application or system that sends a request to access data or functionality
Request: The client sends a structured request to the server, specifying what data it needs
Server: The system that processes the request and provides the requested data or performs an action
Response: The server processes the request and sends back the data or result in a structured format, usually JSON or XML

This communication allows applications to share information or functionalities efficiently, enabling tasks like fetching data from a database or interacting with third-party services.
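
To make the four components concrete, here is a toy, in-process sketch of the restaurant analogy. It is not real HTTP: the `MENU` data, the `handle_request` function, and the `order` function are all hypothetical names invented for this illustration. The "server" is just a Python function that receives a structured request and returns a structured JSON response.

```python
import json

# Hypothetical data held by the "server" (the kitchen in the analogy).
MENU = {"soup": 4.5, "pasta": 9.0}


def handle_request(request):
    """Server side: inspect the request and build a structured response."""
    dish = request["params"].get("dish")
    if dish in MENU:
        body = {"dish": dish, "price": MENU[dish]}
        return {"status": 200, "body": json.dumps(body)}
    return {"status": 404, "body": json.dumps({"error": "dish not found"})}


def order(dish):
    """Client side: send a request and decode the JSON response."""
    request = {"method": "GET", "path": "/menu", "params": {"dish": dish}}
    response = handle_request(request)
    return response["status"], json.loads(response["body"])


print(order("soup"))   # (200, {'dish': 'soup', 'price': 4.5})
print(order("sushi"))  # (404, {'error': 'dish not found'})
```

A real API follows the same pattern, except that the request travels over the network and the status code and body come back as an HTTP response.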

# Why Use APIs for Data Collection?

APIs offer several advantages for data collection:

Efficiency: They provide direct access to data, eliminating the need for manual data gathering
Real-time Access: APIs often deliver up-to-date information, which is essential for time-sensitive analyses
Automation: They enable automated data retrieval processes, reducing human intervention and potential errors
Scalability: APIs can handle large volumes of requests, making them suitable for extensive data collection tasks

# Implementing API Calls in Python

Making a basic API call in Python is one of the easiest and most practical exercises to get started with data collection. The popular requests library makes it simple to send HTTP requests and handle responses.
To demonstrate how it works, we’ll use the Random User Generator API, a free service that provides dummy user data in JSON format, perfect for testing and learning.
Here’s a step-by-step guide to making your first API call in Python.

// Installing the Requests Library:

The requests library is not part of the Python standard library. If it is not installed yet, install it with pip, together with pandas, which we use later to build DataFrames:

pip install requests pandas

// Importing the Required Libraries:

import requests
import pandas as pd

// Checking the Documentation Page:

Before making any requests, it’s important to understand how the API works. This includes reviewing available endpoints, parameters, and response structure. Start by visiting the Random User API documentation.

// Defining the API Endpoint and Parameters:

Based on the documentation, we can construct a simple request. In this example, we fetch user data limited to users from the United States:

url = "https://randomuser.me/api/"
params = {'nat': 'us'}

// Making the GET Request:

Use the requests.get() function with the URL and parameters:

response = requests.get(url, params=params)

// Handling the Response:

Check whether the request was successful, then process the data:

if response.status_code == 200:
    data = response.json()
    # Process the data as needed
else:
    print(f"Error: {response.status_code}")

// Converting Our Data into a Dataframe:

To work with the data easily, we can convert it into a pandas DataFrame:

data = response.json()
df = pd.json_normalize(data["results"])
df
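
To get an intuition for what flattening a nested JSON record means, the sketch below imitates the idea in plain Python. It is only an approximation for nested dictionaries, not pandas' actual implementation, and the `flatten` helper and the `sample_user` record are made up for illustration (they are not real API output).

```python
def flatten(record, parent_key="", sep="."):
    """Turn nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical record shaped like a nested user object
sample_user = {
    "gender": "female",
    "name": {"first": "Ada", "last": "Lovelace"},
    "location": {"city": "London", "coordinates": {"latitude": "51.5"}},
}

print(flatten(sample_user))
# {'gender': 'female', 'name.first': 'Ada', 'name.last': 'Lovelace',
#  'location.city': 'London', 'location.coordinates.latitude': '51.5'}
```

Each dotted key corresponds to one column in the resulting table, which is why nested fields show up with names such as name.first.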

Now, let’s apply the same steps to a real-world case.

# Working with the Eurostat API

Eurostat is the statistical office of the European Union. It provides high-quality, harmonized statistics on a wide range of topics such as economics, demographics, environment, industry, and tourism — covering all EU member states.

Through its API, Eurostat offers public access to a vast collection of datasets in machine-readable formats, making it a valuable resource for data professionals, researchers, and developers interested in analyzing European-level data.

// Step 0: Understanding the Data in the API:

In the Data section of the Eurostat website, you will find a navigation tree. It is organized into the following subsections, which help you identify data of interest:

Detailed Datasets: Full Eurostat data in multi-dimensional format
Selected Datasets: Simplified datasets with fewer indicators, in 2–3 dimensions
EU Policies: Data grouped by specific EU policy areas
Cross-cutting: Thematic data compiled from multiple sources

// Step 1: Checking the Documentation:

Always start with the documentation. Eurostat’s API guide explains the API structure, available endpoints, and how to form valid requests.

// Step 2: Generating the First Call Request:

To generate an API request using Python, the first step is installing and importing the requests library. Remember, we already installed it in the previous simple example. Then, we can easily generate a call request using a demo dataset from the Eurostat documentation.

# We import the requests library
import requests

# Define the URL endpoint -> We use the demo URL in the Eurostat API documentation.
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/DEMO_R_D3DENS?lang=EN"

# Make the GET request
response = requests.get(url)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response

Pro tip: We can split the URL into the base URL and parameters to make it easier to understand what data we are requesting from the API.

# We import the requests library
import requests

# Define the URL endpoint -> We use the demo URL in the Eurostat API documentation.
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/DEMO_R_D3DENS"

# Define the parameters -> requests appends them to the URL as a query string.
params = {
    'lang': 'EN'  # Specify the language as English
}

# Make the GET request
response = requests.get(url, params=params)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response
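
As a quick sanity check that both versions request the same resource, the sketch below rebuilds the full URL from the base URL and the parameters using only the standard library. It is meant as an illustration of the encoding step; requests performs an equivalent step internally, and printing response.url after a call shows the final URL it used.

```python
from urllib.parse import urlencode

base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/DEMO_R_D3DENS"
params = {"lang": "EN"}

# Encode the parameters as a query string and append them to the base URL
full_url = f"{base_url}?{urlencode(params)}"

print(full_url)
# https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/DEMO_R_D3DENS?lang=EN
```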

// Step 3: Determining Which Dataset to Call:

Instead of using the demo dataset, you can select any dataset from the Eurostat database. For example, let’s query the dataset TOUR_OCC_ARN2, which contains tourism accommodation data.

# We import the requests library
import requests

# Define the base URL and the dataset code -> The endpoint is their concatenation.
base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
dataset = "TOUR_OCC_ARN2"

url = base_url + dataset
# Define the parameters -> We define the parameters to add in the URL.
params = {
    'lang': 'EN'  # Specify the language as English
}

# Make the GET request -> we generate the request and obtain the response
response = requests.get(url, params=params)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response

// Step 4: Understanding the Response:

Eurostat’s API returns data in JSON-stat format, a standard for multidimensional statistical data. You can save the response to a file and explore its structure:

import requests
import json

# Define the URL endpoint and dataset
base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
dataset = "TOUR_OCC_ARN2"

url = base_url + dataset

# Define the parameters to add in the URL
params = {
    'lang': 'EN',  # Specify the language as English
    'time': 2019   # Restrict the data to the year 2019
}

# Make the GET request and obtain the response
response = requests.get(url, params=params)

# Check the status code and handle the response
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()

    # Generate a JSON file and write the response data into it
    with open("eurostat_response.json", "w") as json_file:
        json.dump(data, json_file, indent=4)  # Save JSON with pretty formatting

    print("JSON file 'eurostat_response.json' has been successfully created.")
else:
    print(f"Error: Received status code {response.status_code} from the API.")
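
The key idea behind this format is that all values live in one flat collection, and the position of each value encodes its coordinates across the dimensions, with the last dimension varying fastest. The sketch below shows that decoding on a tiny, hand-made payload. The numbers are invented, and the `toy` structure only assumes the fields `id`, `size`, `dimension`, and `value`; the `unravel` and `decode` helpers are hypothetical names for this illustration.

```python
# Toy JSON-stat-like payload with made-up numbers (not real Eurostat data).
toy = {
    "id": ["geo", "time"],   # dimension order
    "size": [2, 3],          # number of categories per dimension
    "dimension": {
        "geo": {"category": {"index": {"ES": 0, "FR": 1}}},
        "time": {"category": {"index": {"2019": 0, "2020": 1, "2021": 2}}},
    },
    "value": {"0": 10.0, "2": 12.0, "4": 21.0},  # sparse: empty cells are omitted
}


def unravel(position, sizes):
    """Convert a flat position into one coordinate per dimension."""
    coords = []
    for size in reversed(sizes):  # the last dimension varies fastest
        position, coord = divmod(position, size)
        coords.append(coord)
    return coords[::-1]


def decode(dataset):
    """Return one row per stored value, labelled with its category codes."""
    codes = {}
    for dim in dataset["id"]:
        index = dataset["dimension"][dim]["category"]["index"]
        codes[dim] = sorted(index, key=index.get)  # codes in positional order
    rows = []
    for key, value in dataset["value"].items():
        coords = unravel(int(key), dataset["size"])
        row = {dim: codes[dim][c] for dim, c in zip(dataset["id"], coords)}
        row["value"] = value
        rows.append(row)
    return rows


for row in decode(toy):
    print(row)
# {'geo': 'ES', 'time': '2019', 'value': 10.0}
# {'geo': 'ES', 'time': '2021', 'value': 12.0}
# {'geo': 'FR', 'time': '2020', 'value': 21.0}
```

With two countries and three years, position 4 corresponds to the second country and the second year, because 4 = 1 × 3 + 1.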

// Step 5: Transforming the Response into Usable Data:

Now that we have the data, we can convert it into a tabular format (CSV) to make it easier to analyze.

import requests
import pandas as pd

# Step 1: Make the GET request to the Eurostat API
base_url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
dataset = "TOUR_OCC_ARN2"  # Tourist accommodation statistics dataset
url = base_url + dataset
params = {'lang': 'EN'}  # Request data in English

# Make the API request
response = requests.get(url, params=params)

# Step 2: Check if the request was successful
if response.status_code == 200:
    data = response.json()

    # Step 3: Extract the dimensions and metadata
    dimensions = data['dimension']
    dimension_order = data['id']  # ['geo', 'time', 'unit', 'indic', etc.]

    # Extract labels for each dimension dynamically
    dimension_labels = {dim: dimensions[dim]['category']['label'] for dim in dimension_order}

    # Step 4: Determine the size of each dimension
    dimension_sizes = {dim: len(dimensions[dim]['category']['index']) for dim in dimension_order}

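    # Step 5: List the category codes of each dimension in positional order
    # Assuming 'index' maps each code to its position (e.g. {'AT': 0, 'BE': 1}), sorting by
    # that position guarantees that item N of the list is the code stored at position N
    index_labels = {
        dim: sorted(dimensions[dim]['category']['index'], key=dimensions[dim]['category']['index'].get)
        for dim in dimension_order
    }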

    # Step 6: Create a list of rows for the CSV
    rows = []
    for key, value in data['value'].items():
        # `key` is a string like '123', we need to break it down into the corresponding labels
        index = int(key)  # Convert string index to integer

        # Calculate the indices for each dimension
        indices = {}
        for dim in reversed(dimension_order):
            dim_index = index % dimension_sizes[dim]
            indices[dim] = index_labels[dim][dim_index]
            index //= dimension_sizes[dim]

        # Construct a row with labels from all dimensions
        row = {f"{dim.capitalize()} Code": indices[dim] for dim in dimension_order}
        row.update({f"{dim.capitalize()} Name": dimension_labels[dim][indices[dim]] for dim in dimension_order})
        row["Value (Tourist Accommodations)"] = value
        rows.append(row)

    # Step 7: Create a DataFrame and save it as CSV
    if rows:
        df = pd.DataFrame(rows)
        csv_filename = "eurostat_tourist_accommodation.csv"
        df.to_csv(csv_filename, index=False)
        print(f"CSV file '{csv_filename}' has been successfully created.")
    else:
        print("No valid data to save as CSV.")
else:
    print(f"Error: Received status code {response.status_code} from the API.")

// Step 6: Generating a Specific View:

Suppose we only want to keep the records for campsites, apartments, or hotels. We can filter the table from the previous step on this condition and obtain a pandas DataFrame that is ready to work with.

# Check the unique values in the 'Nace_r2 Name' column
set(df["Nace_r2 Name"])

# List of options to filter
options = [
    'Camping grounds, recreational vehicle parks and trailer parks',
    'Holiday and other short-stay accommodation',
    'Hotels and similar accommodation',
]

# Filter the DataFrame based on whether the 'Nace_r2 Name' column values are in the options list
df = df[df["Nace_r2 Name"].isin(options)]
df

# Best Practices When Working with APIs

Read the Docs: Always check the official API documentation to understand endpoints and parameters
Handle Errors: Use conditionals and logging to gracefully handle failed requests
Respect Rate Limits: Avoid overwhelming the server — check if rate limits apply
Secure Credentials: If the API requires authentication, never expose your API keys in public code
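
As one possible way to combine error handling with respect for rate limits, the sketch below retries a call with exponential backoff. It is a minimal illustration, not a prescribed approach: `call_with_backoff`, `RETRYABLE_STATUS`, and the choice of which status codes to retry are assumptions made for this example. The `send` argument is any zero-argument callable that returns a status code and a payload, so a real request can be wrapped in a small function.

```python
import time

# Assumption: treat "too many requests" and common server errors as retryable
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a zero-argument callable returning (status_code, payload).
    """
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE_STATUS:
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return status, payload


# Simulated server that rejects two calls before succeeding (no network needed)
responses = iter([(429, None), (503, None), (200, {"ok": True})])
delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(lambda: next(responses), sleep=delays.append)

print(result)  # (200, {'ok': True})
print(delays)  # [1.0, 2.0]
```

Doubling the wait after each failure gives the server progressively more room to recover, while the cap on attempts keeps a script from retrying forever.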

# Wrapping Up

Eurostat’s API is a powerful gateway to a wealth of structured, high-quality European statistics. By learning how to navigate its structure, query datasets, and interpret responses, you can automate access to critical data for analysis, research, or decision-making — right from your Python scripts.

The corresponding code is available in my GitHub repository, My-Articles-Friendly-Links.

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.