Image by Author
If you train models beyond a single notebook, you’ve probably hit the same headaches: you tweak five knobs, rerun training, and by Friday you can’t remember which run produced the “good” ROC curve or which data slice you used. Weights & Biases (W&B) gives you a paper trail — metrics, configs, plots, datasets, and models — so you can answer what changed with evidence, not guesswork.
Below is a practical tour. It’s opinionated, light on ceremony, and geared for teams who want a clean experiment history without building their own platform. Let’s call it a no-fluff walkthrough.
# Why W&B at All?
Notebooks grow into experiments. Experiments multiply. Soon you’re asking: Which run used that data slice? Why is today’s ROC curve higher? Can I reproduce last week’s baseline?
W&B gives you a place to:
- Log metrics, configs, plots, and system stats
- Version datasets and models with artifacts
- Run hyperparameter sweeps
- Share dashboards without screenshots
You can start tiny and layer features when needed.
# Setup in 60 Seconds
Start by installing the library and logging in with your API key. If you don’t have one yet, you can grab it from your W&B account settings.
pip install wandb
wandb login # paste your API key once

Image by Author
// Minimal Sanity Check
import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})

for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)

wandb.finish()
Now you should see something like this:

Image by Author
Now let’s get to the useful bits.
# Tracking Experiments Properly
// Log Hyperparameters and Metrics
Treat wandb.config as the single source of truth for your experiment’s knobs. Give metrics clear names so charts auto-group.
cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])

# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})

# log a final summary
run.summary["best_val_auc"] = best_auc
A few tips:
- Use namespaces like train/loss or val/auc to group charts automatically
- Add tags like "lr-finder" or "fp16" so you can filter runs later (see the sketch below)
- Use run.summary[...] for one-off results you want to see on the run card
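Tags pay off when you query runs back with the public API. A quick sketch, assuming a project under your own entity (your-entity is a placeholder):

import wandb

# Query every run in the project that carries the "baseline" tag
api = wandb.Api()
runs = api.runs("your-entity/kdn-mlops", filters={"tags": {"$in": ["baseline"]}})
for r in runs:
    print(r.name, r.tags)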
// Log Images, Confusion Matrices, and Custom Plots
wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})
You can also save any Matplotlib plot:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})
// Version Datasets and Models With Artifacts
Artifacts answer questions like, “Which exact files did this run use?” and “What did we train?” No more final_final_v3.parquet mysteries.
import wandb
run = wandb.init(project="kdn-mlops")
# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw") # or add_file("path")
run.log_artifact(raw)
# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download() # folder path pinned to a hash
Log your model the same way:
import torch
import wandb
run = wandb.init(project="kdn-mlops")
model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)
model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)
Now, the lineage is obvious: this model came from that data, under this code commit.
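That lineage comes from doing both in the same run: consume the dataset artifact as an input, log the model artifact as an output, and W&B connects them. A sketch reusing the names above:

import wandb

run = wandb.init(project="kdn-mlops", job_type="train")

# Input: pins this run to an exact dataset version
data_dir = run.use_artifact("imdb_reviews:latest").download()

# ... train on data_dir ...

# Output: W&B links this model back to the dataset version above
model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file("models/resnet18.pt")
run.log_artifact(model_art)
run.finish()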
// Tables for Evaluations and Error Analysis
wandb.Table
is a light dataframe for results, predictions, and slices.
table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})
Filter the table in the UI to find failure patterns (short reviews, rare classes, and so on).
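If the full prediction table gets large, one option is to log the failure slice on its own. A sketch reusing batch_results from above:

# Keep a separate, smaller table with only the misclassified rows
errors = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    if r.pred != r.true:
        errors.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/errors": errors})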
// Hyperparameter Sweeps
Define a search space in YAML, launch agents, and let W&B coordinate.
# sweep.yaml
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 0.00001, max: 0.01}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}
Start the sweep:
wandb sweep sweep.yaml                       # returns a SWEEP_ID
wandb agent <entity>/<project>/<SWEEP_ID>    # run 1+ agents
Your training script should read wandb.config for lr, batch, etc. The dashboard shows top trials, parallel coordinates, and the best config.
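The training side can stay small. A sketch of what each agent runs, where build_model, train_one_epoch, and evaluate stand in for your own code:

# train.py — every sweep agent executes this with a different config
import wandb

def main():
    run = wandb.init(project="kdn-mlops")
    cfg = wandb.config  # lr, batch, dropout arrive from the sweep

    model = build_model(dropout=cfg.dropout)
    for epoch in range(10):
        loss = train_one_epoch(model, lr=cfg.lr, batch_size=cfg.batch)
        auc = evaluate(model)
        wandb.log({"train/loss": loss, "val/auc": auc})

if __name__ == "__main__":
    main()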
# Drop-In Integrations
Pick the one you use and keep moving.
// PyTorch Lightning
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)
// Keras
import wandb
from wandb.keras import WandbCallback
wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])
// Scikit-learn
from sklearn.metrics import roc_auc_score
wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})
# Model Registry and Staging
Think of the registry as a named shelf for your best models. You push an artifact once, then manage aliases like staging or production so downstream code can pull the right one without guessing file paths.
run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")

# Link the artifact into the registry under a registered model name,
# tagging this version as the staging candidate
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])
Flip the alias when you promote a new build. Consumers always read sentiment-classifier:production.
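On the consumer side, a serving or deployment job resolves the alias at run time. A sketch, assuming the registry lives under your entity (your-entity is a placeholder):

import wandb

run = wandb.init(project="kdn-mlops", job_type="deploy")

# Always fetch whatever version currently carries the "production" alias
art = run.use_artifact("your-entity/model-registry/sentiment-classifier:production", type="model")
model_dir = art.download()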
# Reproducibility Checklist
- Configs: Store every hyperparameter in wandb.config
- Code and commit: Use wandb.init(settings=wandb.Settings(code_dir=".")) to snapshot code, or rely on CI to attach the git SHA
- Environment: Log requirements.txt or the Docker tag and include it in an artifact
- Seeds: Log them and set them
Minimal seed helper:
def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)
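Call it with the same seed you keep in the config, so the value you log is the value you actually used. For example:

cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])
set_seeds(wandb.config.seed)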
# Collaboration and Sharing Without Screenshots
Add notes and tags so teammates can search. Use Reports to stitch charts, tables, and commentary into a link you can drop in Slack or a PR. Stakeholders can follow along without opening a notebook.
# CI and Automation Tips
- Run wandb agent on training nodes to execute sweeps from CI
- Log a dataset artifact after your ETL job; train jobs can depend on that version explicitly
- After evaluation, promote model aliases (staging → production) in a small post-step
- Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP
# Privacy and Reliability Tips
- Use private projects by default for teams
- Use offline mode for air-gapped runs (export WANDB_MODE=offline), train normally, then wandb sync later
- Don’t log raw PII. If needed, hash IDs before logging (see the sketch after this list).
- For large files, store them as artifacts instead of attaching them to wandb.log.
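For the PII point, hashing can be a one-liner with the standard library. A sketch, where user_id stands in for whatever raw identifier you have, dropped into the table-building loop from earlier:

import hashlib

def anon_id(user_id: str) -> str:
    # One-way hash keeps rows joinable without exposing the raw ID
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]

table.add_data(anon_id(r.id), r.text, r.pred, r.true, r.prob)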
# Common Snags (and Quick Fixes)
- “My run didn’t log anything.” The script may have crashed before wandb.finish() was called. Also, check that you haven’t set WANDB_DISABLED=true in your environment.
- Logging feels slow. Log scalars at each step, but save heavy assets like images or tables for the end of an epoch. You can also pass commit=False to wandb.log() and batch multiple logs together.
- Seeing duplicate runs in the UI? If you are restarting from a checkpoint, set id and resume="allow" in wandb.init() to continue the same run (see the snippet after this list).
- Experiencing mystery data drift? Put every dataset snapshot into an artifact and pin your runs to explicit versions.
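For the duplicate-runs case, resuming looks like this, where run_id is whatever stable ID you persist next to your checkpoint:

import wandb

# Reuse the same run ID when restarting from a checkpoint
run = wandb.init(project="kdn-mlops", id=run_id, resume="allow")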
# Pocket Cheatsheet
// 1. Start a Run
wandb.init(project="proj", config=cfg, tags=["baseline"])
// 2. Log Metrics, Images, or Tables
wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})
// 3. Version a Dataset or Model
art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)
// 4. Consume an Artifact
path = run.use_artifact("name:latest").download()
// 5. Run a Sweep
wandb sweep sweep.yaml && wandb agent <entity>/<project>/<SWEEP_ID>
# Wrapping Up
Start small: initialize a run, log a few metrics, and push your model file as an artifact. When that feels natural, add a sweep and a short report. You’ll end up with reproducible experiments, traceable data and models, and a dashboard that explains your work without a slideshow.
Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the applications of the ongoing explosion in the field.