Weights & Biases: A KDnuggets Crash Course




Image by Author

 

If you train models beyond a single notebook, you’ve probably hit the same headaches: you tweak five knobs, rerun training, and by Friday you can’t remember which run produced the “good” ROC curve or which data slice you used. Weights & Biases (W&B) gives you a paper trail — metrics, configs, plots, datasets, and models — so you can answer what changed with evidence, not guesswork.

Below is a practical tour. It’s opinionated, light on ceremony, and geared for teams who want a clean experiment history without building their own platform. Let’s call it a no-fluff walkthrough.

 

Why W&B at All?

 
Notebooks grow into experiments. Experiments multiply. Soon you’re asking: Which run used that data slice? Why is today’s ROC curve higher? Can I reproduce last week’s baseline?

W&B gives you a place to:

  • Log metrics, configs, plots, and system stats
  • Version datasets and models with artifacts
  • Run hyperparameter sweeps
  • Share dashboards without screenshots

You can start tiny and layer features when needed.

 

Setup in 60 Seconds

 
Start by installing the library and logging in with your API key. If you don’t have one yet, you can find it in your W&B account settings.

pip install wandb
wandb login # paste your API key once

 

Image by Author

 

// Minimal Sanity Check

import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()

 

Now you should see something like this:

 

Image by Author

 

Now let’s go for the useful bits.

 

Tracking Experiments Properly

 

// Log Hyperparameters and Metrics

Treat wandb.config as the single source of truth for your experiment’s knobs. Give metrics clear names so charts auto-group.

import wandb

cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])

# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})

# log a final summary
run.summary["best_val_auc"] = best_auc
run.finish()  # flush pending data and close the run

 

A few tips:

  • Use namespaces like train/loss or val/auc to group charts automatically
  • Add tags like "lr-finder" or "fp16" so you can filter runs later
  • Use run.summary[...] for one-off results you want to see on the run card
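The namespace tip can be applied mechanically with a small helper. The sketch below is one possible way to organise logging, not part of the W&B API: `with_prefix` is a hypothetical function that prefixes every key of a metrics dict so the result can be handed to wandb.log.

```python
def with_prefix(prefix, metrics):
    """Return a copy of `metrics` whose keys are prefixed with `prefix/`."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train_metrics = with_prefix("train", {"loss": 0.42, "acc": 0.87})
val_metrics = with_prefix("val", {"auc": 0.91})

# Merge both groups into one payload; in a real run this dict would be
# passed to wandb.log(payload).
payload = {**train_metrics, **val_metrics}
print(payload)
```

Keeping the prefixing in one place makes it harder to end up with both val_auc and val/auc in the same project.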

 

// Log Images, Confusion Matrices, and Custom Plots

wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})

 

You can also save any Matplotlib plot:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
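# wandb.Image logs a static image; a bare figure is converted to Plotly,
# which requires the plotly package to be installed
wandb.log({"training/curve": wandb.Image(fig)})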

 

// Version Datasets and Models With Artifacts

Artifacts answer questions like, “Which exact files did this run use?” and “What did we train?” No more final_final_v3.parquet mysteries.

import wandb

run = wandb.init(project="kdn-mlops")

# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw") # or add_file("path")
run.log_artifact(raw)

# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download() # folder path pinned to a hash

 

Log your model the same way:

import torch
import wandb

run = wandb.init(project="kdn-mlops")

model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)

model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)

 

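When the training run both consumes the dataset with use_artifact and logs the model with log_artifact, the lineage is recorded: this model came from that data, under this code commit.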

 

// Tables for Evaluations and Error Analysis

wandb.Table is a light dataframe for results, predictions, and slices.

table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})

 

Filter the table in the UI to find failure patterns (e.g., short reviews, rare classes, etc.).

 

// Hyperparameter Sweeps

Define a search space in YAML, launch agents, and let W&B coordinate.

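# sweep.yaml
program: train.py  # script the agent runs; the file name is a placeholder
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 0.00001, max: 0.01}  # plain decimals; some YAML parsers read 1e-5 as a string
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}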

 

Start the sweep:

wandb sweep sweep.yaml # returns a SWEEP_ID
wandb agent <ENTITY>/<PROJECT>/<SWEEP_ID> # run 1+ agents

 

Your training script should read wandb.config for lr, batch, etc. The dashboard shows top trials, parallel coordinates, and the best config.
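A training script for a sweep could be laid out roughly as follows. This is a minimal sketch, not a real model: the `train` function and its made-up scoring formula are placeholders, and passing the logger in as an argument is just one design choice that lets you dry-run the loop without a live run. The only W&B calls used are wandb.init, run.config, run.log, and run.finish.

```python
import random


def train(config, log):
    """Toy training loop: reads hyperparameters from `config`, reports via `log`.

    `config` only needs dict-style access, which wandb.config also supports.
    The "model" is a stand-in that produces a made-up score.
    """
    rng = random.Random(0)
    best_auc = 0.0
    for epoch in range(3):
        # Placeholder for real training: the score depends on a swept value.
        noise = rng.random() * 0.01
        auc = 0.5 + 0.4 * (1 - config["dropout"]) * (epoch + 1) / 3 - noise
        best_auc = max(best_auc, auc)
        # The key must match the metric name in sweep.yaml (val/auc).
        log({"epoch": epoch, "val/auc": auc})
    return best_auc


def main():
    # Imported here so the toy loop above can be exercised without W&B.
    import wandb

    # Under `wandb agent`, the run's config holds the values for this trial.
    run = wandb.init(project="kdn-mlops")
    train(run.config, run.log)
    run.finish()


# Dry run with a plain dict and a list instead of a live W&B run.
logged = []
best = train({"lr": 3e-4, "batch": 64, "dropout": 0.1}, logged.append)
print(len(logged), "rows logged")
```

In the script that the agent launches, you would call main() instead of the dry run at the bottom.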

 

Drop-In Integrations

 
Pick the one you use and keep moving.

 

// PyTorch Lightning

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)

 

// Keras

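import wandb
from wandb.integration.keras import WandbMetricsLogger

wandb.init(project="kdn-mlops", config={"epochs": 10})
# WandbMetricsLogger streams Keras training and validation metrics to the run
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbMetricsLogger()])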

 

// Scikit-learn

import wandb
from sklearn.metrics import roc_auc_score

wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})

 

Model Registry and Staging

 
Think of the registry as a named shelf for your best models. You push an artifact once, then manage aliases like staging or production so downstream code can pull the right one without guessing file paths.

run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")
# Link the artifact to a registered model and tag it as staging.
# The target path format depends on your W&B setup; "model-registry/<name>" is assumed here.
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])

 

Flip the alias when you promote a new build. Consumers always read sentiment-classifier:production.

 

Reproducibility Checklist

 

  • Configs: Store every hyperparameter in wandb.config
  • Code and commit: Call run.log_code(".") to snapshot your code with the run, or rely on CI to attach the git SHA
  • Environment: Log requirements.txt or the Docker tag and include it in an artifact
  • Seeds: Log them and set them

Minimal seed helper:

def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)
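For the Environment item in the checklist, one way to capture the installed packages without shelling out to pip is the standard library's importlib.metadata. The sketch below is an illustration under assumptions: `frozen_requirements`, `write_environment_file`, and the file name environment.txt are all hypothetical names, and the output is only an approximation of what pip freeze would produce.

```python
import sys
from importlib import metadata


def frozen_requirements():
    """List installed distributions as sorted `name==version` lines."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs with no readable metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def write_environment_file(path="environment.txt"):
    """Write the Python version and package pins to `path` and return it."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(f"# python {sys.version.split()[0]}\n")
        fh.write("\n".join(frozen_requirements()) + "\n")
    return path


print(len(frozen_requirements()), "packages pinned")
```

The returned path can then be passed to add_file on an artifact, so the environment travels with the dataset or model version.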

 

Collaboration and Sharing Without Screenshots

 
Add notes and tags so teammates can search. Use Reports to stitch charts, tables, and commentary into a link you can drop in Slack or a PR. Stakeholders can follow along without opening a notebook.

 

CI and Automation Tips

 

  • Run wandb agent on training nodes to execute sweeps from CI
  • Log a dataset artifact after your ETL job; train jobs can depend on that version explicitly
  • After evaluation, promote model aliases (staging to production) in a small post-step
  • Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP
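The last tip can be wired up with a few lines at the top of a CI job. The following is a sketch under assumptions: `ci_init_kwargs` is a hypothetical helper, and PIPELINE_ID and JOB_TYPE are made-up variable names standing in for whatever your CI system exposes. W&B already reads WANDB_API_KEY and WANDB_RUN_GROUP from the environment by itself; the helper only fails early when the key is missing and supplies a fallback group.

```python
import os


def ci_init_kwargs(env=None):
    """Build keyword arguments for wandb.init() from CI environment variables."""
    env = os.environ if env is None else env
    if not env.get("WANDB_API_KEY"):
        raise RuntimeError("WANDB_API_KEY is not set; add it as a CI secret")
    # Prefer an explicit group; otherwise fall back to a pipeline identifier.
    # PIPELINE_ID and JOB_TYPE are hypothetical names, not defined by W&B.
    group = env.get("WANDB_RUN_GROUP") or f"pipeline-{env.get('PIPELINE_ID', 'local')}"
    return {"group": group, "job_type": env.get("JOB_TYPE", "train")}


# Example with a fake environment; in CI you would call ci_init_kwargs().
fake_env = {"WANDB_API_KEY": "dummy", "PIPELINE_ID": "1234"}
print(ci_init_kwargs(fake_env))
```

The result can be unpacked into the call, for example wandb.init(project="kdn-mlops", **ci_init_kwargs()).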

 

Privacy and Reliability Tips

 

  • Use private projects by default for teams
  • Use offline mode for air-gapped runs. Train normally, then wandb sync later
export WANDB_MODE=offline

 

  • Don’t log raw PII. If needed, hash IDs before logging.
  • For large files, store them as artifacts instead of attaching them to wandb.log.
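Hashing IDs before logging can be done with the standard library alone. The sketch below uses a keyed hash (HMAC-SHA256) rather than a bare hash, which is a design choice of this example and not something W&B requires; `pseudonymize` and the placeholder key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(raw_id, secret_key):
    """Return a stable, non-reversible token for `raw_id`.

    A keyed hash (HMAC-SHA256) is used instead of a bare SHA-256 so that
    low-entropy IDs such as emails cannot be recovered by hashing guesses
    without the key.
    """
    digest = hmac.new(secret_key, str(raw_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability in tables


# Placeholder key for illustration; load the real one from a secret store.
KEY = b"replace-with-a-real-secret"

rows = [("alice@example.com", 0.93), ("bob@example.com", 0.12)]
safe_rows = [(pseudonymize(user, KEY), prob) for user, prob in rows]
print(safe_rows)
```

Because the same ID always maps to the same token, you can still group and filter by user in a wandb.Table without exposing the original value.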

 

Common Snags (and Quick Fixes)

 

  • “My run didn’t log anything.” The script may have crashed before wandb.finish() was called. Also, check that you haven’t set WANDB_DISABLED=true in your environment.
  • Logging feels slow. Log scalars at each step, but save heavy assets like images or tables for the end of an epoch. You can also pass commit=False to wandb.log() and batch multiple logs together.
  • Seeing duplicate runs in the UI? If you are restarting from a checkpoint, set id and resume="allow" in wandb.init() to continue the same run.
  • Experiencing mystery data drift? Put every dataset snapshot into an Artifact and pin your runs to explicit versions.
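For the duplicate-runs fix, the run ID has to survive a restart. One simple approach, sketched here as an assumption rather than a W&B feature, is to store the ID in a small file next to the checkpoints; `load_or_create_run_id` and the file name wandb_run_id.txt are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path


def load_or_create_run_id(checkpoint_dir):
    """Return the run ID stored next to the checkpoints, creating it once."""
    id_file = Path(checkpoint_dir) / "wandb_run_id.txt"
    if id_file.exists():
        return id_file.read_text(encoding="utf-8").strip()
    run_id = uuid.uuid4().hex[:8]
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(run_id, encoding="utf-8")
    return run_id


# Simulate a restart: the second call reads the ID written by the first.
with tempfile.TemporaryDirectory() as tmp:
    first = load_or_create_run_id(tmp)
    second = load_or_create_run_id(tmp)
    print(first == second)

# In the training script, the ID would be passed to wandb.init, e.g.:
# wandb.init(project="kdn-mlops", id=load_or_create_run_id("checkpoints"), resume="allow")
```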

 

Pocket Cheatsheet

 

// 1. Start a Run

wandb.init(project="proj", config=cfg, tags=["baseline"])

 

// 2. Log Metrics, Images, or Tables

wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})

 

// 3. Version a Dataset or Model

art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)

 

// 4. Consume an Artifact

path = run.use_artifact("name:latest").download()

 

// 5. Run a Sweep

wandb sweep sweep.yaml && wandb agent <ENTITY>/<PROJECT>/<SWEEP_ID>

 

Wrapping Up

 
Start small: initialize a run, log a few metrics, and push your model file as an artifact. When that feels natural, add a sweep and a short report. You’ll end up with reproducible experiments, traceable data and models, and a dashboard that explains your work without a slideshow.
 
 

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.