Image by Author | Ideogram.ai
# Introduction
When building large language model applications, tokens are money. If you’ve ever worked with an LLM like GPT-4, you’ve probably had that moment where you check the bill and think, “How did it get this high?!” Each API call you make consumes tokens, which directly impacts both latency and cost. But without tracking them, you have no idea where they’re being spent or how to optimize.
That’s where LangSmith comes in. It not only traces your LLM calls but also lets you log, monitor, and visualize token usage for every step in your workflow. In this guide, we’ll cover:
- Why token tracking matters
- How to set up token logging
- How to visualize token consumption in the LangSmith dashboard
# Why Does Token Tracking Matter?
Token tracking matters because every interaction with a large language model has a direct cost tied to the number of tokens processed, both in your inputs and the model’s outputs. Without monitoring, small inefficiencies in prompts, unnecessary context, or redundant requests can silently inflate your bill and slow down performance.
By tracking tokens, you gain visibility into exactly where they’re being consumed, so you can optimize prompts, streamline workflows, and keep costs under control. For example, if your chatbot uses 1,500 tokens per request, trimming that to 800 tokens cuts the cost of each call almost in half.
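To make that savings concrete, here is a back-of-the-envelope sketch in Python. The per-1K-token prices below are placeholders, not any provider’s actual rates; plug in your own pricing.

```python
# Hypothetical per-1K-token prices -- replace with your provider's real rates
PRICE_PER_1K_INPUT = 0.03
PRICE_PER_1K_OUTPUT = 0.06

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single LLM call."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A 1,500-token request vs. a trimmed 800-token version of the same call
print(request_cost(1200, 300))  # 0.054
print(request_cost(600, 200))   # 0.030 -- roughly 44% cheaper
```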
# Setting Up LangSmith for Token Logging
## Step 1: Install the Required Packages

```bash
pip3 install langchain langsmith transformers accelerate langchain_community
```
## Step 2: Make All the Necessary Imports

```python
import os

from transformers import pipeline
# HuggingFacePipeline lives in langchain_community in recent LangChain versions
from langchain_community.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langsmith import traceable
```
## Step 3: Configure LangSmith

Set your API key and project name:

```python
# Replace with your API key
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "HF_FLAN_T5_Base_Demo"
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Optional: disable tokenizer parallelism warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```
## Step 4: Load a Hugging Face Model

Use a CPU-friendly model like google/flan-t5-base and enable sampling for more natural outputs:

```python
model_name = "google/flan-t5-base"

pipe = pipeline(
    "text2text-generation",
    model=model_name,
    tokenizer=model_name,
    device=-1,           # -1 runs the pipeline on CPU
    max_new_tokens=60,
    do_sample=True,      # enable sampling for more varied outputs
    temperature=0.7,
)

llm = HuggingFacePipeline(pipeline=pipe)
```
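It’s worth smoke-testing the raw pipeline before adding any tracing; since sampling is enabled, your exact output will differ:

```python
# Quick local check that the model loads and generates on CPU
result = pipe("Explain gravity in one sentence.")
print(result[0]["generated_text"])
```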
## Step 5: Create a Prompt and Chain

Define a prompt template and connect it to your Hugging Face pipeline using LLMChain:

```python
prompt_template = PromptTemplate.from_template(
    "Explain gravity to a 10-year-old in about 20 words using a fun analogy."
)

chain = LLMChain(llm=llm, prompt=prompt_template)
```
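A quick note: LLMChain is deprecated in recent LangChain releases. If you are on a newer version, the equivalent pipe-style (LCEL) composition is a drop-in alternative, sketched here:

```python
# Equivalent chain using the newer runnable (LCEL) syntax
lcel_chain = prompt_template | llm

# The prompt template has no input variables, so pass an empty dict
print(lcel_chain.invoke({}))
```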
## Step 6: Make the Function Traceable with LangSmith

Use the @traceable decorator to automatically log inputs, outputs, token usage, and runtime:

```python
@traceable(name="HF Explain Gravity")
def explain_gravity():
    # The prompt has no input variables, so pass an empty dict
    return chain.run({})
```
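The decorator also accepts optional `tags` and `metadata` arguments, which make runs easier to filter in the dashboard later; the values below are arbitrary examples:

```python
@traceable(
    name="HF Explain Gravity (tagged)",
    tags=["flan-t5", "demo"],
    metadata={"model": model_name, "temperature": 0.7},
)
def explain_gravity_tagged():
    return chain.run({})
```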
## Step 7: Run the Function and Print the Results

```python
answer = explain_gravity()

print("\n=== Hugging Face Model Answer ===")
print(answer)
```
Output:

```
=== Hugging Face Model Answer ===
Gravity is a measure of mass of an object.
```
## Step 8: Check the LangSmith Dashboard

Go to smith.langchain.com → Tracing Projects, where you will see your projects listed. You can even see the cost associated with each project, which lets you analyze your billing. To see token usage and other insights, click on your project.

Inside the project you will find the list of runs you have made. Click on any run to open it, and you can inspect various details such as total tokens, latency, and more. From the run view, open the dashboard.
Now you can view graphs over time to track token usage trends, check average latency per request, compare input vs. output tokens, and identify peak usage periods. These insights help optimize prompts, manage costs, and improve model performance.
Scroll down to view all the graphs associated with your project.
## Step 9: Explore the LangSmith Dashboard

You can analyze plenty of insights, such as:
- View Example Traces: Click on a trace to see detailed execution, including raw input, generated output, and performance metrics
- Inspect Individual Traces: For each trace, you can explore every step of execution, seeing prompts, outputs, token usage, and latency
- Check Token Usage & Latency: Detailed token counts and processing times help identify bottlenecks and optimize performance
- Evaluation Chains: Use LangSmith’s evaluation tools to test scenarios, track model performance, and compare outputs
- Experiment in Playground: Adjust parameters such as temperature, prompt templates, or sampling settings to fine-tune your model’s behavior
With this setup, you now have full visibility of your Hugging Face model runs, token usage, and overall performance in the LangSmith dashboard.
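Everything the dashboard shows can also be queried programmatically, which is handy for automated cost reports. Here is a sketch using the `langsmith` SDK’s `Client.list_runs` (the token-count fields may vary slightly across SDK versions):

```python
from langsmith import Client

client = Client()

# Sum token usage across all runs in the project
total = 0
for run in client.list_runs(project_name="HF_FLAN_T5_Base_Demo"):
    if run.total_tokens:
        print(run.name, run.prompt_tokens, run.completion_tokens, run.total_tokens)
        total += run.total_tokens

print("Total tokens across runs:", total)
```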
# How to Spot and Fix Token Hogs
Once you have logging in place, you can:
- See if prompts are too long
- Identify calls where the model is over-generating
- Switch to smaller models for cheaper tasks
- Cache responses to avoid duplicate requests (see the caching sketch below)
This is gold for debugging long chains or agents. Find the step eating the most tokens and fix it.
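For the caching idea above, LangChain ships a simple in-memory LLM cache; here is a minimal sketch (for anything long-lived you would likely swap in a persistent backend such as the SQLite cache):

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Identical prompts are now served from the cache instead of the model,
# so repeated requests consume no extra tokens
set_llm_cache(InMemoryCache())

first = chain.run({})   # hits the model
second = chain.run({})  # identical call, answered from the cache
```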
# Wrapping Up
This is how you can set up and use LangSmith. Logging token usage isn’t just about saving money; it’s about building smarter, more efficient LLM apps. This guide provides a foundation, and you can learn more by exploring, experimenting, and analyzing your own workflows.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.