Tracing and Monitoring
An observability toolkit for reliable AI systems in production.
Why is monitoring necessary?
- Track how your users actually use your application
- Surface bugs and errors in your logs
- Catch hallucinations: Monitor fact-check failure rates
- Identify degradation: e.g., response latency increased 40% after a model update
- Cost Control
- Token consumption analysis: Why do some queries use 10x more tokens?
- Optimize expensive operations: Cache common RAG queries
- Have a single place to visualize the traces of your AI app
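To make the token-consumption point concrete, here is a minimal sketch of a cost breakdown over trace records. The record shape, route names, and per-1K-token rates are all assumptions for illustration; substitute your tracing backend's export format and your model's real pricing.

```python
from collections import defaultdict

# Hypothetical trace records, e.g. exported from your tracing backend.
traces = [
    {"route": "rag_search", "prompt_tokens": 1200, "completion_tokens": 300},
    {"route": "rag_search", "prompt_tokens": 1100, "completion_tokens": 250},
    {"route": "chitchat",   "prompt_tokens": 90,   "completion_tokens": 40},
]

# Assumed pricing in USD per 1K tokens -- replace with your model's real rates.
PROMPT_RATE, COMPLETION_RATE = 0.0005, 0.0015

def cost_by_route(traces):
    """Aggregate token spend per route to spot which query types cost 10x more."""
    totals = defaultdict(float)
    for t in traces:
        totals[t["route"]] += (
            t["prompt_tokens"] / 1000 * PROMPT_RATE
            + t["completion_tokens"] / 1000 * COMPLETION_RATE
        )
    return dict(totals)

print(cost_by_route(traces))
```

Grouping spend by route (or by prompt template, user cohort, etc.) is usually the fastest way to find the operations worth caching.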
Real-World Failure: A customer service bot started recommending competitors' products due to training-data drift. Monitoring caught it in 3 hours instead of 3 weeks.
What is Tracing?
Tracing collects metrics (CPU and GPU resource usage) together with structured logs of each user input and the LLM's response, linked so you can follow a single request end to end.
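The idea can be sketched in pure Python: a decorator that records each call's input, output, latency, and error status as a "span", the basic unit dedicated tracing tools build on. Everything here (the in-memory `TRACE_LOG`, the `fake_llm` stand-in) is illustrative, not any particular library's API.

```python
import functools
import time
import uuid

TRACE_LOG = []  # in-memory sink; a real setup would ship spans to a backend

def traced(fn):
    """Record input, output, latency, and errors for each call of fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {
            "id": str(uuid.uuid4()),
            "name": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            span["output"] = result
            span["status"] = "ok"
            return result
        except Exception as exc:
            span["status"] = "error"
            span["error"] = repr(exc)
            raise
        finally:
            # Latency is recorded whether the call succeeded or failed.
            span["latency_ms"] = (time.perf_counter() - start) * 1000
            TRACE_LOG.append(span)
    return wrapper

@traced
def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"echo: {prompt}"

fake_llm("hello")
print(TRACE_LOG[-1]["name"], TRACE_LOG[-1]["status"])
```

Real tracing tools add to this the nesting of spans (a chain span containing retrieval and LLM spans) and a UI to browse them.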
Monitoring Tool Showdown
LangSmith (Closed Source)
It's a closed-source tool that provides tracing with a simple integration: https://www.langchain.com/langsmith
Best for: teams using the LangChain ecosystem
Strengths:
- Deep integration with LangChain components
- Visual debugging of complex chains
- Performance analytics by model/version
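If you already use LangChain, enabling LangSmith is mostly configuration: set the tracing environment variables and runs are captured automatically. The project name below is a placeholder; check LangSmith's docs for the exact variable names supported by your SDK version.

```shell
# Enable LangSmith tracing for a LangChain app (no code changes needed).
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-api-key>"
export LANGSMITH_PROJECT="my-app"   # optional: group runs under a project
```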
Langfuse (Open Source)
Best for: Self-hosted or custom stacks
Strengths:
- MIT License - fully self-hostable
- SDKs for Python/JS + OpenTelemetry support
- Custom alerting (Slack/Email)
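Because Langfuse is MIT-licensed, you can run it yourself. A local self-hosted instance can be brought up with Docker Compose, roughly as in their self-hosting docs (ports and defaults may differ by version):

```shell
# Run a local Langfuse instance with Docker Compose.
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up
```

From there, instrument your app with the Python/JS SDK or via OpenTelemetry and point it at your instance.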