---
title: RAG Observability Platform
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# RAG Observability Platform π

## Project Overview
The RAG Observability Platform is a production-grade Retrieval-Augmented Generation (RAG) system that demonstrates advanced MLOps practices and a hybrid cloud-local deployment strategy. It combines ML inference optimization on the Apple Silicon GPU (via MLX) with MLOps observability tooling for enterprise-ready applications.
## What This Project Does

### Core Functionality

#### Local RAG Pipeline (Mac M4)
- Ingests unstructured text documents
- Chunks documents using recursive text splitting
- Generates embeddings via sentence-transformers (optimized for Apple Silicon via MPS acceleration)
- Stores embeddings in ChromaDB (local vector database)
- Retrieves relevant context and generates answers with the Llama 3.2 3B model via MLX (sketched below)
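
A minimal sketch of this pipeline, assuming `sentence-transformers`, `chromadb`, and `langchain-text-splitters` are installed (the model name, chunk sizes, and the `ingest` helper are illustrative, not the actual module API):

```python
import chromadb
import torch
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# Prefer Apple Silicon's Metal (MPS) backend when present, else CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
embedder = SentenceTransformer("all-MiniLM-L6-v2", device=device)

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("documents")

def ingest(text: str, doc_id: str) -> int:
    """Chunk a document, embed each chunk, and store it in ChromaDB."""
    chunks = splitter.split_text(text)
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )
    return len(chunks)
```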
#### Cloud Deployment (Hugging Face Spaces)
- Docker containerization for reproducible deployment
- Automatic fallback to CPU-based inference when MLX is unavailable (sketched after this list)
- Streamlit web UI for interactive chat with documents
- Graceful degradation: maintains functionality across platforms
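
The fallback itself is a small amount of code; a sketch of the pattern (the model IDs and the `load_generator` helper are illustrative):

```python
def load_generator():
    """Return a text-generation callable, preferring MLX on Apple Silicon."""
    try:
        # mlx-lm only installs on Apple Silicon; an ImportError means we are
        # running in the cloud container and should degrade to CPU inference.
        from mlx_lm import generate, load

        model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
        return lambda prompt: generate(model, tokenizer, prompt=prompt)
    except ImportError:
        from transformers import pipeline

        pipe = pipeline(
            "text-generation",
            model="meta-llama/Llama-3.2-3B-Instruct",
            device=-1,  # CPU
        )
        return lambda prompt: pipe(prompt, max_new_tokens=256)[0]["generated_text"]
```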
#### Experiment Tracking (Dagshub + MLflow)
- Logs all ingestion runs with parameters and metrics
- Centralized experiment monitoring from local machine
- Version control for code and data via Git + DVC
- Remote MLflow server for team collaboration (logging sketched below)
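
Logging an ingestion run takes only a few lines of MLflow; a sketch assuming the Dagshub MLflow endpoint is supplied via the `MLFLOW_TRACKING_URI` environment variable (experiment, parameter, and metric names are illustrative placeholders):

```python
import os
import mlflow

# Dagshub exposes a remote MLflow server per repo; the URI and credentials
# come from the environment so nothing sensitive is committed.
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])
mlflow.set_experiment("rag-ingestion")

with mlflow.start_run():
    mlflow.log_params({
        "chunk_size": 512,
        "chunk_overlap": 64,
        "embedding_model": "all-MiniLM-L6-v2",
    })
    # Placeholder values; the real run logs measurements from ingestion.
    mlflow.log_metrics({"num_chunks": 128, "ingest_seconds": 4.2})
```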
### Technical Highlights
- Cross-Platform Optimization: Native M4 GPU (via MLX) for local development; CPU fallback for cloud
- Infrastructure as Code: Docker + UV for reproducible environments
- Modern Python Stack: LangChain (LCEL), Pydantic, asyncio-ready
- MLOps Best Practices: Experiment tracking, dependency management, secrets handling
### Key Highlights
- GPU Optimization: Knowing when to use specialized tools (MLX for Apple Silicon) vs. standard libraries (PyTorch)
- Cross-Platform Development: Device abstraction, graceful fallbacks, testing on multiple architectures
- Dependency Management: Using UV for faster resolution, managing optional dependencies (local vs. cloud groups)
- MLOps Practices: Experiment tracking, versioning data + code, secrets management
- Production Deployment: Docker best practices, environment variable injection, port mapping
- Modern Python: Type hints, LangChain LCEL (functional composition; sketched after this list), error handling
- Troubleshooting: Resolved Python version mismatches, binary file handling in Git, device compatibility issues
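
The LCEL composition mentioned above looks roughly like this; the retriever and LLM are stubbed with `RunnableLambda` so the sketch is self-contained (the real chain wires in the ChromaDB retriever and the MLX/Transformers wrapper):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Stand-ins so this runs on its own; replace with the real components.
retriever = RunnableLambda(lambda question: "...retrieved chunks...")
llm = RunnableLambda(lambda prompt_value: "...generated answer...")

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# LCEL composes runnables declaratively: the dict fans the question out to
# the retriever and a passthrough, then pipes prompt -> model -> parser.
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What does this project do?"))
```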
## Why This Project Stands Out
- Full Stack: From local GPU optimization to cloud deployment
- Senior-Level Considerations:
  - Device compatibility across platforms
  - Graceful degradation (MLX → Transformers fallback)
  - Secrets management without pushing `.env`
  - Experiment observability
- Modern Tooling: UV (faster than pip), MLX (Apple Silicon optimization), LangChain LCEL (declarative chains)
- Problem Solving: Resolved real-world issues (ONNX version compatibility, Docker base image mismatch, GPU device detection)
## GitHub/Portfolio Presentation
**Repository Structure** (visible on GitHub):
```
rag-observability-platform/
├── src/
│   ├── ingestion/      # document loading, chunking, embedding
│   ├── retrieval/      # RAG chain with LCEL
│   └── generation/     # MLX wrapper, device handling
├── app/frontend/       # Streamlit UI
├── Dockerfile          # cloud deployment
├── pyproject.toml      # UV dependency management
└── README.md           # project documentation
```
**Git History** (visible in commits):
- Clean, semantic commits showing progression
- Branching strategy (`master` → `mvp` → `frontend`/`backend`) that demonstrates an understanding of collaborative workflows
"Why MLX instead of PyTorch?"
- MLX is optimized for Apple Silicon; PyTorch CPU mode is 10x slower on M4
"How do you handle the MLX import error in Docker?"
- Try-except with fallback to transformers; dynamic device selection
"Why use Dagshub for this portfolio project?"
- Demonstrates understanding of MLOps practices; shows ability to connect local experiments to remote tracking
"What would you do at scale?"
- Move to managed inference (HF Inference API), DVC for larger datasets, Kubernetes for orchestration
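
For the managed-inference answer, the swap is small; a sketch using `huggingface_hub` (the model ID is illustrative, and gated models require an HF token):

```python
from huggingface_hub import InferenceClient

# Serverless inference replaces the local MLX/Transformers generator; the
# same RAG prompt, with retrieved context, goes to a managed endpoint.
client = InferenceClient(model="meta-llama/Llama-3.2-3B-Instruct")
answer = client.text_generation(
    "...RAG prompt with retrieved context...",
    max_new_tokens=256,
)
print(answer)
```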