---
title: RAG Observability Platform
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# RAG Observability Platform π

## Project Overview

The **RAG Observability Platform** is a production-grade Retrieval-Augmented Generation (RAG) system that demonstrates MLOps practices and a hybrid cloud/local deployment strategy. It pairs GPU-accelerated local inference on Apple Silicon with experiment tracking and observability tooling for enterprise-ready applications.

---

## What This Project Does

### Core Functionality

1. **Local RAG Pipeline (Mac M4)**
   - Ingests unstructured text documents
   - Chunks documents using recursive text splitting
   - Generates embeddings via sentence-transformers (accelerated on Apple Silicon via MPS)
   - Stores embeddings in ChromaDB (a local vector database)
   - Retrieves relevant context and generates answers using the Llama 3.2 3B model via MLX

2. **Cloud Deployment (Hugging Face Spaces)**
   - Docker containerization for reproducible deployment
   - Automatic fallback to CPU-based inference when MLX is unavailable
   - Streamlit web UI for interactive chat with documents
   - Graceful degradation: maintains functionality across platforms

3. **Experiment Tracking (Dagshub + MLflow)**
   - Logs all ingestion runs with parameters and metrics
   - Centralized experiment monitoring from the local machine
   - Version control for code and data via Git + DVC
   - Remote MLflow server for team collaboration
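
The chunking step above can be sketched in plain Python. This is a simplified stand-in for recursive text splitting (fixed-size windows with overlap; real splitters such as LangChain's `RecursiveCharacterTextSplitter` prefer paragraph and sentence boundaries first), and the embedding/ChromaDB calls are only indicated in comments:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks (simplified sketch)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# Downstream (sketch only): embed each chunk and store it, e.g.
#   vectors = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)
#   collection.add(documents=chunks, embeddings=vectors.tolist(), ids=[...])
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.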

### Technical Highlights

- **Cross-Platform Optimization**: native M4 GPU inference (via MLX) for local development; CPU fallback for the cloud
- **Infrastructure as Code**: Docker + UV for reproducible environments
- **Modern Python Stack**: LangChain (LCEL), Pydantic, asyncio-ready
- **MLOps Best Practices**: experiment tracking, dependency management, secrets handling

---

## Key Highlights

1. **GPU Optimization**: understanding when to use specialized tools (MLX for Apple Silicon) vs. standard libraries (PyTorch)
2. **Cross-Platform Development**: device abstraction, graceful fallbacks, testing on multiple architectures
3. **Dependency Management**: using UV for faster resolution and managing optional dependencies (local vs. cloud groups)
4. **MLOps Practices**: experiment tracking, versioning data + code, secrets management
5. **Production Deployment**: Docker best practices, environment variable injection, port mapping
6. **Modern Python**: type hints, LangChain LCEL (functional composition), error handling
7. **Troubleshooting**: resolved Python version mismatches, binary-file handling in Git, device compatibility issues
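
To make the optional-dependency point concrete, here is an illustrative `pyproject.toml` fragment (the group names and package lists are examples, not the project's exact manifest):

```toml
[project]
name = "rag-observability-platform"
requires-python = ">=3.11"
dependencies = ["langchain", "chromadb", "sentence-transformers", "streamlit"]

[project.optional-dependencies]
local = ["mlx-lm"]                 # Apple Silicon only
cloud = ["transformers", "torch"]  # CPU fallback for Docker
```

Locally, `uv sync --extra local` pulls in MLX, while the cloud image installs with `--extra cloud` instead.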

---

## Why This Project Stands Out

- **Full Stack**: from local GPU optimization to cloud deployment
- **Senior-Level Considerations**:
  - Device compatibility across platforms
  - Graceful degradation (MLX → Transformers fallback)
  - Secrets management without pushing `.env`
  - Experiment observability
- **Modern Tooling**: UV (faster than pip), MLX (Apple Silicon optimization), LangChain LCEL (declarative chains)
- **Problem Solving**: resolved real-world issues (ONNX version compatibility, Docker base-image mismatch, GPU device detection)
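
The graceful-degradation bullet boils down to a small backend check. The sketch below uses `importlib.util.find_spec` so it is import-safe on any platform; `mlx_lm` is the MLX text-generation package, and the actual model-loading code is elided:

```python
import importlib.util

def select_backend() -> str:
    """Pick a generation backend: MLX if the package is importable
    (i.e. on Apple Silicon), otherwise fall back to transformers on CPU."""
    if importlib.util.find_spec("mlx_lm") is not None:
        return "mlx"
    return "transformers"

backend = select_backend()
# e.g.: if backend == "mlx":  from mlx_lm import load, generate
#       else:                 from transformers import pipeline
```

Keeping the check in one place means the rest of the code only ever sees a `backend` string.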

---

## GitHub/Portfolio Presentation

**Repository Structure** (visible on GitHub):

```
rag-observability-platform/
├── src/
│   ├── ingestion/    (document loading, chunking, embedding)
│   ├── retrieval/    (RAG chain with LCEL)
│   └── generation/   (MLX wrapper, device handling)
├── app/frontend/     (Streamlit UI)
├── Dockerfile        (cloud deployment)
├── pyproject.toml    (UV dependency management)
└── README.md         (project documentation)
```

**Git History** (visible in commits):

- Clean, semantic commits showing progression
- Branching strategy: `master` → `mvp` → `frontend`/`backend`
- Demonstrates understanding of collaborative workflows

---

## Anticipated Questions

1. **"Why MLX instead of PyTorch?"**
   - MLX is optimized for Apple Silicon; PyTorch's CPU mode is roughly 10x slower on the M4

2. **"How do you handle the MLX import error in Docker?"**
   - Try/except with a fallback to transformers; dynamic device selection

3. **"Why use Dagshub for this portfolio project?"**
   - Demonstrates understanding of MLOps practices; shows ability to connect local experiments to remote tracking

4. **"What would you do at scale?"**
   - Move to managed inference (HF Inference API), DVC for larger datasets, Kubernetes for orchestration
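
To reproduce the remote-tracking setup, the MLflow client only needs its standard environment variables pointed at the Dagshub-hosted server (the user/repo values below are placeholders):

```shell
# Point the local MLflow client at the Dagshub-hosted tracking server.
export MLFLOW_TRACKING_URI="https://dagshub.com/<user>/<repo>.mlflow"
export MLFLOW_TRACKING_USERNAME="<dagshub-username>"
export MLFLOW_TRACKING_PASSWORD="<dagshub-token>"  # an access token, not a password
```

With these set, `mlflow.log_params(...)` and `mlflow.log_metrics(...)` in the ingestion script report to the remote server with no code changes.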

---