
πŸš€ Local Development Setup (Without Docker)

Prerequisites

Before starting, ensure you have:

  • Python 3.12+ installed (python --version)
  • Ollama running locally with the required models pulled (mxbai-embed-large:latest, gemma3:latest)
  • PostgreSQL running locally
  • Redis running locally (optional, for chat history)

Step 1: Set Up Virtual Environment

# Navigate to project root
cd /path/to/rag-with-gemma3

# Activate existing venv
source venv/bin/activate

# Or create a new one if needed
python3.12 -m venv venv_local
source venv_local/bin/activate

# Verify Python version
python --version  # Should be 3.12+

Step 2: Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install all requirements
pip install -r requirements.txt

# Verify key installations
python -c "import langchain; import streamlit; import fastapi; print('βœ… All dependencies installed')"

Step 3: Configure Environment Variables

Create a .env file in the project root:

# Create .env file
cat > .env << 'EOF'
# DATABASE
DATABASE_URL=postgresql://raguser:ragpass@localhost:5432/ragdb

# REDIS (optional, for chat history)
REDIS_URL=redis://localhost:6379/0

# OLLAMA
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=mxbai-embed-large:latest
LLM_MODEL=gemma3:latest

# LLM CONFIG
TEMPERATURE=0.7
MAX_TOKENS=2048
CONTEXT_SIZE=4096

# HISTORY BACKEND
HISTORY_BACKEND=memory  # or 'redis' if Redis is running

# VECTOR DATABASE
VECTOR_DB_PERSIST_DIR=./user_faiss
VECTOR_DB_INDEX_NAME=index.faiss

# ENVIRONMENT
ENV_TYPE=dev
EOF

cat .env
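
If you want to run one-off commands against these settings outside the app (for example the curl and psql checks later in this guide), you can export the file into your current shell first. A minimal sketch; the application itself reads .env through its own configuration code:

# Export every variable in .env into the current shell session
set -a
source .env
set +a

# Spot-check a value
echo "$OLLAMA_BASE_URL"  # http://localhost:11434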

Step 4: Start Required Services

Option A: Using Homebrew (macOS)

# Start PostgreSQL
brew services start postgresql

# Start Redis (optional)
brew services start redis

# Start Ollama (if not already running)
ollama serve &
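
On a fresh PostgreSQL install, the raguser role and ragdb database referenced in .env will not exist yet. A sketch for creating them (credentials must match DATABASE_URL; with Homebrew PostgreSQL your macOS user is typically a superuser):

# One-time setup: create the application role and database
psql postgres -c "CREATE USER raguser WITH PASSWORD 'ragpass';"
psql postgres -c "CREATE DATABASE ragdb OWNER raguser;"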

Option B: Using Docker (Just Services, No App)

# Start only the databases (not the app)
docker run -d \
  -p 5432:5432 \
  -e POSTGRES_USER=raguser \
  -e POSTGRES_PASSWORD=ragpass \
  -e POSTGRES_DB=ragdb \
  postgres:15

docker run -d \
  -p 6379:6379 \
  redis:latest

# Ollama runs natively in both options; pull the required models
ollama pull mxbai-embed-large:latest
ollama pull gemma3:latest
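
The docker run commands above are disposable; if you prefer containers that are easy to stop and that keep their data across restarts, a variation like this should work (container and volume names are illustrative):

# Named containers, with a persistent volume for PostgreSQL data
docker run -d --name rag-postgres \
  -p 5432:5432 \
  -e POSTGRES_USER=raguser \
  -e POSTGRES_PASSWORD=ragpass \
  -e POSTGRES_DB=ragdb \
  -v rag_pgdata:/var/lib/postgresql/data \
  postgres:15

docker run -d --name rag-redis -p 6379:6379 redis:latest

# Stop and start later without losing data
docker stop rag-postgres rag-redis
docker start rag-postgres rag-redis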

Verify Services Are Running

# PostgreSQL
psql -h localhost -U raguser -d ragdb -c "SELECT 1"  # Should return 1

# Redis
redis-cli ping  # Should return PONG

# Ollama
curl http://localhost:11434/api/tags  # Should list models

Step 5: Initialize Database

# Create tables in PostgreSQL
python << 'EOF'
import sys
sys.path.insert(0, 'server')

from pg_db import Base, engine

# Create all tables
Base.metadata.create_all(bind=engine)
print("βœ… Database tables created successfully")
EOF
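
To confirm the tables exist, list them with psql (the exact table names depend on the models defined in server/pg_db.py):

psql -h localhost -U raguser -d ragdb -c "\dt"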

Step 6: Start the Backend (FastAPI)

In Terminal 1:

# Activate venv
source venv/bin/activate

# Navigate to server directory
cd server

# Start FastAPI
uvicorn server:app --host 127.0.0.1 --port 8000 --reload

# Output should show:
# INFO:     Uvicorn running on http://127.0.0.1:8000
# INFO:     Application startup complete

Test the backend:

# In another terminal
curl http://localhost:8000/health  # If you have a health endpoint
# or
curl http://localhost:8000/docs  # FastAPI Swagger UI
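
If you are scripting against the backend (for example in the startup script below), it helps to wait for the server to come up before sending requests. A small sketch that uses the Swagger UI route as a readiness probe, since a dedicated /health endpoint may not exist:

# Poll until FastAPI answers, up to ~30 seconds
for i in $(seq 1 30); do
  if curl -sf -o /dev/null http://localhost:8000/docs; then
    echo "βœ… Backend is up"
    break
  fi
  sleep 1
done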

Step 7: Start the Frontend (Streamlit)

In Terminal 2:

# Activate venv
source venv/bin/activate

# Start Streamlit
streamlit run app.py --server.port 8501

# Output should show:
# You can now view your Streamlit app in your browser.
# Local URL: http://localhost:8501

Step 8: Access the Application

Open your browser and navigate to:

Component              URL
Frontend (Streamlit)   http://localhost:8501
API Documentation      http://localhost:8000/docs
API Redoc              http://localhost:8000/redoc

Complete Startup Script

Create start_local.sh:

#!/bin/bash

# Color output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Activate venv
source venv/bin/activate

# Check services
echo -e "${BLUE}Checking required services...${NC}"

# PostgreSQL check
if ! psql -h localhost -U raguser -d ragdb -c "SELECT 1" > /dev/null 2>&1; then
    echo "❌ PostgreSQL not running. Start with: brew services start postgresql"
    exit 1
fi
echo -e "${GREEN}βœ… PostgreSQL running${NC}"

# Redis check
if ! redis-cli ping > /dev/null 2>&1; then
    echo "⚠️  Redis not running (optional). Start with: brew services start redis"
fi

# Ollama check
if ! curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "❌ Ollama not running. Start with: ollama serve"
    exit 1
fi
echo -e "${GREEN}βœ… Ollama running${NC}"

# Start backend
echo -e "${BLUE}Starting FastAPI backend...${NC}"
cd server
uvicorn server:app --host 127.0.0.1 --port 8000 --reload &
BACKEND_PID=$!
echo -e "${GREEN}βœ… Backend started (PID: $BACKEND_PID)${NC}"
sleep 2

# Start frontend
echo -e "${BLUE}Starting Streamlit frontend...${NC}"
cd ..
streamlit run app.py --server.port 8501 &
FRONTEND_PID=$!
echo -e "${GREEN}βœ… Frontend started (PID: $FRONTEND_PID)${NC}"

# Display access URLs
echo ""
echo -e "${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${GREEN}βœ… System is ready!${NC}"
echo -e "${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""
echo -e "Frontend: ${BLUE}http://localhost:8501${NC}"
echo -e "API Docs: ${BLUE}http://localhost:8000/docs${NC}"
echo ""
echo "Press Ctrl+C to stop all services"
echo ""

# Wait for both processes
wait

Make it executable and run:

chmod +x start_local.sh
./start_local.sh
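
The script above leaves cleanup to Ctrl+C. If you want the background processes to be shut down explicitly whatever happens, you can add a trap just before the final wait (a sketch; it reuses the BACKEND_PID and FRONTEND_PID variables the script already sets):

# Kill both child processes when the script exits (Ctrl+C, error, or normal exit)
cleanup() {
    echo "Stopping services..."
    kill "$BACKEND_PID" "$FRONTEND_PID" 2>/dev/null
}
trap cleanup EXIT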

Troubleshooting

Port Already in Use

# Find process using port 8000
lsof -i :8000
# Kill it
kill -9 <PID>

# Or use different port
uvicorn server:app --host 127.0.0.1 --port 8001

PostgreSQL Connection Error

# Check if PostgreSQL is running
brew services list

# Start PostgreSQL
brew services start postgresql

# Or verify credentials
psql -h localhost -U raguser -d ragdb

Ollama Connection Error

# Start Ollama
ollama serve

# Pull required models
ollama pull mxbai-embed-large:latest
ollama pull gemma3:latest

# Check available models
ollama list

Import Errors

# Reinstall dependencies
pip install --force-reinstall -r requirements.txt

# Clear cache
pip cache purge

# Check Python version
python --version  # Should be 3.12+

Module Not Found Errors

# Ensure PYTHONPATH includes server directory
export PYTHONPATH="${PYTHONPATH}:/path/to/rag-with-gemma3/server"

# Verify imports
python -c "import llm_system; print('βœ… llm_system imports successfully')"

Development Tips

1. Use Hot Reload

Both FastAPI (--reload) and Streamlit auto-reload on file changes. Just save and refresh!

2. Monitor Logs

# Backend logs (uvicorn logs to the terminal; redirect to a file if you want to tail it)
uvicorn server:app --host 127.0.0.1 --port 8000 --reload > /tmp/uvicorn.log 2>&1 &
tail -f /tmp/uvicorn.log

# Frontend logs
streamlit run app.py --logger.level=debug

3. Database Queries

# Connect to PostgreSQL
psql -h localhost -U raguser -d ragdb

# List tables
\dt

# Query vector IDs
SELECT * FROM user_files LIMIT 5;

4. Test API Endpoints

# Upload a file
curl -X POST http://localhost:8000/upload \
  -F "user_id=test_user" \
  -F "file=@/path/to/document.pdf"

# Embed file
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"user_id": "test_user", "file_name": "document.pdf"}'

# Query RAG
curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test_user", "query": "What is this document about?"}'

Performance Expectations (Local)

Operation              Time     Notes
File Upload            1-2s     Depends on file size
Text Extraction (OCR)  30-60s   For scanned PDFs
Embedding              5-10s    Per document
Cache Hit              <100ms   Repeated queries
RAG Generation         3-5s     With caching
First Response         45-60s   Full pipeline

Next Steps

Once running locally:

  1. Upload Documents - Test with different file formats
  2. Ask Questions - Try various query types
  3. Monitor Performance - Check response times in logs
  4. Adjust Settings - Modify timeouts in .env if needed
  5. Explore API - Use Swagger at http://localhost:8000/docs

Environment Variables Reference

# Database
DATABASE_URL              # PostgreSQL connection string
REDIS_URL                # Redis connection string

# LLM & Embeddings
OLLAMA_BASE_URL          # Ollama server URL
EMBEDDING_MODEL          # Embedding model name
LLM_MODEL                # Language model name
TEMPERATURE              # Model temperature (0-1)
MAX_TOKENS               # Max response length
CONTEXT_SIZE             # Model context window

# Vector Database
VECTOR_DB_PERSIST_DIR    # Where to store vector DB
VECTOR_DB_INDEX_NAME     # Index filename

# History Backend
HISTORY_BACKEND          # 'memory' or 'redis'
HISTORY_TTL_SECONDS      # Chat history expiration

# Environment
ENV_TYPE                 # 'dev' or 'prod'
LOG_LEVEL                # 'DEBUG', 'INFO', 'WARNING', 'ERROR'
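
Before starting the services, a quick guard against a missing or incomplete .env can save a confusing stack trace later. A sketch to run after exporting .env (as in Step 3); the list of required variables is an assumption, so trim it to what your code actually reads:

# Fail early if a required variable is missing from the environment
for var in DATABASE_URL OLLAMA_BASE_URL EMBEDDING_MODEL LLM_MODEL; do
  if [ -z "${!var:-}" ]; then
    echo "❌ $var is not set (check your .env)"
    exit 1
  fi
done
echo "βœ… Required environment variables present"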

You're all set! Your RAG system is now running locally with full development capabilities. Happy coding! πŸš€