	---
	license: gemma
	language:
	- en
	base_model:
	- google/functiongemma-270m-it
	pipeline_tag: text-generation
	tags:
	- function-calling
	- infrastructure
	- devops
	- litertlm
	---

	# FunctionGemma Infrastructure Tools v8

	A fine-tuned [FunctionGemma 270M](https://huggingface.co/google/functiongemma-270m-it) model for infrastructure error diagnosis and remediation. It reaches 100% accuracy across its 7 infrastructure tools when called with the exact tool definitions listed under "Required Tool Definitions" below.

	## Model Details

	- Base Model: google/functiongemma-270m-it
	- Format: LiteRT-LM (.litertlm) - optimized for on-device inference
	- Quantization: INT8 (Q8)
	- Size: ~271MB
	- Training: 50 epochs on 10,500 examples (1,500 per tool)

	## Supported Tools

	| Tool | Description | Use Case |
	|------|-------------|----------|
	| `enableCors` | Enable CORS for a specific origin | CORS policy errors, blocked cross-origin requests |
	| `updateConnectionUrl` | Update service connection URL | ECONNREFUSED errors, localhost connection issues in containers |
	| `setEnvVar` | Set environment variable | Missing configuration, undefined env vars |
	| `addHostMapping` | Add hostname to IP mapping | DNS resolution (ENOTFOUND) errors |
	| `increaseMemory` | Increase memory limit | OOMKilled errors, out of memory crashes |
	| `increaseTimeout` | Increase timeout value | 504 Gateway Timeout, connection timeout errors |
	| `restartService` | Restart a service | Stuck processes, stale data after deployment |

	## Usage with LiteRT-LM

	### Download the Model

	```bash
	# Using huggingface-cli
	huggingface-cli download macmacmacmac/functiongemma-nextjs functiongemma-infra-v8_q8_ekv1024.litertlm
	```

	Or using Python:

	```python
	from huggingface_hub import hf_hub_download

	model_path = hf_hub_download(
	    repo_id="macmacmacmac/functiongemma-nextjs",
	    filename="functiongemma-infra-v8_q8_ekv1024.litertlm",
	)
	```

	### Required Tool Definitions

	Important: You must use these exact tool definitions for optimal accuracy. The model was trained with these specific descriptions.

	```javascript
	const tools = [
	  {
	    type: "function",
	    function: {
	      name: "enableCors",
	      description: "Enable CORS for a specific origin to fix blocked cross-origin requests.",
	      parameters: {
	        type: "object",
	        properties: {
	          origin: { type: "string", description: "The origin to allow (e.g., http://localhost:3000)" },
	          methods: { type: "string", description: "Allowed HTTP methods (e.g., GET,POST,PUT,DELETE)" }
	        },
	        required: ["origin"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "updateConnectionUrl",
	      description: "Update a service connection URL to fix ECONNREFUSED errors, typically changing localhost to the correct service hostname.",
	      parameters: {
	        type: "object",
	        properties: {
	          service: { type: "string", description: "The service to update (e.g., database, redis, api)" },
	          hostname: { type: "string", description: "The correct hostname to connect to" },
	          port: { type: "integer", description: "The port number to connect to" }
	        },
	        required: ["service", "hostname", "port"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "setEnvVar",
	      description: "Set an environment variable to fix missing configuration errors.",
	      parameters: {
	        type: "object",
	        properties: {
	          name: { type: "string", description: "Environment variable name (e.g., DATABASE_URL, API_KEY)" },
	          value: { type: "string", description: "The value to set" }
	        },
	        required: ["name", "value"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "addHostMapping",
	      description: "Add a hostname to IP mapping to fix DNS resolution (ENOTFOUND) errors.",
	      parameters: {
	        type: "object",
	        properties: {
	          hostname: { type: "string", description: "The hostname to map" },
	          ip: { type: "string", description: "The IP address to map to" }
	        },
	        required: ["hostname", "ip"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "increaseMemory",
	      description: "Increase memory limit for a service to fix OOMKilled errors.",
	      parameters: {
	        type: "object",
	        properties: {
	          service: { type: "string", description: "The service/container/pod name" },
	          memoryMb: { type: "integer", description: "Memory limit in megabytes" }
	        },
	        required: ["service", "memoryMb"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "increaseTimeout",
	      description: "Increase timeout value to fix 504 Gateway Timeout or connection timeout errors.",
	      parameters: {
	        type: "object",
	        properties: {
	          service: { type: "string", description: "The service to configure" },
	          timeoutMs: { type: "integer", description: "Timeout value in milliseconds" }
	        },
	        required: ["service", "timeoutMs"]
	      }
	    }
	  },
	  {
	    type: "function",
	    function: {
	      name: "restartService",
	      description: "Restart a service to apply configuration changes or fix a stuck process.",
	      parameters: {
	        type: "object",
	        properties: {
	          service: { type: "string", description: "The service/container/pod name to restart" }
	        },
	        required: ["service"]
	      }
	    }
	  }
	];
	```
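
Because accuracy depends on these exact definitions, it can help to check every returned call against them before acting on it. The sketch below is one hypothetical way to do that. `validateToolCall`, `ToolDef`, and `ToolCall` are illustrative names that are not part of dad-express, and the `{ name, arguments }` call shape is assumed from the example output elsewhere in this README.

```typescript
// Minimal types covering only what the tool definitions above use.
type ParamSpec = { type: "string" | "integer"; description?: string };
type ToolDef = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, ParamSpec>;
      required: string[];
    };
  };
};
// Assumed shape of a model-produced call.
type ToolCall = { name: string; arguments: Record<string, unknown> };

// Returns a list of problems; an empty list means the call matches its definition.
function validateToolCall(tools: ToolDef[], call: ToolCall): string[] {
  const def = tools.find((t) => t.function.name === call.name);
  if (!def) return [`unknown tool: ${call.name}`];

  const { properties, required } = def.function.parameters;
  const problems: string[] = [];

  for (const key of required) {
    if (!(key in call.arguments)) problems.push(`missing required argument: ${key}`);
  }
  for (const key of Object.keys(call.arguments)) {
    const spec = properties[key];
    const value = call.arguments[key];
    if (!spec) {
      problems.push(`unexpected argument: ${key}`);
    } else if (spec.type === "string" && typeof value !== "string") {
      problems.push(`argument ${key} should be a string`);
    } else if (spec.type === "integer" && !Number.isInteger(value)) {
      problems.push(`argument ${key} should be an integer`);
    }
  }
  return problems;
}

// One definition copied from the list above, to keep the example self-contained.
const exampleTools: ToolDef[] = [
  {
    type: "function",
    function: {
      name: "increaseMemory",
      description: "Increase memory limit for a service to fix OOMKilled errors.",
      parameters: {
        type: "object",
        properties: {
          service: { type: "string", description: "The service/container/pod name" },
          memoryMb: { type: "integer", description: "Memory limit in megabytes" },
        },
        required: ["service", "memoryMb"],
      },
    },
  },
];
```

A call that returns a non-empty list is probably better routed to a developer than executed automatically.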

	### Example Usage with dad-express

	```javascript
	const { FunctionGemmaEngine } = require('dad-express');

	// `tools` is the array from "Required Tool Definitions" above
	const engine = new FunctionGemmaEngine({
	  modelPath: './functiongemma-infra-v8_q8_ekv1024.litertlm',
	  tools: JSON.stringify(tools)
	});

	// CommonJS modules do not support top-level await, so wrap the call
	async function main() {
	  // Diagnose an error
	  const result = await engine.call('Container api was OOMKilled - out of memory');
	  console.log(result.tool_calls[0].function);
	  // { name: 'increaseMemory', arguments: { service: 'api', memoryMb: 1024 } }
	}

	main().catch(console.error);
	```
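
One way to act on the returned function object is a lookup table from tool name to handler. The following is a dry-run sketch only: `planRemediation` and the handler bodies are hypothetical, they just describe the action instead of performing it, and the `{ name, arguments }` shape is taken from the example output above.

```typescript
// Assumed shape of the function object returned by the engine.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type Handler = (args: Record<string, unknown>) => string;

// One dry-run handler per supported tool. Each returns a description of the
// action; a real deployment would call its own infrastructure APIs here.
const handlers = new Map<string, Handler>([
  ["enableCors", (a) => `allow origin ${a.origin}`],
  ["updateConnectionUrl", (a) => `point ${a.service} at ${a.hostname}:${a.port}`],
  // The value is deliberately left out of the description, since it may be a secret.
  ["setEnvVar", (a) => `set environment variable ${a.name}`],
  ["addHostMapping", (a) => `map ${a.hostname} to ${a.ip}`],
  ["increaseMemory", (a) => `set memory limit of ${a.service} to ${a.memoryMb} MB`],
  ["increaseTimeout", (a) => `set timeout of ${a.service} to ${a.timeoutMs} ms`],
  ["restartService", (a) => `restart ${a.service}`],
]);

// Looks up the handler for a call and falls back to escalation for unknown tools.
function planRemediation(call: ToolCall): string {
  const handler = handlers.get(call.name);
  if (!handler) return `no handler for ${call.name}; escalate to a developer`;
  return handler(call.arguments);
}
```

A `Map` is used instead of a plain object so that a tool name such as `constructor` cannot resolve to an inherited property.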

	## Training Data

	The model was trained on 10,500 synthetic examples covering common infrastructure errors:

	| Error Pattern | Tool | Examples |
	|---------------|------|----------|
	| CORS policy errors | enableCors | 1,500 |
	| ECONNREFUSED errors | updateConnectionUrl | 1,500 |
	| Missing env vars | setEnvVar | 1,500 |
	| DNS/ENOTFOUND errors | addHostMapping | 1,500 |
	| OOMKilled errors | increaseMemory | 1,500 |
	| Timeout errors | increaseTimeout | 1,500 |
	| Stuck services | restartService | 1,500 |

	### Sample Training Examples

	```
	"CORS error: No 'Access-Control-Allow-Origin' header from http://localhost:3000" → enableCors
	"Error: connect ECONNREFUSED 127.0.0.1:5432 - database connection failed" → updateConnectionUrl
	"Missing required environment variable: DATABASE_URL" → setEnvVar
	"getaddrinfo ENOTFOUND db" → addHostMapping
	"Container api was OOMKilled" → increaseMemory
	"504 Gateway Timeout from backend" → increaseTimeout
	"nginx container is not responding" → restartService
	```
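
The sample pairs above can double as a small smoke test after loading the model. The sketch below is a hypothetical harness: `accuracy`, `failures`, and `keywordPredictor` are illustrative names, and the keyword predictor is only a stand-in so the example runs without the model. In practice the predictor would wrap the engine call, which is asynchronous, so the harness would need an async variant.

```typescript
type LabeledCase = { message: string; expectedTool: string };
type Predictor = (message: string) => string | null;

// The seven sample pairs listed above.
const sampleCases: LabeledCase[] = [
  { message: "CORS error: No 'Access-Control-Allow-Origin' header from http://localhost:3000", expectedTool: "enableCors" },
  { message: "Error: connect ECONNREFUSED 127.0.0.1:5432 - database connection failed", expectedTool: "updateConnectionUrl" },
  { message: "Missing required environment variable: DATABASE_URL", expectedTool: "setEnvVar" },
  { message: "getaddrinfo ENOTFOUND db", expectedTool: "addHostMapping" },
  { message: "Container api was OOMKilled", expectedTool: "increaseMemory" },
  { message: "504 Gateway Timeout from backend", expectedTool: "increaseTimeout" },
  { message: "nginx container is not responding", expectedTool: "restartService" },
];

// Fraction of cases for which the predictor returns the expected tool.
function accuracy(cases: LabeledCase[], predict: Predictor): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => predict(c.message) === c.expectedTool).length;
  return correct / cases.length;
}

// Cases the predictor got wrong, for inspection.
function failures(cases: LabeledCase[], predict: Predictor): LabeledCase[] {
  return cases.filter((c) => predict(c.message) !== c.expectedTool);
}

// Stand-in predictor (NOT the model): first matching keyword rule wins.
const keywordRules: Array<[RegExp, string]> = [
  [/cors/i, "enableCors"],
  [/ECONNREFUSED/, "updateConnectionUrl"],
  [/environment variable/i, "setEnvVar"],
  [/ENOTFOUND/, "addHostMapping"],
  [/OOMKilled/, "increaseMemory"],
  [/timeout/i, "increaseTimeout"],
  [/not responding/i, "restartService"],
];

const keywordPredictor: Predictor = (message) => {
  const rule = keywordRules.find(([pattern]) => pattern.test(message));
  return rule ? rule[1] : null;
};
```

Seven hand-picked lines say little about generalization; a held-out set drawn from real logs would be a more meaningful check.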



	## Fully Loaded Serving

	Fully Loaded Serving is an end-to-end intelligent error remediation pipeline that runs entirely on-device. It combines:

	1. Low-latency vector embeddings (EmbeddingGemma) for streaming log classification
	2. Semantic clustering to group similar errors/issues/outliers
	3. Function calling (FunctionGemma) to automatically diagnose and fix infrastructure issues
	4. Prompt optimization via [Ax](https://github.com/ax-llm/ax) with MiPRO for continuous improvement

	### Architecture

	```
	┌─────────────────────────────────────────────────────────────────────────┐
	│                           Next.js Application                           │
	├─────────────────────────────────────────────────────────────────────────┤
	│  stdout/stderr ──▶ Log Stream ──▶ dad-express middleware                │
	│                                             │                           │
	│                       ┌─────────────────────┼──────────────────────┐    │
	│                       │                     ▼                      │    │
	│                       │    ┌──────────────────────────────────┐    │    │
	│                       │    │  EmbeddingGemma (~5ms)           │    │    │
	│                       │    │  768-dim vector per log line     │    │    │
	│                       │    └────────────────┬─────────────────┘    │    │
	│                       │                     │                      │    │
	│                       │                     ▼                      │    │
	│                       │    ┌──────────────────────────────────┐    │    │
	│                       │    │  Semantic Clustering (cosine)    │    │    │
	│                       │    │  • Group similar errors          │    │    │
	│                       │    │  • Detect outliers               │    │    │
	│                       │    │  • Identify recurring patterns   │    │    │
	│                       │    └────────────────┬─────────────────┘    │    │
	│                       │                     │                      │    │
	│                       │                     ▼                      │    │
	│                       │    ┌──────────────────────────────────┐    │    │
	│                       │    │  FunctionGemma (~50ms/call)      │    │    │
	│                       │    │  → enableCors, setEnvVar, etc.   │    │    │
	│                       │    └────────────────┬─────────────────┘    │    │
	│                       │                     │                      │    │
	│                       │                     ▼                      │    │
	│                       │    ┌──────────────────────────────────┐    │    │
	│                       │    │  Auto-Remediation Layer          │    │    │
	│                       │    │  Execute fix or notify developer │    │    │
	│                       │    └──────────────────────────────────┘    │    │
	│                       │                                            │    │
	│                       │  LiteRT-LM (on-device, ~300MB RAM)         │    │
	│                       └────────────────────────────────────────────┘    │
	└─────────────────────────────────────────────────────────────────────────┘
	```
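
The clustering stage in the diagram compares embeddings by cosine similarity. As a rough sketch of that comparison (not the dad-express implementation): `cosineSimilarity` and `nearestCluster` below are standalone illustrative functions, and the 0.85 default threshold is the value used in the server example later in this section.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

type Cluster = { id: string; embedding: Float32Array };

// Returns the id of the most similar cluster above the threshold, or null
// when no cluster is similar enough (the caller would then start a new one).
function nearestCluster(
  embedding: Float32Array,
  clusters: Cluster[],
  threshold = 0.85
): string | null {
  let bestId: string | null = null;
  let bestSimilarity = threshold;
  for (const cluster of clusters) {
    const similarity = cosineSimilarity(embedding, cluster.embedding);
    if (similarity > bestSimilarity) {
      bestSimilarity = similarity;
      bestId = cluster.id;
    }
  }
  return bestId;
}
```

This linear scan costs one dot product per cluster, which stays cheap as long as the number of distinct error clusters is small.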

	### Ax Integration with MiPRO

	[Ax](https://github.com/ax-llm/ax) is a TypeScript DSPy-style framework for declarative AI programming. dad-express provides `AxLiteRTProvider` to run Ax signatures entirely on-device:

	```typescript
	import { AxGen } from "@ax-llm/ax";
	import { AxLiteRTProvider } from "dad-express";

	// Create on-device provider with both embedding and chat models
	const provider = new AxLiteRTProvider({
	  chat: {
	    modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
	    tools: infrastructureTools, // The 7 tools from "Required Tool Definitions"
	  },
	  embed: {
	    modelPath: "./models/embedding_gemma.tflite",
	    tokenizerPath: "./models/tokenizer.model",
	  },
	});

	// Define Ax signature for error diagnosis (MiPRO-optimizable).
	// In Ax signatures, an optional field is marked with `?` after the field name.
	const diagnoseError = new AxGen(`
	  errorMessage:string "The error log line",
	  errorCluster?:string "Similar errors seen recently"
	  ->
	  diagnosis:string "Root cause analysis",
	  toolName:string "Which infrastructure tool to call",
	  confidence:class "high, medium, low"
	`);

	// Run inference on-device
	const result = await diagnoseError.forward(provider, {
	  errorMessage: "CORS error from http://localhost:3000",
	  errorCluster: "3 similar CORS errors in last 5 minutes",
	});

	console.log(result);
	// { diagnosis: "Frontend origin not in allowed list",
	//   toolName: "enableCors",
	//   confidence: "high" }
	```
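
MiPRO-style optimization searches over instructions and few-shot demonstrations, and it needs a metric that scores each prediction against a labeled example. A hypothetical metric for the `diagnoseError` signature could look like the sketch below. The type names, the half credit for low-confidence answers, and the 0 to 1 scale are all assumptions; how the metric is wired into Ax's optimizer depends on the Ax version and is not shown.

```typescript
type Confidence = "high" | "medium" | "low";

// Output fields of the diagnoseError signature; all optional because a
// small model may omit a field.
type Prediction = { diagnosis?: string; toolName?: string; confidence?: Confidence };

// A labeled example: an error line and the tool that should be chosen.
type LabeledExample = { errorMessage: string; toolName: string };

// Scores one prediction in [0, 1]. The tool choice decides correctness;
// a correct answer given with low confidence earns half credit so that the
// optimizer prefers prompts that are both right and decisive.
function toolSelectionScore(prediction: Prediction, example: LabeledExample): number {
  if (prediction.toolName !== example.toolName) return 0;
  return prediction.confidence === "low" ? 0.5 : 1;
}
```

Scoring only `toolName` keeps the metric deterministic; judging the free-text `diagnosis` field would need a separate, fuzzier comparison.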

	### Example: Hosting Next.js with Fully Loaded Serving

	```typescript
	// server.ts - Next.js with intelligent error remediation
	import { createApp, FunctionGemmaEngine, EmbeddingEngine } from "dad-express";
	import { spawn } from "child_process";

	// Infrastructure tools, abbreviated here for space. For best accuracy, use the
	// full definitions from "Required Tool Definitions" above.
	const tools = [
	  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin to fix blocked cross-origin requests.", parameters: { type: "object", properties: { origin: { type: "string", description: "The origin to allow" } }, required: ["origin"] } } },
	  { type: "function", function: { name: "updateConnectionUrl", description: "Update a service connection URL to fix ECONNREFUSED errors.", parameters: { type: "object", properties: { service: { type: "string" }, hostname: { type: "string" }, port: { type: "integer" } }, required: ["service", "hostname", "port"] } } },
	  { type: "function", function: { name: "setEnvVar", description: "Set an environment variable to fix missing configuration errors.", parameters: { type: "object", properties: { name: { type: "string" }, value: { type: "string" } }, required: ["name", "value"] } } },
	  { type: "function", function: { name: "addHostMapping", description: "Add a hostname to IP mapping to fix DNS resolution errors.", parameters: { type: "object", properties: { hostname: { type: "string" }, ip: { type: "string" } }, required: ["hostname", "ip"] } } },
	  { type: "function", function: { name: "increaseMemory", description: "Increase memory limit for a service to fix OOMKilled errors.", parameters: { type: "object", properties: { service: { type: "string" }, memoryMb: { type: "integer" } }, required: ["service", "memoryMb"] } } },
	  { type: "function", function: { name: "increaseTimeout", description: "Increase timeout value to fix 504 Gateway Timeout errors.", parameters: { type: "object", properties: { service: { type: "string" }, timeoutMs: { type: "integer" } }, required: ["service", "timeoutMs"] } } },
	  { type: "function", function: { name: "restartService", description: "Restart a service to apply changes or fix stuck processes.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
	];

	// Initialize on-device models
	const embedEngine = new EmbeddingEngine({
	  modelPath: "./models/embedding_gemma.tflite",
	  tokenizerPath: "./models/tokenizer.model",
	});

	const functionGemma = new FunctionGemmaEngine({
	  modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
	  tools: JSON.stringify(tools),
	});

	// Error clustering state
	const errorClusters = new Map<string, { embedding: Float32Array; count: number; lastSeen: Date }>();

	async function classifyAndCluster(logLine: string): Promise<string | null> {
	  // Skip non-error lines
	  if (!/error|fail|exception|timeout|refused|denied/i.test(logLine)) {
	    return null;
	  }

	  // Generate embedding (~5ms on CPU)
	  const embedding = await embedEngine.encodeAsync(logLine);

	  // Find similar errors via cosine similarity
	  let bestMatch: string | null = null;
	  let bestSimilarity = 0.85; // Threshold for clustering

	  for (const [clusterId, cluster] of errorClusters) {
	    const similarity = EmbeddingEngine.cosineSimilarity(embedding, cluster.embedding);
	    if (similarity > bestSimilarity) {
	      bestSimilarity = similarity;
	      bestMatch = clusterId;
	    }
	  }

	  if (bestMatch) {
	    // Update existing cluster
	    const cluster = errorClusters.get(bestMatch)!;
	    cluster.count++;
	    cluster.lastSeen = new Date();
	    return bestMatch;
	  }

	  // Create new cluster
	  const clusterId = `cluster_${Date.now()}`;
	  errorClusters.set(clusterId, { embedding, count: 1, lastSeen: new Date() });
	  return clusterId;
	}

	async function diagnoseAndFix(errorLog: string, clusterId: string): Promise<void> {
	  const cluster = errorClusters.get(clusterId);

	  // Call FunctionGemma for diagnosis (~50ms)
	  const result = await functionGemma.sendMessage(errorLog);

	  if (result.functionCalls && result.functionCalls.length > 0) {
	    const call = result.functionCalls[0];
	    console.log(`[AutoFix] Detected ${cluster?.count ?? 1}x: ${call.name}`);
	    console.log(`[AutoFix] Args: ${JSON.stringify(call.arguments)}`);

	    // Execute remediation (in production, this would call actual infrastructure APIs)
	    switch (call.name) {
	      case "enableCors":
	        console.log(`[AutoFix] Would enable CORS for: ${call.arguments.origin}`);
	        break;
	      case "restartService":
	        console.log(`[AutoFix] Would restart: ${call.arguments.service}`);
	        break;
	      case "increaseMemory":
	        console.log(`[AutoFix] Would increase memory for ${call.arguments.service} to ${call.arguments.memoryMb}MB`);
	        break;
	      // ... handle other tools
	    }
	  }
	}

	// Create dad-express app
	const app = createApp();

	// API routes
	app.get("/health", () => ({ status: "ok", models: { embed: true, functionGemma: true } }));

	app.get("/clusters", () => {
	  const clusters = [];
	  for (const [id, cluster] of errorClusters) {
	    clusters.push({ id, count: cluster.count, lastSeen: cluster.lastSeen });
	  }
	  return clusters;
	});

	// Start Next.js as child process with log monitoring
	const nextProcess = spawn("npx", ["next", "start"], {
	  stdio: ["inherit", "pipe", "pipe"],
	  env: { ...process.env, PORT: "3001" },
	});

	// Stream stdout
	nextProcess.stdout.on("data", (data) => {
	  const line = data.toString().trim();
	  console.log(`[next] ${line}`);
	});

	// Stream stderr with intelligent processing
	nextProcess.stderr.on("data", async (data) => {
	  const line = data.toString().trim();
	  console.log(`[next:err] ${line}`);

	  // Classify and cluster error
	  const clusterId = await classifyAndCluster(line);

	  if (clusterId) {
	    // Diagnose and auto-fix
	    await diagnoseAndFix(line, clusterId);
	  }
	});

	// Start dad-express on separate port for monitoring
	app.listen(4000, () => {
	  console.log("dad-express monitoring on http://localhost:4000");
	  console.log("Next.js app on http://localhost:3001");
	});
	```
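
One caveat with the example above: a stream `data` event carries an arbitrary chunk of bytes, so a single event may hold several log lines or only part of one. A small buffering splitter is one way to hand the pipeline whole lines. The sketch below is illustrative; `createLineSplitter` is a hypothetical helper, not part of dad-express or Node.js.

```typescript
type LineSplitter = {
  push(chunk: string): void; // feed the next chunk of text
  flush(): void; // emit whatever is left when the stream ends
};

// Buffers incoming text and calls onLine once per complete, non-empty line.
function createLineSplitter(onLine: (line: string) => void): LineSplitter {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      const parts = buffer.split(/\r?\n/);
      // The last element is an incomplete line (or "" if the chunk ended
      // with a newline); keep it for the next chunk.
      buffer = parts.pop() ?? "";
      for (const part of parts) {
        const line = part.trim();
        if (line.length > 0) onLine(line);
      }
    },
    flush(): void {
      const line = buffer.trim();
      buffer = "";
      if (line.length > 0) onLine(line);
    },
  };
}
```

In the server example, the stderr handler could call `splitter.push(data.toString())` and run `classifyAndCluster` from the `onLine` callback, with `flush()` invoked when the child process closes. Calling `setEncoding("utf8")` on the stream first avoids splitting multi-byte characters across chunks.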

	### Key Benefits

	| Feature | Latency | Memory | Cloud Calls |
	|---------|---------|--------|-------------|
	| EmbeddingGemma | ~5ms/embed | ~50MB | 0 |
	| FunctionGemma | ~50ms/call | ~271MB | 0 |
	| Semantic clustering | <1ms | Varies | 0 |
	| Total pipeline | ~60ms | ~350MB | 0 |

	- Zero cloud dependency: All inference runs locally via LiteRT-LM
	- Sub-100ms latency: Fast enough for real-time log processing
	- Privacy-preserving: Error logs never leave the device
	- Continuous improvement: Use Ax MiPRO to optimize prompts over time
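
To put the latency figures in perspective, the arithmetic below turns them into a rough throughput estimate for a single sequential worker. It is a back-of-the-envelope sketch: the stage values are the approximate numbers from the table (with clustering at its 1 ms upper bound), and skipping the FunctionGemma call on a cluster hit is a hypothetical optimization that the server example above does not implement.

```typescript
// Approximate per-stage latencies in milliseconds, from the table above.
type StageLatencies = { embed: number; cluster: number; functionCall: number };

const stageLatencyMs: StageLatencies = { embed: 5, cluster: 1, functionCall: 50 };

// Latency of one error line that passes through every stage.
function totalLatencyMs(stages: StageLatencies): number {
  return stages.embed + stages.cluster + stages.functionCall;
}

// Lines per second one sequential worker can sustain at a given per-line latency.
function maxLinesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Average per-line latency if lines that match an existing cluster reuse the
// earlier diagnosis and skip the FunctionGemma call (hypothetical optimization).
function averageLatencyMs(stages: StageLatencies, clusterHitRate: number): number {
  return stages.embed + stages.cluster + (1 - clusterHitRate) * stages.functionCall;
}
```

With these numbers a full pass takes about 56 ms, or roughly 18 error lines per second; if 90% of lines hit an existing cluster and skip the model call, the average drops to about 11 ms per line.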

	## Limitations

	- Optimized for the 7 specific infrastructure tools listed above
	- Requires exact tool definitions for best accuracy
	- May not generalize well to error patterns not seen in training

	## License

	This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model.