Augmentoolkit-DataSpecialist-v0.1-AWQ
This is a 4-bit AWQ quantized version of Heralax/Augmentoolkit-DataSpecialist-v0.1.
Motivation
This quantization was created specifically to enable efficient serving of Heralax's Data Specialist model using the vLLM engine.
While the original model is excellent, running it at full precision can be resource-intensive. This AWQ version allows for high-throughput inference on consumer hardware (such as single or dual RTX 3090s) while maintaining the model's specialized capabilities.
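As a rough illustration of why 4-bit weights fit comfortably on a 24 GB card, here is the weights-only arithmetic. The 7B parameter count below is a stand-in assumption, not the model's actual size; KV cache and activations add further memory on top of this.

```python
# Back-of-envelope weight memory (hypothetical 7B-parameter model --
# substitute the real parameter count; excludes KV cache and activations).
params_b = 7                # billions of parameters (assumption)
fp16_gb = params_b * 2.0    # FP16: 2 bytes per weight
w4_gb = params_b * 0.5      # 4-bit: 0.5 bytes per weight
print(fp16_gb, w4_gb)       # FP16 vs 4-bit weight footprint in GB
```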
Methodology
This model was converted using the LLM Compressor library, currently the recommended quantization workflow for vLLM.
- Quantization Scheme: W4A16 (4-bit weights, 16-bit activations)
- Group Size: 128
- Calibration Dataset: open_platypus
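To make the W4A16 scheme concrete, the sketch below shows symmetric per-group 4-bit quantization in plain Python. This is an illustrative toy, not LLM Compressor's actual implementation, and the group size is shrunk to 4 for readability (the real model uses 128).

```python
# Toy sketch of symmetric 4-bit group quantization (W4A16-style weights).
# Assumption: one scale per group, weights clamped to the symmetric int4
# range. Group size reduced to 4 here; the published model uses 128.
GROUP_SIZE = 4
QMAX = 7  # int4 is [-8, 7]; clamp to +/-7 for a symmetric grid

def quantize_group(weights):
    # One FP scale per group, chosen so the largest magnitude maps to QMAX.
    scale = max(abs(w) for w in weights) / QMAX or 1.0
    q = [max(-QMAX, min(QMAX, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    # At inference the int4 weights are rescaled back to 16-bit values.
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07, 1.2, -0.9, 0.01, 0.44]
recovered = []
for i in range(0, len(weights), GROUP_SIZE):
    q, s = quantize_group(weights[i:i + GROUP_SIZE])
    recovered.extend(dequantize_group(q, s))
```

Grouping matters because a single outlier weight only inflates the scale (and thus the rounding error) of its own 128-weight group, not of the whole tensor.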
Credit
All credit for the training and architecture of the original model goes to Evan Armstrong (Heralax). This repository merely provides the quantized weights to facilitate broader adoption and easier deployment within the vLLM ecosystem.
How to Use with vLLM
```python
from vllm import LLM, SamplingParams

# Load the quantized model
llm = LLM(
    model="bbarn4/Augmentoolkit-DataSpecialist-v0.1-AWQ",
    quantization="awq",
)

# Generate
prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```