Is there a way to deploy using a Hugging Face inference endpoint?
Looking for an easy way to spin up briefly for testing! Open to another deployment method too.
Howdy @iyodev:
A) I've just enabled hosted inference.
- Will only work with an access token (and if you've paid and gotten access - which you have, I believe!); see the example call after this list.
- The free inference won't work because the model is larger than 10 GB.
B) Deploying on RunPod is probably an easy option. You can try this template. Make sure to read the README and use an access token.
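For the hosted route, here's a minimal sketch of a call with an access token via huggingface_hub (the repo name, token, and generation parameters are placeholders, not the actual values for this model):

```python
from huggingface_hub import InferenceClient

# Placeholder repo name and token - replace with the actual model repo
# and your own access token.
client = InferenceClient(
    model="your-org/your-function-calling-model",
    token="hf_xxx",
)

# Plain text-generation call; parameters are illustrative.
output = client.text_generation(
    "What's the weather in Dublin?",
    max_new_tokens=256,
)
print(output)
```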
Thanks Ronan! Do you happen to have an example of how to use the hosted inference for a model this large? Or do you mean I should create an inference endpoint?
Yeah, the hosted inference won't work for models bigger than 10 GB. It may work if you have a Hugging Face paid plan, but I'm not sure.
Yes, create an Inference Endpoint, or deploy using RunPod.
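For what it's worth, querying a dedicated Inference Endpoint is just an authenticated POST to the endpoint URL. A minimal sketch (the URL, token, and parameters below are placeholders):

```python
import requests

# Replace with the URL shown on your endpoint's page and your own token.
API_URL = "https://xxxx.endpoints.huggingface.cloud"
HEADERS = {"Authorization": "Bearer hf_xxx"}

def query(payload: dict) -> dict:
    """POST a payload to the endpoint and return the JSON response."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

print(query({
    "inputs": "What's the weather in Dublin?",
    "parameters": {"max_new_tokens": 256},
}))
```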
Thank you, the inference endpoint is working for me. Do you happen to know if any frameworks like langchain, llamaindex, litellm, etc. already have the necessary formatting for function calling baked in?
Unfortunately, I don't know, @iyodev, but I'd appreciate you keeping me posted if you gain any insights. Any learnings can inform a v3 for function calling.
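If you do experiment, litellm is probably the first thing I'd try pointing at the endpoint. A rough, untested sketch - it assumes litellm's huggingface provider will route to the endpoint via api_base, and whether the OpenAI-style tools schema gets translated into this model's function-calling format is exactly the open question:

```python
from litellm import completion

# Untested sketch: model name, endpoint URL, token, and tool schema are
# all placeholders. The tools below use the OpenAI format; it's unclear
# whether litellm maps that onto this model's function-calling prompt.
response = completion(
    model="huggingface/your-org/your-function-calling-model",
    api_base="https://xxxx.endpoints.huggingface.cloud",
    api_key="hf_xxx",
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response.choices[0].message)
```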