Is there a way to deploy using a Hugging Face inference endpoint?
Looking for an easy way to spin up briefly for testing! Open to another deployment method too.
Howdy @iyodev:
A) I've just enabled hosted inference.
- Will only work with an access token (and if you've paid and gotten access - which you have, I believe!); see the example call after this list.
- The free inference won't work because the model is larger than 10 GB.
B) Deploying on RunPod is probably an easy option. You can try this template. Make sure to read the README and use an access token.
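For the hosted route, here's a minimal sketch of a call with an access token via huggingface_hub (the repo name, token, and generation parameters are placeholders, not the actual values for this model):

```python
from huggingface_hub import InferenceClient

# Placeholder repo name and token - replace with the actual model repo
# and your own access token.
client = InferenceClient(
    model="your-org/your-function-calling-model",
    token="hf_xxx",
)

# Plain text-generation call; parameters are illustrative.
output = client.text_generation(
    "What's the weather in Dublin?",
    max_new_tokens=256,
)
print(output)
```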
Thanks Ronan! Do you happen to have an example of how to use the hosted inference for a model this large? Or do you mean I should create an inference endpoint?
Yeah, the hosted inference won't work for models bigger than 10 GB. It may work if you have a Hugging Face paid plan, but I'm not sure.
Yes, create an Inference Endpoint, or deploy using RunPod.
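For what it's worth, querying a dedicated Inference Endpoint is just an authenticated POST to the endpoint URL. A minimal sketch (the URL, token, and parameters below are placeholders):

```python
import requests

# Replace with the URL shown on your endpoint's page and your own token.
API_URL = "https://xxxx.endpoints.huggingface.cloud"
HEADERS = {"Authorization": "Bearer hf_xxx"}

def query(payload: dict) -> dict:
    """POST a payload to the endpoint and return the JSON response."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

print(query({
    "inputs": "What's the weather in Dublin?",
    "parameters": {"max_new_tokens": 256},
}))
```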
Thank you, the inference endpoint is working for me. Do you happen to know if any frameworks like langchain, llamaindex, litellm, etc. already have the necessary formatting for function calling baked in?
Unfortunately, I don't know, @iyodev, but I'd appreciate you keeping me posted if you gain any insights. Any learnings can inform a v3 for function calling.
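If you do experiment, litellm is probably the first thing I'd try pointing at the endpoint. A rough, untested sketch - it assumes litellm's huggingface provider will route to the endpoint via api_base, and whether the OpenAI-style tools schema gets translated into this model's function-calling format is exactly the open question:

```python
from litellm import completion

# Untested sketch: model name, endpoint URL, token, and tool schema are
# all placeholders. The tools below use the OpenAI format; it's unclear
# whether litellm maps that onto this model's function-calling prompt.
response = completion(
    model="huggingface/your-org/your-function-calling-model",
    api_base="https://xxxx.endpoints.huggingface.cloud",
    api_key="hf_xxx",
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response.choices[0].message)
```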