Jonna Matthiesen
AI & ML interests
None yet
Recent Activity
reacted to AINovice2005's post about 2 hours ago
Pro tip: if you are fine-tuning any model with TensorBoard logging enabled, be sure to upload the event files to the HF Hub as artifacts; they can be viewed instantly.
I first remember seeing this done in the notus model release: https://huggingface.co/argilla/notus-7b-v1/tensorboard
Examples:
https://huggingface.co/AINovice2005/ModernBERT-base-lora-cicflow-1m-r8/tensorboard
https://huggingface.co/AINovice2005/ModernBERT-base-lora-cicflow-1m-r4/tensorboard
https://huggingface.co/AINovice2005/ModernBERT-base-lora-cicflow-1m-r16
cc: @davidberenstein1957

replied to their post about 2 hours ago
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

Check out our latest FlashHead-enabled model: https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead

Seamless integration with vLLM:
```
docker run --rm -it \
--network host \
--shm-size=8g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--runtime=nvidia \
--name=vllm-serve \
-e HF_TOKEN=hf_*** \
-e HF_HOME=/root/.cache/huggingface \
embedl/vllm:latest-jetson-orin-flashhead \
vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead" \
--max-model-len 8192 \
--gpu-memory-utilization 0.75 \
--max-num-seqs 2 \
--trust-remote-code
```
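Once the container is serving, the model can be queried like any OpenAI-compatible endpoint. A minimal stdlib sketch, assuming vLLM's default port 8000 on the host (the container runs with `--network host`); the helper names are illustrative:

```python
import json
from urllib import request

# vLLM exposes an OpenAI-compatible chat endpoint; port 8000 is the default.
API_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str, model: str) -> dict:
    # Minimal chat-completions request body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def ask(prompt: str,
        model: str = "embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead") -> str:
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because FlashHead is a drop-in replacement for the classification head, the client side is unchanged; only the serving image differs.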
published an article about 9 hours ago
FlashHead: Accelerating Language Model Inference *(Efficient drop-in replacement for the classification head)*