I recently created my first storage bucket to store experiment data from my performance analysis of 15 tokenizers across 20 languages.
The setup is simple enough for a new product and can scale depending on the use case 🤗.
Bucket: https://huggingface.co/buckets/AINovice2005/tokenizer-benchmark
GitHub gist: https://gist.github.com/ParagEkbote/b3877f667f84cbb9a27bdaca94ba662a
Article: https://medium.com/@paragekbote23/one-sentence-fifteen-tokenizers-a-tokenizer-benchmarking-pipeline-with-hf-storage-buckets-2e59790276fd
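To make the benchmark idea concrete, here is a minimal toy sketch of the "one sentence, many tokenizers" setup: run each sentence through every tokenizer and record a token-count matrix keyed by tokenizer and language. The whitespace and character tokenizers below are hypothetical stand-ins (the real pipeline uses Hugging Face tokenizers and writes results to the bucket), so only the shape of the pipeline is illustrated here.

```python
# Toy benchmark: token counts per (tokenizer, language).
# The tokenizers and sentences below are illustrative stand-ins,
# not the actual ones used in the linked pipeline.

def whitespace_tokenize(text):
    # Split on whitespace: one token per word-like chunk.
    return text.split()

def char_tokenize(text):
    # One token per non-space character.
    return [c for c in text if not c.isspace()]

TOKENIZERS = {
    "whitespace": whitespace_tokenize,
    "character": char_tokenize,
}

SENTENCES = {
    "en": "The quick brown fox jumps over the lazy dog.",
    "de": "Der schnelle braune Fuchs springt über den faulen Hund.",
}

def benchmark(tokenizers, sentences):
    # Returns {tokenizer_name: {language: token_count}}.
    return {
        name: {lang: len(tok(text)) for lang, text in sentences.items()}
        for name, tok in tokenizers.items()
    }

results = benchmark(TOKENIZERS, SENTENCES)
print(results["whitespace"]["en"])  # -> 9
```

The resulting nested dict is easy to serialize to JSON and upload as a single artifact, which is the kind of experiment data the bucket above holds.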