chore: remove usage of load_tiktoken_bpe

The `load_tiktoken_bpe()` function depends on blobfile to load
tokenizer.model files. However, blobfile pulls in pycryptodomex, which
it uses primarily for JWT signing on GCP. We don’t need that
functionality because we always load tokenizers from local files.
pycryptodomex also implements its own cryptographic primitives, which
are known to be problematic and insecure. blobfile could switch to the
more secure PyCA cryptography library, but the project appears
inactive, so that transition may not happen soon.

Fortunately, `load_tiktoken_bpe()` is a simple function: it reads a BPE
file and returns a dictionary mapping byte sequences to their mergeable
ranks. That is straightforward enough for us to implement ourselves.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han 2025-05-27 10:49:03 +02:00
parent a8f75d3897
commit b45cc42202
6 changed files with 234 additions and 17 deletions

@@ -15,7 +15,6 @@ from llama_stack.providers.datatypes import (
 META_REFERENCE_DEPS = [
     "accelerate",
-    "blobfile",
     "fairscale",
     "torch",
     "torchvision",