SafeTensors.jl

This packages loads data stored in safetensor format. Since Python is row-major and Julia is column-major, the dimensions are permuted such the tensor has the same shape as in python, but everything is correctly ordered. This includes a performance penalty in sense that we cannot be completely copy-free.

The main function is load_safetensors which returns a Dict{String,V} where keys are names of tensors and values are tensors. An example from runtests is as follows

julia> using SafeTensors

julia> d = load_safetensors("test/model.safetensors")
Dict{String, Array} with 27 entries:
  "int32_357"   => Int32[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29…
  "uint8_3"     => UInt8[0x00, 0x01, 0x02]
  "float16_35"  => Float16[0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0…
  "bool_3"      => Bool[0, 1, 0]
  "int64_3"     => [0, 1, 2]
  "int64_35"    => [0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "float32_357" => Float32[0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 …
  "bool_35"     => Bool[0 1 … 1 0; 1 0 … 0 1; 0 1 … 1 0]
  "float32_35"  => Float32[0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0…
  "float32_3"   => Float32[0.0, 1.0, 2.0]
  "uint8_35"    => UInt8[0x00 0x01 … 0x03 0x04; 0x05 0x06 … 0x08 0x09; 0x0a 0x0b …
  "float16_3"   => Float16[0.0, 1.0, 2.0]
  "int16_357"   => Int16[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29…
  "int16_3"     => Int16[0, 1, 2]
  "float64_357" => [0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 … 91.0 …
  "uint8_357"   => UInt8[0x00 0x07 … 0x15 0x1c; 0x23 0x2a … 0x38 0x3f; 0x46 0x4d …
  "float16_357" => Float16[0.0 7.0 … 21.0 28.0; 35.0 42.0 … 56.0 63.0; 70.0 77.0 …
  "int32_3"     => Int32[0, 1, 2]
  "int16_35"    => Int16[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "int8_357"    => Int8[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29;…
  "int8_35"     => Int8[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "bool_357"    => Bool[0 1 … 1 0; 1 0 … 0 1; 0 1 … 1 0;;; 1 0 … 0 1; 0 1 … 1 0; …
  "float64_35"  => [0.0 1.0 … 3.0 4.0; 5.0 6.0 … 8.0 9.0; 10.0 11.0 … 13.0 14.0]
  "int8_3"      => Int8[0, 1, 2]
  "int64_357"   => [0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29; 36 …
  "int32_35"    => Int32[0 1 … 3 4; 5 6 … 8 9; 10 11 … 13 14]
  "float64_3"   => [0.0, 1.0, 2.0]

It can also perform a lazy loading with SafeTensors.deserialize("model.safetensors") which mmap the file and return a Dict-like object:

julia> tensors = SafeTensors.deserialize("test/model.safetensors"; mmap = true #= default to `true`=#);

julia> tensors["float32_35"]
3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(Float32, view(::Vector{UInt8}, 0x0000000000000ef5:0x0000000000000f30)), 5, 3), (2, 1))) with eltype Float32:
  0.0   1.0   2.0   3.0   4.0
  5.0   6.0   7.0   8.0   9.0
 10.0  11.0  12.0  13.0  14.0

Serialization is also supported:

julia> using Random, BFloat16s

julia> weights = Dict("W"=>randn(BFloat16, 3, 5), "b"=>rand(BFloat16, 3))
Dict{String, Array{BFloat16}} with 2 entries:
  "W" => [0.617188 0.695312 … 0.390625 -2.0; -0.65625 -0.617188 … 0.652344 0.244141; 0.226562 2.70312 … -0.174805 -0.7773…
  "b" => [0.111816, 0.566406, 0.283203]

julia> f = tempname();

julia> SafeTensors.serialize(f, weights)

julia> loaded = SafeTensors.deserialize(f);

julia> loaded["W"] ≈ weights["W"]
true

julia> SafeTensors.serialize(f, weights, Dict("Package"=>"SafeTensors.jl", "version"=>"1"))

julia> loaded = SafeTensors.deserialize(f);

julia> loaded.metadata
Dict{String, String} with 2 entries:
  "Package" => "SafeTensors.jl"
  "version" => "1"

Working with gpu:

julia> loaded["W"]
3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(BFloat16, view(::Vector{UInt8}, 0x00000000000000b9:0x00000000000000d6)), 5, 3), (2, 1))) with eltype BFloat16:
  0.542969    0.201172   1.38281    -0.255859  -1.55469
  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
 -0.0854492   1.17969   -0.265625   -0.871094   2.25

julia> using CUDA; CUDA.allowscalar(false)

julia> CuArray(loaded["W"])
3×5 CuArray{BFloat16, 2, CUDA.Mem.DeviceBuffer}:
  0.542969    0.201172   1.38281    -0.255859  -1.55469
  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
 -0.0854492   1.17969   -0.265625   -0.871094   2.25

julia> gpu_weights = Dict("W"=>CuArray(loaded["W"]), "b"=>CuArray(loaded["b"]))
Dict{String, CuArray{BFloat16, N, CUDA.Mem.DeviceBuffer} where N} with 2 entries:
  "W" => [0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -0.871094…
  "b" => BFloat16[0.871094, 0.773438, 0.703125]

julia> f = tempname();

julia> SafeTensors.serialize(f, gpu_weights)

julia> SafeTensors.deserialize(f)
SafeTensors.SafeTensor{SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{UInt64}}, true}} with 2 entries:
  "W" => BFloat16[0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -…
  "b" => BFloat16[0.871094, 0.773438, 0.703125]

chengchingwen/SafeTensors.jl

SafeTensors.jl