Multibases

Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base64, base58, etc.) binary appearing in text.

Multibases is the Ruby implementation of multiformats/multibase.

This gem can be used to encode into or decode from multibase-packed strings. It also serves as a general-purpose library for BaseX encoding and decoding without adding the prefix.

🙌🏽 This gem is called multibases rather than the singular form to stay consistent with the multihashes gem, which was forced to take a different name because multihash was already taken. The same is true for multibase and others. In the future, this might be renamed to multiformats-base, with a backwards-compatible interface.

Installation

Add this line to your application's Gemfile:

gem 'multibases'

Alternatively, if you would like to bring your own engines and not load any of the built-in ones:

gem 'multibases', require: 'multibases/bare'

And then execute:

$ bundle

Or install it yourself as:

$ gem install multibases

Usage

This is a low-level library, but high-level implementations are provided. You can also bring your own encoder/decoder. The most important methods are:

  • Multibases.encode(encoding, data, engine?): encodes the given data with the built-in engine for encoding, or with engine if it's given. Returns an Encoded PORO that responds to pack.
  • Multibases.unpack(packed): splits a multibase-packed string into its code and encoded data. Returns an Encoded PORO that responds to decode.
  • Multibases::Encoded#pack: packs the code and the encoded data into a single multibase string.
  • Multibases::Encoded#decode(engine?): decodes the PORO's data using a built-in engine, or engine if it's given. Returns a decoded ByteArray.

encoded = Multibases.encode('base2', 'mb')
# => #<struct Multibases::Encoded
#             code="0", encoding="base2", length=16,
#             data=[Multibases::EncodedByteArray "0110110101100010"]>

encoded.pack
# => [Multibases::EncodedByteArray "00110110101100010"]


encoded = Multibases.unpack('766542')
# => #<struct Multibases::Encoded
#             code="7", encoding="base8", length=5,
#             data=[Multibases::EncodedByteArray "66542"]>

encoded.decode
# => [Multibases::DecodedByteArray "mb"]

This means that the flow of calls is as follows:

        data ➡️ (encode) ➡️ encoded data
encoded data ➡️ (pack)   ➡️ multibasestr

multibasestr ➡️ (unpack) ➡️ encoded data
encoded data ➡️ (decode) ➡️ data
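
The four steps above can be sketched with the standard library alone. This is a minimal illustration for base2 only: SketchEncoded, sketch_encode_base2 and sketch_unpack_base2 are hypothetical names and not part of the gem, and the gem's own objects wrap bytes rather than plain strings.

```ruby
# Hypothetical stand-in for illustration only; this is not the gem's Encoded class.
SketchEncoded = Struct.new(:code, :encoding, :data) do
  # pack: prepend the one-character multibase code to the encoded data
  def pack
    code + data
  end

  # decode: turn the base2 digits back into the original bytes
  def decode
    [data].pack('B*')
  end
end

# encode: convert raw data to base2 digits and remember the matching code
def sketch_encode_base2(data)
  SketchEncoded.new('0', 'base2', data.unpack1('B*'))
end

# unpack: split a multibase string into its code and its encoded data
def sketch_unpack_base2(packed)
  code = packed[0]
  raise ArgumentError, "unknown code #{code.inspect}" unless code == '0'

  SketchEncoded.new(code, 'base2', packed[1..-1])
end

sketch_packed = sketch_encode_base2('mb').pack
# => "00110110101100010"

sketch_roundtrip = sketch_unpack_base2(sketch_packed).decode
# => "mb"
```

The leading "0" in the packed string is the multibase code for base2. Everything after it is the encoded payload.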

Convenience methods are provided:

  • Multibases.pack(encoding, data, engine?): calls encode and then pack
  • Multibases.decode(packed, engine?): calls unpack and then decode

Multibases.pack('base2', 'mb')
# => [Multibases::EncodedByteArray "00110110101100010"]

ByteArrays and encoding

As you can see, the "final" methods output a ByteArray. These are simple DelegateClass wrappers around an array of bytes, which means that the hex encoding of hello is not stored as the string "f68656c6c6f" but as the byte values of those characters:

packed = Multibases.pack('base16', 'hello')
# => [Multibases::EncodedByteArray "f68656c6c6f"]

packed.to_a # .__getobj__.dup
# => [102, 54, 56, 54, 53, 54, 99, 54, 99, 54, 102]
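
To make the wrapper idea concrete, here is a minimal sketch of a DelegateClass around an Array of bytes with a custom inspect. SketchByteArray is a hypothetical class written for this illustration; the gem's own ByteArray classes may be implemented differently.

```ruby
require 'delegate'

# Hypothetical wrapper, loosely modelled on the description above.
class SketchByteArray < DelegateClass(Array)
  # Render the bytes as a readable string, for display purposes only
  def inspect
    text = to_a.pack('C*').force_encoding('UTF-8')
    "[SketchByteArray #{text.inspect}]"
  end
end

sketch_hex_bytes = SketchByteArray.new('f68656c6c6f'.bytes)

sketch_hex_bytes.inspect
# => [SketchByteArray "f68656c6c6f"]

sketch_hex_bytes.to_a
# => [102, 54, 56, 54, 53, 54, 99, 54, 99, 54, 102]
```

All Array methods such as length or first are forwarded to the inner array, so the wrapper only changes how the bytes are displayed.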

They override inspect and force the encoding to UTF-8 (in inspect only).

Note: If you're using pry and have not changed the printer, you won't see the output as described above. Pry always shows the inner Array of bytes.

You can use the convenience methods, such as to_s with an encoding name, to get a string in the correct encoding:

data = 'hello'.encode('UTF-16LE')
data.encoding
# => #<Encoding:UTF-16LE>

data.bytes
# => [104, 0, 101, 0, 108, 0, 108, 0, 111, 0]

packed = Multibases.pack('base16', data)
# => [Multibases::EncodedByteArray "f680065006c006c006f00"]

decoded = Multibases.decode(packed)
# => [Multibases::DecodedByteArray "h e l l o "]

decoded.to_s('UTF-16LE')
# => "hello"
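
The reason an explicit encoding is needed can be shown with plain Ruby strings. The sketch below uses only the standard library and assumes that to_s with an encoding name does something comparable to force_encoding on the raw bytes; that is an assumption, not a statement about the gem's internals.

```ruby
sketch_data = 'hello'.encode('UTF-16LE')
sketch_bytes = sketch_data.bytes
# => [104, 0, 101, 0, 108, 0, 108, 0, 111, 0]

# Reinterpret the raw bytes with the encoding they were produced in
sketch_restored = sketch_bytes.pack('C*').force_encoding('UTF-16LE')

sketch_restored.encode('UTF-8')
# => "hello"

# Read as UTF-8 instead, every second character is a NUL byte
sketch_as_utf8 = sketch_bytes.pack('C*').force_encoding('UTF-8')
sketch_as_utf8.length
# => 10
```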

Implementations

You can find the current multibase table here. At this moment, built-in engines are provided as follows:

| encoding          | code | description                       | implementation |
|-------------------|------|-----------------------------------|----------------|
| identity          | 0x00 | 8-bit binary                      | bare           |
| base1             | 1    | unary (1111)                      |                |
| base2             | 0    | binary (0101)                     | base2 💨       |
| base8             | 7    | octal                             | base_x         |
| base10            | 9    | decimal                           | base_x         |
| base16            | f    | hexadecimal                       | base16 💨      |
| base16upper       | F    | hexadecimal                       | base16 💨      |
| base32hex         | v    | rfc4648 no padding - highest char | base32         |
| base32hexupper    | V    | rfc4648 no padding - highest char | base32         |
| base32hexpad      | t    | rfc4648 with padding              | base32         |
| base32hexpadupper | T    | rfc4648 with padding              | base32         |
| base32            | b    | rfc4648 no padding                | base32         |
| base32upper       | B    | rfc4648 no padding                | base32         |
| base32pad         | c    | rfc4648 with padding              | base32         |
| base32padupper    | C    | rfc4648 with padding              | base32         |
| base32z           | h    | z-base-32 (used by Tahoe-LAFS)    | base32         |
| base58flickr      | Z    | base58 flickr                     | base_x         |
| base58btc         | z    | base58 bitcoin                    | base_x         |
| base64            | m    | rfc4648 no padding                | base64 💨      |
| base64pad         | M    | rfc4648 with padding - MIME enc   | base64 💨      |
| base64url         | u    | rfc4648 no padding                | base64 💨      |
| base64urlpad      | U    | rfc4648 with padding              | base64 💨      |

Those marked with 💨 are backed by a C implementation (using pack and unpack) and are therefore supposed to be blazingly fast. Those marked with ✨ have a custom implementation instead of the generic base_x implementation, which should be faster.
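
Ruby's Array#pack and String#unpack1 are implemented in C and can produce several of these encodings directly. The following is a small sketch using only the standard library. Which directives the gem itself uses is an assumption here, and no multibase prefix is added.

```ruby
# Base16 via the 'H*' directive (hex string, high nibble first)
sketch_hex = 'foobar'.unpack1('H*')
# => "666f6f626172"
sketch_hex_back = [sketch_hex].pack('H*')
# => "foobar"

# Base64 via the 'm0' directive (RFC 4648, no line feeds added)
sketch_b64 = ['foobar'].pack('m0')
# => "Zm9vYmFy"
sketch_b64_back = sketch_b64.unpack1('m0')
# => "foobar"

# Base2 via the 'B*' directive (bit string, most significant bit first)
sketch_bits = 'mb'.unpack1('B*')
# => "0110110101100010"
```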

The version of the spec that this repository was last updated for is available via Multibases.multibase_version:

Multibases.multibase_version
# => "1.0.0"

Bring your own engine

The methods of multibases allow you to bring your own engine, and you can save additional memory by only loading multibases/bare.

# Note: This is not how multibase was meant to work. It's supposed to only
#       convert the input from one base to another, and denote what that base
#       is, stored in the output. However, the system is _so_ flexible that this
#       works perfectly for any reversible transformation!
class EngineKlazz
  def initialize(*_)
  end

  def encode(plain)
    plain = plain.bytes unless plain.is_a?(Array)
    Multibases::EncodedByteArray.new(plain.reverse)
  end

  def decode(encoded)
    encoded = encoded.bytes unless encoded.is_a?(Array)
    Multibases::DecodedByteArray.new(encoded.reverse)
  end
end

Multibases.implement 'reverse', 'r', EngineKlazz, 'alphabet'
# => Initializes EngineKlazz with 'alphabet'

Multibases.pack('reverse', 'md')
# => [Multibases::EncodedByteArray "rdm"]

Multibases.decode('rdm')
# => [Multibases::DecodedByteArray "md"]

# Alternatively, you can pass the instantiated engine to the appropriate
# function.
engine = EngineKlazz.new

# Mark the encoding as "existing" and attach a code
Multibases.implement 'reverse', 'r'

# Pack, using a custom engine
Multibases.pack('reverse', 'md', engine)
# => [Multibases::EncodedByteArray "rdm"]

Multibases.decode('rdm', engine)
# => [Multibases::DecodedByteArray "md"]
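
How a code such as 'r' ties a packed string to an engine can be sketched with a small lookup table. SketchRegistry and SketchReverseEngine are hypothetical classes for illustration only. They work on plain strings and are not the gem's implementation of Multibases.implement.

```ruby
# Hypothetical registry for illustration; the gem's internals may differ.
class SketchRegistry
  def initialize
    @by_name = {}
    @by_code = {}
  end

  # Register an encoding name, its one-character code and an engine
  def implement(name, code, engine)
    @by_name[name] = [code, engine]
    @by_code[code] = [name, engine]
  end

  # Encode with the named engine and prepend its code
  def pack(name, data)
    code, engine = @by_name.fetch(name)
    code + engine.encode(data)
  end

  # Look up the engine by the first character, then decode the rest
  def decode(packed)
    _name, engine = @by_code.fetch(packed[0]) do
      raise ArgumentError, "no engine registered for code #{packed[0].inspect}"
    end
    engine.decode(packed[1..-1])
  end
end

# A reversible transformation standing in for a real base encoding
class SketchReverseEngine
  def encode(plain)
    plain.reverse
  end

  def decode(encoded)
    encoded.reverse
  end
end

sketch_registry = SketchRegistry.new
sketch_registry.implement('reverse', 'r', SketchReverseEngine.new)

sketch_registry.pack('reverse', 'md')
# => "rdm"

sketch_registry.decode('rdm')
# => "md"
```

In this sketch the first character of the packed string must be a registered code, because it is the only information available for choosing the engine.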

Using the built-in encoders/decoders

You can use the built-in encoders and decoders.

require 'multibases/base16'

Multibases::Base16.encode('foobar')
# => [Multibases::EncodedByteArray "666f6f626172"]

Multibases::Base16.decode('666f6f626172')
# => [Multibases::DecodedByteArray "foobar"]

These don't add the multibase prefix to the output and they use the canonical encode and decode nomenclature.

The base_x / BaseX encoder does not have a module function. You must instantiate it first. The result is an encoder that uses the base alphabet to determine its base. Currently padding is ❌ not supported for BaseX, but might be in a future update using a second argument or key.

require 'multibases/base_x'

Base3 = Multibases::BaseX.new('012')
# => [Multibases::Base3 alphabet="012" strict]

Base3.encode('foobar')
# => [Multibases::EncodedByteArray "112202210012121110020020001100"]
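
The general idea behind an alphabet-driven encoder can be sketched as follows: treat the input bytes as one large integer and repeatedly divide by the alphabet length. SketchBaseX is a hypothetical class written for this illustration and is not the gem's BaseX. In particular, mapping leading zero bytes to leading "zero" characters is a common convention that is assumed here.

```ruby
# Hypothetical class for illustration; not the gem's BaseX implementation.
class SketchBaseX
  def initialize(alphabet)
    @alphabet = alphabet.chars
    @base = @alphabet.length
  end

  def encode(plain)
    bytes = plain.bytes
    # Interpret the bytes as one big-endian integer
    number = bytes.inject(0) { |acc, byte| (acc << 8) | byte }
    digits = []
    while number > 0
      number, remainder = number.divmod(@base)
      digits.unshift(@alphabet[remainder])
    end
    # Assumed convention: keep leading zero bytes as leading "zero" characters
    leading = bytes.take_while(&:zero?).length
    (@alphabet[0] * leading) + digits.join
  end

  def decode(encoded)
    chars = encoded.chars
    # Rebuild the integer from the digits, most significant digit first
    number = chars.inject(0) do |acc, char|
      index = @alphabet.index(char)
      raise ArgumentError, "invalid character #{char.inspect}" if index.nil?

      acc * @base + index
    end
    bytes = []
    while number > 0
      number, byte = number.divmod(256)
      bytes.unshift(byte)
    end
    leading = chars.take_while { |char| char == @alphabet[0] }.length
    ([0] * leading + bytes).pack('C*')
  end
end

sketch_base8 = SketchBaseX.new('01234567')

sketch_base8.encode('mb')
# => "66542"

sketch_base8.decode('66542')
# => "mb"
```

The base8 result matches the payload of the '766542' multibase string once the '7' prefix is removed.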

You can use the same technique to inject a custom alphabet. This can be used on the built-in encoders, even the ones that are not BaseX:

base = Multibases::Base2.new('.!')
# => [Multibases::Base2 alphabet=".!"]

base.encode('foo')
# => [Multibases::EncodedByteArray ".!!..!!..!!.!!!!.!!.!!!!"]

base.decode('.!!...!..!!....!.!!!..!.')
# => [Multibases::DecodedByteArray "bar"]

All the built-in encoder/decoders take strings, arrays or byte-arrays as input.

expected = Multibases::Base16.encode("abc")
# => [Multibases::EncodedByteArray "616263"]

expected == Multibases::Base16.encode([97, 98, 99])
# => true

expected == Multibases::Base16.encode(Multibases::ByteArray.new("abc".bytes))
# => true
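
Accepting all three input types usually comes down to normalizing the input to an array of bytes first. The following is a minimal sketch under that assumption. SketchBytes, sketch_to_bytes and sketch_hex are hypothetical helpers, not the gem's API.

```ruby
require 'delegate'

# Hypothetical byte wrapper, standing in for a ByteArray
class SketchBytes < DelegateClass(Array); end

# Normalize a String, an Array of bytes or a byte wrapper to an Array of bytes
def sketch_to_bytes(input)
  input.respond_to?(:bytes) ? input.bytes : input.to_a
end

# Hex-encode any of the supported inputs, without a multibase prefix
def sketch_hex(input)
  sketch_to_bytes(input).pack('C*').unpack1('H*')
end

sketch_expected = sketch_hex('abc')
# => "616263"

sketch_expected == sketch_hex([97, 98, 99])
# => true

sketch_expected == sketch_hex(SketchBytes.new('abc'.bytes))
# => true
```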

Related

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at SleeplessByte/ruby-multibase. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Multibases project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.