Schema-to-case-class code generation for working with Avro in Scala.
- `avrohugger-core`: Generate source code at runtime for evaluation at a later step.
- `avrohugger-tools`: Generate source code at the command line with the avrohugger-tools jar.
Alternative Distributions:

- `sbt-avrohugger`: Generate source code at compile time with an sbt plugin, found here.
- `avro2caseclass`: Generate source code from a web app, found here.
- Supported Formats: Standard, SpecificRecord, Scavro
- Supported Datatypes
- Protocol support
- Doc support
- Usage
- Warnings
- Best Practices
- Future
- Testing
- Credits
- Standard: Vanilla case classes (for use with Apache Avro's `GenericRecord` API, etc.)
- SpecificRecord: Case classes that implement `SpecificRecordBase` and therefore have mutable `var` fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.)
- Scavro: Case classes with immutable fields, intended to wrap Java-generated Avro classes (for use with the Scavro runtime; Java classes provided separately, see Scavro Plugin or sbt-avro)
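To make the difference concrete, here is a sketch of the shapes the first two formats produce for a hypothetical one-record schema (names are illustrative; the real `SpecificRecord` output also implements `SpecificRecordBase`'s `get`/`put`/`getSchema`, omitted here):

```scala
// Standard format (sketch): an immutable, vanilla case class,
// usable alongside Avro's GenericRecord API.
case class User(name: String, favoriteNumber: Option[Int])

// SpecificRecord format (sketch): mutable `var` fields, as the
// Avro Specific API requires reassignable fields.
class MutableUser(var name: String, var favoriteNumber: Option[Int])

val u = User("grace", Some(7))
val m = new MutableUser("grace", Some(7))
m.name = "ada" // SpecificRecord-style fields can be reassigned in place
```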
- INT → Int
- LONG → Long
- FLOAT → Float
- DOUBLE → Double
- STRING → String
- BOOLEAN → Boolean
- NULL → Null
- MAP → Map
- ENUM → scala.Enumeration, Java Enum. See Customizable Enum Style.
- BYTES → Array[Byte]
- FIXED → //TODO
- ARRAY → List (`generate-scavro`: Array). See Customizable Type Mapping.
- UNION → Option
- RECORD → case class
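Putting several of these mappings together, a record using a mix of these Avro types would surface as a case class along the following lines (hypothetical field names, shown purely to illustrate the type mapping):

```scala
// Hypothetical case class showing how Avro types map onto Scala types:
case class Telemetry(
  count: Int,                // INT
  timestamp: Long,           // LONG
  ratio: Double,             // DOUBLE
  label: String,             // STRING
  enabled: Boolean,          // BOOLEAN
  tags: Map[String, String], // MAP with STRING values
  payload: Array[Byte],      // BYTES
  readings: List[Float],     // ARRAY of FLOAT (List by default)
  note: Option[String]       // UNION of NULL and STRING
)

val t = Telemetry(1, 2L, 0.5, "a", true, Map("k" -> "v"),
  Array[Byte](1, 2), List(1.0f), None)
```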
- `.avdl`, `.avpr`, and JSON protocol strings are generated as ADTs if they define more than one Scala definition.
- For `SpecificRecord`, if the protocol contains messages then no ADT is generated, and an RPC trait is generated instead.
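As a sketch of what "generated as ADTs" means here: a protocol defining two records yields a sealed trait with the records as subtypes (hypothetical names; the real output follows the protocol's namespace and naming):

```scala
// Hypothetical ADT for a protocol that defines two records:
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

// Sealing the trait enables exhaustive pattern matching:
def area(s: Shape): Double = s match {
  case Circle(r) => math.Pi * r * r
  case Square(a) => a * a
}
```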
- `.avdl`: Comments that begin with `/**` are used as the documentation string for the type or field definition that follows the comment.
- `.avsc`, `.avpr`, and `.avro`: Docs in Avro schemas are used to define a case class's ScalaDoc.
- `.scala`: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently Treehugger appears to generate Javadoc-style docs (thus compatible with ScalaDoc style).
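For example, a record's doc attribute ends up as a ScalaDoc comment on the generated definition, roughly like this (hypothetical record and wording):

```scala
/** Represents a registered account (hypothetical record doc).
  *
  * A schema's "doc" attribute surfaces as this comment; per-field
  * docs surface as @param tags.
  *
  * @param name the account holder's display name
  */
case class Account(name: String)

val a = Account("ada")
```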
For Scala 2.10 and 2.11:

```scala
"com.julianpeeters" %% "avrohugger-core" % "0.12.1"
```
Instantiate a `Generator` with the `Standard`, `Scavro`, or `SpecificRecord` source format. Then use `tToFile(input: T, outputDir: String): Unit` or `tToStrings(input: T): List[String]`, where `T` can be `File`, `Schema`, or `String`.
```scala
import avrohugger._
import format._
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
```
where an input `File` can be `.avro`, `.avsc`, `.avpr`, or `.avdl`, and where an input `String` can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement `SpecificRecordBase`.
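For instance, a schema passed as a `String` is just its JSON representation, and a case class definition string works analogously (a minimal, hypothetical example of each input kind):

```scala
// A schema as a plain JSON string, the kind of input accepted by
// the string-based Generator methods:
val schemaString =
  """{
    |  "type": "record",
    |  "name": "Person",
    |  "namespace": "example",
    |  "fields": [{"name": "name", "type": "string"}]
    |}""".stripMargin

// A case class definition string is also accepted, e.g. to have it
// re-emitted as a SpecificRecordBase implementation:
val caseClassString = "case class Person(name: String)"
```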
Avro `array` is represented by Scala `List` by default. `array` can be reassigned to either `Array` or `Vector` by instantiating a `Generator` with a custom type map:

```scala
val generator = new Generator(SpecificRecord, avroScalaCustomType = Map("array" -> classOf[Array[_]]))
```
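The effect of such a remapping on generated fields is roughly the following (hypothetical record, sketched for a `Vector` remapping):

```scala
// Default mapping: an Avro array field becomes a List
case class ReadingsDefault(values: List[Int])

// With "array" remapped to Vector, the same field becomes a Vector
case class ReadingsCustom(values: Vector[Int])

val v = ReadingsCustom(Vector(1, 2, 3))
```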
Namespaces can be reassigned by instantiating a `Generator` with a custom namespace map (please see warnings below):

```scala
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace" -> "newnamespace"))
```
The `SpecificRecord` format requires that enums be represented as Java enums. By default, the `Standard` and `Scavro` formats use Scala enumerations, but they can be reassigned to "case object" or "java enum" by instantiating a `Generator` with a custom enum style map:

```scala
val custom = Map("enum" -> "java enum")
val generator = new Generator(Standard, avroScalaCustomEnumStyle = custom)
```
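The Scala-side styles correspond roughly to the following shapes (hypothetical `Suit` enum; the "java enum" style generates an actual Java `enum` source file instead):

```scala
// "scala enumeration" style (the default for Standard and Scavro):
object SuitEnumeration extends Enumeration {
  val Hearts, Spades = Value
}

// "case object" style: a sealed trait with one case object per symbol
sealed trait Suit
case object Hearts extends Suit
case object Spades extends Suit

val s: Suit = Hearts
```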
Generate simple classes instead of case classes when `fields.size > 22` (useful for generating code for Scala 2.10 from large schemas):

```scala
val generator = new Generator(SpecificRecord, restrictedFieldNumber = true)
```
Download the avrohugger-tools jar for Scala 2.10 or Scala 2.11 (>20MB!) and use it like the avro-tools jar: `Usage: [-string] (schema|protocol|datafile) input... outputdir`

`generate` generates Scala case class definitions:

```
java -jar /path/to/avrohugger-tools_2.11-0.12.1-assembly.jar generate schema user.avsc .
```

`generate-specific` generates definitions that extend `SpecificRecordBase`:

```
java -jar /path/to/avrohugger-tools_2.11-0.12.1-assembly.jar generate-specific schema user.avsc .
```

`generate-scavro` generates definitions that extend Scavro's `AvroSerializable`:

```
java -jar /path/to/avrohugger-tools_2.11-0.12.1-assembly.jar generate-scavro schema user.avsc .
```
Also available as an sbt plugin, found here, that adds a `generate` or `generate-specific` task to `compile` (an alternative to macros).
Code generation is also available via a web app found here. Hosted at Heroku on a hobbyist account, so it may take ~20 seconds to fire up the first time.
- If your framework relies on reflection to get the Schema, it will fail since Scala fields are private. Preempt this by passing a Schema to DatumReaders and DatumWriters (as in the Avro example above).
- For the `SpecificRecord` format, generated case class fields must be mutable (`var`) in order to be compatible with the SpecificRecord API. Note: if your framework allows `GenericRecord`, avro4s provides a type class that converts to and from immutable case classes cleanly (though it seems to fail on maps and case object enums as of v1.4.3).
- When the input is a case class definition string, import statements are not supported; please use fully qualified type names if using records/classes from multiple namespaces.
- By default, a schema's namespace is used as a package name. In the case of the Scavro output format, the default is the namespace with `model` appended.
- While the Scavro format uses custom namespaces in a way that leaves it unaffected, most formats fail on schemas with records within unions (see the [avro forum](http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html)).
- Avoid recursive schemas, since they can cause compatibility issues when flowing data into a system that doesn't support them (e.g., Hive).
- Use namespaces to ensure compatibility when importing into Java/Scala.
- Use default field values in case of future schema evolution (further reading).
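On the last point, a schema field's `default` attribute surfaces as a Scala default argument in the generated case class, roughly like this (a sketch with a hypothetical field):

```scala
// A field declared with "default": 0 in the schema yields a Scala
// default argument, so code built against a newer schema can still
// construct the record without supplying the new field:
case class Counter(name: String, count: Int = 0)

val c = Counter("requests") // count falls back to the default
```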
- Support more Avro types: fixed, decimal via logical types.
- Support for RPC using the Scavro format.
- Support for expanding Standard ADT strings into SpecificRecord and Scavro ADTs.
The `test` task will only run the tests in `src/test`.

The `scripted` task runs all tests in `src/test`, as well as the serialization tests in `src/sbt-test`. Note: the scripted tests depend on a local version of `sbt-avrohugger` that needs to be published with the updated version of `avrohugger` that is to be tested.
Depends on Avro and Treehugger. `avrohugger-tools` is based on avro-tools.
Contributors: