
Dead-simple vertical partitioning, compression, appends, and consolidation of data on a distributed filesystem.

Primary language: Java. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

dfs-datastores

A dramatically simpler and more powerful way to store records on a distributed filesystem.
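As a rough illustration of what "vertical partitioning" means in the description above: each record is written into a nested subdirectory chosen from the record's own fields, so a job that needs only one slice (for example, one event type on one day) can typically read just that directory. The following is a minimal, self-contained Java sketch of that idea only. The `Event` type, the `partition` helper, and the `/data/events` root are hypothetical. An in-memory sorted map stands in for the distributed filesystem, and none of this is the dfs-datastores API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical illustration of vertical partitioning: every record is routed to a
// subdirectory derived from its own contents. This is NOT the dfs-datastores API.
class VerticalPartitionSketch {

    // A toy record (hypothetical): event type, date (yyyy-MM-dd), and payload.
    record Event(String type, String date, String payload) {}

    // Groups records by target directory. The sorted map stands in for the
    // filesystem: key = directory path, value = records stored in that directory.
    static <T> Map<String, List<T>> partition(String root, List<T> records,
                                              Function<T, List<String>> targetOf) {
        Map<String, List<T>> layout = new TreeMap<>();
        for (T record : records) {
            // Join the path components returned for this record under the root.
            String dir = root + "/" + String.join("/", targetOf.apply(record));
            layout.computeIfAbsent(dir, k -> new ArrayList<>()).add(record);
        }
        return layout;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("pageview", "2013-01-01", "/home"),
            new Event("click", "2013-01-01", "#signup"),
            new Event("pageview", "2013-01-02", "/about"));

        // Partition first by event type, then by date.
        Map<String, List<Event>> layout =
            partition("/data/events", events, e -> List.of(e.type(), e.date()));

        layout.forEach((dir, recs) ->
            System.out.println(dir + " -> " + recs.size() + " record(s)"));
    }
}
```

Running `main` prints one line per target directory in sorted order, starting with `/data/events/click/2013-01-01 -> 1 record(s)`. Passing the partitioning rule in as a function keeps the layout decision separate from the storage code, so a different slicing (say, by date first) only changes the lambda.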

To include dfs-datastores in your project, add the following to the :dependencies vector in your project.clj:

[com.backtype/dfs-datastores "2.0.0"]

To run the tests:

lein do sub install, sub junit

How to import the project into Eclipse

After checking out the repository, run the commands above, followed by:

lein eclipse
lein do sub eclipse

This will create the necessary Eclipse .project and .classpath files. Now go into Eclipse and import the project:

File -> Import... -> General -> Existing Projects into Workspace

In the Import Projects dialog, deselect the bundle/parent project. The parent project only serves as a container and is not needed in Eclipse. Once the projects dfs-datastores and dfs-datastores-cascading are imported, the build path needs to be adjusted. To do this, add the following directories to the build path (context menu in the Package Explorer -> Build Path -> Use as Source Folder):

dfs-datastores -> src -> main -> java
dfs-datastores -> src -> test -> java

dfs-datastores-cascading -> src -> main -> java
dfs-datastores-cascading -> src -> test -> java

That's it. Now the projects can be edited.

Changelog

2.0.0

  • add Spark support (consolidate, coerce, balanced distcp can use Spark as engine)
  • improve file handling on AWS S3 file systems