DataSpread: A Spreadsheet-Database Hybrid System


Introduction

DataSpread is a spreadsheet-database hybrid system with a spreadsheet frontend and a database backend. It therefore inherits the flexibility and ease of use of spreadsheets, as well as the scalability and power of databases. A paper describing DataSpread's architecture, design decisions, and optimizations can be found here. DataSpread is a multi-year project, supported by the National Science Foundation via award number 1633755.

Key design innovations in DataSpread include:

  • A flexible hybrid data model to represent spreadsheet data within a database
  • Speculative fetching to fetch additional data beyond the user's current spreadsheet window
  • Asynchronous formula evaluation, so users do not have to wait for long-running operations to complete
  • A navigation panel that lets users explore tabular spreadsheet data and obtain additional details on demand via aggregation operations

Full Documentation

See the Wiki for full documentation on APIs, developer environment setup, and other information.

Version

The current version is 0.5.1.

Getting Started

You can directly use DataSpread via our cloud-hosted site (Temporarily offline).

DataSpread can be deployed locally through Docker (recommended) or through Apache Tomcat. To start a new book, import a CSV file or use the provided /sample.csv.

Docker Method

Required Software

Deploying DataSpread locally.

  1. Clone the DataSpread repository and go to its directory in your terminal. Alternatively, you can download the source as a zip or tar.gz.

  2. Install Docker. Docker separates applications from the underlying infrastructure, which makes them quick to set up and run.

  3. Start Docker, then start the application with the command below. It should be accessible at http://localhost:8080/. Press CTRL+C to stop the application.

    docker-compose up
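
The first start can take a while before the web application answers, so a small polling loop helps tell when the site is ready. The following is a minimal sketch, not part of DataSpread: the function name `wait_for_url`, the retry count, and the delay are illustrative assumptions, and it relies on `curl` being installed.

```shell
#!/bin/sh
# wait_for_url is a hypothetical helper, not part of DataSpread.
# It polls a URL until it answers successfully or the attempts run out.
wait_for_url() {
    url=$1
    attempts=${2:-30}   # assumed default: 30 tries
    delay=${3:-2}       # assumed default: 2 seconds between tries
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failures, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out: $url" >&2
    return 1
}

# Usage, from a second terminal while the application starts:
# wait_for_url http://localhost:8080/
```

The function only reports readiness; it does not start or stop any containers.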
    

Rebuilding Changes

To rebuild after changing the code, add the --build flag when starting the application.

docker-compose up --build

If there are any errors, or the Docker image needs to be built from scratch, run the following.

docker-compose down
docker-compose build --no-cache
docker-compose up

Data Persistence

Data is automatically persisted in a Docker volume across shutdowns. Erase the persisted data by running the following.

docker-compose down -v
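
Because this command deletes the volume without asking, a confirmation prompt can guard against accidental data loss. A minimal sketch follows; `reset_data` and the `COMPOSE` variable are hypothetical names introduced here, not part of DataSpread.

```shell
#!/bin/sh
# COMPOSE is a hypothetical override so the command can be swapped out
# (for example with `echo` for a dry run); it defaults to docker-compose.
COMPOSE=${COMPOSE:-docker-compose}

# reset_data is an illustrative helper, not part of DataSpread.
# It asks for confirmation before removing the containers and volumes.
reset_data() {
    printf 'Erase all persisted DataSpread data? [y/N] ' >&2
    read -r answer || answer=
    case $answer in
        y|Y) $COMPOSE down -v ;;
        *)   echo "aborted"; return 1 ;;
    esac
}

# Usage, from the repository directory:
# reset_data
```

Any answer other than `y` or `Y` leaves the data untouched.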

Additional Information

Docker uses /docker-compose.yml to start the application. For more information about how the application is deployed, see /docker-compose.yml, /Dockerfile, and the files in the /build-db and /build-web folders.

Tomcat Method

To host DataSpread locally on Tomcat, you can either use one of the pre-built WAR files, available here, or build the WAR file yourself from the source.

Required Software

Building Instructions (To generate a WAR file)

  1. Clone the DataSpread repository. Alternatively, you can download the source as a zip or tar.gz.

  2. Use Maven to build the WAR file with the following command. After the build completes, the WAR file is available at webapp/target/DataSpread.war.

    mvn clean install
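
Before moving on to deployment, it can help to confirm that the build produced the WAR file at the path given above. A minimal sketch, where `check_war` is a hypothetical helper name and the default path is the one stated in step 2:

```shell
#!/bin/sh
# check_war is an illustrative helper, not part of DataSpread.
# It reports whether the WAR file produced by the Maven build exists.
check_war() {
    war=${1:-webapp/target/DataSpread.war}
    if [ -f "$war" ]; then
        echo "found: $war"
    else
        echo "missing: $war" >&2
        return 1
    fi
}

# Usage, from the repository root after the build:
# check_war
```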
    

Deploying DataSpread locally.

  1. Install the PostgreSQL database. Postgres.app is a quick way to get PostgreSQL working on macOS. For other operating systems, check out the guides here.

  2. Create a database and a user who has access to it. Note the database name, username, and password. Typically, when PostgreSQL is installed locally, the password is blank.

  3. Install Apache Tomcat. You can use the guide here. Make a note of the directory where Tomcat is installed; the documentation refers to it as TOMCAT_HOME.

  4. Update the Tomcat configuration. Edit the following file, which is in the conf folder under TOMCAT_HOME.

    1. context.xml: add the following text at the end of the file, just before the closing </Context> tag.
    <Resource name="jdbc/ibd" auth="Container"
              type="javax.sql.DataSource" driverClassName="org.postgresql.Driver"
              url="jdbc:postgresql://127.0.0.1:5432/<database_name>"
              username="<username>" password="<password>"
              maxTotal="20" maxIdle="10" maxWaitMillis="-1" defaultAutoCommit="false" accessToUnderlyingConnectionAllowed="true"/>
    

    Replace <database_name>, <username>, and <password> with your PostgreSQL database name, username, and password, respectively.

  5. Copy postgresql-42.1.4.jar (Download from here) to the lib folder under TOMCAT_HOME. It is crucial to use this exact version of the file.

  6. Deploy the WAR file within Tomcat as the root application. This can be done via Tomcat's web interface by undeploying any application located at / and deploying the WAR file with the context path /. To do this manually, delete the webapps/ROOT folder under TOMCAT_HOME while the application is not running, copy the WAR file to the webapps folder, and rename it to ROOT.war.

  7. Now you are ready to run the program. Visit the URL where Tomcat is running, typically http://localhost:8080/ for a local install.
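
Steps 2 and 6 above can be sketched as shell helpers. This is a minimal sketch under stated assumptions, not DataSpread's own tooling: `create_db` and `deploy_root_war` are hypothetical names, the database and user names in the usage lines are placeholders, and `create_db` assumes the PostgreSQL command-line tools `createuser` and `createdb` are on the PATH and that the current account is allowed to create roles and databases.

```shell
#!/bin/sh
# Both helpers are illustrative sketches, not part of DataSpread.

# Step 2: create a login role and a database that it owns.
# createuser and createdb are command-line tools shipped with PostgreSQL.
# Add --pwprompt to createuser if the role should have a password.
create_db() {
    db=$1
    user=$2
    createuser "$user" && createdb --owner="$user" "$db"
}

# Step 6 (manual route): install a WAR file as Tomcat's root application.
# Run this only while Tomcat is stopped.
deploy_root_war() {
    war=$1
    tomcat_home=$2
    if [ ! -f "$war" ]; then
        echo "no such WAR: $war" >&2
        return 1
    fi
    if [ ! -d "$tomcat_home/webapps" ]; then
        echo "no webapps folder under: $tomcat_home" >&2
        return 1
    fi
    # Remove the old root application, then copy the WAR in as ROOT.war.
    rm -rf "$tomcat_home/webapps/ROOT" "$tomcat_home/webapps/ROOT.war"
    cp "$war" "$tomcat_home/webapps/ROOT.war"
    echo "deployed: $tomcat_home/webapps/ROOT.war"
}

# Usage, with placeholder names:
# create_db dataspread_db dataspread_user
# deploy_root_war webapp/target/DataSpread.war "$TOMCAT_HOME"
```

The names passed to `create_db` are the ones to substitute for <database_name> and <username> in context.xml.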

Contributing

To work with the DataSpread source code, follow the developer setup guide. Read the contributing guide before making a pull request. Contributions are welcome!

For bugs and feedback, please use GitHub Issues.

License

MIT