This library parses a subset of GTFS files from ZIP archives.
Projects using this library will benefit from:
- parallel processing (unzip + parse)
- high-performance CSV parser
Supported GTFS files are:
- agency.txt
- routes.txt
- trips.txt (in parallel)
- stops.txt
- stop_times.txt (in parallel)
- feed_info.txt
- calendar_dates.txt
- calendar.txt
- transfers.txt
The project also serves as a complex use case for combining the sesseltjonna-csv and unzip-csv projects. Notable features:
- Large files are unzipped and split into multiple pieces for multithreaded processing, and
- intermediate processors are used to store referential relationships (without use of synchronization), then
- post-processing hooks are used to manage state and resolve referential relationships
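The three steps above can be illustrated with a minimal sketch. This is not the library's actual code: the class `ChunkedParseSketch`, the `Trip` and `ChunkResult` types, and the two-column `trip_id,route_id` row format are all hypothetical and use only the JDK. Each worker fills its own result object, so no locking is needed while parsing, and references to routes are resolved in a single-threaded step once the workers are done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: all names here are assumptions, not part of gtfs-databinding.
final class ChunkedParseSketch {

    /** Hypothetical trip record; routeName is resolved after parsing. */
    static final class Trip {
        final String id;
        final String routeId;
        String routeName; // filled in by the post-processing step
        Trip(String id, String routeId) { this.id = id; this.routeId = routeId; }
    }

    /** Intermediate per-chunk state; owned by exactly one worker, so unsynchronized. */
    static final class ChunkResult {
        final List<Trip> trips = new ArrayList<>();
        final Map<String, List<Trip>> tripsByRouteId = new HashMap<>();
    }

    /** Parses one chunk of "trip_id,route_id" rows into a worker-local result. */
    static ChunkResult parseChunk(List<String> rows) {
        ChunkResult result = new ChunkResult();
        for (String row : rows) {
            String[] cols = row.split(",", -1);
            Trip trip = new Trip(cols[0], cols[1]);
            result.trips.add(trip);
            result.tripsByRouteId.computeIfAbsent(trip.routeId, k -> new ArrayList<>()).add(trip);
        }
        return result;
    }

    /** Splits rows into chunks, parses them in parallel, then resolves route references. */
    static List<Trip> parse(List<String> rows, Map<String, String> routeNames, int chunkSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<ChunkResult>> futures = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += chunkSize) {
                List<String> chunk = rows.subList(i, Math.min(i + chunkSize, rows.size()));
                futures.add(pool.submit(() -> parseChunk(chunk)));
            }
            List<Trip> all = new ArrayList<>();
            for (Future<ChunkResult> future : futures) {
                ChunkResult result = future.get(); // waits for the worker and makes its writes visible
                // Post-processing on the calling thread: resolve route ids to route names.
                for (Map.Entry<String, List<Trip>> entry : result.tripsByRouteId.entrySet()) {
                    String name = routeNames.get(entry.getKey());
                    for (Trip trip : entry.getValue()) {
                        trip.routeName = name;
                    }
                }
                all.addAll(result.trips);
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> routeNames = new HashMap<>();
        routeNames.put("r1", "Red");
        routeNames.put("r2", "Blue");
        List<String> rows = Arrays.asList("t1,r1", "t2,r2", "t3,r1");
        for (Trip trip : parse(rows, routeNames, 2)) {
            System.out.println(trip.id + " -> " + trip.routeName);
        }
    }
}
```

Keeping the intermediate maps local to each worker trades a little extra memory for lock-free parsing; the cost of resolving references is paid once, after the parallel phase.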
Bugs, feature suggestions and help requests can be filed with the issue-tracker.
The project is implemented in Java and built using Maven. The project is available on the central Maven repository.
Example dependency config:
<dependency>
  <groupId>com.github.skjolber.gtfs-databinding</groupId>
  <artifactId>gtfs-databinding</artifactId>
  <version>1.0.2</version>
</dependency>
Use a builder to parse a GTFS archive:
GtfsFeed feed = GtfsFeedBuilder.newInstance().withFile(file).build();
The current implementation is tested against the OneBusAway GTFS Reference parser.
Combining dynamically generated CSV databinding with parallelization makes parsing about 4-5 times faster than the reference implementation (which, to be fair, is not the fastest out there).
If you have any questions, comments or improvement suggestions, please file an issue or submit a pull request.
Feel free to connect with me on LinkedIn, see also my GitHub page.
- 1.0.2: Bump unzip / CSV library versions
- 1.0.0: Initial version