DistrictDataLabs/baleen
An automated ingestion service for blogs to construct a corpus for NLP research.
Python · MIT
Issues
connect with mongodb
#97 opened by nikolandrush - 1
Export to directory other than '.' fails
#95 opened by agodbehere - 3
Add load from csv
#90 opened by janetriley - 2
Move html sanitization to Post
#87 opened by janetriley - 8
Baleen add2venv
#77 opened by bbengfort - 1
Export Compressed Posts
#91 opened by bbengfort - 2
move sanitize to its own exporter option
#89 opened by janetriley - 4
Update baleen github repo url in docs
#80 opened by janetriley - 1
export commandline options
#68 opened by echolabstech - 3
PEP8 cleanup
#83 opened by janetriley - 0
Update to use Python 3.5
#48 opened by janetriley - 0
Configurable Scheduling
#79 opened by will2041 - 1
Examples for documentation
#78 opened by rebeccabilbro - 0
README Markdown messed up
#76 opened by bbengfort - 0
Formalize Mongo Schema
#75 opened by will2041 - 0
Use Timeout Decorator
#74 opened by will2041 - 3
Unicode decode error
#73 opened by bbengfort - 2
Docker image is empty
#53 opened by janetriley - 3
Update Docker image to Python 3.5
#55 opened by bbengfort - 2
Add version number to footer
#45 opened by bahadasx - 0
document exporter commandline options
#69 opened by echolabstech - 0
Make posts.htmlize() smarter
#65 opened by echolabstech - 6
commit seed file to /fixtures
#50 opened by echolabstech - 2
Fetch/Ingest Timeout
#22 opened by bbengfort - 4
Update Quickstart documentation
#49 opened by janetriley - 1
Handle mongo connection refused error
#21 opened by bbengfort - 0
Python 3.5 Support
#51 opened by bbengfort - 1
Currently running status screen a bit wonky
#46 opened by bbengfort - 0
Baleen Export: Citation and License
#47 opened by bbengfort - 0
Better Export
#24 opened by bbengfort - 0
Baleen Corpus Reader
#39 opened by bbengfort - 2
Trouble installing feedparser
#40 opened by bahadasx - 1
Segmentation Fault --> 404 Error on Status
#38 opened by bbengfort - 10
Quick time display
#35 opened by bbengfort - 1
Time zone and humanized numbers
#36 opened by bbengfort - 0
Latest post appears incorrect
#37 opened by bbengfort - 0
Bootstrapify /status
#34 opened by bbengfort - 1
Memory Crash
#32 opened by bbengfort - 1
short urls
#31 opened by bbengfort - 0
Deploy web application
#26 opened by bbengfort - 0
Add bootstrap to web view
#27 opened by bbengfort - 1
Add Flask config from Baleen config
#28 opened by bbengfort - 1
Create console command to run server
#30 opened by bbengfort - 0
Move www lib into baleen for easier import
#29 opened by bbengfort