USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Java, Apache-2.0
Issues
Sparkler not distributing work over nodes
#234 opened by buggtb - 3
Error from server at http://localhost:8983/solr/crawldb: ERROR: [doc=<>] unknown field 'contenthash'
#247 opened by ravindrabajpai - 0
Build fails: could not find com.browserup:browserup-proxy-core:jar:3.0.0-SNAPSHOT
#239 opened by thammegowda - 2
Exclude net.jpountz.lz4 lz4 from kafka-clients dependency in sparkler-app/pom.xml
#236 opened by lewismc - 4
Broken run script
#219 opened by buggtb - 0
Debugging Elasticsearch Connection
#229 opened by Kefaun2601 - 0
Unit Tests for Sparkler and Elasticsearch
#228 opened by slhsxcmy - 0
Failed to create thread
#227 opened by keiranFTW - 1
Writing Data to Elasticsearch Storage Engine
#224 opened by Kefaun2601 - 0
Fix sparkler CI build
#222 opened by lewismc - 2
Make storage engine pluggable
#196 opened by buggtb - 2
Elasticsearch for Sparkler - Factory Design Pattern
#218 opened by slhsxcmy - 1
Investigate pipeline frameworks
#203 opened by buggtb - 4
Elasticsearch for Sparkler - Maven Profiles
#215 opened by KilometersFan - 10
Sparkler Elasticsearch storage engine
#209 opened by lewismc - 2
Is there any benchmark Sparkler versus Nutch?
#186 opened by MobinRanjbar - 0
Sparkler cannot be executed on Databricks because sparkContext not pulled from sparkSession
#204 opened by mattvryan-github - 1
Improve plugins
#197 opened by buggtb - 0
Improve deployments for different architectures
#198 opened by buggtb - 0
Arm support
#200 opened by buggtb - 0
Add fetcher-default as a plugin
#181 opened by balashashanka - 3
Move Sparkler to sbt build
#184 opened by karanjeets - 0
Update CI so users can download built Sparkler package
#202 opened by buggtb - 0
Fix preview performance issues
#199 opened by buggtb - 1
Dashboard for banana
#188 opened by ravituduru - 0
Enable pagination in SCE
#195 opened by buggtb - 0
Fix basic SCE deployment
#194 opened by buggtb - 4
Argument '-i -1' does not work.
#185 opened by MobinRanjbar - 1
Update to Spark 3.x, Scala 2.12.x
#190 opened by thammegowda - 1
Crawler success but data is not populated into dashboard and output file
#165 opened by kavitasharma21 - 2
Standalone Docker image
#166 opened by buggtb - 0
Create Helm Chart for Sparkler
#168 opened by buggtb - 1
Push to docker repos on each travis build
#169 opened by buggtb - 2
silly question
#187 opened by vwoloszyn - 4
Unable to run jBrowser plugin
#160 opened by micheladennis - 0
data from JS pages is not returned
#174 opened by chaitra-rs - 1
Not an issue
#176 opened by chaitra-rs - 4
Newbie question
#177 opened by chrome83 - 1
push chrome fetcher code
#173 opened by buggtb - 0
Fix mvn build
#172 opened by buggtb - 5
Failed to construct kafka producer
#157 opened by misterpilou - 8
Removing sparkler-app-0.2.0-SNAPSHOT.jar
#159 opened by misterpilou - 1