Unable to read the whole file when the pipeline gets reloaded
kaisecheng opened this issue · 0 comments
kaisecheng commented
When Logstash starts with `--config.reload.automatic`, the file input can ingest all data as long as no reload occurs.
However, if the pipeline is reloaded in the middle of ingestion — say it has already read 300 out of 600 lines — Logstash reads the first 300 lines again and leaves the rest unread.
- Version: 4.2.4
- LS Version: 7.12
- Operating System: macOS
- Config File (if you have sensitive info, please remove it):

```yaml
- pipeline.id: SDH_650
  pipeline.workers: 1
  pipeline.batch.size: 5
  config.string: |
    input {
      file {
        path => "/650/merged.csv"
        mode => "read"
        start_position => "beginning"
      }
    }
    filter {
      csv {
        separator => ","
        columns => ["id", "host", "fqdn", "IP", "mac", "role", "type", "make", "model", "oid", "fid", "time"]
        remove_field => ["path", "host", "message", "@version"]
      }
    }
    output {
      elasticsearch { index => "650" }
      stdout { codec => rubydebug }
    }
```
- Sample Data:

```csv
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-09 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-10 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-11 02:36:17.154791"
```
- Steps to Reproduce:
  - Run the pipeline in 7.12 with auto-reload: `bin/logstash --config.reload.automatic`
  - Change `pipeline.workers` from 1 to 2 during ingestion
  - Change `pipeline.workers` multiple times during ingestion
  - Check the data in Elasticsearch. You will find the head of the CSV duplicated, while the tail of the CSV is missing.
Currently the workaround is to use `tail` mode.
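As a minimal sketch, the workaround amounts to switching the same input from the report to `tail` mode (everything else unchanged):

```ruby
input {
  file {
    path => "/650/merged.csv"
    mode => "tail"                   # workaround: tail mode instead of read mode
    start_position => "beginning"    # still read the file from the start
  }
}
```

In `tail` mode the sincedb tracks the byte offset per file, so a pipeline reload resumes from where ingestion left off rather than restarting from the beginning.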