Most logs are lost
collimarco opened this issue · 5 comments
I have a serious problem: most application logs seem to be lost for no apparent reason.
Today I finally decided to investigate, and the problem seems to be quite large. I logged in to my production server, opened the Rails console, and sent some logs manually. Most of them (not all) were lost. Please see the attached screenshots.
The production server is hosted on DigitalOcean, and `ping listener.logz.io` reports 0% packet loss.
Can you help?
The Rails configuration for logging is:
```ruby
# Logs
config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Logstash.new
config.lograge.ignore_actions = ['PagesController#health']
config.logstash.type = :udp
config.logstash.host = 'listener.logz.io'
config.logstash.port = 5050
LogStashLogger.configure do |config|
  config.max_message_size = 2000 # truncate the message
  config.customize_event do |event|
    event["token"] = Rails.application.secrets.logzio_api_key
  end
end
```
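Because `ping` only sends ICMP echo requests, a 0% packet-loss result says little about whether UDP datagrams sent to port 5050 actually arrive. One way to put a number on the loss is to send sequence-numbered test events and check which ones show up. The sketch below is a hypothetical helper, not part of LogStashLogger or any logz.io API; the `probe_seq` field name is an assumption, and the `token` field mirrors the `customize_event` block in the config above.

```ruby
require "json"
require "socket"

# Hypothetical helper: sends `count` JSON events over UDP, each tagged with a
# sequence number in an assumed "probe_seq" field, so the receiving side can
# tell exactly which ones went missing.
def send_udp_probes(host, port, count, token: nil)
  socket = UDPSocket.new
  count.times do |seq|
    event = { "message" => "udp probe", "probe_seq" => seq }
    event["token"] = token if token # mirrors the customize_event block in the config
    socket.send(event.to_json, 0, host, port)
  end
ensure
  socket&.close
end

# Given the sequence numbers that were actually received, list the lost ones.
def missing_probes(received_seqs, count)
  (0...count).to_a - received_seqs
end
```

From a Rails console one might call `send_udp_probes("listener.logz.io", 5050, 100, token: Rails.application.secrets.logzio_api_key)`, then collect the `probe_seq` values that appear in the log search and pass them to `missing_probes` to see which (and how many) were dropped.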
I wouldn't recommend using UDP if reliable delivery is a concern. Does logz.io support other transport mechanisms?
Yes, it also supports TCP; however, switching from UDP to TCP may affect application performance. I don't need 100% reliability, but it seems strange that about 70% of the logs from several different servers are lost. The problem probably started about a month ago, since I remember noticing a drop in the total daily log volume.
Log messages are buffered and flushed in a background thread, so I doubt you'll see an impact on performance.
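The buffering idea can be illustrated with a minimal sketch: the logging call only pushes onto an in-memory queue, and a background thread periodically drains the queue and hands the batch to a delivery block, so a slow network write does not block the request. `BufferedSender` and its methods are hypothetical names; this is not LogStashLogger's actual implementation.

```ruby
# Minimal sketch of "buffer, then flush from a background thread".
# BufferedSender is a hypothetical class, not LogStashLogger's internals.
class BufferedSender
  def initialize(flush_interval: 1.0, &deliver)
    @queue = Queue.new  # thread-safe FIFO from Ruby's core library
    @mutex = Mutex.new  # serializes flushes from the timer thread and from #close
    @deliver = deliver  # block that performs the (possibly slow) network write
    @stopped = false
    @thread = Thread.new do
      until @stopped
        sleep flush_interval
        flush
      end
    end
  end

  # Called on the request path: only enqueues, never touches the network.
  def <<(message)
    @queue << message
    self
  end

  # Drains everything queued so far and hands it to the delivery block as one batch.
  def flush
    @mutex.synchronize do
      batch = []
      begin
        loop { batch << @queue.pop(true) } # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        # queue is drained
      end
      @deliver.call(batch) unless batch.empty?
    end
  end

  # Stops the background thread and delivers anything still buffered.
  def close
    @stopped = true
    @thread.join
    flush
  end
end
```

The trade-off of this design is that messages still sitting in the buffer when the process dies are never sent, which is why the sketch flushes once more in `close`.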
IMHO this is an infrastructure problem (either DigitalOcean or logz.io), not a LogStashLogger problem. Switching to TCP as an experiment could give more insight. If a TCP message fails to be delivered, there should be an exception you can inspect, whereas UDP will sometimes just silently drop messages.
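The difference in failure visibility can be reproduced locally with Ruby's standard `socket` library. The sketch below (helper names are hypothetical) targets a loopback port with no listener: the UDP send typically reports success because UDP itself provides no delivery acknowledgment, while the TCP connection attempt raises an error such as `Errno::ECONNREFUSED`.

```ruby
require "socket"

# Find a local port that (almost certainly) has no listener:
# bind to port 0, note the port the OS picked, then release it.
def unused_local_port
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  server.close
  port
end

# UDP: a send on an unconnected socket normally returns the byte count even
# when nobody is listening, so the sender never learns the datagram was dropped.
def udp_send_reports_success?(port)
  socket = UDPSocket.new
  bytes = socket.send("hello", 0, "127.0.0.1", port)
  bytes == "hello".bytesize
ensure
  socket&.close
end

# TCP: connecting to a port with no listener raises, so the failure is visible.
# Returns the exception class, or nil if the connection unexpectedly succeeded.
def tcp_failure_error(port)
  TCPSocket.new("127.0.0.1", port).close
  nil
rescue SystemCallError => e
  e.class
end
```

For example, with `port = unused_local_port`, `udp_send_reports_success?(port)` returns `true` while `tcp_failure_error(port)` returns an `Errno` class.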
Switching to TCP solved the problem.
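In terms of the configuration above, the switch amounts to changing the transport type. A sketch, assuming logz.io accepts TCP on the same host and port as the UDP listener; the correct TCP port, and whether TLS is required, should be checked against logz.io's documentation.

```ruby
# Logs (transport switched from UDP to TCP)
config.logstash.type = :tcp              # was :udp
config.logstash.host = 'listener.logz.io'
config.logstash.port = 5050              # assumption: verify the TCP port with logz.io
```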
In any case, I think something somewhere in the network is badly misconfigured. I can accept a small amount of packet loss, but not 70% of logs being lost for weeks.