Consider avoiding apnotic/net-http2 dependency
Closed this issue · 28 comments
Hello!
Exciting to see this gem, I was looking for something built into Rails a few months ago and I'm so happy to see this!
I just wanted to drop a suggestion to consider what apnotic is buying you, because FWIW I found it was (in combination with the net-http2 dependency) increasing my app's memory use (in a memory constrained environment on Heroku).
Here's what I ended up with using Async::HTTP in case it's useful to you or others (please excuse the application-specific code!)
require "async/http/internet/instance"
class ApnsToken < ActiveRecord::Base
TOPIC = "ios-app-name-goes-here"
validates :env, :token, :topic, presence: true
validates :token, uniqueness: { scope: [:env, :topic] }
validates :topic, inclusion: { in: [TOPIC] }
def test
push(:alert, { alert: "Hello!", sound: "default" })
end
def ping
push(:background, { "content-available": 1 })
end
private
def push(type, payload)
host = case env
when "development" then "https://api.sandbox.push.apple.com"
when "production" then "https://api.push.apple.com"
end
url = "#{host}/3/device/#{token}"
headers = {
"authorization" => "bearer #{jwt}",
"apns-topic" => topic,
"apns-push-type" => type,
"content-type" => "application/json"
}
body = JSON.generate({ aps: payload })
Sync do |task|
response = nil
result = nil
task.with_timeout(2.seconds) do
response = Async::HTTP::Internet.instance.post(url, headers, body)
result = case response.status
when 200
true
when 400
destroy if JSON.parse(response.read).dig("reason") == "BadDeviceToken"
false
when 410
destroy
false
else
false
end
end
result
ensure
response&.finish
end
end
def jwt
@_jwt ||= begin
payload = {
iss: Rails.application.credentials.services.apns.team_id,
iat: Time.now.to_i
}
JWT.encode(
payload,
OpenSSL::PKey::EC.new(Rails.application.credentials.services.apns.private_key),
"ES256",
kid: Rails.application.credentials.services.apns.key_id
)
end
end
endIn any case please feel free to close as not planned etc, but I figured it was worth sending a note.
Thanks!
Hey @trevorturk! Thanks a lot for this. Dropping apnotic is very much in our plans! We used it to get to something that worked quickly, but we definitely want to get rid of it. This is super useful! 🤩 Huge thanks!
I’m starting the work to remove the apnotic dependency. The plan is to replace it and use httpx as HTTP/2 client.
I’m picking httpx because:
- It relies on the same underlying HTTP/2 library of apnotic, which is the battle-tested Ruby implementation of the HTTP/2 protocol adopted by multiple clients.
- Although it’s a relatively large library, it’s still smaller than pulling in the entire async ecosystem via
async-http. async-httpbrings its Fiber scheduler, which is great for handling many concurrent connections, but right now we process each notification in a separate job, so we wouldn’t benefit.- With
httpx, we can still opt in to the Fiber scheduler. That means if we implement multiplexing later, switching should be straightforward.
- With
- It supports keep-alive connections for HTTP/1, which we can use in the FCM service.
- It has a comprehensive wiki.
@trevorturk do you remember any details about the app memory issue you ran into with apnotic? I want to make sure the library will also be usable by relatively low-memory clients.
Howdy, @intrip, thanks for thinking of me! 😉
I'm sorry, but I didn't look into the memory issues much, tbh, it was very clearly because of adding the apnotic gem, and I noticed the HTTP/2 lib was included, but I shouldn't point fingers without having investigated. I moved into triage mode and realized it wouldn't be much work to swap out for async-http since I was already using it throughout my app, already bought into the async ecosystem, etc etc.
I think httpx looks like a good choice, as it seems to have wonderful support and is becoming more broadly adopted. I've been following along as an old coworker has been working through some issues (see HoneyryderChuck/httpx#100) and it's been great to (you can tell the author of the gem cares deeply and is obvious very skilled). I would say it's a bit of a concern that there have been these difficult to track down bugs, but I'm confident things will get solved and it seems like httpx has a bright future.
FWIW I'd be more than happy to test even a rough implementation and report back about memory usage etc if helpful? It's very easy for me to swap things in and out, and I don't mind if things are rough for the particular feature I'm using APNS for.
Apologies again for not having more details in any case!
@intrip Think we should use httpx in Action Push Web as well? It's currently using net-http-persistent but seems like we could make them consistent.
I think httpx looks like a good choice, as it seems to have wonderful support and is becoming more broadly adopted. I've been following along as an old coworker has been working through some issues (see HoneyryderChuck/httpx#100) and it's been great to (you can tell the author of the gem cares deeply and is obvious very skilled). I would say it's a bit of a concern that there have been these difficult to track down bugs, but I'm confident things will get solved and it seems like httpx has a bright future.
Right! We'll see how it turns out: We are still in the experimenting phase, time will tell.
FWIW I'd be more than happy to test even a rough implementation and report back about memory usage etc if helpful? It's very easy for me to swap things in and out, and I don't mind if things are rough for the particular feature I'm using APNS for.
That would be great! I'll keep you posted.
@intrip Think we should use httpx in Action Push Web as well? It's currently using net-http-persistent but seems like we could make them consistent.
I’m still experimenting with httpx, so it’s a bit early to decide, but I’d like the consistency too.
@trevorturk I just merged #34 to switch away from Apnotic. We are already using it in Basecamp without any performance regression.
Can you see how it performs in your constrained environment?
Ok, will test it out (today or tomorrow) and report back ASAP!
@intrip I reviewed the README in detail and I think I see a clear enough migration path (thanks to the custom Device model support) except for a a couple things I'm not sure how to handle. In my iOS/Rails application, I have support for APNS production and development servers in the same config/table, so I can test in Xcode or TestFlight or App Store more easily. I also allow the topic to be set dynamically, so we essentially end up with a valid device token that comes along with the app (topic) and environment in which it was created.
I'm happy to share more code samples, but this message was getting a bit long, so here's the basics in pseudo-code:
class AppDelegate: UNUserNotificationCenterDelegate {
static var topic: String {
"example-app-id"
}
static var env: String {
#if DEBUG
return "development"
#else
return "production"
#endif
}
func application(
_ application: UIApplication,
didRegisterForRemoteNotificationsWithDeviceToken
deviceToken: Data
) {
localStore.pushTokenTopic = topic
localStore.pushTokenEnv = env
localStore.pushToken = deviceToken
railsApp.post(...)
}
}Then, on the Rails side, we have the following in the db table:
create_table "apns_tokens" do |t|
t.string "topic", null: false
t.string "env", null: false
t.string "token", null: false
end...and you can see how I dynamically handle in the code example I posted earlier.
So, I'm sorry I didn't get a chance to test yet, and no worries at all if this all seems out of scope. I thought it might be interesting to see another use case like mine and to consider if we should bake in the flexibility as I'm doing here, but I'm not sure if my use case really squares with the typical Rails app, which wants a device to be associated with a user in a specific environment etc.
I'm happy to chat more about this, or maybe I can figure out how to adapt to your config options? I'm a bit confused about how I would use the "multiple apps" config, and I see the connect_to_development_server option... perhaps there's some way I'm not quite seeing where I could combine these two configs in order to reproduce my existing setup...? (I'm giving this some more thought now and will report back as well... perhaps using the custom device model, I could dynamically switch between "apps" which would have different "connect_to_development_server" configs...)
@trevorturk The current Device model doesn’t support the concept of “environment” since it’s shared with the google service, where that concept doesn’t exist. However, you should be able to achieve what you need by using multiple applications. You can define shared config under the applications key and then override values per application, like this:
push.yml
shared:
apple:
# shared values for all the applications
application:
key_id: <%= Rails.application.credentials.dig(:action_push_native, :apns, :calendar, :key_id) %>
encryption_key: <%= Rails.application.credentials.dig(:action_push_native, :apns, :calendar, :encryption_key) %>
development:
topic: development.bundle.identifier
connect_to_development_server: true
production:
topic: production.bundle.identifierThen create a customizable Push notification class:
class PushNotification < ApplicationPushNotification
# allow to override the application at runtime, if needed
attr_accessor :application
endNo data migration is needed, you can keep using your ApnsToken model as explained in the "Custom device model section" like this:
class ApnsToken < ApplicationRecord
...
def push(notification)
notification.token = token
notification.application = env
# Optional: override also the topic at runtime if needed, you still need to set a shared value in the config to avoid a runtime error.
# notification.apple_data.merge!("apns-topic": topic)
ActionPushNative.service_for(platform, notification).push(notification)
rescue ActionPushNative::TokenError => error
destroy!
end
end
# deliver the notification
PushNotification.with_apple(...).new(...).deliver_later_to(apns_token)Does that work for you?
Yes, that's what I was thinking of, but (separate issue) I wonder if these flexibility concepts might be better modeled explicitly by expanding the number of db columns.
In any case, I should have time to give this a try. Perhaps if all goes well we can consider documenting somehow (but the README is already getting long in the tooth!)
I'll report back ASAP.
Ok! I'm going to let this run overnight and report back if anything crazy happens, but all looks well so far in my memory-constrained environment 🎉
For reference, my app needed the following changes/setup:
# adding ApplicationRecord, mentioned in issue #45
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
end
# config/push.yml, note `.dump` as mentioned in issue #33
shared:
apple:
application:
team_id: <%= Rails.application.credentials.services.apns.team_id %>
key_id: <%= Rails.application.credentials.services.apns.key_id %>
encryption_key: <%= Rails.application.credentials.services.apns.private_key.dump %>
topic: <%= ApnsToken::TOPIC %>
development:
connect_to_development_server: true
production:
connect_to_development_server: false
# adjusting my existing device model as discussed
class ApnsToken < ActiveRecord::Base
def test
push ActionPushNative::Notification.new(title: "Hello!", sound: "default")
end
def ping
push ActionPushNative::Notification.silent.new
end
private
def push(notification)
notification.application = env
notification.token = token
ActionPushNative.service_for(:apple, notification).push(notification)
true
rescue ActionPushNative::TokenError, ActionPushNative::BadDeviceTopicError
destroy
false
end
endI do think it's worth considering adding the topic and/or env in the db, but I'll resist opening an issue about that since we were able to figure out a way to support my existing use-case and schema (thank you!) -- it might be worth documenting, but I'll leave that for your consideration.
Separately, I did notice that at least 10 new gems were added to my Gemfile (all to do with googleauth) so I'm curious if there are plans to try to reduce the dependencies there as well. Those dependencies include faraday, net-http, multi_json , etc so I'm guessing this may already be on your list...? 😬
Feel free to close this and the other issues I created without much comment, but do let me know if you'd like additional feedback or if I can be helpful in any way! Thanks again!
Ok! I'm going to let this run overnight and report back if anything crazy happens, but all looks well so far in my memory-constrained environment 🎉
Thanks a lot @trevorturk! I really appreciate your support!
it might be worth documenting, but I'll leave that for your consideration.
What do you think we should add to the custom device model section to make it more accurate? Feel free to make a PR if you like.
Separately, I did notice that at least 10 new gems were added to my Gemfile (all to do with googleauth) so I'm curious if there are plans to try to reduce the dependencies there as well. Those dependencies include faraday, net-http, multi_json , etc so I'm guessing this may already be on your list...? 😬
I just removed the "net-http" dependency but the others you mentioned come from googleauth, which is required for authenticating with Google FCM, so we’ll need to keep those for now.
@trevorturk side note: I just released v0.2.0 with all the latest improvements for both Apple and Google integrations.
I'll have a look at the README and see if it might make sense to document how to have dev/prod support in the same table somehow. It's probably a niche thing, so I don't know if it's worth the space in the README, but I found it nice to not have to worry about testing against different servers in dev/prod (Xcode / App Store).
Sounds good re: googleauth but I'd say it might be worth a spelunk someday (waves hands) to try to simplify as you just did with apnotic. I'm always a bit wary of Gemfile creep if you know what I mean...
Thx for the v0.2.0 release!
Also an update after running for ~12hrs, HTTPX looks absolutely great in terms of memory use, and it seems very well supported, becoming more widely adopted, so I'm +1 for considering that as a new best-first-choice.
Just a small heads up, @trevorturk, @excid3: we ran into a small problem running HTTPX together with Solid Queue in HEY. We were running the jobs in a worker with 3 threads, and this was causing significant CPU spikes in the servers where we were running this. We were also getting frequent HTTPX::PoolTimeoutError, even though the pool had a size of 10 and we were only running 3 threads per process (so there should have been enough connections for the 3 threads always, without having to wait).
Then, we noticed that some jobs running in that same worker (other jobs, not push notification jobs) would somehow end up with all queries taking between 400ms to 800ms longer than they should, as if all queries (to different DBs) had a fixed overhead of that time. We could only find the CPU spikes that correlated with that, and all started when we switched to HTTPX.
Yesterday, we moved the jobs to run in a Solid Queue worker with a single thread, and everything has been working fine since. I'm not sure yet if this is an HTTPX problem, a Solid Queue problem or a combination of the two. I need to investigate more, but I wanted to mention it.
That's interesting, thank you for sharing! FWIW on my end, I found job times decreased when I switched over to this gem, but I'm using Sidekiq. I'm planning to switch to Solid everything including Solid Queue, so please do let me know if it might be helpful if I did some testing. TBH this seems like a tricky one, though, so I'm not sure how much I can do, but I'm here to help if possible!
Also, apologies for littering this issue, but I'm now remembering my Solid-everything work was blocked by rails/solid_cache#275 if you happen to know the author... 😉
Ohhh, @trevorturk, with Sidekiq, how are you running these jobs? Are you using multiple threads? How many? And are you setting any value for connection_pool_size? It'd be great to know these details to compare with our setup. In Basecamp we're using this with Resque so we don't have an option to run these in multiple threads to compare.
About the Solid Cache issue, I'll try to move it! 🙌
I'm not sure how much detail you want exactly, so feel free to ping me back. I'll try to over-share anyway to start....
I'm not setting any connection_pool_size that I can see. I'm running a single Sidekiq instance on Heroku with the default concurrency: 5. (Maybe that's the pool size?) I'm using default configs generally, I think.
I have 1 recurring job: ApnsTokenPingEnqueueJob which enqueues the actual ping jobs:
class ApnsTokenPingEnqueueJob < ApplicationJob
def perform
ApnsToken.find_each do |token|
fanout = rand(5.minutes) + 1.second
ApnsTokenPingJob.set(wait: fanout.seconds).perform_later(token)
end
end
en
class ApnsTokenPingJob < ApplicationJob
def perform(token)
token.ping
end
end
class ApnsToken < ActiveRecord::Base
def ping
push ActionPushNative::Notification.silent.new
end
private
def push(notification)
notification.application = env
notification.token = token
ActionPushNative.service_for(:apple, notification).push(notification)
true
rescue ActionPushNative::TokenError, ActionPushNative::BadDeviceTopicError
destroy
false
end
end
# Procfile
web: bundle exec falcon host
worker: bundle exec sidekiq
# config/initializers/sidekiq.rb
SIDEKIQ_REDIS_CONFIGURATION = {
url: ENV["REDIS_SIDEKIQ_URL"],
ssl_params: Rails.configuration.redis_ssl_params,
}
Sidekiq.configure_server do |config|
config.redis = SIDEKIQ_REDIS_CONFIGURATION
config.logger.level = Logger::WARN
end
Sidekiq.configure_client do |config|
config.redis = SIDEKIQ_REDIS_CONFIGURATION
config.logger.level = Logger::WARN
endThis is super helpful, thank you! 🙏
I'm not setting any connection_pool_size that I can see. I'm running a single Sidekiq instance on Heroku with the default concurrency: 5. (Maybe that's the pool size?) I'm using default configs generally, I think.
Ahh, then the pool size you're using is also 5, it's set here.
With a setup similar to what you have with Sidekiq, we got frequent HTTPX::PoolTimeoutError errors... I wonder if volume makes a difference 🤔 Do you run other jobs together with the ApnsTokenPingJob, or do you have a dedicated Sidekiq worker for them? I imagine not, looking at your configuration, but asking just in case.
Damn I'm so sorry that we are still having issues @rosa!
I wonder if volume makes a difference 🤔
Agreed, it’s probably related to higher volume.
I wonder if enabling verbose logging HTTPX_DEBUG=1 HTTPX_DEBUG_REDACT=1 could reveal more details to help with troubleshooting.
Yeah, we have a ton of very small jobs emitting stats to Datadog etc, so we're high volume generally, but very low volume in terms of pings. I'm seeing connection timeouts and too many request errors, but we let them retry and we're basically stable. I agree it's probably related to your higher volume. Maybe there's some way to rotate or flush the pool after some amount of time being connected? Sorry I can't be of much help, but please do keep us apprised, I'd love to follow along!
Thanks a lot, @trevorturk! 🙏
I'm seeing connection timeouts and too many request errors
Are these HTTPX::PoolTimeoutError errors? Or another kind of error?
Oh, no, that is super helpful. So, no HTTPX::PoolTimeoutError or ActionPushNative::ConnectionPoolTimeoutError. I think the problem must be with Solid Queue + httpx together... I'll figure it out eventually, possibly by enabling more verbose logs as @intrip suggested. Thank you!
...just thinking it could be that some intermittent errors are typical, and IIRC the timeouts you see there happened around there same time we were last active in this issue.
@trevorturk I confirm that these intermittent errors are expected. The library does not report them by default unless explicitly configured to do so, and it automatically handles retries. However, since you are using a custom job that does not inherit from NotificationJob, you lose this behavior.
@rosa I hope that the verbose logging helps 🤞.
