Consider avoiding apnotic/net-http2 dependency

Question

Consider avoiding apnotic/net-http2 dependency

Closed this issue a month ago · 28 comments

Hello!

Exciting to see this gem, I was looking for something built into Rails a few months ago and I'm so happy to see this!

I just wanted to drop a suggestion to consider what apnotic is buying you, because FWIW I found it was (in combination with the net-http2 dependency) increasing my app's memory use (in a memory constrained environment on Heroku).

Here's what I ended up with using Async::HTTP in case it's useful to you or others (please excuse the application-specific code!)

require "async/http/internet/instance"

class ApnsToken < ActiveRecord::Base
  TOPIC = "ios-app-name-goes-here"

  validates :env, :token, :topic, presence: true
  validates :token, uniqueness: { scope: [:env, :topic] }
  validates :topic, inclusion: { in: [TOPIC] }

  def test
    push(:alert, { alert: "Hello!", sound: "default" })
  end

  def ping
    push(:background, { "content-available": 1 })
  end

  private

  def push(type, payload)
    host = case env
      when "development" then "https://api.sandbox.push.apple.com"
      when "production"  then "https://api.push.apple.com"
    end

    url = "#{host}/3/device/#{token}"

    headers = {
      "authorization" => "bearer #{jwt}",
      "apns-topic" => topic,
      "apns-push-type" => type,
      "content-type" => "application/json"
    }

    body = JSON.generate({ aps: payload })

    Sync do |task|
      response = nil
      result = nil

      task.with_timeout(2.seconds) do
        response = Async::HTTP::Internet.instance.post(url, headers, body)

        result = case response.status
          when 200
            true
          when 400
            destroy if JSON.parse(response.read).dig("reason") == "BadDeviceToken"
            false
          when 410
            destroy
            false
        else
          false
        end
      end

      result
    ensure
      response&.finish
    end
  end

  def jwt
    @_jwt ||= begin
      payload = {
        iss: Rails.application.credentials.services.apns.team_id,
        iat: Time.now.to_i
      }

      JWT.encode(
        payload,
        OpenSSL::PKey::EC.new(Rails.application.credentials.services.apns.private_key),
        "ES256",
        kid: Rails.application.credentials.services.apns.key_id
      )
    end
  end
end

In any case please feel free to close as not planned etc, but I figured it was worth sending a note.

Thanks!

Answer 1 · 2025-08-25T07:45:01.000Z

Hey @trevorturk! Thanks a lot for this. Dropping apnotic is very much in our plans! We used it to get to something that worked quickly, but we definitely want to get rid of it. This is super useful! 🤩 Huge thanks!

Answer 2 · 2025-09-16T15:07:58.000Z

I’m starting the work to remove the apnotic dependency. The plan is to replace it and use httpx as HTTP/2 client.

I’m picking httpx because:

It relies on the same underlying HTTP/2 library of apnotic, which is the battle-tested Ruby implementation of the HTTP/2 protocol adopted by multiple clients.
Although it’s a relatively large library, it’s still smaller than pulling in the entire async ecosystem via async-http.
async-http brings its Fiber scheduler, which is great for handling many concurrent connections, but right now we process each notification in a separate job, so we wouldn’t benefit.
- With httpx, we can still opt in to the Fiber scheduler. That means if we implement multiplexing later, switching should be straightforward.
It supports keep-alive connections for HTTP/1, which we can use in the FCM service.
It has a comprehensive wiki.

@trevorturk do you remember any details about the app memory issue you ran into with apnotic? I want to make sure the library will also be usable by relatively low-memory clients.

Answer 3 · 2025-09-16T16:27:30.000Z

Howdy, @intrip, thanks for thinking of me! 😉

I'm sorry, but I didn't look into the memory issues much, tbh, it was very clearly because of adding the apnotic gem, and I noticed the HTTP/2 lib was included, but I shouldn't point fingers without having investigated. I moved into triage mode and realized it wouldn't be much work to swap out for async-http since I was already using it throughout my app, already bought into the async ecosystem, etc etc.

I think httpx looks like a good choice, as it seems to have wonderful support and is becoming more broadly adopted. I've been following along as an old coworker has been working through some issues (see HoneyryderChuck/httpx#100) and it's been great to (you can tell the author of the gem cares deeply and is obvious very skilled). I would say it's a bit of a concern that there have been these difficult to track down bugs, but I'm confident things will get solved and it seems like httpx has a bright future.

FWIW I'd be more than happy to test even a rough implementation and report back about memory usage etc if helpful? It's very easy for me to swap things in and out, and I don't mind if things are rough for the particular feature I'm using APNS for.

Apologies again for not having more details in any case!

Answer 4 · 2025-09-17T01:06:23.000Z

@intrip Think we should use httpx in Action Push Web as well? It's currently using net-http-persistent but seems like we could make them consistent.

Answer 5 · 2025-09-17T06:37:59.000Z

I think httpx looks like a good choice, as it seems to have wonderful support and is becoming more broadly adopted. I've been following along as an old coworker has been working through some issues (see HoneyryderChuck/httpx#100) and it's been great to (you can tell the author of the gem cares deeply and is obvious very skilled). I would say it's a bit of a concern that there have been these difficult to track down bugs, but I'm confident things will get solved and it seems like httpx has a bright future.

Right! We'll see how it turns out: We are still in the experimenting phase, time will tell.

FWIW I'd be more than happy to test even a rough implementation and report back about memory usage etc if helpful? It's very easy for me to swap things in and out, and I don't mind if things are rough for the particular feature I'm using APNS for.

That would be great! I'll keep you posted.

Answer 6 · 2025-09-17T06:41:38.000Z

@intrip Think we should use httpx in Action Push Web as well? It's currently using net-http-persistent but seems like we could make them consistent.

I’m still experimenting with httpx, so it’s a bit early to decide, but I’d like the consistency too.

Answer 7 · 2025-09-23T09:44:36.000Z

@trevorturk I just merged #34 to switch away from Apnotic. We are already using it in Basecamp without any performance regression.

Can you see how it performs in your constrained environment?

Answer 8 · 2025-09-23T17:10:41.000Z

Ok, will test it out (today or tomorrow) and report back ASAP!

Answer 9 · 2025-09-23T20:22:11.000Z

@intrip I reviewed the README in detail and I think I see a clear enough migration path (thanks to the custom Device model support) except for a a couple things I'm not sure how to handle. In my iOS/Rails application, I have support for APNS production and development servers in the same config/table, so I can test in Xcode or TestFlight or App Store more easily. I also allow the topic to be set dynamically, so we essentially end up with a valid device token that comes along with the app (topic) and environment in which it was created.

I'm happy to share more code samples, but this message was getting a bit long, so here's the basics in pseudo-code:

class AppDelegate: UNUserNotificationCenterDelegate {
    static var topic: String {
        "example-app-id"
    }

    static var env: String {
        #if DEBUG
            return "development"
        #else
            return "production"
        #endif
    }

    func application(
        _ application: UIApplication,
        didRegisterForRemoteNotificationsWithDeviceToken
        deviceToken: Data
    ) {

        localStore.pushTokenTopic = topic
        localStore.pushTokenEnv = env
        localStore.pushToken = deviceToken

        railsApp.post(...)
    }
}

Then, on the Rails side, we have the following in the db table:

  create_table "apns_tokens" do |t|
    t.string "topic", null: false
    t.string "env", null: false
    t.string "token", null: false
end

...and you can see how I dynamically handle in the code example I posted earlier.

So, I'm sorry I didn't get a chance to test yet, and no worries at all if this all seems out of scope. I thought it might be interesting to see another use case like mine and to consider if we should bake in the flexibility as I'm doing here, but I'm not sure if my use case really squares with the typical Rails app, which wants a device to be associated with a user in a specific environment etc.

I'm happy to chat more about this, or maybe I can figure out how to adapt to your config options? I'm a bit confused about how I would use the "multiple apps" config, and I see the connect_to_development_server option... perhaps there's some way I'm not quite seeing where I could combine these two configs in order to reproduce my existing setup...? (I'm giving this some more thought now and will report back as well... perhaps using the custom device model, I could dynamically switch between "apps" which would have different "connect_to_development_server" configs...)

Answer 10 · 2025-09-24T06:58:47.000Z

@trevorturk The current Device model doesn’t support the concept of “environment” since it’s shared with the google service, where that concept doesn’t exist. However, you should be able to achieve what you need by using multiple applications. You can define shared config under the applications key and then override values per application, like this:

push.yml

shared:
  apple:
    # shared values for all the applications
    application:
      key_id: <%= Rails.application.credentials.dig(:action_push_native, :apns, :calendar, :key_id) %>
      encryption_key: <%= Rails.application.credentials.dig(:action_push_native, :apns, :calendar, :encryption_key) %>

    development:
      topic: development.bundle.identifier
      connect_to_development_server: true

    production:
      topic: production.bundle.identifier

Then create a customizable Push notification class:

  class PushNotification < ApplicationPushNotification
    # allow to override the application at runtime, if needed
    attr_accessor :application       
  end

No data migration is needed, you can keep using your ApnsToken model as explained in the "Custom device model section" like this:

class ApnsToken < ApplicationRecord
  ...

  def push(notification)
    notification.token = token
    notification.application = env
    # Optional: override also the topic at runtime if needed, you still need to set a shared value in the config to avoid a runtime error.
    # notification.apple_data.merge!("apns-topic": topic)
    ActionPushNative.service_for(platform, notification).push(notification)
  rescue ActionPushNative::TokenError => error
    destroy!
  end
end

# deliver the notification
PushNotification.with_apple(...).new(...).deliver_later_to(apns_token)

Does that work for you?

Answer 11 · 2025-09-24T14:48:25.000Z

Yes, that's what I was thinking of, but (separate issue) I wonder if these flexibility concepts might be better modeled explicitly by expanding the number of db columns.

In any case, I should have time to give this a try. Perhaps if all goes well we can consider documenting somehow (but the README is already getting long in the tooth!)

I'll report back ASAP.

Answer 12 · 2025-09-25T01:10:32.000Z

Ok! I'm going to let this run overnight and report back if anything crazy happens, but all looks well so far in my memory-constrained environment 🎉

For reference, my app needed the following changes/setup:

# adding ApplicationRecord, mentioned in issue #45
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

# config/push.yml, note `.dump` as mentioned in issue #33 
shared:
  apple:
    application:
      team_id: <%= Rails.application.credentials.services.apns.team_id %>
      key_id: <%= Rails.application.credentials.services.apns.key_id %>
      encryption_key: <%= Rails.application.credentials.services.apns.private_key.dump %>
      topic: <%= ApnsToken::TOPIC %>

    development:
      connect_to_development_server: true

    production:
      connect_to_development_server: false

# adjusting my existing device model as discussed
class ApnsToken < ActiveRecord::Base
  def test
    push ActionPushNative::Notification.new(title: "Hello!", sound: "default")
  end

  def ping
    push ActionPushNative::Notification.silent.new
  end

  private

  def push(notification)
    notification.application = env
    notification.token = token
    ActionPushNative.service_for(:apple, notification).push(notification)
    true
  rescue ActionPushNative::TokenError, ActionPushNative::BadDeviceTopicError
    destroy
    false
  end
end

I do think it's worth considering adding the topic and/or env in the db, but I'll resist opening an issue about that since we were able to figure out a way to support my existing use-case and schema (thank you!) -- it might be worth documenting, but I'll leave that for your consideration.

Separately, I did notice that at least 10 new gems were added to my Gemfile (all to do with googleauth) so I'm curious if there are plans to try to reduce the dependencies there as well. Those dependencies include faraday, net-http, multi_json , etc so I'm guessing this may already be on your list...? 😬

Feel free to close this and the other issues I created without much comment, but do let me know if you'd like additional feedback or if I can be helpful in any way! Thanks again!

Answer 13 · 2025-09-25T07:46:25.000Z

Ok! I'm going to let this run overnight and report back if anything crazy happens, but all looks well so far in my memory-constrained environment 🎉

Thanks a lot @trevorturk! I really appreciate your support!

it might be worth documenting, but I'll leave that for your consideration.

What do you think we should add to the custom device model section to make it more accurate? Feel free to make a PR if you like.

Separately, I did notice that at least 10 new gems were added to my Gemfile (all to do with googleauth) so I'm curious if there are plans to try to reduce the dependencies there as well. Those dependencies include faraday, net-http, multi_json , etc so I'm guessing this may already be on your list...? 😬

I just removed the "net-http" dependency but the others you mentioned come from googleauth, which is required for authenticating with Google FCM, so we’ll need to keep those for now.

Answer 14 · 2025-09-25T08:44:59.000Z

@trevorturk side note: I just released v0.2.0 with all the latest improvements for both Apple and Google integrations.

Answer 15 · 2025-09-25T10:46:53.000Z

@intrip Think we should use httpx in Action Push Web as well? It's currently using net-http-persistent but seems like we could make them consistent.

@excid3 FYI: We're now relying on HTTPX for both APNs and FCM. Tested in Basecamp and HEY, and It's working well.

Example usage

Answer 16 · 2025-09-25T13:55:12.000Z

Excellent, thanks @intrip! I will switch to that for Web. 👍

Answer 17 · 2025-09-25T14:43:08.000Z

I'll have a look at the README and see if it might make sense to document how to have dev/prod support in the same table somehow. It's probably a niche thing, so I don't know if it's worth the space in the README, but I found it nice to not have to worry about testing against different servers in dev/prod (Xcode / App Store).

Sounds good re: googleauth but I'd say it might be worth a spelunk someday (waves hands) to try to simplify as you just did with apnotic. I'm always a bit wary of Gemfile creep if you know what I mean...

Thx for the v0.2.0 release!

Also an update after running for ~12hrs, HTTPX looks absolutely great in terms of memory use, and it seems very well supported, becoming more widely adopted, so I'm +1 for considering that as a new best-first-choice.

Answer 18 · 2025-10-04T09:15:38.000Z

Just a small heads up, @trevorturk, @excid3: we ran into a small problem running HTTPX together with Solid Queue in HEY. We were running the jobs in a worker with 3 threads, and this was causing significant CPU spikes in the servers where we were running this. We were also getting frequent HTTPX::PoolTimeoutError, even though the pool had a size of 10 and we were only running 3 threads per process (so there should have been enough connections for the 3 threads always, without having to wait).

Then, we noticed that some jobs running in that same worker (other jobs, not push notification jobs) would somehow end up with all queries taking between 400ms to 800ms longer than they should, as if all queries (to different DBs) had a fixed overhead of that time. We could only find the CPU spikes that correlated with that, and all started when we switched to HTTPX.

Yesterday, we moved the jobs to run in a Solid Queue worker with a single thread, and everything has been working fine since. I'm not sure yet if this is an HTTPX problem, a Solid Queue problem or a combination of the two. I need to investigate more, but I wanted to mention it.

Answer 19 · 2025-10-04T15:22:12.000Z

That's interesting, thank you for sharing! FWIW on my end, I found job times decreased when I switched over to this gem, but I'm using Sidekiq. I'm planning to switch to Solid everything including Solid Queue, so please do let me know if it might be helpful if I did some testing. TBH this seems like a tricky one, though, so I'm not sure how much I can do, but I'm here to help if possible!

Also, apologies for littering this issue, but I'm now remembering my Solid-everything work was blocked by rails/solid_cache#275 if you happen to know the author... 😉

Answer 20 · 2025-10-04T17:30:19.000Z

Ohhh, @trevorturk, with Sidekiq, how are you running these jobs? Are you using multiple threads? How many? And are you setting any value for connection_pool_size? It'd be great to know these details to compare with our setup. In Basecamp we're using this with Resque so we don't have an option to run these in multiple threads to compare.

About the Solid Cache issue, I'll try to move it! 🙌

Answer 21 · 2025-10-05T21:47:31.000Z

I'm not sure how much detail you want exactly, so feel free to ping me back. I'll try to over-share anyway to start....

I'm not setting any connection_pool_size that I can see. I'm running a single Sidekiq instance on Heroku with the default concurrency: 5. (Maybe that's the pool size?) I'm using default configs generally, I think.

I have 1 recurring job: ApnsTokenPingEnqueueJob which enqueues the actual ping jobs:

class ApnsTokenPingEnqueueJob < ApplicationJob
  def perform
    ApnsToken.find_each do |token|
      fanout = rand(5.minutes) + 1.second
      ApnsTokenPingJob.set(wait: fanout.seconds).perform_later(token)
    end
  end
en

class ApnsTokenPingJob < ApplicationJob
  def perform(token)
    token.ping
  end
end

class ApnsToken < ActiveRecord::Base
  def ping
    push ActionPushNative::Notification.silent.new
  end

  private

  def push(notification)
    notification.application = env
    notification.token = token
    ActionPushNative.service_for(:apple, notification).push(notification)
    true
  rescue ActionPushNative::TokenError, ActionPushNative::BadDeviceTopicError
    destroy
    false
  end
end

# Procfile
web: bundle exec falcon host
worker: bundle exec sidekiq

# config/initializers/sidekiq.rb
SIDEKIQ_REDIS_CONFIGURATION = {
  url: ENV["REDIS_SIDEKIQ_URL"],
  ssl_params: Rails.configuration.redis_ssl_params,
}

Sidekiq.configure_server do |config|
  config.redis = SIDEKIQ_REDIS_CONFIGURATION
  config.logger.level = Logger::WARN
end

Sidekiq.configure_client do |config|
  config.redis = SIDEKIQ_REDIS_CONFIGURATION
  config.logger.level = Logger::WARN
end

Answer 22 · 2025-10-06T06:09:11.000Z

This is super helpful, thank you! 🙏

I'm not setting any connection_pool_size that I can see. I'm running a single Sidekiq instance on Heroku with the default concurrency: 5. (Maybe that's the pool size?) I'm using default configs generally, I think.

Ahh, then the pool size you're using is also 5, it's set here.

With a setup similar to what you have with Sidekiq, we got frequent HTTPX::PoolTimeoutError errors... I wonder if volume makes a difference 🤔 Do you run other jobs together with the ApnsTokenPingJob, or do you have a dedicated Sidekiq worker for them? I imagine not, looking at your configuration, but asking just in case.

Answer 23 · 2025-10-06T07:15:30.000Z

Damn I'm so sorry that we are still having issues @rosa!

I wonder if volume makes a difference 🤔

Agreed, it’s probably related to higher volume.

I wonder if enabling verbose logging HTTPX_DEBUG=1 HTTPX_DEBUG_REDACT=1 could reveal more details to help with troubleshooting.

Answer 24 · 2025-10-06T14:53:26.000Z

Yeah, we have a ton of very small jobs emitting stats to Datadog etc, so we're high volume generally, but very low volume in terms of pings. I'm seeing connection timeouts and too many request errors, but we let them retry and we're basically stable. I agree it's probably related to your higher volume. Maybe there's some way to rotate or flush the pool after some amount of time being connected? Sorry I can't be of much help, but please do keep us apprised, I'd love to follow along!

Answer 25 · 2025-10-14T15:04:35.000Z

Thanks a lot, @trevorturk! 🙏

I'm seeing connection timeouts and too many request errors

Are these HTTPX::PoolTimeoutError errors? Or another kind of error?

Answer 26 · 2025-10-14T16:31:35.000Z

Probably not super helpful, but this is the 14 day view of my APNS related errors in Sentry:

...just thinking it could be that some intermittent errors are typical, and IIRC the timeouts you see there happened around there same time we were last active in this issue.

Answer 27 · 2025-10-15T15:18:33.000Z

Oh, no, that is super helpful. So, no HTTPX::PoolTimeoutError or ActionPushNative::ConnectionPoolTimeoutError. I think the problem must be with Solid Queue + httpx together... I'll figure it out eventually, possibly by enabling more verbose logs as @intrip suggested. Thank you!

Answer 28 · 2025-10-16T07:39:45.000Z

...just thinking it could be that some intermittent errors are typical, and IIRC the timeouts you see there happened around there same time we were last active in this issue.

@trevorturk I confirm that these intermittent errors are expected. The library does not report them by default unless explicitly configured to do so, and it automatically handles retries. However, since you are using a custom job that does not inherit from NotificationJob, you lose this behavior.

@rosa I hope that the verbose logging helps 🤞.