How Much Does it Cost?
nelsonic opened this issue · 29 comments
As noted by @LuchoTurtle in #97 (comment)
This "hobby" app is costing us considerably more money than we originally expected.
The most recent invoice on Fly.io was $48.61
for Mar 1 - Apr 1, 2024 https://fly.io/dashboard/dwyl-img-class/billing
The current month (April 2024) Amount Due is already $14.34
and we're only on the 4th!!
If we extrapolate (30 days / 4 days elapsed = 7.5), the total will be 7.5 x ($14.34 - $5) + $5 ≈ $75
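The back-of-envelope projection above can be sketched in a few lines. The assumption (mine, not stated in the bill) is that $5 of the invoice is a fixed monthly charge and the rest scales linearly with days elapsed:

```python
# Sanity check of the cost extrapolation above.
# Assumption: $5 of the bill is a fixed monthly charge and the
# remaining usage-based portion scales linearly over a 30-day month.
days_elapsed = 4
days_in_month = 30
amount_due = 14.34
fixed_charge = 5.00

variable = amount_due - fixed_charge
projected = (days_in_month / days_elapsed) * variable + fixed_charge
print(f"${projected:.2f}")  # $75.05, i.e. roughly $75
```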
This is already more than we spend on our Internet & Phone bill ...
If the cost could be kept to $10/month it would be fine.
Todo
- @LuchoTurtle please have a think about what can be done to reduce or cap the cost.
I'm keen to keep this app available for people to test without having to run it on localhost.
But if the casual visitor is costing us this kind of cash, imagine if this got to the top of HN!
I'm not sure I understand these costs (Fly.io pricing).
- 19760s = 5h29min20s => is this the uptime?
- Every time you start the app, you need to re-upload the 2GB of model data (the volume is pruned with the VM; see the Fly.io docs). This means you recreate a 2GB volume and load 2GB into memory. Is this the meaning of the two highlighted lines?
@ndrean the models are not pruned with the VM every time the app is started (it was a misconception that was fixed in #82). Currently, the models are being saved in the volume and they are not re-downloaded every time it restarts. You can see it in the logs, actually:
2024-04-05T05:04:33.414 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: Salesforce/blip-image-captioning-base
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: openai/whisper-small
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.415 [info] ℹ️ No download needed: sentence-transformers/paraphrase-MiniLM-L6-v2
The problem is that the models take up a fair amount of space (Salesforce/blip-image-captioning-base, especially). So we're basically paying for all the additional space the models take beyond the free tier.
However, the cost is definitely bigger with the RAM usage. There's no way around this, as it's needed to run inference on the uploaded images. Although GPUs are much better at this, their costs are far higher.
@nelsonic unfortunately there's no way around this. There have been people using the application, which is great. But, as with any LLM/ML-based application, it's hard to make it work on any free-tier cloud solution without putting money in.
The app has already been optimized to reduce costs (inbound/outbound data reduced with persistent storage, image file sizes reduced by optimizing them before feeding into the model, and the audio sample rate reduced on the client side before feeding it into the model).
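One of those optimizations, shrinking images before they reach the model, comes down to simple resize math. A minimal sketch (the 224px target is illustrative, not necessarily what this app uses):

```python
def target_size(width: int, height: int, max_side: int = 224) -> tuple:
    """Compute dimensions that fit within max_side on the longest edge,
    preserving aspect ratio; smaller images are left untouched."""
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))

print(target_size(4000, 3000))  # (224, 168)
print(target_size(100, 100))    # unchanged: (100, 100)
```

Downscaling like this cuts both the upload bandwidth and the inference time, since vision models work on fixed, small input sizes anyway.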
Unfortunately, we have to pull the plug. With increased activity, this "hobbyist" project sucks money (even more so when the machines are stopped, as shown in your image).
I'd love to have this online for any person to check it out. But it's not feasible :(
I'm going to shut down the machine now.
I'll keep the database, though. It has the images and index files saved, so we can still keep the uploaded data and have the app running normally by just spawning a new machine whenever we want. The new machine will look for the index file in the database (since it won't have one on its own filesystem), download the models and the index file, and gracefully resume where it stopped :)
@ndrean the documentation is correct. The application's filesystem is wiped whenever the machine restarts. That's why they offer volumes to hold persistent data (data we want to keep between restarts). Currently, the models live inside one of these volumes, hence why they don't need to be re-downloaded.
What #82 did was fix the path of the volume inside Fly.io, which was previously incorrect.
I've deleted the machine. We can spawn a new one whenever we want to.
I'm keeping this issue open for other people to see as a reference of how much it costs to run this on fly.io (without GPU!)
I don't think they differentiate it on the billing page, unfortunately.
According to https://fly.io/docs/about/billing/#machine-billing:
Started Machines are billed per second that they're running (the time they spend in the started state), based on the price of a named CPU/RAM combination, plus the price of any additional RAM you specify.
For example, a Machine described in your dashboard as "shared-1x-cpu@1024MB" is the "shared-cpu-1x" Machine size preset, which comes with 256MB RAM, plus additional RAM (1024MB - 256MB = 768MB). For pricing and available CPU/RAM combinations, see Compute pricing.
So they bill based on the preset, per second it is used, plus any additional RAM we specify. Because the machine wasn't always running, we didn't pay the $124 that you showed in the picture.
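That formula can be sketched as follows. The per-second rates below are hypothetical placeholders, not Fly.io's actual prices (see their Compute pricing page for those):

```python
def machine_cost(seconds_started: float, preset_rate_per_s: float,
                 extra_ram_mb: int, ram_rate_per_gb_s: float) -> float:
    """Bill = seconds spent in the 'started' state x (preset price
    + price of additional RAM beyond the preset's included RAM)."""
    extra_ram_gb = extra_ram_mb / 1024
    return seconds_started * (preset_rate_per_s + extra_ram_gb * ram_rate_per_gb_s)

# "shared-1x-cpu@1024MB": the preset includes 256MB, so 768MB is additional RAM.
# 19760s is the ~5h29m of uptime discussed earlier; the rates are made up.
cost = machine_cost(19_760, preset_rate_per_s=2e-6,
                    extra_ram_mb=1024 - 256, ram_rate_per_gb_s=1e-6)
print(f"${cost:.4f}")
```

The key point is that a stopped machine accrues no compute time here, which is why the actual bill was far below the "always on" figure.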
@LuchoTurtle I didn't want you to DELETE the machine ...
Just wanted it to run more efficiently ...
But if that is going to take too much time, fair enough.
@LuchoTurtle quick question (though probably a rabbit hole…):
- Do you think we could run the Image Classifier App on a server (a custom-built on-prem machine) at home, connected to our main App via a back channel, such that:
A. Person uploads the image to AWS S3
B. This triggers a request to the AI BOX to classify it
C. AI BOX classifies the image and returns its guess
- How much would it cost to build a machine that can do basic inference?
This would mean that our only marginal cost would be electricity and no surprise bills when it gets to the top of HN.
Asking because if we could put together a decent machine for ~€600, including an NVIDIA GeForce RTX 4060 EAGLE with 8GB GDDR6:
https://amzn.eu/d/5fr9N5J
This could serve our needs quite well and we could run other models on it without ever having to worry about boot times etc.
Thoughts?
Though we probably have to spend a decent chunk on the GPU ...
https://www.reddit.com/r/MachineLearning/comments/17x8kup/d_best_value_gpu_for_running_ai_models/
We will use it for a few tasks, so I think it's worth investigating.
Currently, this project targets the CPU (by default, since running on a GPU entails having hardware-specific drivers).
To run on GPUs, I think we only need to change a few environment variables (e.g. XLA_TARGET; see https://github.com/elixir-nx/xla#usage), but further testing may be necessary.
Regarding which GPU to choose, I can't really provide an informed decision. I know VRAM is quite important.
Of course, I'm not expecting you to get an H100; that's way overkill. But the 3090 seems like a good compromise with a solid performance-to-cost ratio.
I'd hold off on purchasing anything yet, though. It needs to be confirmed that inference can be run on the GPU with Bumblebee before making any purchases that could be rather costly :)
Ok. Thanks for your reply. Seems like this will require some further thought. What do we need to do next?
I'd need to check running locally on the GPU to see if it works. Since I have a 1080, which is CUDA, it should theoretically also work with a 3090. I just need to confirm it actually uses the GPU first :)
https://blog.themvp.in/hardware-requirements-for-machine-learning/
Used ("like new"), a 3090 with 24GB VRAM costs ~£650: https://www.ebay.co.uk/itm/176345660501
This is certainly more than we were spending on Fly.io, but if it means we can do more with Machine Learning as a baseline load, I think it's worth it.
My 2¢ input.
If Whisper (Speech-to-Text) is the sink or bottleneck, can a cloud service be considered?
https://elevenlabs.io/docs/introduction seems to offer a WebSocket connection to stream down the response. I did not see pricing figures.
@ndrean your insight is always welcome. ❤️
Yeah, the speech part shouldn't be the bottleneck. And importantly, the purpose of building our own project instead of using an API (or Google Lens) for image classification was to not send personal data to a 3rd party.
We want to be 100% certain that an image we classify is not being used for any other purpose. Same goes for voice recordings. Ref: dwyl/video#91
While I might be OK with making a recording of my voice public, I know people who wouldn't do it because they are much more privacy conscious.
Fair point. If you open your machine and offer such a service, how do you guarantee the user's privacy? I mean, you store images on S3 - publicly available - and run a local database.
What is your architecture? Would the HTTPS termination be a reverse proxy, so that any app served by your machine is routed to as a subdomain?
Also, is a simple declaration of intention enough? Something like "we don't store your data nor transmit them to any external service of any kind"?
It really depends on whether we want to make the service public or just for people using our App. People using our App know we aren't using their images for "training" and also won't leak them. But we don't have advanced access controls on images yet, beyond restricting access to just the person that uploaded them.
Ideally, once we have the "groups" feature, it will be easy to restrict access.
But if we were running the service as a general-purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying.
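That flow (write to a temp path, classify, delete no matter what) can be sketched like this; `classify` is a hypothetical stand-in for the real model call:

```python
import os
import tempfile

def classify(path: str) -> str:
    """Hypothetical stand-in for the real inference call."""
    return "a placeholder caption"

def classify_ephemeral(image_bytes: bytes) -> str:
    """Persist the upload only as a temp file, and delete it even if
    classification raises, so no user image outlives the request."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(image_bytes)
        return classify(path)
    finally:
        os.remove(path)  # always runs: nothing is left on disk

caption = classify_ephemeral(b"\xff\xd8 fake image bytes")
print(caption)  # a placeholder caption
```

The `finally` block is the privacy guarantee: the image is gone whether inference succeeds or fails.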
If you use the app as such, all images are saved together in your public bucket, and the corresponding URL is saved in a database, meaning a semantic search can deliver any image approximating your query, yours or not.
A simple login and the addition of the user_id to the image schema could prevent images from becoming publicly available, at least through the semantic search. But when you receive a response, you receive a URL to display the image. Doesn't the URL expose the bucket's origin name? Since the bucket is public, can't you exploit this?
But if we were running the service as a general purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying.
Instead of an S3 URL, you use a path on the Filesystem. If you erase the uploaded paths after a search, you can run a search only once.
Predictably, someone has already set up a business around renting GPU time: https://www.gpudeploy.com/connect
Via: https://news.ycombinator.com/item?id=40260259
@LuchoTurtle If you're not applying for "AI Jobs" and needing to showcase this classifier App, please scale it down and ensure that we do not get another surprise bill like this.
Why on earth are we using this many resources for a non-production DB?!
https://fly.io/dashboard/dwyl-img-class
@nelsonic I have not made any modifications to the app (you can check the activity logs on fly.io), except merging dependency updates. The last time I interacted with the CI to deploy to fly.io was to comment it out and delete all instances of any running app, leaving only the database to hold data for historical purposes.
Checking the logs of the db machine shows no activity spike that would justify the price increase in the last month. The Usage tab says the same thing.
Should I stop these machines? https://fly.io/apps/imgai-db/machines
Maybe they're the culprit. I'm stopping them, just in case.
The DB is now stopped.
There are no active applications within the organization, so billing should have stopped from now on.
But if we are just holding the data in the DB, does it need to have 4GB RAM?
Probably not, but I haven't made any modifications to the db application. If you check https://github.com/dwyl/image-classifier/blob/main/deployment.md, I've only changed the Elixir applications to upgrade them; the Postgres applications were deployed with the default settings.
My fear is that maybe we're incurring costs from the volumes?
https://fly.io/docs/about/billing/#volume-billing
But we're only using 7GB at most, across three different volumes, so they should fall under the Free Allowance ...?
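The provisioned-vs-used distinction matters here. A sketch with placeholder numbers (the rate and the free allowance below are assumptions, not Fly.io's published figures):

```python
def volume_cost(provisioned_gb: float, free_gb: float,
                rate_per_gb_month: float) -> float:
    """Volumes are billed on provisioned capacity, not bytes used;
    only capacity beyond the free allowance is charged."""
    billable_gb = max(0.0, provisioned_gb - free_gb)
    return billable_gb * rate_per_gb_month

# 7GB provisioned across three volumes, hypothetical 3GB free allowance:
print(round(volume_cost(7, free_gb=3, rate_per_gb_month=0.15), 2))  # 0.6
# Fully inside the allowance costs nothing:
print(volume_cost(2, free_gb=3, rate_per_gb_month=0.15))  # 0.0
```

Note that a 40GB volume holding 180MB of data is billed for the full 40GB, which is what the later migration to a smaller volume addresses.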
Indeed: "haven't made any modifications". To be clear: the June bill was the same!
So the change that was made in May simply wasn't enough for us to avoid these silly high bills!
It's just that in June I was so focussed on the work-work trip that I didn't even notice the Fly.io bill!
But it's there and we're spending more money on an App that nobody is using than all our other apps combined!
Please just scale down the RAM of the DB; we don't need to be spending $74/month for storing "historical data".
And in future please don't unassign yourself from an issue until it's resolved; that's not what responsible adults do! "Not My Problem" is a classic unaccountable child reaction and would 100% get you fired in most serious organisations, because it's the attitude that matters! You scaled up the DB to make this work - that's fine! - but you're responsible for scaling it back down so I don't have this silly bill appearing on my credit card!
Part of being a "Senior" engineer is taking responsibility for your actions. Like spending the company's money.
I don't have $900/year to spend on a "Demo" app that nobody is using.
Please just fix it by scaling down the RAM to the bare minimum so I don't see this again.
I didn't ask you to DELETE the VMs for this App; in fact, that's the exact opposite of what you should have done.
You should have invested the time to write the Ops code to automate the scaling so that it still works!
Not having the App instances but keeping the DB is the worst of all worlds because we are still paying but have nothing to show for it!
Ideally you should have proactively written a few lines of code to count how many times the App gets booted/invoked so that we can cap it at a LIMIT and show a "sorry we are overloaded" page with samples/history but no expensive instances, so that we don't incur a bill of thousands for being top of HackerNews #22
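The suggested cap could look something like the counter below. This is illustrative only; as noted further down the thread, provisioned compute would still be billed even while serving the fallback page:

```python
class RequestCap:
    """Count invocations and, past the limit, signal the app to serve a
    cheap 'sorry, we are overloaded' page instead of running inference."""
    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count >= self.limit:
            return False  # over the cap: show the overloaded page
        self.count += 1
        return True

cap = RequestCap(limit=2)
print([cap.allow() for _ in range(3)])  # [True, True, False]
```

In a real deployment the counter would have to live somewhere persistent (e.g. the database) and reset on a schedule, since VM filesystems are wiped on restart.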
Apologies if this comes across as "harsh", but most people in the world don't have $900/year sitting around that they can burn on hosting a "demo" AI app ... (maybe VC-funded startups do, I certainly don't!)
The amount of time it would take to respond to this properly is definitely not worth it given my backlog of tasks. I'll let the comments about optimizations on this topic, the issues, commits, READMEs, and related questions (e.g. https://community.fly.io/t/build-failed-with-elixir-bumblebee/15779/4) speak for themselves :)
- All Bumblebee Elixir apps need to load models on startup. The list of optimizations that were made to reduce cold startups and processing times is thoroughly documented. This is why one needs more computing power to initially load the model.
- It is impossible to run the desired image-classifier model without more processing power in VMs on fly.io. Scaling was already automated to minimize costs, given that billing on these machines works on a "time usage" basis. Each machine was turned off after an hour of inactivity (the smallest number fly.io allows). Startup was optimized with volumes to reduce bandwidth and avoid re-downloading models. Again, all of this is documented. The VMs were deleted because, even with these optimizations to reduce bandwidth (and, in consequence, costs), the costs were still too high. Since I didn't have the time to dedicate to this topic, deleting the VMs (even if temporary) was the quickest solution to avoid unnecessary costs.
- "Ideally you should have proactively written a few lines of code to count how many times the App gets booted/invoked so that we can cap it at a LIMIT and show a 'sorry we are overloaded' page" - unfortunately, even though this was considered before, it is not possible given how billing works on fly.io. Even if we implemented that process, you'd still need App VMs running to display that warning, and you'd be billed the same, regardless of whether the app is throttled and making no requests to the DB. Ultimately, 99% of the cost stems from provisioned computing power, regardless of usage. Because it's impossible to run the classifying model on a free-tier machine, this becomes a non-starter for free apps.
I digress.
To reduce costs:
- I initially (and erroneously) thought that only the volume space actually used was priced. They price on provisioned capacity after all. So I've taken the necessary steps to move the data from the Postgres volume that has 40GB to one that has 3GB.
This volume has 2GB but only around 180MB of data from Postgres. Most of the volume is used to persist the model, which is loaded when an instance boots up and finds no model.
- I tried following https://community.fly.io/t/resize-database-volume-size/16222 to migrate the volume without having to `fly sftp` into it. However, this didn't work: I got a `failed to resolve member over dns: unable to resolve cloneable member` error when trying to `fly clone` the machine to import from one volume to the other (fly.io doesn't allow me to shrink a volume, only increase it, so I'm forced to transfer the files manually - https://community.fly.io/t/fly-clone-machine-fails-after-barman-postgres-recovery/19208/2 and https://community.fly.io/t/sftp-put-overwrite-existing-folder/10384/2).
- Because this didn't work, I had to create a new `postgres` cluster by forking the pre-existing one and downscaling its machines. I then used `fly sftp` on the volume, fetched the `data/` directory to my local host, put it into a smaller 3GB volume, and attached that to a new, single-node instance on the free tier. The new 3GB volume only uses around 300MB, because the model is not loaded; the Postgres data is still there and saved.
With all of the above in mind, I initially shut down the machines because that's where the higher cost was. The steps above addressed the extra computing power and the extra provisioned volume. We now have a single node with 256MB of RAM and a 3GB volume -> https://fly.io/apps/imgai-db/machines.
Now there should be no charge per month.
And the winners are the cloud service providers and, more recently, the GPU companies.