JSON API – implementation details
sergey-alekseev opened this issue · 11 comments
How would you generate and store the documentation bits? Are the ri files generated from the sdoc files? Currently the HTML docs are generated on the server and then synced to S3.
The generation flow:
- Generate ri files with
gem rdoc --ri GEM_NAME
These files will look like this:

- Generate markdown files (they are close to plain text) with
ri -f markdown Array >> Array.md
I plan to store the files in the same file structure that ri uses for .ri files:
one directory per class/module name, containing files named after the methods of that class/module.


For Ruby 2.1.2 these files will be stored along with the sdoc files in the ruby-2-1-2/markdown/ directory, and will be synced to S3 as well.
What would the API for receiving the docs look like?
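To make the proposed layout concrete, here is a minimal sketch of how a query such as "Array#map" could be mapped to a relative markdown path. The helper name `markdown_path` is hypothetical, and the `-c` suffix for class methods is an assumption (only the `-i` form appears in this thread). Operator methods such as `<=>` would need escaping, which this sketch does not handle.

```ruby
# Hypothetical helper: maps "Class#method" or "Class.method" to a relative
# path that mirrors the ri-style layout described above.
def markdown_path(query)
  match = query.match(/\A(?<klass>[A-Z][\w:]*)(?<sep>[#.])(?<meth>.+)\z/)
  raise ArgumentError, "expected Class#method or Class.method, got #{query.inspect}" unless match

  # Namespaces become nested directories: ActiveRecord::Base -> ActiveRecord/Base
  dir = match[:klass].split("::").join("/")

  # "-i" for instance methods; "-c" for class methods is an assumption.
  suffix = match[:sep] == "#" ? "i" : "c"

  File.join(dir, "#{match[:meth]}-#{suffix}.md")
end

puts markdown_path("Array#map")
puts markdown_path("ActiveRecord::Base.find")
```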
Let's say http://api.rubydocs.org/search?q=QUERY&project=PROJECT&version=VERSION.
Where:
QUERY is required and looks like "Array#map", "Array", "map", etc.
PROJECT isn't required and looks like "Ruby", "Rails", etc.
VERSION isn't required and looks like "1.9.3-p545", "4.1.1", etc.
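A rough sketch of how the three parameters could be read from such a URL, using only Ruby's standard `uri` library. The method name `parse_search_params` and the returned hash shape are assumptions for illustration; in a Rails controller the same values would simply come from `params`.

```ruby
require "uri"

# Hypothetical helper: extracts q, project and version from a search URL.
# q is required; project and version are optional and may be nil.
def parse_search_params(url)
  pairs = URI.decode_www_form(URI.parse(url).query.to_s).to_h
  query = pairs["q"]
  raise ArgumentError, "q is required" if query.nil? || query.empty?

  { q: query, project: pairs["project"], version: pairs["version"] }
end

p parse_search_params("http://api.rubydocs.org/search?q=Array%23map&project=ruby&version=2.1.2")
```

Note that `#` has to be sent percent-encoded as `%23`, otherwise everything after it is treated as a URL fragment and never reaches the server.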
How/why would you need to retrieve the generated sdoc files?
As you wrote: So http://docs.rubydocs.org/ruby-2-1-2/ simply points to http://rubydocs.s3-website-us-east-1.amazonaws.com/ruby-2-1-2/. The "docs" subdomain points to a MaxCDN host which fetches the files from the S3 static website.
So, when you request http://api.rubydocs.org/search?q=Array%23map&project=ruby&version=2.1.2, the API will respond with JSON like {project: 'Ruby', version: '2.1.2', content: 'CONTENT'}. CONTENT will come from http://rubydocs.s3-website-us-east-1.amazonaws.com/ruby-2-1-2/markdown/Array/map-i.md.
@manuelmeurer what do you think of this?
Let me know if you need more details before I start working on it and create a PR.
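As a sketch of that flow, the snippet below builds the S3 URL for a markdown file and the JSON body of the response. The helper names are hypothetical, and the slug rule (lowercase project, dots in the version replaced by dashes) is an assumption inferred from the `ruby-2-1-2` example. Fetching the file over HTTP is left out.

```ruby
require "json"

S3_BASE = "http://rubydocs.s3-website-us-east-1.amazonaws.com"

# Assumed slug rule: ("ruby", "2.1.2") -> "ruby-2-1-2"
def doc_slug(project, version)
  "#{project.downcase}-#{version.tr('.', '-')}"
end

# Full URL of a markdown file, given its path relative to the markdown/ directory.
def markdown_url(project, version, relative_path)
  "#{S3_BASE}/#{doc_slug(project, version)}/markdown/#{relative_path}"
end

# JSON body in the shape proposed above.
def search_response(project, version, content)
  JSON.generate(project: project, version: version, content: content)
end

puts markdown_url("ruby", "2.1.2", "Array/map-i.md")
puts search_response("Ruby", "2.1.2", "CONTENT")
```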
Why would you store the generated ri files on S3 as well?
For the docs right now it's a big win since docs.rubydocs.org points to MaxCDN which points to S3, so the HTML is directly served from there and the rubydocs.org web/app server doesn't need to handle it.
But the API requests would need to be handled by Rails and then Rails would have to go and get the ri docs back from S3? S3 would be like a (pretty slow) database in that case.
Why not store the ri docs in the database and cache with Redis or Memcached?
I've been thinking of storing the ri docs in the database and caching them with Redis. However, the ri documents are almost the same size as the HTML docs: 63 MB for Ruby 2.0.0-p353, see below:

I figured it would be expensive to store all documentation files for more than 200 versions of Rails and more than 250 versions of Ruby. I also had in mind your intention to store other gems' docs.
So I'm thinking about storing only the keys in a database on the DO server. The documentation files would be fetched from AWS S3 (e.g. http://docs.rubydocs.org/ruby-2-1-2/markdown/Array/map-i.md) during each request. I know this is one more request and it will slow down the whole process. However, I hope each request to the JSON API can be completed in less than one second. I also hope there is an AWS S3 region reasonably close to the DO server, which would reduce network latency and improve overall speed.
Let me know your thoughts on it.
Yes, it could potentially grow to be a lot of data. Still, I don't think S3 is the right tool for this. For one, it doesn't offer any kind of search. What if the API gets a query for "Arra"? We couldn't search the generated ri docs on S3 for any keys matching "Arra", so we would need to build an index in our app database that matches search terms to S3 keys. Why not just save the doc fragments in the db as well then?
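To illustrate the index being described, here is a minimal in-memory sketch that maps search terms to storage keys and supports prefix lookups such as "Arra". The class name `DocIndex` and the key format are hypothetical; a real version would live in the app database rather than a Ruby hash.

```ruby
# Hypothetical in-memory index: search term => storage key.
class DocIndex
  def initialize
    @entries = {}
  end

  def add(term, key)
    @entries[term] = key
  end

  # Returns all entries whose term starts with the given prefix (case-sensitive).
  def search(prefix)
    @entries.select { |term, _key| term.start_with?(prefix) }
  end
end

index = DocIndex.new
index.add("Array", "ruby-2-1-2/markdown/Array")
index.add("Array#map", "ruby-2-1-2/markdown/Array/map-i.md")
index.add("Hash", "ruby-2-1-2/markdown/Hash")

p index.search("Arra").keys
```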
How about using something like AWS DynamoDB? It seems to have very competitive pricing (http://aws.amazon.com/dynamodb/pricing/) and supports indexes/searching (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html).
Manuel, sure, it would obviously be better to use a database for storing all the data. If you or sponsors can afford it, then we'd better use AWS DynamoDB. DynamoDB is a good choice IMO.
Well, I'm currently paying for the S3 storage myself, but it's not a lot (< $10 per month). I would also pay for DynamoDB, and I don't think it would be a lot initially either. In their pricing example they write "For less than $0.25/day ($7.50/month), you could support an application that performs 1 million writes and reads per day and stores 3 GB of data." If it adds up to more in the future, I could try asking them to sponsor the site.
Do you want to start working on the doc generation and storage in DynamoDB? I can set up an account then.
First we should think about the data model for the DynamoDB table: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html
Great! I want to start working on the doc generation and storage in DynamoDB. I will describe the DB schema here soon.
Any progress? 😸
Manuel, I definitely plan to implement this feature. However, I will be travelling in a few days and can't really give you a schedule for this right now.
Alright, no worries. Looking forward to what you come up with!