# RPackageParser

R package that uses the pkgdown package to parse R package documentation and pass it on to the next Lambda worker, which uploads the documentation to the RDocumentation database. We have forked our own version of pkgdown, which we use here: https://github.com/datacamp/pkgdown

Note: Please read the confluence page that explains the complete architecture of how RDocumentation works.

## How it works

1. Read messages from the rdoc-r-worker SQS queue. It contains the packages that need to be processed; the message types are documented in the /docs folder.
2. Process the messages into JSON files that we dump in S3 for logging.
3. If a message is successfully processed, add the JSON to the rdoc-app-worker SQS queue (it will then be handled by the rdocs app API).
4. If the processing fails, add an error job to the rdoc-r-worker-deadletter queue.
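For reference, the message bodies on the rdoc-r-worker queue have the same shape as the send-message example further down in this README. A minimal sketch of parsing one with jsonlite, purely for illustration:

```R
# Example message body (taken from the send-message example below),
# parsed with jsonlite to show the fields a worker receives.
msg <- jsonlite::fromJSON('{"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}')
msg$name  # "ReorderCluster"
```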
## Local development

### Installing the package

1. Ensure you have devtools installed to ease local development.
2. Set an environment variable GITHUB_PAT.
3. Install the package's dependencies (one possible command is sketched below).
4. Open up RPackageParser.RProj in RStudio.
5. Select Build > Load All; this will make all exported and unexported functions of the package available.
6. To verify that it works, try the following command in your R console:
```R
res <- process_package("https://cran.r-project.org/src/contrib/Archive/R6/R6_2.5.0.tar.gz", "R6", "cran")
```
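The README doesn't spell out the dependency-install command for step 3; a minimal sketch, assuming devtools is available and `GITHUB_PAT` is set (needed to fetch our pkgdown fork from GitHub):

```R
# A sketch, not the confirmed setup command: install everything listed
# in DESCRIPTION, including remote (GitHub-hosted) dependencies.
devtools::install_deps(dependencies = TRUE)
```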
### Polling and posting to SQS queues

First, add a file `.env.R` in the package root folder with the info that AWS needs:
```R
Sys.setenv(AWS_ACCESS_KEY_ID = "ACCESS_KEY_ID",
           AWS_SECRET_ACCESS_KEY = "SECRET_ACCESS_KEY",
           AWS_DEFAULT_REGION = "us-east-1",
           DEST_QUEUE = "rdoc-app-worker",
           SOURCE_QUEUE = "rdoc-r-worker",
           DEADLETTER_QUEUE = "rdoc-r-worker-deadletter")
```
You need to add AWS keys that have write access to the SQS queues so that you can post messages to them. You can find `AWS_ACCESS_KEY_ID` in the AWS Parameter Store, but `AWS_SECRET_ACCESS_KEY` is encrypted there, so you will need to request that value from the infra team.
After that you can run `main()`; this will poll the SQS queues and do all the processing:

```R
RPackageParser::main()
```
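Putting the pieces together, a local polling session might look like the sketch below. Sourcing `.env.R` by hand is an assumption; if the package loads it itself, the first line is unnecessary.

```R
# Export the AWS credentials and queue names from .env.R into this session,
# then start the poll -> process -> post loop described under "How it works".
source(".env.R")
RPackageParser::main()
```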
### Add messages to the queue

If you want to add messages to the queue for local testing, set up the AWS CLI and then run:
```
aws sqs send-message --queue-url https://queue.amazonaws.com/301258414863/rdoc-r-worker --message-body '{"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}'
```
Replace the message body with the package that you want to test.
Note that this is the production queue, which means it is consumed by both your local parser and the production parser, and whoever picks up the message first will be the one to process it. That's why you might need to send a few requests before your local parser picks one up.
After you have added your message to the [rdoc-r-worker queue](https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F301258414863%2Frdoc-r-worker/send-receive), you should see it for a brief moment in AWS while it is being processed. After the processing is done, you should be able to see new messages in the [rdoc-app-worker queue](https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F301258414863%2Frdoc-app-worker/send-receive#/) (click the "Poll for messages" button in the AWS console).
### Testing locally without SQS queues
If you just want to test pulling a package and generating the output that will be added to the destination queue, open this project in RStudio and run these commands in the console:
1. `devtools::load_all(".")`
2. `library("RPackageParser")`
3. `res <- process_package("https://cran.r-project.org/src/contrib/REdaS_0.9.4.tar.gz", "REdaS", "cran")`: replace these arguments with the ones of the package you want to test.
4. `write(jsonlite::toJSON(res$topics[[1]], auto_unbox = TRUE), file = 'topic.json')`: this will create a `topic.json` file in the root of the project that contains the JSON that will be added to the queue. This is what the API will process before adding the topic to the MySQL database.
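To inspect more than the first topic, step 4 extends naturally; the `topics` field name comes from the command above, everything else is plain jsonlite:

```R
# Write every parsed topic to its own JSON file for inspection.
for (i in seq_along(res$topics)) {
  json <- jsonlite::toJSON(res$topics[[i]], auto_unbox = TRUE)
  write(json, file = sprintf("topic_%02d.json", i))
}
```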
## Deployment

- Commits to master are deployed to staging.
- Tags of the form `vx.y.z` are deployed to production.