# RPackageParser

R package to parse R documentation files for RDocumentation.

> **Note:** please read the Confluence page that explains the complete architecture of how RDocumentation works.

This package uses pkgdown to parse R package documentation and passes it on to the next Lambda worker, which uploads the documentation to the RDocumentation database. We use our own fork of pkgdown: https://github.com/datacamp/pkgdown
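Should you ever need to (re)install the fork manually, a one-liner (assuming `devtools` is installed and `GITHUB_PAT` is set, as described under Local development):

```R
# Install DataCamp's pkgdown fork instead of the CRAN release:
devtools::install_github("datacamp/pkgdown")
```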

## How it works

1. Read messages from the `rdoc-r-worker` SQS queue. Each message describes a package that needs to be processed; the message types are documented in the `/docs` folder.
2. Process the message into a JSON file that is dumped in S3 for logging.
3. If the message is successfully processed, add the JSON to the `rdoc-app-worker` SQS queue (which is then handled by the RDocumentation app API).
4. If the processing fails, add an error job to the `rdoc-r-worker-deadletter` queue.
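For reference, here is a minimal sketch of a message body on the source queue, inferred from the `aws sqs send-message` example further down (the authoritative message schemas live in `/docs`):

```R
library(jsonlite)

# Build the JSON body of a "process this package" message:
msg <- toJSON(
  list(
    name    = "ReorderCluster",
    version = "1.0",
    path    = "ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"
  ),
  auto_unbox = TRUE
)
cat(msg)
#> {"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}
```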

## Local development

### Installing the package

- Ensure you have `devtools` installed to ease local development.
- Set a `GITHUB_PAT` environment variable.
- Install the package's dependencies (see the sketch below).
- Open `RPackageParser.RProj` in RStudio and select **Build > Load All**; this will make all exported and unexported functions of the package available.

To verify that it works, try the following command in your R console:

```R
res <- process_package("https://cran.r-project.org/src/contrib/Archive/R6/R6_2.5.0.tar.gz", "R6", "cran")
```
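A minimal sketch of the first two setup steps from the R console, assuming `GITHUB_PAT` is used to install the forked pkgdown from GitHub (you can also set it in `~/.Renviron` instead):

```R
# Hypothetical setup; adjust the token and options to your environment:
Sys.setenv(GITHUB_PAT = "<your GitHub personal access token>")
devtools::install_deps(".", dependencies = TRUE)  # install this package's dependencies
```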

### Polling and posting to SQS queues

First, add a file `.env.R` in the package root folder with the info that AWS needs:
```R
Sys.setenv(AWS_ACCESS_KEY_ID = "ACCESS_KEY_ID",
           AWS_SECRET_ACCESS_KEY = "SECRET_ACCESS_KEY",
           AWS_DEFAULT_REGION = "us-east-1",
           DEST_QUEUE = "rdoc-app-worker",
           SOURCE_QUEUE = "rdoc-r-worker",
           DEADLETTER_QUEUE = "rdoc-r-worker-deadletter")
```

You need to add AWS keys that have write access to the SQS queues so that you can post messages to them. You can find `AWS_ACCESS_KEY_ID` in the AWS Parameter Store, but `AWS_SECRET_ACCESS_KEY` is encrypted there, so you will need to request that value from the infra team.

After that you can run `main()`; this will poll the SQS queues and do all the processing:

```R
RPackageParser::main()
```
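If the environment variables are not picked up in a fresh session, a minimal sketch (assuming `.env.R` is not sourced automatically):

```R
source(".env.R")        # set the AWS credentials and queue names shown above
RPackageParser::main()  # poll SOURCE_QUEUE and process messages
```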

### Add messages to the queue

If you want to add messages to the queue for local testing, set up the AWS CLI and then run:

```sh
aws sqs send-message --queue-url https://queue.amazonaws.com/301258414863/rdoc-r-worker --message-body '{"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}'
```

where you replace the body with the package that you want to test.
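If you prefer to stay in R, a minimal sketch using the paws AWS SDK achieves the same thing (an assumption: paws is not a dependency of this repo, so install it separately):

```R
library(paws)

sqs <- paws::sqs()  # picks up AWS credentials from your environment
sqs$send_message(
  QueueUrl    = "https://queue.amazonaws.com/301258414863/rdoc-r-worker",
  MessageBody = '{"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}'
)
```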

Note that this is the production queue, which means that it will be polled both by your local parser and by the production parser, and whichever picks up the message first will process it. That's why you might need to send a few requests before your local parser picks up the message.

After you've added your message to the [rdoc-r-worker queue](https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F301258414863%2Frdoc-r-worker/send-receive), you should see it for a brief moment in AWS while it's being processed. After the processing is done, you should be able to see new messages in the [rdoc-app-worker queue](https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F301258414863%2Frdoc-app-worker/send-receive#/) (click the "Poll for messages" button in the AWS console).

### Testing locally without SQS queues

If you just want to test pulling a package and generating the output that will be added to the destination queue, just open this project in RStudio and run these commands in the console:

1. `devtools::load_all(".")`
2. `library("RPackageParser")`
3. `res <- process_package("https://cran.r-project.org/src/contrib/REdaS_0.9.4.tar.gz", "REdaS", "cran")`: replace these arguments with those of the package you want to test.
4. `write(jsonlite::toJSON(res$topics[[1]], auto_unbox = TRUE), file = 'topic.json')`: this creates a `topic.json` file in the root of the project containing the JSON that will be added to the queue (see the snippet below to inspect it first). This is what the API will process before adding the topic to the MySQL database.
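To eyeball that JSON in the console before writing it out, one option is plain jsonlite (a convenience, not part of this package's workflow):

```R
# Pretty-print the first topic's JSON:
cat(jsonlite::prettify(jsonlite::toJSON(res$topics[[1]], auto_unbox = TRUE)))
```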

## Deployment

- Commits to `master` are deployed to staging.
- Tags of the form `vx.y.z` are deployed to production.
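For example, cutting a production release (hypothetical version number):

```sh
git tag v1.2.3
git push origin v1.2.3
```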