An R wrapper for Python's pxtextmining library, a pipeline to classify text-based patient experience data.
Function documentation: https://nhs-r-community.github.io/pxtextmineR/.
Package pxtextmineR does not wrap everything from pxtextmining, only selected functions that offer R users new opportunities for modelling. For example, the whole Scikit-learn (Pedregosa et al., 2011) text classification pipeline is wrapped, as well as helper functions for e.g. sentiment analysis with Python's textBlob and vaderSentiment.
How does the wrapper work? It uses R package reticulate, which provides tools for interoperability between Python and R.
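To illustrate the mechanism, here is a minimal sketch of calling a Python module from R through reticulate. It is not pxtextmineR's own code. It assumes that reticulate is installed and that the active Python environment contains the textblob module; if either is missing, the Python call is skipped.

```r
# Illustrative only: shows the reticulate mechanism, not pxtextmineR internals.
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("textblob")) {
  # Import the Python module as an R object.
  textblob <- reticulate::import("textblob")

  # Python attributes and methods are accessed with `$`.
  blob <- textblob$TextBlob("The staff were kind and very helpful.")
  polarity <- blob$polarity  # sentiment polarity, between -1 and 1
  print(polarity)
} else {
  message("reticulate or the Python module 'textblob' is not available.")
}
```

The wrapped pxtextmineR functions hide this plumbing, so in normal use there is no need to call reticulate::import directly.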
There are a few things that need to be done to install and set up pxtextmineR.
- Run devtools::install_github("nhs-r-community/pxtextmineR") in the R console.
- Create a Python virtual environment. If you are not familiar with virtual environments, it is worth reading an introduction to them first. R package reticulate has functions to create a Python virtual environment via the R console; refer to reticulate::conda_create and reticulate::virtualenv_create. For example, if using Conda, run reticulate::conda_create("r-reticulate"), where r-reticulate is the name of reticulate's default virtual environment. Using this default virtual environment for pxtextmineR is strongly recommended because it makes the setup much easier. In the reticulate authors' own words, "[i]t’s much more straightforward for users if there is a common environment used by R packages [...]".
- Tell reticulate to use the r-reticulate virtual environment: reticulate::use_condaenv("r-reticulate", required = TRUE)
- Install Python package pxtextmining in r-reticulate: reticulate::py_install(envname = "r-reticulate", packages = "pxtextmining", pip = TRUE)
- We also need to install a couple of spaCy models in r-reticulate. These are obtained from URL links and thus need to be installed separately. In the R console run:
    system("pip install wheel")
    system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz")
    system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz")
All steps in one go:
devtools::install_github("nhs-r-community/pxtextmineR")
# If not using Conda, comment out the next two lines and uncomment the two lines
# following them.
reticulate::conda_create("r-reticulate")
reticulate::use_condaenv("r-reticulate", required = TRUE)
# reticulate::virtualenv_create("r-reticulate")
# reticulate::use_virtualenv("r-reticulate", required = TRUE)
reticulate::py_install(envname = "r-reticulate", packages = "pxtextmining", pip = TRUE)
system("pip install wheel")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz")
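After the steps above, a quick check can confirm that reticulate sees the installed packages. The sketch below is an assumption about how one might verify the setup, not part of the package. It assumes that reticulate::use_condaenv or reticulate::use_virtualenv has already been called in the session, and that the spaCy models are importable under the module names en_core_web_sm and en_core_web_lg.

```r
# Post-installation check (illustrative); run after the installation steps,
# once reticulate has been pointed at the environment.
if (requireNamespace("reticulate", quietly = TRUE)) {
  # Show which Python interpreter reticulate has bound to, if any.
  if (reticulate::py_available(initialize = TRUE)) {
    print(reticulate::py_config())
  }

  # Check that the Python packages installed above can be imported.
  modules <- c("pxtextmining", "sklearn", "spacy",
               "en_core_web_sm", "en_core_web_lg")
  available <- vapply(modules, reticulate::py_module_available, logical(1))
  print(available)
}
```

Any module reported as FALSE points to the installation step that needs repeating.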
The installation instructions above did not work on all machines on which the installation process was tested. There were two problems:
- On some machines, reticulate would simply refuse to install in virtual environment r-reticulate the version of Scikit-learn that pxtextmining uses (v 0.23.2).
- When trying to use a virtual environment other than r-reticulate (i.e. reticulate::use_condaenv("<some_other_virtual_environment>", required = TRUE)), the behaviour of reticulate was confusing. On the one hand, it would run pxtextmineR functions using the user-specified virtual environment. On the other hand, when running commands to build e.g. function documentation with R package pkgdown, reticulate would automatically set r-reticulate as the default environment, causing the code to break.
We have opted for a more "invasive" approach to fix this problem so that users can use any virtual environment with no issues. This requires the following steps:
- Create a Python virtual environment using e.g. Anaconda, Miniconda or a Virtual Python Environment.
- Activate it and install pxtextmining and the spaCy models:
    pip install pxtextmining
    pip install wheel
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz
- Use a text editor to open your .Renviron file, normally located in ~/.Renviron, and add the following lines:
    PXTEXTMINER_PYTHON_VENV_MANAGER=name_or_path_to_venv_manager
    PXTEXTMINER_PYTHON_VENV=name_of_venv
  where "name_of_venv" should be replaced by the name of the virtual environment (unquoted) and "name_or_path_to_venv_manager" should be replaced by the name of the virtual environment manager or the path to the virtual environment (unquoted). In more detail:
  - If using Conda or Miniconda, replace "name_or_path_to_venv_manager" with "conda" or "miniconda" (unquoted) respectively.
  - If using a Virtual Python Environment, replace "name_or_path_to_venv_manager" with the path to the virtual environment, e.g. /home/user/venvs/myvenv.
- It is a good idea to restart RStudio at this point.
- Run devtools::install_github("nhs-r-community/pxtextmineR") in the R console.
- Again, it is a good idea to restart RStudio. If there are error messages saying that the user-specified Python environment cannot be set, close and re-open RStudio.
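If the environment still cannot be set after restarting, it may help to confirm that R actually picked up the two .Renviron entries. The following is a minimal sketch using only base R. The helper name check_pxtextminer_env is hypothetical and not part of pxtextmineR, and the rule that any manager value other than "conda" or "miniconda" is a path is an assumption based on the description above.

```r
# Hypothetical helper (not part of pxtextmineR): checks that the two
# variables from .Renviron are visible to the current R session.
check_pxtextminer_env <- function() {
  manager <- Sys.getenv("PXTEXTMINER_PYTHON_VENV_MANAGER", unset = "")
  venv <- Sys.getenv("PXTEXTMINER_PYTHON_VENV", unset = "")

  if (!nzchar(manager) || !nzchar(venv)) {
    stop("Both PXTEXTMINER_PYTHON_VENV_MANAGER and PXTEXTMINER_PYTHON_VENV ",
         "must be set in .Renviron; restart R after editing the file.")
  }

  # Assumption: "conda" and "miniconda" name a manager; anything else is
  # treated as the path to a virtual environment.
  type <- if (manager %in% c("conda", "miniconda")) "conda" else "virtualenv"
  if (type == "virtualenv" && !dir.exists(path.expand(manager))) {
    warning("Virtual environment path does not exist: ", manager)
  }

  list(type = type, manager = manager, venv = venv)
}
```

Calling check_pxtextminer_env() in a fresh session either returns the values R has read or stops with a message naming the variables that need to be set.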
Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M. & Duchesnay E. (2011), Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825-2830.