lares is a library designed to automate, improve, and speed up everyday Analysis and Machine Learning tasks. With a wide variety of function families covering Machine Learning, Data Wrangling, EDA, and Scrapers, lares helps the analyst or data scientist get quick, reproducible, and robust results, without the need for repetitive coding or extensive programming skills.
You are most welcome to install, use, and/or comment on any of the code and functionalities. If you are colour blind as well, I'm glad to share my colour palettes! Feel free to contact me via LinkedIn, and please do let me know where you got my contact from.
# install.packages('devtools')
devtools::install_github("laresbernardo/lares")
# User friendly update
lares::updateLares()
CRAN NOTE: I currently have no plans to submit the library to CRAN, even though it passes all its quality tests (and I'm a huge fan). I think lares is more of an everyday useful package than a "specialized for a specific task" library. It has many useful and varied kinds of functions, from NLP to querying APIs to plotting Machine Learning results to stock market and portfolio reports. I gladly share my code with the community and encourage you to use/comment/share it, but I strongly think that CRAN is not aiming for this kind of library in its repertoire.
- DataScience+: Visualizations for Classification Models Results
- DataScience+: Visualizations for Regression Models Results
- DataScience+: AutoML and DALEX for Dataset Understanding
- DataScience+: Find Insights with Ranked Cross-Correlations
- DataScience+: Portfolio's Performance and Reporting
- DataScience+: Plot Timelines with Gantt Charts
To get insights and value out of your dataset, you first need to understand its structure, types of data, empty values, and interactions between variables. corr_cross() and freqs() are here to give you just that! They show a wide perspective of your dataset's content, correlations, and frequencies. Additionally, with the missingness() function to detect all missing values and df_str() to break down your data frame's structure, you will be ready to squeeze valuable insights out of your data.
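The first look described above can be sketched as follows. This is a minimal example, assuming the Titanic sample data `dft` that ships with the package and the argument names (`plot`, `top`) as they appear in the package documentation; check each function's help page before relying on them.

```r
library(lares)

# Assumption: `dft` is the Titanic sample dataset bundled with lares
data(dft)

# Break down the data frame's structure (column types, missing values)
df_str(dft)

# Table of the variables that contain missing values
missing_tbl <- missingness(dft)
print(missing_tbl)

# Ranked cross-correlations between all variables, returned as a table
# (plot = FALSE and top = 10 are taken from the documented arguments)
cross_tbl <- corr_cross(dft, plot = FALSE, top = 10)
print(cross_tbl)
```

Dropping `plot = FALSE` should give the plotted version of the ranked correlations instead of the table.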
My favourite and most used functions are freqs(), distr(), and corr_var(). In this RMarkdown you can see them in action. Basically, they group and count values within variables, show the distribution of one variable vs. another (numerical or categorical), and calculate/plot the correlations of one variable vs. all others, no matter what type of data you insert.
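A short sketch of these three functions, again assuming the bundled `dft` dataset and its usual Titanic column names (`Pclass`, `Survived`, `Sex`, `Fare`). The `plot` and `top` arguments are taken from the package documentation and should be verified with `?freqs` and `?corr_var`.

```r
library(lares)

# Assumption: `dft` is the Titanic sample dataset bundled with lares
data(dft)

# Group and count the combinations of two variables, as a table
counts <- freqs(dft, Pclass, Survived, plot = FALSE)
print(counts)

# Distribution of one variable against another (draws a plot)
distr(dft, Survived, Sex)

# Correlation of one variable against all the others, as a table
fare_corr <- corr_var(dft, Fare, plot = FALSE, top = 10)
print(fare_corr)
```

Passing the data frame first and bare column names afterwards also means the calls can be chained with a pipe.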
If there is space for one more, I would add ohse() (One Hot Smart Encoding), which has made my life much easier and my work much more valuable. It converts a whole data frame into numerical values by turning categorical variables into dummy variables (new columns with 1s and 0s, ordered by frequency, with the less frequent values grouped into a single column) and dates into new features (such as month, year, week of the year, minutes if time is present, holidays for a given country, currency exchange rates, etc.).
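The encoding can be tried on a few mixed-type columns. This is a minimal sketch, assuming the bundled `dft` dataset and the `limit` argument as described in the package documentation (it caps how many of the most frequent values of each categorical column get their own dummy column); the date-related features are not covered here.

```r
library(lares)

# Assumption: `dft` is the Titanic sample dataset bundled with lares
data(dft)

# Keep a handful of numerical and categorical columns
small <- dft[, c("Survived", "Pclass", "Age", "Fare", "Embarked")]

# One hot encode, keeping at most the 3 most frequent values per column
encoded <- ohse(small, limit = 3)

# Compare the column names before and after encoding
names(small)
names(encoded)
head(encoded, 3)
```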
You can type lares:: in RStudio and you will get a pop-up with all the functions currently available in the package. You might also want to check the whole documentation by running help(package = "lares") locally, or by browsing it on the rdrr.io or rdocumentation.org websites. Remember to check the families and similar functions in the See Also sections too.
If you need help with any of the functions, use the ? operator (e.g. ?lares::function) and the Help tab will display a short explanation of the function and its parameters.
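Outside RStudio, the same exploration can be done from the console with base R. A small sketch, using `freqs` only as an example function name:

```r
library(lares)

# List every object exported by the attached package
exported <- ls("package:lares")
head(exported, 10)

# Narrow the list down by name, e.g. the correlation helpers
grep("^corr_", exported, value = TRUE)

# Open the help page for a single function
help("freqs", package = "lares")
```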
If you encounter a bug, please share a reproducible example with me on GitHub issues and I'll take care of it. For inquiries and other matters, you can email me directly or open a new ticket here.