Sweary is an R package that contains a database of swear words from different languages, cherry-picked by native speakers.
The development version of this package can be installed using devtools:
devtools::install_github("pdrhlik/sweary")
Language | Language code | Number of swear words |
---|---|---|
Czech | cs | 57 |
English | en | 39 |
Polish | pl | 41 |
Total | 3 langs | 137 |
All languages are stored in a `swear_words` data frame.
library(sweary)
head(swear_words)
## # A tibble: 6 x 2
## word language
## <chr> <chr>
## 1 buzerant cs
## 2 čubka cs
## 3 čurák cs
## 4 čůrák cs
## 5 debil cs
## 6 dement cs
You can also extract only the language that you are interested in.
en_swear_words <- get_swearwords("en")
head(en_swear_words)
## # A tibble: 6 x 2
## word language
## <chr> <chr>
## 1 arse en
## 2 arsehole en
## 3 ass en
## 4 asshole en
## 5 bitch en
## 6 bollocks en
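Because the data is a plain two-column data frame, it can also be handled with ordinary base R. The sketch below uses a small stand-in data frame (`swear_words_demo`, a hypothetical name) built from a few rows printed above, so it runs without the package installed. That `get_swearwords()` behaves like a simple filter on the `language` column is an assumption drawn from the printed output, not from the package source.

```r
# Stand-in for the package's swear_words data frame, built from a few of the
# rows printed above (hypothetical object, for illustration only).
swear_words_demo <- data.frame(
  word = c("buzerant", "debil", "dement", "arse", "ass", "bitch"),
  language = c("cs", "cs", "cs", "en", "en", "en"),
  stringsAsFactors = FALSE
)

# Base R subsetting that mirrors what get_swearwords("en") appears to return
# (an assumption based on the output shown, not on the package internals).
en_demo <- swear_words_demo[swear_words_demo$language == "en", ]

# Number of words per language, the same kind of summary as the table above.
counts <- table(swear_words_demo$language)

print(en_demo)
print(counts)
```

With the package loaded, the same two operations should work on `swear_words` itself.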
If you are not comfortable with `git` and pull requests, you can just follow steps 1-3. After you create the file, send it to me via email with the subject "New sweary language: {LANG_CODE}". We will acknowledge you in the README after we approve the changes.
1. Choose a new language. Find its two-letter ISO 639-1 code.
2. Create a language file. Place the file in `data-raw/swear-word-lists/{LANG_CODE}`. Example for English: `data-raw/swear-word-lists/en`.
3. Fill in the file with swear words. The following rules must apply:
   - One swear word per line.
   - All words must be lowercase.
   - The list must only contain unique words.
   - The list must be sorted alphabetically.
4. Make sure all the tests pass. You can do that using a development function called `build_sweary()`. It becomes available when you `git clone` the repository and call `devtools::load_all()`, or press `Ctrl+Shift+L` in RStudio. Learn more about calling this function using `?build_sweary`.
5. Update README.Rmd. Update the `langs` data frame in README.Rmd by adding a new row to it. More precise instructions are in the raw file itself.
6. Create a pull request.
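The rules for a word list file can be checked locally before running the package's own tests. The following is a minimal sketch, not the checks that `build_sweary()` performs: `check_word_list()` is a hypothetical helper, "one word per line" is interpreted here as "no blank lines", and alphabetical order is taken to mean the collation of the current R locale, which may differ from what the package tests expect.

```r
# Hypothetical helper (not part of sweary): returns a character vector
# describing every rule the file at `path` violates; empty if none.
check_word_list <- function(path) {
  words <- readLines(path, encoding = "UTF-8", warn = FALSE)
  problems <- character(0)

  # One word per line, interpreted here as "no blank lines" (assumption).
  if (any(grepl("^\\s*$", words))) {
    problems <- c(problems, "file contains blank lines")
  }
  # All words must be lowercase.
  if (any(words != tolower(words))) {
    problems <- c(problems, "not all words are lowercase")
  }
  # The list must only contain unique words.
  if (anyDuplicated(words) > 0) {
    problems <- c(problems, "list contains duplicate words")
  }
  # The list must be sorted; sort() uses the current locale's collation.
  if (!identical(words, sort(words))) {
    problems <- c(problems, "list is not sorted alphabetically")
  }
  problems
}

# Usage on a temporary file holding a valid list.
tmp <- tempfile()
writeLines(c("arse", "ass", "bitch"), tmp)
check_word_list(tmp)  # character(0) when all rules hold
```

Returning every violation at once, rather than stopping at the first, makes it easier to fix a new language file in a single pass.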
The idea first appeared after the South Park text analysis lightning talk at the Why R? 2018 conference in Wrocław. All the contributors will be acknowledged as the work progresses.