ldccr provides utilities for various Japanese corpora.
The goal of the ldccr package is to make Japanese language resources easy to use.
This package provides:
- parsers for several Japanese corpora that are free or openly licensed (non-proprietary).
- a downloader for zipped text files published on Aozora Bunko.
install.packages("ldccr", repos = c("https://paithiov909.r-universe.dev", "https://cloud.r-project.org"))
… | Name | License | Link |
---|---|---|---|
✔️ | Livedoor News Corpus | CC BY-ND 2.1 JP | # |
✔️ | Japanese Realistic Textual Entailment Corpus | CC BY-NC-SA 4.0 | # |
✔️ | ja.text8 corpus | CC BY-SA | # |
Currently not supported.
if (!dir.exists("cache")) dir.create("cache")
text <- ldccr::AozoraBunkoSnapshot |>
  dplyr::sample_n(1L) |>
  dplyr::pull("テキストファイルURL") |>
  ldccr::read_aozora(directory = "cache") |>
  readr::read_lines()
dplyr::glimpse(text)
#> chr [1:16] "雪子さんの泥棒よけ" "夢野久作" ...
MIT license.