fanyi provides functions for translating text between multiple languages using online translators. Translating error messages and descriptive analysis results into a language familiar to the user makes the information easier to understand and lowers language barriers. The package also offers helper functions that query gene information to aid interpretation of genes of interest (e.g., marker genes, differentially expressed genes), and utilities for translating ggplot graphics.
To cite fanyi in publications use:
G Yu. Using fanyi to assist research communities in retrieving and interpreting information. bioRxiv, 2023. doi: 10.1101/2023.12.21.572729.
Guangchuang YU
School of Basic Medical Sciences, Southern Medical University
Get the released version from CRAN:
install.packages("fanyi")
Or the development version from GitHub:
## install.packages("yulab.utils")
yulab.utils::install_zip_gh("YuLab-SMU/fanyi")
Use `set_translate_source()` to set the default translator used by `translate()`.
To use Baidu as the translator:
- go to https://fanyi-api.baidu.com/manage/developer and register as an individual developer
- get your `appid` and `key` (密钥, the secret key)
- set `appid` and `key` with `source = "baidu"` using `set_translate_option()`
- have fun with `translate()`
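The Baidu steps above can be sketched as follows. This is a minimal sketch, not the package's prescribed workflow: the environment variable names `BAIDU_APPID` and `BAIDU_KEY` are hypothetical and chosen only to keep credentials out of the script, and the call is skipped when they are unset or fanyi is not installed.

```r
# Read the Baidu credentials from environment variables instead of
# hard-coding them. BAIDU_APPID and BAIDU_KEY are hypothetical names
# chosen for this sketch; define them in ~/.Renviron or the shell.
appid <- Sys.getenv("BAIDU_APPID")
key   <- Sys.getenv("BAIDU_KEY")

# TRUE only when both values are present and fanyi is installed.
ready <- nzchar(appid) && nzchar(key) &&
  requireNamespace("fanyi", quietly = TRUE)

if (ready) {
  # Argument names follow the steps listed above.
  fanyi::set_translate_option(appid = appid, key = key, source = "baidu")
  print(fanyi::translate("hello world", from = "en", to = "zh"))
} else {
  message("Baidu credentials or the fanyi package are missing; skipping.")
}
```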
To use Bing as the translator:
- register a free Azure account
- enable `Azure AI services | Translator` from https://portal.azure.com/
- create a translation service on the free pricing tier (you need a Visa or Mastercard to complete registration, and you will not be charged until you exceed 2 million characters per month)
- get your `key` and `region`
- set `key` and `region` with `source = "bing"` using `set_translate_option()`
- have fun with `translate()`
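For Bing, the same idea might be wrapped in a small function. `setup_bing()` and the environment variable names `AZURE_TRANSLATOR_KEY` and `AZURE_TRANSLATOR_REGION` are hypothetical and not part of fanyi; the sketch assumes `set_translate_option()` accepts `key`, `region`, and `source` as named arguments, as the steps above suggest.

```r
# Hypothetical helper (not part of fanyi): configure the Bing translator and
# report whether the configuration call was made.
setup_bing <- function(key = Sys.getenv("AZURE_TRANSLATOR_KEY"),
                       region = Sys.getenv("AZURE_TRANSLATOR_REGION")) {
  # Both the key and the region of the Azure resource are required.
  if (!nzchar(key) || !nzchar(region)) return(FALSE)
  if (!requireNamespace("fanyi", quietly = TRUE)) return(FALSE)
  fanyi::set_translate_option(key = key, region = region, source = "bing")
  TRUE
}

if (!setup_bing()) {
  message("Bing was not configured; check the key, the region, and fanyi.")
}
```

Returning a logical rather than stopping lets a script fall back to another translator when the Azure credentials are unavailable.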
To use Youdao as the translator:
- go to https://ai.youdao.com/ and register an account
- click `自然语言翻译服务` (natural language translation service) and create an app from the subsection `文本翻译` (text translation)
- get your `应用ID` (application ID) as `appid`, and `应用秘钥` (application secret key) as `key`
- set `appid` and `key` with `source = "youdao"` using `set_translate_option()`
- have fun with `translate()`
- (bonus) you can also create a `术语表` (glossary of terms) as a user-defined dictionary and get the dict id to obtain more precise translations in a specific domain.
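Once Youdao credentials are in place, the default translator could be switched with `set_translate_source()`. The sketch below is illustrative only: `validate_source()` and the environment variable names `YOUDAO_APPID` and `YOUDAO_KEY` are hypothetical, and it assumes `set_translate_source()` takes the source name as its argument.

```r
# Hypothetical helper (not part of fanyi): accept only the three services
# described above and return the validated name.
validate_source <- function(source) {
  match.arg(source, choices = c("baidu", "bing", "youdao"))
}

# YOUDAO_APPID and YOUDAO_KEY are hypothetical environment variable names.
appid <- Sys.getenv("YOUDAO_APPID")
key   <- Sys.getenv("YOUDAO_KEY")

if (nzchar(appid) && nzchar(key) &&
    requireNamespace("fanyi", quietly = TRUE)) {
  src <- validate_source("youdao")
  fanyi::set_translate_option(appid = appid, key = key, source = src)
  # Assumption: set_translate_source() takes the source name as its argument.
  fanyi::set_translate_source(src)
}
```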
- `gene_summary()` allows retrieving gene information from NCBI.
- `translate_ggplot()` allows translating axis labels of a ggplot graph.
library(fanyi)
##
## run `set_translate_option()` to setup
##
text <- 'clusterProfiler supports exploring functional
characteristics of both coding and non-coding genomics
data for thousands of species with up-to-date gene annotation'
translate(text, from='en', to='zh')
clusterProfiler支持通过最新的基因注释探索数千个物种的编码和非编码基因组学数据的功能特征
translate(text, from='en', to='jp')
clusterProfilerは、最新の遺伝子注釈による数千種の種の符号化および非符号化ゲノム学データの機能的特徴の探索を支援する
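Translating error messages, one of the uses mentioned in the introduction, might look like the sketch below. `with_translated_errors()` is a hypothetical wrapper, not a fanyi function; its `translator` argument defaults to the identity function so the example runs without network access or API keys.

```r
# Hypothetical wrapper: evaluate an expression and, if it fails, re-signal
# the error with its message passed through `translator`.
with_translated_errors <- function(expr, translator = identity) {
  tryCatch(expr, error = function(e) {
    stop(translator(conditionMessage(e)), call. = FALSE)
  })
}

# Successful expressions pass through unchanged.
with_translated_errors(sqrt(16))

# With fanyi configured, a translator built on translate() could be used:
# zh <- function(msg) fanyi::translate(msg, from = "en", to = "zh")
# with_translated_errors(stop("file not found"), translator = zh)
```

Keeping the translator as an argument means the wrapper can be tested offline and reused with any of the configured services.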
library(DOSE)
library(enrichplot)
data(geneList)
de <- names(geneList)[1:200]
x <- enrichDO(de)
p <- dotplot(x)
p2 <- translate_ggplot(p, axis='y')
p3 <- translate_ggplot(p, axis='y', to='kor')
p4 <- translate_ggplot(p, axis='y', to='ara')
aplot::plot_list(English = p, Chinese = p2,
Korean = p3, Arabic = p4, ncol=2)
symbol <- c("CCR7", "CD3E")
gene <- clusterProfiler::bitr(symbol,
fromType = 'SYMBOL',
toType = 'ENTREZID',
OrgDb = 'org.Hs.eg.db')
gene
##   SYMBOL ENTREZID
## 1   CCR7     1236
## 2   CD3E      916
res <- gene_summary(gene$ENTREZID)
names(res)
## [1] "uid" "name" "description" "summary"
d <- data.frame(desc=res$description,
desc2=translate(res$description))
d
##                                             desc                     desc2
## 1                 C-C motif chemokine receptor 7      C-C基序趋化因子受体7
## 2 CD3 epsilon subunit of T-cell receptor complex T细胞受体复合体的CD3ε亚基
res$summary
[1] The protein encoded by this gene is a member of the G protein-coupled receptor family. This receptor was identified as a gene induced by the Epstein-Barr virus (EBV), and is thought to be a mediator of EBV effects on B lymphocytes. This receptor is expressed in various lymphoid tissues and activates B and T lymphocytes. It has been shown to control the migration of memory T cells to inflamed tissues, as well as stimulate dendritic cell maturation. The chemokine (C-C motif) ligand 19 (CCL19/ECL) has been reported to be a specific ligand of this receptor. Signals mediated by this receptor regulate T cell homeostasis in lymph nodes, and may also function in the activation and polarization of T cells, and in chronic inflammation pathogenesis. Alternative splicing of this gene results in multiple transcript variants. [provided by RefSeq, Sep 2014]
[2] The protein encoded by this gene is the CD3-epsilon polypeptide, which together with CD3-gamma, -delta and -zeta, and the T-cell receptor alpha/beta and gamma/delta heterodimers, forms the T-cell receptor-CD3 complex. This complex plays an important role in coupling antigen recognition to several intracellular signal-transduction pathways. The genes encoding the epsilon, gamma and delta polypeptides are located in the same cluster on chromosome 11. The epsilon polypeptide plays an essential role in T-cell development. Defects in this gene cause immunodeficiency. This gene has also been linked to a susceptibility to type I diabetes in women. [provided by RefSeq, Jul 2008]
translate(res$summary)
[1] 该基因编码的蛋白质是G蛋白偶联受体家族的成员。该受体被鉴定为EB病毒(EBV)诱导的基因,被认为是EB病毒对B淋巴细胞影响的媒介。这种受体在各种淋巴组织中表达,并激活B和T淋巴细胞。它已被证明可以控制记忆T细胞向炎症组织的迁移,并刺激树突细胞成熟。据报道,趋化因子(C-C基序)配体19(CCL19/ECL)是该受体的特异性配体。该受体介导的信号调节淋巴结中的T细胞稳态,也可能在T细胞的激活和极化以及慢性炎症发病机制中发挥作用。该基因的选择性剪接导致多种转录物变体。【RefSeq提供,2014年9月】
[2] 该基因编码的蛋白质是CD3ε多肽,其与CD3γ、-Δ和-ζ以及T细胞受体α/β和γ/Δ异二聚体一起形成T细胞受体-CD3复合物。这种复合物在将抗原识别与几种细胞内信号转导途径偶联方面起着重要作用。编码ε、γ和δ多肽的基因位于11号染色体上的同一簇中。ε多肽在T细胞发育中起着重要作用。这种基因的缺陷会导致免疫缺陷。该基因也与女性易患I型糖尿病有关。【RefSeq提供,2008年7月】