A pretrained BERT model for cyber security text that has learned cyber security knowledge


Chinese README | English

SecBERT is a BERT model trained on cyber security text so that it learns cyber security knowledge.


Downloading Trained Models

SecBERT models are now installable directly through Hugging Face's framework:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# SecBERT
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecBERT")

# SecRoBERTa (reuses the same variable names, so load only the model you need)
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecRoBERTa")


We release the PyTorch version of the trained models. It was created with the Hugging Face library, and this repo shows how to use it.

Huggingface Modelhub

Using SecBERT in your own model

SecBERT models include all the files needed to plug them into your own model and are in the same format as BERT.

If you use PyTorch, refer to Hugging Face's repo where detailed instructions on using BERT models are provided.
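
As one hedged illustration of plugging SecBERT into your own model, the sketch below loads it with a sequence-classification head through the standard `AutoModelForSequenceClassification` class. The helper names `build_label_maps` and `load_secbert_classifier` and any label names you pass in are hypothetical, introduced only for this example. The classification head is newly initialised, so the model would still need fine-tuning on labelled data.

```python
from typing import Dict, List, Tuple


def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Create the id2label / label2id mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_secbert_classifier(labels: List[str], model_id: str = "jackaduma/SecBERT"):
    """Load SecBERT with an untrained classification head (hypothetical helper)."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For example, `load_secbert_classifier(["benign", "malicious"])` would download the weights from the Hub and return a tokenizer and a model ready for fine-tuning; the two labels are placeholders, not a dataset from this repo.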

Fill Mask

We propose building a language model that works on cyber security text; as a result, it can improve downstream tasks (NER, text classification, semantic understanding, Q&A) in the cyber security domain.

First, the following shows the Fill-Mask pipeline with Google BERT, AllenAI SciBERT, and our SecBERT.

cd lm
python eval_fillmask_lm.py
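
A comparable comparison can be sketched directly with the Hugging Face `pipeline` API. This is a minimal sketch, not the contents of `eval_fillmask_lm.py`: the SecBERT ID comes from this README, while the Google BERT and AllenAI SciBERT checkpoint IDs are assumptions about which public models to compare against, and all function names are hypothetical.

```python
from typing import Dict, List

# "jackaduma/SecBERT" is the ID given in this README. The other two IDs are
# assumed public Hub checkpoints; the README does not say which ones it used.
MODEL_IDS = {
    "Google BERT": "bert-base-uncased",
    "AllenAI SciBERT": "allenai/scibert_scivocab_uncased",
    "SecBERT": "jackaduma/SecBERT",
}


def fill_mask(model_id: str, text: str, top_k: int = 5) -> List[Dict]:
    """Run the fill-mask pipeline for one model on text with one <mask> slot."""
    # Imported lazily so format_predictions works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    # Swap the generic placeholder for the model's own mask token
    # (for example [MASK] for BERT-style tokenizers).
    masked = text.replace("<mask>", unmasker.tokenizer.mask_token)
    return unmasker(masked, top_k=top_k)


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Turn pipeline output dicts into 'token (score)' strings."""
    return [f"{p['token_str'].strip()} ({p['score']:.3f})" for p in predictions]


def compare_models(text: str, top_k: int = 5) -> Dict[str, List[str]]:
    """Collect the top-k predictions from each model for the same sentence."""
    return {
        name: format_predictions(fill_mask(model_id, text, top_k))
        for name, model_id in MODEL_IDS.items()
    }
```

Calling `compare_models` with a sentence that contains a single `<mask>` placeholder downloads each model on first use and returns the top predictions per model, which makes it easy to see where the domain-specific vocabulary changes the ranking.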
