Extract Raw Features for Own Dataset
This repository makes it easy to generate raw features and/or vectorized features from any PE file. Researchers can implement their own features, or even vectorize the existing features differently from the existing implementations.
Could you please provide the steps or requirements for extracting raw features from a different dataset? I'd like to create a .jsonl file (see cropped image) for my own dataset, but I am struggling to extract some specific fields, such as histogram. Any suggestion or code sample would be great.
What is the exact problem with the histogram?
The function that does what you are asking for is this:
https://github.com/elastic/ember/blob/d97a0b523de02f3fe5ea6089d080abacab6ee931/ember/features.py#LL37C36-L37C36
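For reference, my understanding is that the histogram field is just a count of how often each of the 256 possible byte values occurs in the file. The sketch below reproduces that idea in plain Python, without ember or numpy; `byte_histogram` is a hypothetical helper name, not part of the library, and the exact output format of the ember version should be checked against the linked source.

```python
def byte_histogram(bytez: bytes) -> list:
    """Count occurrences of each byte value (0..255) in the input.

    Hypothetical stand-alone helper; assumed to mirror the idea behind
    ember's histogram feature, not copied from it.
    """
    counts = [0] * 256
    for value in bytez:  # iterating over bytes yields ints in 0..255
        counts[value] += 1
    return counts


if __name__ == "__main__":
    sample = b"MZ\x90\x00\x03\x00\x00\x00"
    hist = byte_histogram(sample)
    print(len(hist), sum(hist))
```

The list always has 256 entries, and the entries sum to the file size in bytes, which is a quick sanity check on the output.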
I think I was not seeing it clearly; maybe I was focusing on the wrong thing. I ran this:
from ember import PEFeatureExtractor
extractor = PEFeatureExtractor()
extractor.raw_features('./files/13.exe')
I received an error:
Traceback (most recent call last):
  File "...\main.py", line 4, in <module>
    extractor.raw_features('./files/13.exe')
  File "...\.venv\lib\site-packages\ember\features.py", line 540, in raw_features
    lief_binary = lief.PE.parse(list(bytez))
TypeError: ['.', '/', 'f', 'i', 'l', 'e', 's', '/', '1', '3', '.', 'e', 'x', 'e']
I thought the function would read the PE file from the path itself, but it expects the file's contents. Passing the bytes of the file instead of the path fixed it:
from ember import PEFeatureExtractor

extractor = PEFeatureExtractor()
# raw_features expects the file contents as bytes, not a path string
with open('files/13.exe', 'rb') as f:
    print(extractor.raw_features(f.read()))
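To get from a single file to a .jsonl file for a whole dataset, something like the following might work. It is only a sketch: `write_raw_features` is a hypothetical helper, and it assumes the extractor returns a JSON-serializable dict for each file. The extractor is passed in as a callable, so the function itself does not depend on ember.

```python
import json
from pathlib import Path


def write_raw_features(paths, extract, out_path):
    """Write one JSON object per line, one line per input file.

    `extract` is any callable that maps file bytes to a JSON-serializable
    dict (assumption: PEFeatureExtractor().raw_features behaves this way).
    Returns the number of lines written.
    """
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            bytez = Path(path).read_bytes()  # pass bytes, not the path string
            out.write(json.dumps(extract(bytez)) + "\n")
            written += 1
    return written


# Possible usage with ember installed (file names are placeholders):
#   from ember import PEFeatureExtractor
#   extractor = PEFeatureExtractor()
#   write_raw_features(sorted(Path("files").glob("*.exe")),
#                      extractor.raw_features, "my_dataset.jsonl")
```

Note that this only writes whatever the extractor returns; if your dataset needs a label for each sample, you would have to add it to each dict yourself before writing the line.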