elastic/ember

Extract Raw Features for Own Dataset

Closed this issue · 3 comments

This repository makes it easy to generate raw features and/or vectorized features from any PE file. Researchers can implement their own features, or even vectorize the existing features differently from the existing implementations.

Could you please provide the steps or requirements for extracting raw features from a different dataset? I'd like to create a .jsonl file (see cropped image) for my own dataset, but I am struggling to extract some specific information, such as the histogram. Any suggestions or code samples would be great.

image

What is the exact problem with the histogram?

The function that does what you are asking for is this:
https://github.com/elastic/ember/blob/d97a0b523de02f3fe5ea6089d080abacab6ee931/ember/features.py#LL37C36-L37C36
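For orientation, a byte histogram is simply the number of times each byte value (0 to 255) occurs in the file. The sketch below is a plain-Python illustration of that idea, not a copy of the linked ember code; the function name `byte_histogram` is made up for this example.

```python
def byte_histogram(data: bytes) -> list:
    """Count occurrences of each byte value 0..255 in `data`.

    Hypothetical helper for illustration; not part of the ember API.
    """
    counts = [0] * 256
    for b in data:  # iterating over bytes yields ints in range(256)
        counts[b] += 1
    return counts


if __name__ == "__main__":
    hist = byte_histogram(b"\x00\x00\xff")
    print(hist[0], hist[255], len(hist))
```

The result is always a list of 256 integers whose sum equals the file size in bytes.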

I think I was overlooking something; maybe I was focusing on the wrong thing. I tried this:

from ember import PEFeatureExtractor

extractor = PEFeatureExtractor()
extractor.raw_features('./files/13.exe')

I received an error:

Traceback (most recent call last):
  File "...\main.py", line 4, in <module>
    extractor.raw_features('./files/13.exe')
  File "...\.venv\lib\site-packages\ember\features.py", line 540, in raw_features
    lief_binary = lief.PE.parse(list(bytez))
TypeError: ['.', '/', 'f', 'i', 'l', 'e', 's', '/', '1', '3', '.', 'e', 'x', 'e']

I had assumed the function would read the PE file from a path on its own, but it expects the file's raw bytes. Passing the file contents instead of the path fixes it:

from ember import PEFeatureExtractor

extractor = PEFeatureExtractor()

with open('files/13.exe', 'rb') as f:
    print(extractor.raw_features(f.read()))
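To go from a single file to a .jsonl file for a whole dataset, something like the following might work. It is only a sketch: `write_raw_features` is a hypothetical helper, the extraction function is passed in as a parameter, and it assumes that whatever the function returns can be serialized with `json.dumps`.

```python
import json
from pathlib import Path


def write_raw_features(sample_dir, out_path, extract):
    """Write one JSON object per file in `sample_dir` to `out_path`.

    `extract` is any callable that takes the file's bytes and returns a
    JSON-serializable object (assumption: the extractor's output qualifies).
    Keep `out_path` outside `sample_dir` so the output is not read as a sample.
    Returns the number of files processed.
    """
    count = 0
    with open(out_path, "w") as out:
        for path in sorted(Path(sample_dir).iterdir()):
            if not path.is_file():
                continue  # skip subdirectories
            features = extract(path.read_bytes())
            out.write(json.dumps(features) + "\n")  # one record per line
            count += 1
    return count
```

With ember installed, the call would presumably look like `write_raw_features('files', 'features.jsonl', PEFeatureExtractor().raw_features)`, passing the same `raw_features` method used above.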