CybercentreCanada/assemblyline

FrankenStrings URL extraction seems to trim URLs on char 0, even when it's not a binary file

kam193 opened this issue · 2 comments

Describe the bug
I've just noticed that FrankenStrings has started truncating URLs that contain a 0 character. It looks like an attempt to fix extraction from binaries, where URLs often weren't trimmed properly, but when applied to text files it cuts URLs off too early.

To Reproduce
Steps to reproduce the behavior:

  1. Upload a file containing a URL like https://some.doma.in/pathwith0butnotended. A real example: utils.py.zip (pass: zippy, it contains a Discord webhook)
  2. Wait for the results.
  3. See the URL extracted as https://some.doma.in/pathwith

Expected behavior
The full URL is extracted.
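
The expected behavior can be written down as a small regression check. This is only a sketch: `URL_RE` and `extract_urls` are hypothetical stand-ins written for this example, not the pattern or API that FrankenStrings or multidecoder actually use.

```python
import re

# Hypothetical, simplified URL pattern for illustration only.
# It is NOT the pattern used by FrankenStrings or multidecoder.
URL_RE = re.compile(rb"https?://[A-Za-z0-9.-]+(?:/[A-Za-z0-9._~/%-]*)?")


def extract_urls(data: bytes) -> list[bytes]:
    """Return every URL-like byte string found in data."""
    return URL_RE.findall(data)


# A text file with a URL that has a '0' in the middle of its path.
sample = b'WEBHOOK = "https://some.doma.in/pathwith0butnotended"\n'
urls = extract_urls(sample)

# The '0' is an ordinary path character, so the URL must come back whole.
assert urls == [b"https://some.doma.in/pathwith0butnotended"]
```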

Screenshots
image

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.0.26
  • FrankenStrings 4.5.0.6

Additional context

You are exactly right, this is a false positive from a recently implemented Pascal string check. I'll change it so it only runs on data files.
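
To illustrate the kind of false positive involved, here is a minimal sketch of a length-prefix (Pascal string) heuristic that is skipped for text input. The actual check in multidecoder is not shown in this thread, so everything below is an assumption: the function names, the `is_text` flag, and the rule "the byte before the match is the length" are hypothetical, and the sketch does not claim to reproduce the exact truncation reported above.

```python
def trim_pascal_string(data: bytes, start: int, end: int) -> int:
    """Return a possibly shortened end offset for the match data[start:end].

    Hypothetical heuristic: treat the byte just before the match as a
    Pascal-style length prefix and cut the match to that length when the
    prefix is shorter than the match.
    """
    if start == 0:
        return end  # no preceding byte, nothing to interpret as a length
    length = data[start - 1]
    if 0 < length < end - start:
        return start + length
    return end


def url_span(data: bytes, start: int, end: int, is_text: bool) -> bytes:
    """Return the URL bytes, applying the length-prefix trim only to non-text data."""
    if not is_text:
        end = trim_pascal_string(data, start, end)
    return data[start:end]


# In a text file the byte before the URL is a printable character (here a
# double quote, value 34), which the heuristic would misread as a length.
text = b'x = "https://some.doma.in/pathwith0butnotended"'
start = text.index(b"https")
end = text.rindex(b'"')

full = url_span(text, start, end, is_text=True)      # heuristic skipped
trimmed = url_span(text, start, end, is_text=False)  # heuristic applied

# In binary data a real length prefix (0x14 == 20) trims trailing junk.
binary = b"\x14https://some.doma.in" + b"AAAA"
clean = url_span(binary, 1, len(binary), is_text=False)
```

With the flag set for text input the URL is returned untouched, while the same bytes treated as binary lose their tail, which is the class of error a gate like this is meant to avoid.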

This should be resolved in version 1.3.5 of the multidecoder package and in the latest stable release of FrankenStrings.