rustwasm/twiggy

`twiggy garbage` should hide/summarize "garbage" data by default

Closed this issue · 4 comments

💡 Feature Description

When `twiggy garbage` reports false positives, they are often data segments. The reason is that code accessing a data segment can do arbitrary computations to conjure up the segment's linear memory address, while we only recognize the very limited pattern `(load (const $addr))`.*

To avoid reporting so many false positives, by default we should hide individual unused data segments and instead summarize them in a single row. If the user provides the `--show-data-segments` flag (open to suggestions on a better flag name!), then we can show all the potentially-false-positive data segments, as we do today.

The potentially-false-positive data segments should not be included in the Σ row. They should have their own, distinct summary row.
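A minimal sketch of the proposed default, to make the row accounting concrete. All of the names here (`GarbageItem`, `ItemKind`, `Report`, `build_report`) are hypothetical and are not twiggy's actual types or API; the only thing taken from the proposal is the rule that hidden data segments are counted in their own summary and left out of the Σ total unless the flag is set.

```rust
// Hypothetical types for illustration only; not twiggy's real data structures.
#[derive(Debug, Clone, PartialEq)]
pub enum ItemKind {
    Code,
    Data,
}

#[derive(Debug, Clone, PartialEq)]
pub struct GarbageItem {
    pub name: String,
    pub bytes: u32,
    pub kind: ItemKind,
}

#[derive(Debug, PartialEq)]
pub struct Report {
    /// Items listed individually, largest first.
    pub rows: Vec<GarbageItem>,
    /// Byte total for the Σ row; covers only the items in `rows`.
    pub sigma_bytes: u32,
    /// `(count, bytes)` of hidden data segments, or `None` if nothing was hidden.
    pub hidden_data: Option<(usize, u32)>,
}

/// Split garbage items into listed rows and a separate data-segment summary.
/// With `show_data_segments` set, nothing is hidden and Σ covers everything.
pub fn build_report(items: &[GarbageItem], show_data_segments: bool) -> Report {
    // `partition` puts items matching the predicate first: those are the hidden ones.
    let (hidden, mut rows): (Vec<GarbageItem>, Vec<GarbageItem>) = items
        .iter()
        .cloned()
        .partition(|item| !show_data_segments && item.kind == ItemKind::Data);

    rows.sort_by(|a, b| b.bytes.cmp(&a.bytes));
    let sigma_bytes: u32 = rows.iter().map(|item| item.bytes).sum();

    let hidden_data = if hidden.is_empty() {
        None
    } else {
        let bytes: u32 = hidden.iter().map(|item| item.bytes).sum();
        Some((hidden.len(), bytes))
    };

    Report { rows, sigma_bytes, hidden_data }
}
```

Calling `build_report(&items, false)` would correspond to the default output, and `build_report(&items, true)` to the `--show-data-segments` output.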

* While we can certainly improve our ability to recognize uses of addresses, we can't find everything without literally executing the whole program (and even that might not find every edge if it doesn't encounter every possible input state...).
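To illustrate why this kind of matching produces false positives, here is a toy sketch. The `Expr` tree and both functions are hypothetical and far simpler than real wasm instructions or twiggy's actual analysis; the sketch only shows that a matcher which recognizes the literal shape `(load (const $addr))` misses an address that is computed at run time.

```rust
use std::collections::HashSet;

// Toy expression tree; hypothetical, not a real wasm instruction set.
#[derive(Debug, Clone)]
pub enum Expr {
    Const(u32),
    Add(Box<Expr>, Box<Expr>),
    Load(Box<Expr>),
}

/// Collect addresses that appear in the literal shape `(load (const $addr))`.
/// Loads whose address is computed are not recognized.
pub fn directly_loaded_addresses(expr: &Expr, found: &mut HashSet<u32>) {
    match expr {
        Expr::Const(_) => {}
        Expr::Add(lhs, rhs) => {
            directly_loaded_addresses(lhs, found);
            directly_loaded_addresses(rhs, found);
        }
        Expr::Load(addr) => {
            if let Expr::Const(a) = addr.as_ref() {
                found.insert(*a);
            } else {
                // The address is computed; only look for nested direct loads.
                directly_loaded_addresses(addr, found);
            }
        }
    }
}

/// A segment is reported as unused when no recognized address falls inside it.
pub fn segment_looks_unused(start: u32, len: u32, found: &HashSet<u32>) -> bool {
    !found.iter().any(|&a| a >= start && a - start < len)
}
```

In this sketch, a load from `(add (const 2000) (const 48))` really reads address 2048, but a segment starting at 2048 would still be reported as unused, which is the false positive described above.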

💻 Example Usage

Default.

$ twiggy garbage input.wasm
 Bytes │ Size % │ Garbage Item
───────┼────────┼───────────────────
   404 ┊  1.02% ┊ code[42]
...
   802 ┊  1.56% ┊ ... and 25 more
  2294 ┊  4.48% ┊ Σ [35 Total Rows]
   596 ┊  1.16% ┊ 14 potential false positive data segments

With `--show-data-segments`. Now the data segments should be included in the Σ row.

$ twiggy garbage input.wasm --show-data-segments
 Bytes │ Size % │ Garbage Item
───────┼────────┼───────────────────
   404 ┊  1.02% ┊ code[42]
   209 ┊  0.41% ┊ data[25]
   122 ┊  0.24% ┊ data[0]
   113 ┊  0.22% ┊ data[16]
   102 ┊  0.20% ┊ data[1]
    77 ┊  0.15% ┊ data[28]
    77 ┊  0.15% ┊ data[34]
    73 ┊  0.14% ┊ data[12]
    63 ┊  0.12% ┊ data[13]
    60 ┊  0.12% ┊ data[15]
   802 ┊  1.56% ┊ ... and 25 more
  2294 ┊  4.48% ┊ Σ [35 Total Rows]

🙌 How to fix this

@fitzgen, it's probably fine, but I'm slightly worried about showing them by default at all. For two reasons:

  1. In cases like the one I ran into, where the data segments make up about 50% of the total binary size, there's a strong motivation to investigate them, but
  2. They're not really actionable, because they can't be stripped anyway.

Though, perhaps adding a very short explanation for how to act on them would change this? Not sure.

I have what I think is a finished implementation of this locally. I still need to update the test cases, though, and I have to leave for class right now, so it will have to wait until I get back later.

Hopefully you will see a PR from me tonight!

@Cldfire great! Can't wait to see!