calyxir/calyx

[Profiling] Tracker Issue for Profiling first steps

Opened this issue · 4 comments

This issue lists out steps for profiling! (Mostly so I can organize my TODOs.) Will update as I move along.

Inspections & QoL improvements to profiler

  • Running the profiler on more (big) programs
    • Make a test suite for profiling
    • Use Cider2 benchmarks for "real programs"
      • Brainstorm ways to give actionable feedback for bigger programs
    • Fix profiler tests CI
      • Have runt test print out more things to properly debug
  • Inspections
    • Look through the waveforms for the "weird behavior"/mystery cycles
    • Minimization to see different behaviors
    • For any FSM-managed group, collect the two pieces of info below and report both to the user. The diff between them would show the mystery redundant cycles that Calyx is consuming.
      • ground truth (go, done ports)
      • FSM (what Calyx says is allowed to run)
  • QoL improvements:
    • Connect invokes & pars (cond groups for whiles?) with user identifiable info (line numbers?)
  • Visualizations
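
A minimal sketch of the ground-truth vs. FSM diff from the Inspections item above. It assumes both views have already been reduced to one boolean per clock cycle for a single group; the function name `diff_activity` and the example traces are hypothetical and not part of the profiler.

```python
def diff_activity(fsm_allowed, go_high):
    """Compare two per-cycle views of one group.

    fsm_allowed[i]: the FSM state says the group may run in cycle i.
    go_high[i]:     the group's go port is actually high in cycle i.
    Returns (fsm_only, go_only), two lists of cycle indices where the
    views disagree.
    """
    if len(fsm_allowed) != len(go_high):
        raise ValueError("both traces must cover the same cycles")
    pairs = list(zip(fsm_allowed, go_high))
    fsm_only = [i for i, (fsm, go) in enumerate(pairs) if fsm and not go]
    go_only = [i for i, (fsm, go) in enumerate(pairs) if go and not fsm]
    return fsm_only, go_only


# Hypothetical traces: the FSM holds the group's state for cycles 1-4,
# but go is only high in cycles 2-3.
fsm_allowed = [False, True, True, True, True, False]
go_high = [False, False, True, True, False, False]

fsm_only, go_only = diff_activity(fsm_allowed, go_high)
```

Here `fsm_only` holds the candidate "mystery" cycles (FSM-allowed but not actually running), and a non-empty `go_only` would point at a mismatch in the metadata.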

First Pass: Cycle-level performance info at the Calyx level

  • Metadata generation
    • Print JSON from TDCC (add another pass option to print JSON instead of the dump)
    • Write JSON to file
    • Instead of hacking through the enable assignment, directly keep track of group-to-FSM-state mappings
      • Refactor this by directly building an FSMStateInfo when processing enables.
    • Fix JSON emission to output a single JSON file at the end (when there are multiple TDCC groups, like in language-tutorial-iterate, the individual TDCC FSMs overwrite each other)
    • Right now (for optimization purposes?) the first group is morphed with the setup. Want to differentiate for more accurate counts of the first group.
    • Merge dump-fsm and dump-fsm-json for TDCC
    • Add FSM name information to JSON
    • If the par arm/component does not yield an FSM, need to output corresponding information (check go and done instead!)
    • We want information about parentage (if an FSM is managing a par arm, we want to know what the par itself is)?
  • Loading in the trace
    • Figure out what tool to use?
    • Make first pass script for reading a VCD file and outputting group lengths based on FSM values
    • Remove assumption that there is only one FSM
    • Remove assumption that each cycle takes 10ms (have a counter mechanism of how many cycles passed between X ms and Y ms)
      • Sample signals on rising/falling clock edge (comment)
    • Check out example programs with parallelism
    • Produce summary: compute the total cycles that a given group was active, the number of times it was active (the number of segments), and the average running time (which is just the quotient of the previous two values).
    • Multi-component programs:
      • Update TDCC to write one JSON file reflecting all components
      • Output cell names info using a backend instead of TDCC?
      • Fix hardcoding of "TOP.TOP.main.go"
    • Find edge cases where timing info is not actionable
    • Don't start counting clock cycles until main.go is 1
  • Make flame graphs
  • Write wrapper script around the pipeline
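
A minimal sketch of the per-group summary described in the "Produce summary" item above: total active cycles, number of segments, and their quotient. It assumes the trace has already been reduced to one boolean per clock cycle for a single group; `GroupSummary` and `summarize` are hypothetical names, not the profiler's.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    total_cycles: int  # cycles in which the group was active
    segments: int      # number of separate activations
    average: float     # total_cycles / segments (0.0 if never active)


def summarize(active):
    """Summarize a per-cycle activity trace (list of bools) for one group."""
    total = 0
    segments = 0
    previous = False
    for is_active in active:
        if is_active:
            total += 1
            if not previous:
                # An inactive -> active transition starts a new segment.
                segments += 1
        previous = is_active
    average = total / segments if segments else 0.0
    return GroupSummary(total, segments, average)


# Hypothetical trace: active for 2 cycles, idle for 1, active for 4.
trace = [True, True, False, True, True, True, True]
summary = summarize(trace)
```

For this trace the group is active for 6 cycles across 2 segments, so the average running time is 3 cycles.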

Thanks for opening this @ayakayorihiro! Could you add the "Tracker" label to this issue?

Thanks @rachitnigam! Just added the tracker label, will keep that in mind for next time :)

  • Remove assumption that each cycle takes 10ms (have a counter mechanism of how many cycles passed between X ms and Y ms)

For synchronous designs like the ones Calyx produces, I generally recommend sampling signals on a rising or falling clock edge (depending on how the testbench works). That way you stay independent of the actual timing. Here is how I find the sample point in a Rust implementation: https://github.com/ekiwi/rtl-repair/blob/71e1afc0b9a2327d008b46acd415cf3f0343a938/scripts/osdd/src/main.rs#L113

The same approach with the vcdvcd library in Python:
https://github.com/ekiwi/rtl-repair/blob/861e244c599e682efe5dbd8e3295c3b8e3590a34/scripts/calc_osdd.py#L215
https://github.com/ekiwi/rtl-repair/blob/861e244c599e682efe5dbd8e3295c3b8e3590a34/scripts/calc_osdd.py#L195
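
To illustrate the idea without depending on any particular VCD library, here is a rough sketch of edge-based sampling in plain Python. It is not taken from the linked code. It assumes each signal is available as a time-sorted list of `(time, value)` changes with string values, and it samples the last value recorded at or before each rising edge; whether you want the value at the edge or just before it depends on how the testbench dumps signals. The helper names and the example data are hypothetical.

```python
from bisect import bisect_right


def rising_edges(clock_changes):
    """Return the timestamps at which the clock goes from "0" to "1"."""
    edges = []
    previous = None
    for time, value in clock_changes:
        if previous == "0" and value == "1":
            edges.append(time)
        previous = value
    return edges


def sample_at(signal_changes, times):
    """Return the signal's value at each time in `times`.

    The value at time t is the one set by the last change at or before t;
    None means the signal had not been assigned yet.
    """
    change_times = [t for t, _ in signal_changes]
    samples = []
    for t in times:
        index = bisect_right(change_times, t)
        samples.append(signal_changes[index - 1][1] if index else None)
    return samples


# Hypothetical value-change lists, shaped like (time, value) pairs from a VCD.
clk = [(0, "0"), (5, "1"), (10, "0"), (15, "1"), (20, "0"), (25, "1")]
go = [(0, "0"), (12, "1"), (22, "0")]

edges = rising_edges(clk)
go_per_cycle = sample_at(go, edges)
```

With this shape, the cycle count is simply `len(edges)`, so nothing depends on how many time units one clock period takes.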

Thanks @ekiwi! I'll take a stab at it, following your work with the vcdvcd library :)