Issues
Issue with FlexFlow LLM Compilation and Generation
#1444 opened by QAZWSX0827 - 2
CUDA testing support in `proj`
#1537 opened by lockshaw - 0
Fix input, weight, noop in local execution
#1422 opened by reyna-abhyankar - 0
Add tests for managed_ff_stream and handle
#1435 opened by oOTigger - 0
Update rewriting search in unity algorithm
#1501 opened by lockshaw - 0
Fix embedding kernel refactor
#1443 opened by reyna-abhyankar - 0
Fix handling for device specific and binding arbitrary device specific types
#1462 opened by reyna-abhyankar - 0
Update files in `kernels` so that the `CMakeLists.txt` src pattern can be changed to `src/cuda/*.cu`
#1502 opened by lockshaw - 1
Add a `SubstitutionBuilder` to make creating `Substitution`s less verbose and error-prone
#1473 opened by lockshaw - 0
Figure out what to do with `LazyLabelledDataflowGraph`
#1513 opened by lockshaw - 0
Improve unit tests for `ParallelComputationGraphBuilder` and `ComputationGraphBuilder`
#1474 opened by lockshaw - 0
Rename `filtermap_keys` and `filtermap_values` to `filtrans_keys` and `filtrans_values` for consistency
#1514 opened by lockshaw - 0
Add ability to document dtgen structs using doxygen
#1475 opened by lockshaw - 0
Add a function in `op-attrs` that takes an `UnmappedOpCostEstimateKey` and generates an `OperatorTaskSpace`
#1520 opened by lockshaw - 0
Implement `is_valid_substitution`
#1477 opened by lockshaw - 0
Replace/simplify DimOrdered
#1483 opened by lockshaw - 0
Remove inheritance structure from graph objects
#1484 opened by lockshaw - 0
Add `num_inputs` check to `get_output_shapes(PCGOperatorAttrs, std::vector<ParallelTensorShape>)`
#1496 opened by lockshaw - 0
CUDA GPU CI for `repo-refactor`
#1536 opened by lockshaw - 0
Add intermediate interface between `ComputationGraphBuilder` and the raw graph interface for testing
#1499 opened by lockshaw - 0
Add weight handling for SP decomposition of PCGs
#1500 opened by lockshaw - 0
Standardize kernel function signatures
#1540 opened by lockshaw - 0
Performance issue when batch_size is 32
#1529 opened by letheantest - 0
Error when I use larger batch size for spec-infer
#1491 opened by lhr-30 - 1
Add function to divide list of PCG inputs into inputs and weights in `op-attrs`
#1469 opened by lockshaw - 2
Tokenizer not optional
#1515 opened by stelleg - 5
multinode python: Legion error 67 alongside NCCL errors.
#1480 opened by stelleg - 4
cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu
#1497 opened by Spacecat-zwh - 2
Questions about the measurement of the latency
#1454 opened by QAZWSX0827 - 0
Add support for Tiktoken tokenizer in Request Manager
#1438 opened by Flechman - 0
Issue with debugging using cuda-gdb
#1451 opened by Liu-Weijie - 0
Sorry, it was a typo
#1445 opened by QAZWSX0827 - 0
How to enable reduction parallel in substitutions?
#1431 opened by weilinquan - 0
How to enable reduction parallel in substitutions?
#1432 opened by weilinquan - 0