flexflow/FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ · Apache-2.0

Issues

Issue with FlexFlow LLM Compilation and Generation
#1444 opened 7 months ago by QAZWSX0827
2
CUDA testing support in `proj`
#1537 opened 3 months ago by lockshaw
2
Fix input, weight, noop in local execution
#1422 opened 8 months ago by reyna-abhyankar
0
Add tests for managed_ff_stream and handle
#1435 opened 7 months ago by oOTigger
0
Check whether an invocation is valid against a signature
#1442 opened 7 months ago by reyna-abhyankar
0
Update rewriting search in unity algorithm
#1501 opened 5 months ago by lockshaw
0
Fix embedding kernel refactor
#1443 opened 7 months ago by reyna-abhyankar
0
Fix handling for device specific and binding arbitrary device specific types
#1462 opened 6 months ago by reyna-abhyankar
0
Update files in `kernels` so that the `CMakeLists.txt` src pattern can be changed to `src/cuda/*.cu`
#1502 opened 5 months ago by lockshaw
0
Move `OpTaskSignature` to dtgen, away from visitable
#1465 opened 6 months ago by reyna-abhyankar
1
Add `hash` and `fmt` for `TaskBinding` in local execution
#1503 opened 5 months ago by reyna-abhyankar
0
Fix profiling wrapper to avoid resource handle issues
#1504 opened 5 months ago by reyna-abhyankar
0
Move `TensorSlotsBackingWithoutAddresses` over to dtgen
#1468 opened 6 months ago by reyna-abhyankar
0
Add kernels/local-execution support for new `BatchNormAttrs` fields
#1505 opened 5 months ago by lockshaw
0
Add a `SubstitutionBuilder` to make creating `Substitution`s less verbose and error-prone
#1473 opened 6 months ago by lockshaw
0
Figure out what to do with `LazyLabelledDataflowGraph`
#1513 opened 5 months ago by lockshaw
0
Improve unit tests for `ParallelComputationGraphBuilder` and `ComputationGraphBuilder`
#1474 opened 6 months ago by lockshaw
0
Rename `filtermap_keys` and `filtermap_values` to `filtrans_keys` and `filtrans_values` for consistency
#1514 opened 5 months ago by lockshaw
0
Add ability to document dtgen structs using doxygen
#1475 opened 6 months ago by lockshaw
0
Implement `as_dot` for graphs in `utils/graph` in a less hacky way
#1476 opened 6 months ago by lockshaw
0
Add a function in `op-attrs` that takes an `UnmappedOpCostEstimateKey` and generates an `OperatorTaskSpace`
#1520 opened 4 months ago by lockshaw
0
Implement `is_valid_substitution`
#1477 opened 6 months ago by lockshaw
0
Factor out common n-dimensional coordinate/index primitives
#1528 opened 4 months ago by lockshaw
0
Replace/simplify DimOrdered
#1483 opened 6 months ago by lockshaw
0
Add ability to get `vector` of all enum values to `proj dtgen`
#1478 opened 6 months ago by lockshaw
0
Remove inheritance structure from graph objects
#1484 opened 6 months ago by lockshaw
0
Add `num_inputs` check to `get_output_shapes(PCGOperatorAttrs, std::vector<ParallelTensorShape>)`
#1496 opened 5 months ago by lockshaw
0
CUDA GPU CI for `repo-refactor`
#1536 opened 3 months ago by lockshaw
0
`CHECK_VALID_OP_ATTR` is not that necessary anymore and should be removed
#1498 opened 5 months ago by lockshaw
0
Add intermediate interface between `ComputationGraphBuilder` and the raw graph interface for testing
#1499 opened 5 months ago by lockshaw
0
Add weight handling for SP decomposition of PCGs
#1500 opened 5 months ago by lockshaw
0
Standardize kernel function signatures
#1540 opened 3 months ago by lockshaw
0
Performance issue when batch_size is 32
#1529 opened 4 months ago by letheantest
0
Stuck During Tree-Based Speculative Decoding with OPT Model
#1526 opened 4 months ago by SeungjaeLim
0
Error when I use larger batch size for spec-infer
#1491 opened 5 months ago by lhr-30
1
Add function to divide list of PCG inputs into inputs and weights in `op-attrs`
#1469 opened 4 months ago by lockshaw
1
Tokenizer not optional
#1515 opened 4 months ago by stelleg
2
multinode python: Legion error 67 alongside NCCL errors.
#1480 opened 5 months ago by stelleg
5
cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu
#1497 opened 5 months ago by Spacecat-zwh
4
Questions about the measurement of the latency
#1454 opened 7 months ago by QAZWSX0827
2
Is it possible to print the simulated running time and simulation time cost?
#1463 opened 6 months ago by tyn513
0
Inference with TP>1 and disaggregated qkv projection & attention operator
#1447 opened 6 months ago by yingchen21
1
Issue with C++ inference for model meta-llama/Llama-2-70b-hf
#1452 opened 7 months ago by DDDDDYTS
0
Add support for Tiktoken tokenizer in Request Manager
#1438 opened 7 months ago by Flechman
0
Issue with debugging using cuda-gdb
#1451 opened 7 months ago by Liu-Weijie
0
example inference test in cpp_inference_tests.sh does not terminate
#1440 opened 7 months ago by Bob-Chen222
0
Sorry, it was a typo
#1445 opened 7 months ago by QAZWSX0827
0
How to enable reduction parallel in substitutions?
#1431 opened 7 months ago by weilinquan
0
How to enable reduction parallel in substitutions?
#1432 opened 7 months ago by weilinquan
0
Request for Graph Pruning Algorithm Code Location in FlexLLM
#1425 opened 8 months ago by zbtrs
0