Kyrie-Curiosities-Cabinet

Kyrie's Cabinet of Curiosities

  1. Compiler for federated learning (FL), jointly optimizing computation and communication

  2. Is it possible to adopt imitation learning or other IRL approaches to mimic DNN-compiler-generated kernels, shortening the long tuning process and boosting cross-device compilation?

  3. NeRF; Q4ML / ML4Q; quantum compilers (a dazzling space). See "Closing the Gap between Quantum Algorithms and Machines with Hardware-Software Co-Design"

  4. Sim2Real runtime engine, e.g. for MineDojo or autonomous driving? [X] Nature of simulation: massive data, exploration cost, distributed training <-> the real world.

  5. Carbon-aware DNN Compiler
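The imitation-learning idea in item 2 can be caricatured in a few lines: treat logged (workload features -> best schedule) pairs from a prior autotuning run as expert demonstrations, and serve new workloads by similarity lookup instead of re-running the search. All features and schedule names below are invented for illustration:

```python
import numpy as np

# Hypothetical sketch of "imitate the tuner": reuse expert schedule choices
# from a prior tuning log instead of searching from scratch.
log_feats = np.array([[128, 128, 128],   # (M, N, K) of previously tuned matmuls
                      [512, 512, 512],
                      [1024, 64, 256]], dtype=float)
log_scheds = ["tile8x8", "tile32x32", "tile16x4"]  # the tuner's choices

def imitate(feat):
    """Pick the expert schedule of the most similar logged workload
    (nearest neighbor in log2 feature space)."""
    d = np.linalg.norm(np.log2(log_feats) - np.log2(np.asarray(feat, float)), axis=1)
    return log_scheds[int(np.argmin(d))]

print(imitate([600, 480, 512]))  # reuses the schedule of the closest record
```

A real system would replace the lookup with a learned policy over richer kernel features, but the zero-search serving path is the point.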

From OctoML (clipped quote): "The OctoML Platform has always provided automation for exploring multiple model acceleration techniques. Via our new TVM-ONNX Runtime integration, …"

  1. A Duet-style NN compiler + runtime for dual-device inference: heterogeneous subgraph optimization, tuned to the parallel performance of each device.

  2. Diffusion model survey: "Diffusion Models: A Comprehensive Survey of Methods and Applications"

  3. Crypto | Privacy | Security + Accelerator

  • CryptGPU
  • PolyMPCNet
  • Crypten
  • Cheetah
  4. On-device AI

  1. Summarizing CPU and GPU Design Trends with Product Data

  2. prompt

  3. webGPU (https://github.com/mlc-ai/web-stable-diffusion)

  4. Zeroth-order optimization (black-box ML)

  5. Unified abstractions for IoT data streams.

  6. MLIR for heterogeneous-device ML flows (DNN and non-DNN operators and control flow)

  7. Tile IR => mixed-tile IR? Cross-domain/modality IR? Pain point: redundant code optimization across multi-layer IRs. IREE supports Vulkan/SPIR-V for mobile GPUs and CPUs => compiler support for the data flow in end-to-end autonomous vehicles

  8. LLMs running alongside CNNs/DNNs: co-running transformer and CNN workloads.

  9. Transfer the stronger representation capacity of large models to small online models.

  10. NPUs running LLMs, and the energy problem.

  11. The sparsity of LLM tokens and the dynamics of input tokens mean that not all communication in distributed inference is necessary.

  12. Machine unlearning + out-of-distribution (OOD) data

  13. Data owners are unwilling to provide labels; routing everything through the cloud is too slow and overloads the cloud.

  14. Should unknown-task detection run on the edge, or in the cloud?

  15. How much edge data, kept on the edge side, is enough to fine-tune a good vertical (domain-specific) large model?

  16. Simulator for edge?

  17. Large-model compilers for the edge, and their portability

  18. Fine-tuning edge LLMs + compiler support

  19. rTile, rGraph: redefine the basic unit. Don't treat the op as the unit; view the model as a load-compute-store dataflow. Large models on the edge as dynamic NNs.

  20. Develop compiler strategies that can efficiently distribute model computations between edge and server GPUs, considering factors such as network latency, communication overhead, and load balancing.

  21. To facilitate this mapping, WELDER provides an abstracted accelerator device with hierarchical memory layers.

  22. A tensor-program version of ImageBind / Meta-Transformer: tuned records of the same ops/graphs (objects) on different hardware (modalities). Goal: unify hardware intrinsics; feature: cost model. Analogy: from SVMs on images to today's encoders.

  23. Explore Data Placement Algorithm for Balanced Recovery Load Distribution

  24. zpoline: a system call hook mechanism based on binary rewriting

  25. TensorIR, code embeddings, LLMs

  26. Decompiling: executables -> IR -> another device, end to end. Take a model compiled for an NVIDIA TX2 and migrate it to another device with one click.
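The zeroth-order optimization note above refers to gradient estimation from function evaluations alone, which is what makes it attractive for black-box ML. A minimal NumPy sketch of the two-point estimator, g ≈ mean[(f(x + mu*u) - f(x - mu*u)) / (2*mu) * u] with u ~ N(0, I):

```python
import numpy as np

# Zeroth-order (black-box) optimization: estimate the gradient of f using
# only function evaluations, then run plain gradient descent on the estimate.
def zo_gradient(f, x, mu=1e-3, n_samples=20, seed=0):
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape[0])
        # Two-point finite-difference estimate along random direction u.
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / n_samples

# Minimize a black-box quadratic without ever querying its true gradient.
target = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.sum((x - target) ** 2))

x = np.zeros(3)
for _ in range(300):
    x = x - 0.05 * zo_gradient(f, x)
# x drifts toward `target` using function values alone.
```

The same estimator underlies SPSA-style methods and recent memory-efficient LLM fine-tuning work; only the query budget and smoothing radius `mu` change.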
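The token-sparsity note above (item 11) suggests that distributed inference need not ship every token's activations. A hypothetical top-k sketch, with all shapes and magnitudes invented for illustration:

```python
import numpy as np

# Before shipping per-token activations to a remote shard, keep only the
# top-k tokens by activation norm and send (indices, values); the receiver
# scatters them back and treats dropped tokens as zero.
def sparsify_tokens(acts, k):
    """acts: (tokens, hidden). Returns indices and values of the k most
    active tokens, i.e. the payload actually communicated."""
    norms = np.linalg.norm(acts, axis=1)
    idx = np.argsort(norms)[-k:]
    return idx, acts[idx]

def densify_tokens(idx, vals, n_tokens, hidden):
    out = np.zeros((n_tokens, hidden))
    out[idx] = vals
    return out

rng = np.random.default_rng(0)
acts = rng.standard_normal((16, 64))
acts[3:] *= 0.01                     # most tokens carry little signal
idx, vals = sparsify_tokens(acts, k=3)
recon = densify_tokens(idx, vals, 16, 64)
# Only 3 of 16 token vectors cross the wire, yet the dominant tokens survive.
```

Whether zeroing the dropped tokens is acceptable depends on the layer; the research question is exactly which communication is redundant, and when.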
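The edge/server distribution idea (item 20) can be made concrete with a toy cost model: enumerate cut points in a linear layer chain and charge each candidate its compute time on both sides plus the activation-transfer cost at the cut. All numbers below are assumed:

```python
# Toy sketch: choose the layer-chain split between an edge GPU and a server
# GPU that minimizes end-to-end latency.
edge_ms   = [4.0, 8.0, 20.0, 40.0]  # per-layer latency on the edge (assumed)
server_ms = [1.0, 1.5, 2.0, 2.5]    # per-layer latency on the server (assumed)
act_mb    = [8.0, 4.0, 2.0, 1.0]    # activation size after each layer, MB (assumed)
input_mb  = 16.0                    # model input size, MB (assumed)
bw_mbps   = 1000.0                  # edge -> server uplink bandwidth
rtt_ms    = 5.0                     # network round-trip latency

def transfer_ms(mb):
    """Time to move `mb` megabytes over the uplink, plus one round trip."""
    return rtt_ms + mb * 8.0 / bw_mbps * 1000.0

def split_latency(s):
    """Layers [0, s) run on the edge, layers [s, N) on the server."""
    n = len(edge_ms)
    compute = sum(edge_ms[:s]) + sum(server_ms[s:])
    if s == n:                      # fully on-edge: nothing crosses the network
        return compute
    sent = input_mb if s == 0 else act_mb[s - 1]
    return compute + transfer_ms(sent)

best = min(range(len(edge_ms) + 1), key=split_latency)
print(best, split_latency(best))    # cut after layer 2 in this toy setting
```

A real compiler pass would search over graph partitions rather than a single cut, and re-solve when bandwidth or server load changes; the cost-model structure stays the same.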