taichi: A C++ repository from xumingkuan

Docs | Tutorial | DiffTaichi | Examples | Contribute | Forum

# Python 3.6/3.7 needed

# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly

# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0

# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1

Contribution Guidelines

Short-term goals

(Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (By Dec 2019)
- The only missing features compared to the old source-to-source backends:
  - Vectorization on CPUs. Since most performance-sensitive users run on GPUs (CUDA), this has low priority.
  - Automatic shared memory utilization. Postponed until Feb/March 2020.
(Done) Redesign & reimplement (GPU) memory allocator (by the end of Jan 2020)
(WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (Hopefully by mid Feb, 2020. Current progress: setting up/tuning for final benchmarks)

Updates

(Feb 14, 2020) v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users! (by Ye Kuang [k-ye])
- Just initialize your program with ti.init(..., arch=ti.metal) and run Taichi on your Mac GPUs!
- A few takeaways if you do want to use the Metal backend:
  - For now, the Metal backend only supports dense SNodes and 32-bit data types. It doesn't support ti.random() or print().
  - Pre-2015 models may exhibit undefined behavior under certain conditions (e.g. read-after-write). In our tests, the memory order within a single GPU thread appeared to become inconsistent on these models.
  - The [] operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a numpy array via to_numpy() as a workaround. For writes, consider first generating the data into a numpy array, then copying that to the Taichi variables as a whole.
  - Do NOT expect a performance boost yet; we are still profiling and tuning the new backend. (So far we have only seen a big performance improvement on a 2015 13-inch MBP.)
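
The numpy workaround described in the notes above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the v0.5.0-era tensor API (ti.var with a shape argument, and a from_numpy() counterpart to the to_numpy() mentioned above). The function name roundtrip_via_numpy and the size n are hypothetical and only serve the illustration.

```python
def roundtrip_via_numpy(n=1024):
    """Fill a Taichi tensor from NumPy, then read it back in one transfer."""
    # Imports are local so this sketch can be loaded without Taichi installed.
    import numpy as np
    import taichi as ti

    # Select the Metal backend, as described in the release note.
    ti.init(arch=ti.metal)

    # Assumption: ti.var(dtype, shape=...) as in v0.5.0.
    # A 32-bit type is used because Metal only supports 32-bit data types.
    x = ti.var(ti.f32, shape=n)

    # Write: build the data in NumPy first, then copy it over as a whole.
    x.from_numpy(np.arange(n, dtype=np.float32))

    # Read: dump everything at once instead of indexing x[i] in a Python loop.
    return x.to_numpy()
```

Calling roundtrip_via_numpy() on a Mac with a Metal-capable GPU should hand back the same values that were written, with only two bulk copies instead of n slow [] accesses.
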
(Feb 12, 2020) v0.4.6 released.
- (For compiler developers) An error will be raised when TAICHI_REPO_DIR is not a valid path (by Yubin Peng [archibate])
- Fixed a CUDA backend deadlock bug
- Added test selectors ti.require() and ti.archs_excluding() (by Ye Kuang [k-ye])
- ti.init(**kwargs) now takes a parameter debug=True/False, which turns on debug mode if true
- ... or use TI_DEBUG=1 to turn on debug mode non-intrusively
- Fixed ti.profiler_clear
- Added GUI.line(begin, end, color, radius) and ti.rgb_to_hex
- Renamed ti.trace (Matrix trace) to ti.tr. ti.trace is now for logging with ti.TRACE level
- Fixed return value of ti test_cpp (thanks to Ye Kuang [k-ye])
- Raised the default logging level to ti.INFO instead of trace to make the world quieter
- General performance/compatibility improvements
- Doc updated
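
To illustrate what a helper like ti.rgb_to_hex is for, here is a pure-Python sketch of packing a color into a single integer. It assumes the input is an (r, g, b) triple of floats in [0, 1] and the output is a 0xRRGGBB integer; this changelog does not state either detail. The name pack_rgb is hypothetical, and this is not presented as Taichi's implementation.

```python
def pack_rgb(rgb):
    """Pack an (r, g, b) triple of floats in [0, 1] into a 0xRRGGBB integer."""
    def to_byte(x):
        # Scale to 0..255 and clamp out-of-range inputs.
        return min(255, max(0, int(x * 255)))

    r, g, b = (to_byte(c) for c in rgb)
    return (r << 16) | (g << 8) | b
```

For example, pack_rgb((1.0, 0.0, 0.0)) yields 0xFF0000, which has the shape of value one would expect to pass as a color argument to calls such as GUI.line.
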
(Feb 6, 2020) v0.4.5 released.
- ti.init(arch=..., print_ir=..., default_fp=..., default_ip=...) now supported. ti.cfg.xxx is deprecated
- Immediate data layout specification supported after ti.init. No need to wrap data layout definition with @ti.layout anymore (unless you intend to do so)
- ti.is_active, ti.deactivate, SNode.deactivate_all supported in the new LLVM x64/CUDA backend. Example
- Experimental Windows non-UTF-8 path fix (by Yubin Peng [archibate])
- ti.global_var (which duplicates ti.var) is removed
- ti.Matrix.rotation2d(angle) added
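
For readers unfamiliar with the term, the matrix that a function like ti.Matrix.rotation2d(angle) presumably builds is the standard 2D rotation matrix. The sketch below uses plain Python lists rather than Taichi matrices, and assumes the usual counter-clockwise convention with the angle in radians; the helper names rotation2d and apply are hypothetical.

```python
import math


def rotation2d(angle):
    """Return the 2x2 counter-clockwise rotation matrix for `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s],
            [s, c]]


def apply(m, v):
    """Multiply a 2x2 matrix `m` by a 2-vector `v`."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]
```

Rotating the unit vector [1, 0] by math.pi / 2 with these helpers gives approximately [0, 1], up to floating-point error.
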
(Feb 5, 2020) v0.4.4 released.
- For developers: ffi-navigator support [doc]. (by masahi)
- Fixed f64 precision support of sin and cos on CUDA backends (by Kenneth Lozes [KLozes])
- Make Profiler print the arch name in its title (by Ye Kuang [k-ye])
- Tons of invisible contributions by Ye Kuang [k-ye], for the WIP Metal backend
- Profiler working on CPU devices. To enable, ti.cfg.enable_profiler = True. Call ti.profiler_print() to print kernel running times
- General performance improvements
(Feb 3, 2020) v0.4.3 released.
- GUI.circles 2.4x faster
- General performance improvements
(Feb 2, 2020) v0.4.2 released.
- GUI framerates are now more stable
- Optimized OffloadedRangeFor with const bounds. Lightweight programs such as mpm88.py are 30% faster on CUDA due to fewer kernel launches
- Optimized CPU parallel range-for performance
(Jan 31, 2020) v0.4.1 released.
- Fixed an autodiff bug introduced in v0.3.24. Please update if you are using Taichi differentiable programming.
- Updated Dockerfile (by Shenghang Tsai [jackalcooper])
- pbf2d.py visualization performance boosted (by Ye Kuang [k-ye])
- Fixed GlobalTemporaryStmt codegen
(Jan 30, 2020) v0.4.0 released.
- Memory allocator redesigned
- Struct-fors over purely dense data structures are now demoted to range-fors, which are faster since no element list generation is needed
- Python 3.5 support is dropped. Please use Python 3.6 (pip), 3.7 (pip), or 3.8 (Windows: pip; OS X & Linux: build from source) (by Chujie Zeng [Psycho7])
- ti.deactivate now supported on sparse data structures
- GUI.circles (batched circle drawing) performance improved by 30x
- Minor bug fixes (by Yubin Peng [archibate], Ye Kuang [k-ye])
- Doc updated
Full changelog
