/taichi

Productive programming language for portable, high-performance, sparse & differentiable computing

Primary language: C++. License: MIT.

Chat: https://gitter.im/taichi-dev/Lobby
# Python 3.6/3.7 needed

# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly

# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0

# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1
Build and PyPI status are tracked for Linux (CUDA), OS X (10.14+), and Windows.

Related papers

Short-term goals

  • (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (By Dec 2019)
    • The only missing features compared to the old source-to-source backends:
      • Vectorization on CPUs. Since most performance-focused users run on GPUs (CUDA), this has low priority.
      • Automatic shared memory utilization. Postponed until Feb/March 2020.
  • (Done) Redesign & reimplement (GPU) memory allocator (by the end of Jan 2020)
  • (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (Hopefully by mid Feb, 2020. Current progress: setting up/tuning for final benchmarks)

Updates

  • (Feb 14, 2020) v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users! (by Ye Kuang [k-ye])

    • Just initialize your program with ti.init(..., arch=ti.metal) and run Taichi on your Mac GPUs!
    • A few takeaways if you do want to use the Metal backend:
      • For now, the Metal backend only supports dense SNodes and 32-bit data types. It doesn't support ti.random() or print().
      • Pre-2015 models may show undefined behavior under certain conditions (e.g. read-after-write). In our tests, the memory order within a single GPU thread appeared to become inconsistent on these models.
      • The [] operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a numpy array via to_numpy() as a workaround. For writes, consider first generating the data into a numpy array, then copying that to the Taichi variables as a whole.
      • Do NOT expect a performance boost yet; we are still profiling and tuning the new backend. (So far we have only seen a big performance improvement on a 2015 MBP 13-inch model.)
  • (Feb 12, 2020) v0.4.6 released.

    • (For compiler developers) An error will be raised when TAICHI_REPO_DIR is not a valid path (by Yubin Peng [archibate])
    • Fixed a CUDA backend deadlock bug
    • Added test selectors ti.require() and ti.archs_excluding() (by Ye Kuang [k-ye])
    • ti.init(**kwargs) now takes a parameter debug=True/False, which turns on debug mode if true
    • ... or use TI_DEBUG=1 to turn on debug mode non-intrusively
    • Fixed ti.profiler_clear
    • Added GUI.line(begin, end, color, radius) and ti.rgb_to_hex
    • Renamed ti.trace (Matrix trace) to ti.tr. ti.trace is now for logging with ti.TRACE level
    • Fixed return value of ti test_cpp (thanks to Ye Kuang [k-ye])
    • Raised the default logging level to ti.INFO instead of trace to make the world quieter
    • General performance/compatibility improvements
    • Doc updated
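As a rough illustration of what a helper like ti.rgb_to_hex computes, the sketch below packs an (r, g, b) triple into a single 0xRRGGBB integer. It is plain Python, not Taichi's implementation: the name `rgb_to_hex_sketch`, the [0, 1] input range, and the truncate-and-clamp behavior are all assumptions made for the example.

```python
def rgb_to_hex_sketch(rgb):
    """Pack an (r, g, b) triple of floats in [0, 1] into a 0xRRGGBB integer.

    Hypothetical stand-in for illustration only; it does not call Taichi.
    """
    def to_byte(x):
        # Scale to 0..255, truncate, and clamp out-of-range inputs.
        return min(255, max(0, int(x * 255)))

    r, g, b = (to_byte(c) for c in rgb)
    return (r << 16) | (g << 8) | b


orange = rgb_to_hex_sketch((1.0, 0.5, 0.0))
print(hex(orange))  # prints 0xff7f00
```

A packed integer like this is a compact way to pass a color around as one value instead of three.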
  • (Feb 6, 2020) v0.4.5 released.

    • ti.init(arch=..., print_ir=..., default_fp=..., default_ip=...) now supported. ti.cfg.xxx is deprecated
    • Immediate data layout specification supported after ti.init. No need to wrap data layout definition with @ti.layout anymore (unless you intend to do so)
    • ti.is_active, ti.deactivate, SNode.deactivate_all supported in the new LLVM x64/CUDA backend. Example
    • Experimental Windows non-UTF-8 path fix (by Yubin Peng [archibate])
    • ti.global_var (which duplicates ti.var) is removed
    • ti.Matrix.rotation2d(angle) added
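For readers unfamiliar with 2D rotation matrices, here is a minimal plain-Python sketch of the matrix that a function such as ti.Matrix.rotation2d(angle) conventionally represents. It assumes the standard counter-clockwise convention with the angle in radians; `rotation2d_sketch` and `apply` are hypothetical helpers written for this example and do not call Taichi.

```python
import math


def rotation2d_sketch(angle):
    """Return the 2x2 counter-clockwise rotation matrix for `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s],
            [s, c]]


def apply(m, v):
    """Multiply a 2x2 matrix (nested lists) by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Rotating the x unit vector by 90 degrees gives (approximately) the y unit vector.
x, y = apply(rotation2d_sketch(math.pi / 2), [1.0, 0.0])
```

Because of floating-point rounding, `x` comes out as a tiny number close to zero rather than exactly zero.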
  • (Feb 5, 2020) v0.4.4 released.

    • For developers: ffi-navigator support [doc]. (by masahi)
    • Fixed f64 precision support of sin and cos on CUDA backends (by Kenneth Lozes [KLozes])
    • Make Profiler print the arch name in its title (by Ye Kuang [k-ye])
    • Tons of invisible contributions by Ye Kuang [k-ye], for the WIP Metal backend
    • Profiler now works on CPU devices. To enable it, set ti.cfg.enable_profiler = True. Call ti.profiler_print() to print kernel running times
    • General performance improvements
  • (Feb 3, 2020) v0.4.3 released.

    • GUI.circles 2.4x faster
    • General performance improvements
  • (Feb 2, 2020) v0.4.2 released.

    • GUI framerates are now more stable
    • Optimized OffloadedRangeFor with const bounds. Light computation programs such as mpm88.py are 30% faster on CUDA due to reduced kernel launches
    • Optimized CPU parallel range for performance
  • (Jan 31, 2020) v0.4.1 released.

    • Fixed an autodiff bug introduced in v0.3.24. Please update if you are using Taichi differentiable programming.
    • Updated Dockerfile (by Shenghang Tsai [jackalcooper])
    • pbf2d.py visualization performance boosted (by Ye Kuang [k-ye])
    • Fixed GlobalTemporaryStmt codegen
  • (Jan 30, 2020) v0.4.0 released.

    • Memory allocator redesigned
    • Struct-fors with pure dense data structures will be demoted into a range-for, which is faster since no element list generation is needed
    • Python 3.5 support is dropped. Please use Python 3.6 (pip), 3.7 (pip), or 3.8 (Windows: pip; OS X & Linux: build from source) (by Chujie Zeng [Psycho7])
    • ti.deactivate now supported on sparse data structures
    • GUI.circles (batched circle drawing) performance improved by 30x
    • Minor bug fixes (by Yubin Peng [archibate], Ye Kuang [k-ye])
    • Doc updated
  • Full changelog