Xorbits

Scalable Python DS & ML, in an API-compatible and lightning-fast way.

Primary language: Python. License: Apache License 2.0.



What is Xorbits?

Xorbits is an open-source computing framework that makes it easy to scale data science and machine learning workloads, from data preprocessing to tuning, training, and model serving. Xorbits can use multiple CPU cores or GPUs to accelerate computation on a single machine, or scale out to thousands of machines to process terabytes of data and train or serve large models.

Xorbits provides a suite of best-in-class libraries for data scientists and machine learning practitioners, and lets them scale tasks without extensive knowledge of the underlying infrastructure.

Xorbits features a familiar Python API that supports a variety of libraries, including pandas, NumPy, PyTorch, and XGBoost. By changing a single line of code, from import pandas as pd to import xorbits.pandas as pd, you can seamlessly scale an existing pandas workflow with Xorbits.
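A minimal sketch of that one-line change follows. It falls back to plain pandas when Xorbits is not installed, which works only because the two APIs match. The fallback and the to_local helper are additions for this example, and the helper assumes that Xorbits objects expose a to_pandas() method for materializing results.

```python
# Before: import pandas as pd
try:
    import xorbits.pandas as pd  # the one-line change: same API, scaled execution
except ImportError:
    # Fallback for this sketch only: the identical code runs on plain pandas.
    import pandas as pd


def to_local(obj):
    """Return a plain pandas object (assumes Xorbits objects provide to_pandas())."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
totals = df.groupby("key")["value"].sum()  # unchanged pandas-style code
print(to_local(totals))
```

Everything after the import line is ordinary pandas code; only the module that provides pd changes.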


Why Xorbits?

As ML and AI workloads grow in complexity, their computational demands soar. Single-node development environments such as your laptop are convenient, but they fall short when it comes to meeting these scaling demands.

Seamlessly scale your workflow from laptop to cluster

To use Xorbits, you do not need to specify how to distribute the data or even know how many cores your system has. You can keep using your existing notebooks and still enjoy a significant speed boost from Xorbits, even on your laptop.
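As a sketch of moving the same notebook from a laptop to a cluster, the snippet below calls xorbits.init() with no arguments to start a local session, or with address= to connect to an existing cluster. The XORBITS_ADDRESS environment variable and both helper functions are hypothetical conventions for this example, not part of Xorbits itself.

```python
import os
from typing import Optional

# Hypothetical convention for this sketch: XORBITS_ADDRESS holds the cluster
# endpoint. Xorbits does not read this variable on its own.
ADDRESS_ENV_VAR = "XORBITS_ADDRESS"


def cluster_address() -> Optional[str]:
    """Return the configured cluster endpoint, or None to run locally."""
    address = os.environ.get(ADDRESS_ENV_VAR, "").strip()
    return address or None


def start_xorbits() -> None:
    """Start a local session or connect to a cluster; later code is unchanged."""
    import xorbits  # imported lazily so this module loads without Xorbits installed

    address = cluster_address()
    if address is None:
        # Local session: no core counts or data partitioning to specify.
        xorbits.init()
    else:
        # Connect to an already running Xorbits cluster at this endpoint.
        xorbits.init(address=address)
```

Calling start_xorbits() once at the top of a notebook keeps the choice of laptop or cluster in configuration, so the data processing code that follows stays the same.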

Process large datasets that pandas can't

Xorbits can leverage all of your computational cores. It is especially beneficial for handling larger datasets, where pandas may slow down or run out of memory.

Lightning-fast speed

According to our benchmark tests, Xorbits surpasses other popular frameworks that implement the pandas API in both speed and scalability. See our performance comparison, explanation, and research paper.

Leverage the Python ecosystem with native integrations

Xorbits aims to take full advantage of the entire ML ecosystem, offering native integration with pandas and other libraries.

Where to get it?

The source code is currently hosted on GitHub at: https://github.com/xorbitsai/xorbits

Binary installers for the latest released version are available at the Python Package Index (PyPI).

```bash
# PyPI
pip install xorbits
```


License

Apache License 2.0

Roadmaps

The main goals we want to achieve in the future include the following:

  • Transitioning from pandas-native to Arrow-native data storage, which will substantially reduce memory cost and is friendlier to compute engines.
  • Introducing native engines that leverage technologies like vectorization and codegen to accelerate computations.
  • Scaling as many libraries and algorithms as possible!

More detailed roadmaps will be revealed soon. Stay tuned!

Relationship with Mars

The creators of Xorbits are mainly the creators of Mars, and we currently build Xorbits on Mars to avoid duplicated work. However, the vision of Xorbits means that putting everything on Mars is not appropriate; instead, we need a new project to better support the roadmap. In the future, we will replace some core internal components with new ones that we will propose. Stay tuned!

Getting involved

Platform         Purpose
GitHub Issues    Reporting bugs and filing feature requests.
Stack Overflow   Asking questions about how to use Xorbits.
Slack            Collaborating with other Xorbits users.

Citing Xorbits

If Xorbits helps you, please cite our paper using the following BibTeX entry:

@inproceedings{lu2024Xorbits,
  title = {Xorbits: Automating Operator Tiling for Distributed Data Science},
  shorttitle = {Xorbits},
  booktitle = {2024 {{IEEE}} 40th {{International Conference}} on {{Data Engineering}} ({{ICDE}})},
  author = {Lu, Weizheng and He, Kaisheng and Qin, Xuye and Li, Chengjie and Wang, Zhong and Yuan, Tao and Liao, Xia and Zhang, Feng and Chen, Yueguo and Du, Xiaoyong},
  year = {2024},
  month = may,
  pages = {5211--5223},
  issn = {2375-026X},
  doi = {10.1109/ICDE60146.2024.00392},
}