pypi_malregistry

This repository collects malicious PyPI packages: the source code of about 3,000 versions of 2,415 packages. The dataset accompanies the ASE 2023 paper "An Empirical Study of Malicious Code In PyPI Ecosystem", and we will continue to expand it.

Dataset Size

The dataset contains the source code of about 3,000 versions of 2,415 malicious packages.
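To check these counts against a local copy, here is a minimal Python sketch. It assumes the dataset has been cloned to a local directory whose top-level directories are package names, each holding one directory per version (the package name -> version -> archive layout described under Dataset Format). `count_dataset` is a hypothetical helper, not part of the repository, and it skips hidden directories such as `.git`.

```python
from pathlib import Path


def count_dataset(root: str) -> tuple[int, int]:
    """Return (number of packages, number of versions) under `root`.

    Assumes a package/version/archive directory layout; hidden
    directories (e.g. `.git` in a clone) are ignored.
    """
    packages = [
        p for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith(".")
    ]
    # Each subdirectory of a package directory is treated as one version.
    versions = [v for p in packages for v in p.iterdir() if v.is_dir()]
    return len(packages), len(versions)
```

For example, `count_dataset("pypi_malregistry")` on a fresh clone should report figures close to the ones above, although the exact numbers will change as the dataset grows.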

Dataset Format

package name -> version -> source code archive

Example: ython-binance -> 0.1 -> ython-binance-0.1.tar.gz
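The following is a minimal Python sketch for walking this layout and listing what each archive contains. It makes three assumptions: the dataset sits in a local directory, each version directory holds its archive directly, and archives are either `.tar.gz` or `.zip`. `iter_archives` and `list_members` are hypothetical helpers, not part of the repository. The sketch reads only the archive index and never extracts or installs anything. This matters because installing one of these packages (for example, building a source distribution runs its `setup.py`) could execute the malicious code.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import Iterator


def iter_archives(root: str) -> Iterator[tuple[str, str, Path]]:
    """Yield (package, version, archive_path) for each file in the layout."""
    for package_dir in sorted(Path(root).iterdir()):
        # Skip stray files and hidden directories such as .git.
        if not package_dir.is_dir() or package_dir.name.startswith("."):
            continue
        for version_dir in sorted(package_dir.iterdir()):
            if not version_dir.is_dir():
                continue
            for archive in sorted(version_dir.iterdir()):
                if archive.is_file():
                    yield package_dir.name, version_dir.name, archive


def list_members(archive: Path) -> list[str]:
    """Return the member names of a .tar.gz or .zip archive.

    Only the archive index is read: nothing is extracted to disk,
    imported, or executed.
    """
    if archive.name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive, "r:gz") as tar:
            return tar.getnames()
    if archive.name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            return zf.namelist()
    raise ValueError(f"unsupported archive type: {archive}")
```

Listing members rather than extracting also sidesteps path-traversal entries, which an archive built by an attacker may well contain.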

Reference

This dataset accompanies the ASE 2023 paper "An Empirical Study of Malicious Code In PyPI Ecosystem" (arXiv:2309.11021):

@misc{guo2023empirical,
      title={An Empirical Study of Malicious Code In PyPI Ecosystem}, 
      author={Wenbo Guo and Zhengzi Xu and Chengwei Liu and Cheng Huang and Yong Fang and Yang Liu},
      year={2023},
      eprint={2309.11021},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}