Primary language: Python. License: MIT.

topicmodel-tools-packaging-tests

Experiments with Stephen Hansen's text-mining-tutorial code (original repo: https://github.com/sekhansen/text-mining-tutorial), with the aim of getting the non-user-facing parts into a PyPI package.

Building source and binary distributions.

A source distribution can be made by running (from this package's base directory): python setup.py sdist

This creates a tar.gz file in the dist/ directory that can then be uploaded to e.g. PyPI (see below for more info on this).

A binary wheel distribution can be made by running python setup.py bdist_wheel --universal (the universal flag indicates that the wheel can work in both Python 2 and Python 3). This creates a .whl binary in the dist/ directory for upload to PyPI. Wheels that contain compiled code are platform dependent, so a separate wheel is needed for each target platform.
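Whether a given wheel is platform dependent can be read off its filename, which follows the PEP 427 pattern distribution-version(-build)-python-abi-platform.whl. The following is a minimal sketch of that check; the helper names and the two example filenames are hypothetical and are not files produced by this repository.

```python
from collections import namedtuple

WheelTags = namedtuple(
    "WheelTags", "distribution version python_tag abi_tag platform_tag"
)


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag may sit between the version and the Python tag.
    if len(parts) == 6:
        del parts[2]
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelTags(*parts)


def is_platform_specific(filename):
    """A platform tag of 'any' marks a wheel that is not tied to one platform."""
    return parse_wheel_filename(filename).platform_tag != "any"


# Hypothetical filenames, for illustration only.
pure_python_wheel = "example_pkg-0.1.dev0-py2.py3-none-any.whl"
compiled_wheel = "example_pkg-0.1.dev0-cp36-cp36m-macosx_10_6_intel.whl"

if __name__ == "__main__":
    for name in (pure_python_wheel, compiled_wheel):
        print(name, "platform specific:", is_platform_specific(name))
```

A wheel tagged py2.py3-none-any installs anywhere, whereas one carrying an interpreter, ABI and platform tag only installs on matching systems.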

Uploading to (Test)PyPI

  • Create a .pypirc file in your home directory with the following contents:

[distutils]
index-servers=
    pypi
    testpypi


[pypi]
repository: https://upload.pypi.org/legacy/
username: PYPI_USERNAME
password: PYPI_PASSWD

[testpypi]
repository: https://test.pypi.org/legacy/
username: TESTPYPI_USERNAME
password: TESTPYPI_PASSWD

(substituting in the usernames and passwords you used when registering with (Test)PyPI).
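Before uploading, it can be worth checking that the configuration parses and that every server listed under index-servers has its own section, since the names given there are what the -r option refers to. The sketch below does this with the standard-library configparser on an inline copy of the file; the function name declared_servers is hypothetical, and the placeholder credentials are left as they are above.

```python
import configparser

# Inline copy of the configuration shown above, with placeholder credentials.
PYPIRC_TEXT = """\
[distutils]
index-servers=
    pypi
    testpypi

[pypi]
repository: https://upload.pypi.org/legacy/
username: PYPI_USERNAME
password: PYPI_PASSWD

[testpypi]
repository: https://test.pypi.org/legacy/
username: TESTPYPI_USERNAME
password: TESTPYPI_PASSWD
"""


def declared_servers(text):
    """Map each name listed in index-servers to its repository URL.

    Raises configparser.NoSectionError if a listed server has no section.
    """
    # RawConfigParser avoids interpolation, so '%' in a password is harmless.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository") for name in names}


if __name__ == "__main__":
    for name, url in sorted(declared_servers(PYPIRC_TEXT).items()):
        print(name, "->", url)
```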

  • Then you can upload with the command:

twine upload dist/topic-modelling-tools-0.1.dev0.tar.gz -r testpypi (for TestPyPI)

twine upload dist/topic-modelling-tools-0.1.dev0.tar.gz -r pypi (for PyPI).

After this, people should be able to install your package, with e.g.

pip install topic-modelling-tools (from regular PyPI)

pip install --index-url https://test.pypi.org/simple/ topic-modelling-tools (from TestPyPI).

Note that if the package depends on other Python packages, these might not be present in the TestPyPI repository, in which case installing from there with pip might fail.

Unit tests

The basic functionality of the topicmodels library is tested by the suite of unit tests in topicmodel_tests. These can be run with the command: python setup.py test
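As an illustration of the style of test involved, here is a minimal unittest sketch. The remove_stopwords function and its stopword set are hypothetical stand-ins defined inside the example; they are not the topicmodels API and not the actual contents of topicmodel_tests.

```python
import unittest

# Hypothetical stand-in for a preprocessing step; not the topicmodels API.
STOPWORDS = {"the", "a", "of"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not stopwords (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


class RemoveStopwordsTest(unittest.TestCase):
    def test_stopwords_are_dropped(self):
        tokens = ["The", "price", "of", "oil"]
        self.assertEqual(remove_stopwords(tokens), ["price", "oil"])

    def test_empty_input(self):
        self.assertEqual(remove_stopwords([]), [])


def run_tests():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(RemoveStopwordsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Tests written this way can also be collected with python -m unittest discover, assuming the test files follow the default test*.py naming.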

Current status of package and distributions.

Stephen's original text-mining-tutorial package made use of GSL (the GNU Scientific Library) for faster random number generation. However, I have not yet managed to get setup.py to handle this correctly even for OSX (and it will be even more difficult to get it working on multiple platforms), so the current implementation uses numpy for random number generation instead.

The code has been modified to work in both Python 2 and Python 3. The main changes were:

  • Changing xrange to range in many places.
  • Changing iteritems to items in a few places.
  • Changing map(func,list) to list(map(func,list)) in many places.
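The three changes above can be sketched as follows. The data is made up for illustration and none of this is taken from the package itself; each form runs unchanged under both Python 2 and Python 3.

```python
# Made-up data, for illustration only.
word_counts = {"topic": 3, "model": 5, "text": 2}
words = ["topic", "model", "text"]

# xrange -> range: xrange does not exist in Python 3, while range
# exists in both (it builds a list in Python 2 and is lazy in Python 3).
squares = [i * i for i in range(4)]

# iteritems -> items: dict.iteritems() was removed in Python 3, while
# items() exists in both (a list in Python 2, a view in Python 3).
total = 0
for word, count in word_counts.items():
    total += count

# map(func, list) -> list(map(func, list)): Python 3's map returns an
# iterator, so it must be wrapped in list() before indexing or len().
lengths = list(map(len, words))
first_length = lengths[0]

if __name__ == "__main__":
    print(squares, total, lengths, first_length)
```

The cost of this approach is some extra memory under Python 2, where range and items build full lists.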

Unit tests have been written to test the reading in of a text file, the creation of the docsobj preprocessing object and some of its functionality (removing stopwords, stemming), and the creation and operation of the ldaobj sampling object.

A "wheel" for OSX, and a source distribution for other platforms, have been uploaded to PyPI. Installing via pip install has been tested on a couple of different Macs, on an Ubuntu 16.04 VM, and on a Windows Server 2016 VM. For the Windows installation, it was necessary to install the Microsoft Visual Studio C++ compiler (to build the library from the Cython file). Note that pip install from TestPyPI fails, as various Python dependencies are not present in TestPyPI.