pyexcel - Let you focus on data, instead of file formats

https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png

https://api.bountysource.com/badge/team?team_id=288537

https://travis-ci.org/pyexcel/pyexcel.svg?branch=master

https://readthedocs.org/projects/pyexcel/badge/?version=latest

Support the project

If your company has embedded pyexcel and its components into a revenue generating product, please support me on patreon or bounty source to maintain the project and develop it further.

If you are an individual, you are welcome to support me too and for however long you feel like. As my backer, you will receive early access to pyexcel related contents.

And your issues will get prioritized if you would like to become my patreon as pyexcel pro user.

With your financial support, I will be able to invest a little bit more time in coding, documentation and writing interesting posts.

Known constraints

Fonts, colors and charts are not supported.

Introduction

Feature Highlights

One application programming interface(API) to handle multiple data sources:
- physical file
- memory file
- SQLAlchemy table
- Django Model
- Python data structures: dictionary, records and array
One API to read and write data in various excel file formats.
For large data sets, data streaming are supported. A genenerator can be returned to you. Checkout iget_records, iget_array, isave_as and isave_book_as.

Installation

You can install pyexcel via pip:

$ pip install pyexcel

or clone it and install it:

$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install

Usage

Please note that you will have to use '.sortable.html' in order to replicate the example.

>>> # pip install pyexcel-text==0.2.7.1
>>> import pyexcel as p
>>> ccs_insight2 = p.Sheet()
>>> ccs_insight2.name = "Worldwide Mobile Phone Shipments (Billions), 2017-2021"
>>> ccs_insight2.ndjson = """
... {"year": ["2017", "2018", "2019", "2020", "2021"]}
... {"smart phones": [1.53, 1.64, 1.74, 1.82, 1.90]}
... {"feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]}
... """.strip()
>>> ccs_insight2
pyexcel sheet:
+----------------+------+------+------+------+------+
| year           | 2017 | 2018 | 2019 | 2020 | 2021 |
+----------------+------+------+------+------+------+
| smart phones   | 1.53 | 1.64 | 1.74 | 1.82 | 1.9  |
+----------------+------+------+------+------+------+
| feature phones | 0.46 | 0.38 | 0.3  | 0.23 | 0.17 |
+----------------+------+------+------+------+------+

Suppose you have the following data in a dictionary:

Name	Age
Adam	28
Beatrice	29
Ceri	30
Dean	26

you can easily save it into an excel file using the following code:

>>> import pyexcel
>>> # make sure you had pyexcel-xls installed
>>> a_list_of_dictionaries = [
...     {
...         "Name": 'Adam',
...         "Age": 28
...     },
...     {
...         "Name": 'Beatrice',
...         "Age": 29
...     },
...     {
...         "Name": 'Ceri',
...         "Age": 30
...     },
...     {
...         "Name": 'Dean',
...         "Age": 26
...     }
... ]
>>> pyexcel.save_as(records=a_list_of_dictionaries, dest_file_name="your_file.xls")

And here's how to obtain the records:

>>> import pyexcel as p
>>> records = p.iget_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> p.free_resources()

Advanced usage 🔥

If you are dealing with big data, please consider these usages:

>>> def increase_everyones_age(generator):
...     for row in generator:
...         row['Age'] += 1
...         yield row
>>> def duplicate_each_record(generator):
...     for row in generator:
...         yield row
...         yield row
>>> records = p.iget_records(file_name="your_file.xls")
>>> io=p.isave_as(records=duplicate_each_record(increase_everyones_age(records)),
...     dest_file_type='csv', dest_lineterminator='\n')
>>> print(io.getvalue())
Age,Name
29,Adam
29,Adam
30,Beatrice
30,Beatrice
31,Ceri
31,Ceri
27,Dean
27,Dean
<BLANKLINE>

Two advantages of above method:

Add as many wrapping functions as you want.
Constant memory consumption

Available Plugins

A list of file formats supported by external plugins

Package name	Supported file formats	Dependencies	Python versions
pyexcel-io	csv, csvz [1], tsv, tsvz [2]		2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy
pyexcel-xls	xls, xlsx(read only), xlsm(read only)	xlrd, xlwt	same as above
pyexcel-xlsx	xlsx	openpyxl	same as above
pyexcel-ods3	ods	pyexcel-ezodf, lxml	2.6, 2.7, 3.3, 3.4 3.5, 3.6
pyexcel-ods	ods	odfpy	same as above

Dedicated file reader and writers

Package name	Supported file formats	Dependencies	Python versions
pyexcel-xlsxw	xlsx(write only)	XlsxWriter	Python 2 and 3
pyexcel-xlsxr	xlsx(read only)	lxml	same as above
pyexcel-odsr	read only for ods, fods	lxml	same as above
pyexcel-htmlr	html(read only)	lxml,html5lib	same as above

Other data renderers

Package name	Supported file formats	Dependencies	Python versions
pyexcel-text	write only:rst, mediawiki, html, latex, grid, pipe, orgtbl, plain simple read only: ndjson r/w: json	tabulate	2.6, 2.7, 3.3, 3.4 3.5, 3.6, pypy
pyexcel-handsontable	handsontable in html	handsontable	same as above
pyexcel-pygal	svg chart	pygal	2.7, 3.3, 3.4, 3.5 3.6, pypy
pyexcel-sortable	sortable table in html	csvtotable	same as above
pyexcel-gantt	gantt chart in html	frappe-gantt	except pypy, same as above

In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. In the situation where you have multiple plugins that does the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(..., library='pyexcel-odsr').

Footnotes

[1]	zipped csv file

[2]	zipped tsv file

Acknowledgement

All great work have been done by odf, ezodf, xlrd, xlwt, tabulate and other individual developers. This library unites only the data access code.

.. testcode::
   :hide:

   >>> import os
   >>> os.unlink("your_file.xls")

License

New BSD License

vangoah20/pyexcel