
A micro Python library that retrieves and organizes data according to a YAML mapping.


Fluxify

A Python package that eases the process of retrieving, organizing and altering data.

Required packages

  • pandas
  • imperium
  • ijson

Installation

pip install fluxify

Main classes

fluxify.mapper.Mapper

This class is used to quickly read and process files containing small amounts of data that can be loaded entirely into memory.

fluxify.lazy_mapper.LazyMapper

As you've probably guessed, this class is used to iterate over large data files, whether they are in CSV, JSON or XML format.

Usage

Retrieve data from a simple CSV file

id,brand,price,state,published_at
938,Xaomi,390.90,used,2020-01-03 12:32:29
04593,iPhone,1299.90,new,2020-01-02 09:48:12

Mapper implementation

from fluxify.mapper import Mapper
import yaml

# Could also be loaded from a file
yamlmapping = """
brand:
    col: 1
price:
    col: 2
state:
    col: 3
publish_date:
    col: 4
    transformations:
        - { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
    conditions:
        -
            condition: "subject['state'] == 'new'"
            returnOnSuccess: True
            returnOnFail: False
"""

Map = yaml.load(yamlmapping, Loader=yaml.FullLoader)
mapper = Mapper(_type='csv')
data = mapper.map('path/to/csvfile.csv', Map)
print(data)

Output

[
    {
        'brand': 'Xaomi',
        'price': '390.90',
        'state': 'used',
        'publish_date': '12:32 03/01/2020',
        'is_new': False
    },
    {
        'brand': 'iPhone',
        'price': '1299.90',
        'state': 'new',
        'publish_date': '09:48 02/01/2020',
        'is_new': True
    }
]
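To make the mapping rules above concrete, here is a minimal sketch in plain Python (standard library only) of what the mapping does to each CSV row: a 0-based col lookup, the date transformer, and the is_new condition. It is an illustration under our own assumptions, not Fluxify's implementation. The MAPPING dict (a hand translation of the YAML) and the map_row helper are hypothetical, and we assume the header row is skipped, as the output above suggests.

```python
import csv
import io
from datetime import datetime

CSV_TEXT = """id,brand,price,state,published_at
938,Xaomi,390.90,used,2020-01-03 12:32:29
04593,iPhone,1299.90,new,2020-01-02 09:48:12
"""

# Hypothetical plain-dict translation of the YAML mapping above.
MAPPING = {
    "brand": {"col": 1},
    "price": {"col": 2},
    "state": {"col": 3},
    "publish_date": {"col": 4, "date": ("%Y-%m-%d %H:%M:%S", "%H:%M %d/%m/%Y")},
}


def map_row(row):
    """Map one CSV row (a list of strings) to a dict, mimicking the example."""
    item = {}
    for key, spec in MAPPING.items():
        value = row[spec["col"]]  # col is a 0-based column index
        if "date" in spec:
            in_format, out_format = spec["date"]
            value = datetime.strptime(value, in_format).strftime(out_format)
        item[key] = value
    # The is_new condition, evaluated against the item mapped so far.
    item["is_new"] = item["state"] == "new"
    return item


reader = csv.reader(io.StringIO(CSV_TEXT))
next(reader)  # skip the header row
data = [map_row(row) for row in reader]
print(data)
```

Running it prints the same two records as the output above.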

LazyMapper implementation

The LazyMapper does not return all of the mapped data at the end.
Instead, it maps the data in batches whose size you specify (bulksize), so that it does not exhaust the available memory.

from fluxify.lazy_mapper import LazyMapper
import yaml

# Could also be loaded from a file
yamlmapping = """
brand:
    col: 1
price:
    col: 2
state:
    col: 3
publish_date:
    col: 4
    transformations:
        - { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
    conditions:
        -
            condition: "subject['state'] == 'new'"
            returnOnSuccess: True
            returnOnFail: False
"""

Map = yaml.load(yamlmapping, Loader=yaml.FullLoader)
mapper = LazyMapper(_type='csv', error_tolerance=True, bulksize=500)

def some_callback(results):
    # Called with each batch of mapped items
    for item in results:
        pass  # Perform some action

# Register the callback before starting the mapping
mapper.set_callback(some_callback)

mapper.map('path/to/csvfile.csv', Map)

In this example, the mapper calls some_callback every time it has accumulated 500 mapped items (bulksize=500).
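The batching behaviour described above can be sketched with a plain loop: accumulate bulksize mapped items, hand each full batch to a callback, then start a new batch. This is a hypothetical illustration of the idea, not LazyMapper's actual code; the function name map_in_batches is our own, and we assume a final, partial batch is also delivered.

```python
from typing import Callable, Iterable, List


def map_in_batches(records: Iterable[dict],
                   map_record: Callable[[dict], dict],
                   callback: Callable[[List[dict]], None],
                   bulksize: int = 500) -> None:
    """Map records lazily and pass them to callback in batches of bulksize.

    Hypothetical sketch: only one batch is held in memory at a time.
    """
    batch: List[dict] = []
    for record in records:
        batch.append(map_record(record))
        if len(batch) == bulksize:
            callback(batch)
            batch = []  # start a fresh batch so the old one can be freed
    if batch:  # assumption: a final, partial batch is also delivered
        callback(batch)


# Usage: 1,200 records with bulksize=500 produce batches of 500, 500 and 200.
batch_sizes = []
map_in_batches(
    ({"n": i} for i in range(1200)),
    map_record=lambda r: {"n": r["n"] * 2},
    callback=lambda results: batch_sizes.append(len(results)),
    bulksize=500,
)
print(batch_sizes)  # [500, 500, 200]
```

Because the input is a generator, the records themselves are never all loaded at once either, which is the point of lazy mapping.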

Mapping settings

The col key specifies the column number or attribute name from which the value is retrieved.
To use the whole input item as the retrieved value, set col to _all_.
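A minimal sketch of how a col value might be resolved against one input item: an integer indexes a CSV row, a string looks up a key of a JSON-like record, and _all_ returns the item unchanged. The function resolve_col is hypothetical and only illustrates the rules stated above.

```python
def resolve_col(record, col):
    """Return the value selected by col from a single input item (sketch)."""
    if col == "_all_":
        return record  # the whole input item becomes the value
    return record[col]  # list index for CSV rows, key for dict-like records


csv_row = ["938", "Xaomi", "390.90", "used", "2020-01-03 12:32:29"]
json_obj = {"brand": "iPhone", "price": 1299.90}

print(resolve_col(csv_row, 1))         # Xaomi
print(resolve_col(json_obj, "brand"))  # iPhone
print(resolve_col(json_obj, "_all_"))  # {'brand': 'iPhone', 'price': 1299.9}
```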

The transformations key applies transformations to the retrieved value. The available transformers are listed below.
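Each entry under transformations has the shape used in the example mapping ({ transformer: 'date', in_format: ..., out_format: ... }). A plausible reading is that entries run in the order listed, each receiving the previous one's output. The sketch below assumes exactly that; the TRANSFORMERS registry and apply_transformations are hypothetical stand-ins, with date and replace implemented via the standard library.

```python
from datetime import datetime

# Hypothetical registry: transformer name -> function(value, **arguments).
TRANSFORMERS = {
    "date": lambda value, in_format, out_format: (
        datetime.strptime(value, in_format).strftime(out_format)
    ),
    "replace": lambda value, search, new: value.replace(search, new),
}


def apply_transformations(value, transformations):
    """Apply each transformation spec to value, in the order listed."""
    for spec in transformations:
        arguments = {k: v for k, v in spec.items() if k != "transformer"}
        value = TRANSFORMERS[spec["transformer"]](value, **arguments)
    return value


steps = [
    {"transformer": "date", "in_format": "%Y-%m-%d %H:%M:%S",
     "out_format": "%H:%M %d/%m/%Y"},
    {"transformer": "replace", "search": "/", "new": "-"},
]
print(apply_transformations("2020-01-03 12:32:29", steps))  # 12:32 03-01-2020
```

Swapping the two steps would fail, since the date transformer would then receive a string that no longer matches in_format; order matters.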

The conditions key applies conditions that can alter the retrieved value.
Conditions are written in Python syntax, but only a restricted set of Python's built-in functions is available inside them.
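The is_new entry in the example mapping shows the shape of a condition: a condition expression plus returnOnSuccess and returnOnFail values. The sketch below assumes that subject is the item mapped so far and evaluates the expression with Python's eval() and an empty set of builtins, one simple way to restrict the available functions. It is not necessarily how Fluxify does it, and evaluate_condition is a hypothetical name.

```python
def evaluate_condition(spec, subject):
    """Return returnOnSuccess or returnOnFail depending on spec['condition']."""
    # Empty builtins limit which functions the expression can call; note that
    # this alone is not a secure sandbox for untrusted mapping files.
    passed = eval(spec["condition"], {"__builtins__": {}}, {"subject": subject})
    return spec["returnOnSuccess"] if passed else spec["returnOnFail"]


is_new = {
    "condition": "subject['state'] == 'new'",
    "returnOnSuccess": True,
    "returnOnFail": False,
}
print(evaluate_condition(is_new, {"brand": "iPhone", "state": "new"}))  # True
print(evaluate_condition(is_new, {"brand": "Xaomi", "state": "used"}))  # False
```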

The default key defines a fallback value for when the retrieved value is null.
Warning: if default is set, it is applied before transformations and conditions, so the default value itself is passed through them.
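Because the default is applied first, it has to survive the transformations. The hypothetical sketch below (resolve_value and to_display_date are illustrative names, and None stands in for a null value) shows the practical consequence for a date column: a default must itself match in_format, otherwise the date transformation fails.

```python
from datetime import datetime


def resolve_value(raw, default, transform):
    """Substitute the default for a null value, THEN transform (sketch)."""
    value = default if raw is None else raw
    return transform(value)


def to_display_date(value):
    # Same formats as the publish_date mapping in the example above.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").strftime("%H:%M %d/%m/%Y")


print(resolve_value(None, "1970-01-01 00:00:00", to_display_date))  # 00:00 01/01/1970

try:
    resolve_value(None, "unknown", to_display_date)
except ValueError as err:
    print("default does not match in_format:", err)
```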

Special cases for JSON and XML

XML
Set multiple to true to retrieve data from several XML tags with the same name.
Combine the index key with multiple: true to retrieve a single value from among those tags.
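In terms of Python's standard xml.etree.ElementTree module, multiple: true roughly corresponds to collecting every matching tag rather than a single one, and index then picks one of them. The sketch below only illustrates these semantics with ElementTree's find and findall on a made-up sample; it does not show Fluxify's XML mapping syntax, and 0-based indexing is our assumption.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<phone>"
    "<color>black</color><color>white</color><color>gold</color>"
    "</phone>"
)

first_only = doc.find("color").text                    # assumed default: one tag
all_values = [el.text for el in doc.findall("color")]  # like multiple: true
second = all_values[1]                                 # like index: 1 (0-based)

print(first_only)  # black
print(all_values)  # ['black', 'white', 'gold']
print(second)      # white
```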

When retrieving an XML value, the default behaviour is to return the .text value of the tag.
To change this, for example to retrieve a tag that contains other tags, set the raw key to false.
This returns an object of type xml.etree.ElementTree.Element, to which you can later apply transformations to alter, organize and retrieve the data.
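The difference between a tag's .text and the Element itself can be seen directly with xml.etree.ElementTree. This sketch (made-up sample, hypothetical tag names) shows why the Element is useful for a tag that wraps other tags; it does not reproduce Fluxify's raw setting itself.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<phone><specs><ram>8GB</ram><storage>128GB</storage></specs></phone>"
)
specs = doc.find("specs")

# A tag that only wraps child tags has no direct text, so .text is None.
print(specs.text)  # None

# The Element itself keeps the children, so a later step can walk them.
print(isinstance(specs, ET.Element))               # True
print({child.tag: child.text for child in specs})  # {'ram': '8GB', 'storage': '128GB'}
```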

JSON
Use the index key to retrieve a specific element from an array.
This only works when the retrieved value is an array.
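A plain json sketch of what the index key selects. The helper select_index is hypothetical, and 0-based indexing is an assumption (it matches how col counts CSV columns in the example above). The type check mirrors the rule that an index only applies to array values.

```python
import json

record = json.loads('{"brand": "iPhone", "colors": ["black", "white", "gold"]}')


def select_index(value, index):
    """Return value[index] if value is a JSON array (a Python list) (sketch)."""
    if not isinstance(value, list):
        raise TypeError(f"index needs an array, got {type(value).__name__}")
    return value[index]


print(select_index(record["colors"], 1))  # white

try:
    select_index(record["brand"], 0)  # a string, not an array
except TypeError as err:
    print(err)  # index needs an array, got str
```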

Supported formats

| Format | CSV | JSON | XML | TXT |
|---|---|---|---|---|
| Supported | YES | YES | YES | NO |

Transformers

Fluxify has built-in transformers that modify the retrieved data.

| Function | Arguments | Description |
|---|---|---|
| number | stringvalue | Parses a string into an integer or float value. |
| split | delimiter, index | Splits a string into parts on delimiter; returns the whole split result if index is not defined. |
| date | in_format, out_format | Converts a date string from in_format into out_format. |
| replace | search, new | Replaces the search value with new in the string. |
| boolean | No arguments | Parses a string to Boolean if the string contains [true |
| equipments_from_string | delimiter | Custom usage |
| options_from_string | delimiter | Custom usage |
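As a rough guide to what number and split do, here are standard-library equivalents written from the descriptions in the table. They are hypothetical re-implementations, not Fluxify's code; in particular, we assume that split returns only the part at index when index is given, which the table implies but does not state.

```python
def number(stringvalue):
    """Parse a numeric string to int when possible, otherwise to float."""
    try:
        return int(stringvalue)
    except ValueError:
        return float(stringvalue)


def split(value, delimiter, index=None):
    """Split value on delimiter; return all parts, or only parts[index] if given."""
    parts = value.split(delimiter)
    return parts if index is None else parts[index]


print(number("1299.90"))             # 1299.9
print(number("04593"))               # 4593
print(split("a;b;c", ";"))           # ['a', 'b', 'c']
print(split("a;b;c", ";", index=2))  # c
```

Note that parsing an identifier such as 04593 as a number (at least in this sketch) drops its leading zero, so identifiers are usually better left as strings.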

Exceptions

Fluxify defines a separate exception class for each kind of error. They reside in the fluxify.exceptions sub-package.

| Class | Arguments | Description |
|---|---|---|
| ArgumentNotFoundException | message | Raised whenever an argument is not found. |
| InvalidArgumentException | message | Raised when a passed parameter/argument is invalid. |
| ConditionNotFoundException | message | Raised when the "condition" key is not defined in the mapping. |
| UnsupportedTransformerException | message | Raised when a transformer other than those listed above is used. |
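A typical reason to know these classes is to catch a specific failure, such as a typo in a transformer name. Since the exact import path inside fluxify.exceptions is not given here, the sketch below defines a local stand-in with the same name and message argument, plus a hypothetical get_transformer lookup; with the real package you would import the class from fluxify.exceptions instead.

```python
class UnsupportedTransformerException(Exception):
    """Local stand-in for the Fluxify exception of the same name."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


# Transformer names taken from the table above.
KNOWN_TRANSFORMERS = {
    "number", "split", "date", "replace", "boolean",
    "equipments_from_string", "options_from_string",
}


def get_transformer(name):
    """Hypothetical lookup that rejects names outside the built-in set."""
    if name not in KNOWN_TRANSFORMERS:
        raise UnsupportedTransformerException(f"Unsupported transformer: {name!r}")
    return name


try:
    get_transformer("dates")  # typo for "date"
except UnsupportedTransformerException as exc:
    print(exc.message)  # Unsupported transformer: 'dates'
```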