/data-import

Import data from and export data to a range of different file formats and media

Primary LanguagePHPMIT LicenseMIT

Ddeboer Data Import library

Build Status Scrutinizer Quality Score Code Coverage Latest Stable Version

Introduction

This PHP library offers a way to read data from, and write data to, a range of file formats and media. Additionally, it includes tools to manipulate your data.

Features

  • Read from and write to CSV files, Excel files, databases, and more.
  • Convert between charsets, dates, strings and objects on the fly.
  • Build reusable and extensible import workflows.
  • Decoupled components that you can use on their own, such as a CSV reader and writer.
  • Well-tested code.

Installation

This library is available on Packagist. The recommended way to install it is through Composer:

$ composer require ddeboer/data-import:@stable

For integration with Symfony2 projects, the DdeboerDataImportBundle is available.

Usage

The workflow

Each data import revolves around the workflow and takes place along the following lines:

  1. Construct a reader.
  2. Construct a workflow and pass the reader to it. Add at least one writer to the workflow.
  3. Optionally, add filters, item converters and value converters to the workflow.
  4. Process the workflow. This will read the data from the reader, filter and convert the data, and write the output to each of the writers.

So, schematically:

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer;
use Ddeboer\DataImport\Filter;

$reader = new Reader\...;
$workflow = new Workflow($reader);
$workflow->addWriter(new Writer\...());
    ->addWriter(new Writer\...())
    ->addFilter(new Filter\CallbackFilter(...))
    ->process();

Readers

This library includes a handful of readers:

  • An ArrayReader for reading arrays (or testing your workflow).
  • A CsvReader for reading CSV files, optimized to use a little memory as possible:
    use Ddeboer\DataImport\Reader\CsvReader;
    
    $reader = new CsvReader(new \SplFileObject('/path/to/csv_file.csv'));
  • A DbalReader to read data through Doctrine’s DBAL:
    use Ddeboer\DataImport\Reader\DbalReader;
    
    $reader = new DbalReader(
        $connection,    // Instance of \Doctrine\DBAL\Connection
        'SELECT u.id, u.username, g.name FROM `user` u INNER JOIN groups g ON u.group_id = g.id'
    );
  • A DoctrineReader to read data through the Doctrine ORM:
    use Ddeboer\DataImport\Reader\DoctrineReader;
    
    $reader = new DoctrineReader($entityManager, 'Your\Namespace\Entity\User');
  • An ExcelReader that acts as an adapter for the PHPExcel library:
    use Ddeboer\DataImport\Reader\ExcelReader;
    
    $reader = new ExcelReader(new \SplFileObject('/path/to/ecxel_file.xls'));
  • You can create your own data reader by implementing the ReaderInterface.

After you’ve set up your reader, construct the workflow from it:

$workflow = new Workflow($reader);

Writers

Many of the data writers closely resemble their reader counterparts:

  • An ArrayWriter.
  • A CsvWriter.
  • A DoctrineWriter.

Also available are:

  • A ConsoleProgressWriter that displays import progress when you start the workflow from the command-line:
    use Ddeboer\DataImport\Writer\ConsoleProgressWriter;
    use Symfony\Component\Console\Output\ConsoleOutput;
    
    $output = new ConsoleOutput(...);
    $progressWriter = new ConsoleProgressWriter($output, $reader);
    

Build your own writer by implementing the WriterInterface.

If you want, you can use multiple writers:

$workflow->addWriter($progressWriter)
    ->addWriter($csvWriter);

Filters

A filter decides whether data input is accepted into the import process. The library currently ships with a CallbackFilter:

use Ddeboer\DataImport\Filter\CallbackFilter;

// Don’t import The Beatles
$filter = new CallbackFilter(function($data) {
    if ('The Beatles' == $data['name']) {
        return false;
    } else {
        return true;
    }
});

$workflow->addFilter($filter);

Item converters

Value converters

Value converters are used to convert specific fields (e.g., columns in database).

  • A DateTimeValueConverter that converts a date representation in a format you specify into a DateTime object: ```php use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

    $converter = new DateTimeValueConverter('d/m/Y H:i:s');
    $workflow->addValueConverter('my_date_field', $converter);
    ```
    
  • A StringToObjectConverter that looks up an object in the database based on a string value:

    use Ddeboer\DataImport\ValueConverter\StringToObjectConverter;
    
    $converter = new StringToObjectConverter($repository, 'name');
    $workflow->addValueConverter('input_name', $converter);

An example

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Source\HttpSource;
use Ddeboer\DataImport\Source\Filter\Unzip;
use Ddeboer\DataImport\Reader\CsvReader;
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

(...)

// Create the source; here we use an HTTP one
$source = new HttpSource('http://www.opta.nl/download/registers/nummers_csv.zip');

// As the source file is zipped, we add an unzip filter
$source->addFilter(new Unzip('nummers.csv'));

// Retrieve the \SplFileObject from the source
$file = $source->getFile();

// Create and configure the reader
$csvReader = new CsvReader($file);
$csvReader->setHeaderRowNumber(0);

// Create the workflow
$workflow = new Workflow($csvReader);
$dateTimeConverter = new DateTimeValueConverter();

// Add converters to the workflow
$workflow
    ->addValueConverter('twn_datumbeschikking', $dateTimeConverter)
    ->addValueConverter('twn_datumeind', $dateTimeConverter)
    ->addValueConverter('datummutatie', $dateTimeConverter)

// You can also add closures as converters
    ->addValueConverter('twn_nummertm',
        new \Ddeboer\DataImport\ValueConverter\CallbackValueConverter(
            function($input) {
                return str_replace('-', '', $input);
            }
        )
    ->addValueConverter('twn_nummervan',
        new \Ddeboer\DataImport\ValueConverter\CallbackValueConverter(
            function($input) {
                return str_replace('-', '', $input);
            }
        )

// Use one of the writers supplied with this bundle, implement your own, or use
// a closure:
    ->addWriter(new \Ddeboer\DataImport\Writer\CallbackWriter(
        function($csvLine) {
            var_dump($csvLine);
        }
    ));

// Process the workflow
$workflow->process();

ArrayValueConverterMap

The ArrayValueConverterMap is used to filter values of a multi-level array.

The converters defined in the list are applied on every data-item's value that match the defined array_keys.

    //...
    $data = array(
        'products' => array(
            0 => array(
                'name' => 'some name',
                'price' => '€12,16',
            ),
            1 => array(
                'name' => 'some name',
                'price' => '€12,16',
            ),
        )
    );

    // ...
    // create the workflow and reader etc.
    // ...

    $workflow->addValueConverter(new ArrayValueConverterMap(array(
        'name' => array(new CharsetValueConverter('UTF-8', 'UTF-16')),  // encode to UTF-8
        'price' => array(new CallbackValueConverter(function($input) {  // remove € char
            return str_replace('', '', $intput);
        }),
    )));

    // ..
    // after filtering data looks as follows
    $data = array(
        'products' => array(
            0 => array(
                'name' => 'some name', // in UTF-8
                'price' => '12,16',
            ),
            1 => array(
                'name' => 'some name',
                'price' => '12,16',
            ),
        )
    );

GlobalMapping

The global-mapping allows you to define an array that is used to rename fields of an item.

Using global-mapping can be used to add renaming-rules for a multi-level array and is applied after the standard-mapping rules are applied.

    //...
    $data = array(
        0 => array(
            'foo' => 'bar',
            'baz' => array(
                'some' => 'value',
                'some2' => 'value'
            )
        )
    );

    // ...
    // create the workflow and reader etc.
    // ...

    // this defines a single mapping
    $workflow->addMapping('baz', 'bazinga');

    // this defines renaming global rules
    $workflow->setGlobalMapping(array(
        'foo' => 'fooloo',
        'bazinga' => array( // we need to use the new name here because global mapping is applied after standard mapping
            'some' => 'something',
            'some2' => 'somethingelse'
        )
    ));

    // ..
    // after filtering data looks as follows
    $data = array(
        0 => array(
            'fooloo' => 'bar',
            'bazinga' => array(
                'something' => 'value',
                'somethingelse' => 'value'
            )
        )
    );