Parsing of the transaction description (SVWZ, and other fields)

Question

clorch opened this issue 8 years ago · 2 comments

First of all: thank you for this great project, I have been looking for something like this for several years!

With my bank, the description field of (some) transactions looks like this:
EREF+...MREF+...CRED+...SVWZ+...

I would like to parse this data to extract the actual "Verwendungszweck" (the text after SVWZ+) and the other fields. Since your MT940 parser merges fields 20 to 29 without any separator, this is more difficult than necessary. As a workaround I added newlines: clorch/fints-hbci-php@2d27ba3

Nevertheless, I do not think this is the best or most elegant solution. Is there another way to access the raw field data? I am not that familiar with the world of banking data formats, but maybe it would be a good idea to add a parser for these fields to the library?

Answer 1 · 2016-10-06T22:07:10.000Z

I have a similar "issue", or rather a feature request. I believe this occurs only with some financial institutions. I am testing with Deutsche Bank, and almost every transaction description looks like what @clorch showed above:

EREF+...MREF+...CRED+...SVWZ+...ABWA...

The descriptions are even truncated because they become too long when concatenated. As far as I understand the MT940/942 format, these are codewords used in tag 86. This resource indicates that which keywords are present depends on the transaction type, and that they mean the following:

/EREF/ End-to-End Reference
/KREF/ Client / Orderer Reference
/MREF/ Mandate Reference: where available
/PREF/ Payment Reference
/CRED/ Creditor ID
/DEBT/ Debtor ID
/ORDP/ Ordering Party: name and address of ordering party
/BENM/ Beneficiary
/ULTC/ Ultimate Creditor
/ULTD/ Ultimate Debtor
/REMI/ Remittance Information
/PURP/ Purpose Code: purpose code, currently only SEPA
/RTRN/ Return Reason: return reason code and narrative (if available)
/ACCW/ Counterparty Account and bank
/IBK/ Intermediary Bank: BIC or local bank code
/OCMT/ Original Amount: only if not already shown in 61/9
/COAM/ Compensation Amount
/CHGS/ Charges: only if not already shown in 61/9
/EXCH/ Exchange Rate: only if not already shown in 61/9

I have put together an approach that isolates these keywords and their contents from the description. It is not tested extensively, but maybe it is a pointer in the right direction?

Here's the code (in lib/Fhp/Parser/MT940.php, insert after line 170):

        // keywords to be isolated into separate results
        $keywords = array(
            'EREF',     // End-to-End Reference
            'KREF',     // Client / Orderer Reference
            'MREF',     // Mandate Id
            'PREF',     // Payment Reference
            'CRED',     // Creditor ID
            'DEBT',     // Debtor ID
            'ORDP',     // Ordering Party
            'BENM',     // Beneficiary
            'ULTC',     // Ultimate Creditor
            'ULTD',     // Ultimate Debtor
            'REMI',     // Remittance Information
            'PURP',     // Purpose Code
            'RTRN',     // Return Reason
            'ACCW',     // Counterparty Account and bank
            'IBK',      // Intermediary Bank
            'OCMT',     // Original Amount
            'COAM',     // Compensation Amount
            'CHGS',     // Charges
            'EXCH'      // Exchange Rate
        );

        // split the concatenated description at each "KEYWORD+" marker, keeping the keywords;
        // a limit of -1 means "no limit" (passing null is deprecated since PHP 8.1)
        $parts = preg_split('/(' . implode('|', $keywords) . ')\+/', $description, -1, PREG_SPLIT_DELIM_CAPTURE);

        // restructure result array (must be empty before!)
        $result = array();

        // $parts[0] holds any text before the first keyword; from index 1 on,
        // item n is the key and item n+1 is the value
        for ($i = 1, $n = count($parts) - 1; $i < $n; $i += 2) {
            $result[$parts[$i]] = $parts[$i + 1];
        }
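
Note that the keyword list above does not contain SVWZ or ABWA, although both appear in the example strings in this thread, so the "Verwendungszweck" would end up inside the preceding value. A minimal, self-contained sketch of the same splitting idea with those keywords added might look like this. The function name `splitDescriptionKeywords` and the `_unstructured` key are hypothetical and not part of the library, and the sample values are made up:

```php
<?php
// Hypothetical helper, not part of the library.
function splitDescriptionKeywords(string $description, array $keywords): array
{
    $pattern = '/(' . implode('|', array_map('preg_quote', $keywords)) . ')\+/';

    // With PREG_SPLIT_DELIM_CAPTURE the result has the shape
    // [text before first keyword, keyword, value, keyword, value, ...].
    $parts = preg_split($pattern, $description, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = array();
    if ($parts[0] !== '') {
        // Hypothetical key for free text that precedes the first keyword.
        $result['_unstructured'] = $parts[0];
    }
    for ($i = 1, $n = count($parts) - 1; $i < $n; $i += 2) {
        $result[$parts[$i]] = $parts[$i + 1];
    }
    return $result;
}

// Keywords taken from the example strings in this thread; extend as needed.
$keywords = array('EREF', 'MREF', 'CRED', 'SVWZ', 'ABWA');
$fields = splitDescriptionKeywords(
    'EREF+E2E-1MREF+M-42CRED+DE98ZZZ09999999999SVWZ+Invoice 4711',
    $keywords
);
print_r($fields);
```

A keyword that happens to occur inside a value, followed by a +, would still be treated as a delimiter, so this remains a heuristic.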

Answer 2 · 2016-11-13T19:46:36.000Z

I have some code for this that also preserves spaces at the end of lines (those are usually trimmed off, but you can re-insert them in most cases because the lines have a fixed length of 27+3 characters).
#27

One open problem is that the code relies solely on detecting the + character. This could be resolved by incorporating @larsgrau's keyword list and regex approach.
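
To illustrate the space-restoring idea, here is a rough sketch. It is not the code from #27. It assumes that the sub-fields 20 to 29 are available as an array of strings without their markers, and that each one originally held exactly 27 characters of content (my reading of the "27+3" above). The function name `joinSubfields` is hypothetical:

```php
<?php
// Hypothetical helper, not part of the library or of #27.
function joinSubfields(array $subfields, int $lineLength = 27): string
{
    $lines = array_values($subfields);
    $last = count($lines) - 1;
    $joined = '';
    foreach ($lines as $i => $line) {
        // Pad every line except the last back to the assumed fixed length,
        // which restores trailing spaces that were trimmed off.
        $joined .= ($i < $last) ? str_pad($line, $lineLength) : $line;
    }
    return $joined;
}

// The first line has 26 characters, so one trailing space is restored.
$joined = joinSubfields(array('SVWZ+Rechnung 47110 vielen', 'Dank'));
echo $joined, "\n"; // SVWZ+Rechnung 47110 vielen Dank
```

The joined string could then be passed to a keyword-based split. If a bank starts a new sub-field before the previous one is full, the padding inserts extra spaces, so the resulting values would probably need trimming.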