Parsing of the transaction description (SVWZ and other fields)
clorch opened this issue · 2 comments
First of all: thank you for this great project, I have been looking for something like this for several years!
With my bank, the description field of (some) transactions looks like this:
EREF+...MREF+...CRED+...SVWZ+...
I would like to parse this data to extract the actual "Verwendungszweck" (after SVWZ+) and the other fields. Since your MT940 parser merges fields 20 to 29 without any separator, this becomes more difficult than necessary. As a workaround I added newlines: clorch/fints-hbci-php@2d27ba3
Nevertheless, I do not think this is the best or most elegant solution. Is there another possibility to access the raw field data? I am not that familiar with the world of banking data formats but maybe it is a good idea to add the parser for these fields to the library?
I have a similar "issue", or rather a feature request. I believe this occurs with some financial institutions only. I am testing with Deutsche Bank, and almost every transaction description looks like the one @clorch posted above:
EREF+...MREF+...CRED+...SVWZ+...ABWA...
The descriptions are even truncated because they become too long when concatenated. As far as I understand the MT940/942 format, these are code words used in tag 86. This resource indicates that the presence of these keywords depends on the transaction type and that they mean the following:
/EREF/ End-to-End Reference
/KREF/ Client / Orderer Reference
/MREF/ Mandate Reference (where available)
/PREF/ Payment Reference
/CRED/ Creditor ID
/DEBT/ Debtor ID
/ORDP/ Ordering Party: name and address of ordering party
/BENM/ Beneficiary
/ULTC/ Ultimate Creditor
/ULTD/ Ultimate Debtor
/REMI/ Remittance Information
/PURP/ Purpose Code: currently only SEPA
/RTRN/ Return Reason: return reason code and narrative (if available)
/ACCW/ Counterparty Account and bank
/IBK/ Intermediary Bank: BIC or local bank code
/OCMT/ Original Amount: only if not already shown in 61/9
/COAM/ Compensation Amount
/CHGS/ Charges: only if not already shown in 61/9
/EXCH/ Exchange Rate: only if not already shown in 61/9
I have put together an approach that extracts these keywords and their contents from the description. It is not tested extensively, but maybe it is a pointer in the right direction?
Here's the code (in lib/Fhp/Parser/MT940.php, insert after line 170)
// keywords to be isolated into separate results
$keywords = array(
'EREF', // End-to-End Reference
'KREF', // Client / Orderer Reference
'MREF', // Mandate Id
'PREF', // Payment Reference
'CRED', // Creditor ID
'DEBT', // Debtor ID
'ORDP', // Ordering Party
'BENM', // Beneficiary
'ULTC', // Ultimate Creditor
'ULTD', // Ultimate Debtor
'REMI', // Remittance Information
'PURP', // Purpose Code
'RTRN', // Return Reason
'ACCW', // Counterparty Account and bank
'IBK', // Intermediary Bank
'OCMT', // Original Amount
'COAM', // Compensation Amount
'CHGS', // Charges
'EXCH' // Exchange Rate
);
// split the concatenated description string into parts, keeping the matched keywords
// (limit -1 means "no limit"; empty parts are kept so keywords and values stay aligned)
$parts = preg_split('/(' . implode('|', $keywords) . ')\+/', $description, -1, PREG_SPLIT_DELIM_CAPTURE);
// restructure result array (must be empty before!)
$result = array();
// $parts[0] is any text before the first keyword; from there on,
// use item n as key and item n+1 as value
for ($i = 1, $n = count($parts) - 1; $i < $n; $i += 2) {
    $result[$parts[$i]] = $parts[$i + 1];
}
I have some code for this that also preserves spaces at the end of lines (those are usually trimmed off, but you can re-insert them in most cases because the lines have a fixed length of 27+3 characters).
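For illustration, re-inserting the trimmed spaces could look roughly like the sketch below. This is not the code referred to above: `joinSubfieldLines` is a hypothetical helper, and it assumes the sub-field lines are already available as an array of strings with the 3-character prefix removed and a content width of 27 characters.

```php
<?php
// Hypothetical helper (not part of the library): joins the sub-field lines
// of a description, padding every line except the last back to the assumed
// fixed width so that trimmed trailing spaces are restored.
function joinSubfieldLines(array $lines, int $width = 27): string
{
    $lines = array_values($lines);
    $last = count($lines) - 1;
    $joined = '';
    foreach ($lines as $i => $line) {
        // str_pad() pads on the right with spaces and leaves lines that
        // are already $width bytes or longer unchanged.
        $joined .= ($i < $last) ? str_pad($line, $width) : $line;
    }
    return $joined;
}

// A 26-character line whose trailing space was trimmed, followed by the rest.
echo joinSubfieldLines(array('SVWZ+Invoice 4711 for your', 'order')), "\n";
```

Lines that were genuinely shorter than 27 characters get padded too, so values may end up with extra trailing spaces that need trimming afterwards, which is probably why this only works "in most cases".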
#27
The remaining problem, that the code relies solely on detecting the + character, could be resolved by incorporating @larsgrau's keyword list and regex approach.
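A minimal sketch of why matching against a keyword list helps: a literal + inside the purpose text is left alone, because only a known keyword followed by + starts a new field. `splitDescriptionKeywords` is a hypothetical name, not something the library provides, and the keyword list passed in below is just an example (SVWZ is taken from the descriptions quoted in this thread).

```php
<?php
// Hypothetical helper (not part of the library): splits a concatenated
// description into keyword => value pairs using a fixed keyword list.
function splitDescriptionKeywords(string $description, array $keywords): array
{
    // Escape the keywords so they are safe to embed in the pattern.
    $quoted = array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')\+/';

    // With PREG_SPLIT_DELIM_CAPTURE and empty parts kept, the result is
    // [leading text, keyword, value, keyword, value, ...].
    $parts = preg_split($pattern, $description, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = array();
    // Index 0 (text before the first keyword) is skipped in this sketch.
    for ($i = 1, $n = count($parts); $i + 1 < $n; $i += 2) {
        $result[$parts[$i]] = trim($parts[$i + 1]);
    }
    return $result;
}

$fields = splitDescriptionKeywords(
    'EREF+abcSVWZ+Invoice A+B',
    array('EREF', 'MREF', 'CRED', 'SVWZ')
);
print_r($fields); // EREF => abc, SVWZ => Invoice A+B
```

The sketch still misfires if a value happens to contain a keyword directly followed by +, so it narrows the problem rather than removing it.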