IBM/TabFormer

Real culprits in the dataset?

abedshantti opened this issue · 1 comment

In the financial transaction dataset, who is considered to be the main source of fraudulent transactions?

A breakdown of the fraud transactions indicates that all 2,000 users were victims of fraud, whereas only around 3K merchants out of over 100K were involved in fraudulent transactions. Does that mean these 3K merchants are considered to be the main culprits of credit card fraud in this dataset? Or is it assumed that the source of fraud is external parties that conceal their identities by using the merchants' details?

Also, interestingly, some of those "shady" merchants have hundreds of thousands of transactions, but only a small fraction of these are marked as fraud.
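For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch in plain Python. It assumes each transaction is a dict keyed by column name and that the relevant columns are called `User`, `Merchant Name`, and `Is Fraud?` with `Yes`/`No` values; those names, the helper `fraud_breakdown`, and the toy rows are illustrative assumptions, so adjust them to match the actual CSV header.

```python
from collections import defaultdict


def fraud_breakdown(rows, user_col="User", merchant_col="Merchant Name",
                    fraud_col="Is Fraud?"):
    """Summarize fraud by user and by merchant.

    Column names are assumptions; pass different ones if the header differs.
    Returns (users_with_fraud, merchant_stats), where merchant_stats maps each
    merchant with at least one fraud to (fraud_count, total_count, fraud_rate).
    """
    users_with_fraud = set()
    totals = defaultdict(int)   # all transactions per merchant
    frauds = defaultdict(int)   # fraud-labeled transactions per merchant
    for row in rows:
        merchant = row[merchant_col]
        totals[merchant] += 1
        if row[fraud_col].strip().lower() == "yes":
            frauds[merchant] += 1
            users_with_fraud.add(row[user_col])
    merchant_stats = {
        m: (frauds[m], totals[m], frauds[m] / totals[m]) for m in frauds
    }
    return users_with_fraud, merchant_stats


# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"User": "0", "Merchant Name": "A", "Is Fraud?": "No"},
    {"User": "0", "Merchant Name": "A", "Is Fraud?": "Yes"},
    {"User": "1", "Merchant Name": "A", "Is Fraud?": "No"},
    {"User": "1", "Merchant Name": "B", "Is Fraud?": "No"},
    {"User": "1", "Merchant Name": "A", "Is Fraud?": "No"},
]
users, stats = fraud_breakdown(rows)
print(len(users), "user(s) with at least one fraud")
for merchant, (n_fraud, n_total, rate) in stats.items():
    print(merchant, n_fraud, n_total, f"{rate:.2%}")
```

On the real data, `rows` could be a `csv.DictReader` over the transaction file, which streams the rows instead of loading them all into memory. Sorting `stats` by total count then shows the high-volume merchants with a low fraud rate.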

It would be greatly appreciated if this could be clarified, as the paper does not describe the dataset in detail.

The source of the financial fraud is a population of criminals who steal cards (or buy information on stolen cards from the dark web). These criminals may use the stolen cards at whichever merchants they choose. Using a stolen card does not require merchant complicity, and merchants are normally not complicit. Thus, as you suggest, the source of fraud is external parties that conceal their identities.

As a result, and as you also note, the criminals end up using the stolen cards at only about 3K of the more than 100K merchants at which transactions are conducted.

There are more details on how the data is generated here: https://arxiv.org/abs/1910.03033

Please let us know if you have further comments or questions.

--Erik Altman