Bank Fraud Detection with Flink
Here I want to present an example of how to use Flink's stream processing API to detect fraudulent bank transactions in real time.
First, let's define what counts as a fraudulent transaction in this example, in other words, list the kinds of transactions we need to detect:
1. The customer in the transaction is an alarmed customer.
2. The credit card has been reported lost.
3. The number of transactions is more than 10 in a 1-minute window.
4. The city of the transactions changes more than 2 times in a 1-minute window.
If any one of these 4 conditions is met, we consider it a potentially fraudulent transaction and raise an alarm.
Next, let's look at samples of the data sets used in this example.
1. Transaction data:
Transaction_ID, TimeStamp, City, Customer_ID, accountNumber, CreditCardNumber, Amount
OXGT865174,2018-06-14 23:41:27,Surat,id_741yce,yc43363383, 126028343460, 457
UXJT427774,2018-06-14 23:41:37,Pune,id_741yce,yc43363383, 126028343460, 155
FIPW460992,2018-06-14 23:42:15,Surat,id_741yce,yc43363383, 126028343460, 555
2. Alarmed Customer:
Customer_ID, AccountNumber
id_158gsb, gs38055270
id_710tar, ta12685664
id_433zlh, zl94914183
3. Lost Credit Card:
CreditCardNumber, TimeStamp, Name, Progress
124460647673,2018-01-12 21:41:18,Allison, Investigating
122018058677,2018-02-13 20:41:19,Arthur, Pending with customer
128958965781,2018-03-24 22:21:29,Ana, Complaint registered
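To make the transaction layout above concrete, here is a minimal sketch of a POJO with a parser for one line of the transaction data. The class name TransactionRecord, its field names, and the choice of int for the amount are my assumptions for illustration, not taken from the original code. Fields are trimmed because the samples contain a space after some of the commas.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical POJO for one line of the transaction data set. */
final class TransactionRecord {
    // Matches sample timestamps such as "2018-06-14 23:41:27".
    static final DateTimeFormatter TS_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final String transactionId;
    final LocalDateTime timestamp;
    final String city;
    final String customerId;
    final String accountNumber;
    final String creditCardNumber;
    final int amount; // assumption: amounts are whole numbers, as in the samples

    TransactionRecord(String transactionId, LocalDateTime timestamp, String city,
                      String customerId, String accountNumber,
                      String creditCardNumber, int amount) {
        this.transactionId = transactionId;
        this.timestamp = timestamp;
        this.city = city;
        this.customerId = customerId;
        this.accountNumber = accountNumber;
        this.creditCardNumber = creditCardNumber;
        this.amount = amount;
    }

    /** Parses one comma-separated line; every field is trimmed before use. */
    static TransactionRecord parse(String line) {
        String[] f = line.split(",");
        if (f.length != 7) {
            throw new IllegalArgumentException(
                    "Expected 7 fields but got " + f.length + ": " + line);
        }
        return new TransactionRecord(
                f[0].trim(),
                LocalDateTime.parse(f[1].trim(), TS_FORMAT),
                f[2].trim(),
                f[3].trim(),
                f[4].trim(),
                f[5].trim(),
                Integer.parseInt(f[6].trim()));
    }
}
```

The other two files could be parsed the same way, with 2 and 4 fields respectively.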
Now that we are clear on the task and familiar with the data, we can finally dive into the Flink logic :D
The first thing we will do is broadcast the alarmed customer file and the lost credit card file, because they are relatively lightweight and we want every node in our distributed system to have access to these two important files.
Note that POJO means Plain Old Java Object.
Then, we start by getting the current streaming execution environment:
Read the 3 kinds of data we discussed above and broadcast the two reference files:
Note that Flink has a built-in broadcast method, which makes this task easy for us.
Then comes the first check, against alarmed customers:
Here is the AlarmedCustCheck class used in the last line of the above code:
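The core of this check is a lookup of each transaction's Customer_ID in the alarmed customer data. Below is a minimal plain-Java sketch of that lookup, written without Flink dependencies so it can be run on its own. The class AlarmedCustomerSketch, its method names, and the alarm message format are hypothetical and not the class from the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of rule 1 (alarmed customer); names are hypothetical. */
final class AlarmedCustomerSketch {

    /** Builds a Customer_ID -> AccountNumber lookup from lines like "id_158gsb, gs38055270". */
    static Map<String, String> loadAlarmedCustomers(List<String> lines) {
        Map<String, String> byCustomerId = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            byCustomerId.put(f[0].trim(), f[1].trim());
        }
        return byCustomerId;
    }

    /** Returns one alarm message per transaction whose Customer_ID (4th field) is alarmed. */
    static List<String> check(List<String> transactions, Map<String, String> alarmed) {
        List<String> alarms = new ArrayList<>();
        for (String tx : transactions) {
            String customerId = tx.split(",")[3].trim();
            if (alarmed.containsKey(customerId)) {
                alarms.add("ALARM alarmed customer: " + tx);
            }
        }
        return alarms;
    }
}
```

In the Flink job the lookup data would come from the broadcast side rather than from a list held in memory, but the matching step is the same idea.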
Check the second condition:
The actual checking logic is in the LostCardCheck class:
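As a rough illustration of this second check, the sketch below matches each transaction's CreditCardNumber against the lost card data and includes the card's Progress value in the alarm. It is plain Java with no Flink dependencies, and the class LostCardSketch, its methods, and the message format are my own hypothetical names, not the original LostCardCheck code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Plain-Java sketch of rule 2 (lost card); names are hypothetical. */
final class LostCardSketch {

    /** Maps CreditCardNumber (1st field) to Progress (4th field) of each lost card line. */
    static Map<String, String> loadLostCards(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.toMap(
                        f -> f[0].trim(),
                        f -> f[3].trim(),
                        (first, second) -> second)); // keep the later report on duplicates
    }

    /** Returns one alarm per transaction whose card number (6th field) is reported lost. */
    static List<String> check(List<String> transactions, Map<String, String> lostCards) {
        return transactions.stream()
                .filter(tx -> lostCards.containsKey(tx.split(",")[5].trim()))
                .map(tx -> "ALARM lost card ("
                        + lostCards.get(tx.split(",")[5].trim()) + "): " + tx)
                .collect(Collectors.toList());
    }
}
```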
The third check forms a 1-minute window to detect more than 10 transactions on the same credit card:
Again, here is the FilterAndMapMoreThan10 class:
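The counting idea behind this step can be sketched in plain Java as follows. I am assuming here that windows are 1-minute buckets aligned to the minute of each record's own timestamp, which may differ from how the Flink job assigns its windows. The class TransactionCountSketch and the "card@windowStart" key format are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of rule 3 (too many transactions per minute); names are hypothetical. */
final class TransactionCountSketch {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns the "card@windowStart" keys whose transaction count exceeds the threshold. */
    static List<String> cardsOverLimit(List<String> transactions, int threshold) {
        Map<String, Integer> countPerCardAndWindow = new LinkedHashMap<>();
        for (String tx : transactions) {
            String[] f = tx.split(",");
            // Assumption: the window is the calendar minute of the record's timestamp.
            LocalDateTime windowStart =
                    LocalDateTime.parse(f[1].trim(), TS).truncatedTo(ChronoUnit.MINUTES);
            String key = f[5].trim() + "@" + windowStart;
            countPerCardAndWindow.merge(key, 1, Integer::sum);
        }
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countPerCardAndWindow.entrySet()) {
            if (e.getValue() > threshold) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```

Calling cardsOverLimit(transactions, 10) corresponds to the "more than 10" rule: a card with exactly 10 transactions in a minute is not flagged, one with 11 is.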
The last check looks for the city changing more than 1 time in a 1-minute window:
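Here is a minimal plain-Java sketch of counting city changes. It assumes that transactions are grouped by credit card, that they arrive in timestamp order, and that windows are 1-minute buckets aligned to the record timestamp. None of these choices is confirmed by the original code, and the class CityChangeSketch is hypothetical. The allowed number of changes is left as a parameter, since the rule list and this step state different thresholds.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of rule 4 (city changes per minute); names are hypothetical. */
final class CityChangeSketch {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Counts, per card and minute, how often the city differs from the previous transaction. */
    static Map<String, Integer> cityChanges(List<String> transactions) {
        Map<String, Integer> changes = new LinkedHashMap<>();
        Map<String, String> lastCity = new HashMap<>();
        for (String tx : transactions) {
            String[] f = tx.split(",");
            LocalDateTime windowStart =
                    LocalDateTime.parse(f[1].trim(), TS).truncatedTo(ChronoUnit.MINUTES);
            String key = f[5].trim() + "@" + windowStart; // assumption: keyed by card number
            String city = f[2].trim();
            String previous = lastCity.put(key, city);
            int delta = (previous != null && !previous.equals(city)) ? 1 : 0;
            changes.merge(key, delta, Integer::sum);
        }
        return changes;
    }

    /** Returns the "card@windowStart" keys with more than maxChanges city changes. */
    static List<String> flagged(List<String> transactions, int maxChanges) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : cityChanges(transactions).entrySet()) {
            if (e.getValue() > maxChanges) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

On the three sample transactions, the 23:41 window sees one change (Surat to Pune) and the 23:42 window sees none, because the first transaction in a window has no previous city to compare against.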
Lastly, we combine all the suspicious transactions with Flink's union method and execute our application:
The complete code is in my GitHub Repo if you would like to check it out :-)
Enjoy Learning :D
Ye Jiang