Monday, December 10, 2018

Bank Real-Time Fraud Detection with Flink

                      Bank Fraud Detection with Flink 


        Apache Flink is an open source stream processing framework. The core of Flink is a distributed streaming data flow engine, however, it also provides support for batch processing, graph processing and iterative processing in ML. Flink is the next generation engine for stream processing and is proved to be even faster than Spark, which is a well-known data process engine specialized for batch processing. So, in terms of speed,  Flink > Spark >> Hadoop.  Actually, industry giants like Alibaba, Uber, Yelp... are using Flink as their stream process engine! Check this out get part of the list.

    Here, I want to present an example that how to use Flink's stream processing API to detect bank fraud transactions on a real-time basis.

   First, Let's us define what is a fraud transaction in our example here, in other words, list what kind of transaction see what we need to detect:

   1. If the customer in a transaction is an alarmed customer.

   2. If the credit card is a reported lost card.

   3. The number of transactions is more than 10 in a 1-minute window.

   4. City of transactions is changed more than 2 times in a 1-minute window.


If one of 4 conditions are met, we consider that is a potential fraud transaction and then raise an alarm.

 Then, let's see the data set samples in this example.

1. Transaction data: 

Transaction_ID, TimeStamp, City, Customer_ID, accountNumber, CreditCardNumber, Amount 
OXGT865174,2018-06-14 23:41:27,Surat,id_741yce,yc43363383, 126028343460,        457
UXJT427774,2018-06-14 23:41:37,Pune,id_741yce,yc43363383,   126028343460,        155
FIPW460992,2018-06-14 23:42:15,Surat,id_741yce,yc43363383,   126028343460,        555


2. Alarmed Customer: 

Customer_ID, AccountNumber
id_158gsb,      gs38055270
id_710tar,        ta12685664
id_433zlh,       zl94914183

3. Lost Credit Card: 

CreditCardNumber, TimeStamp,        Name,    Progress
124460647673,2018-01-12 21:41:18,Allison,  Investigating
122018058677,2018-02-13 20:41:19,Arthur,   Pending with customer
128958965781,2018-03-24 22:21:29,Ana,       Complaint registered


After we are clear with the task and be familiar with the data, eventually, we could dive into the Flink logic :D

The first thing we will do is to broadcast alarmed customer file and lost credit card file because they are relatively lightweight and we want to every node in our distributed system have to access this two important file.



Noto the POJO means Plain Old Java Object.

Then, we start by getting the current streaming execution environment:



Get and broadcast the 3 kinds of data we discussed above:




Note that Flink has the built-in broadcast method, which makes this task easy for us.

Then, the first check against alarmed customers



Here is the AlamedCustCheck class in the last line of the above code:




Check the second condition:







The actually checking logic is in the LostCardCheck class here:




The third task will form a 1 min window to detect more than 10 transactions on the same credit card:



Again the FilterAndMapMoreThan10 class here:



The last condition to check city change more than 1 time in 1 min window:



Lastly, we want to combine all the suspicious transactions together with the union method that in Flink and execute our application:




The complete code is in my GitHub Repo if you would like to check them out :-)



Enjoy Learning :D
Ye Jiang


1 comment: