Predicting Cyber Crimes using Confusion Matrix in Classification

So first of all, let's understand:

What is Cyber Crime?

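Cybercrime is criminal activity that either targets or uses a computer, a computer network, or a networked device. Most cybercrime is committed for profit. Rarely, it aims to damage computers for other reasons, which could be political or personal.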

Examples of the different types of cybercrime:

  • Identity fraud.
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyberextortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

What is a Confusion Matrix?

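For a binary classification use case, a confusion matrix is a 2×2 matrix that counts how the model's predictions line up with the actual outcomes: each prediction is either positive or negative, and each is either correct or incorrect.

Let's understand these four terms one by one: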

  1. TN (True Negative): the model predicted no cyber-attack (the negative class), and this is correct; no attack actually happened.

  2. TP (True Positive): the model predicted a cyber-attack (the positive class), and this is correct; an attack actually happened.

  3. FP (False Positive): the model predicted a cyber-attack, but this is wrong; no attack actually happened. An FP is also called a Type I error (a false alarm).

  4. FN (False Negative): the model predicted no cyber-attack, but this is wrong; an attack actually happened. An FN is also called a Type II error (a missed attack).
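To make the four cells concrete, here is a small sketch that tallies them from two lists of labels. It assumes 1 means "cyber-attack happened" (the positive class) and 0 means "no attack"; the function names `confusion_counts` and `as_matrix` and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary task.

    `positive` is the label treated as the positive class
    (assumed here: 1 = "cyber-attack happened").
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted an attack: correct -> TP, wrong -> FP (Type I error).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted no attack: missed attack -> FN (Type II error), else TN.
            counts["FN" if a == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "TN", "FP", "FN")}


def as_matrix(c):
    # Rows: actual (attack, no attack); columns: predicted (attack, no attack).
    return [[c["TP"], c["FN"]], [c["FP"], c["TN"]]]


# Hypothetical labels for 8 network sessions (1 = attack, 0 = no attack).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)             # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(as_matrix(counts))  # [[3, 1], [1, 3]]
```

Note that the row/column layout in `as_matrix` is one convention among several; libraries differ in how they order the classes, so it is worth checking the layout before reading off cells.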

A confusion matrix therefore shows two types of errors: false positives (Type I) and false negatives (Type II).

From the confusion matrix, we can calculate five different metrics that measure the performance of our model.

  1. Accuracy = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification rate = (FP + FN) / (TP + TN + FP + FN)
  3. Precision = TP / (TP + FP)
  4. Sensitivity (also called Recall) = TP / (TP + FN)
  5. Specificity = TN / (TN + FP)
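The five formulas translate directly into code. This is a minimal sketch; the function name `classification_metrics` and the sample counts are hypothetical, and returning `nan` when a denominator is zero is a design choice made here rather than a standard rule.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the five confusion-matrix metrics as floats.

    A metric whose denominator is zero is reported as float("nan")
    instead of raising ZeroDivisionError.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    total = tp + tn + fp + fn
    return {
        "accuracy": ratio(tp + tn, total),
        "misclassification": ratio(fp + fn, total),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts: 40 attacks caught, 50 normal sessions correctly cleared,
# 5 false alarms, 5 missed attacks.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name:>17}: {value:.3f}")
# accuracy 0.900, misclassification 0.100, precision 0.889,
# recall 0.889, specificity 0.909
```

Accuracy and the misclassification rate always sum to 1, so they carry the same information. For attack detection, recall is usually the one to watch, since every false negative is a missed attack.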

Why Do We Actually Need ML to Predict Cyber Crimes in Today's Era?

At present, no generalized framework is available for categorizing cybercrime offenses by extracting features from the cases. In the present work, data analysis and machine learning are combined to build a cybercrime detection and analytics system. The proposed system’s design and implementation use both supervised classification and unsupervised clustering algorithms: naive Bayes is used for classification and k-means for clustering. For feature extraction, the TF-IDF (term frequency-inverse document frequency) vectorization process is used. The methodology applies four phases to the data: reconnaissance, preprocessing, data clustering and classification, and prediction analysis.
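To illustrate how TF-IDF features can feed a naive Bayes classifier, here is a dependency-free sketch. It is not the cited work's implementation, and several assumptions are made: the complaint texts, category labels, tokenizer, and the class name `TfidfNaiveBayes` are hypothetical; the IDF uses the smoothed form log((1 + n) / (1 + df)) + 1; and the classifier is a multinomial naive Bayes with Laplace smoothing that treats TF-IDF weights as fractional word counts. The k-means clustering phase is omitted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Very simple tokenizer (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document: raw term count times IDF.

    Terms not seen during fitting are ignored.
    """
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


class TfidfNaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights, with Laplace smoothing `alpha`."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        vocab = list(self.idf)
        class_counts = Counter(labels)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
        # Sum the TF-IDF weight of every term per class (fractional "counts").
        weight = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            weight[label].update(tfidf(doc, self.idf))
        # Smoothed log-likelihood of each term given each class.
        self.log_like = {}
        for c in class_counts:
            total = sum(weight[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((weight[c][t] + self.alpha) / total) for t in vocab
            }
        return self

    def predict(self, doc):
        vec = tfidf(doc, self.idf)
        # Score = log prior + weighted sum of term log-likelihoods; pick the max.
        scores = {
            c: self.log_prior[c] + sum(w * self.log_like[c][t] for t, w in vec.items())
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


# Hypothetical complaint texts and category labels, for illustration only.
train_docs = [
    "my card was charged for purchases i never made",
    "someone stole my credit card details online",
    "files encrypted and attacker demands bitcoin ransom",
    "ransomware locked our servers and asked for payment",
    "fake account opened using my identity and name",
    "someone used my identity to apply for a loan",
]
train_labels = ["card_fraud", "card_fraud", "ransomware", "ransomware",
                "identity_fraud", "identity_fraud"]

model = TfidfNaiveBayes().fit(train_docs, train_labels)
print(model.predict("attacker encrypted files and demands ransom"))  # ransomware
print(model.predict("someone stole my card"))  # card_fraud
```

In practice you would more likely use scikit-learn's `TfidfVectorizer` with `MultinomialNB` (and `KMeans` for the clustering phase), then evaluate the predictions on held-out cases with the confusion-matrix metrics described earlier.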

ARTH Learner | Bigdata — Hadoop | Linux | Front-End Developer | Flask | Coding Enthusiast | Python | AWS | Ansible | Kubernetes