Pruning and Ranking Based Classifier for Efficient Detection of Android Malware Abstract Mobile devices that run Android operating system are widely used

Pruning and Ranking Based Classifier for Efficient Detection of Android Malware

Abstract
Mobile devices that run Android operating system are widely used. The applications running in Android mobiles can have malicious permissions due to malware. In other words, Android applications might spread malware which can sabotage valuable data. Therefore it is essential to have mechanism to classify malware and benign mobile applications running in Android phones. Since Android mobile applications run in the confines of mobile devices and associated servers, it is very challenging task to detect Android malware. Many solutions came into existence to detect malware applications. Of late Abawajy et al. proposed a technique known as Iterative Classifier Fusion System (ICFS) which employs classifiers iteratively with fusion to generate a final classifier for effective detection of malware. They combined NBtree classifier, Multilayer perception and LibSVM with polynomial kernel to achieve this. However, the system does not focus on reduction or pruning of Android application permissions so as to build a classifier that reduces time and space complexity. In the proposed system, a methodology is proposed that focuses on reduction or pruning of Android application permissions and ranking them in order to build a classifier that reduces time and space complexity. The classifier modelled with best ranked permissions can be representative of all permissions as least significant permissions are pruned to reduce search space. We built a prototype application to demonstrate proof of the concept. The experimental results revealed that the proposed system performs better in improving detection accuracy besides precision and recall measures.

Keywords – Malware, malware detection technique, pruning, ranking

INTRODUCTION
Android malware has become a potential risk to mobile applications. Due to the increase in the usage of Android mobile applications in the world, the attackers target it to spread malware. Malware is the malicious software that can cause damage to a system of mobile device in terms of removing data or denying a service and so on. Malware can be one of the forms of cyber security threats. It has history of damaging potential applications in the real world. The cyber security threat landscape is increased drastically with the presence of Android malware. The rationale behind this is that people of all walks of life started using Android smart phones for virtually any operation including banking and shopping.
As the mobile smart phones and associated sensors produce huge amount of data known as big data, it became crucial to protect Android applications from malware. Big data has become an important buzzword and there are enterprises that depend on the business intelligence acquired from big data for decision making. In this context, it is important to have an efficient mechanism to detect and prevent Android malware. Many approaches came into exultance as found in the literature. They include signature based approaches 1, commercial malware detection methods 2, detection methods that also support Dalvik byte code transformations 4, data flow path based solutions 5 and 6, characterization of malware through system calls 9 and a hybrid approach that combines both API-calls and permission based approaches. In 12 an iterative approach is followed with multiple classifiers to detect malware. From the literature it is found that the methods are focusing more on accuracy of the solution rather than the computational complexity. In this paper we proposed an approach that focuses more on reducing computational complexity and increasing accuracy of the detection method. Our contributions are as follows.
We proposed a framework to have a systematic approach in Android malware detection. It is based on the significant permission based permission reduction, pruning and ranking approaches.
We proposed an algorithm known as Permission Significance-based Pruning for Android Malware Detection (PSP-AMD) to build a classifier that provides accuracy of detection and reduces computational complexity.
We built a prototype application that demonstrates proof of the concept. The empirical results revealed the utility of the proposed solution which is light weight and focuses not only on the accuracy but also reduction of computational complexity.
The remainder of the paper is structured as follows. Section 2 provides review of literature. Section 3 presents the proposed system in detail. Section 4 presents experimental results while section 5 concludes the paper and provides future scope.

RELATED WORK
This section provides review of literature on the malware detection methods and related works. As the Android mobile platform became popular, adversaries are targeting spreading of malware through Android mobile apps. There is a good survey on the current methods to detect malware in Android applications is found in 1. There are signature based methods that are used to make use of malware signatures for detection. Signature based approaches are more prevalent among solutions available. Zhou et al. 2 studied commercial malware detection systems that are popular. Their studies revealed the fact that the detection rate of the method is between 20.2% and 78.6%. Similar kind of work is made in 3 where experimental results are provided for man popular anti-malware approaches associated with cloud. For many modern computers, the previous solutions were found inadequate. The work is to know whether the current anti-malware detection methods can handle Dalvik byte code transformations. Their experiments proved that there was further research needed to define methods to handle obfuscation. In 4 an advanced detection method that is behaviour-based is presented. It could prevent the vulnerability known as system-call injection. Asymptotic equi-partition property is used by their method in order to extract important call sequences to detect malware.
A framework for automated analysis for detection of malware in Android applications is proposed . The framework identified malicious behaviours automatically by simulating intent broadcasts and user-interface events. Both static and dynamic analyses are combined in 5 while 6 make use of data flow path to distinguish benign apps from malicious ones. They made experiments on a large dataset and found that there was classification accuracy of 96% with benign apps and 98% with malware apps. In 7 a new approach is proposed to make use of system call in order to detect malware by characterizing malware behaviour. In 8 a static approach is proposed based on the API-call based and permission-based approaches. It uses a multi-classifier system and follows a collaborative approach based on probability theory that combines decisions of multiple classifiers. There are many approaches that exist in the literature. There are ensemble-classifiers that utilize multiple approaches. In 7 pruning ensemble classifiers are studied.
A multi-level system is proposed for detection of Android malware while focuses on an iterative multi-tier ensemble classifiers to do the same. In 10 another multi-classifier system is built with high accuracy. These solutions have used multi-classifier systems to increase accuracy in the detection of malware when compared with the solutions that used single-classifiers. The problem with these systems is that they are very huge and cannot be directly used for smart phone applications. They need more processing power and storage capacities besides causing much communication overhead as explored in 9. These approaches focused on detection accuracy but they did not consider computational cost. There are some features of malware that are generally used to characterise them. In 11 multiple features are used to detect malware. They used feature selection algorithms to do so. There are some limitations in the feature selection methods too as they give emphasis on the algorithm that is specific. In 12 an iterative classifier fusion method is employed where multiple classifiers are involved. It is found to be complex and it can be optimized further. In this paper we proposed a light weight approach known as permission reduction, pruning and ranking based classifier for efficient detection of Android malware.

PROPOSED METHODOLOGY
In the proposed system, an alternative methodology is proposed which consists of reduction or pruning of Android application permissions and ranking them in order to build a classifier that reduces time and space complexity. The classifier modelled with best ranked permissions can be representative of all permissions as least significant permissions are pruned to reduce search space. Thus the proposed system is expected to have better performance besides minimizing overhead.
Methodology
A dataset of around 5000 malware Android apps are collected. There are around 135 permissions that can be used by any Android app. The 135 permissions are taken as reference list of permissions. An excerpt from the list of Android permissions is given in Listing 1.

Listing 1: Permissions used by one of the malicious app
Each dataset contains a list of permissions used. Once the dataset is loaded, the proposed system takes all malicious app names and corresponding permissions. Permission reduction, pruning and ranking based approach are used to build a classifier. Initially 135 total permissions are available. These permissions are subjected to pruning. Any permission which has least usage in the malicious apps is removed from the list. That way some of the permissions are removed from the master list of permissions. Then association rule mining is made on the malicious permissions of all apps. When there are associations (multiple permissions repeatedly occurring in apps), one of them will be treated as representative for all permissions in the given association while others are pruned from the master list of permissions.

Figure 1: Overview of the proposed methodology
Afterwards, the ranking of the remaining permissions in the master list is made based on the permissions present in all malicious apps. The best ranked permissions are retained in the master list while the poorly ranked ones are removed. Then a classifier is built to model malicious behaviour of Android apps. This classifier is used to test new apps to know whether they are malicious or genuine. The pseudo code of the proposed algorithm known as Permission Significance-based Pruning for Android Malware Detection (PSP-AMD) is as shown below.
Algorithm: PSP-AMD
Inputs: Malware apps M, master list of permissions MP
Output: Classifier for malware detection
Initialize malicious application permissions vector AP
Initialize map for holding apps and list of permissions MAP
For each malware app m from M
Extract permissions from m into AP
Add app name and AP to MAP
End For
For each permission p from MP
Analyze MAP for the presence of p
Remove p from MP if it has negligible frequency
End for
Perform association rule mining on permissions of malicious apps (MAP)
Prune representative permissions from MP
For each permission p from MP
Perform ranking for p in the MAP
Prune the permission p if its ranking is negligible
End for
Build a classifier using MP which contains pruned list of master list of permissions
Classifier is applied to new app to know whether it is malicious
The algorithm is built based on the proposed architecture as shown in Figure 1.
The algorithm is meant for building a classifier based on the significant permission selection pruning of unwanted permissions besides ranking. Two matrices representing malware apps (M) and benign apps (B) are used. The difference between them is computed as follows. The threshold value for the difference is set to?.
(size(M_j ) – size(B_j )| )/(min(size(?M ?_j),size(B_j )))