Abstract: Botnets represent one of the most aggressive cyber security threats faced by organizations, as they provide a platform for many illegal activities such as distributed denial of service attacks, click fraud, phishing, and malware dissemination. A variety of techniques that use different feature sets have been proposed for effective botnet traffic classification and analysis, but several challenges remain unaddressed, such as the effect of the feature set of the network flow exporter. In this paper we explore an open source network traffic flow exporter (with a set of features) using different protocol filters. Our evaluation shows that the choice of flow exporter and protocol filters does affect the performance of botnet traffic classification.

Keywords: Botnet, cyber security, flow exporter, protocol filter, traffic classification.

I. INTRODUCTION

A botnet is a collection of compromised computers connected over the Internet and remotely controlled by a botmaster.
The individual compromised machines are called bots. Botnets are created to conduct malicious activities such as distributed denial of service (DDoS) attacks, click-fraud scams, spam distribution, theft of victims' personal information, and exploitation of users' significant computational resources by means of malicious bots [1]. The bots keep updating themselves and are controlled by the botmaster to carry out malicious instructions for different illegal activities.
Hence, with a significantly increasing rate of reported infections and illegal activities, botnets pose a serious threat to cyber security.

A significant aspect of botnet architecture is the communication scheme, which has evolved considerably over the years to enhance botnet functionality and evade detection. The architecture includes the compromised bots, which communicate with a command and control (C&C) server to fetch instructions from the botmaster. Botnets used the Internet Relay Chat (IRC) protocol for communication until the early 2000s. However, IRC-based botnets are highly vulnerable because they use a centralized topology: the complete botnet can be disrupted just by shutting down the IRC server.
Also, the messages can easily be revealed by continuous monitoring of network traffic, and the messages captured from packets can then be analyzed further. Since 2003, botnets have evolved and started using more sophisticated techniques, including decentralized topologies such as peer-to-peer (P2P) and ubiquitous protocols such as DNS and HTTP. In the P2P communication scheme, individual bots act as both client and server, which makes the botnet more resilient because there is no fixed central point that could be exploited. However, the P2P topology also has a limitation: higher latency in command and control transmission, which in turn affects bot synchronization.
The use of techniques such as encryption and fluxing has also helped botnets avoid detection. Therefore, botnet identification and detection have become highly challenging.

Many botnet detection approaches have been proposed that involve network traffic analysis and classification.
Some of the research in this category aims to build a generalized model for botnet detection, whereas other work focuses on specific types of botnets. In the early 2000s, most proposed systems specifically targeted botnets using IRC [2]. More recent research, however, focuses on P2P and HTTP-based botnets [3], [4]. The monitoring and detection techniques used for botnet classification should be active and continuous, because botnets use automatic update mechanisms. Continuous operation also potentially enables these techniques to learn new patterns and adapt to changes as botnets evolve. Therefore, machine learning techniques (i.e., classification and clustering) are an apt solution. Clustering and classification are used to enable automatic pattern recognition over a meaningful representation of network traffic. Hence, the most significant component of these systems is the extraction of meaningful features (attributes) from network traffic. Extracting these features is very challenging. To this end, various botnet detection and analysis systems have proposed their own feature sets to represent network traffic, which consists of network packets. A network packet is divided into two major parts: 1) the packet header, which contains control information for the protocols used over the network, and 2) the packet payload, which contains the application information carried over the network.
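The header/payload division can be illustrated with a minimal Python sketch. It assumes a raw IPv4 packet carrying TCP or UDP with no link-layer framing; the function name `split_packet` and the dictionary keys are hypothetical and chosen only for illustration. The sketch is not the parser used by any particular flow exporter.

```python
import ipaddress
import struct

def split_packet(packet: bytes):
    """Return (header_fields, payload) for a raw IPv4 packet carrying TCP or UDP."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    # Fixed 20-byte part of the IPv4 header, in network byte order.
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl,
     proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (ver_ihl & 0x0F) * 4        # IHL is counted in 32-bit words
    segment = packet[ip_header_len:total_len]   # transport header + application data

    src_port, dst_port = struct.unpack("!HH", segment[:4])
    if proto == 6:                               # TCP: data offset in 32-bit words
        l4_header_len = (segment[12] >> 4) * 4
    elif proto == 17:                            # UDP: fixed 8-byte header
        l4_header_len = 8
    else:
        raise ValueError("only TCP and UDP are handled in this sketch")

    header = {
        "src_ip": str(ipaddress.IPv4Address(src)),
        "dst_ip": str(ipaddress.IPv4Address(dst)),
        "src_port": src_port,
        "dst_port": dst_port,
        "proto": proto,
        "length": total_len,
    }
    return header, segment[l4_header_len:]

# Hand-built UDP packet used only to exercise the function.
ip_part = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 33, 0, 0, 64, 17, 0,
                      bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
udp_part = struct.pack("!HHHH", 5353, 53, 13, 0)
header, payload = split_packet(ip_part + udp_part + b"hello")
print(header, payload)
```

Only the `header` dictionary would be needed by a header-based approach; the `payload` bytes are what payload-inspection methods examine.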
Some botnet detection and analysis approaches use network packet headers [4], whereas others use packet payload methods [5]. Flow-based feature extraction methods are commonly used by the approaches that rely on packet headers [4]. In these approaches, the traffic packets are aggregated into flows and statistics are then computed over each flow. Flow exporters are used to generate the flows and extract such features. Many botnets, however, use encryption to hide their identity and evade detection systems that analyze the packet payload for embedded communication information. Flow exporters remain effective in this setting because they summarize the traffic using only network packet headers. Hence, in this work an open source flow exporter is used together with a machine learning technique to perform botnet traffic classification.

II. BACKGROUND AND RELATED WORK

Bots are vulnerable hosts that have been infected by self-propagating malware, called a bot program, designed to perform various malicious activities. The botmaster controls the network of infected bots, known as a botnet. The infected bots receive commands from the botmaster over the C&C channel and perform malicious operations such as DDoS, phishing, spamming, identity theft, and stealing users' important information [1].

A botnet is created and maintained in five stages [1]. The first is the infection stage, in which the attacker infects the victim by exploiting existing vulnerabilities using different exploitation techniques.
The second stage is secondary injection, in which shellcode is executed on the infected machine to fetch the image of the bot binary. The bot binary then installs itself on the infected machine, turning it into a bot. In the third stage, connection, the bot binary establishes the C&C channel used by the botmaster. In the fourth stage, once the connection is established, the malicious stage begins and the botmaster sends commands to the botnet. The fifth stage covers the updating and maintenance of bots by the botmaster.

A. Related Work

Although a significant amount of research has been done on botnet detection, detection techniques based on network traffic flow analysis have emerged only in the last few years.

Gu et al. developed BotMiner, which detects botnets using group behavior analysis. It uses a clustering approach to find similar C&C communication behavior and form clusters, and then employs Snort [6]. The data set included non-malicious data from a campus network and malicious data obtained by running bot binaries in a sandbox environment. The captured traffic files were converted into flows, and the flow exporter provided features such as the total number of packets per flow, the average number of bytes per packet, and the average number of bytes per second.
The results showed that BotMiner could detect botnets with detection rates (DRs) between 75% and 100%.

Strayer et al. proposed an IRC botnet detection system that used machine learning techniques (classification and clustering) [2]. First, classification was used to filter chat-like traffic; then clustering was used to find group activities in the filtered traffic. Lastly, an analyzer was applied to the clusters for botnet detection. The data set was gathered from a controlled testbed running a bot binary. They evaluated the classifiers against a multidimensional flow correlation technique that they designed and proposed.

Zeidanloo et al. developed a detection system that focused on P2P and IRC-based botnets [5]. Using filtering, classification, and clustering, it aimed to detect botnet group behavior in a given traffic file. A flow-based technique was used to analyze traffic, and payload inspection was deployed for traffic filtering.

Zhao et al. investigated a botnet detection system based on flow intervals [3]. The flow features of captured traffic packets were used with Bayesian network and decision tree classifiers to detect botnets. They evaluated and analyzed normal and malicious attack traffic.
The results showed DRs over 90% with false positive rates (FPRs) under 5%.

Haddadi et al. proposed a botnet detection approach based on botnet traffic analysis [4]. Normal and malicious traffic was generated by establishing HTTP and DNS communication with the publicly available domain names of botnet C&C servers and legitimate web servers. NetFlow with machine learning algorithms was proposed to detect the botnets. The approach achieved a 97% DR and a 3% FPR.
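The DR and FPR figures quoted above are not defined in the text. The sketch below assumes the usual definitions, DR = TP / (TP + FN) and FPR = FP / (FP + TN), with botnet traffic as the positive class; the cited works may define them slightly differently. The function name and the label encoding (1 for botnet, 0 for normal) are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (DR, FPR) for binary labels where 1 = botnet and 0 = normal."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # botnet flow correctly flagged
        elif truth == 1 and pred == 0:
            fn += 1          # botnet flow missed
        elif truth == 0 and pred == 1:
            fp += 1          # normal flow wrongly flagged
        else:
            tn += 1          # normal flow correctly passed
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, fpr

# Toy labels, not data from any of the cited studies.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(detection_metrics(y_true, y_pred))
```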
Recent literature on botnet detection focuses more on the P2P and HTTP protocols [4]. This work applies different data mining or machine learning techniques, such as neural networks, decision trees, or statistical methods, to flow features. In most cases, normal traffic files are combined with attack traffic files to evaluate the performance of the proposed botnet detection systems.

This paper, in turn, aims to use the features exported by an open source flow exporter and to analyze the flow exporter's effect on the performance of botnet classification.

III. METHODOLOGY

Earlier work on botnet traffic analysis used some network flow information, which included packet headers. Most of it focuses on certain types of protocols, such as HTTP and DNS.
This indicates the use of protocol filtering in analyzing traffic data. No packet payload information is incorporated. This work explores the possibility of detecting botnets using only features extracted from the traffic flows.

A. Traffic Data Set

The traffic files used for analysis were obtained from botnets that use HTTP as the communication protocol, or an HTTP-based P2P topology whose traffic looks like normal HTTP traffic.
The botnet traffic files publicly available at the NETRESEC [7] and Snort [8] websites are employed in this research.
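The flow-based, protocol-filtered view of traffic described in this section can be sketched in Python as follows. This is a simplified illustration and not the open source flow exporter evaluated in this paper. It assumes packet-header records are already available as dictionaries, groups them into unidirectional flows by 5-tuple, and approximates a protocol filter by well-known port (for example, 80 for HTTP). Real exporters also apply flow timeouts and often build bidirectional flows, which is omitted here. All names (`export_flows`, `keep_ports`, the record keys) are hypothetical.

```python
from collections import defaultdict

def export_flows(packets, keep_ports=None):
    """Aggregate packet-header records into flows and compute per-flow statistics.

    packets: iterable of dicts with keys src_ip, src_port, dst_ip, dst_port,
             proto, length (bytes) and ts (seconds).
    keep_ports: optional set of ports acting as a simple protocol filter.
    """
    groups = defaultdict(list)
    for pkt in packets:
        if keep_ports is not None and not (
            pkt["src_port"] in keep_ports or pkt["dst_port"] in keep_ports
        ):
            continue  # dropped by the protocol filter
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["proto"])
        groups[key].append(pkt)

    flows = []
    for key, pkts in groups.items():
        total_bytes = sum(p["length"] for p in pkts)
        times = [p["ts"] for p in pkts]
        duration = max(times) - min(times)
        flows.append({
            "key": key,
            "packets": len(pkts),
            "bytes": total_bytes,
            "duration": duration,
            "bytes_per_packet": total_bytes / len(pkts),
            # Single-packet flows have zero duration; report 0.0 instead of dividing.
            "bytes_per_second": total_bytes / duration if duration > 0 else 0.0,
        })
    return flows

# Toy records: three HTTP-like packets and one DNS-like packet.
packets = [
    {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.9",
     "dst_port": 80, "proto": 6, "length": 100, "ts": 0.0},
    {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.9",
     "dst_port": 80, "proto": 6, "length": 200, "ts": 1.0},
    {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.9",
     "dst_port": 80, "proto": 6, "length": 300, "ts": 2.0},
    {"src_ip": "10.0.0.1", "src_port": 53000, "dst_ip": "10.0.0.8",
     "dst_port": 53, "proto": 17, "length": 60, "ts": 0.5},
]
http_flows = export_flows(packets, keep_ports={80})
print(http_flows)
```

The per-flow records produced this way contain no payload information, so they could serve as feature vectors for a classifier under the header-only setting explored here.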