September 19, 2022 HAWKEYE

Data Exfiltration and Detection through Anomaly Detection

Data exfiltration is the unauthorised transfer of critical and sensitive data and/or information from a targeted network to locations controlled by attackers. Detecting data exfiltration is difficult because data flows in and out of networks on a regular basis, and exfiltration traffic can closely resemble normal network traffic.


Adversaries can exfiltrate data over encrypted or unencrypted channels. They can reuse pre-existing Command and Control (C2) channels, or use standard data transfer protocols such as FTP and SCP. Alternatively, they can use non-standard protocols such as DNS or ICMP, with specially crafted fields, to try to circumvent existing security technologies.

Data Preparation and Exfiltration Methods:

Attackers commonly use one of three techniques to distribute exfiltrated traffic:

  • Random distribution: With this method, exfiltrated data is split into streams that are sent over random connections to different malicious servers. Because there is no fixed pattern, detection techniques based on pattern recognition find this quite challenging. On the other hand, attackers must reassemble the data at the receiving end.
  • Round-robin distribution: With this method, malware sends each successive stream of packets containing exfiltrated data to the next server in a list of attacker-controlled servers. After the last server is reached, the next stream is sent to the first server again, and the cycle repeats.
  • Single-server distribution: With this technique, all traffic is routed to a single server rather than to multiple servers. It is the simplest approach, but it adds no covertness to the exfiltration attempt.
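To make the three distribution patterns concrete, here is a toy Python simulation. It assumes fixed-size chunking and a hypothetical list of destination names (`srv-a`, `srv-b`, `srv-c`). It sends nothing over the network. It only shows which destination each chunk would be assigned to, why randomly distributed data needs sequence numbers to be reassembled, and what per-destination counts a defender would observe.

```python
import random
from collections import Counter
from itertools import cycle


def split_into_chunks(data: bytes, size: int) -> list[bytes]:
    """Split a byte string into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def distribute(chunks: list[bytes], servers: list[str], strategy: str,
               seed: int = 0) -> list[tuple[int, str, bytes]]:
    """Assign each chunk a sequence number and a destination (simulation only)."""
    rng = random.Random(seed)
    if strategy == "random":
        # Each chunk goes to an independently chosen server, in shuffled order.
        plan = [(seq, rng.choice(servers), chunk) for seq, chunk in enumerate(chunks)]
        rng.shuffle(plan)
        return plan
    if strategy == "round_robin":
        # Cycle through the servers, wrapping back to the first after the last.
        targets = cycle(servers)
        return [(seq, next(targets), chunk) for seq, chunk in enumerate(chunks)]
    if strategy == "single":
        # Every chunk goes to the same server: simplest, and least covert.
        return [(seq, servers[0], chunk) for seq, chunk in enumerate(chunks)]
    raise ValueError(f"unknown strategy: {strategy}")


def reassemble(plan: list[tuple[int, str, bytes]]) -> bytes:
    """Receiving side: reorder chunks by sequence number and join them."""
    return b"".join(chunk for _, _, chunk in sorted(plan, key=lambda p: p[0]))


def per_destination_counts(plan: list[tuple[int, str, bytes]]) -> Counter:
    """Defender's view: how many chunks went to each destination."""
    return Counter(server for _, server, _ in plan)
```

With `split_into_chunks(b"abcdefghij", 3)` and the three hypothetical servers, round-robin sends the four chunks to `srv-a`, `srv-b`, `srv-c`, `srv-a`. Single-server concentrates all four on `srv-a`, which is the easiest pattern to notice in per-destination volumes.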

Data Exfiltration Techniques:

  • DNS Tunnelling: DNS tunnelling is a type of cyber-attack that encodes command and control (C&C) messages or small amounts of data in otherwise ordinary DNS queries and responses. DNS tunnelling requires the compromised system to reach an internal DNS server that in turn has external network access. That server then relays the queries to a name server controlled by the attacker.
  • Social Engineering and Phishing Attacks: Malicious outsiders use phishing emails to distribute malware and exfiltrate data. Using social engineering techniques, attackers make their emails look legitimate and appear to come from trusted senders. Users are then more likely to click a link or download an attachment that exposes the organization to a malicious tool or malware.
  • Data exfiltration with ICMP: In ICMP exfiltration, attackers transmit data inside ICMP packets. To the organization's security system these look like valid ICMP packets: their structure is well formed and the data is carried in a legitimate part of the packet rather than being hidden. As a result, inspecting an individual packet gives no warning of exfiltration; it looks like normal traffic.
  • Unauthorized data upload to public cloud storage servers: Malware can exfiltrate data by uploading it to benign-looking public cloud storage servers. If the network lacks adequate application controls and allows uploads to public cloud storage services such as Google Drive, sensitive data can easily be exfiltrated out of the organization.
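DNS tunnelling often leaves a trace in the query names themselves, because encoded data tends to produce long, random-looking subdomains. The following is a minimal Python sketch of two common heuristics: subdomain length and Shannon entropy. The thresholds (`max_subdomain_len=50`, `min_entropy=3.5`) are hypothetical and would need tuning against real traffic. The sketch also naively assumes the last two labels form the registered domain, which is wrong for suffixes such as `co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_dns_tunnel(qname: str, max_subdomain_len: int = 50,
                          min_entropy: float = 3.5) -> bool:
    """Flag a DNS query name whose subdomain part is unusually long or random-looking."""
    labels = qname.rstrip(".").lower().split(".")
    # Naive assumption: the last two labels are the registered domain.
    subdomain = "".join(labels[:-2])
    if not subdomain:
        return False
    return len(subdomain) > max_subdomain_len or shannon_entropy(subdomain) > min_entropy
```

Legitimate services (CDNs, anti-virus lookups) also generate high-entropy names. In practice such a check is usually one signal among several, alongside query volume per domain, rather than a verdict on its own.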

Detection via Traditional Methods:

To exfiltrate data, attackers typically use a Command & Control (C&C) channel. A C&C channel resembles a client-server architecture and allows threat actors to communicate remotely with compromised hosts. Because channels of this kind also have legitimate uses, such traffic cannot be blocked simply because it could be used maliciously.

The signature-based data exfiltration detection method detects malicious C&C channels by looking for known signature patterns. Developers create signatures from known malware samples, and new network traffic is compared against these signatures. If a signature matches, the traffic is classified as C&C traffic. This approach, however, is not foolproof. Signatures and policies can only describe behaviour that has already been observed, so they cannot be written in advance for new, anomalous data-exfiltration events. Because current attack detection solutions rely primarily on signature detection and policy violation detection models, they have limited ability to detect novel data exfiltration.
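A minimal Python sketch of the signature-matching idea follows, assuming signatures are byte-level regular expressions applied to reassembled payloads. Both signature names and patterns are hypothetical illustrations, not rules from any real malware family or vendor.

```python
import re

# Hypothetical signature set: names and patterns are illustrative only.
SIGNATURES = {
    "example-beacon-uri": re.compile(rb"GET /[0-9a-f]{16}/beacon HTTP/1\.[01]"),
    "example-exfil-marker": re.compile(rb"X-Exfil-Chunk: \d+"),
}


def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all known signatures that match the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


def classify(payload: bytes) -> str:
    """Label traffic as C&C only when at least one known signature matches."""
    return "C&C" if match_signatures(payload) else "no-match"
```

A request such as `b"POST /upload HTTP/1.1\r\n..."` is classified as `"no-match"` even if it carries stolen data. This is the limitation described above: anything that does not resemble a previously written signature passes unnoticed.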

Detection through Anomaly Detection:

Traffic Size:

During exfiltration, data moves from a compromised host either to another host inside the network or to a destination outside it. In both cases, large amounts of data are transferred from the infected host. On most hosts, incoming data exceeds outgoing data. However, normal transfer volumes depend on the host's role. Desktop hosts, for example, usually send relatively little data outside the network, whereas servers such as web or FTP servers routinely do. Thus, by monitoring firewall logs, an anomaly can be raised when a host's outbound traffic spikes well above its usual level.
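The firewall-log approach can be sketched in Python as follows, assuming log entries have already been parsed into hypothetical `(src_ip, dst_ip, bytes_sent)` tuples. The sketch sums outbound bytes per internal host and compares each host against its own history. Each host is compared with itself because, as noted, normal volumes differ between desktops and servers. The threshold `k=3.0` and the minimum history length are illustrative assumptions.

```python
import ipaddress
from collections import defaultdict
from statistics import mean, stdev


def outbound_bytes_per_host(records):
    """Sum bytes sent from internal (private) hosts to external destinations.

    `records` is an iterable of (src_ip, dst_ip, bytes_sent) tuples, a
    hypothetical, simplified stand-in for parsed firewall log entries.
    """
    totals = defaultdict(int)
    for src, dst, sent in records:
        if ipaddress.ip_address(src).is_private and not ipaddress.ip_address(dst).is_private:
            totals[src] += sent
    return dict(totals)


def is_outbound_spike(history, current, k=3.0, min_history=5):
    """Flag `current` if it exceeds the host's own mean by more than k standard deviations."""
    if len(history) < min_history:
        return False  # not enough baseline yet to judge
    mu = mean(history)
    sigma = max(stdev(history), 1.0)  # floor avoids division by zero for a flat history
    return (current - mu) / sigma > k
```

For example, a host whose hourly outbound totals have been around 100 units (`[100, 110, 90, 105, 95]`) would be flagged for a sudden 1000, but not for 110.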

Threat Intelligence:

Firewall log fields, such as destination IP addresses, can also be combined with threat intelligence feeds to detect connections to known malicious IPs/domains. Some freely available intelligence and enrichment sources are described below:

  • MaxMind ASN: This service provides the Autonomous System (AS) information for the IP addresses we need. MaxMind offers it as a database that can be downloaded and queried offline very quickly. This lets us obtain the name of the organization to which the IP addresses in our events belong.
  • MaxMind GeoIP: This IP geolocation database maps an IP address to its location with reasonable reliability. It can help track data exfiltration to a particular location. Organizations can maintain a list of countries with which sensitive data may be exchanged, and an anomaly can be generated if data is transferred to a location outside this list.
  • AbuseIPDB: AbuseIPDB is a well-known IP address reputation service built on community abuse reports. It can help detect suspicious traffic directed to known malicious IPs controlled by attackers.
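Below is a minimal Python sketch of how these sources might be combined for a single outbound connection. It assumes a country allow-list and a local set of known-bad IPs (for example, exported from a reputation feed). The `country_of` lookup is passed in as a function so the logic can be shown without a database file. All function and parameter names here are hypothetical.

```python
from typing import Callable, Optional


def flag_destination(dst_ip: str,
                     country_of: Callable[[str], Optional[str]],
                     allowed_countries: set[str],
                     known_bad_ips: set[str]) -> list[str]:
    """Return the reasons (if any) why a connection to dst_ip looks anomalous."""
    reasons = []
    # Reputation check against a locally cached set of known-malicious IPs.
    if dst_ip in known_bad_ips:
        reasons.append("known-malicious-ip")
    # Geolocation check against the organization's allow-list of countries.
    country = country_of(dst_ip)
    if country is None:
        reasons.append("unknown-country")
    elif country not in allowed_countries:
        reasons.append(f"country-not-allowed:{country}")
    return reasons
```

In practice, `country_of` could wrap the `geoip2` package's `Reader.country(ip).country.iso_code` and catch `geoip2.errors.AddressNotFoundError` to return `None`. The ASN lookup (`Reader.asn(ip)`) could add the owning organization to the alert in the same way.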

Machine Learning Approach:

At HawkEye, we use trend-based, signature-based and, most effectively, machine-learning-based anomaly detection to detect and respond to data exfiltration threats.
Data exfiltration frequently relies on zero-day threats, and zero-day attacks are hard to detect with the heuristic detection methods of a SIEM solution.
A machine learning model detects what deviates from the learned norm, without depending on a predefined signature, and corrects itself with experience. This is where machine learning adds value compared to traditional detection methods.

We developed an ML model that learns users' normal behaviour patterns and raises an alert for any behaviour that deviates from them. To train the model, we used the organization's metadata reflecting normal behaviour, and the model learns that pattern. This kind of learning is unsupervised: it is very hard to label activity as legitimate or illegitimate, which supervised learning would require. In unsupervised learning, the model builds a mathematical representation from the relationships among the data features themselves. After continuous model training, we were able to deploy a successful model with very good accuracy.
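The model's internals are not described here, so the following is only a minimal stand-in, not HawkEye's model. It is a simple unsupervised per-feature z-score baseline in Python. It is fit on unlabeled activity assumed to be normal and flags a new observation when any feature lies more than `threshold` standard deviations from the learned mean. The feature names (`mb_out`, `login_hour`, `distinct_dests`) are hypothetical.

```python
from statistics import mean, pstdev


class ZScoreBaseline:
    """Unsupervised baseline: learns per-feature mean and spread from unlabeled data."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold
        self.stats: dict[str, tuple[float, float]] = {}

    def fit(self, rows: list[dict[str, float]]) -> "ZScoreBaseline":
        if not rows:
            raise ValueError("need at least one row of normal activity")
        for feature in rows[0]:
            values = [row[feature] for row in rows]
            # Tiny floor on the spread so a constant feature does not divide by zero.
            self.stats[feature] = (mean(values), max(pstdev(values), 1e-9))
        return self

    def score(self, row: dict[str, float]) -> float:
        """Largest absolute z-score across all learned features."""
        return max(abs(row[f] - mu) / sd for f, (mu, sd) in self.stats.items())

    def is_anomalous(self, row: dict[str, float]) -> bool:
        return self.score(row) > self.threshold
```

Refitting periodically on recent activity is one simple way to approximate a model that "corrects itself with experience". A production system would typically use richer models and features than independent per-feature statistics.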

We welcome you to contact us for more information about HAWKEYE - SOC As A Service.