Analyzing MobileHunter
Background
Investigative journalists from NDR, Süddeutsche Zeitung (SZ), and several other international media teams obtained a surveillance app that Chinese border officials use to scan the mobile phones of people entering the country from Kyrgyzstan. This sparked a collaboration between my advisor Prof. Thorsten Holz, who leads the Chair for Systems Security at Ruhr-Universität Bochum, and myself to unveil the inner workings of the app and figure out what exactly it searches for.
This post aims to shed some more light on our results.
Note: A few days prior to this publication it has come to our attention that the penetration testers at Cure53 were independently tasked with analyzing this very application. As they have a track record of excellent work, please make sure to check out their report as well.
App Structure
We were given the app’s installer for Android, which came in the form of a regular APK file.
The app itself has an interesting structure – in addition to the regular, to-be-expected Java layer, it comes with a set of binary assets, listed in the following:
File Name | Description |
---|---|
bk_samples.bin | Encrypted file |
gen_wifi_cj_flag[_pie] | ELF executable |
getVirAccount | ELF executable |
id.conf | Text file containing regular expressions |
terrorism_apps.csv | Empty file (for our APK) |
wifiscan[_pie] | ELF executable |
One thing worth noting is that it ships both PIE and non-PIE versions of its executables. PIE stands for position-independent executable and has been enforced on Android since version 5.0 (Lollipop) in late 2014. By shipping non-PIE binaries as well, the authors maintain compatibility with older versions of Android that do not support PIE yet (namely, any version prior to Android 4.1).
As we shall see later, the binaries are one of the more interesting components of this app. We will revisit these in a bit, but let’s first get a bird’s-eye view on how the app operates.
Dynamic Analysis
For our initial analysis, we used a Huawei P10 running Android 7.0 and installed the APK. The phone contains a few contacts and only comes with the pre-installed apps.
In order to prevent any outbound connections, we first ran the app in a shielding box and activated airplane mode. We were greeted with a warning screen.
The app evidently expects an active connection to a particular WiFi hotspot. Dimly in the background of that screen, one can make out that it wants to connect to a local IP in the 192.168.43.* subnet. Later analysis revealed that it attempts to connect to 192.168.43.1 on port 8080. (In more detail: it obtains the local IP and sets the last octet to .1, but explicitly checks for the 192.168.43.* subnet when deciding whether to issue this warning.)
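The address handling described above can be sketched in a few lines of Java. This is only an illustration of the described behavior, not the app's decompiled code: the class and method names are made up for this post, and the example address in main is an assumption.

```java
class HotspotAddress {
    // Subnet prefix the app checks for before issuing its warning.
    static final String EXPECTED_PREFIX = "192.168.43.";
    // Port the app connects to on the hotspot.
    static final int PORT = 8080;

    // Keep the first three octets of the local IPv4 address and set the last one to 1.
    static String deriveServer(String localIp) {
        return localIp.substring(0, localIp.lastIndexOf('.') + 1) + "1";
    }

    // True if the local address lies in the subnet the app expects.
    static boolean inExpectedSubnet(String localIp) {
        return localIp.startsWith(EXPECTED_PREFIX);
    }

    public static void main(String[] args) {
        // Example address only (assumption); pass a different one as first argument.
        String localIp = args.length > 0 ? args[0] : "192.168.43.57";
        System.out.println(deriveServer(localIp) + ":" + PORT
                + (inExpectedSubnet(localIp) ? "" : " (warning: unexpected subnet)"));
    }
}
```

Note that the derived address and the subnet check are independent of each other, which matches the observation that the warning depends on the subnet rather than on the derived address.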
Luckily, the app does not seem to connect to any external servers, which eased our analysis.
Checking the Phone
Conceptually, the app is rather simple: one button starts the check of the phone and another one uninstalls the app after it has performed its task. In part, this very simple setup might be an attempt to ease the work of the border officials: given an unlocked phone, all they need to do is install the APK, connect to their WiFi hotspot, and start scanning the phone with the press of a button.
Once the scan has completed, the app tries to upload a report to the local WiFi hotspot. The main screen now gives additional details, such as the number of files scanned, and lets us know that no suspicious files were “hitted” [sic]. This is no surprise since our phone is virtually empty. As we are in airplane mode, the app complains that it cannot upload the report and offers to try again. At this point, we set up a WiFi hotspot listening on the above-mentioned address and captured the resulting report, which comes in the form of a regular ZIP file.
Initial Report
Even without any notable content on our phone, the report already contains a plethora of information:
File Name | Description |
---|---|
app_list | List of installed applications |
AppParse.prop | Hardware information (model, CPU, board, hardware, and device) |
Calendar.xml | List of calendar entries |
Contact.xml | List of contacts |
contact{n}.jpg | Contact pictures, sequentially numbered |
Dialing.xml | List of phone calls (esp. name, number, time, and duration) |
Messages.xml | List of SMS messages |
phone.txt | Empty, as our phone does not have a SIM equipped |
PhoneData.cha | General information about the device and the scan itself |
report.html | Formatted HTML report with a subset of the data above |
In the following, we will discuss two more interesting entries, app_list and PhoneData.cha.
Contents of app_list
This file is created with the help of Android’s PackageManager API and lists for each installed application the app name, package name, version, code size, time of installation, path, and MD5 hash. The analysis is thorough enough to even include itself:
蜂采 com.fiberhome.wifiserver installed 1.0 4041059 1561643875 /data/app/com.fiberhome.wifiserver-1/base.apk 1 null 8ddb342f2da5408402d7568af21e29f9 null
Contents of PhoneData.cha and report.html
Amongst other information, the PhoneData.cha file contains the phone’s manufacturer, model, Android version, WiFi and Bluetooth MAC addresses, IMEI, and (if a SIM card is present) its IMSI. The app also performs a rudimentary root detection by checking for the presence of either /system/bin/su or /system/xbin/su.
Some of this information, along with the messages, contacts, and dialing logs, is duplicated in report.html.
Interestingly, PhoneData.cha also contains an entry labelled DeviceName with the value MobileHunter. In contrast, the rendered report.html is captioned CellHunter Reporter. We can assume that either of these strings indicates the original application’s name.
Static Analysis
Even though the initial report already contains a disturbingly detailed compilation of sensitive information on our phone, there is more to the story.
In order to verify the information in the report, we decompiled the Java layer of the app. Fortunately, no attempts were made to obfuscate the code or hinder analysis in any other way. For the most part, the code is straightforward: it uses well-known Android APIs to collect the information provided in the report. At this point, however, we were more interested in how the app uses the binary assets we discovered early on.
Helper Binaries
Soon, we came across the following piece of code:
    String string2 = WelcomeActivity.this.getResources().getString(2131165184);
    if (string2.contains("true")) {
        if (Build.VERSION.SDK_INT >= 16) {
            ShellCommands.doSuCmds("sh", Global.absolutefilesPath_ + "/wifiscan_pie sm " + WelcomeActivity.this.sdP + " 2>" + Global.absolutefilesPath_ + "/error_file 1>" + Global.esnPath_ + "scandir_temp");
        } else {
            ShellCommands.doSuCmds("sh", Global.absolutefilesPath_ + "/wifiscan sm " + WelcomeActivity.this.sdP + " 2>" + Global.absolutefilesPath_ + "/error_file 1>" + Global.esnPath_ + "scandir_temp");
        }
    }
    if (!"true/false".equals(string2)) {
        ShellCommands.doSuCmds("sh", Global.absolutefilesPath_ + "/getVirAccount " + Global.absolutefilesPath_ + "/id.conf " + Global.esnPath_ + "app_account");
    }
Based on configuration values found in its resources, the app spawns two of the binary files found in its data directory: wifiscan and getVirAccount. Notably, for the wifiscan binary, it chooses between the PIE and non-PIE variants based on Android’s SDK level, which serves as an indicator as to whether PIE is supported by the installed Android version.
The choice to directly invoke helper programs instead of using the Java Native Interface (JNI) strikes us as odd. Further, it remains unclear why native binaries are used at all; we can only suspect that this is due to performance concerns or the hope of obscuring the logic a bit more, as native code is somewhat harder to analyze than plain decompiled Java code.
During startup, the app looks up several important locations which are used when invoking the helper binaries:
- this.sdP is initialized to all known SD card paths (i.e., EXTERNAL_STORAGE, followed by a space and the contents of SECONDARY_STORAGE, if any).
- Global.absolutefilesPath_, on the other hand, is initialized to the app’s data directory holding its assets, /data/data/com.fiberhome.wifiserver/.
- Global.esnPath_ points to the path where the report is stored.
In summary, wifiscan is passed the parameter sm as well as all known SD card paths. Its output is redirected to a file called scandir_temp, which will ultimately be added to the report. The app invokes getVirAccount by passing the path to its configuration, id.conf, and an output path to a file named app_account that is also included in the final report.
getVirAccount
getVirAccount is a stripped 32-bit ELF executable for ARM (EABI5). Although the industry standard disassembler, IDA Pro, recognizes a bit more than 1,200 functions, the binary itself isn’t too complex. Most complexity stems from the fact that it is a C++ executable and makes heavy use of the STL.
Interestingly enough, none of the associated binary files support any other architecture than ARM. Arguably, the vast majority of Android devices ship with an ARM processor nowadays, but this app might still miss some of the more obscure devices.
The binary spends the majority of its time parsing its configuration file, id.conf. As it turns out, this file is rather self-explanatory and the binary more or less does exactly what one would expect it to do:
#包名\t路径名\t获取方式
#获取方式DIR FILE FILE_CONTENT
com.tencent.mobileqq tencent/MobileQQ/ DIR (^[1-9][0-9]+)
com.tencent.mobileqq Tencent/MobileQQ/ DIR (^[1-9][0-9]+)
com.tencent.mobileqq tencent/QWallet/ DIR (^[1-9][0-9]+)
com.tencent.mobileqq Tencent/QWallet/ DIR (^[1-9][0-9]+)
com.renren.mobile.android Android/data/com.renren.mobile.android/cache/talk_log/ FILE talk_log_([0-9]+)_.*
com.duowan.mobile yymobile/logs/sdklog/ FILE_CONTENT logs-yypush_.*txt safeParseInt ([0-9]*)
com.immomo.momo immomo/users/ DIR (^[1-9][0-9]+)
cn.com.fetion Fetion/Fetion/ DIR (^[1-9][0-9]+)
com.alibaba.android.babylon Android/data/com.alibaba.android.babylon/cache/dataCache/ FILE (^[1-9][0-9]+)
#"phone":"18551411***"
com.sdu.didi.psnger Android/data/com.sdu.didi.psnger/files/omega FILE_CONTENT e.cache "phone":"([0-9]*)"
#aaaa
com.sankuai.meituan Android/data/com.sankuai.meituan/files/elephent/im/ DIR (^[1-9][0-9]+)
com.sogou.map.android.maps Android/data/com.sogou.map.android.maps/cache/ FILE_CONTENT cache "a":"([^"]*)"
#com.sina.weibo loginname=red***@163.com&
com.sina.weibo sina/weibo/weibolog/ FILE_CONTENT sinalog.*txt loginname=([^&]*)&
Lines starting with # indicate a comment. All other lines fall into one of the following categories:
Type | Description |
---|---|
DIR | Extract the name of the directory inside the given path. |
FILE | Extract the name of the file inside the given path. |
FILE_CONTENT | Extract the contents of a specific file. |
Each line provides the package name of the app in question, the path to its associated data (and, for FILE_CONTENT, a file name), as well as a regular expression that extracts a part of the name or content. Each match yields a line in the report file app_account, containing the package name followed by the extracted identifier.
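To make the rule format more tangible, here is a minimal Java sketch of how such a line could be interpreted. It is not a reconstruction of the native binary: it assumes the fields are tab-separated (as the header comment of id.conf suggests), that FILE_CONTENT rules carry one extra field for the file name, and that the first capture group of the regular expression holds the identifier. The class AccountRule and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccountRule {
    final String packageName;
    final String path;
    final String type;       // DIR, FILE, or FILE_CONTENT
    final String fileName;   // only present for FILE_CONTENT rules, otherwise null
    final Pattern extractor; // regular expression whose first group is the identifier

    AccountRule(String packageName, String path, String type, String fileName, Pattern extractor) {
        this.packageName = packageName;
        this.path = path;
        this.type = type;
        this.fileName = fileName;
        this.extractor = extractor;
    }

    // Parses one configuration line; returns null for comments and blank lines.
    // Assumption: fields are separated by tabs.
    static AccountRule parse(String line) {
        if (line.trim().isEmpty() || line.startsWith("#")) {
            return null;
        }
        String[] fields = line.split("\t");
        boolean hasFileName = fields[2].equals("FILE_CONTENT");
        String fileName = hasFileName ? fields[3] : null;
        String regex = hasFileName ? fields[4] : fields[3];
        return new AccountRule(fields[0], fields[1], fields[2], fileName, Pattern.compile(regex));
    }

    // Applies the rule to candidate strings: directory names for DIR,
    // file names for FILE, or lines of file content for FILE_CONTENT.
    List<String> extract(List<String> candidates) {
        List<String> identifiers = new ArrayList<>();
        for (String candidate : candidates) {
            Matcher m = extractor.matcher(candidate);
            if (m.find()) {
                identifiers.add(m.group(1));
            }
        }
        return identifiers;
    }
}
```

For a DIR rule such as the one for com.tencent.mobileqq, the candidates would be the names of the directories below tencent/MobileQQ/, and every purely numeric name not starting with 0 would be reported as an account identifier.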
Overall, this mechanism is used to extract account identifiers of various popular Chinese social media apps. Notably, no passwords, session tokens, or other kinds of information are extracted that would allow logging into the account. We can only speculate that account identifiers may be correlated to potentially suspicious account activity by another mechanism external to this app.
Finally, the omission of Tencent’s popular messaging app WeChat is interesting; especially since other Tencent apps are already included in the configuration.
wifiscan
The wifiscan PIE and non-PIE binaries both are, just as getVirAccount, 32-bit ELF executables for ARM EABI5. They are written in C++ as well and make use of the STL. As parameters, they accept a set of modes as well as paths that are to be scanned.
The first thing that caught our attention is that the binary supports more modes than the two that are hardcoded on the Java side (s and m; presumably scan and match). In the following, we describe the binary’s behavior for this configuration and get back to the other modes shortly.
bk_samples.bin
After verifying its command line arguments, the binary soon starts reading in the binary file bk_samples.bin, which we discovered earlier in the assets directory of the app. As evident from the code, as well as from debug outputs that were helpfully left in the binary, the file is encrypted and needs to be decrypted first. The encryption scheme is a fairly standard AES implementation with some minor adjustments, such as obscuring the static symmetric key, presumably to make extracting it a bit harder.
What is interesting about the symmetric key is that it is twice as large as actually used by the AES implementation. Since the key is an ASCII representation of hexadecimal digits, we are led to believe that the authors originally meant to convert the key into a binary representation instead of simply throwing away the second half. This is a common mistake when handling different representations of binary data.
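The difference between the two ways of handling such a key can be illustrated as follows. The sketch assumes a 16-byte key and uses an obvious placeholder string; the actual key from the binary is deliberately not reproduced, and the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyEncoding {
    // What the binary appears to do: use the ASCII characters as key bytes
    // and keep only as many as the cipher needs.
    static byte[] truncateAscii(String hexKey, int keyBytes) {
        return Arrays.copyOf(hexKey.getBytes(StandardCharsets.US_ASCII), keyBytes);
    }

    // What was presumably intended: decode each pair of hex digits into one byte.
    static byte[] decodeHex(String hexKey) {
        byte[] out = new byte[hexKey.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hexKey.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder key (assumption), NOT the key found in the binary.
        String placeholder = "000102030405060708090a0b0c0d0e0f";
        System.out.println(Arrays.toString(truncateAscii(placeholder, 16)));
        System.out.println(Arrays.toString(decodeHex(placeholder)));
    }
}
```

Both variants yield 16 bytes, but the truncated one ignores the second half of the string, and each of its bytes can only be one of the 16 ASCII codes of a hexadecimal digit.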
The decrypted bk_samples.bin starts as follows:
135510055
E624931E72EB7D0736B8E43BE9BBA4B6
8765440
3A78017C9F0B948EE8B99F7CD9D0A359
868352
16FB644579B95CB73B80C75C381D14AC
2029879
790F89DDD4C74C5C97F59BB32C5E64F3
5210112
B229B6C4DDB12C59E3D2F061179A1B4B
59172363
12FEBEDF9B5F31469629244DC3444F96
...
Further analysis makes it obvious that this is a database of file sizes along with the expected MD5 hash of the file. Every two lines form a database entry, a tuple (size, hash). It is used to somewhat uniquely describe a file with a size of size bytes and the MD5 checksum hash.
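Reading this format is straightforward. The following Java sketch parses the alternating lines of the decrypted file into a map from file size to the set of expected hashes; the class name SampleDatabase is hypothetical, and indexing by size is our choice for illustration rather than a statement about the binary’s internal data structure.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SampleDatabase {
    // Parses alternating "size" / "MD5" lines into a map from file size
    // to the set of MD5 hashes expected for files of that size.
    static Map<Long, Set<String>> parse(List<String> lines) {
        Map<Long, Set<String>> bySize = new HashMap<>();
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            long size = Long.parseLong(lines.get(i).trim());
            String md5 = lines.get(i + 1).trim().toUpperCase(Locale.ROOT);
            bySize.computeIfAbsent(size, k -> new HashSet<>()).add(md5);
        }
        return bySize;
    }
}
```

Several entries may share the same size, which is why each size maps to a set of hashes instead of a single value.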
Overall, this database consists of no less than 73,315 entries. (Side note: The authors could have reduced its file size tremendously by storing it as binary data instead of plain ASCII.)
Matching Process
The database is then used as follows: for every path that was passed to the wifiscan binary, a recursive directory traversal is performed. If a visited file has a size that is present in the database, its MD5 checksum is computed. If the checksum matches the MD5 checksum for that particular entry, the file is considered a hit. Checking the file size before actually computing the MD5 speeds up the process considerably.
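The size-then-hash idea can be sketched in Java as follows. This is an illustration of the described technique, not a port of the native code: the class SizeThenHashScanner is hypothetical, the database is assumed to be given as a map from file size to expected uppercase MD5 strings, and whole files are read into memory for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

class SizeThenHashScanner {
    // Uppercase hexadecimal MD5 of a file's contents.
    static String md5Hex(Path file) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Recursively walks root and returns all files whose (size, MD5) pair is in the database.
    static List<Path> scan(Path root, Map<Long, Set<String>> database) {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(file -> {
                try {
                    // Cheap check first: only hash files whose size occurs in the database.
                    Set<String> candidates = database.get(Files.size(file));
                    if (candidates != null && candidates.contains(md5Hex(file))) {
                        hits.add(file);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }
}
```

Since looking up a file size is far cheaper than hashing the file, most files on a typical phone are presumably skipped without ever being read.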
For every hit, metadata is printed that is later stored in the report. Also, the main Java code issues a beep tone to indicate to the border officer that a suspicious file has been observed, and the number of matches is counted in the UI.
Metadata printed for each match includes the following:
- the number of the match,
- its file name,
- its size in bytes,
- its full path,
- its MD5 checksum,
- its extension,
- its last modification time, and
- its last access time.
The output for a match might look like the following:
3 1fa261535eb0a3ad53ab499c93a40092f919db25374d081e1aa22a703df48a50.pdf 5460831 /storage/emulated/0/1fa261535eb0a3ad53ab499c93a40092f919db25374d081e1aa22a703df48a50.pdf B9AA0AB31F184EE23A336B4B3B804835 pdf 1561639203 1561639202
With this information, the border officer can pinpoint the suspicious file on the user’s phone, including when it was last modified and accessed.
Deletion of pe*.apk
During the directory traversal, if the binary ever encounters a file whose file name starts with pe and ends with .apk, the file is deleted.
A regular user is somewhat unlikely to have APKs stored on the SD card. This leads us to believe that this could be some sort of cleanup mechanism by the app: assuming the installer for this very app was named according to the above-mentioned scheme, a simple run of the app would suffice to delete its own installer that was previously downloaded to the SD card. When subsequently pressing the Uninstall button, nearly no trace of the app would be left on the user’s phone.
cjlog.txt
Having finished scanning the phone’s files, the app writes an obfuscated log file to a folder called Android on the SD card. It records whether any suspicious files were hit as well as the timestamp of the last scan. Even after uninstalling the app, this file remains on the phone.
To prevent exposing this information directly, the file is obfuscated by generating randomness via lrand48 (seeded by the infamous time(0), a weak source of entropy), hashing it using MD5, and finally XOR-combining it with the actual data. The randomness is stored alongside the obfuscated bytes.
While this scheme can be easily reversed, there actually is no need to do so: the app ships a dedicated program called gen_wifi_cj_flag that performs this task. It generates a file called cjlog_plain.txt.
Unused Functionality
The wifiscan binary contains more functionality than is used by the outer Java layer, the reason for which we can only speculate about. For one, it has additional code that handles entries in bk_samples.bin with a size of above one MiB (this holds for 47,221 entries in total; for mode D). We did not look further into this as it was not used in our configuration.
Additionally, the binary provides even more scanning modes, namely p (pictures), v (videos), and d (documents). Each mode is assigned a hardcoded list of file extensions:
Type | Extensions |
---|---|
Pictures | .tiff , .tif , .png , .jpg , .bmp , .jpeg , .cr2 , .gif |
Videos | .3gp , .aac , .amr , .flac , .m4a , .asf , .wmv , .avi , .flv , .f4v , .f4a , .f4b , .f4p , .riff , .mkv , .mk3d , .mka , .mks , .mov , .qt , .mpeg , .mpg , .m2p , .ps , .mp4 , .m4a , .m4p , .m4b , .m4r , .m4vi , .ogg , .ogv , .oga , .ogx , .spx , .opus , .rmvb , .rm , .dvd , .mts , .swf , .mp3 , .m4v , .wav |
Documents | .txt , .doc , .docx , .pdf , .ppt , .pptx , .xls , .xlsx , .zip , .rar , .xml , .apk |
During scanning, these files are not matched against the database, but their name is simply printed if any of the extensions match.
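A minimal Java sketch of such an extension-based filter is shown below. The picture and document lists are taken from the table above, while the video list is abbreviated to a handful of entries for brevity. Treating extensions case-insensitively is an assumption, and the class name ExtensionModes is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ExtensionModes {
    static final Map<Character, Set<String>> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put('p', new HashSet<>(Arrays.asList(
                ".tiff", ".tif", ".png", ".jpg", ".bmp", ".jpeg", ".cr2", ".gif")));
        // Abbreviated: the binary's video list is considerably longer.
        EXTENSIONS.put('v', new HashSet<>(Arrays.asList(
                ".3gp", ".avi", ".mkv", ".mov", ".mp4", ".wmv")));
        EXTENSIONS.put('d', new HashSet<>(Arrays.asList(
                ".txt", ".doc", ".docx", ".pdf", ".ppt", ".pptx",
                ".xls", ".xlsx", ".zip", ".rar", ".xml", ".apk")));
    }

    // True if the file name carries one of the extensions assigned to the mode.
    // Assumption: the comparison ignores case.
    static boolean matches(char mode, String fileName) {
        Set<String> extensions = EXTENSIONS.get(mode);
        int dot = fileName.lastIndexOf('.');
        if (extensions == null || dot < 0) {
            return false;
        }
        return extensions.contains(fileName.substring(dot).toLowerCase(Locale.ROOT));
    }
}
```

In contrast to the s and m modes, no hashing is involved here: a file would be listed purely based on its name.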
Recovering Entries
Faced with a list of opaque MD5 hashes, we were wondering what content the app actually searched for. The smallest file that is referenced in the database has a size of 31 bytes (with the MD5 hash C0B8B4D706388E31C453B993015DF521), which is still way beyond what we can hope to brute-force in reasonable time. We started compiling some interesting word lists, but these didn’t help either. Fortunately for us, all hashes in the database are regular MD5 hashes, i.e., the algorithm uses default initialization values. Hence, we can query external databases of known MD5 hashes, the most interesting being VirusTotal. Although its primary focus is on malicious files, the sheer amount of data uploaded to it makes it one of the more promising collections of files and their accompanying checksums.
Together with the team of investigative journalists we queried VirusTotal. In the end, we managed to identify more than 1,400 files. While this might sound like a lot of files, this still only accounts for roughly 1.9% of the database. The investigative team, together with colleagues from The Guardian and The New York Times, then analyzed and categorized the content we unveiled in more detail.
Quickly, it became apparent that much material is related to Islamist propaganda. This is no surprise, as we initially also discovered an asset aptly named terrorism_apps.csv. The file, however, was empty in the version of the app we analyzed. Still, their efforts also uncovered the presence of a document about the Dalai Lama as well as a file containing rock music of a Japanese band.
We’d like to refer the interested reader to the publications of the respective investigative teams which will discuss their findings in more detail.
Conclusion
Although the app is conceptually rather simple, it collects a plethora of personal data. While we were able to get a glimpse into which data the app collects and, moreover, which files it searches for, the majority of the files it looks for remains unknown at this point.
Related Links
- Report by Cure53
- Link to the APK (via Joseph Cox)
- Article by Süddeutsche Zeitung (German)
- Article by tagesschau (German)
- Article by Motherboard
- Article by New York Times
- Article by The Guardian
- Press release by Ruhr-Universität Bochum
- Press release by Horst-Görtz Institute for IT Security
- Analysis of related MFSocket app by @fs0c131y
Acknowledgements
We would like to thank David Rupprecht for aiding with the hardware setup.
Revision History
Date | Description |
---|---|
2019-07-04 | Clarified IP handling, added links |
2019-07-03 | Updated affiliation, added links |
2019-07-02 | Initial version |