Harnessing Language Models for Detection of Evasive Malicious Email Attachments

The HP Q3 2023 Threat Report [2] highlights that 80% of malware is delivered via email, with 12% bypassing detection technologies to reach endpoints. The 2023 Verizon Data Breach Investigations Report also indicates that 35% of ransomware infections originated from email. Two primary factors contribute to evasion. The first is the volume and cost of sandbox scanning, which lead to selective scanning and inadvertent bypasses. The second is the limitation of detection technologies such as signature-based methods, sandboxing [1], and machine learning, all of which rely on the final malicious payload for decision-making. Evasive multi-stage malware and phishing URLs, however, often lack a malicious payload at the time these technologies analyze them. Additionally, generative AI tools like FraudGPT and WormGPT facilitate the creation of new malicious payloads and phishing pages, further enabling malware to evade defenses and reach endpoints.

To address the challenge of detecting evasive malware and malicious URLs without requiring the final malicious payload, we will share the detailed design of a Neural Analysis and Correlation Engine (NACE). NACE detects malicious attachments by understanding the semantics of the email and using them as features, instead of relying on the final malicious payload for its decision-making. It takes a layered approach, employing supervised and unsupervised AI models built on transformer-based architectures to derive the deeper meaning embedded within the email's subject, body, and attachment text.

We will first dive into the details of the semantics commonly used by threat actors to deliver malicious attachments, which lays the foundation of our approach. These details were derived from the analysis of a dataset of malicious emails. The text from the body of the email was extracted to create embeddings. UMAP aided in dimensionality reduction, and clusters were generated based on their density in the high-dimensional embedding space. These clusters represent different types of semantics employed by threat actors to deliver malicious attachments. 
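The density-based clustering step can be illustrated with a minimal sketch. It assumes the embeddings have already been reduced to two dimensions (the role UMAP plays above) and uses a small hand-written DBSCAN as a stand-in, since the abstract does not name the clustering algorithm. The toy points, the `eps` and `min_pts` values, and the lure names in the comments are all hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical 2-D points standing in for dimension-reduced email embeddings.
reduced = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # e.g. an "invoice" style lure
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # e.g. a "voicemail" style lure
    (10.0, 0.0),                                     # isolated email, treated as noise
]
labels = dbscan(reduced, eps=0.3, min_pts=3)
print(labels)
```

Each resulting cluster label would correspond to one family of lure semantics, and points labelled -1 are emails that fit no dense group.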

In the presentation, we will share the details of our approach, in which every incoming email undergoes zero-shot semantic analysis and similarity analysis using an LLM to determine whether it contains semantics typically used by threat actors to deliver malicious attachments. The email's body is further analyzed for secondary semantics, including tone, sentiment, and other nuanced elements. Once the semantics are identified, hierarchical topic modeling and phrase topic modeling are applied to uncover the relationships between the various topics.
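The similarity-analysis idea can be sketched as comparing an incoming email against exemplars of known lure semantics. This is a toy illustration, not the NACE implementation: the `embed` function is a bag-of-words stand-in for a transformer embedding, and the exemplar texts, label names, and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical exemplars, one per cluster of lure semantics.
LURE_EXEMPLARS = {
    "payment_invoice": "please find the attached invoice for your payment",
    "voicemail_notice": "you have a new voicemail message open the attachment to listen",
}

def best_lure_match(body, threshold=0.5):
    """Return (label, score) for the closest exemplar, or (None, score) below threshold."""
    query = embed(body)
    label, score = max(
        ((name, cosine(query, embed(text))) for name, text in LURE_EXEMPLARS.items()),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else (None, score)

print(best_lure_match("Attached is the invoice for your payment, please review"))
print(best_lure_match("Lunch at noon tomorrow?"))
```

A production system would replace `embed` with model-generated embeddings, which capture paraphrases that word overlap misses.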

Primary and secondary semantics from the email, along with the results of phrase and hierarchical topic modeling, deep file parsing of attachments, and email headers, are sent to the expert system. The contextual relationships between these features are used to derive a malicious or benign verdict for the attachment without needing the malicious payload. Identifying malicious content without depending on the final payload is a crucial capability for any detection technology.
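As a rough illustration of how an expert system might correlate such features, the sketch below combines a primary lure semantic with corroborating signals. The feature names, the rule, and the two-signal threshold are all hypothetical and are not taken from the NACE design.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EmailFeatures:
    # All fields are hypothetical examples of the feature types described above.
    lure_semantic: Optional[str]   # primary semantic, e.g. "payment_invoice"
    urgent_tone: bool              # secondary semantic from tone analysis
    attachment_has_script: bool    # from deep file parsing of the attachment
    reply_to_mismatch: bool        # from email header analysis

def verdict(f: EmailFeatures) -> str:
    """Toy rule: a lure semantic plus at least two corroborating signals."""
    if f.lure_semantic is None:
        return "benign"
    corroborating = sum([f.urgent_tone, f.attachment_has_script, f.reply_to_mismatch])
    return "malicious" if corroborating >= 2 else "benign"

suspicious = EmailFeatures("payment_invoice", True, True, False)
ordinary = EmailFeatures("payment_invoice", False, False, False)
print(verdict(suspicious), verdict(ordinary))
```

Note that no payload is inspected here: the verdict comes only from the relationship between semantics, file structure, and headers.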

Our presentation will show how LLMs can effectively detect evasive malicious attachments without depending on analysis of the malicious payload, which typically occurs in the later stages of attachment analysis. The approach has defended against real-world threats in actual production traffic, including HTML smuggling campaigns, obfuscated SVGs, phishing links behind CDNs, CAPTCHAs, downloaders, and redirectors.

The presentation will conclude with results observed from the production traffic. 

 

References:

[1] Abhishek Singh, Zheng Bu, "Hot Knives Through Butter: Evading File-based Sandboxes," Black Hat 2013. https://media.blackhat.com/us-13/US-13-Singh-Hot-Knives-Through-Butter-Evading-File-based-Sandboxes-WP.pdf

[2] HP, "HP Wolf Security Threat Insights Report Q3 2023," 2023.

 

About the Presenter: Abhishek Singh

Abhishek Singh is the founder and CTO of InceptionCyber.ai. He is a security R&D leader with 15+ years of experience and a proven track record of driving AI and cybersecurity research and engineering that solved complex problems and produced technologies leading to revenue gains at Cisco, FireEye, and Microsoft. He holds 41 patents, has authored 17 research papers and seven technical white papers, and has contributed to three books. His patents and papers detail work on algorithms, generative and predictive AI-based approaches to detecting advanced threats, and the architecture of technologies such as the virtual machine-based approach for threat analysis, EDR, RASP, DAST, Active Defense (Deception), email, web, and IPS.

Many algorithms and preventive features which Abhishek has designed are key concepts used in technologies like RASP and Active Defense (Deception). His notable recognitions include the following: 

  • 2019 Reboot Leadership Award (Innovators Category): SC Media 

  • Shortlisted for Virus Bulletin's 2018 Péter Szőr Award 

  • Cyber Security Professional of the Year - North America (Silver Winner) Cyber Security Excellence Awards 2020 

He holds a double Master of Science in Computer Science and in Information Security from the College of Computing, Georgia Tech, and a B.Tech in Electrical Engineering from the Indian Institute of Technology, IIT-BHU. He has also completed the Master of Engineering Leadership (ELPP++) from UC Berkeley and postgraduate AI and deep learning courses from the Indian Institute of Technology, IIT-Guwahati.

 

About the Presenter: Kalpesh Mantri

Kalpesh Mantri is a seasoned cybersecurity professional with over a decade of experience. In his current role, he leads research projects and develops prototypes, focusing on the investigation of email threats, particularly within the malspam landscape.

Before his tenure at InceptionCyber.ai, Kalpesh served as a Senior Malware Analyst and Security Software Developer. His work concentrated on malware reversing, threat hunting, detection techniques, and Advanced Persistent Threat (APT) attack investigations. He has aided authorities in uncovering critical APT operations, including 'Operation SideCopy' and 'Operation HoneyTrap,' which targeted defense sectors.

Kalpesh is an active member of the cybersecurity community, frequently sharing his expertise at peer-reviewed security conferences such as Virus Bulletin, AVAR, and the CARO Workshop.

He has completed advanced AI and machine learning courses from IIM Kozhikode.
