Research


  • Binary Software Security Retrofitting
  • Malware Offense & Defense
  • Web and Cloud Security
  • Formal Methods Code Validation
  • Cyberdeceptive Software Engineering

Sponsored Projects

  • PRISM: Proof-driven Refinement of Infrastructure Software Models (GaTech/DARPA, 2021–2025): This project is part of DARPA's V-SPELLS program (Verified Security and Performance Enhancement of Large Legacy Software), which seeks to innovate new technologies for incrementally verifying and hardening large legacy software systems. UTD is developing a formal methods system that yields machine-checked mathematical proofs of security and safety for individually hardened software components.
  • GAMEPLAY: Graph Analysis for Mechanized Exploit-generation and Patching Leveraging human Assistance for improved Yield (UIC/DARPA, 2019–2022): This project is part of DARPA's Computers and Humans Exploring Software Security (CHESS) program, which is developing next-generation human-machine collaborative frameworks for software vulnerability discovery, exploitation, and remediation.
  • Binary Retrofitting of Untrusted Software for Security (ONR, 2013–2022): Security-sensitive organizations, such as military agencies, face a difficult challenge when it comes to selecting good software. The most up-to-date, feature-filled, and well-tested software tends to be commercial products whose developers prioritize sales over security. Our research is developing algorithms that can automatically retrofit commercial, binary software with augmented security and reduced attack surfaces defined by product consumers. This offers security-conscious consumers the best of both worlds—they get the rich feature sets that come with mass-produced software, plus the iron-clad, organization-specific security required for mission-critical operations.
  • ARM Binary Software Security Hardening (Lockheed Martin/NSF IUCRC, 2018–2019): Together with Lockheed Martin, our lab is discovering and developing algorithms for retrofitting security-sensitive ARM binary software with extra protections against code-reuse hijacking and zero-day attacks.
Completed Projects
  • Bottom-up Formal Methods Validation of Secure Microkernel Software Components (I-A-I/AFOSR, 2019–2020): Our research under this project developed formal validation methodologies and tools for proving security properties of low-level, source-free, binary software associated with the seL4 microkernel.
  • ENCORE: ENhanced program protection through COmpiler-REwriter cooperation (NSF, 2015–2018): Software can be secured by its producers (e.g., by equipping compilers with advanced code analyses so that they generate more secure programs from source code), or by its consumers (e.g., by retrofitting the compiled binary code with extra security enhancements). Unfortunately, both approaches suffer a trade-off: producers enforce policies that generalize to all consumers, not those specific to individual consumers; whereas consumers lack source code, and therefore cannot enforce policies that require that information. This project invented a middle-ground technology whereby producers can release software amenable to late-stage retrofitting by consumers, facilitating a new paradigm of more secure, intended-to-be-customized software.
  • Automated, Binary Evidence-based Attribution of Software Attacks (AFOSR, 2014–2018): A comprehensive defense against cyber-threats cannot be limited to merely detecting and weathering unrelenting streams of cyber-attacks. Staunching the flood of attacks requires defenders to identify who is responsible for the attacks, so that more aggressive action can be taken against persistent threats. Unfortunately, attacks are increasingly anonymous; attackers enjoy myriad strategies for concealing network routes. To better attribute anonymized attacks, our research is advancing the state-of-the-art in binary reverse-engineering, honey-potting, and secure data mining to uncover attacker-identifying clues in seas of binary-level attack data.

    In April 2014 our work made international news when we applied our research to create an antidote for the famous Heartbleed bug. See UTD's press release for more information.

  • Language-based Security for Polymorphic Malware Protection (NSF CAREER, 2011–2017): This project developed hybrid static-dynamic code analyses for malware detection and defense. It recasts traditionally static programming language-based security approaches, such as strong type-checking, model-checking, and in-lined reference monitoring, as hybrid static-dynamic algorithms that detect and prevent malicious program behaviors as the untrusted code executes. SIGNAL magazine, FEDcyber.com, and ACM Technews ran a news story about the project, and UTD publicized it in a press release.
  • Securing Web Advertisements (NSF Trustworthy Computing, 2011–2016): Web ads introduce unique challenges for end-to-end software security. This project developed tools and algorithms for malicious ad detection and trust negotiation at the level of ad developers, ad distributors, and ad recipients. The sidebar of this press release summarizes the project.
  • Reactively Adaptive Malware (AFOSR Active Defense, 2010–2014): Traditional polymorphic malware undergoes undirected (random) mutation as it propagates so that no two instances look exactly alike. This makes the malware harder to detect. This project examined more powerful directed mutation strategies that allow next-generation malware to reactively learn and adapt to deployed malware defenses. Anticipating this next generation of malware is critical for keeping pace with the cyber-security arms race. UTD devoted a press release to the project in May 2010. Subsequently, our work has been reported on by hundreds of news outlets. (See the "In the News" sidebar for a few.) See also this UTD press release.
  • Certified, Automated In-lined Reference Monitors (AFOSR Young Investigator, 2008–2011): In-lined Reference Monitors (IRMs) automatically modify untrusted code to make it provably safe, rather than merely examining it to try to decide its safety purely statically. The approach is more powerful than purely static analysis, and more flexible than traditional OS- or VM-level execution monitoring. This project developed IRM and machine-verification systems for ActionScript (Flash) binaries, Java binaries, and x86 native code binaries.
  • Secure, Peer-to-peer Data Management (NSF EAGER, 2009–2011): Cloud computing is an increasingly essential paradigm for supporting management of large databases and distributed computations. This project developed fully decentralized data integrity, confidentiality, and privacy enforcement algorithms for cloud computing based on structured peer-to-peer networking protocols and economic theories of utility optimization and risk.

Publications

The following is a list of research papers and theses that I've authored, co-authored, or supervised. Each is provided in PDF form.

Conference & Workshop Publications

All of the following conference and workshop publications are peer-reviewed. Acceptance rates are provided whenever they are known (except for invited papers that underwent a separate peer-review process that does not pertain to the conference's general acceptance rate).

  • Frederico Araujo, Sailik Sengupta, Jiyong Jang, Adam Doupé, Kevin W. Hamlen, and Subbarao Kambhampati. Software Deception Steering through Version Emulation. In Proceedings of the 54th Hawaii International Conference on System Sciences (HICSS), pp. 1988–1997, January 2021. [BibTeX] [acceptance rate: 47%]

    Determined cyber adversaries often strategize their attacks by carefully selecting high-value target machines that host insecure (e.g., unpatched) legacy software. In this paper, we propose a moving-target approach to thwart and countersurveil such adversaries, wherein live (non-decoy) enterprise software services are automatically modified to deceptively emulate vulnerable legacy versions that entice attackers. A game-theoretic framework chooses which emulated software stacks, versions, configurations, and vulnerabilities yield the best defensive payoffs and most useful threat data given a specific attack model. The results show that effective movement strategies can be computed to account for pragmatic aspects of deception, such as the utility of various intelligence-gathering actions, impact of vulnerabilities, performance costs of patch deployment, complexity of exploits, and attacker profile.
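
    As a concrete illustration of the selection step, the sketch below scores candidate emulated versions by expected defender payoff under an assumed attacker profile. All version names, probabilities, and payoff values are invented for illustration; the paper's game-theoretic framework is considerably richer.

        # Pick which legacy version to deceptively emulate by maximizing
        # expected defender payoff.  All numbers here are hypothetical.
        versions = {
            # version: (threat-intel value if engaged, deployment cost)
            "httpd-2.4.49": (8.0, 1.5),
            "httpd-2.4.50": (6.5, 1.0),
            "httpd-2.4.58": (1.0, 0.2),  # current release: low deception value
        }
        # Assumed probability that the modeled attacker engages each version.
        attack_prob = {"httpd-2.4.49": 0.6, "httpd-2.4.50": 0.3,
                       "httpd-2.4.58": 0.1}

        def expected_payoff(v):
            intel_value, cost = versions[v]
            return attack_prob[v] * intel_value - cost

        best = max(versions, key=expected_payoff)
        print(best, round(expected_payoff(best), 2))  # httpd-2.4.49 3.3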

  • Xiaoyu He, Erick Bauman, Feng Li, Lei Yu, Linyu Li, Bingchang Liu, Aihua Piao, Kevin W. Hamlen, Wei Huo, and Wei Zou. Exploiting the Trust Between Boundaries: Discovering Memory Corruptions in Printers via Driver-assisted Testing. In Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 74–84, June 2020. [BibTeX] [acceptance rate: 28%]

    TrustScope is a new, practical approach to identifying vulnerabilities in printer firmware without actually touching the firmware. By exploiting the trust between the firmware and the device drivers, TrustScope analyzes driver software to identify the driver endpoints that output the page description language (PDL) code to be sent to the printer, extracts key constraints for this output, generates new inputs violating these constraints, and fuzzes the printer firmware with malicious PDL code composed from these inputs yet conforming to the grammar of the PDL accepted by the printer. To accommodate the black-box nature of printers, printer behavior is observed strictly externally, allowing TrustScope to detect more vulnerabilities than only those that produce crashes. A variety of key optimizations, such as fuzzing without consuming paper and ink, and offline test case generation, make printer vulnerability detection feasible and practical.

    An implementation of TrustScope tested with 8 different printers reveals at least one test case causing anomalous behavior in every printer tested. For most printers it finds multiple vulnerabilities, 6 of which have been assigned CVE numbers, including buffer overflow and information disclosure.
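
    The driver-assisted test generation step can be pictured with the toy sketch below, which wraps a constraint-violating value inside a syntactically valid, PostScript-like job. The constraint set and PDL template are hypothetical stand-ins for what TrustScope extracts from real drivers.

        import random

        # Constraints the driver was observed to enforce on its PDL output,
        # e.g., "job name length <= 64" (hypothetical values).
        constraints = {"name_len_max": 64, "copies_max": 999}

        def violating_name():
            # Exceed the driver-enforced bound while remaining legal PDL.
            return "A" * (constraints["name_len_max"] + random.randint(1, 4096))

        def make_job():
            # Grammar-conforming PostScript-like job carrying the violation.
            return ("%!PS\n"
                    f"/JobName ({violating_name()}) def\n"
                    f"/Copies {constraints['copies_max'] + 1} def\n"
                    "showpage\n")

        print(make_job()[:60])  # send to printer; observe behavior externally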

  • Zhuoyi Wang, Yigong Wang, Bo Dong, Pracheta Sahoo, Kevin Hamlen, and Latifur Khan. Adaptive Margin Based Deep Adversarial Metric Learning. In Proceedings of the 6th IEEE International Conference on Big Data Security on Cloud (BigDataSecurity), May 2020. [BibTeX] [acceptance rate: 18.8%] [Best Paper Award]

    Over the past decades, learning an effective distance metric between pairs of instances has played an important role in classification and retrieval tasks, such as person identification or malware retrieval in IoT services. Recent efforts have focused chiefly on improving the metric forms and have already shown promising results in various applications. However, such models often fail to produce a reliable metric on ambiguous test sets. This happens mainly because the sampling process of the training set is not representative of the distribution of the negative samples, especially the examples close to the boundary between categories (also called hard negative samples). In this paper, we focus on addressing such problems and propose an adaptive margin deep adversarial metric learning (AMDAML) framework. It exploits numerous common negative samples to generate potential hard (adversarial) negatives and applies them to facilitate robust metric learning. Unlike previous approaches, which typically depend on search or data augmentation to find hard negative samples, the generation of adversarial negative instances avoids the limitations of domain knowledge and of the number of available constraint pairs. Specifically, in order to prevent overfitting or underfitting during training, we propose an adaptive margin loss that preserves a flexible margin between the negative (including adversarial and original) and positive samples. We train the adversarial negative generator and the conventional metric objective simultaneously in an adversarial manner, learning feature representations that are more precise and robust. Experimental results on practical data sets clearly demonstrate the superiority of AMDAML over representative state-of-the-art metric learning models.
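
    The adaptive-margin idea can be sketched as a hinge-style metric loss whose margin varies per triplet, as below. This shows only the margin component; AMDAML's adversarial negative generator and training procedure are not reproduced, and the margin schedule here is invented.

        import numpy as np

        def adaptive_margin_loss(anchor, positive, negative, margin):
            # Pull positives within a per-triplet margin of the anchor;
            # push (possibly adversarially generated) negatives beyond it.
            d_pos = np.linalg.norm(anchor - positive, axis=1)
            d_neg = np.linalg.norm(anchor - negative, axis=1)
            return np.maximum(0.0, d_pos - d_neg + margin).mean()

        rng = np.random.default_rng(0)
        a, p, n = (rng.normal(size=(32, 128)) for _ in range(3))
        m = 0.2 + 0.1 * rng.random(32)   # flexible, per-triplet margin
        print(adaptive_margin_loss(a, p, n, m))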

  • Gbadebo Ayoade, Frederico Araujo, Khaled Al-Naami, Ahmad M. Mustafa, Yang Gao, Kevin W. Hamlen, and Latifur Khan. Automating Cyberdeception Evaluation with Deep Learning. In Proceedings of the 53rd Hawaii International Conference on System Sciences (HICSS), January 2020. [BibTeX] [acceptance rate: 48%]

    A machine learning-based methodology is proposed and implemented for conducting evaluations of cyberdeceptive defenses with minimal human involvement. This avoids impediments associated with deceptive research on humans, maximizing the efficacy of automated evaluation before human subjects research must be undertaken.

    Leveraging recent advances in deep learning, the approach synthesizes realistic, interactive, and adaptive traffic for consumption by target web services. A case study applies the approach to evaluate an intrusion detection system equipped with application-layer embedded deceptive responses to attacks. Results demonstrate that synthesizing adaptive web traffic laced with evasive attacks powered by ensemble learning, online adaptive metric learning, and novel class detection to simulate skillful adversaries constitutes a challenging and aggressive test of cyberdeceptive defenses.

  • Kevin W. Hamlen, Dakota Fisher, and Gilmore R. Lundquist. Source-free Machine-checked Validation of Native Code in Coq. In Proceedings of the 4th Workshop on Forming an Ecosystem Around Software Transformation (FEAST), November 2019. [BibTeX] [acceptance rate: 88%]

    Picinæ is an infrastructure for machine-proving properties of raw native code programs without sources within the Coq program-proof co-development system. This facilitates formal reasoning about binary code that is inexpressible in source languages or for which no source code exists, such as hand-written assembly code or code resulting from binary transformations (e.g., binary hardening or debloating algorithms). Preliminary results validating some highly optimized, low-level subroutines for Intel and ARM architectures using this new framework are presented and discussed.

  • Gilmore R. Lundquist, Utsav Bhatt, and Kevin W. Hamlen. Relational Processing for Fun and Diversity: Simulating a CPU Relationally with miniKanren. In Proceedings of the 1st miniKanren and Relational Programming Workshop, pp. 100–114, August 2019. [BibTeX]

    Defining a central processing unit relationally using miniKanren is proposed as a new approach for realizing assembly code diversification. Software diversity has long been championed as a means of protecting digital ecosystems from widespread failures due to cyberattacks and faults, but is often difficult to achieve in practice. Early experiments with the technique indicate that such synthesis might lead to better automation of code diversification by breaking the synthesis problem into manageable chunks. An early prototype is presented, with some sample synthesis tasks and discussion of possible future applications.

  • Gbadebo Ayoade, Erick Bauman, Latifur Khan, and Kevin W. Hamlen. Smart Contract Defense Through Bytecode Rewriting. In Proceedings of the IEEE International Conference on Recent Advances on Blockchain and its Applications (ICBC), pp. 384–389, July 2019. [BibTeX] [acceptance rate: 19.6%]

    An Ethereum bytecode rewriting and validation architecture is proposed and evaluated for securing smart contracts in decentralized cryptocurrency systems without access to contract source code. This addresses a wave of smart contract vulnerabilities that have been exploited by cybercriminals in recent years to steal millions of dollars from victims. Such attacks have motivated various best practices proposals for helping developers write safer contracts; but as the number of programming languages used to develop smart contracts increases, implementing these best practices can be cumbersome and hard to enforce across the development tool chain. Automated hardening at the bytecode level bypasses this source-level heterogeneity to enforce safety and code integrity properties of contracts independently of the sources whence they were derived. In addition, a binary code verification tool implemented atop the Coq interactive theorem prover establishes input-output equivalence between the original code and the modified code. Evaluation demonstrates that the system can enforce policies that protect against integer overflow and underflow vulnerabilities in real Ethereum contract bytecode, and overhead is measured in terms of instruction counts.
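
    The rewriting idea can be illustrated on a toy stack machine: replace each unchecked ADD with a guarded sequence that aborts on 256-bit overflow. The instruction encoding below is invented for the sketch and is not actual EVM bytecode.

        WORD_MAX = 2**256 - 1

        def rewrite(program):
            # The hardening pass: swap unchecked ADDs for guarded ones.
            return [("CHECKED_ADD",) if i == ("ADD",) else i for i in program]

        def run(program, stack):
            for ins in program:
                if ins[0] == "PUSH":
                    stack.append(ins[1])
                elif ins == ("ADD",):            # wraps silently
                    b, a = stack.pop(), stack.pop()
                    stack.append((a + b) & WORD_MAX)
                elif ins == ("CHECKED_ADD",):    # aborts on overflow
                    b, a = stack.pop(), stack.pop()
                    if a + b > WORD_MAX:
                        raise OverflowError("reverted: integer overflow")
                    stack.append(a + b)
            return stack

        prog = [("PUSH", WORD_MAX), ("PUSH", 1), ("ADD",)]
        try:
            run(rewrite(prog), [])
        except OverflowError as e:
            print(e)                             # reverted instead of wrapping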

  • Benjamin Ferrell, Jun Duan, and Kevin W. Hamlen. CUDA au Coq: A Framework for Machine-validating GPU Assembly Programs. In Proceedings of the 26th Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 474–479, March 2019. [BibTeX] [acceptance rate: 24%]

    A prototype framework for formal, machine-checked validation of GPU pseudo-assembly code algorithms using the Coq proof assistant is presented and discussed. The framework is the first to afford GPU programmers a reliable means of formally machine-validating high-assurance GPU computations without trusting any specific source-to-assembly compilation toolchain. A formal operational semantics for the PTX pseudo-assembly language is expressed as inductive, dependent Coq types, facilitating development of proofs and proof tactics that refer directly to the compiled PTX object code. Challenges modeling PTX's complex and highly parallelized computation model in Coq, with sufficient clarity and generality to tractably prove useful properties of realistic GPU programs, are discussed. Examples demonstrate how the prototype can already be used to validate some realistic programs.

  • Xiaoyang Xu, Wenhao Wang, Kevin W. Hamlen, and Zhiqiang Lin. Towards Interface-Driven COTS Binary Hardening. In Proceedings of the 3rd Workshop on Forming an Ecosystem Around Software Transformation (FEAST), October 2018. [BibTeX]

    Hardening COTS binary software products (e.g., via control-flow integrity and/or software fault isolation defenses) is particularly difficult in contexts where the surrounding software environment includes closed-source, unmodifiable, and possibly obfuscated binary components, such as system libraries, OS kernels, and virtualization layers. It is demonstrated that many code hardening algorithms, when applied only to the user-level software products in such environments, leave open critical vulnerabilities that arise from mismatches between the application-agnostic security policies enforced by the system modules versus the application-specific policies enforced at the application layer.

    To overcome this problem, a modular approach is proposed for hardening application-level software in such environments without the need to harden all other software in the environment with exactly the same protection strategy or policies. The approach embeds application-level protections within objects shared by interoperating modules. Modules that obey their interface specifications therefore receive an appropriate granularity of protection automatically when they invoke shared object methods. Experiences developing and refining this approach for Microsoft Windows environments are reported and discussed.

  • Gbadebo Ayoade, Swarup Chandra, Latifur Khan, Kevin Hamlen, and Bhavani Thuraisingham. Automated Threat Report Classification over Multi-source Data. In Proceedings of the 4th IEEE International Conference on Collaboration and Internet Computing (CIC), pp. 236–245, October 2018. [BibTeX] [acceptance rate: 27%]

    With an increase in targeted attacks such as advanced persistent threats (APTs), enterprise system defenders require comprehensive frameworks that allow them to collaborate and evaluate their defense systems against such attacks. MITRE has developed a framework which includes a database of different kill-chains, tactics, techniques, and procedures that attackers employ to perform these attacks. In this work, we leverage natural language processing techniques to extract attacker actions from threat report documents generated by different organizations, automatically classify them into standardized tactics and techniques, and provide relevant mitigation advisories for each attack. A naïve way to achieve this is to train a machine learning model to predict labels that associate the reports with relevant categories. In practice, however, sufficient labeled data for model training is not always readily available, so training and test data often come from different sources, resulting in bias. A naïve model would typically underperform in such a situation. We address this major challenge by incorporating an importance weighting scheme, called bias correction, that efficiently utilizes available labeled data given threat reports whose categories are to be automatically predicted. We empirically evaluate our approach on 18,257 real-world threat reports generated between 2000 and 2018 by various computer security organizations, demonstrating its superiority over an existing approach.
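
    One standard instantiation of the bias-correction idea is shown below: estimate importance weights p_target(x)/p_source(x) with a domain discriminator and feed them to the report classifier as sample weights. The synthetic features stand in for vectorized threat reports, and the estimator shown is illustrative rather than necessarily the paper's exact scheme.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        X_src = rng.normal(0.0, 1.0, size=(500, 20))   # labeled source reports
        X_tgt = rng.normal(0.5, 1.0, size=(500, 20))   # unlabeled target reports

        # Domain discriminator: 0 = source, 1 = target.
        disc = LogisticRegression(max_iter=1000).fit(
            np.vstack([X_src, X_tgt]),
            np.concatenate([np.zeros(500), np.ones(500)]))
        p = disc.predict_proba(X_src)[:, 1]
        weights = p / (1.0 - p)        # estimates p_target(x) / p_source(x)

        # The weights then bias-correct the report classifier, e.g.:
        #   clf.fit(X_src, y_src, sample_weight=weights)
        print(round(weights.mean(), 2))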

  • Gbadebo Ayoade, Vishal Karande, Kevin Hamlen, and Latifur Khan. Decentralized IoT Data Management Using BlockChain and Trusted Execution Environment. In Proceedings of the 19th IEEE International Conference on Information Reuse and Integration for Data Science (IRI), pp. 15–22, July 2018. [BibTeX] [acceptance rate: 23.65%]

    Due to the centralization of authority in the management of data generated by IoT devices, there is a lack of transparency in how user data is shared among third-party entities. With the growing adoption of blockchain technology, which provides decentralized management of assets such as the currency in Bitcoin, we propose a decentralized data management system for IoT devices in which all data access permissions are enforced using smart contracts and the audit trail of data accesses is stored in the blockchain. With smart contract applications, multiple parties can specify rules to govern their interactions, which are independently enforced in the blockchain without the need for a centralized system. We provide a framework that stores the hash of the data in the blockchain and the raw data in a secure storage platform using a trusted execution environment (TEE). In particular, we employ Intel SGX as the TEE to ensure data security and privacy for the sensitive parts of the application (code and data).
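
    A minimal sketch of the split-storage design follows: only a digest and an access rule go on chain, while the raw record stays in TEE-backed storage. The in-memory ledger and store below are stand-ins, not real blockchain or SGX APIs.

        import hashlib, json, time

        secure_store = {}   # stands in for TEE-backed storage
        ledger = []         # stands in for the blockchain / smart contract

        def publish(device_id, payload, allowed_readers):
            digest = hashlib.sha256(payload).hexdigest()
            secure_store[digest] = payload          # raw data off-chain
            ledger.append(json.dumps({              # hash + policy on-chain
                "device": device_id, "sha256": digest,
                "readers": allowed_readers, "ts": time.time()}))
            return digest

        def read(reader, digest):
            entry = next(json.loads(e) for e in ledger
                         if json.loads(e).get("sha256") == digest)
            if reader not in entry["readers"]:
                raise PermissionError(reader)
            ledger.append(json.dumps({"audit": [reader, digest]}))  # audit trail
            return secure_store[digest]

        h = publish("sensor-7", b"temp=21.5", ["analytics-svc"])
        print(read("analytics-svc", h))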

  • Meera Sridhar, Abhinav Mohanty, Fadi Yilmaz, Vasant Tendulkar, and Kevin W. Hamlen. Inscription: Thwarting ActionScript Web Attacks from Within. In Proceedings of the 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), July 2018. [BibTeX] [acceptance rate: 30%]

    The design and implementation of Inscription, the first fully automated Adobe Flash binary code transformation system that can guard major Flash vulnerability categories without modifying vulnerable Flash VMs, is presented and evaluated. Inscription affords a means of mitigating the significant class of web attacks that target unpatched, legacy Flash VMs and their apps. Such legacy VMs, and the new and legacy Flash apps that they run, continue to abound in a staggering number of web clients and hosts today; their security issues routinely star in major annual threat reports and exploit kits worldwide. Through two complementary binary transformation approaches based on in-lined reference monitoring, it is shown that many of these exploits can be thwarted by a third-party principal (e.g., web page publisher, ad network, network firewall, or web browser) lacking the ability to universally patch all end-user VMs—write-access to the untrusted Flash apps (prior to execution) suffices. Detailed case-studies describing proof-of-concept exploits and mitigations for five major vulnerability categories are reported.

  • Vishal Karande, Swarup Chandra, Zhiqiang Lin, Juan Caballero, Latifur Khan, and Kevin W. Hamlen. BCD: Decomposing Binary Code into Components Using Graph-based Clustering. In Proceedings of the 13th Asia Conference on Computer and Communications Security (AsiaCCS), pp. 393–398, June 2018. [BibTeX] [acceptance rate: 20%]

    Complex software is built by composing components implementing largely independent blocks of functionality. However, once the sources are compiled into an executable, that modularity is lost. This is unfortunate for code recipients, for whom knowing the components has many potential benefits, such as improved program understanding for reverse-engineering, identifying shared code across different programs, binary code reuse, and authorship attribution. A novel approach for decomposing such source-free program executables into components is here proposed. Given an executable, the approach first statically builds a decomposition graph, where nodes are functions and edges capture three types of relationships: code locality, data references, and function calls. It then applies a graph-theoretic approach to partition the functions into disjoint components. A prototype implementation, BCD, demonstrates the approach's efficacy: Evaluation of BCD with 25 C++ binary programs to recover the methods belonging to each class achieves high precision and recall scores for these tested programs.
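
    Once the decomposition graph is built, the pipeline reduces to standard weighted-graph clustering, as the sketch below shows. The functions and edge weights are invented; in BCD they derive from code locality, data references, and call relationships recovered from the binary.

        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        G = nx.Graph()
        G.add_weighted_edges_from([
            ("f1", "f2", 3.0),   # adjacent code layout + direct call
            ("f2", "f3", 2.0),   # shared data reference
            ("f4", "f5", 3.5),   # mutual calls
            ("f3", "f4", 0.2),   # weak cross-component link
        ])

        # Partition functions into likely components.
        for comp in greedy_modularity_communities(G, weight="weight"):
            print(sorted(comp))   # e.g., ['f1', 'f2', 'f3'] and ['f4', 'f5']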

  • Bahman Rashidi, Carol J. Fung, Kevin W. Hamlen, and Andrzej Kamisinski. HoneyV: A Virtualized Honeynet System Based on Network Softwarization. In Proceedings of the 14th IEEE/IFIP Network Operations and Management Symposium (NOMS), April 2018. [BibTeX] [acceptance rate: 29.6%]

    Intrusion detection in modern enterprise networks faces challenges due to the increasingly large volume of data and insufficient training data for anomaly detection. In this work, we propose a novel network topology for improved intrusion detection through a multi-phase data monitoring system. Rather than taking the all-or-nothing approach of terminating all sessions identified as suspicious, the topology routes traffic to different server replicas with different monitoring intensity levels based on the sessions' attack likelihood. This topology leverages recent advances in software-defined networking (SDN) to dynamically route such sessions into risk-appropriate computing environments. These environments offer enhanced training opportunities for intrusion detection systems (IDSes) by exposing data streams that would not have been observable had the session merely been terminated at the first sign of maliciousness. They also afford defenders finer-grained risk management by supporting a continuum of endpoint environments, ranging from fully trusted, to semi-trusted, to fully untrusted, for example.

  • Ahmad M. Mustafa, Gbadebo Ayoade, Khaled Al-Naami, Latifur Khan, Kevin W. Hamlen, Bhavani M. Thuraisingham, and Frederico Araujo. Unsupervised Deep Embedding for Novel Class Detection over Data Stream. In Proceedings of the 5th IEEE International Conference on Big Data (BigData), pp. 1830–1839, December 2017. [BibTeX] [acceptance rate: 17%]

    Data streams are continuous flows of data points. Novel class detection is an important part of data stream mining. A novel class is a newly emerged class that has not previously been modeled by the classifier over the input stream. This paper proposes deep embedding for novel class detection—a novel approach that combines feature learning using denoising autoencoding with novel class detection. A denoising autoencoder is a neural network with hidden layers aiming to reconstruct the input vector from a corrupted version. A nonparametric multidimensional change point detection approach is also proposed, to detect concept-drift (the change of data feature values over time). Experiments on several real datasets show that the approach significantly improves the performance of novel class detection.

  • Ahsanul Haque, Zhuoyi Wang, Swarup Chandra, Bo Dong, Latifur Khan, and Kevin W. Hamlen. FUSION: An Online Method for Multistream Classification. In Proceedings of the 26th ACM Conference on Information and Knowledge Management (CIKM), pp. 919–928, November 2017. [BibTeX] [acceptance rate: 20%]

    Traditional data stream classification assumes that data is generated from a single non-stationary process. In contrast, the multistream classification problem involves two independent non-stationary data generating processes. One of them is the source stream, which continuously generates labeled data. The other is the target stream, which generates unlabeled test data from the same domain. The distribution represented by the source stream data is biased compared to that of the target stream. Moreover, these streams may have asynchronous concept drifts between them. The multistream classification problem is to predict the class labels of target stream instances by utilizing labeled data from the source stream. This kind of scenario is often observed in real-world applications due to scarcity of labeled data. The only existing approach for multistream classification uses separate drift detection on the streams to address the asynchronous concept drift problem. If a concept drift is detected in either stream, it uses an expensive batch technique for data shift adaptation. These steps add significant execution overhead and limit the approach's usability. In this paper, we propose an efficient solution for multistream classification by fusing drift detection into online data shift adaptation. We study the theoretical convergence rate and computational complexity of the proposed approach. Moreover, empirical results on benchmark data sets indicate significantly improved performance over the baseline methods.

  • Jun Duan, Kang Zhang, and Kevin W. Hamlen. VisualVital: An Observation Model for Multiple Sections of Scenes. In Proceedings of the 14th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC), August 2017. [BibTeX] [acceptance rate: 32%]

    A computational methodology for reorienting, repositioning, and merging camera positions within a region under surveillance is proposed, so as to optimally cover all features of interest without overburdening human or machine analysts with an overabundance of video information. This streamlines many video monitoring applications, such as vehicular traffic and security camera monitoring, which are often hampered by the difficulty of manually identifying the few specific locations (for tracks) or frames (for videos) relevant to a particular inquiry from a vast sea of hundreds or thousands of hours of video. VisualVital ameliorates this problem by considering geographic information to select ideal locations and orientations of camera positions to fully cover the span without losing key visual details. Given a target quantity of cameras, it merges relatively unimportant camera positions to reduce the quantity of video information that must be collected, maintained, and presented. Experiments apply the technique to paths chosen from maps of different cities around the world with various target camera quantities. The approach finds detail-optimizing positions with a time complexity of O(n log n).

  • Ahsanul Haque, Swarup Chandra, Latifur Khan, Kevin Hamlen, and Charu Aggarwal. Efficient Multistream Classification using Direct Density Ratio Estimation. In Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE), pp. 155–158, April 2017. [BibTeX] [acceptance rate: 23.5%]

    Traditional data stream classification techniques assume that the stream of data is generated from a single non-stationary process. In contrast, a recently introduced problem setting, referred to as multistream classification, involves two independent, non-stationary, generating processes. One of them is the source stream, which continuously generates labeled data instances. The other is the target stream, which generates unlabeled test data instances from the same domain. The distribution represented by the source stream data is biased compared to that of the target stream. Moreover, these streams may have asynchronous concept drifts between them. The multistream classification problem is to predict the class labels of target stream instances while utilizing labeled data available from the source stream. In this paper, we propose an efficient solution for multistream classification by fusing drift detection into online data shift adaptation. Experimental results on benchmark data sets indicate significantly improved performance over the only existing approach for multistream classification.

  • Khaled Al-Naami, Swarup Chandra, Ahmad M. Mustafa, Latifur Khan, Zhiqiang Lin, Kevin W. Hamlen, and Bhavani M. Thuraisingham. Adaptive Encrypted Traffic Fingerprinting with Bi-directional Dependence. In Proceedings of the 32nd Annual Conference on Computer Security Applications (ACSAC), pp. 177–188, December 2016. [BibTeX] [acceptance rate: 22.8%]

    Recently, network traffic analysis has been increasingly used in various applications including security, targeted advertisements, and network management. However, data encryption performed on network traffic poses a challenge to these analysis techniques. In this paper, we present a novel method to extract characteristics from encrypted traffic by utilizing data dependencies that occur over sequential transmissions of network packets. Furthermore, we explore the temporal nature of encrypted traffic and introduce an adaptive model that considers changes in data content over time. We evaluate our analysis on two encrypted-traffic fingerprinting applications: website fingerprinting and mobile application (app) fingerprinting. Our evaluation shows how the proposed approach outperforms previous works, especially in the open-world scenario and when defense mechanisms are considered.
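
    The flavor of such dependency features can be illustrated by summarizing an encrypted flow as "bursts", maximal runs of same-direction packets, whose lengths and sizes become features. The packet trace below is hypothetical, and real feature sets in this line of work are richer.

        from itertools import groupby

        # +size = outgoing packet, -size = incoming (hypothetical trace).
        trace = [310, 120, -1460, -1460, -800, 90, -1460, 310]

        bursts = []
        for outgoing, run in groupby(trace, key=lambda s: s > 0):
            sizes = list(run)
            bursts.append(("up" if outgoing else "down",
                           len(sizes), sum(abs(s) for s in sizes)))
        print(bursts)   # [('up', 2, 430), ('down', 3, 3720), ...]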

  • Masoud Ghaffarinia and Kevin W. Hamlen. Binary Software Complexity Reduction: From Artifact- to Feature-Removal. In Proceedings of the 1st Workshop on Forming an Ecosystem Around Software Transformation (FEAST), October 2016. [BibTeX]

    Research progress and challenges related to the automated removal of potentially exploitable, abusable, or unwanted code features from binary software are examined and discussed. While much of the prior literature on this theme has focused on elimination of artifacts—software functionalities unanticipated by the code's original developers (e.g., bugs)—an approach for safely and automatically removing features intended by developers but unwanted by individual consumers is proposed and evaluated. Such a capability is potentially valuable in scenarios where security-sensitive organizations wish to employ general-purpose, closed-source, commercial software in specialized computing contexts where some of the software's functionalities are unneeded. In such contexts, unwanted functionalities can constitute an unnecessary security risk, motivating their removal before deployment.

    Preliminary experiments using a combination of runtime tracing, machine learning, and control-flow integrity enforcement indicate that automated software feature removal is feasible for simple binary programs without source code. Future work is proposed for scaling these preliminary results to larger, more complex software products.
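
    In the spirit of that trace-then-enforce combination (a deliberately simplified sketch, not the project's implementation), the code below records control-flow edges exercised by wanted features and then treats any other edge as a policy violation, rendering code reachable only through unwanted features effectively unreachable.

        edges_seen = set()

        def trace(src, dst):
            # Training runs exercising only *wanted* features.
            edges_seen.add((src, dst))

        def check(src, dst):
            # Enforcement: reject edges never seen during tracing.
            if (src, dst) not in edges_seen:
                raise RuntimeError(f"CFI violation: {src:#x} -> {dst:#x}")

        trace(0x401000, 0x401800)          # wanted feature's call edge
        check(0x401000, 0x401800)          # allowed
        try:
            check(0x401000, 0x404f00)      # unwanted feature's edge
        except RuntimeError as e:
            print(e)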

  • Gilmore R. Lundquist, Vishwath Mohan, and Kevin W. Hamlen. Searching for Software Diversity: Attaining Artificial Diversity through Program Synthesis. In Proceedings of the 24th New Security Paradigms Workshop (NSPW), September 2016. [BibTeX] [acceptance rate: 46%]

    A means of attaining richer, more comprehensive forms of software diversity on a mass scale is proposed through leveraging and repurposing a closely related, yet heretofore untapped, line of computer science research—automatic program synthesis. It is argued that the search-based methodologies presently used for obtaining implementations from specifications can be broadened relatively easily to a search for many candidate solutions, potentially diversifying the software monoculture. Small-scale experiments using the Rosette synthesis tool offer preliminary support for this proposed approach. But the possible rewards are not without danger: It is argued that the same approach can power a dangerous new level of sophistication for malware mutation and reactively adaptive software threats.

  • Bhavani Thuraisingham, Murat Kantarcioglu, Kevin Hamlen, Latifur Khan, Tim Finin, Anupam Joshi, Tim Oates, and Elisa Bertino. A Data Driven Approach for the Science of Cyber Security: Challenges and Directions. In Proceedings of the IEEE 17th Conference on Information Reuse and Integration (IRI), July 2016. [BibTeX] [acceptance rate: 25.6%]

    This paper describes a data driven approach to studying the science of cyber security (SoS). It argues that science is driven by data. It then describes issues and approaches towards the following three aspects: (i) data driven science for attack detection and mitigation, (ii) foundations for data trustworthiness and policy-based sharing, and (iii) a risk-based approach to security metrics. We believe that the three aspects addressed in this paper will form the basis for studying the science of cyber security.

  • Meera Sridhar, Abhinav Mohanty, Vasant Tendulkar, Fadi Yilmaz, and Kevin W. Hamlen. In a Flash: An In-lined Monitoring Approach to Flash App Security. In the 12th Workshop on Foundations of Computer Security (FCS), June 2016. [BibTeX] [acceptance rate: 75%]

    The design and implementation of the first fully automated Adobe Flash binary code transformation system that can guard major Flash vulnerability categories without modifying vulnerable Flash VMs is presented and evaluated. This affords a means of mitigating the significant class of web attacks that target unpatched, legacy Flash VMs and their apps. Such legacy VMs, and the new and legacy Flash apps that they run, continue to abound in a staggering number of web clients and hosts today; their security issues routinely star in major annual threat reports and exploit kits worldwide. Through two complementary binary transformation approaches based on in-lined reference monitoring, it is shown that many of these exploits can be thwarted by a third-party principal (e.g., web page publisher, ad network, network firewall, or web browser) lacking the ability to universally patch all end-user VMs—write-access to the untrusted Flash apps (prior to execution) suffices. Detailed case-studies describing proof-of-concept exploits and mitigations for five major vulnerability categories are reported.

  • Frederico Araujo, Mohammad Shapouri, Sonakshi Pandey, and Kevin W. Hamlen. Experiences with Honey-Patching in Active Cyber Security Education. In Proceedings of the 8th Workshop on Cyber Security Experimentation and Test (CSET), USENIX Security Symposium, August 2015. [BibTeX] [acceptance rate: 26.7%]

    Modern cyber security educational programs that emphasize technical skills often omit or struggle to effectively teach the increasingly important science of cyber deception. A strategy for effectively communicating deceptive technical skills by leveraging the new paradigm of honey-patching is discussed and evaluated. Honey-patches mislead attackers into believing that failed attacks against software systems were successful. This facilitates a new form of penetration testing and capture-the-flag style exercise in which students must uncover and outwit the deception in order to successfully bypass the defense. Experiences creating and running the first educational lab to employ this new technique are discussed, and educational outcomes are examined.

  • Huseyin Ulusoy, Murat Kantarcioglu, Erman Pattuk, and Kevin W. Hamlen. Vigiles: Fine-grained Access Control for MapReduce Systems. In Proceedings of the 2nd IEEE International Congress on Big Data (BigData), pp. 40–47, June–July 2014. [BibTeX] [acceptance rate: 19%]

    Security concerns surrounding the rise of Big Data systems have stimulated myriad new Big Data security models and implementations over the past few years. A significant disadvantage shared by most of these implementations is that they customize the underlying system source code to enforce new policies, making the customizations difficult to maintain as the underlying systems evolve over time (e.g., across version updates).

    This paper demonstrates how a broad class of safety policies, including fine-grained access control policies at the level of key-value data pairs rather than files, can be elegantly enforced on MapReduce clouds with minimal overhead and without any change to the system or OS implementations. The approach realizes policy enforcement as a middleware layer that rewrites the cloud's front-end API with reference monitors. After rewriting, the jobs run on input data authorized by fine-grained access control policies, allowing them to be safely executed without additional system-level controls. Detailed empirical studies show that this more modular approach exhibits just 1% overhead compared to a less modular implementation that customizes MapReduce directly to enforce the same policies.
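
    The middleware can be pictured as a reference monitor wrapped around the record reader, as in the hypothetical sketch below: jobs iterate only over key-value pairs their submitter may read, with no change to the underlying system. The policy encoding is invented for illustration.

        def authorized(user, key):
            # Toy fine-grained policy at key-value granularity.
            policy = {"alice": lambda k: not k.startswith("salary/")}
            return policy.get(user, lambda k: False)(key)

        def monitored_records(user, records):
            # Drop-in wrapper for the cloud's record reader.
            for key, value in records:
                if authorized(user, key):
                    yield key, value

        data = [("dept/eng", "42 employees"), ("salary/bob", "95000")]
        print(list(monitored_records("alice", data)))  # salary/* withheld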

  • Richard Wartell, Yan Zhou, Kevin W. Hamlen, and Murat Kantarcioglu. Shingled Graph Disassembly: Finding the Undecidable Path. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 273–285, May 2014. [BibTeX] [acceptance rate: 16.4%]

    A probabilistic finite state machine approach to statically disassembling x86 machine language programs is presented and evaluated. Static disassembly is a crucial prerequisite for software reverse engineering, and has many applications in computer security and binary analysis. The general problem is provably undecidable because of the heavy use of unaligned instruction encodings and dynamically computed control flows in the x86 architecture. Limited work in machine learning and data mining has been undertaken on this subject. This paper shows that semantic meanings of opcode sequences can be leveraged to infer similarities between groups of opcode and operand sequences. This empowers a probabilistic finite state machine to learn statistically significant opcode and operand sequences in a training corpus of disassemblies. The similarities demonstrate the statistical significance of opcodes and operands in a surrounding context, facilitating more accurate disassembly of new binaries. Empirical results demonstrate that the algorithm is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.
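
    The statistical core can be caricatured as scoring candidate decode paths by learned opcode-bigram likelihoods and preferring the most probable path, as below. The probabilities are invented; the paper learns them from a corpus of disassemblies and operates over a shingled graph of overlapping decodings.

        # Log-probabilities of opcode bigrams (hypothetical, corpus-learned).
        log_p = {("push", "mov"): -0.5, ("mov", "call"): -0.7,
                 ("add", "in"): -9.0, ("in", "hlt"): -10.0}

        def score(path):
            return sum(log_p.get(bigram, -12.0)    # unseen pairs: unlikely
                       for bigram in zip(path, path[1:]))

        aligned = ["push", "mov", "call"]          # plausible decode
        misaligned = ["add", "in", "hlt"]          # typical junk decode
        print(score(aligned) > score(misaligned))  # True: keep aligned path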

  • Safwan Mahmud Khan, Kevin W. Hamlen, and Murat Kantarcioglu. Silver Lining: Enforcing Secure Information Flow at the Cloud Edge. In Proceedings of the 2nd IEEE Conference on Cloud Engineering (IC2E), pp. 37–46, March 2014. [BibTeX] [acceptance rate: 18.4%]

    SilverLine is a novel, exceptionally modular framework for enforcing mandatory information flow policies for Java computations on commodity, data-processing, Platform-as-a-Service clouds by leveraging Aspect-Oriented Programming (AOP) and In-lined Reference Monitors (IRMs). Unlike traditional system-level approaches, which typically require modifications to the cloud kernel software, OS/hypervisor, VM, or cloud file system, SilverLine automatically in-lines secure information flow tracking code into untrusted job binaries as they arrive at the cloud. This facilitates efficient enforcement of a large, flexible class of information flow and mandatory access control policies without any customization of the cloud or its underlying infrastructure. The cloud and the enforcement framework can therefore be maintained completely separately and orthogonally (i.e., modularly). To demonstrate the approach's feasibility, a prototype implements and deploys SilverLine on a real-world data processing cloud—Hadoop MapReduce. Evaluation results demonstrate that SilverLine provides inter-process information flow security for Hadoop clouds with easy maintainability (through modularity) and low overhead.

  • Yangchun Fu, Zhiqiang Lin, and Kevin W. Hamlen. Subverting System Authentication with Context-Aware, Reactive Virtual Machine Introspection. In Proceedings of the 29th Annual Computer Security Applications Conference (ACSAC), pp. 229–238, December 2013. [BibTeX] [acceptance rate: 19.8%]

    Recent advances in bridging the semantic gap between virtual machines (VMs) and their guest processes have a dark side: They can be abused to subvert and compromise VM file system images and process images. To demonstrate this alarming capability, a context-aware, reactive VM Introspection (VMI) instrument is presented and leveraged to automatically break the authentication mechanisms of both Linux and Windows operating systems. By bridging the semantic gap, the attack is able to automatically identify critical decision points where authentication succeeds or fails at the binary level. It can then leverage the VMI to transparently corrupt the control-flow or data-flow of the victim OS at that point, resulting in successful authentication without any password-guessing or encryption-cracking. The approach is highly flexible (threatening a broad class of authentication implementations), practical (realizable against real-world OSes and VM images), and useful for both malicious attacks and forensics analysis of virtualized systems and software.

  • Richard Wartell, Yan Zhou, Kevin W. Hamlen, and Murat Kantarcioglu. Shingled Graph Disassembly: Finding the Undecidable Path. In Proceedings of the 16th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pp. 460–462, October 2013. [BibTeX]

    A probabilistic finite state machine approach to statically disassembling x86 executables is presented. It leverages semantic meanings of opcode sequences to infer similarities between groups of opcode and operand sequences. Preliminary results demonstrate that the technique is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.

  • Kevin W. Hamlen. Stealthy Software: Next-generation Cyber-attacks and Defenses, Invited paper. In Proceedings of the 11th IEEE Intelligence and Security Informatics Conference (ISI), pp. 109–112, June 2013. [BibTeX]

    Weaponized software is the latest development in a decades-old battle of virus-antivirus co-evolution. Reactively adaptive malware and automated binary transformation are two recently emerging offensive and defensive (respectively) technologies that may shape future cyberwarfare weapons. The former intelligently learns and adapts to antiviral defenses fully automatically in the wild, while the latter applies code mutation technology to defense, transforming potentially dangerous programs into safe programs. These technologies and their roles within the landscape of malware attack and defense are examined and discussed.

  • Daniel Krawczyk, James Bartlett, Murat Kantarcioglu, Kevin Hamlen, and Bhavani Thuraisingham. Measuring Expertise and Bias in Cyber Security Using Cognitive and Neuroscience Approaches. In Proceedings of the 11th IEEE Intelligence and Security Informatics Conference (ISI), pp. 364–367, June 2013. [BibTeX]

    Toward the ultimate goal of enhancing human performance in cyber security, we attempt to understand the cognitive components of cyber security expertise. Our initial focus is on cyber security attackers—often called “hackers”. Our first aim is to develop behavior measures of accuracy and response time to examine the cognitive processes of pattern-recognition, reasoning and decision-making that underlie the detection and exploitation of security vulnerabilities. Understanding these processes at a cognitive level will lead to theory development addressing questions about how cyber security expertise can be identified, quantified, and trained. In addition to behavioral measures, our plan is to conduct a functional magnetic resonance imaging (fMRI) study of neural processing patterns that can differentiate persons with different levels of cyber security expertise.

    Our second aim is to quantitatively assess the impact of attackers' thinking strategies—conceptualized by psychologists as heuristics and biases—on their susceptibility to defensive techniques (e.g., “decoys” and “honeypots”). Honeypots are an established method to lure attackers into exploiting a dummy system containing misleading or false content, distracting their attention from genuinely sensitive information, and consuming their limited time and resources. We use the extensive research and experimentation that we have carried out to study the minds of successful chess players in order to study the minds of hackers, with the ultimate goal of enhancing the security of current systems. This paper outlines our approach.

  • Safwan M. Khan and Kevin W. Hamlen. Computation Certification as a Service in the Cloud. In Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 434–441, May 2013. [BibTeX] [acceptance rate: 22.18%]

    This paper proposes a new form of Security as a Service (SECaaS) that allows untrusted, mostly serial computations in untrusted computing environments to be independently and efficiently validated by trusted, commodity clouds. This addresses the longstanding problem of safely executing high assurance computations on untrusted hosts.

    Untrusted computations are instrumented with a checkpointing mechanism that yields a proof of computation integrity as the computation progresses. This proof can be validated by a trusted cloud to ensure that the computation was carried out faithfully. Cloud parallelism and replication is leveraged to validate the proof efficiently even when the original computation is not parallelized. This affords a means of high-assurance, serial computation on cloud-aware, mobile devices that mix resource-rich but untrusted hardware with trusted but comparatively resource-impoverished hardware components. An implementation for Java and Hadoop MapReduce demonstrates that the approach is effective for commodity VMs, clouds, and software.
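
    The checkpoint-proof idea can be sketched as a hash chain over intermediate states that a verifier re-derives by re-executing each segment; segments between checkpoints can be re-run by different cloud workers in parallel. The toy step function and serial verifier below are illustrative only.

        import hashlib

        def link(prev, state):
            return hashlib.sha256(prev + repr(state).encode()).digest()

        def untrusted_run(x0, steps, step):
            proof, prev, x = [], b"", x0
            for _ in range(steps):
                x = step(x)
                prev = link(prev, x)
                proof.append((x, prev))      # checkpoint + chain link
            return proof

        def validate(x0, proof, step):
            # Shown serially; each inter-checkpoint segment could be
            # re-executed by a different cloud worker in parallel.
            prev, x = b"", x0
            for claimed_x, claimed_link in proof:
                x = step(x)
                prev = link(prev, x)
                if (x, prev) != (claimed_x, claimed_link):
                    return False
            return True

        step = lambda x: (x * x + 1) % (2**61 - 1)
        print(validate(3, untrusted_run(3, 5, step), step))  # True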

  • Safwan M. Khan and Kevin W. Hamlen. AnonymousCloud: A Data Ownership Privacy Provider Framework in Cloud Computing. In Proceedings of the 11th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 170–176, June 2012. [BibTeX] [acceptance rate: <30%]

    A means of reliably concealing ownership of cloud data without impeding computation over the data is presented and evaluated. This facilitates information privacy enforcement in cloud environments by withholding data ownership information from cloud nodes that compute using the data. As a result, nodes that have access to private data in unencrypted form do not know who owns it, what role their computations play in the larger computational task, or to whom their computation results are ultimately delivered. To provide this data ownership privacy, the cloud's distributed computing resources are leveraged to implement an anonymizing circuit based on Tor, through which users submit private data and jobs. A tunable parameter k controls a trade-off between the degree of anonymity and the computational overhead imposed by the system. Anonymous authentication based on public-key cryptography safely links jobs and data to customers for billing purposes without revealing these associations to untrusted computation nodes. Simulation results demonstrate the potency of the system in the presence of attackers.

  • Safwan M. Khan and Kevin W. Hamlen. Hatman: Intra-cloud Trust Management for Hadoop. In Proceedings of the 5th IEEE International Conference on Cloud Computing (CLOUD), pp. 494–501, June 2012. [BibTeX] [acceptance rate: 19%]

    Data and computation integrity and security are major concerns for users of cloud computing facilities. Many production-level clouds optimistically assume that all cloud nodes are equally trustworthy when dispatching jobs; jobs are dispatched based on node load, not reputation. This increases their vulnerability to attack, since compromising even one node suffices to corrupt the integrity of many distributed computations.

    This paper presents and evaluates Hatman: the first full-scale, data-centric, reputation-based trust management system for Hadoop clouds. Hatman dynamically assesses node integrity by comparing job replica outputs for consistency. This yields agreement feedback for a trust manager based on EigenTrust. Low overhead and high scalability is achieved by formulating both consistency-checking and trust management as secure cloud computations; thus, the cloud's distributed computing power is leveraged to strengthen its security. Experiments demonstrate that with feedback from only 100 jobs, Hatman attains over 90% accuracy when 25% of the Hadoop cloud is malicious.
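
    The trust computation at Hatman's core follows EigenTrust's fixed-point iteration over normalized local trust values, as in the sketch below. The feedback matrix is hypothetical; in Hatman its entries derive from replica-consistency checks.

        import numpy as np

        # C[i][j]: node i's normalized local trust in node j.
        C = np.array([
            [0.0, 0.5, 0.5, 0.0],
            [0.4, 0.0, 0.6, 0.0],
            [0.5, 0.5, 0.0, 0.0],
            [0.3, 0.3, 0.4, 0.0],   # node 3 returns inconsistent outputs,
        ])                          # so no one trusts it

        t = np.full(4, 0.25)        # uniform prior over nodes
        for _ in range(50):         # t <- C^T t converges to global trust
            t = C.T @ t
        print(t.round(3))           # node 3's global trust approaches 0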

  • Bhavani Thuraisingham, Vaibhav Khadilkar, Jyothsna Rachapalli, Tyrone Cadenhead, Murat Kantarcioglu, Kevin Hamlen, Latifur Khan, and Farhan Husain. Cloud-Centric Assured Information Sharing, Invited paper. In Proceedings of the 7th Pacific Asia Workshop on Intelligence and Security Informatics (PAISI), pp. 1–26, May 2012. [BibTeX]

    In this paper we describe the design and implementation of cloud-based assured information sharing systems. In particular, we will describe our current implementation of a centralized cloud-based assured information sharing system and the design of a decentralized hybrid cloud-based assured information sharing system of the future. Our goal is for coalition organizations to share information stored in multiple clouds and enforce appropriate policies.

  • Kevin W. Hamlen, Micah M. Jones, and Meera Sridhar. Aspect-oriented Runtime Monitor Certification. In Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pp. 126–140, March–April 2012. [BibTeX] [acceptance rate: 24%]

    In-lining runtime monitors into untrusted binary programs via aspect-weaving is an increasingly popular technique for efficiently and flexibly securing untrusted mobile code. However, the complexity of the monitor implementation and in-lining process in these frameworks can lead to vulnerabilities and low assurance for code-consumers. This paper presents a machine-verification technique for aspect-oriented in-lined reference monitors based on abstract interpretation and model-checking. Rather than relying upon trusted advice, the system verifies semantic properties expressed in a purely declarative policy specification language. Experiments on a variety of real-world policies and Java applications demonstrate that the approach is practical and effective.

  • Pallabi Parveen, Zackary R. Weger, Bhavani Thuraisingham, Kevin W. Hamlen, and Latifur Khan. Supervised Learning for Insider Threat Detection Using Stream Mining. In Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1032–1039, November 2011. [BibTeX] [acceptance rate: 30%] [Best Paper, Special Session on Stream Data Mining]

    Insider threat detection requires the identification of rare anomalies in contexts where evolving behaviors tend to mask such anomalies. This paper proposes and tests an ensemble-based stream mining algorithm based on supervised learning that addresses this challenge by maintaining an evolving collection of multiple models to classify dynamic data streams of unbounded length. The result is a classifier that exhibits substantially increased classification accuracy for real insider threat streams relative to traditional supervised learning (traditional SVM and one-class SVM) and other single-model approaches.
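
    A minimal version of such an ensemble maintenance loop appears below: train a candidate on each new labeled chunk, keep the K members most accurate on current data, and classify by vote. The synthetic drifting chunks stand in for insider-threat log streams, and the update rule is a simplification of the paper's algorithm.

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(7)
        K, ensemble = 3, []

        def chunk(shift):
            # Concept drift modeled as a moving class boundary.
            X = rng.normal(shift, 1.0, size=(200, 5))
            y = (X.sum(axis=1) > 5 * shift).astype(int)
            return X, y

        for t in range(6):
            X, y = chunk(shift=0.2 * t)
            ensemble.append(SVC().fit(X, y))
            # Evict the weakest models on the newest chunk.
            ensemble = sorted(ensemble, key=lambda m: m.score(X, y))[-K:]

        votes = np.mean([m.predict(X) for m in ensemble], axis=0)
        print((votes.round() == y).mean())   # accuracy on latest chunk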

  • Pallabi Parveen, Jonathan Evans, Bhavani Thuraisingham, Kevin W. Hamlen, and Latifur Khan. Insider Threat Detection using Stream Mining and Graph Mining. In Proceedings of the 3rd IEEE Conference on Privacy, Security, Risk and Trust (PASSAT), pp. 1102–1110, October 2011. [BibTeX] [acceptance rate: 8%]

    Evidence of malicious insider activity is often buried within large data streams, such as system logs accumulated over months or years. Ensemble-based stream mining leverages multiple classification models to achieve highly accurate anomaly detection in such streams even when the stream is unbounded, evolving, and unlabeled. This makes the approach effective for identifying insider threats who attempt to conceal their activities by varying their behaviors over time. This paper applies ensemble-based stream mining, unsupervised learning, and graph-based anomaly detection to the problem of insider threat detection, demonstrating that the ensemble-based approach is significantly more effective than traditional single-model methods.

  • Kevin Hamlen, Peng Liu, Murat Kantarcioglu, Bhavani Thuraisingham, and Ting Yu. Identity Management for Cloud Computing: Developments and Directions. In Proceedings of the 7th Annual Cyber Security and Information Intelligence Research Workshop (CSIIRW), October 2011. [BibTeX] [acceptance rate: 52%]

    Cloud computing technologies have been rapidly adopted by organizations to lower costs and to enable flexible and efficient access to critical data. As these new cloud technologies emerge, cyber security challenges associated with these technologies have increased at a rapid pace. One of the critical areas that needs attention for secure cloud computing is identity management, where the multiple identities of cloud users, possibly operating in a federated environment, have to be managed and maintained. In this paper, we first explore identity management technologies and secure cloud computing technologies. We then discuss some of the security challenges for cloud computing with respect to identity management.

  • Micah Jones and Kevin W. Hamlen. A Service-oriented Approach to Mobile Code Security. In Proceedings of the 8th International Conference on Mobile Web Information Systems (MobiWIS), pp. 531–538, September 2011. [BibTeX] [acceptance rate: 36%]

    Client software for modern service-oriented web architectures is often implemented as mobile code applets made available by service-providers. Protecting clients from malicious mobile code is therefore an important concern in these architectures; however, the burden of security enforcement is typically placed entirely on the client. This approach violates the service-oriented paradigm.

    A method of realizing mobile code security as a separate service in a service-oriented web architecture is proposed. The security service performs in-lined reference monitoring of untrusted Java binaries on-demand for client-specified security policies. An XML format for specifying these policies is outlined, and preliminary experiments demonstrate the feasibility of the approach.

  • Meera Sridhar and Kevin W. Hamlen. Flexible In-lined Reference Monitor Certification: Challenges and Future Directions. In Proceedings of the 5th ACM SIGPLAN Workshop on Programming Languages meets Program Verification (PLPV), pp. 55–60, January 2011. [BibTeX] [acceptance rate: 60%]

    Over the last few years, in-lined reference monitors (IRMs) have gained much popularity as successful security enforcement mechanisms. Aspect-oriented programming (AOP) provides one elegant paradigm for implementing IRM frameworks. There is a foreseen need to enhance both AOP-style and non-AOP IRMs with static certification due to two main concerns. Firstly, the Trusted Computing Base (TCB) can grow large quickly in an AOP-style IRM framework. Secondly, in many practical settings, such as in the domain of web-security, aspectually encoded policy implementations and the rewriters that apply them to untrusted code are subject to frequent change. Replacing the rewriter with a small, light-weight, yet powerful certifier that is policy-independent and less subject to change addresses both these concerns.

    The goal of this paper is two-fold. First, interesting issues encountered in the process of building certification systems for IRM frameworks, such as policy specification, certifier soundness, and certifier completeness, are explored in the light of related work. In the second half of the paper, three prominent unsolved problems in the domain of IRM certification are examined: runtime code-generation via eval, IRM certification in the presence of concurrency, and formal verification of transparency. Promising directions suggested by recent work related to these problems are highlighted.

  • Arindam Khaled, Mohammad Farhan Husain, Latifur Khan, Kevin W. Hamlen, and Bhavani Thuraisingham. A Token-Based Access Control System for RDF Data in the Clouds. In Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pp. 104–111, November–December 2010. [BibTeX] [acceptance rate: <25%]

    The Semantic Web is gaining immense popularity—and with it, the Resource Description Framework (RDF) broadly used to model Semantic Web content. However, access control on RDF stores designed for single machines has seldom been discussed in the literature. One significant obstacle to using RDF stores designed for single machines is their scalability. Cloud computers, on the other hand, have proven useful for storing large RDF stores; but to our knowledge these systems lack access control on RDF data.

    This work proposes a token-based access control system that is being implemented in Hadoop (an open source cloud computing framework). It defines six types of access levels and an enforcement strategy for the resulting access control policies. The enforcement strategy is implemented at three levels: Query Rewriting, Embedded Enforcement, and Post-processing Enforcement. In Embedded Enforcement, policies are enforced during data selection using MapReduce, whereas in Post-processing Enforcement they are enforced during the presentation of data to users. Experiments show that Embedded Enforcement consistently outperforms Post-processing Enforcement due to the reduced number of jobs required.
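
    The difference between the enforcement levels can be made concrete with a small Python sketch. The token model, pattern format, and function names below are hypothetical simplifications, not the paper's actual Hadoop implementation.

        # Embedded vs. post-processing enforcement (hypothetical token model).
        TRIPLES = [
            ("patient1", "hasDiagnosis", "flu"),
            ("patient2", "hasDiagnosis", "asthma"),
        ]
        # A token grants access to triples matching a pattern (None = wildcard).
        TOKENS = {"alice": [("patient1", None, None)]}

        def permitted(user, triple):
            return any(all(p is None or p == t for p, t in zip(pat, triple))
                       for pat in TOKENS.get(user, []))

        def embedded_map(user, triple):
            """Embedded Enforcement: the mapper drops unauthorized triples
            during selection, so no downstream job ever sees them."""
            if permitted(user, triple):
                yield triple

        def postprocess(user, results):
            """Post-processing Enforcement: select all, filter at presentation."""
            return [t for t in results if permitted(user, t)]

        print([t for tr in TRIPLES for t in embedded_map("alice", tr)])
        print(postprocess("alice", TRIPLES))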

  • Aditi Patwardhan, Kevin W. Hamlen, and Kendra Cooper. Towards Security-aware Program Visualization for Analyzing In-lined Reference Monitors. In Proceedings of the International Workshop on Visual Languages and Computing (VLC), pp. 257–260, October 2010. [BibTeX] [acceptance rate: 36.5%]

    In-lined Reference Monitoring frameworks are an emerging technology for enforcing security policies over untrusted, mobile, binary code. However, formulating correct policy specifications for such frameworks to enforce remains a daunting undertaking with few supporting tools. A visualization approach is proposed to aid in this task; preliminary results are presented in this short paper. In contrast to existing approaches, which typically involve tedious and error-prone manual inspection of complex binary code, the proposed framework provides automatically generated, security-aware visual models that follow the UML specification. This facilitates formulation and testing of prototype security policy specifications in a faster and more reliable manner than is possible with existing manual approaches.

  • Bhavani Thuraisingham and Kevin W. Hamlen. Challenges and Future Directions of Software Technology: Secure Software Development, Invited paper. In Proceedings of the 34th IEEE Annual International Computer Security and Applications Conference (COMPSAC), pp. 17–20, July 2010. [BibTeX]

    Developing large-scale software systems poses major security challenges. This paper describes the issues involved and then addresses two topics: formal methods for emerging secure systems and secure services modeling.

  • Bhavani Thuraisingham and Kevin W. Hamlen. Secure Semantic Sensor Web and Pervasive Computing. In Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC), pp. 5–10, June 2010. [BibTeX] [acceptance rate: 25%]

    In this paper we discuss issues in developing a secure semantic sensor web. SensorML is the starting point for this work. We explore the layers of a semantic sensor web and discuss security issues. We also discuss secure peer-to-peer computing as it relates to the sensor web.

  • Micah Jones and Kevin W. Hamlen. Disambiguating Aspect-oriented Security Policies. In Proceedings of the 9th International Conference on Aspect-Oriented Software Development (AOSD), pp. 193–204, March 2010. [BibTeX] [acceptance rate: 22%]

    Many software security policies can be encoded as aspects that identify and guard security-relevant program operations. Bugs in these aspectually-implemented security policies often manifest as ambiguities in which aspects provide conflicting advice for a shared join point. The design and implementation of a detection algorithm for such ambiguities is presented and evaluated. The algorithm reduces advice conflict detection to a combination of boolean satisfiability, linear programming, and regular language non-emptiness. Case studies demonstrate that the analysis is useful for debugging aspect-oriented security policies for several existing aspectual security systems.

  • Meera Sridhar and Kevin W. Hamlen. ActionScript In-lined Reference Monitoring in Prolog. In Proceedings of the 12th International Symposium on Practical Aspects of Declarative Languages (PADL), pp. 149–151, January 2010. [BibTeX] [acceptance rate: 37.9%]

    A Prolog implementation of an In-lined Reference Monitoring system prototype for Adobe ActionScript Bytecode programs is presented. Prolog provides an elegant framework for implementing IRMs. Its declarative and reversible nature facilitates the dual tasks of binary parsing and code generation, greatly simplifying many otherwise difficult IRM implementation challenges. The approach is demonstrated via the enforcement of several security policies on real-world Adobe Flash applets and AIR applications.

  • Meera Sridhar and Kevin W. Hamlen. Model-Checking In-Lined Reference Monitors. In Proceedings of the 11th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), pp. 312–327, January 2010. [BibTeX] [acceptance rate: 36.8%]

    A technique for elegantly expressing In-lined Reference Monitor (IRM) certification as model-checking is presented and implemented. In-lined Reference Monitors (IRMs) enforce software security policies by in-lining dynamic security guards into untrusted binary code. Certifying IRM systems provide strong formal guarantees for such systems by verifying that the instrumented code produced by the IRM system satisfies the original policy. Expressing this certification step as model-checking allows well-established model-checking technologies to be applied to this often difficult certification task. The technique is demonstrated through the enforcement and certification of a URL anti-redirection policy for ActionScript web applets.

  • Brian W. DeVries, Gopal Gupta, Kevin W. Hamlen, Scott Moore, and Meera Sridhar. ActionScript Bytecode Verification With Co-Logic Programming. In Proceedings of the 4th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS), pp. 9–15, June 2009. [BibTeX] [acceptance rate: 42%]

    A prototype security policy verification system for ActionScript binaries is presented, whose implementation leverages recent advances in co-logic programming. Our experience with co-logic programming indicates that it is an extremely useful paradigm for elegantly expressing algorithms that lie at the heart of model-checking technologies. This results in an unusually small trusted computing base, making the verification system well-suited to frameworks like certifying in-lined reference monitoring systems, which require small, light-weight verifiers. Preliminary experiments and progress are discussed.

  • Micah Jones and Kevin W. Hamlen. Enforcing IRM Security Policies: Two Case Studies. In Proceedings of the 7th IEEE Intelligence and Security Informatics Conference (ISI), pp. 214–216, June 2009. [BibTeX]

    SPoX (Security Policy XML) is a declarative language for specifying application security policies for enforcement by In-lined Reference Monitors. Two case studies are presented that demonstrate how this language can be used to effectively enforce application-specific security policies for untrusted Java applications in the absence of source code.

  • Bhavani Thuraisingham, Latifur Khan, Mohammed M. Masud, and Kevin W. Hamlen. Data Mining for Security Applications. In Proceedings of the 6th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC), pp. 585–589, December 2008. [BibTeX] [acceptance rate: 30%]

    In this paper we discuss various data mining techniques that we have successfully applied to cyber security. These applications include but are not limited to malicious code detection by mining binary executables, network intrusion detection by mining network traffic, anomaly detection, and data stream mining. We summarize our achievements and ongoing work at the University of Texas at Dallas on intrusion detection and cyber-security research.

  • Mohammed M. Masud, Tahseen Al-khateeb, Latifur Khan, Bhavani Thuraisingham, and Kevin W. Hamlen. Flow-based Identification of Botnet Traffic by Mining Multiple Log Files. In Proceedings of the 1st International Conference on Distributed Framework and Applications (DFmA), pp. 200–206, October 2008. [BibTeX]

    Botnet detection and disruption has been a major research topic in recent years. One effective technique for botnet detection is to identify Command and Control (C&C) traffic, which is sent from a C&C center to infected hosts (bots) to control the bots. If this traffic can be detected, both the C&C center and the bots it controls can be detected and the botnet can be disrupted. We propose a multiple log-file based temporal correlation technique for detecting C&C traffic. Our main assumption is that bots respond much faster than humans. By temporally correlating two host-based log files, we are able to detect this property and thereby detect bot activity in a host machine. In our experiments we apply this technique to log files produced by tcpdump and exedump, which record all incoming and outgoing network packets and the start times of application executions on the host machine, respectively. We apply data mining to extract relevant features from these log files and detect C&C traffic. Our experimental results validate our assumption and show better overall performance when compared to other recently published techniques.
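
    The core heuristic (machines respond faster than humans) is simple enough to sketch in Python. The thresholds, window size, and log representation below are illustrative assumptions rather than the paper's tuned parameters.

        # Toy temporal correlation of packet and program-launch logs.
        HUMAN_REACTION_S = 0.5    # bots respond faster than humans plausibly can

        def correlate(inbound, outbound, exec_starts, window=2.0):
            """inbound/outbound: packet timestamps (as from tcpdump);
            exec_starts: application launch times (as from exedump).
            Returns (in, out) pairs suggesting automated bot responses."""
            suspicious = []
            for t_in in inbound:
                for t_out in outbound:
                    fast = 0 < t_out - t_in < HUMAN_REACTION_S
                    launched = any(abs(e - t_out) < window for e in exec_starts)
                    if fast and not launched:    # no fresh launch explains it
                        suspicious.append((t_in, t_out))
            return suspicious

        print(correlate(inbound=[10.0], outbound=[10.1, 15.0], exec_starts=[]))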

  • Kevin W. Hamlen and Micah Jones. Aspect-Oriented In-lined Reference Monitors. In Proceedings of the 3rd ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS), pp. 11–20, June 2008. [BibTeX] [acceptance rate: 54%]

    An Aspect-Oriented, declarative, security policy specification language is presented, for enforcement by In-lined Reference Monitors. The semantics of the language establishes a formal connection between Aspect-Oriented Programming and In-lined Reference Monitoring wherein policy specifications denote Aspect-Oriented security automata—security automata whose edge labels are encoded as pointcut expressions. The prototype language implementation enforces these security policies by automatically rewriting Java bytecode programs so as to detect and prevent policy violations at runtime.
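
    The notion of a security automaton whose edges carry pointcut expressions can be conveyed with a toy Python sketch. The event names, regex-style pointcuts, and policy below are stand-ins for the much richer policy language, not its actual syntax.

        # Security automaton with pointcut-labeled edges (toy sketch).
        import re

        class SecurityAutomaton:
            def __init__(self):
                self.state = "start"
                # (current state, pointcut regex over event names) -> next state.
                # Policy: no network send after reading a sensitive file.
                self.edges = {
                    ("start", r"call File\.read"): "tainted",
                    ("tainted", r"call Socket\.send"): "violation",
                }

            def event(self, name):
                for (state, pattern), nxt in self.edges.items():
                    if state == self.state and re.fullmatch(pattern, name):
                        self.state = nxt
                        break                    # one transition per event
                if self.state == "violation":
                    raise RuntimeError("policy violation: send after sensitive read")

        m = SecurityAutomaton()
        m.event("call File.read")                # start -> tainted
        try:
            m.event("call Socket.send")          # tainted -> violation
        except RuntimeError as e:
            print(e)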

  • Kevin W. Hamlen and Bhavani Thuraisingham. Secure Peer-to-peer Networks for Trusted Collaboration, Invited paper. In Proceedings of the 3rd IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (COLCOM), pp. 58–63, November 2007. [BibTeX]

    An overview of recent advances in secure peer-to-peer networking is presented, toward enforcing data integrity, confidentiality, availability, and access control policies in these decentralized, distributed systems. These technologies are combined with reputation-based trust management systems to enforce integrity-based discretionary access control policies. Particular attention is devoted to the problem of developing secure routing protocols that constitute a suitable foundation for implementing this security system. The research is examined as a basis for developing a secure data management system for trusted collaboration applications such as e-commerce, situation awareness, and intelligence analysis.

  • Nathalie Tsybulnik, Kevin W. Hamlen, and Bhavani Thuraisingham. Centralized Security Labels in Decentralized P2P Networks. In Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC), pp. 315–324, December 2007. [BibTeX] [acceptance rate: 20.9%]

    This paper describes the design of a peer-to-peer network that supports integrity and confidentiality labeling of shared data. A notion of data ownership privacy is also enforced, whereby peers can share data without revealing which data they own. Security labels are global but the implementation does not require a centralized label server. The network employs a reputation-based trust management system to assess and update data labels, and to store and retrieve labels safely in the presence of malicious peers. The security labeling scheme preserves the efficiency of network operations; lookup cost including label retrieval is O(log N), where N is the number of agents in the network.

  • Kevin W. Hamlen, Greg Morrisett, and Fred B. Schneider. Certified In-lined Reference Monitoring on .NET. In Proceedings of the 1st ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS), pp. 7–16, June 2006. [BibTeX] [acceptance rate: 56%]

    Mobile is an extension of the .NET Common Intermediate Language that supports certified In-lined Reference Monitoring. Mobile programs have the useful property that if they are well-typed with respect to a declared security policy, then they are guaranteed not to violate that security policy when executed. Thus, when an In-lined Reference Monitor (IRM) is expressed in Mobile, it can be certified by a simple type-checker to eliminate the need to trust the producer of the IRM.

    Security policies in Mobile are declarative, can involve unbounded collections of objects allocated at runtime, and can regard infinite-length histories of security events exhibited by those objects. Our prototype implementation of Mobile enforces properties expressed by finite-state security automata (one automaton for each security-relevant object), and can type-check Mobile programs in the presence of exceptions, finalizers, concurrency, and non-termination. Executing Mobile programs requires no change to existing .NET virtual machine implementations, since Mobile programs consist of normal managed CIL code with extra typing annotations stored in .NET attributes.

Journal Articles

The following are peer-reviewed journal papers I've authored or co-authored.

  • Frederico Araujo, Gbadebo Ayoade, Khaled Al-Naami, Yang Gao, Kevin W. Hamlen, and Latifur Khan. Crook-sourced Intrusion Detection as a Service. Journal of Information Security and Applications (JISA), 61, September 2021. [BibTeX]

    Most conventional cyber defenses strive to reject detected attacks as quickly and decisively as possible; however, this instinctive approach has the disadvantage of depriving intrusion detection systems (IDSes) of learning experiences and threat data that might otherwise be gleaned from deeper interactions with adversaries. For IDS technology to improve, a next-generation cyber defense is proposed in which, during cyber attacks against legitimate services, adversarial interactions are selectively prolonged to maximize the defender's harvest of useful threat intelligence. Enhancing web services with deceptive attack-responses in this way is shown to be a powerful and practical strategy for improved detection, addressing several perennial challenges for machine learning-based IDS in the literature, including scarcity of training data, the high labeling burden for (semi-)supervised learning, encryption opacity, and concept differences between honeypot attacks and those against genuine services. By reconceptualizing software security patches as feature extraction engines, the approach conscripts attackers as free penetration testers, and coordinates multiple levels of the software stack to achieve fast, automatic, and accurate labeling of live web streams.

    Prototype implementations are showcased for two feature set models to extract security-relevant network- and system-level features from cloud services hosting enterprise-grade web applications. The evaluation demonstrates that the extracted data can be fed back into a network-level IDS for exceptionally accurate, yet lightweight attack detection.

  • Fadi Yilmaz, Meera Sridhar, Abhinav Mohanty, Vasant Tendulkar, and Kevin W. Hamlen. A Security Analysis and Fine-grained Classification of Web-based Virtual Machine Vulnerabilities. Computers & Security, 105(2), June 2021. [BibTeX]

    Web-based virtual machines are one of the primary targets of attackers due to the number of design flaws they contain and the connectivity provided by the Web. The design and implementation of Inscription, the first fully automated Adobe Flash binary code transformation system that can guard major Flash vulnerability categories without modifying vulnerable Flash VMs, is presented and evaluated. Inscription affords a means of mitigating the significant class of web attacks that target unpatched, legacy Flash VMs and their apps.

    This new enforcement capability is most effective when supplied with security policies that accurately characterize VM security vulnerabilities and their mitigations. Researchers and security engineers commonly depend on well-known, public vulnerability databases that document such vulnerabilities and provide details about each; but vulnerability information that is inconsistent, inaccurate, or vague hinders diagnosis of vulnerabilities residing in the implementations of web-based VMs, which is one of the crucial prerequisites of building generic, comprehensive security solutions for mitigating them. For example, a large percentage of disclosed vulnerabilities of the ActionScript VM (AVM), which executes Flash binaries, are vaguely classified as “Memory Corruption” or “Unspecified”. Deeper analysis of these vulnerabilities reveals that most can be more precisely classified as (1) use-after-free, (2) double-free, (3) integer overflow, (4) buffer overflow, or (5) heap overflow vulnerability sub-classes. To improve web vulnerability analysis and mitigation, a more thorough, comprehensive and accurate reclassification of web-based vulnerabilities is presented, in which “Memory Corruption” and “Unspecified” vulnerabilities are reclassified into one of these fine-grained vulnerability sub-classes.

  • Khaled Al-Naami, Amir El-Ghamry, Md Shihabul Islam, Latifur Khan, Bhavani M. Thuraisingham, Kevin W. Hamlen, Mohammed AlRahmawy, and Magdi Z. Rashad. BiMorphing: A Bi-directional Bursting Defense Against Fingerprinting Attacks. IEEE Transactions on Dependable and Secure Computing (TDSC), 18(2):505–517, March–April 2021. [BibTeX]

    Network traffic analysis has been increasingly used in various applications to either protect or threaten people, information, and systems. Website fingerprinting is a passive traffic analysis attack which threatens web navigation privacy. It is a set of techniques used to discover patterns from a sequence of network packets generated while a user accesses different websites. Internet users (such as online activists or journalists) may wish to hide their identity and online activity to protect their privacy. Typically, an anonymity network is utilized for this purpose. Anonymity networks such as Tor (The Onion Router) provide layers of data encryption, which poses a challenge to traffic analysis techniques. Although various defenses have been proposed to counteract this passive attack, they have been penetrated by new attacks that proved the ineffectiveness and/or impracticality of such defenses.

    In this work, we introduce a novel defense algorithm to counteract website fingerprinting attacks. The proposed defense obfuscates original website traffic patterns through the use of double sampling and mathematical optimization techniques to deform packet sequences and destroy traffic flow dependency characteristics used by attackers to identify websites. We evaluate our defense against state-of-the-art studies and show its effectiveness with minimal overhead and zero-delay transmission to the real traffic.

  • Meera Sridhar, Mounica Chirva, Benjamin Ferrell, Dhiraj Karamchandani, and Kevin W. Hamlen. Flash in the Dark: Illuminating the Landscape of ActionScript Web Security Trends and Threats. Journal of Information Systems Security (JISSec), 13(2):59–96, December 2017. [BibTeX]

    As one of the foremost scripting languages of the World Wide Web, Adobe's ActionScript Flash platform now powers multimedia features for a significant percentage of all web sites. However, its popularity and complexity have also made it an attractive vehicle for myriad malware attacks over the past six years. Despite the perniciousness and severity of these threats, ActionScript has been significantly less studied in the scholarly literature than the other major web scripting language—JavaScript.

    To fill this void and stimulate future research, this paper presents a systematic study of Flash security threats and trends, including a finer-grained taxonomy of Flash software vulnerability classes, a detailed investigation of over 700 Common Vulnerabilities and Exposures (CVE) articles reported between 2008 and 2016, and an examination of the fundamental research challenges that distinguish Flash security from other web technologies. The results of these analyses provide researchers, web developers, and security analysts a better sense of this important attack space, and identify the need for stronger security practices and defenses for protecting users of these technologies.

  • Meera Sridhar, Richard Wartell, and Kevin W. Hamlen. Hippocratic Binary Instrumentation: First Do No Harm. Science of Computer Programming (SCP), Special Issue on Invariant Generation, 93(B):110–124, November 2014. [BibTeX]

    In-lined Reference Monitors (IRMs) cure binary software of security violations by instrumenting them with runtime security checks. Although over a decade of research has firmly established the power and versatility of the in-lining approach to security, its widespread adoption by industry remains impeded by concerns that in-lining may corrupt or otherwise harm intended, safe behaviors of the software it protects. Practitioners with such concerns are unwilling to adopt the technology despite its security benefits for fear that some software may break in ways that are hard to diagnose.

    This article shows how recent approaches for machine-verifying the policy-compliance (soundness) of IRMs can be extended to also formally verify IRM preservation of policy-compliant behaviors (transparency). Feasibility of the approach is demonstrated through a transparency-checker implementation for Adobe ActionScript bytecode. The framework is applied to enforce security policies for Adobe Flash web advertisements and automatically verify that their policy-compliant behaviors are preserved.

  • Safwan M. Khan and Kevin W. Hamlen. Penny: Secure, Decentralized Data Management. International Journal of Network Security (IJNS), 16(5):340–354, September 2014. [BibTeX]

    Structured, peer-to-peer (P2P) networks scalably support efficient data-retrieval operations over large data sets without resorting to centralized servers that may be vulnerable to attack or failure. However, data confidentiality policies, such as data ownership privacy, are difficult to enforce reliably in these networks. This paper presents Penny, a structured P2P network that efficiently supports integrity and confidentiality labeling of data, and enforces a notion of ownership privacy that permits peers to publish data without revealing their ownership of the data. A decentralized reputation management system allows the network to respond and adapt to malicious peers. These features are applied to securely manage traditional files and large RDF data sets stored as N-triples in a P2P environment. Simulations demonstrate that Penny can efficiently handle realistic P2P network traffic and is robust enough to prevent malicious collectives from subverting security labels.

  • Pallabi Parveen, Nathan McDaniel, Zackary Weger, Jonathan Evans, Bhavani Thuraisingham, Kevin Hamlen, and Latifur Khan. Evolving Insider Threat Detection Stream Mining Perspective. International Journal on Artificial Intelligence Tools (IJAIT), 22(5):1360013, October 2013. [BibTeX]

    Evidence of malicious insider activity is often buried within large data streams, such as system logs accumulated over months or years. Ensemble-based stream mining leverages multiple classification models to achieve highly accurate anomaly detection in such streams, even when the stream is unbounded, evolving, and unlabeled. This makes the approach effective for identifying insider threats who attempt to conceal their activities by varying their behaviors over time. This paper applies ensemble-based stream mining, supervised and unsupervised learning, and graph-based anomaly detection to the problem of insider threat detection. It demonstrates that the ensemble-based approach outperforms unsupervised learning, and increasing the cost of false positives correlates to higher accuracy. Future work will consider a wider range of tunable parameters in an effort to further reduce false positives, include a more sophisticated polling algorithm for weighting better models, and implement parallelization to lower runtimes to more rapidly detect emerging insider threats.

  • Kevin W. Hamlen and Bhavani M. Thuraisingham. Data Security Services, Solutions and Standards for Outsourcing. Computer Standards & Interfaces, 35(1):1–5, January 2013. [BibTeX] [impact factor: 1.257]

    Globalization has resulted in outsourcing data, software, hardware and various services. However, outsourcing introduces new security vulnerabilities due to the corporation's limited knowledge and control of external providers operating in foreign countries. Security of operation is therefore critical for effectively introducing and maintaining these business relationships without sacrificing product quality. This paper discusses some of these security concerns for outsourcing. In particular, it discusses security issues pertaining to data-as-a-service and software-as-a-service models as well as supply chain security issues. Relevant standards for data outsourcing are also presented. The goal is for the composite system to be secure even if the individual components that are developed by multiple organizations might be compromised.

  • Bhavani Thuraisingham, Tahseen Al-Khatib, Latifur Khan, Mehedy Masud, Kevin Hamlen, Vaibhav Khadilkar, and Satyen Abrol. Design and Implementation of a Data Mining System for Malware Detection. Journal of Integrated Design & Process Science (JIDPS), 16(2):33–49, 2012. [BibTeX]

    This paper describes the design and implementation of a data mining system called SNODMAL (Stream based NOvel class Detection for MALware) for malware detection. SNODMAL extends our data mining system called SNOD (Stream-based NOvel class Detection) for detecting malware. SNOD is a powerful system as it can detect novel classes. We also describe the design of SNODMAL++, which is an extended version of SNODMAL.

  • Kevin W. Hamlen, Lalana Kagal, and Murat Kantarcioglu. Policy Enforcement Framework for Cloud Data Management. IEEE Data Engineering Bulletin (DEB), Special Issue on Security and Privacy in Cloud Computing, 35(4):39–45, December 2012. [BibTeX]

    Cloud computing is a major emerging technology that is significantly changing industrial computing paradigms and business practices. However, security and privacy concerns have arisen as obstacles to widespread adoption of clouds by users. While much cloud security research focuses on enforcing standard access control policies typical of centralized systems, such policies often prove inadequate for the highly distributed, heterogeneous, data-diverse, and dynamic computing environment of clouds. To adequately pave the way for robust, secure cloud computing, future cloud infrastructures must consider richer, semantics-aware policies; more flexible, distributed enforcement strategies; and feedback mechanisms that provide evidence of enforcement to the users whose data integrity and confidentiality is at stake. In this paper, we propose a framework that supports such policies, including rule- and context-based access control and privacy preservation, through the use of in-lined reference monitors and a trusted application programming interface that affords enforceable policy management over heterogeneous cloud data.

  • William Hamlen and Kevin Hamlen. An Interactive Computer Model of Two-Country Trade. International Review of Economics Education (IREE), 11(2):91–101, November 2012. [BibTeX]

    We introduce an interactive computer model of two-country trade that allows students to investigate the consequences of changing economic parameters. The model is self-contained and makes no assumption concerning the existence of social welfare functions or social indifference curves. The factors of production earn incomes that lead to the demand for two goods. Students can see who the winners and losers are when going from a closed economy to an open economy. Students are able to predict the consequences and then obtain immediate feedback.

  • Kevin W. Hamlen and William Hamlen. An Economic Perspective of Message-dropping Attacks in Peer-to-peer Overlays. Security Informatics, 1:6, March 2012. [BibTeX]

    Peer-to-peer networks have many advantageous security properties, including decentralization, natural load-balancing, and data replication. However, one disadvantage of decentralization is its exclusion of any central authority who can detect and evict malicious peers from the network. It is therefore relatively easy to sustain distributed denial-of-service attacks against these networks; malicious peers simply join the network and fail to forward messages.

    This article shows that peer-to-peer message-dropping attacks can be understood in terms of a well-established category of economic theory: the theory of the second best. In particular, peers who wish to continue service during an attack seek a second best solution to a utility optimization problem. This insight reveals useful connections between economic literature on the second best and computer science literature on peer-to-peer security. To illustrate, we derive and test an economics-inspired modification to the Chord peer-to-peer routing protocol that improves network reliability during message-dropping attacks. Under simulation, networks using the modified protocol achieve a 50% increase in message deliveries for certain realistic attack scenarios.
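
    The flavor of the second-best detour can be conveyed by a toy Python routing sketch: each hop prefers the finger that makes the most progress toward the key, but falls back to the next-best finger when the preferred one is suspected of dropping messages. The ring size, finger tables, and suspicion set are illustrative; the paper's actual protocol modification differs in detail.

        # Greedy ring routing with a "second best" fallback (toy sketch).
        RING = 64    # identifier space size (illustrative)

        def clockwise_gap(frm, to):
            return (to - frm) % RING

        def route(key, node, fingers, suspected, path=None):
            path = (path or []) + [node]
            if node == key:
                return path
            # Candidate hops, best first: greatest progress toward the key
            # without overshooting it on the ring.
            hops = sorted((f for f in fingers[node]
                           if clockwise_gap(node, f) <= clockwise_gap(node, key)),
                          key=lambda f: clockwise_gap(f, key))
            for hop in hops:         # first choice, then the second best, ...
                if hop not in suspected:
                    return route(key, hop, fingers, suspected, path)
            raise RuntimeError("all forward routes drop messages")

        fingers = {0: [32, 16, 8], 16: [48, 32, 24], 32: [48, 40], 48: []}
        print(route(48, 0, fingers, suspected={32}))    # detours: [0, 16, 48]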

  • Kevin W. Hamlen and Bhavani Thuraisingham. Secure Semantic Computing. International Journal of Semantic Computing (IJSC), 5(2):121–131, June 2011. [BibTeX]

    This paper explores the integration of semantic computing technologies with security technologies. Past and current research on the application of semantic web technologies for policy management and inference control, the application of data mining technologies for intrusion and malware detection, and programming language-based approaches to mobile code certification and data confidentiality enforcement are discussed.

  • Mohammad M. Masud, Tahseen M. Al-Khateeb, Kevin W. Hamlen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Cloud-based Malware Detection for Evolving Data Streams. ACM Transactions on Management Information Systems (TMIS), 2(3), October 2011. [BibTeX]

    Data stream classification for intrusion detection poses at least three major challenges. First, these data streams are typically infinite-length, making traditional multipass learning algorithms inapplicable. Second, they exhibit significant concept-drift as attackers react and adapt to defenses. Third, for data streams that do not have any fixed feature set, such as text streams, an additional feature extraction and selection task must be performed. If the number of candidate features is too large, then traditional feature extraction techniques fail.

    In order to address the first two challenges, this article proposes a multipartition, multichunk ensemble classifier in which a collection of v classifiers is trained from r consecutive data chunks using v-fold partitioning of the data, yielding an ensemble of such classifiers. This multipartition, multichunk ensemble technique significantly reduces classification error compared to existing single-partition, single-chunk ensemble approaches, wherein a single data chunk is used to train each classifier. To address the third challenge, a feature extraction and selection technique is proposed for data streams that do not have any fixed feature set. The technique's scalability is demonstrated through an implementation for the Hadoop MapReduce cloud computing architecture. Both theoretical and empirical evidence demonstrate its effectiveness over other state-of-the-art stream classification techniques on synthetic data, real botnet traffic, and malicious executables.
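
    A condensed Python sketch of the multipartition, multichunk update step follows, using scikit-learn classifiers for brevity. The base learner, pruning rule, and parameter values are illustrative assumptions, not the article's exact algorithm.

        # v-fold, multichunk ensemble update for a drifting stream (sketch).
        import numpy as np
        from sklearn.model_selection import KFold
        from sklearn.tree import DecisionTreeClassifier

        def update_ensemble(ensemble, chunks, v=3, max_size=9):
            """chunks: list of (X, y) for the r most recent data chunks."""
            X = np.vstack([c[0] for c in chunks])
            y = np.concatenate([c[1] for c in chunks])
            for train_idx, _ in KFold(n_splits=v, shuffle=True).split(X):
                ensemble.append(
                    DecisionTreeClassifier().fit(X[train_idx], y[train_idx]))
            Xn, yn = chunks[-1]    # prune by accuracy on the newest chunk
            ensemble.sort(key=lambda clf: clf.score(Xn, yn), reverse=True)
            return ensemble[:max_size]

        def predict(ensemble, X):
            votes = np.array([clf.predict(X) for clf in ensemble])
            return np.round(votes.mean(axis=0))    # majority vote, binary labels

        # Toy usage: two chunks (r = 2) of a random binary-labeled stream.
        rng = np.random.default_rng(0)
        chunks = [(rng.normal(i, 1, (30, 2)), (rng.random(30) > 0.5).astype(int))
                  for i in range(2)]
        print(predict(update_ensemble([], chunks), chunks[-1][0])[:5])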

  • Mohammad Mehedy Masud, Clay Woolam, Jing Gao, Latifur Khan, Jiawei Han, Kevin W. Hamlen, and Nikunj C. Oza. Facing the Reality of Data Stream Classification: Coping with Scarcity of Labeled Data. Knowledge and Information Systems (KAIS), 1–32, November 2011. [BibTeX] [impact factor: 2.008]

    Recent approaches for classifying data streams are mostly based on supervised learning algorithms, which can only be trained with labeled data. Manual labeling of data is both costly and time consuming. Therefore, in a real streaming environment where large volumes of data appear at a high speed, only a small fraction of the data can be labeled. Thus, only a limited number of instances will be available for training and updating the classification models, leading to poorly trained classifiers. We apply a novel technique to overcome this problem by utilizing both unlabeled and labeled instances to train and update the classification model. Each classification model is built as a collection of micro-clusters using semi-supervised clustering, and an ensemble of these models is used to classify unlabeled data. Empirical evaluation of both synthetic and real data reveals that our approach outperforms state-of-the-art stream classification algorithms that use ten times more labeled data than our approach.

  • Kevin W. Hamlen, Murat Kantarcioglu, Latifur Khan, and Bhavani Thuraisingham. Security Issues for Cloud Computing. International Journal of Information Security and Privacy (IJISP), 4(2):39–51, April–June 2010. [BibTeX] [impact factor: 1.094]

    This paper discusses security issues for cloud computing and presents a layered framework for secure clouds, focusing on two of the layers: the storage layer and the data layer. In particular, it discusses a scheme for secure third-party publications of documents in a cloud. Next, it examines secure federated query processing with MapReduce and Hadoop, and discusses the use of secure co-processors for cloud computing. Finally, it discusses XACML implementation for Hadoop and argues that building trusted applications from untrusted components will be a major aspect of secure cloud computing.

  • Kevin W. Hamlen, Vishwath Mohan, Mohammad M. Masud, Latifur Khan, and Bhavani Thuraisingham. Exploiting an Antivirus Interface. Computer Standards & Interfaces, 31(6):1182–1189, April 2009. [BibTeX] [impact factor: 1.257]

    We propose a technique for defeating signature-based malware detectors by exploiting information disclosed by antivirus interfaces. This information is leveraged to reverse engineer relevant details of the detector's underlying signature database, revealing binary obfuscations that suffice to conceal malware from the detector. Experiments with real malware and antivirus interfaces on Windows operating systems justify the effectiveness of our approach.
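
    The kind of probing this enables can be illustrated with a toy Python binary search that localizes a detector's contiguous byte signature via repeated verdict queries. The stand-in detector, byte strings, and function names are purely hypothetical; real signature inference is considerably more involved.

        # Locating a byte signature by querying a black-box detector (toy).
        SIGNATURE = b"EVILBYTES"         # hypothetical pattern, illustration only

        def detector(sample: bytes) -> bool:
            """Stand-in for an antivirus interface verdict (True = flagged)."""
            return SIGNATURE in sample

        def find_signature_start(sample: bytes) -> int:
            """Binary-search the smallest prefix mask that evades the detector;
            masking through the signature's first byte breaks the match."""
            lo, hi = 0, len(sample)    # invariant: mask of hi evades, lo does not
            while hi - lo > 1:
                mid = (lo + hi) // 2
                masked = b"\x00" * mid + sample[mid:]
                lo, hi = (mid, hi) if detector(masked) else (lo, mid)
            return hi - 1              # approximate first byte of the signature

        sample = b"headers..." + SIGNATURE + b"...payload"
        print(find_signature_start(sample))   # prints 10, where the pattern begins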

  • William Hamlen and Kevin W. Hamlen. A Closed System of Production Possibility and Social Welfare. Computers in Higher Education Economics Review (CHEER), 18(1):15–18, December 2006. [BibTeX]

    We offer a closed system of production possibility and social welfare that can be modeled using virtually any available software package. It has the attribute that social welfare is not independent of production possibilities. The closure is made using the famous result by Negishi (1960) for a purely competitive economy. The goal is to help students understand, through experimentation, the interaction between production and social choice in a competitive economy.

Unrefereed Articles, Chapters, Technical Reports, and Theses

  • Richard Wartell, Yan Zhou, Kevin W. Hamlen, and Murat Kantarcioglu. Shingled Graph Disassembly: Finding the Undecidable Path. Technical Report UTDCS-12-13, Computer Science Department, The University of Texas at Dallas, Richardson, Texas, June 2013. [BibTeX]

    A probabilistic finite state machine approach to statically disassembling x86 machine language programs is presented and evaluated. Static disassembly is a crucial prerequisite for software reverse engineering, and has many applications in computer security and binary analysis. The general problem is provably undecidable because of the heavy use of unaligned instruction encodings and dynamically computed control flows in the x86 architecture. Limited work in machine learning and data mining has been undertaken on this subject. This paper shows that semantic meanings of opcode sequences can be leveraged to infer similarities between groups of opcode and operand sequences. This empowers a probabilistic finite state machine to learn statistically significant opcode and operand sequences in a training corpus of disassemblies. The similarities demonstrate the statistical significance of opcodes and operands in a surrounding context, facilitating more accurate disassembly of new binaries. Empirical results demonstrate that the algorithm is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.
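
    The statistical intuition can be conveyed by a toy Python sketch that scores competing decodings of the same bytes by the likelihood of their opcode sequences under learned bigram statistics. All probabilities and opcode names below are fabricated for illustration, not the report's trained model.

        # Scoring candidate disassemblies with opcode bigram statistics (toy).
        import math

        # P(next opcode | current opcode), learned from trusted disassemblies.
        BIGRAM = {
            ("push", "mov"): 0.30, ("mov", "call"): 0.20,
            ("call", "add"): 0.15, ("add", "ret"): 0.10,
        }
        FLOOR = 1e-6    # unseen transitions get a small smoothing probability

        def sequence_score(opcodes):
            return sum(math.log(BIGRAM.get(pair, FLOOR))
                       for pair in zip(opcodes, opcodes[1:]))

        # Two candidate decodings of the same byte range (offset ambiguity):
        aligned = ["push", "mov", "call", "add", "ret"]
        misaligned = ["in", "xlat", "iret"]    # opcodes rare in real code
        print(max([aligned, misaligned], key=sequence_score))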

  • Meera Sridhar, Richard Wartell, and Kevin W. Hamlen. Hippocratic Binary Instrumentation: First Do No Harm (Extended Version). Technical Report UTDCS-03-13, Computer Science Department, The University of Texas at Dallas, Richardson, Texas, February 2013. [BibTeX]

    In-lined Reference Monitors (IRMs) cure binary software of security violations by instrumenting them with runtime security checks. Although over a decade of research has firmly established the power and versatility of the in-lining approach to security, its widespread adoption by industry remains impeded by concerns that in-lining may corrupt or otherwise harm intended, safe behaviors of the software it protects. Practitioners with such concerns are unwilling to adopt the technology despite its security benefits for fear that some software may break in ways that are hard to diagnose.

    This paper shows how recent approaches for machine-verifying the policy-compliance (soundness) of IRMs can be extended to also formally verify IRM preservation of policy-compliant behaviors (transparency). Feasibility of the approach is demonstrated through a transparency-checker implementation for Adobe ActionScript bytecode. The framework is applied to enforce security policies for Adobe Flash web advertisements and automatically verify that their policy-compliant behaviors are preserved.

  • Kevin W. Hamlen. When Gothic Horror Gives Birth to Malware. In the 2013 Cisco Annual Security Report, Cisco Systems, pp. 46–47, January 2013. [BibTeX]

    Malware Camouflage is an emerging threat that security professionals may increasingly face. While most malware already uses simple mutation or obfuscation to diversify and make itself harder to reverse-engineer, self-camouflaging malware is even stealthier, blending in with the specific software already present on each system it infects. This can elude defenses that look for software anomalies like runtime unpacking or encrypted code, which often expose more conventional malware.

  • Vishwath Mohan and Kevin W. Hamlen. Frankenstein: A Tale of Horror and Logic Programming. ALP Newsletter Digest, 25(4), December 2012. [BibTeX]

    Frankenstein is a new, more stealthy malware propagation system that evades feature-based detection through camouflage rather than mere diversity. Rather than mutating purely randomly as it propagates, it stitches together instruction sequences harvested from programs that have already been classified as benign by local defenses. The resulting mutants are each unique, yet fully composed of benign-looking code. This makes it hard for feature-based malware detectors to find a signature that reliably identifies all variants.

    Frankenstein relies on concepts from constraint logic programming to correctly and quickly identify potential instruction sequence orderings that produce behavior semantically equivalent to the original malware. This article presents the context for Frankenstein's development, and explains how logic programming became the tool of choice for crafting a next-generation cyber weapon.

  • Bhavani Thuraisingham, Balakrishnan Prabhakaran, Latifur Khan, and Kevin W. Hamlen. A Database Inference Controller for 3D Motion Capture Databases. Unpublished manuscript, 2012. [BibTeX]

    This article presents a strategy for restructuring private human motion capture data to enforce access and inference controls within a relational database management system. Human 3D motion capture data is an important part of electronic health records for patients with motion-related diseases and symptoms. There are significant privacy concerns regarding the safe storage and dissemination of such data. Access controls traditionally applied to other forms of medical data (e.g., textual data) are not well suited to motion captures, which contain large quantities of data with complex interdependencies that divulge privacy-violating inferences. Encoding such motion data effectively within a relational database brings the large body of relational database security research to bear on this important problem.

  • Bhavani Thuraisingham, Vaibhav Khadilkar, Jyothsna Rachapalli, Tyrone Cadenhead, Murat Kantarcioglu, Kevin Hamlen, Latifur Khan, and Farhan Husain. Towards the Design and Implementation of a Cloud-centric Assured Information Sharing System. Technical Report UTDCS-27-11, Computer Science Department, The University of Texas at Dallas, Richardson, Texas, September 2011. [BibTeX]

    In this paper we describe the design and implementation of cloud-based assured information sharing systems. In particular, we describe our current implementation of a centralized cloud-based assured information sharing system and the design of a future decentralized hybrid cloud-based assured information sharing system. Our goal is to enable coalition organizations to share information stored in multiple clouds while enforcing appropriate policies.

  • Kevin W. Hamlen, Micah M. Jones, and Meera Sridhar. Chekov: Aspect-oriented Runtime Monitor Certification via Model-checking (Extended Version). Technical Report UTDCS-16-11, Computer Science Department, The University of Texas at Dallas, Richardson, Texas, May 2011. [BibTeX]

    In-lining runtime monitors into untrusted binary programs via aspect-weaving is an increasingly popular technique for efficiently and flexibly securing untrusted mobile code. However, the complexity of the monitor implementation and in-lining process in these frameworks can lead to vulnerabilities and low assurance for code-consumers. This paper presents a machine-verification technique for aspect-oriented in-lined reference monitors based on abstract interpretation and model-checking. Rather than relying upon trusted advice, the system verifies semantic properties expressed in a purely declarative policy specification language. Experiments on a variety of real-world policies and Java applications demonstrate that the approach is practical and effective.

  • Kevin W. Hamlen, Vishwath Mohan, and Richard Wartell. Reining In Windows API Abuses with In-lined Reference Monitors. Technical Report UTDCS-18-10, Computer Science Department, The University of Texas at Dallas, Richardson, Texas, June 2010. [BibTeX]

    Malware attacks typically effect damage by abusing operating system resources (e.g., the file system) that are exposed via system API calls. An in-lined reference monitoring framework that guards these API calls and their arguments is presented and evaluated. Unlike traditional monitoring approaches, the framework requires no modification of the operating system, has no effect upon trusted processes, and preserves the behavior of most complex x86 native code binaries generated by mainstream compilers, including binaries that are object-oriented, graphical, contain callbacks, and use a mixture of static and dynamic linking. A separate verifier certifies that rewritten binaries cannot circumvent the monitor at runtime, allowing the binary rewriter to remain untrusted. An implementation for Microsoft Windows demonstrates that the technique is effective and practical for real-world systems and architectures.

  • Kevin W. Hamlen. Security Policy Enforcement by Automated Program-rewriting. PhD Thesis (Advisors: Greg Morrisett and Fred B. Schneider), Cornell University, Ithaca, New York, August 2006. [BibTeX]

    Traditional approaches to protecting computer systems from malicious or other misbehaved code typically involve (1) monitoring code for unacceptable behavior as it runs, or (2) detecting potentially misbehaved code and preventing it from running at all. These approaches are effective when unacceptable behavior can be detected in time to take remedial action, but in many settings and for many important security policies this is computationally expensive or provably impossible.

    A third approach, termed in this dissertation program-rewriting, involves automatically rewriting code prior to running it in such a way that acceptable behavior is preserved but unacceptable behavior is not. Rewritten code can be run without further analysis or monitoring because it is guaranteed to exhibit only acceptable behavior. Program-rewriting has received recent attention in the literature in the form of in-lined reference monitors, which implement approach 1 above by in-lining security checks directly into the code being monitored. Program-rewriting generalizes in-lined reference monitoring, encompassing many other strategies for automatically rewriting programs as well.

    This dissertation provides a formal characterization of the class of security policies enforceable by program-rewriting and shows that it is strictly greater than the classes of policies enforceable by all other known approaches combined. The dissertation also presents the design and implementation of a certified program-rewriting system for the .NET Common Language Infrastructure, showing that program-rewriters can be developed for real architectures. The extra step of certification provides a formal proof that the program-rewriter produces code that satisfies the security policy, resulting in additional guarantees that the implementation is correct, and higher levels of assurance.

  • Kevin W. Hamlen, Greg Morrisett, and Fred B. Schneider. Certified In-lined Reference Monitoring on .NET. Technical Report TR-2005-2003, Computer Science Department, Cornell University, Ithaca, New York, November 2005.

    Mobile is an extension of the .NET Common Intermediate Language that supports certified In-lined Reference Monitoring. Mobile programs have the useful property that if they are well-typed with respect to a declared security policy, then they are guaranteed not to violate that security policy when executed. Thus, when an In-lined Reference Monitor (IRM) is expressed in Mobile, it can be certified by a simple type-checker to eliminate the need to trust the producer of the IRM.

    Security policies in Mobile are declarative, can involve unbounded collections of objects allocated at runtime, and can regard infinite-length histories of security events exhibited by those objects. Our prototype implementation of Mobile enforces properties expressed by finite-state security automata (one automaton for each security-relevant object), and can type-check Mobile programs in the presence of exceptions, finalizers, concurrency, and non-termination. Executing Mobile programs requires no change to existing .NET virtual machine implementations, since Mobile programs consist of normal managed CIL code with extra typing annotations stored in .NET attributes.

  • Kevin W. Hamlen. Proof-Carrying Code for x86 Architectures. Undergraduate Senior Honors Thesis (Advisors: Peter Lee and George C. Necula), Carnegie Mellon University, Pittsburgh, Pennsylvania, May 1998. [BibTeX]

    This paper presents an extension of Necula and Lee's Proof-Carrying Code (PCC) system to support the x86 architecture. PCC is a security scheme which allows the safe execution of untrusted code. Untrusted code to be executed is required to be coupled with a proof that the code satisfies certain safety properties. This code-proof pair is statically checked by the client system prior to execution. If the check succeeds, then the code is deemed “safe” and is accepted and executed by the client. This x86 adaptation rejects all programs which could potentially terminate with an unhandled exception or memory fault. It accepts most x86 programs which satisfy the standard conventions of the architecture, but conservatively rejects some safe programs. Most reasonable programs can be translated into a form which will be accepted.

Supervised Student Dissertations and Theses

The following dissertations and theses were completed under my supervision.

  • Jun Duan. Securing Computations with GPUs. Ph.D. Thesis (Advisor: Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, May 2021. [BibTeX]

    Many modern computers have two diverse computing models baked-in: CPUs and GPUs. The diversity of these computing models arises from their historically disparate purposes: Early CPU designs focused on rapidly executing sequences of scalar operations, whereas GPU designs have focused on computing matrices (e.g., of pixels) through highly parallel computation. This dissertation proposes that this natural computational diversity has a secondary benefit that has gone underutilized: the potential to harden computations on both CPUs and GPUs against cyberattacks, and to elevate assurance of computational correctness, safety, and security through simultaneous CPU-GPU computations.

    To explore this thesis, the dissertation first addresses the challenge of bringing machine-checked formal methods assurances and tools to the domain of GPU architectures and computations. A prototype framework for formal, machine-checked validation of GPU pseudo-assembly code algorithms using the Coq proof assistant is therefore presented and discussed. The framework is the first to afford GPU programmers a reliable means of formally machine-validating high-assurance GPU computations without trusting any specific source-to-assembly compilation toolchain. A formal operational semantics for the PTX pseudo-assembly language is expressed as inductive, dependent Coq types, facilitating development of proofs and proof tactics that refer directly to compiled PTX object code.

    Secondly, a method of detecting malicious intrusions and runtime faults in software is proposed, which replicates untrusted computations onto the diverse but often co-located instruction architectures implemented by CPUs and GPUs. Divergence between the replicated computations signals an intrusion or fault. A prototype implementation for Java demonstrates that the approach is realizable in practice, and can successfully detect exploitation of Java VM and runtime system vulnerabilities even when the vulnerabilities are not known in advance to defenders. It is shown that GPU parallelism can be leveraged to rapidly validate CPU computations that would otherwise exhibit unacceptable performance if executed on GPU alone. (A minimal sketch of this replication idea appears after this entry.)

    Finally, GPU-based security is explored at the application level through the introduction of a computational methodology for reorienting, repositioning, and merging camera positions within a region under surveillance, so as to optimally cover all features of interest without overburdening human or machine analysts with an overabundance of video information. This streamlines many video monitoring applications, which are often hampered by the difficulty of manually identifying the few specific locations or frames relevant to a particular inquiry from a vast sea of scenes or recorded video clips. The approach ameliorates this problem by considering geographic information to select ideal locations and orientations of camera positions to fully cover the span without losing key visual details.
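
    As noted above, the replication scheme of the second contribution can be sketched minimally in Python, with NumPy standing in for the CPU path and CuPy (if installed) for the GPU path. The tolerance, fallback, and function names are illustrative assumptions, not the dissertation's Java-based prototype.

        # Replicated execution across diverse backends; divergence = alarm.
        import numpy as np

        def cpu_version(x):
            return float(np.sort(x).sum())

        def gpu_version(x):
            try:
                import cupy as cp               # optional GPU backend
                return float(cp.sort(cp.asarray(x)).sum().get())
            except ImportError:                 # illustration-only fallback
                return float(np.sort(x).sum())

        def replicated(x, tol=1e-9):
            a, b = cpu_version(x), gpu_version(x)
            if abs(a - b) > tol:                # divergence signals fault/intrusion
                raise RuntimeError(f"replica divergence: {a} vs {b}")
            return a

        print(replicated(np.array([3.0, 1.0, 2.0])))    # prints 6.0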

  • Masoud Ghaffarinia. Automated Binary Software Attack Surface Reduction. Ph.D. Thesis (Advisor: Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, May 2020. [BibTeX]

    Automated removal of potentially exploitable, abusable, or unwanted code features from binary software is potentially valuable in scenarios where security-sensitive organizations wish to employ general-purpose, closed-source, commercial software in specialized computing contexts where some of the software's functionalities are unneeded.

    This dissertation proposes binary control-flow trimming, a new method of automatically reducing the attack surfaces of binary software, affording code consumers the power to remove features that are unwanted or unused in a particular deployment context. The approach targets stripped binary native code with no source-derived metadata or symbols, can remove semantic features irrespective of whether they were intended and/or known to code developers, and anticipates consumers who can demonstrate desired features (e.g., via unit testing), but who may be unaware of specific unwanted features, and who lack any formal specifications of the code's semantics.

    Through a combination of runtime tracing, machine learning, in-lined reference monitoring, and contextual control-flow integrity enforcement, it is demonstrated that automated code feature removal is nevertheless feasible under these constraints, even for complex programs such as compilers and servers. The approach additionally accommodates consumers whose demonstration of desired features is incomplete; a tunable entropy-based metric detects coverage lapses and conservatively preserves unexercised but probably desired flows. A prototype implementation for Intel x86-64 exhibits low runtime overhead for trimmed binaries (about 1.87%), and case studies show that consumer-side control-flow trimming can successfully eliminate zero-day vulnerabilities.
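
    The core learn-then-enforce loop can be sketched as follows (a simplified stand-in for the actual system: real traces are binary branch traces, and enforcement is performed by in-lined checks rather than a Python monitor):

        from collections import defaultdict

        def learn_policy(traces):
            """traces: iterable of branch traces, each a list of (src, dst) pairs."""
            allowed = defaultdict(set)
            for trace in traces:
                prev = None                       # one branch of calling context
                for src, dst in trace:
                    allowed[(prev, src)].add(dst)
                    prev = src
            return allowed

        def monitor(trace, allowed):
            """Runtime analogue of the in-lined checks: reject out-of-policy edges."""
            prev = None
            for src, dst in trace:
                if dst not in allowed.get((prev, src), ()):
                    raise RuntimeError(f"trimmed edge: {src}->{dst} in context {prev}")
                prev = src

        demo_traces = [[("main", "parse"), ("parse", "eval")]]
        policy = learn_policy(demo_traces)
        monitor(demo_traces[0], policy)                  # exercised flow: accepted
        # monitor([("main", "debug_shell")], policy)     # unwanted feature: rejected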

    Binary control-flow trimming relies foundationally upon control-flow integrity (CFI) enforcement, which has become a mainstay of protecting certain classes of software from code-reuse attacks. Using CFI to enforce the highly complex, context-sensitive security policies needed for feature removal requires a detailed analysis of CFI's compatibility with large, binary software products. However, prior analyses of CFI in the literature have primarily focused on assessing CFI's security weaknesses and performance characteristics, not its ability to preserve intended program functionalities (semantic transparency) of large classes of diverse, mainstream software products. This is in part because although there exist many performance and security benchmarking suites, there remains no standard regimen for assessing compatibility. Researchers must therefore often resort to anecdotal assessments, consisting of tests on homogeneous software collections with limited variety (e.g., GNU Coreutils), or on CPU benchmarks (e.g., SPEC) whose limited code features are not representative of large, mainstream software products.

    To fill this void, this dissertation presents ConFIRM (CONtrol-Flow Integrity Relevance Metrics), a new evaluation methodology and microbenchmarking suite for assessing compatibility, applicability, and relevance of CFI protections for preserving the intended semantics of software while protecting it from abuse. Reevaluation of CFI solutions using ConFIRM reveals that there remain significant unsolved challenges in securing many large classes of software products with CFI, including software for market-dominant OSes (e.g., Windows) and code employing certain ubiquitous coding idioms (e.g., event-driven callbacks and exceptions). An estimated 47% of CFI-relevant code features with high compatibility impact remain incompletely supported by existing CFI algorithms, or receive weakened controls that leave prevalent threats unaddressed (e.g., return-oriented programming attacks). Discussion of these open problems highlights issues that future research must address to bridge these important gaps between CFI theory and practice.

  • Xiaoyang Xu. Native Software Security Hardening in the Real World: Compatibility, Modularity, Expressiveness, and Performance. Ph.D. Thesis (Advisor: Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, May 2020. [BibTeX]

    This dissertation presents a series of new technologies that significantly bridge the gap between theory and practice of software hijacking defenses based on control-flow integrity (CFI) and in-lined reference monitors (IRMs). CFI has emerged over the past 15 years as one of the strongest known defenses against code-reuse attacks, which are among the top threats to modern software ecosystems. Such attacks wrest control of critical software systems away from lawful users into the hands of adversaries by reusing or repurposing legitimate code blocks for malicious purposes. CFI offers provably strong protections against code-reuse attacks by confining vulnerable software to a strict security policy that constrains its flow of control to paths chosen in advance by developers and legitimate users.

    Research over the past decade has increased the power and performance of CFI defenses; however, effectively applying many of the strongest CFI algorithms to large, production-level software products has remained challenging. To expose the root causes of these difficulties, this dissertation presents a new evaluation methodology and microbenchmarking suite, ConFIRM, that is designed to measure applicability, compatibility, and performance characteristics relevant to CFI algorithm evaluation. It provides a set of 20 tests of various CFI-relevant code features and coding idioms (e.g., event-driven callbacks and exceptions), which are widely found in commodity COTS software products and constitute the greatest barriers to more widespread CFI adoption.

    To overcome a significant class of fundamental challenges identified by ConFIRM, the dissertation then presents object flow integrity (OFI), which is the first source-agnostic CFI system that augments CFI protections with secure, first-class support for binary object exchange across inter-module trust boundaries. A prototype implementation for Microsoft Component Object Model (COM) demonstrates that OFI scales to component-based, event-driven consumer software with low overheads of under 1%. It is demonstrated in practice through an interface-driven strategy that is the first to secure full COTS, GUI-driven Windows products with CFI without needing the application source code.

    Finally, the IRM technology underlying CFI is shown to be effective in web domains for enforcing safety policies by injecting runtime security guards into binary web scripts. In particular, a method of detecting and interrupting unauthorized, browser-based cryptomining is proposed, based on semantic signature-matching. The approach addresses a new wave of cryptojacking attacks, including XSS-assisted, web gadget-exploiting, counterfeit mining. Evaluation shows that the approach is more robust than current static code analysis defenses, which are susceptible to code obfuscation attacks.
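
    A toy illustration of the semantic signature-matching idea (operation names and the threshold are hypothetical; the actual system profiles real script instruction streams):

        from collections import Counter

        MINING_SIGNATURE = {"xor", "rotl", "add32", "mul32"}   # hash-kernel ops

        def looks_like_mining(executed_ops, threshold=0.6):
            """Flag execution profiles dominated by hash-style arithmetic."""
            counts = Counter(executed_ops)
            total = sum(counts.values()) or 1
            hot = sum(counts[op] for op in MINING_SIGNATURE)
            return hot / total >= threshold

        benign = ["call", "load", "store", "add32", "call"]
        miner  = ["xor", "rotl", "add32", "xor", "rotl", "mul32", "xor", "load"]
        print(looks_like_mining(benign), looks_like_mining(miner))  # False True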

  • Erick Bauman. Securing Binary Programs Without Perfect Disassembly. Ph.D. Thesis (Advisors: Zhiqiang Lin and Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2019. [BibTeX]

    This dissertation introduces several approaches to improving the security, performance, and generality of binary software without source code. These source-free approaches have the advantage of being applicable to closed-source software products, which constitute the majority of software today. Many such products suffer from security vulnerabilities, performance issues in certain deployment contexts, or insufficient generality to address specialized consumer needs. Since such software is typically distributed in purely binary form, it is difficult to improve or repurpose its code via alternative approaches that analyze and modify source code. Unfortunately, binary code is much more difficult to analyze and modify than source code, due in part to the general undecidability of perfectly disassembling raw bytes to instruction sequences. Implementing critical fixes and improvements to mission-critical legacy software, or software from vendors unwilling or unable to implement functionality or security patches, therefore demands new methodologies for modifying binary code without the assistance of sources.

    Three systems are presented that address several of these challenges. The first system, SgxElide, increases the confidentiality of code in SGX binaries by encrypting SGX enclave contents, without requiring any knowledge of the encrypted functions' internals. SgxElide leverages a whitelist of essential functions, encrypting every enclave byte outside those functions and leaving the essential functions in place to perform runtime enclave decryption. The second system, Multiverse, introduces the concept of superset disassembly, a disassembly algorithm that elegantly sidesteps the problem of obtaining accurate disassembly by computing a superset of all possible instruction sequences. This allows for the construction of a static binary rewriting framework that relies on no disassembly heuristics. The third system, Miniverse, builds on the ideas of Multiverse to provide binary rewriting for dynamically generated code using superset disassembly. Finally, the dissertation argues that superset disassembly is unsuitable for runtime re-randomization, explaining why its apparent benefits are actually counterproductive in that setting.
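
    Superset disassembly can be sketched in a few lines using the capstone disassembler: decoding one instruction at every byte offset guarantees that the true instruction sequence appears somewhere in the output (toy byte buffer; the real Multiverse system additionally rewrites and relocates the superset):

        import capstone                                     # pip install capstone

        md = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)

        def superset_disassemble(buf):
            """Decode one instruction starting at every byte offset of buf."""
            for off in range(len(buf)):
                insn = next(md.disasm(buf[off:], off), None)
                if insn is not None:                        # skip undecodable offsets
                    yield off, insn.mnemonic, insn.op_str

        # push rbp; mov rbp, rsp; mov eax, 0x2a; leave; ret
        code = bytes.fromhex("554889e5b82a000000c9c3")
        for off, mnemonic, ops in superset_disassemble(code):
            print(f"{off:3}: {mnemonic} {ops}")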

  • Gbadebo Gbadero Ayoade. Mitigating Cyberattack with Machine Learning-based Feature Space Transforms. Ph.D. Thesis (Advisors: Latifur Khan and Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2019. [BibTeX]

    With the increase in attacks on software systems, there is a need for new approaches to software defense. In this work, we explore machine learning-based approaches for the detection and mitigation of attacks, since machine learning can learn patterns from past attacks that help detect future ones. However, labeled training data is scarce in the cyber security domain, causing machine learning-based approaches to perform poorly at attack detection. This work overcomes these challenges with two main methods. First, we leverage deception-based honey-patching to label attack data, which can then be used to train machine learning models to detect future attacks. Second, we use transfer learning to adapt data from domains with sufficient labels, training our models in one domain and testing them on data from another. By leveraging these methods, we show an increase in the detection performance of machine learning-based cyber attack defense systems.
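
    A minimal sketch of the cross-domain training setup using scikit-learn (features, labels, and the distribution shift below are synthetic stand-ins for honey-patch-labeled telemetry):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        # Source domain: feature vectors labeled by honey-patch decoys (1 = attack).
        X_src = rng.normal(size=(200, 8)); y_src = (X_src[:, 0] > 0).astype(int)
        # Target domain: same features, slightly shifted distribution.
        X_tgt = rng.normal(loc=0.3, size=(100, 8)); y_tgt = (X_tgt[:, 0] > 0).astype(int)

        clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_src, y_src)
        print("target-domain accuracy:", clf.score(X_tgt, y_tgt))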

  • Wenhao Wang. Source-free, Component-driven Software Security Hardening. Ph.D. Thesis (Advisor: Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, May 2019. [BibTeX]

    Hardening COTS binary software products (e.g., via control-flow integrity (CFI) and/or software fault isolation (SFI) defenses) is extremely complex in contexts where the surrounding software environment includes closed-source, immutable, and possibly obfuscated binary components, such as system libraries, OS kernels, and virtualization layers. It is demonstrated that many code hardening algorithms, when applied only to the user-level software products in such environments, leave open critical vulnerabilities that arise from mismatches between the application-specific policies enforced at the application layer and the weaker or absent protections of the surrounding binary components. Similar challenges exist in web environments, which typically involve components of cross-language web scripts.

    This dissertation proposes the first control-flow integrity system to successfully harden multiple large (millions of lines of code) binary Windows COTS software products without sources. It implements a prototype for Microsoft COM (the largest production component-based architecture in the world) with low overhead. Experiences developing and refining this approach for Microsoft Windows environments are reported and discussed.

    To evaluate and compare various CFI/SFI protections, the dissertation introduces ConFIRM, a new evaluation methodology and benchmarking suite aimed at better assessing compatibility, applicability, and relevance of CFI protections for preserving the intended semantics of real-world software while protecting it from abuse via hijacking. Reevaluation of CFI/SFI solutions using ConFIRM reveals that there remain significant unsolved challenges in securing many large classes of software products with CFI/SFI, including software for market-dominant OSes (e.g., Windows) and code employing certain ubiquitous coding idioms (e.g., event-driven callbacks and delay-loaded components).

    In addition, a method of detecting and interrupting unauthorized, browser-based cryptomining is proposed, based on semantic signature-matching. The approach addresses a new wave of cryptojacking attacks, including XSS-assisted, web gadget-exploiting, counterfeit mining. Evaluation shows that the approach is more robust than current static code analysis defenses, which are susceptible to code obfuscation attacks. An implementation based on in-lined reference monitoring offers a browser-agnostic deployment strategy that is applicable to average end-user systems without specialized hardware or operating systems.

  • Khaled Mohammed Al-Naami. Enhancing Cybersecurity with Encrypted Traffic Fingerprinting. Ph.D. Thesis (Advisors: Latifur Khan and Kevin W. Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2017. [BibTeX]

    Recently, network traffic analysis and cyber deception have been increasingly used in various applications to protect people, information, and systems from major cyber threats. Network traffic fingerprinting is a traffic analysis attack that threatens web navigation privacy. It is a set of techniques used to discover patterns from a sequence of network packets generated while a user accesses different websites. Internet users (such as online activists or journalists) may wish to hide their identity and online activity to protect their privacy, and typically utilize an anonymity network for this purpose. Anonymity networks such as Tor (The Onion Router) provide layers of data encryption, which poses a challenge to traffic analysis techniques.

    Traffic fingerprinting studies have employed various traffic analysis and statistical techniques over anonymity networks. Most studies use a similar set of features including packet size, packet direction, total count of packets, and other summaries of different packets. Moreover, various defense mechanisms have been proposed to counteract these feature selection processes, thereby reducing prediction accuracy.

    In this dissertation, we address the aforementioned challenges and present a novel method to extract characteristics from encrypted traffic by utilizing data dependencies that occur over sequential transmissions of network packets. In addition, we explore the temporal nature of encrypted traffic and introduce an adaptive model that considers changes in data content over time. We not only consider traditional learning techniques for prediction, but also use semantic vector space models (VSMs) of language where each word (packet) is represented as a real-valued vector. We also introduce a novel defense algorithm to counter the traffic fingerprinting attack. The defense uses sampling and mathematical optimization techniques to morph packet sequences and destroy traffic flow dependency patterns.
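
    A toy featurization in this spirit, grouping an encrypted packet trace into directional bursts and pairing uplink bursts with the downlink bursts that answer them (the trace and feature choice are illustrative only):

        from itertools import groupby

        def burst_features(trace):
            """trace: list of (direction, size); direction +1 = out, -1 = in."""
            bursts = [(d, sum(sz for _, sz in grp))
                      for d, grp in groupby(trace, key=lambda p: p[0])]
            # Pair each outgoing burst with the downlink burst that answers it.
            return [(bursts[i][1], bursts[i + 1][1])
                    for i in range(len(bursts) - 1)
                    if bursts[i][0] == +1 and bursts[i + 1][0] == -1]

        trace = [(+1, 120), (+1, 80), (-1, 1500), (-1, 1500), (-1, 400),
                 (+1, 90), (-1, 700)]
        print(burst_features(trace))   # [(200, 3400), (90, 700)]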

    Cyber deception has been shown to be a key ingredient in cyber warfare. Cyber security deception is the methodology followed by an organization to lure the adversary into a controlled and transparent environment for the purpose of protecting the organization, disinforming the attacker, and discovering zero-day threats. We extend our traffic fingerprinting work to the cyber deception domain and leverage recent advances in software deception to enhance Intrusion Detection Systems by feeding back attack traces into machine learning classifiers. We present a feature-rich attack classification approach to extract security-relevant network- and system-level characteristics from production servers hosting enterprise web applications.

  • Frederico Araujo. Engineering Cyber-deceptive Software. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, August 2016. [BibTeX]

    Language-based software cyber deception leverages the science of compiler and programming language theory to equip software products with deceptive capabilities that misdirect and disinform attackers. This dissertation proposes language-based software cyber deception as a new discipline of study, and introduces five representative technologies in this domain: (1) honey-patching, (2) process image secret redaction, (3) deception-enhanced intrusion detection, (4) Deception as a Service (DaaS) in the cloud, and (5) moving target deception.

    Honey-patches fix software security vulnerabilities in a way that makes failed attacks appear successful to attackers, impeding them from discerning which probed systems are genuinely vulnerable and which are actually traps. The traps deceive, waylay, disinform, and monitor adversarial activities, warning defenders before attackers find exploitable victims. Process image secret redaction erases or replaces security-relevant data on-demand from running program processes in response to cyber intrusions. This results in deceptive programs that appear operational once penetrated, but with false secrets that misdirect intruders who hijack or reverse-engineer the victim process. Deception-enhanced intrusion detection joins honey-patching with the science of data mining-based intrusion detection to create more effective, adaptive threat monitors that coordinate multiple levels of the software stack to catch adversaries. Deception as a Service leverages the massive replication and virtualization capabilities of modern cloud computing architectures to create a "hall of mirrors" that attackers must navigate in order to distinguish valuable targets from traps. Finally, moving target deception employs software version emulation to more effectively lure adversaries and model evolving threat and vulnerability landscapes.
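
    The honey-patch idea can be sketched abstractly as follows (the handler, check, and decoy are hypothetical stand-ins; real honey-patches fork attacker sessions into instrumented, isolated clones of the server):

        def is_exploit(req: str) -> bool:
            """Stand-in for the vulnerability check a conventional patch would add."""
            return "../" in req

        def process(req: str) -> str:
            return f"200 OK: served {req}"

        class Decoy:
            """Stand-in for an isolated, instrumented clone of the unpatched server."""
            def replay(self, req: str) -> str:
                print(f"[decoy] monitoring attacker request: {req!r}")
                return "200 OK"                   # feigned success; nothing real leaks

        def handle_request(req: str, decoy: Decoy) -> str:
            if is_exploit(req):                   # where a normal patch would reject
                return decoy.replay(req)          # fork the session to the trap instead
            return process(req)                   # legitimate traffic is unaffected

        print(handle_request("index.html", Decoy()))
        print(handle_request("../../etc/passwd", Decoy()))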

    A language-based approach to the design and implementation of each of these new technologies is presented and evaluated. Experiments indicate that software cyber deception can be effectively realized for large, production-level software networks and architectures—often with minimal developmental effort and performance overheads. Language-based cyber deception is therefore concluded to be a low-cost, high-reward, yet heretofore largely unexplored methodology for raising attacker risk and uncertainty, toward leveling the longstanding asymmetry between attackers and defenders in cyber warfare battlefields.

  • Vishwath R. Mohan. Source-free Binary Mutation for Offense and Defense. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2014. [BibTeX]

    The advent of advanced weaponized software over the past few years, including the Stuxnet, Duqu, and Flame viruses, is indicative of the seriousness with which advanced persistent threats (APTs) have begun to treat the cyber-realm as a potential theatre for offensive military action and espionage. This has coincided with a strong interest in creating malware obfuscations that hide their payloads for extended periods of time, even while under active search. Progress on this front threatens to render conventional software defenses obsolete, placing the world in dire need of more resilient software security solutions.

    This dissertation underlines the seriousness of this threat through the design and implementation of two novel, next-generation malware obfuscation technologies that bypass today's widely deployed defenses. Unlike conventional polymorphic malware, which mutates randomly in an effort to evade detection, the presented attacks are reactively adaptive in the sense that they intelligently surveil, analyze, and adapt their obfuscation strategies in the wild to understand and defeat rival defenses. The dissertation then presents three novel software defenses that offer strengthened software security against both current and future offensive threats. Rather than attempting to detect threats statically (i.e., before they execute), or introducing dynamic monitors that raise compatibility and performance penalties for consumers, the new defenses implement automated, source-free, binary software transformations that preemptively transform untrusted software into safe software. Experiments show that this security retrofitting approach offers higher performance, greater security, and more flexible deployment options relative to competing approaches. Thus, binary code transformation and mutation is realized as both a powerful offensive and a potent defensive paradigm for software attacks and defenses.

  • Meera Sridhar. Model-checking In-lined Reference Monitors. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, August 2014. [BibTeX]

    The formidable growth of the cyber-threat landscape today is accompanied by an imperative need for high-assurance software solutions. In the last decade, binary instrumentation via In-lined Reference Monitors (IRMs) has been firmly established as a powerful and versatile technology, providing superior security enforcement for many platforms. IRM frameworks rewrite untrusted binary code, inserting runtime checks to produce safe, self-monitoring code; IRMs can thereby enforce a rich set of history-based policies without requiring access to source code. These immense capabilities, however, come with a price—the power of most real-world IRM infrastructures is accompanied by great size and complexity, making them prone to implementation errors. In most scenarios this is highly undesirable; in a mission-critical system, it is unacceptable.

    Independent certification of two key properties of the IRM, namely soundness and transparency, is essential for minimizing the error-space and providing formal assurances for the IRM. Soundness demands that the instrumented code satisfy the policy, whereas transparency demands that the behavior of policy-compliant code be preserved by the instrumentation process. The resulting certifying in-lined reference monitors combine the power and flexibility of runtime monitoring with the strong formal guarantees of static analysis, creating automated, machine-verifiable, high-assurance systems.

    This dissertation demonstrates how the powerful software paradigm of model-checking can be utilized to produce simple, elegant verification algorithms for the special domain of IRM code, providing case-by-case verification of the instrumented code. The dissertation discusses the challenges and subsequent success of developing model-checking IRM frameworks for various platforms. An important result discussed is a new, more powerful class of web security technologies that can be deployed and used immediately, without the need to modify existing web browsers, servers, or web design platforms. Implementations demonstrating security enforcement on a variety of fairly large-scale, real-world Java and ActionScript bytecode applications are also discussed. Theoretical formalizations and proof methodologies are employed to provide strong formal guarantees for the certifying in-lined reference monitoring systems.
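
    The certification step can be illustrated with a toy model-checker that explores the product of a program's event graph and a policy automaton, accepting only if no path reaches a violation (the program, policy, and event abstraction below are illustrative):

        VIOLATION = "bad"

        def policy_step(state, event):
            """Policy automaton: a file may not be sent out after being read."""
            if state == "start" and event == "read":
                return "tainted"
            if state == "tainted" and event == "send":
                return VIOLATION
            return state

        def certify(graph, entry):
            """graph: node -> list of (event, successor). True iff provably safe."""
            seen, work = set(), [(entry, "start")]
            while work:
                node, st = work.pop()
                if st == VIOLATION:
                    return False
                if (node, st) in seen:
                    continue
                seen.add((node, st))
                for event, succ in graph.get(node, []):
                    work.append((succ, policy_step(st, event)))
            return True

        safe_prog   = {"n0": [("read", "n1")], "n1": [("log", "n2")]}
        unsafe_prog = {"n0": [("read", "n1")], "n1": [("send", "n2")]}
        print(certify(safe_prog, "n0"), certify(unsafe_prog, "n0"))  # True False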

  • Safwan Mahmud Khan. Decentralizing Trust: New Security Paradigms for Cloud Computing. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2013. [BibTeX]

    Data and computation integrity and security are major concerns for users of cloud computing facilities. Today's clouds typically place centralized, universal trust in all the cloud's nodes. This simplistic, full-trust model has the negative consequence of amplifying potential damage from node compromises, leaving such clouds vulnerable to myriad attacks.

    To address this weakness, this dissertation proposes and evaluates new paradigms for decentralizing cloud trust relationships for stronger cloud security. Five new paradigms are examined: (1) Hatman decentralizes trust through computation replication in clouds to ensure computation integrity. Its prototype implementation embodies the first full-scale, data-centric, reputation-based trust management system for Hadoop clouds. (2) AnonymousCloud decentralizes trust by decoupling private billing information from a cloud job's code and data. The resulting cloud conceals computation and data ownership information from nodes that compute using the data, thereby impeding malicious nodes from learning who owns these resources without disrupting the cloud's power to process and bill jobs. (3) Penny is a fully decentralized peer-to-peer structure that shifts trust away from traditional, centralized, cloud master nodes to an equal distribution of trust over all nodes. It supports integrity and confidentiality labeling of data, and enforces a notion of ownership privacy that permits peers to publish data without revealing their ownership of the data. (4) CloudCover decentralizes trust on the user side. It proposes a new form of Security as a Service (SECaaS) that allows untrusted, mostly serial computations in untrusted computing environments to be independently and efficiently validated by trusted, commodity clouds. Finally, (5) SilverLine automatically in-lines secure information flow tracking code into untrusted job binaries, facilitating enforcement of custom security policies without any change to the underlying cloud kernel, operating system, hypervisor, or file system implementations. This makes SilverLine exceptionally easy to deploy and maintain on rapidly evolving cloud frameworks, since the cloud and security enforcement implementations are completely separate and orthogonal.
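
    A minimal sketch of the replication-and-reputation idea behind Hatman (the node model and scoring below are simplified stand-ins):

        from collections import Counter

        class Node:
            """Stand-in for a cloud worker; a tampered node returns wrong results."""
            def __init__(self, name, tampered=False):
                self.name, self.tampered = name, tampered
            def run(self, x):
                return x * 2 + (999 if self.tampered else 0)

        def replicated_job(job_input, nodes, reputation):
            """Run the job on every node, majority-vote, and score the voters."""
            results = {n: n.run(job_input) for n in nodes}
            majority, _ = Counter(results.values()).most_common(1)[0]
            for n, r in results.items():
                reputation[n] += 1 if r == majority else -5   # punish divergence
            return majority

        nodes = [Node("a"), Node("b"), Node("c", tampered=True)]
        reputation = {n: 0 for n in nodes}
        print(replicated_job(21, nodes, reputation))               # 42, by 2-of-3 vote
        print({n.name: score for n, score in reputation.items()})  # c's score drops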

    Experiments demonstrate that each paradigm is an effective strategy for realizing stronger security in cloud computing frameworks at modest overheads through reducing or shifting the trusted computing base.

  • Dhiraj V. Karamchandani. Surveying the Landscape of ActionScript Security Trends and Threats. Master's Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2013. [BibTeX]

    As one of the foremost scripting languages of the World Wide Web, Adobe's ActionScript Flash platform now powers multimedia features for a significant percentage of all web sites. However, its popularity and complexity have also made it an attractive vehicle for myriad malware attacks over the past five years. Despite the perniciousness and severity of these threats, ActionScript has been relatively less studied in the scholarly security literature. To fill this void and stimulate future research, this thesis presents a systematic study of Flash security threats and trends, including an in-depth taxonomy of fifteen major Flash vulnerability and attack categories, a detailed investigation of 520 Common Vulnerabilities and Exposures (CVE) articles reported between 2008 and 2013, and an examination of what makes Flash security challenges unique. The results of these analyses provide researchers, web developers, and security analysts a better sense of this important attack space, and identify the need for stronger security practices and defenses for protecting users of these technologies.

  • Richard Wartell. Rewriting x86 Binaries Without Code Producer Cooperation. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2012. [BibTeX]

    Binary code from untrusted sources remains one of the primary vehicles for software propagation and malicious software attacks. All previous work to mitigate such attacks requires code-producer cooperation, has significant deployment issues, or incurs a high performance penalty. The problem of accurate static x86 disassembly without metadata is provably undecidable, and is regarded by many as uncircumventable.

    This dissertation presents a framework for x86 binary rewriting that requires no cooperation from code producers in the form of source code or debugging symbols, requires no client-side support infrastructure (e.g., a virtual machine or hypervisor), and preserves the behavior of even complex, event-driven, x86 native COTS binaries generated by aggressively optimizing compilers. This makes it exceptionally easy to deploy. The framework is instantiated as two software security systems: Stir, a runtime basic block randomization rewriter for return-oriented programming (ROP) attack mitigation, and Reins, a machine-verifiable software fault isolation (SFI) and security policy specification rewriter. Both systems exhibit extremely low performance overheads in experiments on real-world COTS software—1.6% and 2.4%, respectively. The foundation of the system includes three novel approaches to static x86 disassembly, along with a method of statically proving transparency for rewriting systems.
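
    The layout randomization behind Stir can be illustrated on a toy intermediate representation: blocks are shuffled at load time and indirect control transfers are redirected through a translation table (the real system operates on disassembled x86 basic blocks, not the pseudo-instructions below):

        import random

        blocks = {                      # label -> pseudo-instructions of one block
            "entry": ["push", "cmp", "jmp loop"],
            "loop":  ["add", "jnz loop", "jmp exit"],
            "exit":  ["pop", "ret"],
        }

        def stir(blocks, seed=None):
            """Shuffle block layout; return (new order, translation table)."""
            order = list(blocks)
            random.Random(seed).shuffle(order)            # fresh layout each load
            layout = {label: addr for addr, label in enumerate(order)}
            return order, layout

        order, layout = stir(blocks, seed=7)
        print("randomized layout:", order)
        # An indirect transfer to "loop" consults the table, whatever the layout:
        print("translated target for 'loop':", layout["loop"])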

  • Sunitha Ramanujam. Towards an Integrated Semantic Web: Interoperability Between Data Models. Ph.D. Thesis (Advisors: Latifur Khan and Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2011. [BibTeX]

    The astronomical growth of the World Wide Web resulted in the emergence of data representation methodologies and standards, such as the Resource Description Framework (RDF), that aim to enable rapid and automated access to data. The widespread deployment of RDF gave rise to a new data model paradigm, the RDF graph model, which in turn spawned a demand for RDF graph data modeling and visualization tools that ease administrators' data-management burden. However, while there is a large selection of such tools for more established data models such as the relational model, the assortment of tools for RDF stores is smaller, since the RDF paradigm is a more recent development. This dissertation presents R2D (RDF-to-Database), a relational wrapper for RDF data stores that transforms, at run-time, semi-structured RDF data into an equivalent, domain-specific, visual relational schema, thereby bridging the gap between RDF and RDBMS concepts and making existing relational tools available to RDF stores. R2D's transformation process involves two components: a schema mapping component that uses a declarative mapping language, and a query transformation component that translates SQL queries into equivalent SPARQL queries. A semantics-preserving transformation methodology for R2D's components is presented, with proofs establishing that an SQL query sql, run over the relational schema ℝ obtained from an RDF graph 𝔾 through R2D's schema mapping process, returns the same result as the equivalent SPARQL query spq, obtained by translating sql via R2D's query translation process, when run on the original graph 𝔾.

    Additionally, the dissertation presents D2RQ++, an enhancement of D2RQ, an existing read-only relational-schema-to-RDF mapping tool that presents legacy data stored in relational databases as virtual RDF graphs. D2RQ++ enables bi-directional data flow by providing data manipulation facilities that permit triples to be inserted, updated, and/or deleted in the appropriate tables of the underlying relational database. The primary R2D and D2RQ++ functionalities, high-level system architectures, performance graphs, and screenshots that evidence the feasibility of this work are presented, along with the relevant proofs of semantics-preservation.
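
    The flavor of R2D's query translation can be illustrated with a toy mapping from a simple relational SELECT to a SPARQL triple-pattern query (the mapping below is far simpler than R2D's actual declarative mapping language):

        def sql_to_sparql(table, columns, ns="ex:"):
            """Translate SELECT col1, col2 FROM table into a SPARQL query."""
            patterns = " ".join(f"?row {ns}{c} ?{c} ." for c in columns)
            selected = " ".join(f"?{c}" for c in columns)
            return f"SELECT {selected} WHERE {{ ?row a {ns}{table} . {patterns} }}"

        # SELECT name, age FROM Person  ~>
        print(sql_to_sparql("Person", ["name", "age"]))
        # SELECT ?name ?age WHERE { ?row a ex:Person . ?row ex:name ?name .
        #                           ?row ex:age ?age . }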

  • Micah Jones. Declarative Aspect-oriented Security Policies for In-lined Reference Monitors. Ph.D. Thesis (Advisor: Kevin Hamlen), The University of Texas at Dallas, Richardson, Texas, December 2011. [BibTeX]

    Aspect-oriented in-lined reference monitor frameworks provide an elegant and increasingly popular means of enforcing security policies on untrusted code through automated rewriting. However, it is often difficult to prove that the resulting instrumented code is secure with respect to the intended policy, especially when that policy is described in terms of arbitrary imperative code insertions. The dissertation presents a fully declarative policy language, powerful enough to enforce real-world security policies yet simple enough for automated verification systems to correctly reason about them. It also presents an analysis tool that detects policy inconsistencies, a service-oriented framework for instrumenting untrusted code, and a full-scale abstract interpretation and model-checking system that verifies the safety of rewritten programs.

  • Aditi A. Patwardhan. Security-aware Program Visualization for Analyzing In-lined Reference Monitors. Master's Thesis (Advisors: Kevin Hamlen and Kendra Cooper), The University of Texas at Dallas, Richardson, Texas, June 2010. [BibTeX]

    In-lined Reference Monitoring frameworks are an emerging technology for enforcing security policies over untrusted, mobile, binary code. However, formulating correct policy specifications for such frameworks to enforce remains a daunting undertaking with few supporting tools. A visualization approach is proposed to aid in this task. In contrast to existing approaches, which typically involve tedious and error-prone manual inspection of complex binary code, the proposed framework provides automatically generated, security-aware visual models that follow the UML specification. This facilitates formulation and testing of prototype security policy specifications in a faster and more reliable manner than is possible with existing manual approaches.