Research

Ontology Alignment

The vision of the Semantic Web is that computers as well as humans will be able to leverage the information on the web. One important capability that would facilitate this goal is ontology alignment. An ontology is a representation of the concepts in a domain and how they relate to one another. Creating an ontology involves many design decisions, which tend to be influenced by the designers’ backgrounds and the application they are targeting. As a result, two ontologies that represent the same domain will not necessarily be the same. The goal of ontology alignment is to determine when an entity in one ontology is semantically related to an entity in another ontology.

My dissertation research focused on this topic. I began by examining the performance of different string metrics when applied to ontology alignment. It is important to systematically analyze this because all current alignment systems use a string similarity metric in some capacity. Therefore, improvement in metric selection will improve the state of the art in the field generally. This work was published at ISWC. The work showed that choosing string similarity metrics carefully, based on characteristics of the particular ontologies to be aligned, can achieve very good performance on class alignment. Aligning properties is more challenging though, both for string metrics and full-featured alignment systems. I developed a string-based approach to property alignment as part of my dissertation that achieves much better performance.
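As a rough illustration of what applying a string metric to class alignment can look like, the sketch below compares entity labels with a token-based Jaccard similarity. It is a minimal example under stated assumptions, not the method from the published work: the metric, the 0.5 threshold, the function names (tokenize, jaccard, align), and the sample labels are all chosen here for illustration.

```python
import re


def tokenize(label):
    """Split a camelCase, snake_case, or hyphenated label into lowercase tokens."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return {t.lower() for t in re.split(r"[\s_\-]+", spaced) if t}


def jaccard(a, b):
    """Jaccard similarity between the token sets of two labels."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def align(labels_a, labels_b, threshold=0.5):
    """Pair each label in labels_a with its best match in labels_b.

    A pair is kept only if its similarity reaches the (assumed) threshold.
    """
    matches = []
    if not labels_b:
        return matches
    for a in labels_a:
        best = max(labels_b, key=lambda b: jaccard(a, b))
        score = jaccard(a, best)
        if score >= threshold:
            matches.append((a, best, score))
    return matches


# Hypothetical class labels from two conference ontologies.
matches = align(
    ["ConferencePaper", "Author"],
    ["conference_paper", "reviewer", "paper_author"],
)
for a, b, score in matches:
    print(a, "<->", b, round(score, 2))
```

Swapping jaccard for a different metric, or tuning the threshold per ontology pair, is the kind of choice the analysis above is concerned with.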

Going forward, I plan to focus on co-reference resolution and finding more complex relationships between ontologies than 1-to-1 equivalences. My hope is to use a combination of natural language processing, statistical, and design pattern-based approaches to achieve this goal.

Thieblin, Elodie, Cheatham, Michelle, Trojahn, Cassia and Zamazal, Ondrej. A consensual dataset for complex ontology matching evaluation, The Knowledge Engineering Review, Jul 2020. (download)
Zhou, Lu, Thieblin, Elodie, Cheatham, Michelle, Faria, Daniel, Pesquita, Catia, Trojahn, Cassia and Zamazal, Ondrej. Towards Evaluating Complex Ontology Alignments, The Knowledge Engineering Review, Jul 2020. (download)
Zhou, Lu, Cheatham, Michelle, Krisnadhi, Adila and Hitzler, Pascal. GeoLink Dataset - A Complex Alignment Benchmark from Real-world Ontology, Data Intelligence Journal, Jul 2020. (download)
Cheatham, Michelle, Varanka, Dalia, Arauz, Fatima and Zhou, Lu. Alignment of Surface Water Ontologies - A comparison of manual and automated approaches, Journal of Geographical Systems, Sep 2019. (download)
Cheatham, Michelle, Pesquita, Catia, Oliveira, Daniela and McCurdy, Helena. The Properties of Property Alignment on the Semantic Web, International Journal of Metadata, Semantics and Ontologies, vol 13, issue 1, pp. 42-56, Nov 2018. (download)
Thieblin, Elodie, Cheatham, Michelle, Trojahn, Cassia, Zamazal, Ondrej and Zhou, Lu. The First Version of the OAEI Complex Alignment Benchmark, International Semantic Web Conference (ISWC 2018) Poster and Demo Session, Oct 2018. (download)
Zhou, Lu, Cheatham, Michelle, Krisnadhi, Adila and Hitzler, Pascal. A Complex Alignment Benchmark - GeoLink Dataset, International Semantic Web Conference (ISWC 2018), Oct 2018. (download)
Cheatham, Michelle and Pesquita, Catia. Semantic Data Integration, Springer Handbook on Big Data, Ed. Sakr, Sherif and Zomaya, Albert, Jan 2016. (download)
Achichi, Manel, Cheatham, Michelle, Dragisic, Zlatan, Euzenat, Jerome, Faria, Daniel, Ferrara, Alfio, Flouris, Giorgos, Fundulaki, Irini, Harrow, Ian, Ivanova, Valentina, Jimenez-Ruiz, Ernesto, Kuss, Elena, Lambrix, Patrick, Leopold, Henrik, Li, Huanyu, Meilicke, Christian, Montanelli, Stefano, Pesquita, Catia, Saveta, Tzanina, Shvaiko, Pavel, Splendiani, Andrea, Stuckenschmidt, Heiner, Todorov, Konstantin, Trojahn, Cassia and Zamazal, Ondrej. Results of the Ontology Alignment Evaluation Initiative 2016, Proceedings of the Eleventh International Workshop on Ontology Matching (OM 2016), Kobe, Japan, Oct 2016. (download)
Amini, Reihaneh, Cheatham, Michelle, Grzebala, Pawel and McCurdy, Helena. Towards Best Practices for Crowdsourcing Ontology Alignment Benchmarks, Proceedings of the Eleventh International Workshop on Ontology Matching (OM 2016), Kobe, Japan, Oct 2016. (download)
Cheatham, Michelle, Amini, Reihaneh and Patel, Chandan. Matching Instances in GeoLink, Proceedings of the Eleventh International Workshop on Ontology Matching (OM 2016), Kobe, Japan, Oct 2016. (download)
Miracle, Jacob and Cheatham, Michelle. Semantic Web Enabled Record Linkage Attacks on Anonymized Data, Society, Privacy and the Semantic Web - Policy and Technology (PrivOn 2016), Kobe, Japan, Oct 2016. (download)
Cheatham, Michelle, Dragisic, Zlatan, Euzenat, Jerome, Faria, Daniel, Ferrara, Alfio, Flouris, Giorgos, Fundulaki, Irini, Granada, Roger, Ivanova, Valentina, Jimenez-Ruiz, Ernesto, Lambrix, Patrick, Montanelli, Stefano, Pesquita, Catia, Saveta, Tzanina, Shvaiko, Pavel, Solimando, Alessandro, Trojahn, Cassia and Zamazal, Ondrej. Results of the Ontology Alignment Evaluation Initiative 2015, Proceedings of the Tenth International Workshop on Ontology Matching (OM 2015), Bethlehem, PA, Oct 2015. (download)
Publishing Linked Open Data (tutorial), International Conference on Collaboration Technologies and Systems, Atlanta, GA, Jun 2015.
A Pattern-based Approach to Ontology Alignment, University of Milan-Bicocca, Milan, Italy, Oct 2014.
Cheatham, Michelle and Hitzler, Pascal. Conference v2 - An uncertain version of the OAEI Conference benchmark, Proceedings of the 13th International Semantic Web Conference (ISWC), Riva del Garda, Trentino, Italy, Oct 2014. (download)
Cheatham, Michelle and Hitzler, Pascal. The Properties of Property Alignment, Proceedings of the Ninth International Workshop on Ontology Matching (OM 2014), Riva del Garda, Trentino, Italy, Oct 2014. (download)
Cheatham, Michelle. The Properties of Property Alignment on the Semantic Web, PhD dissertation, Wright State University, Aug 2014. (download)
Cheatham, Michelle and Hitzler, Pascal. String Similarity Metrics for Ontology Alignment, Proceedings of the 12th International Semantic Web Conference (ISWC), Sydney, NSW, Australia, pp. 294-309, Oct 2013. (download)
Cheatham, Michelle and Hitzler, Pascal. StringsAuto and MapSSS Results for OAEI 2013, Proceedings of the 8th International Workshop on Ontology Matching, Sydney, NSW, Australia, pp. 146-152, Oct 2013. (download)
Cheatham, Michelle. MapSSS Results for OAEI 2011, Proceedings of the 6th International Workshop on Ontology Matching, Bonn, Germany, Oct 2011. (download)
Cheatham, Michelle. Targeted Ontology Mapping, Proceedings of the 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010), Chicago, IL, USA, pp. 123-132, May 2010. (download)

Ontology Design Patterns

There are some concepts that come up repeatedly across many different datasets on the Semantic Web. Examples include groups of entities describing the trajectory of a moving object, such as a person, migrating bird, or ship, or information about an organization, such as its location, the people involved, the things it produces, etc. These recurring concepts are often encoded as Ontology Design Patterns. An ODP is a self-contained partial ontology. It represents the core components of the concept it seeks to model, as identified by domain experts. ODPs avoid making any unnecessary ontological commitments in order to remain applicable in a diverse range of situations. I have been to several GeoVoCamps, in which domain experts come together with modelling experts to create ontology design patterns important for datasets in their field of interest. In addition to continued involvement in these kinds of modelling exercises, I would like to investigate the potential for using these ODPs for ontology alignment and for making sense of unstructured text available on the web.

Cheatham, Michelle, Krisnadhi, Adila, Amini, Reihaneh, Hitzler, Pascal, Janowicz, Krzysztof, Shepherd, Adam, Narock, Tom, Jones, Matt and Ji, Peng. The GeoLink Knowledge Graph, Big Earth Data, Apr 2018. (download)
Shimizu, Cogan and Cheatham, Michelle. An Ontology Design Pattern for Microblog Entries, Workshop on Ontology and Semantic Web Patterns (WOP 2017), Vienna, Austria, Oct 2017. (download)
Cheatham, Michelle, Vardeman, Charles, Karima, Nazifa and Hitzler, Pascal. Computational Environment - An ODP to Support Finding and Recreating Computational Analyses, Workshop on Ontology and Semantic Web Patterns (WOP 2017), Vienna, Austria, Oct 2017. (download)
Leadbetter, Adam, Cheatham, Michelle, Shepherd, Adam and Thomas, Rob. Linked Ocean Data 2.0, Oceanographic and Marine Cross-Domain Data Management for Sustainable Development, Ed. Diviacco, Paolo, Leadbetter, Adam and Glaves, Helen, pp. 69-99, Jan 2016. (download)
Cheatham, Michelle, Ferguson, Holly, Vardeman, Charles and Shimizu, Cogan. A Modification to the Hazardous Situation ODP to Support Risk Assessment and Mitigation, Workshop on Ontology and Semantic Web Patterns (WOP 2016), Kobe, Japan, Oct 2016. (download)
Vardeman, Charles, Krisnadhi, Adila, Cheatham, Michelle, Janowicz, Krzysztof, Ferguson, Holly, Hitzler, Pascal and Buccellato, Amy. An Ontology Design Pattern and Its Use Case for Modeling Material Transformation, Semantic Web, vol 8, issue 5, pp. 719-731, Jan 2017. (download)
Ontology Design Patterns - Piecing together an introduction, ESIP Semantic Web telecon, online, Sep 2015.
Vardeman, Charles, Krisnadhi, Adila, Cheatham, Michelle, Janowicz, Krzysztof, Ferguson, Holly, Hitzler, Pascal, Buccellato, Amy, Thirunarayan, Krishnaprasad and Berg-Cross, Gary. An Ontology Design Pattern for Material Transformation, Proceedings of the Fifth Workshop on Ontology and Semantic Web Patterns (WOP 2014), Riva del Garda, Trentino, Italy, Oct 2014.

Privacy Concerns of Big Data

I am also interested in applying ontology alignment techniques to issues related to the privacy concerns of Big Data. Currently, many linked datasets are anonymized before being made available on the Semantic Web. This anonymization process often involves ensuring k-anonymity, which requires that every combination of pseudo-identifier values that appears in the dataset is shared by at least k individuals. For instance, if the dataset contains information about people’s voting district, gender, and birth month and year, each combination of these attributes that occurs must be shared by at least k people (if not, either fake data is added or the information is made more coarse, e.g. by providing only birth year rather than month and year). As the dimensionality of data increases (i.e. more features are available for each person), k-anonymity breaks down. Often this happens when new datasets are released that can be joined with existing datasets through some public fields. I would like to research semantically-informed attacks and defenses of privacy on the Semantic Web.
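The k-anonymity condition can be checked mechanically by grouping records on their pseudo-identifier values and looking at the group sizes. The following is a minimal sketch using the voting district, gender, and birth date example; the field names, the YYYY-MM date format, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def group_sizes(records, pseudo_identifiers):
    """Count how many records share each combination of pseudo-identifier values."""
    return Counter(tuple(r[field] for field in pseudo_identifiers) for r in records)


def is_k_anonymous(records, pseudo_identifiers, k):
    """True if every combination that occurs is shared by at least k records."""
    return all(n >= k for n in group_sizes(records, pseudo_identifiers).values())


# Hypothetical records; "birth" is assumed to be a YYYY-MM string.
records = [
    {"district": 7, "gender": "F", "birth": "1980-03"},
    {"district": 7, "gender": "F", "birth": "1980-11"},
    {"district": 7, "gender": "M", "birth": "1975-06"},
    {"district": 7, "gender": "M", "birth": "1975-09"},
]
fields = ["district", "gender", "birth"]

# Coarsen the data by keeping only the birth year.
coarsened = [{**r, "birth": r["birth"][:4]} for r in records]

print(is_k_anonymous(records, fields, 2))    # False: every record is unique
print(is_k_anonymous(coarsened, fields, 2))  # True: two groups of two
```

Adding more fields to the list makes each group smaller, which is one way to see why the guarantee erodes as dimensionality grows.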

Big Data Without Big Brother, Wittenberg University, Springfield, OH, Nov 2018.
Grzebala, Pawel and Cheatham, Michelle. Private Record linkage - Comparison of selected techniques for name matching, European Semantic Web Conference, Heraklion, Greece, pp. 593-606, Jun 2016. (download)
Privacy in the Age of Big Data, International Conference on Collaboration Technologies and Systems, Atlanta, GA, Jun 2015.

Software Reverse Engineering

Software reverse engineering is the process of analyzing a program in order to learn how it works. This has many uses, but my research has focused on gaining an understanding of the behavior of malware. This type of reversing generally involves examining an executable for which no source code is available, identifying the original entry point, and slogging line-by-line through assembly code. Working at the assembly code level is extremely tedious; insight often comes more quickly and easily if the level of abstraction can be raised. While employed at Riverside Research I worked to make this possible by creating Function Insight, an easy-to-use tool that leverages rule-based, machine learning, and data mining techniques to aid non-experts in analyzing anomalous sections of executables. I am interested in continuing this work at Wright State.

Cracking Binary Analysis, Ohio Celebration of Women in Computing Workshop, Sawmill Creek Resort, Ohio, Feb 2015.
Cheatham, Michelle and Raber, Jason. Function Insight: Highlighting Suspicious Sections in Binary Run Traces, Proceedings of the 18th Working Conference on Reverse Engineering, Limerick, Ireland, pp. 433-434, Oct 2011. (download)

Confidentiality in Wireless Sensor Networks

Wireless sensor networks (and by extension, ubiquitous sensing) have the potential to be a transformative technology, something like the internet that fundamentally changes the way we live and work. These networks can someday be used to extend our awareness by monitoring our homes while we are out, or a parking lot while we are walking to our cars late at night, or even what is going on inside our bodies. They can also enable autonomous responses to changing environmental conditions, such as turning on our favorite music when we enter a room or watering crops precisely when they need it. In the rush to develop technologies such as this to the point where their potential can be realized, security is often given little attention. As we have seen in the case of the internet, trying to retroactively harden a technology that is already in widespread use is extremely difficult and often comes up short. Now is the time to make wireless sensor networks trustworthy, not after they have become commonplace.

While with the Trusted Layered Sensing group in AFRL, I worked to model the degree of confidentiality in a wireless sensor network. The intent was to use this model to compare security schemes (particularly key distribution and encryption methods) that have been proposed for use in WSNs. My feeling was that existing comparisons of different approaches lacked a holistic perspective, in that implicit assumptions made by some methods required lower-level protocols that obviated efficiency gains claimed by the method itself. I still think this is the case: it was, and still is, important that models consider the impact of all such assumptions in terms of memory, communications, computation, and energy requirements. This work led to the establishment of two SBIR topics.

Social Network Analysis

A social network is a graph in which the nodes represent people and the edges represent some type of communication or collaboration between them. In this work, I considered such a graph where the people are employees of the Air Force Research Laboratory and the relationship is joint authorship on a publication. With this type of graph, it is possible to answer questions such as “Which employees are working with the greatest number of others?” and “Are John Doe and Jane Smith connected by collaboration with a shared intermediary?” I also explored adding keywords from the papers as nodes. With this enhanced graph, it is possible to answer queries such as “Who is our leading expert on sensors?” or “Do we have anyone with expertise in both microelectronics and C programming?”
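The two example questions about the co-authorship graph can be answered with a plain adjacency-set representation. This is a small sketch with made-up author lists; the function names and the data are hypothetical and do not reflect the original system.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with.

    Each paper is a list of distinct author names.
    """
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a]  # touch the key so sole authors still appear as nodes
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def most_connected(graph):
    """Author with the greatest number of distinct collaborators."""
    return max(graph, key=lambda person: len(graph[person]))


def shared_intermediaries(graph, a, b):
    """People who have collaborated with both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


# Hypothetical author lists, one per publication.
papers = [
    ["Doe", "Lee"],
    ["Lee", "Smith"],
    ["Lee", "Park"],
    ["Park", "Smith"],
]
graph = build_coauthor_graph(papers)
print(most_connected(graph))
print(shared_intermediaries(graph, "Doe", "Smith"))
```

Adding paper keywords as extra nodes, as described above, would only require linking each author to the keywords of their papers in the same structure.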

I also investigated using the information within this type of social network for the formation of ad-hoc teams (e.g. a new project has come up for which Company X would like to bid; who should be involved in writing the proposal?). The size of the search space is very large, so a genetic algorithm was used. The fitness function for the GA is supplied by the manager forming the team and tailored to the problem at hand. It might include such criteria as relevant subject knowledge, importance of team members having worked together previously, importance of having recognized experts on the team, and optimal number of team members, among others. This work was done at The Design Knowledge Company as part of the BUCKI SBIR Project.
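To give a feel for how a genetic algorithm with a caller-supplied fitness function can search for a team, here is a deliberately small sketch. The selection, crossover, and mutation choices, the parameter defaults, and the example fitness function are all assumptions for illustration and are not taken from the project itself.

```python
import random
from itertools import combinations


def evolve_team(candidates, team_size, fitness, generations=50, pop_size=30, seed=0):
    """Search for a high-fitness team of distinct candidates.

    Assumes pop_size >= 4 and team_size <= len(candidates).
    """
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(candidates, team_size))) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: draw the child from the union of both parents.
            pool = list(dict.fromkeys(p1 + p2))
            child = rng.sample(pool, team_size)
            # Mutation: occasionally swap in someone from outside the team.
            if rng.random() < 0.2:
                outsiders = [c for c in candidates if c not in child]
                if outsiders:
                    child[rng.randrange(team_size)] = rng.choice(outsiders)
            children.append(tuple(sorted(child)))
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical inputs a manager might supply.
expertise = {"Ann": 3, "Bo": 1, "Cy": 2, "Di": 5, "Ed": 4, "Flo": 2}
worked_together = {frozenset(("Ann", "Di")), frozenset(("Bo", "Cy"))}


def fitness(team):
    """Reward subject knowledge plus a bonus for prior collaboration."""
    skill = sum(expertise[p] for p in team)
    rapport = sum(1 for a, b in combinations(team, 2) if frozenset((a, b)) in worked_together)
    return skill + 2 * rapport


team = evolve_team(list(expertise), 3, fitness)
print(team, fitness(team))
```

Because the fitness function is passed in, criteria such as preferred team size or the presence of recognized experts could be added without touching the search loop.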

Cheatham, Michelle, Harlow, Felicia and Cleereman, Kevin. Feature Selection for Collaborative Team Formation via SNA, Proceedings of the 2007 International Conference on Data Mining (DMIN 07), Las Vegas, NV, USA, Jun 2007. (download)
Cheatham, Michelle and Cleereman, Kevin. Application of Social Network Analysis to Collaborative Team Formation, Proceedings of the 2006 International Symposium on Collaborative Technologies and Systems (CTS 2006), Las Vegas, NV, USA, pp. 306-311, May 2006. (download)

AI Planning in Workflow Management Systems

Collaborative environments allow geographically distributed groups to work together to generate new knowledge. This work focused on workflow management systems (WfMS), which are a component of many contemporary collaborative environments. A workflow is a series of operators chained together to accomplish a goal. An example is the process a company goes through when ordering new inventory. Steps in the process might include collecting cost estimates, choosing a vendor, ordering the product, testing it on arrival, and adding the item to the company’s internal inventory tracking system. As the number and diversity of operators available for use in workflows increases, it becomes more difficult to know what services are available and how they can be combined to solve a given problem. Researchers involved in next-generation grid-based collaborative systems have suggested using AI planning techniques to help automate workflow creation. This work considered whether this approach can also be applied to the workflow management systems available in current collaborative environments.
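A very simple form of automated workflow creation is forward state-space search over operators that declare what they need and what they produce. The sketch below applies it to the inventory-ordering example. It is a simplification under stated assumptions: operators only add facts and never remove them, and the operator and fact names are invented here.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    needs: frozenset  # facts that must hold before the operator can run
    adds: frozenset   # facts that hold after it runs


def plan(initial, goal, operators):
    """Breadth-first forward search.

    Returns a shortest list of operator names reaching the goal, or None.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for op in operators:
            if op.needs <= state:
                nxt = state | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [op.name]))
    return None


# Hypothetical operators, deliberately listed out of order.
operators = [
    Operator("add_to_inventory", frozenset({"tested"}), frozenset({"in_inventory"})),
    Operator("order_product", frozenset({"vendor"}), frozenset({"ordered"})),
    Operator("collect_estimates", frozenset({"need"}), frozenset({"estimates"})),
    Operator("test_on_arrival", frozenset({"ordered"}), frozenset({"tested"})),
    Operator("choose_vendor", frozenset({"estimates"}), frozenset({"vendor"})),
]

workflow = plan({"need"}, {"in_inventory"}, operators)
print(workflow)
```

The planner recovers the order of the steps from the declared needs and results alone, which is the property that makes this approach attractive as the pool of available operators grows.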

Cheatham, Michelle and Cox, Michael. AI Workflow Management in a Collaborative Environment, Proceedings of the 2005 International Symposium on Collaborative Technologies and Systems (CTS 2005), St. Louis, MO, USA, pp. 160-166, May 2005. (download)
Cheatham, Michelle and Cox, Michael. AI Planning in Portal-based Workflow Management Systems, Proceedings of the 2005 International Conference on Integration of Knowledge Intensive Multi-Agent Systems (KIMAS 2005), Waltham, MA, USA, pp. 47-52, Apr 2005. (download)

Genetic Algorithms for Text Classification

A Nearest Neighbor Classifier (NNC) approaches the problem of text classification by computing a similarity metric between feature vector representations of an unknown document and a set of known prototype documents. The accuracy and speed of the NNC are dependent upon the choices of features and prototypes. In this work we considered the use of a genetic algorithm to optimize the feature and prototype sets for an NNC. We also examined whether simultaneously evolving the feature and prototype sets produces better results than sequential optimization.
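For readers unfamiliar with the classifier itself, the following sketch shows a nearest neighbor text classifier over word-count feature vectors. Cosine similarity, whitespace tokenization, and the tiny feature and prototype sets are assumptions chosen for brevity; in the work described above, the feature and prototype sets are what the genetic algorithm would optimize.

```python
import math
from collections import Counter


def vectorize(text, features):
    """Count how often each feature word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[f] for f in features]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(text, prototypes, features):
    """Return the label of the prototype document most similar to the text."""
    query = vectorize(text, features)
    label, _ = max(prototypes, key=lambda p: cosine(query, vectorize(p[1], features)))
    return label


# Hypothetical feature words and labeled prototype documents.
features = ["ontology", "alignment", "malware", "binary"]
prototypes = [
    ("semantic web", "ontology alignment maps one ontology to another"),
    ("security", "malware hides inside a binary"),
]

print(classify("aligning an ontology with alignment tools", prototypes, features))
print(classify("binary analysis of malware", prototypes, features))
```

Both cost and accuracy depend directly on the two lists at the bottom: every extra feature lengthens each vector, and every extra prototype adds one more similarity computation per query.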

Cheatham, Michelle and Rizki, Mateen. Feature and Prototype Evolution for Nearest Neighbor Classification of Web Documents, Proceedings of the Third International Conference on Information Technology - New Generations (ITNG 2006), Las Vegas, Nevada, USA, pp. 364-369, Apr 2006. (download)
Cheatham, Michelle. Feature and Prototype Evolution for Nearest Neighbor Classification of Web Documents, Masters thesis, Wright State University, Aug 2005.

Type-safe Programming Languages

I also did some joint research with Kevin Cleereman at the Air Force Research Laboratory and Krishnaprasad Thirunarayan at Wright State University on improving the efficiency and expressiveness of type-safe programming languages.

Cleereman, Kevin, Cheatham, Michelle and Thirunarayan, Krishnaprasad. Mechanisms for Improved Covariant Type-Checking, Journal of Computer Languages, Systems and Structures, vol 34, issue 1, pp. 1-17, Apr 2008.
Cleereman, Kevin, Cheatham, Michelle and Thirunarayan, Krishnaprasad. Runtime Support of Speculative Optimization for Offline Escape Analysis, Software Engineering Research and Practice (SERP) 2007, Jun 2007.