Research

Next Generation Semantics-Aware Data Management Systems

Sources of information and our dependence on them are increasing at a phenomenal rate. The most obvious example is the explosive growth and rapid evolution of the World Wide Web, but other projects in research, industry, healthcare and government also exhibit a critical dependence on the effective management and exploitation of large scale data. The kind of information available is also changing rapidly, and often includes un-structured and semi-structured data, streaming data, noisy and incomplete data, and linked datasets. Simultaneously dealing with the rapidly increasing size, complexity and heterogeneity of data presents a grand challenge for information systems research, and has created an urgent need for more capable information systems. Meeting this need will be critical to the UK's future competitiveness.

Informaion systems clearly have a key role to play in addressing these extremely complex problems, but they need to evolve to reflect the rapidly changing information landscape. This evolution is the basis for the emerging field of semantics-aware data management, which involves a synthesis of ontological reasoning and database management principles. Semantics-aware systems employ rich schemas (AKA ontologies) that allow them to deal with incomplete and semi-structured information from heterogeneous sources, and to answer queries in a way that reflects both knowledge and data, i.e., to deliver understanding from information.

We believe, however, that if such systems are to be widely applicable, then their enhanced capabilities must, be in addition to, and not instead of, the well-established features and high performance of existing database systems; moreover, we believe that they will need to incorporate techniques from many other areas of computer science, particularly those that give a complementary view of "Big Data" management, such as algorithms and machine learning, stream processing, and information retrieval. The goal of the Oxford Information Systems Group (ISG) is to develop next generation semantics-aware data management systems that fully realise the desired synthesis.

Research Contributions

Contributions to the iBench and gMark projects

Declarative meta-data management in the LogicBlox smart database management system

Kaiser Permanente project

Robert Piro, a postdoc supported by DBOnto and working with Ian Horrocks and Boris Motik, has been collaborating on a project with the U.S. health care provider Kaiser Permanente. The aim of the project was to conduct data analysis in health care using RDFox, the RDF triple store and parallel SWRL reasoner developed in the past four years in the KRR Group of the Computer Science Department.

The data analysis task was to compute benchmark measures for health care providers in the U.S. which are issued by a U.S. government body for quality assurance. Accuracy of these measures is important as they are entry requirements for billing health care services against government funded schemes, such as Medicare which is the national insurance program for pensioners in the U.S.

The specification of these benchmark measures is stated in more or less precise natural language statements. These statements are rendered into machine processable programs which are then evaluated on the clinical data. Traditional approaches, however, suffer from being complex and difficult to maintain.

Our project follows a novel approach, by rendering these statements in the rule language SWRL. Each SWRL rule is a straight forward if-then-statement and therefore allows to implement the specification much closer to the original natural language. This increases maintainability and makes it much easier to judge the correctness of the rendering.

Moreover, clarity is improved as SWRL is a declarative language; in contrast to a procedural language, a declarative language puts the emphasis on "what" is to be computed and not "how" it is computed. This reduces length and increases transparency of the code.

Thanks to Kaiser Permanente's involvement, the result could be evaluated on real patient data and compared to the current implementation. The initial comparison showed 350 differences in a set of 265,000 patients of which 11,000 belong to the target group. This is, judged by experts, very low. Moreover, differences do not necessarily imply a fault in the SWRL rendering of the specification; they may imply a fault in the existing implementation.

Further collaboration with Kaiser is envisaged to prove shortened development times for such specifications when done with SWRL as well as directly translating future machine processable specifications (ECQMs) into SWRL rules.

Research

Next Generation Semantics-Aware Data Management Systems

Research Contributions

Contributions to the iBench and gMark projects Show more

Declarative meta-data management in the LogicBlox smart database management system Show more

Kaiser Permanente project Show more

Contributions to the iBench and gMark projects

Declarative meta-data management in the LogicBlox smart database management system

Kaiser Permanente project