Mímir

Mímir is a Multi-paradigm Information Management Index and Repository. It supports indexing of (and searching over) the text, annotations, and semantics of documents.

From this page you can access two deployments of Mímir that demonstrate some of the novel search facilities provided. Both corpora used in these demos have been annotated for part of speech and morphological root, which are accessible using the category: and root: modifiers (see the examples in the document linked below). They also contain Sentence annotations.

The Demos

Patents

The "Patent Search" demo shows Mímir running over a corpus of 300,000 patent documents. The corpus has been annotated for document structure (document metadata and document sections), references, and measurements.

Web Pages

The "Web Archive Search" demo runs over a corpus of about 1 million web pages. The documents are annotated for measurements and for typical named entities (Address, CabinetMinister, Date, Money, Percent, Organization, Location, Person).

BBC News

The BBC News demo uses just over 8,000 news web pages crawled from the BBC website to demonstrate some elements of the GATE Process.

Examples of possible queries include:

People in the (BBC) News

This demo is an example of a specialised front-end that uses the same underlying index as the BBC News demo above. It provides a user-friendly interface for searching the news archive for mentions of people's names. Queries similar to the ones above are thus much easier to formulate, without requiring any specialised knowledge from the end user.

Query Examples

A quick introduction-by-example to the query language is provided here.

An example of a possible interactive query session is shown in this PDF file.