Bulletin 7: Qualitative Methods - Annotation

Here's the second in a series of four articles on qualitative methods.  This time, we're looking at text annotation.  We'll start with a quick reminder of where we've been and where we're going.

Qualitative Methods Mini-series

In response to client requests for qualitative methods, we've developed a range of tools that we will share with you over the coming months.  We hope that you're interested in using these approaches in your projects.

Qualitative data is typically provided as passages of semi-structured text.  That text can be researched and analysed, qualitatively and quantitatively.  The matrix below lists possible techniques for each approach.

               Research              Analysis
Quantitative   Reverse Mail Merge    Binary Choice Statistics
Qualitative    Annotation            Mind-mapping

What is Annotation?

You're probably very familiar with annotation.  You do it when you read reports or publications and make notes in the margins, highlight sentences, or stick bookmarks to the pages.  Annotation is the process of "marking up" a document according to some structure so that you have a guide for when you return to the text.  Annotation, then, is fundamental to information extraction and retrieval.  Annotations locate text and describe why that text is of interest.  Annotations are a type of meta-data, i.e. data about data.

Annotation can be performed automatically by a computer.  The software needs to know what you're looking for and why (i.e. how to mark up the text that it finds).
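To make the idea of automatic annotation concrete, here is a minimal sketch in Python.  It is an illustration only, not the software referred to above: the names Annotation and annotate, and the choice of whole-word, case-insensitive keyword matching, are assumptions made for the example.  Each annotation records where the text was found and a label saying why it is of interest.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the matched text begins
    end: int     # character offset just past the matched text
    text: str    # the matched text itself
    label: str   # why the text is of interest (the theme)


def annotate(text, keywords_by_label):
    """Return one Annotation per keyword match, in document order.

    keywords_by_label maps a theme label to a list of keywords.
    Matching is whole-word and case-insensitive (an assumption).
    """
    annotations = []
    for label, keywords in keywords_by_label.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), label)
            )
    # NamedTuples sort field by field, so this orders by start offset first.
    return sorted(annotations)
```

Calling annotate(report_text, {"enterprise": ["start-up", "business"]}) would return a list of located, labelled matches that can be stored alongside the document as meta-data.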

Uses of Annotation
[Figure: Extracting Annotated Text in Context]

Annotation brings structure to unstructured text.  It allows us to:

  • Extract data by theme.  If you want to review a body of literature to identify all passages that refer to, for example, enterprise strategy, then you can simply:
  1. annotate the corpus using a keyword list (e.g. enterprise, start-up, business, entrepreneur etc.);
  2. extract those annotations in context (i.e. to see the keyword and 15 words either side);
  3. read through the extracted excerpts and only refer back to the source documents if and when more depth is needed.
  • Analyse content.  You can summarise documents by analysing their contents.  Two aspects that annotations help to analyse are word frequencies and co-located words.  Word frequency may indicate relevance (e.g. one document mentions "productivity" more often than another), whereas co-located words provide more insight into meaning (e.g. "productivity" tends to be mentioned alongside "growth").
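The two uses above can be sketched in a few lines of Python.  This is a rough illustration under stated assumptions rather than the tool we use: the function names, the simple word-splitting rule and the default window of 15 words either side are all choices made for the example.

```python
import re
from collections import Counter

# Assumed tokenisation rule: runs of letters, allowing internal
# apostrophes and hyphens so that "start-up" stays one word.
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def tokenize(text):
    """Split text into lower-case words."""
    return [w.lower() for w in WORD.findall(text)]


def keyword_in_context(text, keywords, window=15):
    """Return (left words, keyword, right words) for every keyword hit."""
    words = tokenize(text)
    targets = {k.lower() for k in keywords}
    excerpts = []
    for i, word in enumerate(words):
        if word in targets:
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            excerpts.append((left, word, right))
    return excerpts


def word_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def colocated_words(text, keyword, window=15):
    """Count the words found within `window` words of the keyword."""
    counts = Counter()
    for left, _, right in keyword_in_context(text, [keyword], window):
        counts.update(left + right)
    return counts
```

For example, colocated_words(report_text, "productivity").most_common(10) would list the ten words that most often appear near "productivity", while keyword_in_context gives the excerpts to read through before going back to the source documents.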

Example of Annotation

We've recently used annotation to help the East Midlands Development Agency to interpret a body of 700+ visit reports in terms of two dozen key issues.  We used annotation to speed up the process of extracting relevant excerpts from the report narrative.

We're also researching the potential for using correlations in co-locations as an indicator of memetic convergence.  In other words, we're comparing policy documents to quantify the consensus of ideas.  We expect that this research will produce a methodology for quantifying Strategic Added Value (as described in the Impact Evaluation Framework for Regional Development Agencies).  Our initial findings are intuitively plausible: that Regional Economic Strategies in each region are roughly "half similar", and that neighbouring agencies tend to have more in common than distant ones.
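As a rough illustration of how two documents might be compared on their co-locations, the Python sketch below counts pairs of words that appear close together in each document and scores the overlap between the two sets of counts.  It is not the methodology under research: cosine similarity is used here as a simple stand-in measure, and the function names and the window of 5 words are assumptions made for the example.

```python
import math
import re
from collections import Counter


def colocation_pairs(text, window=5):
    """Count unordered pairs of distinct words within `window` words of each other."""
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                # Sort so that (a, b) and (b, a) count as the same pair.
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def cosine_similarity(a, b):
    """Score two pair-count profiles from 0.0 (nothing shared) to 1.0 (same profile)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score for two strategy documents would then be cosine_similarity(colocation_pairs(doc_a), colocation_pairs(doc_b)).  In practice common words such as "the" and "and" would probably need filtering out first, since they co-locate with almost everything.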