This University of Cambridge course required developing an HCI (human-computer interaction) project and then evaluating a hypothesis with a user study.
I developed an application for "visual interactive labelling", trialled on topic categorisation of Amazon product reviews. The technique is based on active learning: iteratively surfacing the data for annotation that is estimated to be most beneficial for model training. The visual extension gives the human in the loop control over which items to select for annotation.
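The annotate-retrain cycle described above can be sketched as a toy loop. This is a hypothetical illustration, not the project's actual classifier: items are one-dimensional points, the "model" is a single threshold estimate, and each round annotates the unlabelled point nearest the current decision boundary (the least-confident prediction).

```python
def oracle(x):
    """Stand-in for the human annotator: true label is x >= 0.5."""
    return x >= 0.5

def active_learning(pool, rounds=5):
    threshold = sum(pool) / len(pool)  # initial model guess
    labelled = {}                      # x -> label provided so far
    for _ in range(rounds):
        unlabelled = [x for x in pool if x not in labelled]
        if not unlabelled:
            break
        # uncertainty sampling: query the point closest to the boundary
        query = min(unlabelled, key=lambda x: abs(x - threshold))
        labelled[query] = oracle(query)
        # "retrain": place the boundary between the closest opposing labels
        pos = [x for x, y in labelled.items() if y]
        neg = [x for x, y in labelled.items() if not y]
        if pos and neg:
            threshold = (min(pos) + max(neg)) / 2
    return threshold, labelled

pool = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
threshold, labelled = active_learning(pool)
```

Note how the queries concentrate around the true boundary at 0.5: uncertainty sampling spends the annotation budget where the model is least sure, rather than labelling the pool uniformly.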
The application surfaces the 50 reviews closest to an SVM classifier boundary (i.e. those with the least confident predictions). These reviews are then projected into two dimensions using t-SNE and drawn either gray (unlabelled) or coloured (labelled). Annotators were also shown running counts of how many reviews of each category they had annotated, and the model's predicted label when selecting a review.
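The selection-and-projection step could look roughly like the following. This is a sketch under assumptions: the write-up does not name the libraries, so scikit-learn's `SVC` and `TSNE` are stand-ins here, and random vectors replace the actual review features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.manifold import TSNE

# Hypothetical stand-ins for the real review features and labels
# (the project would use text features, e.g. TF-IDF vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(int)

clf = SVC(kernel="linear").fit(X, y)

# Distance to the separating hyperplane: small |margin| = low confidence.
margins = np.abs(clf.decision_function(X))
k = 50
uncertain = np.argsort(margins)[:k]  # the k least-confident reviews

# Project only the surfaced reviews for the 2-D scatter display.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(
    X[uncertain]
)
```

Each row of `emb` would then be plotted as one point, coloured by whether the corresponding review has been labelled yet.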
The hypothesis of previous work in this area is that human pattern-recognition skills (such as spotting outliers and cluster centroids) can be as effective as purely algorithmic heuristics for active learning. Prior work also hypothesises productivity benefits from giving the annotator more agency in an annotation platform.
The annotation tool was built with Django (Python).