All posts by Oana Inel

Event Extraction From Radio News Bulletins For Linked Data

[This post is based on the BSc. Thesis of Kim van Putten (Computer Science, VU Amsterdam)]

As part of her Bachelor’s degree in Computer Science at the VU Amsterdam, Kim van Putten wrote her bachelor thesis in the context of the DIVE+ project.

The DIVE+ demonstrator is an event-centric linked data browser which aims to provide exploratory search within a heterogeneous collection of historical media objects. In order to structure and link the media objects in the dataset, the events need to be identified first. Due to the size of the data collection, manually identifying events is infeasible and a more automatic approach is required. The main goal of the bachelor project was to find a more effective way to extract events from the data to improve linkage within the DIVE+ system.

The thesis focused on event extraction from radio news bulletins whose text content was extracted using optical character recognition (OCR). Data preprocessing was performed to remove errors from the OCR’ed data. A Named Entity Recognition (NER) tool was used to extract named events, and a pattern-based approach combined with NER and part-of-speech tagging tools was adopted to find unnamed events in the data. OCR errors in the data were found to cause poor performance of the NER tools, even after data cleaning.
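This post does not list the actual patterns or tools used in the thesis, so the following is only a minimal illustrative sketch of what a pattern-based approach over part-of-speech and named-entity tags can look like. It assumes the tokens of a bulletin have already been tagged by some external tool; the names `Token`, `Event` and `extract_event`, the tag labels, and the example sentence are all hypothetical and not taken from the thesis.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Token(NamedTuple):
    text: str
    pos: str          # coarse part-of-speech tag, e.g. "VERB" (assumed tag set)
    entity: str = ""  # named-entity label; empty if the token is not an entity


class Event(NamedTuple):
    verb: str
    entities: Tuple[str, ...]


def extract_event(tokens: Sequence[Token]) -> Optional[Event]:
    """Return a single candidate event for one bulletin, or None.

    Toy pattern (an assumption, not the thesis's pattern): the first verb
    in the text, combined with every named entity found in the text.
    """
    verb = next((t.text for t in tokens if t.pos == "VERB"), None)
    entities = tuple(t.text for t in tokens if t.entity)
    if verb is None or not entities:
        return None
    return Event(verb, entities)


# Hand-tagged toy input; in practice the tags would come from NER and
# part-of-speech tagging tools run on the cleaned OCR text.
bulletin = [
    Token("Juliana", "PROPN", "PER"),
    Token("opened", "VERB"),
    Token("the", "DET"),
    Token("bridge", "NOUN"),
    Token("in", "ADP"),
    Token("Rotterdam", "PROPN", "LOC"),
]

print(extract_event(bulletin))
```

Because the toy pattern yields at most one verb plus all entities per text, it produces at most one event per bulletin and does not model which entities actually belong to which verb.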

The results show that the proposed methodology improved upon the old event extraction method. The newly extracted events improved the searchability of the media objects in the DIVE+ system; however, they did not improve the linkage between objects in the linked data structure. Furthermore, the pattern-based method of event extraction was found to be too coarse-grained and only allowed for the extraction of one event per object. To achieve a finer granularity of event extraction, future research is needed to identify the relationships between Named Entities and verbs, and to determine which Named Entities and verbs describe an event.

The full thesis is available for download here and the presentation here. Below, we show a poster that summarizes the main findings, followed by the presentation of the thesis.

Poster - Event Extraction for Radio News Bulletins

DIVE+ @ ICTOPEN2017

The DIVE+ team is present at the ICTOpen 2017 conference on the 21st and 22nd of March to showcase the latest developments of the tool. As part of these developments, DIVE+ has been integrated into the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, alongside other media studies research tools (CLARIAH MediaSuite) that aim to support media studies researchers and scholars by providing access to digital data and tools. During the Meet the Demo sessions we also screencast the new DIVE+ interface, which supports the automatic generation of narratives and storylines. The DIVE+ presentation can be viewed below.

For more insights, you can also check our short demo!

DIVE+ at Cross Media Café: Uit het Lab

On the 7th of March, the DIVE+ project was presented at Cross Media Café: Uit het Lab. DIVE+ is the result of a true interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. In this project, we use the CrowdTruth methodology and framework to crowdsource events for the news broadcasts from The Netherlands Institute for Sound and Vision (NISV) that are published under open licenses on the OpenImages platform.

As part of the digital humanities effort, DIVE+ is also integrated into the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, alongside other media studies research tools. CLARIAH aims to support media studies researchers and scholars by providing access to digital data and tools. To develop this project we work together with the Netherlands eScience Center, which also funds the DIVE+ project.

Check the slides!

NLeSC-Lorentz Center eHumanities Day

On December 14th, the Netherlands eScience Center (NLeSC) and the Lorentz Center co-hosted the eHumanities Day at the Lorentz Center in Leiden. The purpose of the event was to introduce researchers from the digital humanities to the funding and partnering opportunities available with NLeSC and the Lorentz Center. The day included a keynote presentation, expert panels, and pitches on projects that have been running together with NLeSC. During this day, Oana Inel presented the DIVE+ project. The talk can be seen below.