This month the DIVE+ demonstrator was presented at the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP), which took place in Zurich, Switzerland, July 5-8. Check out our work-in-progress submission:
A Study of Narrative Creation by Means of Crowds and Niches (Oana Inel, Sabrina Sauer, Lora Aroyo): Online video constitutes the largest, continuously growing portion of Web content. Web users drive this growth by massively sharing their personal stories on social media platforms as compilations of their daily visual memories, or with animated GIFs and memes based on existing video material. It is therefore crucial to gain an understanding of the semantics of video stories, i.e., what they capture and how. The remixing of visual content is also a powerful way of understanding the implicit aspects of storytelling, as well as the essential parts of audio-visual (AV) material. In this paper we take a digital hermeneutics approach to understand which visual attributes and semantics drive the creation of narratives. We present insights from a nichesourcing study in which humanities scholars remix keyframes and video fragments into micro-narratives, i.e., (sequences of) GIFs. To support narrative creation by humanities scholars, specific video annotations are needed, e.g., (1) annotations that consider literal and abstract connotations of video material, and (2) annotations that are coarse-grained, i.e., focusing on keyframes and video fragments as opposed to full-length videos. The main findings of the study are used to facilitate the creation of narratives in the digital humanities exploratory search tool DIVE+.
[This post is based on the BSc. Thesis of Kim van Putten (Computer Science, VU Amsterdam)]
As part of the Computer Science Bachelor's degree at the VU Amsterdam, Kim van Putten conducted her bachelor thesis in the context of the DIVE+ project.
The DIVE+ demonstrator is an event-centric linked data browser which aims to provide exploratory search within a heterogeneous collection of historical media objects. In order to structure and link the media objects in the dataset, the events need to be identified first. Due to the size of the data collection, manually identifying events is infeasible and a more automatic approach is required. The main goal of the bachelor project was to find a more effective way to extract events from the data and thereby improve linkage within the DIVE+ system.
The thesis focused on event extraction from radio news bulletins, whose text content was extracted using optical character recognition (OCR). Data preprocessing was performed to remove errors from the OCR'ed data. A Named Entity Recognition (NER) tool was used to extract named events, and a pattern-based approach combining NER with part-of-speech tagging tools was adopted to find unnamed events in the data. OCR-induced errors in the data were found to degrade the performance of the NER tools, even after data cleaning.
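As an illustration of this kind of pattern-based approach (a minimal sketch, not the thesis code), one can pair each verb with the nearest preceding named entity to propose candidate unnamed events. The tags below are assumed to come from an upstream NER/POS pipeline, and the example sentence is a hypothetical OCR'ed bulletin fragment:

```python
def extract_candidate_events(tagged_tokens):
    """Pair each verb with the nearest preceding named entity,
    yielding (entity, verb) candidates for unnamed events.

    tagged_tokens: list of (token, pos_tag, ner_tag) triples,
    where ner_tag is "O" for tokens outside any entity.
    """
    events = []
    last_entity = None
    for token, pos, ner in tagged_tokens:
        if ner != "O":                      # token is part of a named entity
            last_entity = token
        elif pos == "VERB" and last_entity:  # verb following an entity
            events.append((last_entity, token))
    return events

# Hypothetical pre-tagged bulletin sentence:
tokens = [
    ("Queen", "NOUN", "PER"), ("Juliana", "PROPN", "PER"),
    ("opened", "VERB", "O"), ("the", "DET", "O"),
    ("exhibition", "NOUN", "O"), ("in", "ADP", "O"),
    ("Amsterdam", "PROPN", "LOC"), ("yesterday", "ADV", "O"),
]
print(extract_candidate_events(tokens))  # [('Juliana', 'opened')]
```

A pattern this simple also shows why the thesis found the method too coarse-grained: it yields at most one crude entity–verb pairing per pattern match, without modeling the actual grammatical relationship between entity and verb.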
The results show that the proposed methodology improved upon the old event extraction method. The newly extracted events improved the searchability of the media objects in the DIVE+ system; however, they did not improve the linkage between objects in the linked data structure. Furthermore, the pattern-based method of event extraction was found to be too coarse-grained, allowing the extraction of only one event per object. To achieve a finer granularity of event extraction, future research is needed to identify the relationships between named entities and verbs, and to determine which named entities and verbs describe an event.
The full thesis is available for download here and the presentation here. Below, we show a poster that summarizes the main findings, together with the presentation of the thesis.
The DIVE+ team is present on the 21st and 22nd of March at the ICTOpen 2017 conference to present and showcase the latest developments of the tool. As part of these developments, DIVE+ is also integrated in the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, next to other media studies research tools (CLARIAH MediaSuite), which aim to support media studies researchers and scholars by providing access to digital data and tools. During the Meet the Demo sessions we also screencast the new DIVE+ interface, which provides support for the automatic generation of narratives and storylines. Below you can find the DIVE+ presentation.
On the 7th of March the DIVE+ project was presented at Cross Media Café: Uit het Lab. DIVE+ is the result of a true interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. In this project, we use the CrowdTruth methodology and framework to crowdsource events for the news broadcasts from The Netherlands Institute for Sound and Vision (NISV) that are published under open licenses on the OpenImages platform.
As part of the digital humanities effort, DIVE+ is also integrated in the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, next to other media studies research tools, which aims to support media studies researchers and scholars by providing access to digital data and tools. In developing this project we work together with the eScience Center, which is also funding the DIVE+ project.
On December 14th the Netherlands eScience Center and the Lorentz Center co-hosted the eHumanities Day at the Lorentz Center in Leiden. The purpose of the event was to introduce researchers from the digital humanities to the funding opportunities and partnering possibilities with NLeSC and with the Lorentz Center. The day included a keynote presentation, expert panels, and pitches on projects that have been running together with NLeSC. During this day, Oana Inel presented the DIVE+ project. The talk can be seen below.