VIAN-DH is a recent web application developed by the Linguistic Research Infrastructure of the University of Zurich for the integrated analysis of audiovisual elements of human communication, such as speech, gesture, and facial expressions. VIAN-DH aims to bridge two approaches to the analysis of human communication: conversation analysis/interactional linguistics (IL), to date a predominantly qualitative field, and computational/corpus linguistics with its quantitative, automated methods.
Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. This highly integrated multimodal behavior is analyzed on the basis of video data, with the aim of uncovering so-called “multimodal gestalts”: patterns of linguistic and embodied conduct that recur in specific sequential positions and are employed for specific purposes.
Multimodal analyses (and other disciplines working with video) have so far depended on time- and resource-intensive manual transcription of each component of the video material. Automating these tasks requires advanced programming skills, which often lie outside the scope of IL. Moreover, the use of different tools makes integrating and analyzing the resulting formats challenging. Consequently, IL research often deals with relatively small samples of annotated data, suitable for qualitative analysis but insufficient for quantitatively derived, generalizable empirical claims.
VIAN-DH aims to create a workspace in which the many annotation layers required for multimodal analysis of videos can be created, processed, and combined. It will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools from computational linguistics and computer vision facilitates data processing, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for transcribing speech, pose estimation for extracting gestures and other visual cues, and grammatical annotation for adding morphosyntactic information to the verbal content.
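For illustration, the following minimal sketch shows how such an automation pipeline could be assembled in Python. Whisper, MediaPipe, and spaCy are stand-ins chosen here for concreteness, not necessarily the components used in VIAN-DH, and the input file name is hypothetical.

    # Sketch of an automated multimodal annotation pipeline (illustrative
    # stand-ins, not VIAN-DH's actual backend): Whisper for speech recognition,
    # spaCy for morphosyntactic annotation, MediaPipe for pose estimation.
    import whisper          # pip install openai-whisper
    import spacy            # pip install spacy && python -m spacy download en_core_web_sm
    import cv2              # pip install opencv-python
    import mediapipe as mp  # pip install mediapipe

    VIDEO = "interaction.mp4"  # hypothetical input file

    # 1. Automatic speech recognition: time-stamped transcript segments.
    asr = whisper.load_model("base")
    segments = asr.transcribe(VIDEO)["segments"]  # each has "start", "end", "text"

    # 2. Grammatical annotation: part-of-speech tags for the verbal content.
    nlp = spacy.load("en_core_web_sm")
    for seg in segments:
        seg["pos"] = [(tok.text, tok.pos_) for tok in nlp(seg["text"])]

    # 3. Pose estimation: time-stamped body landmarks for gesture extraction.
    poses = []
    cap = cv2.VideoCapture(VIDEO)
    fps = cap.get(cv2.CAP_PROP_FPS)
    with mp.solutions.pose.Pose() as pose:
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            poses.append((frame_idx / fps, result.pose_landmarks))
            frame_idx += 1
    cap.release()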
To view and search the data, VIAN-DH will provide a unified format, enable the import of the main existing formats of annotated video data and the export to other formats used in the field, and integrate the different source formats so that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search across many annotation levels, combining token-level and chronological search for various types of data.
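A minimal sketch of what such a unified, time-aligned representation and a combined token-level/chronological query could look like follows; the tier names and data layout are assumptions made for illustration, not VIAN-DH's actual format.

    # Sketch of a unified, time-aligned annotation format plus a query that
    # combines token-level filtering (a POS tag) with chronological search
    # (overlap against the gesture tier). Tier names and layout are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Annotation:
        tier: str     # e.g. "token", "pos", "gesture"
        start: float  # start time in seconds
        end: float    # end time in seconds
        value: str    # word form, POS tag, or gesture label

    def overlaps(a: Annotation, b: Annotation) -> bool:
        # Two annotation intervals overlap chronologically.
        return a.start < b.end and b.start < a.end

    def pos_with_cooccurring_gesture(annotations, pos_tag="VERB"):
        # Token-level search (POS filter) intersected with chronological search
        # (co-occurring annotations on the gesture tier).
        pos_anns = [a for a in annotations if a.tier == "pos" and a.value == pos_tag]
        gestures = [a for a in annotations if a.tier == "gesture"]
        return [(p, g) for p in pos_anns for g in gestures if overlaps(p, g)]

    # Example: a verb overlapping a pointing gesture is retrieved as a pair.
    data = [
        Annotation("pos", 1.2, 1.6, "VERB"),
        Annotation("gesture", 1.0, 2.0, "pointing"),
    ]
    print(pos_with_cooccurring_gesture(data))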
VIAN-DH strives to bring crucial innovation to the fields analyzing human communicative behavior from video materials. It will allow large amounts of data to be processed automatically and quantitative analyses to be implemented alongside the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) and conversational aspects (such as turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken, and grammatical information from videos, to correlate those different levels, and to perform queries and analyses.
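As a final illustration of the kind of quantitative analysis this enables, a toy correlation over the unified format sketched above (reusing the Annotation class and the overlaps helper) might count how often gestures co-occur with each part of speech; this is an assumed example, not VIAN-DH's analysis module.

    # Toy correlation: contingency counts of gesture co-occurrence per POS tag,
    # reusing Annotation and overlaps() from the previous sketch. Such counts are
    # the raw material for relating linguistic patterns to embodied conduct.
    from collections import Counter

    def gesture_cooccurrence_by_pos(annotations):
        pos_anns = [a for a in annotations if a.tier == "pos"]
        gestures = [a for a in annotations if a.tier == "gesture"]
        counts = Counter()
        for p in pos_anns:
            # A POS-tagged token "co-occurs" with a gesture if their intervals overlap.
            if any(overlaps(p, g) for g in gestures):
                counts[p.value] += 1
        return counts  # e.g. Counter({"VERB": 3, "NOUN": 1})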