====== Corpora & Assistive Technology ======

The [[start]] group has expertise in handling various types of corpora. We build tailor-made applications for exploring large and structurally complex collections of language data. In particular, we have experience in:

  * Designing databases to hold application-relevant data
  * Generating interactive visualizations
  * Efficiently querying large data collections (in particular corpora)
  * Anonymising large data sets
  * Crawling and scraping web sources, processing the results, and batch-downloading documents
  * Extracting and converting data

===== Examples of our work =====

Swissdox@LiRI -- web-based service for extracting subcorpora from the Swiss media database Swissdox

{{swissdox.png?600|}}

[[https://swissdox.linguistik.uzh.ch/]]
----

VIAN -- web application for multimodal corpora; comprises a corpus querying interface, a multimodal corpus viewer, a video and audio player, and a timeline with time-aligned text and annotations

{{vian.png?600|}}
----

CoLiCaSlav -- web corpus application used as an empirical basis for teaching and studying the principal categories and concepts relevant to the Slavic languages

{{lehrkorpus.png?600|}}

[[https://lehrkorpus-slav.linguistik.uzh.ch/]]
----

Kollo -- command-line tool for extracting collocations from VERT-formatted corpora

{{kollo1.png?400|}}
{{kollo2.png?400|}}