LiRI Wiki

Linguistic Research Infrastructure - University of Zurich

langtech:lcp:corpora:start

Differences

This shows you the differences between two versions of the page.

Previous revision: langtech:lcp:corpora:start [2024/04/22 07:05] Igor Mustac
Current revision: langtech:lcp:corpora:start [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1
Line 1: Line 1:
-====== Corpora in LCP ====== 
- 
-In LCP, a corpus is modeled as a set of connected layers: at least three layers must represent (i) ordered units, (ii) ordered collections of those units, and (iii) unordered collections of those collections.
- 
-Layers can have any number of attributes for annotation purposes, and corpus authors can define additional layers to model further embedding or dependency relations. 
- 
-The diagram in the figure below shows the structure of a corpus created from the Open Subtitles database. It annotates tokens (layer i) with a form, a lemma and a part-of-speech tag, and groups them into segments (sentences, layer ii), which are themselves contained in movies (layer iii). A parallel layer models the dependency relations between tokens.
- 
-{{:langtech:lcp:corpora:lcp-open-subtitles-segments.png|}} 
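
The layered structure described above can be pictured with a small Python sketch. This is only an illustration of the three required layers plus a parallel dependency layer; the class and field names (''Token'', ''Segment'', ''Movie'', ''Dependency'', ''upos'') are assumptions made for this example and are not LCP's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """Layer (i): an ordered unit with its annotation attributes."""
    form: str
    lemma: str
    upos: str  # part-of-speech tag (hypothetical attribute name)


@dataclass
class Dependency:
    """Parallel layer: a relation between two tokens of the same segment."""
    head: Optional[int]  # index of the head token, None for the root
    dependent: int       # index of the dependent token
    label: str


@dataclass
class Segment:
    """Layer (ii): an ordered collection of tokens (a sentence)."""
    tokens: List[Token]
    dependencies: List[Dependency] = field(default_factory=list)

    def text(self) -> str:
        # Token order matters here: joining the forms rebuilds the sentence.
        return " ".join(token.form for token in self.tokens)


@dataclass
class Movie:
    """Layer (iii): a collection of segments; its order carries no meaning
    in this sketch, a list is used only for convenience."""
    title: str
    segments: List[Segment] = field(default_factory=list)


segment = Segment(
    tokens=[Token("I", "I", "PRON"), Token("sleep", "sleep", "VERB")],
    dependencies=[Dependency(None, 1, "root"), Dependency(1, 0, "nsubj")],
)
movie = Movie(title="Example movie", segments=[segment])
```

Keeping the dependency relations in a separate list, rather than as token attributes, mirrors the idea of a parallel layer that refers to tokens without changing them.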
- 
-A [[langtech:lcp:importing:start|simple command-line interface]] allows users to submit corpora to LCP as standard TSV tables along with JSON metadata, for either private or public use.
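
As a rough sketch of what "TSV tables along with JSON metadata" could look like on disk, the Python snippet below writes one tab-separated table and one JSON file using only the standard library. The file names (''token.tsv'', ''meta.json''), the column names and the metadata keys are hypothetical placeholders chosen for this example; the format LCP actually expects is documented on the importing page linked above.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical column names, not LCP's required schema.
TOKEN_COLUMNS = ["token_id", "form", "lemma", "upos"]


def write_corpus_files(out_dir, token_rows, metadata):
    """Write a token table as TSV and corpus metadata as JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    tsv_path = out / "token.tsv"  # hypothetical file name
    with tsv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=TOKEN_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(token_rows)

    meta_path = out / "meta.json"  # hypothetical file name
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return tsv_path, meta_path


rows = [
    {"token_id": 1, "form": "I", "lemma": "I", "upos": "PRON"},
    {"token_id": 2, "form": "sleep", "lemma": "sleep", "upos": "VERB"},
]
meta = {"name": "Example corpus", "layers": ["Token", "Segment", "Movie"]}

with tempfile.TemporaryDirectory() as tmp:
    tsv_path, meta_path = write_corpus_files(tmp, rows, meta)
    # Read the files back to check what was written.
    tsv_lines = tsv_path.read_text(encoding="utf-8").splitlines()
    loaded_meta = json.loads(meta_path.read_text(encoding="utf-8"))
```

The files are written to a temporary directory here so the sketch leaves nothing behind; a real submission would write to a directory that is then passed to the command-line interface.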
  
