LiRI Wiki

Linguistic Research Infrastructure - University of Zurich

langtech:lcp:corpora:start [2024/04/22 07:05] Igor Mustac
====== Corpora in LCP ======

In LCP, a corpus is modeled as a set of connected layers: at least three layers must represent (i) ordered units, (ii) ordered collections of those units, and (iii) unordered collections of the latter.

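A minimal Python sketch of the three mandatory layers is given here. The class names (`Token`, `Segment`, `Document`), the integer `position` used for ordering and the free-form `attributes` dictionary are hypothetical choices made for illustration; they do not reflect how LCP actually stores corpora.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Layer (i): an ordered unit; `position` fixes its place in the sequence."""
    position: int
    attributes: dict = field(default_factory=dict)  # hypothetical annotation slot


@dataclass
class Segment:
    """Layer (ii): an ordered collection of tokens."""
    segment_id: str
    tokens: list = field(default_factory=list)

    def ordered_tokens(self):
        # Order belongs to the layer, so tokens are always read by position.
        return sorted(self.tokens, key=lambda t: t.position)


@dataclass
class Document:
    """Layer (iii): an unordered collection of segments, looked up by id only."""
    document_id: str
    segments: dict = field(default_factory=dict)

    def add(self, segment):
        self.segments[segment.segment_id] = segment


doc = Document("doc1")
doc.add(Segment("s1", [Token(2, {"form": "world"}), Token(1, {"form": "Hello"})]))
doc.add(Segment("s2", [Token(3, {"form": "Bye"})]))
print([t.attributes["form"] for t in doc.segments["s1"].ordered_tokens()])
# prints ['Hello', 'world']
```

Keeping ordering explicit in `position` (rather than relying on list order) mirrors the distinction between the ordered layers (i) and (ii) and the unordered layer (iii).
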
Layers can have any number of attributes for annotation purposes, and corpus authors can define additional layers to model further embedding or dependency relations.

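Additional layers do not have to contain other layers; they can also point into them. As a hypothetical sketch (not LCP's schema), a dependency layer can be stored as labelled edges between token ids, with a check for two assumed well-formedness conditions: every edge refers to existing tokens, and no token has more than one head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepEdge:
    """One edge of a hypothetical dependency layer over token ids."""
    head: int       # id of the governing token
    dependent: int  # id of the dependent token
    label: str      # relation label, stored as an attribute of the edge


def validate_dependencies(token_ids, edges):
    """Return a list of problems in a dependency layer; empty means well-formed."""
    problems = []
    seen_dependents = set()
    for edge in edges:
        # Every edge must point into the existing token layer.
        for ref in (edge.head, edge.dependent):
            if ref not in token_ids:
                problems.append(f"unknown token {ref} in {edge}")
        # Assumed constraint: a token has at most one head.
        if edge.dependent in seen_dependents:
            problems.append(f"token {edge.dependent} has more than one head")
        seen_dependents.add(edge.dependent)
    return problems


# Token ids for the toy sentence "She sleeps ." (1 = She, 2 = sleeps, 3 = .)
token_ids = {1, 2, 3}
edges = [DepEdge(2, 1, "nsubj"), DepEdge(2, 3, "punct")]
print(validate_dependencies(token_ids, edges))  # prints []

broken = edges + [DepEdge(1, 3, "dep"), DepEdge(2, 9, "obj")]
print(len(validate_dependencies(token_ids, broken)))  # prints 2
```
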
The diagram below shows the structure of a corpus created from the Open Subtitles database. It annotates tokens (layer i) with a form, a lemma and a part-of-speech tag, and groups them into segments (sentences, layer ii), which are in turn contained in movies (layer iii); a parallel layer models the dependency relations between tokens.

{{:langtech:lcp:corpora:lcp-open-subtitles-segments.png|}}

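To make the Open Subtitles structure more concrete, the sketch below uses a few invented sentences (not real corpus data) nested as movie → segment → token, where each token is a (form, lemma, part-of-speech) triple with UD-style tags chosen for illustration. The two helper functions are hypothetical: one rebuilds a sentence from its token forms, the other counts lemmas per movie.

```python
from collections import Counter

# Invented toy data shaped like the corpus described above:
# movie -> segments (sentences) -> tokens as (form, lemma, pos) triples.
movies = {
    "movie_a": [
        [("I", "I", "PRON"), ("was", "be", "AUX"), ("running", "run", "VERB")],
        [("Run", "run", "VERB"), ("!", "!", "PUNCT")],
    ],
    "movie_b": [
        [("They", "they", "PRON"), ("are", "be", "AUX"), ("here", "here", "ADV")],
    ],
}


def segment_text(segment):
    """Rebuild a sentence from the form attribute of its ordered tokens."""
    return " ".join(form for form, _lemma, _pos in segment)


def lemma_counts(movie_id, pos=None):
    """Count lemmas in one movie, optionally restricted to one part of speech."""
    counts = Counter()
    for segment in movies[movie_id]:
        for _form, lemma, tag in segment:
            if pos is None or tag == pos:
                counts[lemma] += 1
    return counts


print(segment_text(movies["movie_a"][0]))   # prints I was running
print(lemma_counts("movie_a", pos="VERB"))  # prints Counter({'run': 2})
```

Because the lemma is annotated separately from the form, "running" and "Run" are counted together under `run`.
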
A [[langtech:lcp:importing:start|simple command-line interface]] lets users submit corpora to LCP as standard TSV tables accompanied by JSON metadata, for either private or public use.
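The table layout and metadata fields that the import tool actually expects are described on the linked page. Purely as a sketch under stated assumptions, the snippet below writes one token table as TSV and a JSON metadata file using Python's standard `csv` and `json` modules; the file names (`token.tsv`, `meta.json`), the column names and the metadata keys (including the `public` flag) are hypothetical placeholders, not the LCP format.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical column layout for a token table; NOT the LCP import format.
COLUMNS = ["token_id", "segment_id", "form", "lemma", "pos"]


def write_corpus(out_dir, tokens, metadata):
    """Write tokens to token.tsv and metadata to meta.json inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "token.tsv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(tokens)
    with open(out / "meta.json", "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2, ensure_ascii=False)
    return out


tokens = [
    {"token_id": 1, "segment_id": "s1", "form": "Hello", "lemma": "hello", "pos": "INTJ"},
    {"token_id": 2, "segment_id": "s1", "form": "world", "lemma": "world", "pos": "NOUN"},
]
# Hypothetical metadata keys; "public" stands in for the private/public choice.
metadata = {"name": "toy-corpus", "language": "en", "public": False}

with tempfile.TemporaryDirectory() as tmp:
    out = write_corpus(tmp, tokens, metadata)
    tsv_lines = (out / "token.tsv").read_text(encoding="utf-8").splitlines()
    meta_back = json.loads((out / "meta.json").read_text(encoding="utf-8"))

print(tsv_lines[0])  # header row, tab-separated
```

`csv.DictWriter` fixes the column order through `fieldnames`, so every row is written in the same order as the header regardless of how each dictionary was built.
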
  
Last modified: by Igor Mustac
