Revision of 2024/04/22 07:03 by Igor Mustac (the page was later removed by an external edit).
====== Corpora in LCP ======

In LCP, a corpus is modeled as a set of connected layers. At least three layers are required, representing (i) ordered units, (ii) ordered collections of those units, and (iii) unordered collections of the latter.
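To make the three required layers concrete, here is a minimal Python sketch of the model. The class names ''Unit'', ''Sequence'' and ''Collection'' and the ''position'' field are hypothetical and chosen only for illustration; this page does not say how LCP names or stores its layers.

```python
from dataclasses import dataclass, field

# Hypothetical class names: this page does not say how LCP names or stores its layers.


@dataclass
class Unit:
    """Layer (i): an ordered unit, such as a token."""
    position: int  # rank of the unit inside its sequence
    text: str


@dataclass
class Sequence:
    """Layer (ii): an ordered collection of units, such as a sentence."""
    units: list[Unit] = field(default_factory=list)

    def append(self, text: str) -> Unit:
        # Positions follow insertion order, so the units stay ordered.
        unit = Unit(position=len(self.units), text=text)
        self.units.append(unit)
        return unit


@dataclass
class Collection:
    """Layer (iii): an unordered collection of sequences, such as a document.

    Sequences are looked up by an identifier; no order among them is implied.
    """
    sequences: dict[str, Sequence] = field(default_factory=dict)

    def add(self, seq_id: str, sequence: Sequence) -> None:
        self.sequences[seq_id] = sequence


# Usage: one collection holding two short sequences.
doc = Collection()
for seq_id, words in [("s1", ["Hello", "there"]), ("s2", ["Goodbye"])]:
    seq = Sequence()
    for word in words:
        seq.append(word)
    doc.add(seq_id, seq)

print([u.position for u in doc.sequences["s1"].units])  # [0, 1]
```

The ordered layers carry explicit positions, while the unordered layer is keyed by identifier only, which mirrors the ordered/unordered distinction in the definition.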

Layers can have any number of attributes for annotation purposes, and corpus authors can define additional layers to model further embedding or dependency relations.
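As an illustration of free-form attributes and author-defined layers, the following Python sketch gives each token an open-ended attribute dictionary and adds two hypothetical extra layers: a ''Span'' layer for embedding (a contiguous run of tokens, e.g. a named entity) and a ''Dependency'' layer linking two tokens. The layer names, attribute keys and relation label are assumptions for illustration, not LCP's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical layer and attribute names, for illustration only.


@dataclass
class Token:
    position: int
    # Any number of annotation attributes, keyed by name.
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    """Additional layer modeling embedding: a contiguous run of tokens."""
    start: int  # first covered position (inclusive)
    end: int    # position after the last covered token (exclusive)
    attributes: dict[str, Any] = field(default_factory=dict)

    def covers(self, token: Token) -> bool:
        return self.start <= token.position < self.end


@dataclass
class Dependency:
    """Additional layer modeling a dependency relation between two tokens."""
    head: int       # position of the governing token
    dependent: int  # position of the governed token
    relation: str


tokens = [
    Token(0, {"form": "New"}),
    Token(1, {"form": "York"}),
    Token(2, {"form": "sleeps"}),
]
entity = Span(0, 2, {"type": "place"})
arc = Dependency(head=2, dependent=1, relation="nsubj")

print([t.attributes["form"] for t in tokens if entity.covers(t)])  # ['New', 'York']
```

Both extra layers refer to tokens by position rather than copying them, so they can be added without changing the token layer itself.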

The diagram in the figure below shows the structure of a corpus created from the Open Subtitles database, which annotates tokens (layer i) with a form, a lemma and a part of speech,

{{:
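To illustrate the token layer of that example, the Python sketch below represents tokens annotated with a form, a lemma and a part of speech, and retrieves tokens by lemma. The sample segment and the Universal Dependencies-style tags are invented for illustration and are not drawn from the actual Open Subtitles corpus; the helper ''find_by_lemma'' is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str   # surface form as written in the subtitle
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag (UD-style tags assumed here)


# An invented subtitle segment, not taken from the real corpus.
segment = [
    Token("We", "we", "PRON"),
    Token("were", "be", "AUX"),
    Token("leaving", "leave", "VERB"),
    Token(".", ".", "PUNCT"),
]


def find_by_lemma(tokens, lemma):
    """Return (position, token) pairs whose lemma equals `lemma`."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lemma == lemma]


for i, tok in find_by_lemma(segment, "be"):
    print(i, tok.form, tok.pos)  # 1 were AUX
```

Searching on the lemma rather than the form is what lets a single query match inflected variants such as "were" and "is".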

A simple command-line interface lets users submit corpora to LCP, for either private or public use, as standard TSV tables accompanied by JSON metadata. FIXME Add reference to importer here.
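The table layout and metadata schema expected by the LCP importer are not documented on this page, so the following Python sketch only illustrates the general shape of such a submission: a token table serialized as TSV and a metadata object serialized as JSON. All column names, metadata keys and helper names here (''TOKEN_COLUMNS'', ''tokens_to_tsv'', ''build_metadata'') are hypothetical and should be checked against the importer's documentation.

```python
import csv
import io
import json

# Hypothetical column names; the real importer may expect a different layout.
TOKEN_COLUMNS = ["token_id", "segment_id", "form", "lemma", "pos"]


def tokens_to_tsv(rows):
    """Serialize token rows (dicts keyed by TOKEN_COLUMNS) as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TOKEN_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_metadata(name, public):
    """Build a JSON metadata document; every key here is a hypothetical placeholder."""
    meta = {
        "name": name,
        "public": public,  # corpora can be submitted for private or public use
        "tables": {"token": TOKEN_COLUMNS},
    }
    return json.dumps(meta, indent=2)


rows = [
    {"token_id": 1, "segment_id": 1, "form": "We", "lemma": "we", "pos": "PRON"},
    {"token_id": 2, "segment_id": 1, "form": "were", "lemma": "be", "pos": "AUX"},
]
tsv = tokens_to_tsv(rows)
metadata = build_metadata("demo-corpus", public=False)
print(tsv)
print(metadata)
```

Using the ''csv'' module with a tab delimiter, rather than joining strings by hand, ensures that values containing tabs, quotes or newlines are quoted consistently.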
langtech/lcp/corpora/start.1713769420.txt.gz · Last modified: by Igor Mustac
