{"id":3842,"date":"2019-03-20T13:36:55","date_gmt":"2019-03-20T13:36:55","guid":{"rendered":"http:\/\/www.clarin.si\/info\/?page_id=3842"},"modified":"2026-02-23T13:23:26","modified_gmt":"2026-02-23T13:23:26","slug":"spletne-storitve","status":"publish","type":"page","link":"https:\/\/www.clarin.si\/info\/k-center\/spletne-storitve\/","title":{"rendered":"ReLDIanno \u2013 a service for annotating Slovenian, Croatian and Serbian texts"},"content":{"rendered":"<p>(SLOVENIAN TRANSLATION IN PREPARATION)<\/p>\n<style>pre {white-space:pre-wrap; word-wrap:break-word;}<\/style>\n<p><strong>Update<\/strong>: The ReLDIanno service has been superseded by the <a href=\"https:\/\/pypi.org\/project\/classla\/\" target=\"_blank\" rel=\"noopener\">CLASSLA Python package<\/a>, the current state of the art in linguistic processing of South Slavic languages. CLASSLA covers the same levels of processing as this web service, but for an extended list of South Slavic languages and with significantly improved performance. You can test the CLASSLA-Stanza library through the <a href=\"https:\/\/clarin.si\/oznacevalnik\/eng\/\" target=\"_blank\" rel=\"noopener\">CLASSLA annotator web interface<\/a>.<\/p>\n<p>The ReLDIanno service, which processes three South Slavic languages (Slovenian, Croatian and Serbian), can still be used <a href=\"https:\/\/www.clarin.si\/info\/k-center\/web-services-documentation\/#library\">through a Python library<\/a>. 
Some of the tools available through the web application\/library were developed within the <a href=\"https:\/\/reldi.spur.uzh.ch\/\" target=\"_blank\" rel=\"noopener\">ReLDI<\/a> and <a href=\"http:\/\/nl.ijs.si\/janes\/\" target=\"_blank\" rel=\"noopener\">JANES<\/a> projects.<\/p>\n\n<h1 id=\"application\"><strong><span id=\"Functionalities\">Functionalities<\/span><\/strong><\/h1>\n<p>The ReLDIanno text annotation service has two basic functionalities: <a href=\"#tagger\">Tagger<\/a> and <a href=\"#lexicon\">Lexicon<\/a>.<\/p>\n<h2 id=\"tagger\"><strong>Tagger<\/strong><\/h2>\n<p>The Tagger is an online text-processing tool which allows you to perform four different types of linguistic annotation: morphosyntactic tagging, lemmatisation, named entity recognition (NER), and Universal Dependency (UD) parsing.<\/p>\n<h3>Language<\/h3>\n<p>Before starting, please make sure that you have selected one of the three available languages from the Language drop-down menu: Croatian, Serbian, or Slovenian.<\/p>\n<h3>Format<\/h3>\n<p>Next, select one of the two options in the Format field: Text or TCF. The <i>Text <\/i>option represents plain-text formatting, while <i>TCF <\/i>stands for Text Corpus Format, which is a file format that enables different types of linguistic annotation to be stored within one document. <i>(More information on TCF can be found<\/i> <a href=\"https:\/\/weblicht.sfs.uni-tuebingen.de\/weblichtwiki\/index.php\/The_TCF_Format\" target=\"_blank\" rel=\"noopener\"><i>here<\/i><\/a><i>.)<\/i><\/p>\n<h3>Text<\/h3>\n<p>The Text field allows you to input text by typing it directly, or by copying it from another document.<\/p>\n<h3>File<\/h3>\n<p>Alternatively, you can upload an existing document by clicking on the Choose File button. The supported text formats for upload are, once again, plain text and TCF.<\/p>\n<p>If you want to change the file you have uploaded, you can do so by clicking on the Remove button. 
This will clear the selection and allow you to upload another file.<\/p>\n<h3>Function<\/h3>\n<p>The Function field contains a selection of the four different annotation processes:<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>1. Tag<\/strong><\/span><\/p>\n<p>The Tag option provides morphosyntactic descriptions, in the form of an MSD (morphosyntactic description) tag for each token.<\/p>\n<table style=\"height: 198px;\" width=\"278\">\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Surface<\/b><\/td>\n<td><b>Tags<\/b><\/td>\n<\/tr>\n<tr>\n<td>1.<\/td>\n<td>Gosti<\/td>\n<td>Ncmpn<\/td>\n<\/tr>\n<tr>\n<td>2.<\/td>\n<td>su<\/td>\n<td>Var3p<\/td>\n<\/tr>\n<tr>\n<td>3.<\/td>\n<td>stigli<\/td>\n<td>Vmp-pm<\/td>\n<\/tr>\n<tr>\n<td>4.<\/td>\n<td>.<\/td>\n<td>Z<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>The MSD tag <i>Ncmpn<\/i>, for example, tells us that the word <i>gosti <\/i>is a common noun (<i>Nc<\/i>) of masculine gender (<i>m<\/i>) and is plural (<i>p<\/i>) and in the nominative case (<i>n<\/i>).<\/p>\n<p><b>Tagset:<\/b> There is a complete list of MSD tags and their features for each language \u2014 <a href=\"http:\/\/nl.ijs.si\/ME\/V5\/msd\/html\/msd-hr.html\" target=\"_blank\" rel=\"noopener\">Croatian<\/a>, <a href=\"http:\/\/nl.ijs.si\/ME\/Vault\/V5\/msd\/html\/msd-sr.html\" target=\"_blank\" rel=\"noopener\">Serbian<\/a>, and <a href=\"http:\/\/nl.ijs.si\/ME\/V5\/msd\/html\/msd-sl.html\" target=\"_blank\" rel=\"noopener\">Slovenian<\/a>.<\/p>\n<p><b>Accuracy<\/b>: Croatian (92.53%), Serbian (92.33%), Slovenian (94.27%).<\/p>\n<p><b>References:<\/b><\/p>\n<ul>\n<li>Nikola Ljube\u0161i\u0107, Toma\u017e Erjavec (2016). <i>Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene<\/i>. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portoro\u017e, Slovenia. 
[<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic16b-corpus.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic16b-corpus.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Nikola Ljube\u0161i\u0107, Filip Klubi\u010dka, \u017deljko Agi\u0107, Ivo-Pavao Jazbec (2016). <i>New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian<\/i>. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portoro\u017e, Slovenia. [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/pdf\/340_Paper.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/summaries\/340.html\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<\/ul>\n<p><span style=\"text-decoration: underline;\"><strong>2. Lemmatise<\/strong><\/span><\/p>\n<p>The Lemmatise option provides a lemma, also known as the citation or dictionary form of a word. The process of lemmatization yields a single form which stands for different inflected forms of one word.<\/p>\n<table style=\"height: 200px;\" width=\"276\">\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Surface<\/b><\/td>\n<td><b>Lemma<\/b><\/td>\n<\/tr>\n<tr>\n<td>1.<\/td>\n<td>Gosti<\/td>\n<td>gost<\/td>\n<\/tr>\n<tr>\n<td>2.<\/td>\n<td>su<\/td>\n<td>biti<\/td>\n<\/tr>\n<tr>\n<td>3.<\/td>\n<td>stigli<\/td>\n<td>sti\u0107i<\/td>\n<\/tr>\n<tr>\n<td>4.<\/td>\n<td>.<\/td>\n<td>.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For instance, in the case of the noun <i>gosti<\/i>, its lemma is the nominative singular form <i>gost<\/i>. The lemma of a verb is its infinitive form, in this case <i>biti<\/i> and <i>sti\u0107i<\/i>.<\/p>\n<p>For uninflected words, the lemma will remain the same as the surface form. 
This is also the case with digits and punctuation marks.<\/p>\n<p><b>Accuracy: <\/b>&gt;99.5% for all three languages.<\/p>\n<p><b>References: <\/b>unpublished.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>3. NER<\/strong><\/span><\/p>\n<p>NER (Named entity recognition) is a process through which named entities that appear in a text are located and categorized.<\/p>\n<table style=\"height: 401px;\" width=\"252\">\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Surface<\/b><\/td>\n<td><b>NER<\/b><\/td>\n<\/tr>\n<tr>\n<td>1.<\/td>\n<td>Barak<\/td>\n<td>B-Per<\/td>\n<\/tr>\n<tr>\n<td>2.<\/td>\n<td>Obama<\/td>\n<td>I-Per<\/td>\n<\/tr>\n<tr>\n<td>3.<\/td>\n<td>je<\/td>\n<td>O<\/td>\n<\/tr>\n<tr>\n<td>4.<\/td>\n<td>bio<\/td>\n<td>O<\/td>\n<\/tr>\n<tr>\n<td>5.<\/td>\n<td>44.<\/td>\n<td>O<\/td>\n<\/tr>\n<tr>\n<td>6.<\/td>\n<td>predsednik<\/td>\n<td>O<\/td>\n<\/tr>\n<tr>\n<td>7.<\/td>\n<td>SAD<\/td>\n<td>B-Loc<\/td>\n<\/tr>\n<tr>\n<td>8.<\/td>\n<td>.<\/td>\n<td>O<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Named entities (NEs) are classified into five categories: person (PER), person derivative (DERIV-PER), location (LOC), organization (ORG), and miscellaneous (MISC).<\/p>\n<p>The NER in the Tagger uses the<a href=\"https:\/\/en.wikipedia.org\/wiki\/Inside%E2%80%93outside%E2%80%93beginning_%28tagging%29\" target=\"_blank\" rel=\"noopener\"> IOB2<\/a>\/BIO format, which means that, in multiword NEs (<i>e.g. Barak Obama<\/i>), the first item in a chunk is marked with a B-tag (beginning) and all subsequent items in the same chunk are assigned the I-tag (inside). Single-word NEs are marked with the B-tag (e.g. 
<i>SAD<\/i>), while tokens that are not NEs are tagged with O (outside).<\/p>\n<p><b>Tagset:<\/b> The NER annotation guidelines can be found <a href=\"http:\/\/nl.ijs.si\/janes\/wp-content\/uploads\/2017\/09\/SlovenianNER-eng-v1.1.pdf\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p><b>Accuracy: <\/b>Evaluated on Slovene data, the F1 score is 0.91 for PER, 0.79 for LOC, 0.57 for ORG, 0.49 for DERIV-PER and 0.3 for MISC.<\/p>\n<p><b>References:<\/b><\/p>\n<ul>\n<li>Darja Fi\u0161er, Nikola Ljube\u0161i\u0107 and Toma\u017e Erjavec (2018). <em>The Janes project: language resources and tools for Slovene user generated content<\/em>. Language Resources and Evaluation. [<a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10579-018-9425-z\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"https:\/\/citation-needed.springer.com\/v2\/references\/10.1007\/s10579-018-9425-z?format=bibtex&amp;flavour=citation\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Nikola Ljube\u0161i\u0107, Marija Stupar, Tereza Juri\u0107 and \u017deljko Agi\u0107 (2013). <i>Combining Available Datasets for Building Named Entity Recognition Models of Croatian and Slovene<\/i>. Sloven\u0161\u010dina 2.0. [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic13-combining.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic13-combining.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<\/ul>\n<p><span style=\"text-decoration: underline;\"><strong>4. Dep Parse<\/strong><\/span><\/p>\n<p>The Dep Parse (dependency parsing) option refers to the annotation of Universal Dependencies (UD). 
UD annotation, which is part of the <a href=\"http:\/\/universaldependencies.org\/\" target=\"_blank\" rel=\"noopener\">Universal Dependencies project<\/a>, entails both morphological and syntactic annotation.<\/p>\n<p>The morphological portion of UD annotation consists of a lemma, a part-of-speech (POS) tag, and a set of features encoding grammatical and lexical properties. (<i>Since this information is already provided by the Tag and Lemmatise options, it is not included in the Tagger\u2019s Dep Parse option.<\/i>)<\/p>\n<p>The syntactic portion describes the syntactic structure of a sentence by means of directed binary relations, referred to as dependencies, between its words. Each word \u201cdepends\u201d on another word in the sentence, apart from the one marked as \u2018root\u2019, which is taken to be the center of the sentence.<\/p>\n<p>As shown in the table below, every tag in the \u2018Dep parse\u2019 column consists of a number (gov) and an abbreviation (func). For example, the adjective <i>stoni<\/i> (token 3) is governed by (i.e. depends on) the noun <i>tenis<\/i> (token 4) and functions as its adjectival modifier, which is why its tag is \u20184 \/ amod\u2019. The noun <i>tenis<\/i> (token 4) is, in turn, governed by the verb <i>igra<\/i> (token 2) and serves as its direct object, so it is tagged as \u20182 \/ dobj\u2019. This verb is also the center of the sentence, and is tagged as \u20180 \/ root\u2019.<\/p>\n<p>Since <i>Petar igra stoni tenis, a Ana trenira plivanje<\/i> is a compound sentence, its constituent clauses have to be connected to one another. This is why the center of the second clause, the verb <i>trenira<\/i> (token 8), is marked as being dependent on the center (root) of the entire sentence (token 2), which is located in the first clause. 
Because the sentence consists of two independent clauses in coordination, the second clause functions as a conjunct of the first one, and <i>trenira<\/i> is tagged as \u20182 \/ conj\u2019.<\/p>\n<table style=\"height: 500px;\" width=\"500\">\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Surface<\/b><\/td>\n<td><b>Dep parse \u2013 gov \/ func<\/b><\/td>\n<\/tr>\n<tr>\n<td>1.<\/td>\n<td>Petar<\/td>\n<td>2 \/ nsubj<\/td>\n<\/tr>\n<tr>\n<td>2.<\/td>\n<td>igra<\/td>\n<td>0 \/ root<\/td>\n<\/tr>\n<tr>\n<td>3.<\/td>\n<td>stoni<\/td>\n<td>4 \/ amod<\/td>\n<\/tr>\n<tr>\n<td>4.<\/td>\n<td>tenis<\/td>\n<td>2 \/ dobj<\/td>\n<\/tr>\n<tr>\n<td>5.<\/td>\n<td>,<\/td>\n<td>2 \/ punct<\/td>\n<\/tr>\n<tr>\n<td>6.<\/td>\n<td>a<\/td>\n<td>2 \/ cc<\/td>\n<\/tr>\n<tr>\n<td>7.<\/td>\n<td>Ana<\/td>\n<td>8 \/ nsubj<\/td>\n<\/tr>\n<tr>\n<td>8.<\/td>\n<td>trenira<\/td>\n<td>2 \/ conj<\/td>\n<\/tr>\n<tr>\n<td>9.<\/td>\n<td>plivanje<\/td>\n<td>8 \/ dobj<\/td>\n<\/tr>\n<tr>\n<td>10.<\/td>\n<td>.<\/td>\n<td>2 \/ punct<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p><b>Tagset:<\/b> The list of UD relations can be found <a href=\"http:\/\/universaldependencies.org\/u\/dep\/index.html\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p><b>Accuracy: <\/b>Croatian (~0.9 UAS, ~0.86 LAS), Serbian (~0.92 UAS, ~0.88 LAS), Slovenian (~0.87 UAS, ~0.86 LAS), where UAS is the unlabeled and LAS the labeled attachment score. These are rough estimates that depend heavily on the test data used for each language.<\/p>\n<p><b>References:<\/b><\/p>\n<ul>\n<li>\u017deljko Agi\u0107 and Nikola Ljube\u0161i\u0107 (2015). <i>Universal Dependencies for Croatian (that Work for Serbian, too)<\/i>. Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015). Hissar, Bulgaria. 
[<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/agic15-universal.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/agic15-universal.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Tanja Samard\u017ei\u0107, Mirjana Starovi\u0107, \u017deljko Agi\u0107, Nikola Ljube\u0161i\u0107 (2017). <i>Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages<\/i>. Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017). Valencia, Spain. [<a href=\"https:\/\/aclanthology.org\/W17-1407\/\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"https:\/\/aclanthology.org\/W17-1407.bib\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<\/ul>\n<p><b>N.B.<\/b> <i>The Tag and Lemmatise options can be chosen individually or in combination with one another. The NER and Dep Parse options always include the Tag and Lemmatise options.<\/i><\/p>\n<h3>Result<\/h3>\n<p>Once you have finished with text input\/file upload and set the parameters of your query, click on the PROCESS button. To perform a different type of annotation on the same text, simply select a different process or combination of processes in the Function field and click on the PROCESS button again. If you want to change the chosen text, click on the CLEAR button first.<\/p>\n<p>The Result field offers three different options:<\/p>\n<ul>\n<li>Table \u2013 The text is verticalized and presented in table form. Each token is located in a separate row. The first two columns are the token numbers and the tokens themselves, respectively. They are followed by columns containing tags, in the order that their respective annotations are listed in the Function field. 
The last two columns give the positions of the first and last characters of the token within the input text.<\/li>\n<li>Raw \u2013 The Raw view shows the data behind the Table view in JSON, a language-independent data-interchange format that is easy for humans to read and write and for machines to parse and generate. The JSON attributes in the Raw view hold information about sentences, tokens, text and annotation tags. The attribute \u201csentences\u201d describes every sentence, the attribute \u201ctokens\u201d describes every token, and the attribute \u201ctext\u201d contains the entire text from the Text field or the uploaded file. The attributes \u201cPOSTags\u201d, \u201cdepparsing\u201d, \u201clemmas\u201d and \u201cnamedEntities\u201d contain the tags produced by the annotations selected in the Function field. (<i>More information on JSON can be found <\/i><a href=\"https:\/\/www.json.org\/\" target=\"_blank\" rel=\"noopener\"><i>here<\/i><\/a>.)<\/li>\n<li>Download \u2013 This option allows you to download the result of your query as a TSV (Tab-Separated Values) file. The text is verticalized and the columns are separated by the \u201ctab\u201d character. (<i>However, please bear in mind that, unlike the Table view, there are <\/i><b><i>no<\/i><\/b><i> column headings.<\/i>) Much like JSON, the TSV format is commonly used by machines to store and exchange data.<\/li>\n<\/ul>\n<h2 id=\"lexicon\"><strong>Lexicon<\/strong><\/h2>\n<p>The Lexicon is an online inflectional lexicon of Croatian, Serbian and Slovenian. 
It is based on <a href=\"http:\/\/hdl.handle.net\/11356\/1067\" target=\"_blank\" rel=\"noopener\">hrLex<\/a> for Croatian, <a href=\"http:\/\/hdl.handle.net\/11356\/1066\" target=\"_blank\" rel=\"noopener\">srLex<\/a> for Serbian and <a href=\"http:\/\/hdl.handle.net\/11356\/1033\" target=\"_blank\" rel=\"noopener\">Sloleks<\/a> for Slovenian.<\/p>\n<h3>Language<\/h3>\n<p>Before starting, please make sure that you have selected one of the three available languages from the Language drop-down menu at the bottom of the page: Croatian, Serbian, or Slovenian.<\/p>\n<h3>Search parameters<\/h3>\n<p><span style=\"text-decoration: underline;\"><b>Input<\/b><\/span><\/p>\n<ul>\n<li><b> Regular Input <\/b>allows you to match a string exactly (e.g. <i>petosatni<\/i>), or to use the % character as a wildcard. For instance: pet% (<i>pet<\/i>, <i>petodnevni<\/i>, <i>petostran<\/i>, etc.), %pet (<i>pet, napet, trepet, opet<\/i>, etc.), %pet% (any string containing the substring <i>pet<\/i>).<\/li>\n<li><b> Regex Input <\/b>refers to the use of regular expressions. A regular expression is a type of string that contains special characters and is used to search for patterns. If you are, for instance, trying to find all verbs derived from the verb <i>ljubiti<\/i>, you can find them by searching for <i>[a-\u017e]ljubiti<\/i>. The search will yield results such as <i>izljubiti<\/i>, <i>obljubiti<\/i>, <i>poljubiti<\/i>, <i>priljubiti<\/i>, etc. This is because the range specified in the square brackets encompasses the entire alphabet.<\/li>\n<\/ul>\n<p>A list of frequently used regular expressions with explanations can be found <a href=\"https:\/\/www.sketchengine.eu\/user-guide\/user-manual\/concordance-introduction\/regular-expressions\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p><span style=\"text-decoration: underline;\"><b>Surface form<\/b><\/span><\/p>\n<p>The surface form of a word is the form in which the word appears in a given text. 
For instance, in the sentence <i>Gosti su stigli<\/i>, the surface form of the verb <i>sti\u0107i<\/i> is <i>stigli<\/i>.<\/p>\n<p><span style=\"text-decoration: underline;\"><b>Lemma<\/b><\/span><\/p>\n<p>A lemma (also known as the citation or dictionary form) is a single form which stands for different inflected forms of one word. For example, the lemmas of the words in the sentence <i>Gosti su stigli <\/i>are <i>gost <\/i>(the nominative singular form of the noun), <i>biti<\/i> and <i>sti\u0107i <\/i>(the infinitive form of the verbs).<\/p>\n<p>In the case of uninflected words, the lemma is the same as the surface form.<\/p>\n<p>If you are unsure about the lemma of a particular word, please use the Tagger and select the Lemmatise option.<\/p>\n<p><span style=\"text-decoration: underline;\"><b>MSD<\/b><\/span><\/p>\n<p>MSD tags contain morphosyntactic descriptions of words. In the sentence <i>Gosti su stigli<\/i>, the MSD tag for the word <i>gosti<\/i> \u00a0is <i>Ncmpn<\/i>. This tag tells us that <i>gosti <\/i>is a common noun (<i>Nc<\/i>) of masculine gender (<i>m<\/i>) and is plural (<i>p<\/i>) and in the nominative case (<i>n<\/i>).<\/p>\n<p>There is a complete list of MSD tags and their features for each language \u2014 <a href=\"http:\/\/nl.ijs.si\/ME\/V5\/msd\/html\/msd-hr.html\" target=\"_blank\" rel=\"noopener\">Croatian<\/a>, <a href=\"http:\/\/nl.ijs.si\/ME\/Vault\/V5\/msd\/html\/msd-sr.html\" target=\"_blank\" rel=\"noopener\">Serbian<\/a>, and <a href=\"http:\/\/nl.ijs.si\/ME\/V5\/msd\/html\/msd-sl.html\" target=\"_blank\" rel=\"noopener\">Slovenian<\/a>.<\/p>\n<p><span style=\"text-decoration: underline;\"><b>No. of syllables<\/b><\/span><\/p>\n<p>This option allows you to narrow down your search by limiting the number of syllables.<\/p>\n<p><span style=\"text-decoration: underline;\"><b>Result<\/b><\/span><\/p>\n<p>Once you have set the parameters of your query, click on the FILTER button. 
To perform a new search, click on the CLEAR button first.<\/p>\n<p>The results can be viewed in the form of a table, in which the first column is the surface form, followed by an MSD tag column and a lemma column. Choosing the Table view also allows you to search within the results.<\/p>\n<p>In the Raw view, the results are presented in JSON format. (<i>More information on JSON can be found <\/i><a href=\"https:\/\/www.json.org\/\" target=\"_blank\" rel=\"noopener\"><i>here<\/i><\/a>.)<\/p>\n<p><b>References:<\/b><\/p>\n<ul>\n<li>Nikola Ljube\u0161i\u0107, Filip Klubi\u010dka, \u017deljko Agi\u0107, Ivo-Pavao Jazbec (2016). <i>New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian<\/i>. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portoro\u017e, Slovenia. [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/pdf\/340_Paper.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/summaries\/340.html\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<\/ul>\n<h1 id=\"library\">Python library<\/h1>\n<p>The Python library can be found at: <a href=\"https:\/\/github.com\/clarinsi\/reldi-lib\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/clarinsi\/reldi-lib<\/a><\/p>\n<h2><strong>Installing the library<\/strong><\/h2>\n<p>The easiest way to install the ReLDI library is through <a href=\"https:\/\/pypi.python.org\/pypi\" target=\"_blank\" rel=\"noopener\">PyPI<\/a> from your command line interface.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<pre>$ sudo pip install reldi<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><strong>Using the library<\/strong><\/h2>\n<h3>Requirements<\/h3>\n<p>The library requires Python 2. Python 2 comes preinstalled on older macOS and Linux systems. If Python 2 is not preinstalled on your system, we suggest setting up a virtual environment. 
For more information, please check the following link: <a href=\"https:\/\/conda.io\/projects\/conda\/en\/latest\/user-guide\/tasks\/manage-python.html\" target=\"_blank\" rel=\"noopener\">https:\/\/conda.io\/projects\/conda\/en\/latest\/user-guide\/tasks\/manage-python.html<\/a>. In addition, be aware that server settings prevent the library from processing files larger than 8 KB.<\/p>\n<h3>Scripts<\/h3>\n<p><strong>restore_all.py<\/strong><\/p>\n<p>If you need diacritic restoration, you will want to use the restore_all.py script. You can inspect the output of the script in the file examples\/example.txt.redi.<\/p>\n<pre>$ python restore_all.py hr examples\/example.txt<\/pre>\n<p>&nbsp;<\/p>\n<p>Batch processing is also available: give a directory as the second argument. Running the following command will process all files with the extension .txt in the given directory.<\/p>\n<pre>$ python restore_all.py hr examples\/<\/pre>\n<p>&nbsp;<\/p>\n<p>You can get more information by running:<\/p>\n<pre>$ python restore_all.py -h<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>tag_all.py<\/strong><\/p>\n<p>If you need tokenisation, morphosyntactic tagging and\/or lemmatisation, you will want to use the tag_all.py script. You can inspect the output of the script in the file examples\/example.txt.redi.taglem.<\/p>\n<pre>$ python tag_all.py hr examples\/example.txt.redi<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>ner_all.py<\/strong><\/p>\n<p>If you need named entity recognition with morphosyntactic tagging and lemmatisation, you will want to use the ner_all.py script. You can inspect the output of the script in the file examples\/example.txt.redi.tagNERlem.<\/p>\n<pre>$ python ner_all.py hr examples\/example.txt.redi<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>parse_all.py<\/strong><\/p>\n<p>If you also need dependency parsing, you can use the parse_all.py script. You can inspect the output of the script in the file examples\/example.txt.redi.parse. 
The interface of all three scripts is very similar.<\/p>\n<pre>$ python parse_all.py hr examples\/example.txt.redi\r\n\r\n<\/pre>\n<h3>Library<\/h3>\n<p>If you want to use the web service responses in your own code, the best option is to use the library directly. Below are some simple examples, run in Python interactive mode, of the diacritic restorer, the tokeniser\/tagger\/lemmatiser and the other classes of the library:<\/p>\n<pre><b>&gt;&gt;&gt; <\/b><b>import<\/b> json\r\n<b>&gt;&gt;&gt; <\/b><b>from<\/b> reldi.restorer <b>import<\/b> DiacriticRestorer\r\n<b>&gt;&gt;&gt; <\/b>dr=DiacriticRestorer('hr')\r\n<b>&gt;&gt;&gt; <\/b>dr.authorize('my_username','my_password')\r\n<b>&gt;&gt;&gt; <\/b>json.loads(dr.restore('Cudil bi se da ovo dela.')) \u00a0\r\n{'orthography': [{'tokenIDs': 't_0', 'ID': 'pt_0', 'value': '\\xc4\\x8cudil'}, {'tokenIDs': 't_1', 'ID': 'pt_1', 'value': 'bi'}, {'tokenIDs': 't_2', 'ID': 'pt_2', 'value': 'se'}, {'tokenIDs': 't_3', 'ID': 'pt_3', 'value': 'da'}, {'tokenIDs': 't_4', 'ID': 'pt_4', 'value': 'ovo'}, {'tokenIDs': 't_5', 'ID': 'pt_5', 'value': 'dela'}, {'tokenIDs': 't_6', 'ID': 'pt_6', 'value': '.'}], 'text': 'Cudil bi se da ovo dela.', 'tokens': [{'endChar': '5', 'startChar': '1', 'ID': 't_0', 'value': 'Cudil'}, {'endChar': '8', 'startChar': '7', 'ID': 't_1', 'value': 'bi'}, {'endChar': '11', 'startChar': '10', 'ID': 't_2', 'value': 'se'}, {'endChar': '14', 'startChar': '13', 'ID': 't_3', 'value': 'da'}, {'endChar': '18', 'startChar': '16', 'ID': 't_4', 'value': 'ovo'}, {'endChar': '23', 'startChar': '20', 'ID': 't_5', 'value': 'dela'}, {'endChar': '24', 'startChar': '24', 'ID': 't_6', 'value': '.'}], 'sentences': [{'tokenIDs': 't_0 t_1 t_2 t_3 t_4 t_5 t_6', 'ID': 's_0'}]}\r\n\r\n<b>&gt;&gt;&gt; <\/b><b>from<\/b> reldi.tagger <b>import<\/b> Tagger\r\n<b>&gt;&gt;&gt; <\/b>t=Tagger('hr')\r\n<b>&gt;&gt;&gt; <\/b>t.authorize('my_username','my_password')\r\n<b>&gt;&gt;&gt; <\/b>json.loads(t.tagLemmatise(u'Ovi alati rade dobro.'.encode('utf8')))\r\n{'tokens': 
[{'endChar': '3', 'startChar': '1', 'ID': 't_0', 'value': 'Ovi'}, {'endChar': '9', 'startChar': '5', 'ID': 't_1', 'value': 'alati'}, {'endChar': '14', 'startChar': '11', 'ID': 't_2', 'value': 'rade'}, {'endChar': '20', 'startChar': '16', 'ID': 't_3', 'value': 'dobro'}, {'endChar': '21', 'startChar': '21', 'ID': 't_4', 'value': '.'}], 'lemmas': [{'tokenIDs': 't_0', 'ID': 'le_0', 'value': 'ovaj'}, {'tokenIDs': 't_1', 'ID': 'le_1', 'value': 'alat'}, {'tokenIDs': 't_2', 'ID': 'le_2', 'value': 'raditi'}, {'tokenIDs': 't_3', 'ID': 'le_3', 'value': 'dobro'}, {'tokenIDs': 't_4', 'ID': 'le_4', 'value': '.'}], 'text': 'Ovi alati rade dobro.', 'POSTags': [{'tokenIDs': 't_0', 'ID': 'pt_0', 'value': 'Pd-mpn'}, {'tokenIDs': 't_1', 'ID': 'pt_1', 'value': 'Ncmpn'}, {'tokenIDs': 't_2', 'ID': 'pt_2', 'value': 'Vmr3p'}, {'tokenIDs': 't_3', 'ID': 'pt_3', 'value': 'Rgp'}, {'tokenIDs': 't_4', 'ID': 'pt_4', 'value': 'Z'}], 'sentences': [{'tokenIDs': 't_0 t_1 t_2 t_3 t_4', 'ID': 's_0'}]}\r\n\r\n<b>&gt;&gt;&gt; <\/b><b>from<\/b> reldi.parser <b>import<\/b> Parser\r\n<b>&gt;&gt;&gt; <\/b>p=Parser('hr')\r\n<b>&gt;&gt;&gt; <\/b>p.authorize('my_username','my_password')\r\n<b>&gt;&gt;&gt; <\/b>json.loads(p.tagLemmatiseParse(u'Ovi alati rade dobro.'.encode('utf8')))\r\n\r\n<b>&gt;&gt;&gt; <\/b><b>from<\/b> reldi.ner_tagger <b>import<\/b> NERTagger\r\n<b>&gt;&gt;&gt; <\/b>n=NERTagger('hr')\r\n<b>&gt;&gt;&gt; <\/b>n.authorize('my_username','my_password')\r\n<b>&gt;&gt;&gt; <\/b>json.loads(n.tag(u'Ovi alati u Sloveniji rade dobro.'.encode('utf8')))\r\n\r\n<b>&gt;&gt;&gt; <\/b><b>from<\/b> reldi.lexicon <b>import<\/b> Lexicon\r\n<b>&gt;&gt;&gt; <\/b>lex=Lexicon('hr')\r\n<b>&gt;&gt;&gt; <\/b>lex.authorize('my_username','my_password')\r\n<b>&gt;&gt;&gt; <\/b>json.loads(lex.queryEntries(surface=\"pet\"))<\/pre>\n<h2><strong>Code documentation<\/strong><\/h2>\n<h3>Code diagram<\/h3>\n<p><a href=\"http:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram.jpg\"><img loading=\"lazy\" 
decoding=\"async\" class=\"aligncenter wp-image-3658 size-full\" src=\"http:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram.jpg\" alt=\"\" width=\"1369\" height=\"673\" srcset=\"https:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram.jpg 1369w, https:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram-300x147.jpg 300w, https:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram-768x378.jpg 768w, https:\/\/www.clarin.si\/info\/wp-content\/uploads\/2018\/11\/code_diagram-1024x503.jpg 1024w\" sizes=\"auto, (max-width: 1369px) 100vw, 1369px\" \/><\/a><\/p>\n<h3>Source code description<\/h3>\n<p>The directory <i>reldi<\/i> contains the code of the library. You can use the library either through this code directly or through the scripts, which add a further layer of user interface on top of it.<\/p>\n<p><strong>auth.py<\/strong><\/p>\n<p>The class that checks whether the authentication credentials are valid.<\/p>\n<p><strong>client.py<\/strong><\/p>\n<p>The class that queries the backend once authorization has succeeded. It is the base class from which every special-purpose class (tagger, lexicon, etc.) is derived.<\/p>\n<p><strong>connection.py<\/strong><\/p>\n<p>The class that directly communicates with the backend. Both <i>auth.py<\/i> and <i>client.py<\/i> use this class as a mediator between themselves and the backend.<\/p>\n<p><strong>lemmatiser.py<\/strong><\/p>\n<p>The class that lemmatises the input text. Derived from the Client class.<\/p>\n<p><strong>lexicon.py<\/strong><\/p>\n<p>The class that is used for querying the lexicon. Derived from the Client class.<\/p>\n<p><strong>ner_tagger.py<\/strong><\/p>\n<p>The class that is used for named entity recognition. Derived from the Client class.<\/p>\n<p><strong>parser.py<\/strong><\/p>\n<p>The class that is used for dependency parsing. Derived from the Client class.<\/p>\n<p><strong>restorer.py<\/strong><\/p>\n<p>The class that is used for diacritic restoration. 
Derived from the Client class.<\/p>\n<p><strong>tagger.py<\/strong><\/p>\n<p>The class that is used for tagging and lemmatising. Derived from the Client class.<\/p>\n<h1><strong>References<\/strong><\/h1>\n<p>The papers describing specific technologies (that should be cited if any of them are used) are the following:<\/p>\n<p><strong>Tokenisation<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/clarinsi\/reldi-tokeniser\" target=\"_blank\" rel=\"noopener\">Tokeniser tool Github repository<\/a><\/li>\n<\/ul>\n<p><strong>Diacritic restoration<\/strong><\/p>\n<ul>\n<li>Nikola Ljube\u0161i\u0107, Toma\u017e Erjavec, and Darja Fi\u0161er (2016). Corpus-based diacritic restoration for south slavic languages. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portoro\u017e, Slovenia. [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/pdf\/361_Paper.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/summaries\/361.html\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li><a href=\"https:\/\/github.com\/clarinsi\/redi\" target=\"_blank\" rel=\"noopener\">Diacritic restoration tool Github repository<\/a><\/li>\n<\/ul>\n<p><strong>MSD tagging<\/strong><\/p>\n<ul>\n<li>Nikola Ljube\u0161i\u0107, Toma\u017e Erjavec (2016). <i>Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene<\/i>. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic16b-corpus.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic16b-corpus.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Nikola Ljube\u0161i\u0107, Filip Klubi\u010dka, \u017deljko Agi\u0107, Ivo-Pavao Jazbec (2016). 
<i>New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian<\/i>. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portoro\u017e, Slovenia. [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/pdf\/340_Paper.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/summaries\/340.html\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li><a href=\"https:\/\/github.com\/clarinsi\/reldi-tagger\" target=\"_blank\" rel=\"noopener\">MSD tagger and lemmatiser GitHub repository<\/a><\/li>\n<\/ul>\n<p><strong>Dependency parsing<\/strong><\/p>\n<ul>\n<li>\u017deljko Agi\u0107 and Nikola Ljube\u0161i\u0107 (2015). <i>Universal Dependencies for Croatian (that Work for Serbian, too)<\/i>. Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015). Hissar, Bulgaria. [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/agic15-universal.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/agic15-universal.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Tanja Samard\u017ei\u0107, Mirjana Starovi\u0107, \u017deljko Agi\u0107, Nikola Ljube\u0161i\u0107 (2017). <i>Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages<\/i>. Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017). Valencia, Spain. 
[<a href=\"https:\/\/aclanthology.org\/W17-1407\/\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"https:\/\/aclanthology.org\/W17-1407.bib\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<\/ul>\n<p><strong>Named entity recognition<\/strong><\/p>\n<ul>\n<li>based on the <a href=\"https:\/\/github.com\/clarinsi\/janes-ner\" target=\"_blank\" rel=\"noopener\">[janes-ner]<\/a> NER tagger<\/li>\n<li>Darja Fi\u0161er, Nikola Ljube\u0161i\u0107 and Toma\u017e Erjavec (2018). <em>The Janes project: language resources and tools for Slovene user generated content<\/em>. Language Resources and Evaluation. [<a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10579-018-9425-z\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"https:\/\/citation-needed.springer.com\/v2\/references\/10.1007\/s10579-018-9425-z?format=bibtex&amp;flavour=citation\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li>Nikola Ljube\u0161i\u0107, Marija Stupar, Tereza Juri\u0107 and \u017deljko Agi\u0107 (2013). <i>Combining Available Datasets for Building Named Entity Recognition Models of Croatian and Slovene<\/i>. Sloven\u0161\u010dina 2.0. 
[<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic13-combining.pdf\" target=\"_blank\" rel=\"noopener\">Link<\/a>] [<a href=\"http:\/\/nlp.ffzg.hr\/data\/publications\/nljubesi\/ljubesic13-combining.txt\" target=\"_blank\" rel=\"noopener\">.bib<\/a>]<\/li>\n<li><a href=\"https:\/\/github.com\/clarinsi\/janes-ner\" target=\"_blank\" rel=\"noopener\">NER tagger GitHub repository<\/a><\/li>\n<\/ul>\n<div id=\"themify_builder_content-3842\" data-postid=\"3842\" class=\"themify_builder_content themify_builder_content-3842 themify_builder\">\n    <\/div>\n<!-- \/themify_builder_content -->\n","protected":false},"excerpt":{"rendered":"<p>(SLOVENSKI PREVOD V PRIPRAVI) Update: The ReLDIanno service has been replaced by the current state-of-the-art in linguistic processing of South Slavic languages \u2013 the\u00a0CLASSLA python package\u00a0that deals with the same levels of processing as this web service, but for an extended list of South Slavic languages, and with significantly improved performance. 
You can test the [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":3834,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3842","page","type-page","status-publish","hentry","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"_links":{"self":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/3842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/comments?post=3842"}],"version-history":[{"count":25,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/3842\/revisions"}],"predecessor-version":[{"id":8828,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/3842\/revisions\/8828"}],"up":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/3834"}],"wp:attachment":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/media?parent=3842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}