Early Slavic language models
Preprint   Open access

Nilo Pedrazzini
2023

Abstract

Word embeddings trained on the lemmatised TOROT Treebank with Word2Vec, using the following parameters:

sg = True
min_count = <1,3,5>
window = <3,5>
vector_size = <100,200,300>
epochs = 5

One model was trained for each combination of the values enclosed in angle brackets (< >), which gives 3 × 2 × 3 = 18 models.

The release contains both the full models (.model) and the plain vector files (_vectors.txt). The models are named according to the parameters they were trained with.

Note that these models are the result of very preliminary experiments, and no systematic evaluation of their quality has been carried out, so use them with caution.
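The parameter combinations described above can be enumerated as in the following minimal sketch. It is not the author's training script: the names `FIXED`, `SWEPT`, `parameter_grid` and `train_all` are hypothetical, and the sketch assumes gensim 4 or later, where `sg=1` selects skip-gram and the keyword arguments match those listed in the abstract.

```python
from itertools import product

# Values copied from the parameter listing in the abstract.
FIXED = {"sg": 1, "epochs": 5}  # sg=1 is skip-gram ("sg = True" in the abstract)
SWEPT = {
    "min_count": [1, 3, 5],
    "window": [3, 5],
    "vector_size": [100, 200, 300],
}


def parameter_grid():
    """Return one keyword-argument dict per combination of the swept values."""
    keys = list(SWEPT)
    return [
        {**FIXED, **dict(zip(keys, values))}
        for values in product(*SWEPT.values())
    ]


GRID = parameter_grid()


def train_all(sentences):
    """Train one Word2Vec model per grid entry (assumes gensim >= 4 is installed).

    `sentences` must be a restartable iterable (for example a list) of
    lists of lemmas, since each model makes several passes over it.
    """
    from gensim.models import Word2Vec  # imported lazily so the grid works without gensim

    return [Word2Vec(sentences=sentences, **params) for params in GRID]
```

To use the released files rather than retrain, the `.model` files can presumably be opened with gensim's `Word2Vec.load`. The `_vectors.txt` files may load with `KeyedVectors.load_word2vec_format(path, binary=False)` if they are in the word2vec text format, which the record does not state.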
url
https://doi.org/10.5281/zenodo.8414137
Published (Version of record) Open
