dc.contributor.author Jelovšek, Tjaša
dc.contributor.author Lebar Bajec, Iztok
dc.contributor.author Bajec, Marko
dc.contributor.author Bajec, Žan
dc.contributor.author Cvek, Jernej
dc.date.accessioned 2022-12-02T10:50:11Z
dc.date.available 2022-12-02T10:50:11Z
dc.date.issued 2022-12-01
dc.identifier.uri http://hdl.handle.net/11356/1743
dc.description This Text Denormalisator converts Slovene spoken-form text into written-form text. It is typically used as a post-processing step in Automatic Speech Recognition, which traditionally outputs spoken-form text. As input it accepts a string, a list of tokens, or a list of dictionaries with a mandatory "text" field. The output is a dictionary with two keys: 'denormalized_content', a list of written-form tokens, each carrying an 'index' list that, as the example shows, points to the input tokens it replaces, and 'denormalized_string', the joined written-form text. Example of use: denormalize("Danes, osmega sedmega dva tisoč dvaindvajset, je lep sončen dan, saj je zunaj prijetnih petindvajset stopinj Celzija.") returns {'denormalized_content': [{'text': 'Danes', 'index': [0]}, {'text': ',', 'index': [1]}, {'text': '8.', 'index': [2]}, {'text': '7.', 'index': [3]}, {'text': '2022', 'index': [4, 5, 6]}, {'text': ',', 'index': [7]}, {'text': 'je', 'index': [8]}, {'text': 'lep', 'index': [9]}, {'text': 'sončen', 'index': [10]}, {'text': 'dan', 'index': [11]}, {'text': ',', 'index': [12]}, {'text': 'saj', 'index': [13]}, {'text': 'je', 'index': [14]}, {'text': 'zunaj', 'index': [15]}, {'text': 'prijetnih', 'index': [16]}, {'text': '25', 'index': [17]}, {'text': '°C', 'index': [18, 19]}, {'text': '.', 'index': [20]}], 'denormalized_string': 'Danes, 8. 7. 2022, je lep sončen dan, saj je zunaj prijetnih 25 °C.'}
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby https://rsdo.slovenscina.eu/en/speech-technologies
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/Slovene_denormalizator
dc.subject text denormalisation
dc.title Slovene Text Denormalizator RSDO-DS2-DENORM 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Iztok Lebar Bajec ilb@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 1
files.size 9349120
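
The sample output in dc.description pairs each written-form token with the positions of the spoken-form tokens it was derived from. The sketch below shows one way to use that alignment, for instance to carry ASR word timings over to the written form. It does not call the tool itself: the `denormalized_content` list is copied from the record's example, while the `spoken_tokens` tokenization and the helper name `align_tokens` are assumptions made for illustration (the tokenization is inferred from the index values, with punctuation as separate tokens).

```python
# Assumed tokenization of the example input, inferred from the 'index'
# values in the sample output (punctuation counted as separate tokens).
spoken_tokens = [
    "Danes", ",", "osmega", "sedmega", "dva", "tisoč", "dvaindvajset", ",",
    "je", "lep", "sončen", "dan", ",", "saj", "je", "zunaj", "prijetnih",
    "petindvajset", "stopinj", "Celzija", ".",
]

# 'denormalized_content' copied from the example in dc.description.
denormalized_content = [
    {"text": "Danes", "index": [0]}, {"text": ",", "index": [1]},
    {"text": "8.", "index": [2]}, {"text": "7.", "index": [3]},
    {"text": "2022", "index": [4, 5, 6]}, {"text": ",", "index": [7]},
    {"text": "je", "index": [8]}, {"text": "lep", "index": [9]},
    {"text": "sončen", "index": [10]}, {"text": "dan", "index": [11]},
    {"text": ",", "index": [12]}, {"text": "saj", "index": [13]},
    {"text": "je", "index": [14]}, {"text": "zunaj", "index": [15]},
    {"text": "prijetnih", "index": [16]}, {"text": "25", "index": [17]},
    {"text": "°C", "index": [18, 19]}, {"text": ".", "index": [20]},
]


def align_tokens(spoken, content):
    """Pair each written-form token with the spoken tokens it covers.

    Hypothetical helper, not part of the published tool.
    """
    return [(item["text"], [spoken[i] for i in item["index"]])
            for item in content]


pairs = align_tokens(spoken_tokens, denormalized_content)

# Keep only the positions where denormalisation changed the text.
changed = [(written, spoken) for written, spoken in pairs
           if [written] != spoken]

for written, spoken in changed:
    print(f"{' '.join(spoken)} -> {written}")
```

Because every input position appears in exactly one 'index' list in this example, the same mapping could be used to merge per-token metadata (such as start and end times) across the spans that collapse into a single written token.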


 Files in this item

This item is publicly available and licensed under the Apache License 2.0.
Name Slovene_denormalizator-1.0.tar
Size 8.92 MB
Format Unknown
Description RSDO DS2 DENORM 1.0
MD5 53a4894927e9b29d02c30efa729286b2
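
The record publishes both a byte count (files.size) and an MD5 checksum for the archive, so a download can be checked before unpacking. The following is a minimal sketch using only the Python standard library; the function names `file_md5` and `verify_download` are illustrative, and the local path assumes the archive was saved under its listed name in the current directory.

```python
import hashlib
from pathlib import Path

# Values taken from this record (files.size and the MD5 field).
EXPECTED_SIZE = 9349120
EXPECTED_MD5 = "53a4894927e9b29d02c30efa729286b2"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5,
                    expected_size=EXPECTED_SIZE):
    """Check size first (cheap), then the checksum."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return file_md5(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("Slovene_denormalizator-1.0.tar")
    if archive.exists():
        print("archive OK" if verify_download(archive) else "archive mismatch")
```

MD5 is adequate here for detecting a truncated or corrupted transfer; it is not a defence against deliberate tampering.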
