What's New
corpus

Description:
The Greek web corpus MaCoCu-el 1.0 was built by crawling the ".gr", ".ελ", ".cy" and ".eu" internet top-level domains in 2023, extending the crawl dynamically to other domains as well. The crawler is available at ...
This entry contains 2 files (16.23 GB).
Publicly Available
corpus

Description:
The Catalan web corpus MaCoCu-ca 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is ...
This entry contains 2 files (4.72 GB).
Publicly Available
corpus

Description:
The Ukrainian web corpus MaCoCu-uk 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.c ...
This entry contains 2 files (24.58 GB).
Publicly Available
Most Viewed
In the past week
corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (2.17 GB).
Publicly Available


corpus

Description:
The Catalan web corpus MaCoCu-ca 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is ...
This entry contains 2 files (4.72 GB).
Publicly Available
corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (23.37 GB).
Publicly Available

