What's New
corpus

Description:
The Greek web corpus MaCoCu-el 1.0 was built by crawling the ".gr", ".ελ", ".cy" and ".eu" internet top-level domains in 2023, extending the crawl dynamically to other domains as well. The crawler is available at ...
This item contains 2 files (16.23 GB).
Publicly Available
corpus

Description:
The Catalan web corpus MaCoCu-ca 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is ...
This item contains 2 files (4.72 GB).
Publicly Available
corpus

Description:
The Ukrainian web corpus MaCoCu-uk 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.c ...
This item contains 2 files (24.58 GB).
Publicly Available
Most Viewed Items
Top Last Week
corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available


corpus

Description:
The Catalan web corpus MaCoCu-ca 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is ...
This item contains 2 files (4.72 GB).
Publicly Available
corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (23.37 GB).
Publicly Available

