Here are 91 public repositories matching the corpus-tools topic.
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 21, 2024 · Python
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Updated May 20, 2024 · Python
Bitextor generates translation memories from multilingual websites
Updated May 21, 2024 · Python
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Updated Feb 11, 2024 · Macaulay2
Python library for handling audio datasets.
Updated Jul 6, 2023 · Python
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Updated May 17, 2024 · Python
OpusFilter - Parallel corpus processing toolkit
Updated May 2, 2024 · Python
Utilities for Processing the Switchboard Dialogue Act Corpus
Updated Jan 24, 2021 · Python
An open source reimplementation of Benny Brodda's BETA in Python
Updated Oct 28, 2019 · Python
An advanced, extensible web front-end for the Manatee-open corpus search engine
Updated May 21, 2024 · TypeScript
Updated Aug 11, 2023 · HTML
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
Updated May 22, 2023 · Python
A set of workflows for corpus building through OCR, post-correction and normalisation
Updated Sep 7, 2022 · Python
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
A parser for annotated MuseScore 3 files.
Updated Jan 22, 2024 · Python
Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora.
Updated Dec 27, 2023 · Python
Reading the data from OPIEC - an Open Information Extraction corpus
Updated Jun 12, 2019 · Java
Rezonator: Dynamics of human engagement
Updated May 11, 2023 · Yacc
Utilities for Processing the Meeting Recorder Dialogue Act Corpus
Updated Jan 24, 2021 · Python
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora