Here are 91 public repositories matching the corpus-tools topic.
(Module under ongoing development) Retrieves the parsed content of Wikipedia articles. Created to make gathering text corpus data fast and easy, but it can be freely used for other purposes too.
Updated Jan 3, 2023 · Python
The AP Exam Corpus Project is a Python application that generates corpora for AP exams.
Updated May 14, 2021 · Python
Tools for creating speech corpora by extracting audio from YouTube videos
Updated Aug 15, 2022 · Python
Helps you convert SRT subtitle files into a CN-? parallel corpus
Updated Mar 31, 2018 · JavaScript
Python scripts for the construction of the LEXB parallel corpus of South Tyrolean legislation (IT-DE).
Updated Jan 23, 2022 · Python
Open source Python package to produce word sketches inspired by Sketch Engine (to make reproducible analyses)
Updated Jan 19, 2023 · GLSL
Python API for extracting data from the MPQA corpus
Updated Jan 6, 2017 · Python
This package provides Python utility classes and static methods that wrap third-party software commonly used in text processing, such as Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.
Updated Mar 4, 2022 · Python
Updated Jun 29, 2015 · Python
Forpus is a Python library for converting plain text corpora into various corpus formats.
Updated Mar 16, 2018 · Python
Corpus analysis of plain text, providing the type-token ratio as well as some other statistics.
Updated Oct 30, 2023 · Python
Tool to generate lists of Bengali words and transcriptions matching given phonological descriptions
Updated Dec 3, 2021 · Python
An open-source web-based application for multi-task lexical normalisation
Updated Feb 24, 2022 · JavaScript
Cod yr ap Paldaruo i iOS ar gyfer torfoli casglu corpws lleferydd | Code for the Paldaruo speech corpus crowdsourcing app for iOS
Updated Aug 3, 2017 · Objective-C
Linguistic resources for adapting FreeLing to Chilean Spanish
Updated Jan 6, 2020 · Makefile
Utility to guess some affix splits in Cherokee texts. Developed for use with the Moses machine translation software.
Updated Aug 31, 2020 · Java
Online parallel text alignment tool.
Updated Feb 17, 2021 · TypeScript
Analyzes binary executables and can generate a test corpus for defined instruction paths, for each discovered function, or to reach every basic block detected in the non-library/shared-object parts of the binary's text section.
Updated May 17, 2024 · Python
Tidy concordances, collocates, and wordlists
Repository providing pre-processed Wikipedia and Simple Wikipedia datasets, along with Python scripts for pre-processing and dataset generation.