Here are 64 public repositories matching the data-curation topic.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Jun 3, 2024 · Python
The open-source tool for building high-quality datasets and computer vision models
Updated Jun 3, 2024 · Python
Updated Jan 12, 2024 · Python
fastdup is a free tool for rapidly extracting insights from image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce data-operations costs at large scale.
Updated May 30, 2024 · Python
Interactively explore unstructured datasets from your dataframe.
Updated May 29, 2024 · TypeScript
A curated, but incomplete, list of data-centric AI resources.
Curated list of open source tooling for data-centric AI on unstructured data.
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Updated May 28, 2024 · Python
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Updated Jan 10, 2024 · Python
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Updated Dec 26, 2022 · Python
Lesson guide and textbook for "Data as a Science" course.
Updated Jun 5, 2021 · Jupyter Notebook
A tool for downloading images from public image boards that allow scraping, previewing your images and tags, and editing your images and tags. Additional tabs let you download other code repositories as well as state-of-the-art (SOTA) diffusion and auto-tagging/captioning models. Custom datasets can be added.
Updated Jan 5, 2024 · Python
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023)
Updated Apr 22, 2024 · Python
A web service for semi-automated conversion of raw imaging data to BIDS
Client interface for all things Cleanlab Studio
Updated Jun 3, 2024 · Python
Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.
Updated Jun 3, 2024 · Python
Curated list of known efforts in collecting and/or curating chemical and materials data.
Updated Oct 17, 2017 · Mathematica
AqSolDB: A curated aqueous solubility dataset containing 9,982 unique compounds.
Updated Apr 18, 2020 · Python
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates, and label errors.
Updated May 30, 2024 · Python