• About
  • Documentation

  • More Universes
  • Recent Updates
  • Leader board

  • All repositories
  • All packages
  • All articles
  • All datasets
  • All system Libraries
data-cleaning
  • Builds
  • Packages
  • Articles
  • Datasets
  • Contribution
  • Badges
  • API
  • Feed

Links todata-cleaning

validate - Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.

Last updated

data-cleaningvalidation

12.19 score 431 stars 9 dependents 588 scripts 2.4k downloads

editrules - Parsing, Applying, and Manipulating Data Cleaning Rules

Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like format). Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rules dependencies can be visualized with using the 'igraph' package.

Last updated

7.00 score 22 stars 1 dependents 152 scripts 443 downloads

errorlocate - Locate Errors with Validation Rules

Errors in data can be located and removed using validation rules from package 'validate'. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, chapter 7.

Last updated

data-cleaningerrorsinvalidation

6.82 score 22 stars 75 scripts 391 downloads

dcmodify - Modify Data Using Externally Defined Modification Rules

Data cleaning scripts typically contain a lot of 'if this change that' type of statements. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and become in stead parameters to the work flow. This allows one to maintain, document, and reason about data modification rules as separate entities.

Last updated

6.02 score 11 stars 64 scripts 320 downloads

lintools - Manipulation of Linear Systems of (in)Equalities

Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, simplify systems by removing spurious columns and rows and collapse implied equalities, test if a matrix is totally unimodular, compute variable ranges implied by linear (in)equalities.

Last updated

5.56 score 4 stars 3 dependents 20 scripts 456 downloads

validatetools - Checking and Simplifying Validation Rule Sets

Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set a rules formulated with 'validate'.

Last updated

data-cleaningrulesvalidation

5.26 score 16 stars 46 scripts 289 downloads

validatesuggest - Generate Suggestions for Validation Rules

Generate suggestions for validation rules from a reference data set, which can be used as a starting point for domain specific rules to be checked with package 'validate'.

Last updated

data-cleaningvalidation

4.40 score 5 stars 5 scripts 204 downloads

deducorrect - Deductive Correction, Deductive Imputation, and Deterministic Correction

A collection of methods for automated data cleaning where all actions are logged. NOTE: active development has moved to the 'deductive' package.

Last updated

4.22 score 9 stars 37 scripts 281 downloads

deductive - Data Correction and Imputation Using Deductive Methods

Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.

Last updated

data-cleaning

3.99 score 14 stars 14 scripts 400 downloads