NEWS
validate 1.1.6
- Fix: rules in YAML that included a literal "---" would not be parsed
correctly (thanks to Paul Horikx for the fix)
- updated the Eurostat SDMX REST API endpoint
validate 1.1.5 (2024-02-14)
- Fixed several (possibly) invalid URLs pointing to the ESS CROS website
by replacing them with links to versions of the documents on markvanderloo.eu.
- Fixed several Rd bugs (thanks to Kurt Hornik for pointing out the changes).
- Changed internal function names to comply with stricter S3 methods checking
in R CMD check --as-cran
validate 1.1.3 (2023-03-28)
- 'violating', 'satisfying' and 'lacking' are now generic (useful for validatedb)
- as.data.frame,confrontation-method now includes results of rules that yielded
a warning. (Thanks to Patrick Driessens for reporting)
- as.data.frame,confrontation-method now returns a zero-row data frame when
the confrontation object has no results or only errors. (Thanks to Patrick
Driessens for reporting)
- Fix: rules are checked when read from file (was skipped in earlier versions)
- Cookbook updates: removed reference to the FAO SDMX server as it appears to no
longer exist. Updated references and links as journal papers are now
published.
validate 1.1.1 (2022-03-24)
- 'aggregate' with argument by="rule" now ensures that the rule-by-rule
output is in the same order as in the confrontation object (they
used to be grouped by dimension structure). This fixes an issue
where plots of validation objects would have the wrong placement
of y-axis labels. (Thanks to Jonas Klingwort for reporting).
- Fix in plot method for 'validation' objects: test for errors yielded a
warning.
validate 1.1.0 (2021-10-07)
- Added support for SDMX codelists from SDMX REST APIs, see
'sdmx_codelist', 'estat_codelist', 'global_codelist'.
- plot-validation method now prints a message to console when one or more
rules yielded an error so it's explicit that not all data can be plotted.
(Thanks to John Kennedy for pointing this out).
- Fix: 'aggregate' would fail for validations based on empty data frames (thanks
to Matthias Gomolka for reporting).
- Fix: empty names in a data frame passed as '.data' to 'validator' would
yield empty names in the resulting 'validator' object. (Thanks to Wytze
Gelderloos for reporting)
validate 1.0.4 (2021-04-29)
- Added 'rmarkdown' to 'Suggests' as requested by CRAN
- Small updates for compatibility with new version of 'settings'
validate 1.0.2 (2021-03-30)
- Updated CITATION since paper will be published by JSS (Thanks to Achim Zeileis).
- Improved 'print' method of all objects inheriting from 'confrontation' (#129)
- 'compare' now also works with 'tibble' objects -- which disguise themselves
as data frames but behave differently.
- Fix: add_indicators() would fail or not add results properly in some
cases. (Thanks to Jos de Waard for reporting #128).
- Fix: 'as.data.frame' would fail for validator objects where at least
one of the rules contained a var_group(). (Thanks to Matthias Gomolka
for reporting). The issue with 'var_group', and ':=' assignments is that
the expanded list of expressions generated by a validator is not one-to-one
with the original list of expressions in the validator. Therefore the
(internal) '$exprs()' method now adds an attribute to each expression
it outputs, called 'reference', that is an index into the original
validator object.
- Fix: Parsing yaml files would crash when the 'expr' field is empty
(Thanks to Matthias Gomolka for reporting).
validate 1.0.1 (2020-12-08)
- Fixed URL issues caught by CRAN but not by R CMD check --as-cran.
validate 1.0.0
- Added The Data Validation Cookbook as vignette. See also
<https://data-cleaning.github.io/validate>
- Improved visualisation of validation output using 'plot'. The 'barplot'
method for 'confrontation' objects is now deprecated and will be removed
eventually.
- New functions 'violating', 'satisfying', 'lacking' to select records
based on 'validator' objects or the outcome of validations ('validation'
objects).
- New function 'hierarchy' checks aggregations against hierarchical code lists.
Supports globbing and regular expressions to define parent-child relations.
- New functions 'in_linear_sequence' and 'is_linear_sequence'
check for gaps/duplicates in numerical or time series.
- New function 'in_range' checks whether a variable is between bounds.
- New function 'part_whole_relation' facilitates checking aggregates
against details, for data in 'long' format.
- New helper functions for split-apply-combine on vectors, with the property
that output is always of the same size as the input: 'sum_by', 'min_by', 'max_by',
'mean_by' and the general 'do_by'.
- New function 'hb' implementing the Hidiroglou-Berthelot outlier measure.
- New convenience function 'field_length' checks number (or range) of code
points.
- New convenience function 'number_format' with simple syntax for
specifying layout of numbers that are stored as character.
- New functions to check (non)existence of a given set of key combinations:
'contains_exactly', 'contains_at_least', 'contains_at_most', 'does_not_contain'.
Support for globbing and regular expressions is built in, and tests can be
performed group-wise.
- New function 'field_format': convenience wrapper for 'grepl' that also supports
globbing.
- New dataset 'samplonomy' with simulated national account figures of the
island of Samplonia in long format, and with errors.
- New dataset 'nace_rev2', representing the 2008 (revision 2) NACE hierarchy
for classification of economic activity.
- Added 'names' method for 'confrontation' objects.
- Fix: option 'lin.ineq.eps' was ignored by 'confront'.
- Fix: indexing of 'confrontation' objects failed in 'lapply' context. (Thanks
to Matthias Gomolka for GH issue #116)
- Package now depends on R >= 3.5.0 because of the samplonomy RData file.
- Breaking change: for consistency, the '...' argument in 'exists_one' and 'exists_any'
is replaced with 'by'. This only affects cases with multiple grouping variables.
- Internal: occurrences of x$y in reference data are now parsed to x[["y"]] (used to be x[,"y"]).
validate 0.9.3 (2019-12-16)
- New functions 'exists_any' and 'exists_one' to help define cross-record
validation rules (thanks to David Salgado)
- results of 'sort' and 'aggregate' now include key columns (if any)
- Added accepted JSS paper as vignette.
- Added CITATION file
validate 0.9.2 (2019-08-28)
- fixed CRAN issue: plots generated during tests would end up under inst/tinytest
which is against CRAN policy. (Thanks to Kurt Hornik).
validate 0.9.1 (2019-08-23)
- Fixed nonstandard URL caught on CRAN
- Added ORCID, link to book in DESCRIPTION
validate 0.9.0
- 'confront' now accepts a vector of 'key's (thanks to Luca Gramaglia for
suggesting)
- new function 'run_validation_file' runs an R script while capturing output
of 'confront' calls.
- Objects inheriting from 'confrontation' now store event metadata. See ?event.
- Objects of class 'validator' now store (default) metadata on expression
language and rule severity.
- Reference data passed to 'confront' can now be of any type (not just data.frame).
This breaks code where both 'ref' and 'key' were submitted and reference data is
expected to be matched with data under scrutiny.
- new helper functions: 'is_unique', 'all_unique', 'is_complete', 'all_complete'
- 'lbj_cells' object now has an (experimental) $plot() method.
- 'lbj_cells' and 'lbj_rules' have new slot 'label', to comply with the 'lumberjack' update.
- Standard files for output by 'lbj_cells' and 'lbj_rules' now have 'lbj_' prefix.
- switched to 'tinytest' testing framework
- Fix in algorithm for clustering rules by shared variables #98.
- Fix in 'all' and 'any' methods for confrontation objects #92 (Thanks to
GitHub user 001ben for reporting).
- Fix: printing confront(data.frame(), validator()) would crash (thanks to Daniel Pritchard)
validate 0.2.6 (2018-08-01)
- New methods 'all' and 'any' for 'validation' objects
- New 'plot' method for 'validator' objects.
- New 'plot' method for 'validation' objects.
- New 'as.data.frame' methods for 'validatorComparison' and 'cellComparison'
objects.
- New 'plot' method for 'validatorComparison' and 'cellComparison' objects.
- New 'barplot' method for 'validatorComparison' and 'cellComparison' objects.
- Improved 'plot'/'barplot' method for objects of class 'validator'.
- Bugfixes in 'cells' and 'compare'.
- The row order of output in 'cells' has changed to be more consistent with
row order in the output of 'compare'.
- The row 'new_missing' is renamed 'removed' in 'cellComparison' objects.
- Improved cross-references in reference manual
validate 0.2.5 (2018-06-22)
- Documentation updates (Thanks to Matthias Gomolka)
- Removed deprecated functions 'number_missing', 'fraction_missing',
'row_missing', 'number_unique', 'any_duplicated', 'any_missing'.
- Bugfix: the 'call' object in a 'confrontation' object was created incorrectly.
This only affects printing.
validate 0.2.4 (2018-03-28)
- New set membership operator "%vin%", behaving consistently with "=="
- Set membership operator %in% is now replaced with %vin% (#80).
- Added '[[<-' and '[<-' methods for validator objects.
- Validation rules can now be endowed with user-defined metadata, see ?meta (#59).
- fixed (harmless) compiler warning (thanks to Brian Ripley)
- bugfix: empty [] selection on expressionset crashed (thanks to Anne Petersen)
- bugfix: NA in validating expression would crash parser (thanks to Masafumi Okada)
validate 0.2.0 (2017-08-07)
- new object of class 'indicator'
- validation rules now support if-then-else syntax.
- validation rules can now contain embedded if-statements, e.g. A | (if (B) C)
- expressionset objects (validator, indicator) can be combined using '+'
- results of getter functions (e.g. 'labels') are now named.
- less verbose printing of options for expressionset objects (validator, indicator)
- expressionset objects (validator, indicator) can now be coerced to data.frame.
- confrontation objects (validation, indication) can now be coerced to data.frame.
- safer expression manipulation under the hood.
- breaking: 'rule' column now called 'name' in summary of 'confrontation' objects.
- bugfix: character indexing with multiple elements of expressionset objects would crash.
- bugfix: proper recycling for replace methods in expressionset.
validate 0.1.8
- Added loggers for the 'lumberjack' package: lbj_cells and lbj_rules
- Tolerance for checking (non-strict) linear inequalities is now controlled by voptions(lin.ineq.eps).
- Macro assignments through ':=' are no longer put in brackets.
validate 0.1.7 (2017-04-07)
- The missingness counters are now only internally documented and will be deprecated (the
introduction of the '.' made them more or less obsolete).
- Package now 'depends' on methods to allow dispatch on objects inheriting from data.frame
- bugfix: The assignment created(validator) <- POSIXct was broken. #65, thanks to Andrew R Gibson.
- bugfix: Simple equality checks on character data behaved unexpectedly. #67, thanks to Kevin Kuo.
- Native routines now registered as required by updated CRAN policy.
validate 0.1.5 (2016-06-24)
- The '.' is now used to reference the validated data set as whole.
- Small change in output of 'compare' to match the table in van den Broek et al. (2013)
- registered native routines as now recommended by CRAN
validate 0.1.4 (2016-04-15)
- 'confront' now emits a warning when a variable name conflicts with the name of a reference data set
- Deprecated 'validate_reset', in favour of the shorter 'reset' (use 'validate::reset' in case of ambiguity)
- Deprecated 'validate_options' in favour of the shorter 'voptions'
- New option na.value with default value NA, controlling the output when a rule evaluates to NA.
- Added rules from the ESSnet on validation (deliverable 17) to automated tests.
- added 'grepl' to allowed validation syntax (suggested by Dusan Sovic)
- exported a few functions w/ keywords internal for extensibility
- Bugfix: blocks sometimes reported the wrong number of blocks (in case of a single connected block).
- Bugfix: macro expansion failed when macros were reused in other macros.
- Bugfix: certain nonlinear relations were recognized as linear
- Bugfix: rules that use (anonymous) function definitions raised error when printed.
validate 0.1.3 (2015-09-09)