inputfixer

Input Fixer (qurry.qurrium.utils.inputfixer)

qurry.qurrium.utils.inputfixer.damerau_levenshtein_distance(seq1: Sequence[str], seq2: Sequence[str]) -> int

Calculate the Damerau-Levenshtein distance between sequences. This distance is the number of additions, deletions, substitutions, and transpositions needed to transform the first sequence into the second.

If you want to compare long strings, we recommend using RapidFuzz instead of this function. This function is designed for suggesting corrections to short inputs and does not handle very long strings well.
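The distance can be written as a dynamic-programming recurrence. The form below is the restricted variant (optimal string alignment), in which a transposition swaps two adjacent items; that this function computes exactly this variant is our assumption. Here a = seq1 has length N, b = seq2 has length M, [a_i ≠ b_j] is 1 when the items differ and 0 otherwise, and the distance is d(N, M).

```latex
d(i, 0) = i, \qquad d(0, j) = j,
d(i, j) = \min
\begin{cases}
  d(i-1, j) + 1 & \text{deletion} \\
  d(i, j-1) + 1 & \text{insertion} \\
  d(i-1, j-1) + [a_i \neq b_j] & \text{substitution or match} \\
  d(i-2, j-2) + 1 & \text{transposition, if } i, j > 1,\ a_i = b_{j-1},\ a_{i-1} = b_j
\end{cases}
```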

Parameters:
  • seq1 (Iterable) – Sequence of items to be compared.

  • seq2 (Iterable) – Sequence of items to be compared.

Returns:

The distance between the two sequences.

Return type:

int

qurry.qurrium.utils.inputfixer.damerau_levenshtein_distance_py(seq1: Sequence[str], seq2: Sequence[str]) -> int

Calculate the Damerau-Levenshtein distance between sequences.

This distance is the number of additions, deletions, substitutions, and transpositions needed to transform the first sequence into the second. Although generally used with strings, any sequences of comparable objects will work.

Transpositions are exchanges of consecutive characters; all other operations are self-explanatory.

This implementation is O(N*M) time and O(M) space, for N and M the lengths of the two sequences.
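The three-row idea behind the O(M) space bound can be sketched as follows. This is a minimal illustration, not the library's source: the name osa_distance is hypothetical, and we assume the restricted variant (optimal string alignment). Only the rows for i - 2, i - 1 and i are kept, because a transposition looks back exactly two rows.

```python
from collections.abc import Sequence


def osa_distance(seq1: Sequence, seq2: Sequence) -> int:
    """Hypothetical sketch: restricted Damerau-Levenshtein (optimal string alignment).

    Keeps three rows of the DP table, so extra memory is O(len(seq2)).
    """
    m = len(seq2)
    prev2: list[int] = []          # row i - 2 (unused until i > 1)
    prev = list(range(m + 1))      # row 0: distance from an empty prefix to seq2[:j]
    for i in range(1, len(seq1) + 1):
        curr = [i] + [0] * m       # column 0: delete the first i items of seq1
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
            # Transposition of two adjacent items.
            if (
                i > 1
                and j > 1
                and seq1[i - 1] == seq2[j - 2]
                and seq1[i - 2] == seq2[j - 1]
            ):
                curr[j] = min(curr[j], prev2[j - 2] + 1)
        prev2, prev = prev, curr
    return prev[m]
```

Under this restricted variant, osa_distance('ca', 'abc') is 3, while the unrestricted Damerau-Levenshtein distance is 2, because the restricted form never edits a pair of items again after transposing them. The sketch is therefore not suitable when the unrestricted distance is required.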

>>> damerau_levenshtein_distance_py('ba', 'abc')
2
>>> damerau_levenshtein_distance_py('fee', 'deed')
2

It works with arbitrary sequences too:

>>> damerau_levenshtein_distance_py('abcd', ['b', 'a', 'c', 'd', 'e'])
2

This implementation is based on Michael Homer’s implementation (https://web.archive.org/web/20150909134357/http://mwh.geek.nz:80/2009/04/26/python-damerau-levenshtein-distance/) and inspired by https://github.com/lanl/pyxDamerauLevenshtein, a Cython implementation of the same algorithm.

For more powerful string comparison, including the Levenshtein distance, we recommend RapidFuzz (https://github.com/maxbachmann/RapidFuzz), a library that wraps C++ implementations of the Levenshtein algorithm and other string-processing functions. It is currently the most efficient Python implementation (using Cython).

Parameters:
  • seq1 (Iterable) – Sequence of items to be compared.

  • seq2 (Iterable) – Sequence of items to be compared.

Returns:

The distance between the two sequences.

Return type:

int

qurry.qurrium.utils.inputfixer.outfields_check(outfields: dict[str, Any], infields: Sequence[str], simialrity_threshold: int = 2) -> tuple[dict[str, list[str]], list[str]]

Check, using the Damerau-Levenshtein distance, whether any of the outfields are likely misspellings of infields.

Parameters:
  • outfields (dict[str, Any]) – The outfields of the experiment.

  • infields (Iterable[str]) – The infields of the experiment.

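  • simialrity_threshold (int, optional) – Distance threshold: how close, in Damerau-Levenshtein distance, an outfield must be to an infield to be reported as a likely typo. Defaults to 2.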

Returns:

outfields_maybe:

The outfields that are likely misspellings of infields.

outfields_unknown:

The outfields that are not in the infields.

Return type:

tuple[dict[str, list[str]], list[str]]

qurry.qurrium.utils.inputfixer.outfields_hint(outfields_maybe: dict[str, list[str]], outfields_unknown: list[str], mute_outfields_warning: bool = False) -> None

Print the outfields that are likely misspellings of infields.

Parameters:
  • outfields_maybe (dict[str, list[str]]) – The outfields that are likely misspellings of infields.

  • outfields_unknown (list[str]) – The outfields that are not in the infields.

  • mute_outfields_warning (bool, optional) – Suppress the warning about unrecognized arguments. Defaults to False.
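A hypothetical stand-in for outfields_hint, showing how mute_outfields_warning might gate the warning. The function name, the message wording, and the use of Python's warnings module with UserWarning are assumptions; unlike the documented function, which returns None, the sketch returns the printed lines so that they can be inspected.

```python
import warnings


def outfields_hint_sketch(
    outfields_maybe: dict[str, list[str]],
    outfields_unknown: list[str],
    mute_outfields_warning: bool = False,
) -> list[str]:
    """Hypothetical stand-in for outfields_hint.

    Prints one suggestion per likely typo and, unless muted, emits a
    warning listing the unrecognized arguments.
    """
    printed: list[str] = []
    for name, candidates in outfields_maybe.items():
        line = f"'{name}' is not a known argument; did you mean {', '.join(candidates)}?"
        print(line)
        printed.append(line)
    # Only the warning about unrecognized arguments is affected by the mute flag.
    if outfields_unknown and not mute_outfields_warning:
        warnings.warn(
            f"Unrecognized arguments: {', '.join(outfields_unknown)}",
            UserWarning,
        )
    return printed
```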