Release note: loose checking now compares documents word by word instead of character by character. The benchmark now uses strict checking by default instead of loose checking.
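To illustrate the difference between the two granularities, here is a minimal sketch. It is not the gtchecker implementation: the function names are hypothetical, and Python's standard `difflib.SequenceMatcher` is used only as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Similarity ratio computed over individual characters (hypothetical helper)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def word_similarity(a: str, b: str) -> float:
    """Similarity ratio computed over whitespace-separated words (hypothetical helper)."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


# "cat" and "cut" share two of three characters, but they are different words.
print(char_similarity("cat", "cut"))  # greater than zero
print(word_similarity("cat", "cut"))  # 0.0
```

Under this sketch, a character-level comparison gives partial credit to words that merely look alike, while a word-level comparison does not.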
Release note: we rebuilt the code and API of gtchecker and removed the Min Edit Distance method because our previous implementation contained a mathematical mistake. In version 0.30, gtchecker supports two checking modes, strict mode and loose mode, and users can choose the mode that fits their needs. We also added a new method called high frequency words signature (HFWS), which aims to reduce the computation and storage overhead of checking a large number of documents. Note that HFWS is still under research and development.
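The note does not describe how HFWS works. One plausible reading, sketched here purely as an assumption, is that each document is reduced to a small set of its most frequent words, so that documents can be compared through these compact signatures rather than their full text. The function names and the parameter `k` are hypothetical.

```python
from collections import Counter


def hfws_signature(text: str, k: int = 3) -> frozenset:
    """Return the k most frequent words of text as a compact signature (assumed design)."""
    counts = Counter(text.lower().split())
    return frozenset(word for word, _ in counts.most_common(k))


def signature_overlap(sig_a: frozenset, sig_b: frozenset) -> float:
    """Share of words common to both signatures, relative to their union."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0
```

Storing only `k` words per document keeps the per-document footprint constant regardless of document length, which matches the stated goal of lowering computation and storage overhead.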
Release note: we ported the GenText-Checker code from Python to C++, achieving an order-of-magnitude performance speedup. We also added an ensemble method that automatically selects the similarity calculator based on the size of the input documents. We removed the Jaccard similarity calculator.
Release note: the first release of the GenText-Checker package, which contains a basic benchmark for different plagiarism-checking methods, including cosine similarity, Jaccard similarity, and minimum edit distance.
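For readers unfamiliar with the three methods named in this note, the following is a minimal sketch of their textbook definitions over whitespace-separated words (and characters, for edit distance). The function names and tokenization are assumptions and do not reflect the package's actual API.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of a and b."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]
```

The two similarity functions return values between 0 and 1, where higher means more alike; the edit distance is a count, where lower means more alike.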
Release note: the first release of the libsig package, which contains a basic Python library for generating public and private keys, signing files, and verifying signatures.