Let's start with the Levenshtein distance algorithm for comparing two texts. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. Wikipedia has a clear explanation of the algorithm.
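
To make the definition concrete, here is a minimal sketch of the textbook recursive formulation. It is for illustration only: the class name NaiveLevenshtein is made up for this example, and the recursion takes exponential time, so it is only suitable for very short words.

```java
// Illustration only: the textbook recursive definition, exponential in time.
class NaiveLevenshtein {

    // Distance between the first i chars of a and the first j chars of b.
    static int distance(String a, int i, String b, int j) {
        if (i == 0) return j; // insert the remaining j chars of b
        if (j == 0) return i; // delete the remaining i chars of a
        int substitution = distance(a, i - 1, b, j - 1)
                + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        int deletion = distance(a, i - 1, b, j) + 1;
        int insertion = distance(a, i, b, j - 1) + 1;
        return Math.min(substitution, Math.min(deletion, insertion));
    }

    static int distance(String a, String b) {
        return distance(a, a.length(), b, b.length());
    }

    public static void main(String[] args) {
        // kitten -> sitten -> sittin -> sitting: three edits
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

The three recursive calls correspond to the three allowed edits: substitution, deletion, and insertion.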

Method to find Levenshtein Distance

    public static int LevenshteinDistance(String s0, String s1) {

        int len0 = s0.length() + 1;
        int len1 = s1.length() + 1;

        // the array of distances
        int[] cost = new int[len0];
        int[] newcost = new int[len0];

        // initial cost of skipping prefix in String s0
        for (int i = 0; i < len0; i++)
            cost[i] = i;

        // dynamically computing the array of distances

        // transformation cost for each letter in s1
        for (int j = 1; j < len1; j++) {

            // initial cost of skipping prefix in String s1
            newcost[0] = j;

            // transformation cost for each letter in s0
            for (int i = 1; i < len0; i++) {

                // matching current letters in both strings
                int match = (s0.charAt(i - 1) == s1.charAt(j - 1)) ? 0 : 1;

                // computing cost for each transformation
                int cost_replace = cost[i - 1] + match;
                int cost_insert = cost[i] + 1;
                int cost_delete = newcost[i - 1] + 1;

                // keep minimum cost
                newcost[i] = Math.min(Math.min(cost_insert, cost_delete),
                        cost_replace);
            }

            // swap cost/newcost arrays
            int[] swap = cost;
            cost = newcost;
            newcost = swap;
        }

        // the distance is the cost for transforming all letters in both strings
        return cost[len0 - 1];
    }

Percentage of Text Match 

    public static int pecentageOfTextMatch(String s0, String s1) {
        // Trim and collapse runs of whitespace into a single space
        s0 = s0.trim().replaceAll("\\s+", " ");
        s1 = s1.trim().replaceAll("\\s+", " ");
        // Scale the distance by the combined length of both strings
        float distance = LevenshteinDistance(s0, s1);
        return (int) (100 - distance * 100 / (s0.length() + s1.length()));
    }
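
The arithmetic behind the percentage can be checked by hand. The sketch below isolates the formula and the whitespace normalization. The class TextMatchArithmetic and its helper names are made up for this example, and the distance is passed in directly rather than computed.

```java
class TextMatchArithmetic {

    // Same formula as pecentageOfTextMatch, with the distance passed in
    static int percentage(int distance, int len0, int len1) {
        return (int) (100 - distance * 100f / (len0 + len1));
    }

    // Same whitespace normalization as pecentageOfTextMatch
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("  I   am \t here ")); // I am here
        // 30 deletions turn a 39-character text into a 9-character one
        System.out.println(percentage(30, 39, 9)); // 100 - 3000/48 = 37.5, truncated to 37
        System.out.println(percentage(0, 9, 9));   // identical strings: 100
        System.out.println(percentage(9, 9, 9));   // every character differs: 50
    }
}
```

Because the distance is divided by the combined length of both strings, two equal-length strings with nothing in common still score 50 rather than 0. Dividing by the longer length instead would be one way to spread the scores over the full 0 to 100 range, if that suits your use case better.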

Percentage of Match between array of Strings 

  1. Get as0 and as1 (arrays of Strings)
  2. Calculate the String frequencies of as0 and as1 in HashMaps hm0 and hm1
  3. Calculate the frequency difference between hm0 and hm1 in a diff HashMap
  4. Calculate the total frequency difference (the sum of the diff frequencies and the frequencies remaining in hm1)
  5. Calculate the percentage of match

    // Requires: import java.util.HashMap; import java.util.Map;
    public static int pecentageOfMatch(String[] as0, String[] as1) {
        // String frequency of as0
        Map<String, Integer> hm0 = new HashMap<String, Integer>();
        for (String s : as0) {
            Integer count = hm0.get(s);
            hm0.put(s, count == null ? 1 : count + 1);
        }

        // String frequency of as1
        Map<String, Integer> hm1 = new HashMap<String, Integer>();
        for (String s : as1) {
            Integer count = hm1.get(s);
            hm1.put(s, count == null ? 1 : count + 1);
        }

        // Frequency difference between hm0 and hm1, stored in diff.
        // Each key handled here is removed from hm1, so hm1 ends up
        // holding only the Strings that never occur in as0.
        Map<String, Integer> diff = new HashMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : hm0.entrySet()) {
            String key = pair.getKey();
            int value = pair.getValue();
            Integer value1 = hm1.remove(key);
            diff.put(key, value1 == null ? value : Math.abs(value1 - value));
        }

        // Sum all remaining String frequencies in hm1
        int val = 0;
        for (int count : hm1.values()) {
            val += count;
        }

        // Sum all frequencies in diff
        for (int count : diff.values()) {
            val += count;
        }

        // Calculate match percentage
        int per = (int) ((val * 100f) / (as0.length + as1.length));
        return 100 - per;
    }
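
The five steps are easier to follow on a tiny input. The sketch below is a compact restatement of the same computation, not the method above: the class FrequencyMatchDemo and its helpers are made up for this example, and it assumes Java 8 or later for Map.merge and Map.getOrDefault.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FrequencyMatchDemo {

    // Count how often each String occurs
    static Map<String, Integer> frequencies(String[] items) {
        Map<String, Integer> counts = new HashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // Sum of |count in as0 - count in as1| over every distinct String
    static int totalDifference(String[] as0, String[] as1) {
        Map<String, Integer> hm0 = frequencies(as0);
        Map<String, Integer> hm1 = frequencies(as1);
        Set<String> keys = new HashSet<>(hm0.keySet());
        keys.addAll(hm1.keySet());
        int total = 0;
        for (String key : keys) {
            total += Math.abs(hm0.getOrDefault(key, 0) - hm1.getOrDefault(key, 0));
        }
        return total;
    }

    static int percentage(String[] as0, String[] as1) {
        int per = (int) (totalDifference(as0, as1) * 100f / (as0.length + as1.length));
        return 100 - per;
    }

    public static void main(String[] args) {
        String[] as0 = {"a", "b", "b"};
        String[] as1 = {"b", "c"};
        // a: |1 - 0| = 1, b: |2 - 1| = 1, c: |0 - 1| = 1
        System.out.println(totalDifference(as0, as1)); // 3
        // 3 * 100 / (3 + 2) = 60, so the match is 100 - 60
        System.out.println(percentage(as0, as1));      // 40
    }
}
```

Note that the comparison ignores order: two arrays holding the same Strings in a different sequence score 100.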

Percentage of Word Match :

It splits two sentences into words and reports how closely those words match.

    public static int pecentageOfWordMatch(String s0, String s1) {
        // Replace all . ? ! with spaces, then trim, to make splitting into words easy
        s0 = s0.replaceAll("[.?!]", " ").trim();
        s1 = s1.replaceAll("[.?!]", " ").trim();
        // Split on runs of whitespace so that "here. I" does not yield an empty word
        String[] as0 = s0.split("\\s+");
        String[] as1 = s1.split("\\s+");
        return pecentageOfMatch(as0, as1);
    }
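
The way the text is split affects the word counts, so the tokenization step is worth a quick look. In this sketch (WordSplitDemo and words are made-up names), splitting on a single space leaves an empty token wherever punctuation is followed by a space, while splitting on runs of whitespace does not.

```java
import java.util.Arrays;

class WordSplitDemo {

    // Replace sentence punctuation with spaces, trim, then split on whitespace runs
    static String[] words(String s) {
        return s.replaceAll("[.?!]", " ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "I work here. I am here!";
        // Single-space split: an empty token appears between "here" and "I"
        System.out.println(Arrays.toString(
                text.trim().replaceAll("[.?!]", " ").split(" ")));
        // prints [I, work, here, , I, am, here]

        // Whitespace-run split: only real words remain
        System.out.println(Arrays.toString(words(text)));
        // prints [I, work, here, I, am, here]
    }
}
```

An empty token would be counted as a word of its own by the frequency comparison, which skews the percentage slightly.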

Percentage of Sentence Match :

It splits two texts into sentences and reports how closely those sentences match.

    public static int pecentageOfSentenceMatch(String s0, String s1) {
        // Trim and replace all . ? ! with ". " to make splitting into sentences easy
        s0 = s0.trim().replaceAll("[.?!]", ". ");
        s1 = s1.trim().replaceAll("[.?!]", ". ");
        // Split on whitespace that follows a period and precedes a letter
        String[] as0 = s0.split("(?i)(?<=[.])\\s+(?=[a-zA-Z])");
        String[] as1 = s1.split("(?i)(?<=[.])\\s+(?=[a-zA-Z])");
        return pecentageOfMatch(as0, as1);
    }
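
To see what the split pattern produces, here is a small sketch that applies the same normalization and regular expression in isolation. SentenceSplitDemo and sentences are made-up names for this example.

```java
import java.util.Arrays;

class SentenceSplitDemo {

    // Same normalization and split pattern as pecentageOfSentenceMatch
    static String[] sentences(String s) {
        return s.trim().replaceAll("[.?!]", ". ")
                .split("(?i)(?<=[.])\\s+(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        // "here.I" gains a space after the period, which becomes the split point
        System.out.println(Arrays.toString(
                sentences("I am engineer and I work here.I am here")));
        // prints [I am engineer and I work here., I am here]

        // A text that ends in punctuation keeps a trailing ". " on its last sentence
        System.out.println("[" + sentences("I am here!")[0] + "]");
        // prints [I am here. ]
    }
}
```

Sentences are compared as exact Strings, so "I am here" and "I am here. " count as different sentences. If that matters for your texts, stripping the trailing period and whitespace from each sentence before comparing is one possible refinement.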

Test

    String s0 = "I am engineer and I work here.I am here";
    String s1 = "I am here";
    System.out.println(LevenshteinDistance(s0, s1));      // 30
    System.out.println(pecentageOfTextMatch(s0, s1));     // 37
    System.out.println(pecentageOfWordMatch(s0, s1));     // 47
    System.out.println(pecentageOfSentenceMatch(s0, s1)); // 67

Comments:

  1. Great post, thanks for sharing.
    There is a video that explains the Levenshtein distance algorithm. It is in Spanish, but it is very good.

    Part 1 - https://www.youtube.com/watch?v=4oTFJOQpmRY
    Part 2 - https://www.youtube.com/watch?v=83PnEZNsa-8