Machine learning teaches computers to behave like humans by feeding them historical data and forecasting what might happen in the future. This section will look at fascinating machine learning algorithms such as the Backtracking, AC-3, and SimHash algorithms.

Backtracking algorithm

Backtracking is an algorithmic strategy for solving problems recursively by attempting to create a solution progressively. one component at a time and discarding alternatives that fail to satisfy the problem's criteria at any given stage. It is essential for solving constraint satisfaction problems like crosswords, verbal arithmetic, Sudoku, and many other puzzles. In addition, it is often the easiest way to solve combinatorial optimization problems like parsing, the knapsack problem, and others. It is also the basis for languages like Icon, Planner, and Prolog, called "logic programming languages."

Backtracking is based on "black box procedures" given by the user, which describe the problem to be solved, the kind of partial candidates, and how they are turned into complete candidates. It is not a specific algorithm but rather a metaheuristic. 

AC-3 algorithm

AC-3 works with constraints, variables, and the domains of the variables (scopes). A constraint is a relationship that tells a variable what values it can take. The value of other variables may be part of the constraint.

During the algorithm, the current state of the CSP can be thought of as a directed graph and the edges, or arcs, are symmetric constraints that link the variables:

  1. AC-3 looks at the paths between each pair of variables (x, y).
  2. It takes out from the domain of x the values that don't fit the rules between x and y.
  3. The algorithm keeps a list of arcs that still need to be checked since the ranges of the variables are finite at each step.

SimHash algorithm

SimHash is a method used in computer science to figure out how similar two sets are quick. For example, the Google Crawler uses the algorithm to find pages that are almost the same as others. Moses Charikar came up with it. Google said in 2021 that they planned to use the algorithm in their new FLoC (Federated Learning of Cohorts) system.

SimHash is a hashing function, and its property is that the Hamming distance between two hashes is smaller the more similar the text inputs are.

  • The algorithm works by breaking the text into chunks and hashing each piece with a hashing function of your choice.
  • Each hashed chunk is shown as a binary vector, and the bit values are changed to +1 or -1.

Google did a large-scale evaluation in 2006 to compare how well the Minhash and Simhash algorithms worked. In 2007, Google said it used Simhash to find duplicates when crawling the web and Minhash and LSH to personalize Google News.

Want to publish your content?

Publish an article and share your insights to the world.

ALSO EXPLORE

DISCLAIMER

The information provided on this page has been procured through secondary sources. In case you would like to suggest any update, please write to us at support.ai@mail.nasscom.in