Machine learning allows computers to mimic human behaviour by training them on historical data so they can anticipate potential future events. This part explores fundamental computing techniques that support such large-scale data processing: distributed web crawling, selection algorithms, and sorting algorithms.

Distributed web crawling

Distributed web crawling is a distributed computing technique in which Internet search engines employ many computers to crawl the web. Users can offer their processing and bandwidth resources to crawl web pages in such systems. Spreading the load of these jobs across numerous processors reduces the costs that would otherwise come with operating large computing clusters.

With this approach, a central server dynamically assigns new URLs to the individual crawlers, which lets it balance the load on each crawler. With dynamic assignment, systems can typically add or remove downloader processes at run time. For very large crawls, however, the central server itself may become the bottleneck, so the majority of the workload must be pushed out to the distributed crawling processes.
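To make the dynamic-assignment idea concrete, here is a minimal Python sketch in which an in-process queue stands in for the central server and worker threads stand in for the distributed crawlers. All names (url_frontier, crawler_worker) and the seed URLs are hypothetical; a real system would communicate over the network and actually fetch pages.

    import queue
    import threading

    # Hypothetical stand-in for the central server's pool of URLs to crawl.
    url_frontier = queue.Queue()
    for seed in ["https://example.com/a", "https://example.com/b",
                 "https://example.com/c"]:
        url_frontier.put(seed)

    def crawler_worker(worker_id: int) -> None:
        """Repeatedly request the next URL from the central queue."""
        while True:
            try:
                url = url_frontier.get(timeout=1)  # dynamic assignment
            except queue.Empty:
                return                             # no more work: worker exits
            print(f"worker {worker_id} crawling {url}")
            url_frontier.task_done()

    # Workers can be added or removed at any time, which is the point of
    # dynamic assignment; the shared queue balances the load automatically.
    workers = [threading.Thread(target=crawler_worker, args=(i,))
               for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()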

Selection algorithm

A selection algorithm in computer science is an algorithm for finding the kth smallest value in a list or array; this value is known as the kth order statistic. Special cases include finding the minimum, maximum, and median elements. There are O(n)-time (worst-case linear time) selection algorithms, and structured data can achieve sublinear performance; in the extreme, an array of already sorted data allows O(1) selection. Selection is a subproblem of larger problems such as nearest neighbour and shortest path. Many selection algorithms are derived by generalizing a sorting algorithm, while some sorting algorithms can conversely be derived by applying selection repeatedly.
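To illustrate the definition, here is a small Python sketch (the name kth_smallest_by_sorting and the sample data are assumptions for illustration): a simple O(n log n) route is to sort and index, while data that is already sorted allows constant-time selection.

    def kth_smallest_by_sorting(values, k):
        """Return the kth order statistic (1-indexed) by sorting: O(n log n)."""
        return sorted(values)[k - 1]

    # On data that is already sorted, selection is a single index lookup: O(1).
    sorted_data = [3, 7, 9, 12, 20]
    median = sorted_data[len(sorted_data) // 2]          # 9

    print(kth_smallest_by_sorting([9, 1, 8, 2, 7], 2))   # 2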

The most straightforward selection algorithm iterates through the list while keeping track of the running minimum (or maximum); it is closely related to selection sort. Finding the median, on the other hand, is the hardest case for a selection method. In fact, a specialized median-selection method can be used to build a general selection algorithm, as in median of medians. Quickselect, which is related to Quicksort, is the best-known selection algorithm; like Quicksort, it has (asymptotically) excellent average performance but poor worst-case performance, though it can be tweaked to achieve optimal worst-case performance as well.
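Below is a minimal, illustrative Python implementation of Quickselect with a random pivot; the function name and the 1-indexed convention are assumptions of this sketch, not a reference implementation.

    import random

    def quickselect(values, k):
        """Return the kth smallest element (1-indexed) of values.

        Average-case O(n); worst-case quadratic, like Quicksort, unless
        the pivot choice is hardened (e.g. with median of medians).
        """
        assert 1 <= k <= len(values)
        pivot = random.choice(values)
        smaller = [v for v in values if v < pivot]
        equal = [v for v in values if v == pivot]
        larger = [v for v in values if v > pivot]
        if k <= len(smaller):
            return quickselect(smaller, k)
        if k <= len(smaller) + len(equal):
            return pivot
        return quickselect(larger, k - len(smaller) - len(equal))

    print(quickselect([7, 1, 5, 3, 9], 3))  # 5, the median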

Sorting algorithm

A sorting algorithm in computer science is an algorithm that puts the elements of a list into an order. The most common orders are numerical and lexicographical, either ascending or descending. Efficient sorting is important for improving the efficiency of other algorithms that require their input data to be sorted (such as search and merge algorithms). Sorting is also frequently used to canonicalize data and to produce human-readable output.
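As an illustration of why sorted input matters, the following Python snippet (using only the standard library's bisect module; the sample data is made up) sorts once and then performs an O(log n) binary search, which requires sorted input.

    import bisect

    # Sorting once enables fast searching afterwards: binary search needs
    # sorted input and runs in O(log n) per lookup.
    data = [42, 7, 19, 3, 25]
    data.sort()                        # O(n log n), done once
    index = bisect.bisect_left(data, 19)
    print(index, data[index])          # 2 19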

Formally, the result of every sorting algorithm must meet two requirements (a small checker for both is sketched after this list):

  • The output is in monotonic order (each element is no smaller than the previous one for ascending order, and no larger for descending order).
  • The output is a permutation of the input (a reordering that keeps all the original elements).
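
The following small Python checker, a sketch with hypothetical names such as is_valid_sort, verifies both requirements for a candidate output against the original input.

    from collections import Counter

    def is_valid_sort(input_list, output_list):
        """Check the two formal requirements on a sorting algorithm's output."""
        # Requirement 1: monotonic (here ascending) order.
        monotonic = all(a <= b for a, b in zip(output_list, output_list[1:]))
        # Requirement 2: the output is a permutation of the input.
        permutation = Counter(input_list) == Counter(output_list)
        return monotonic and permutation

    print(is_valid_sort([3, 1, 2], sorted([3, 1, 2])))  # True
    print(is_valid_sort([3, 1, 2], [1, 2]))             # False: element lost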

