Researchers have created two complementary techniques that exploit sparsity, the zero values in tensors, to boost the performance and energy efficiency of the enormous machine-learning models that power generative AI, as well as graph analytics workloads.

Tensors

Tensors are the data structures that machine-learning models use to store and process data. Both new methods aim to exploit sparsity, meaning the zero values, in these tensors. Ignoring the zeros when manipulating a tensor reduces both the computational workload and the memory footprint: any value multiplied by zero is zero, so that calculation can be skipped, and the tensor can be compressed by not storing the zeros at all, letting a larger portion of it sit in on-chip memory.
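As a rough software sketch of that idea (NumPy here, not the researchers' hardware), a sparse vector can be stored as just its nonzero values plus their positions, and a dot product can skip every term that involves a zero:

```python
import numpy as np

# A small "tensor" (here just a vector) that is mostly zeros.
dense = np.array([0.0, 3.0, 0.0, 0.0, -1.5, 0.0, 0.0, 2.0])
weights = np.random.rand(dense.size)

# Compressed form: keep only the nonzero values and where they live.
nz_idx = np.nonzero(dense)[0]   # positions of nonzero entries
nz_val = dense[nz_idx]          # the nonzero values themselves

# Dense dot product: touches every element, including the zeros.
dense_result = np.dot(dense, weights)

# "Sparse" dot product: only 3 multiply-adds instead of 8, because
# every term involving a zero is skipped entirely.
sparse_result = np.dot(nz_val, weights[nz_idx])

assert np.isclose(dense_result, sparse_result)
print(f"stored {nz_val.size} values instead of {dense.size}")
```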

Still, exploiting sparsity effectively raises several obstacles. Finding the nonzero values in a large tensor is not a trivial task. Existing approaches often simplify the search by restricting where nonzero values may appear, imposing a fixed sparsity pattern, but this constraint limits how efficiently they can handle the wide variety of sparse tensors encountered in practice.

Off-chip memory

Another problem is that the number of nonzero values varies across different regions of the tensor, so it is hard to predict how much space is needed to store each region in memory. As a result, the on-chip storage buffer is often underused, which pushes more traffic to off-chip memory and wastes energy.
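The sketch below (with invented shapes and densities) shows the problem: split a matrix into equal regions and the compressed size of each region depends on how many nonzeros it happens to hold, so provisioning storage for the worst case leaves much of it empty most of the time.

```python
import numpy as np

rng = np.random.default_rng(0)

# A matrix whose sparsity is uneven: some rows are much denser than others.
tensor = rng.random((8, 128))
row_density = rng.uniform(0.05, 0.6, size=8)       # fraction of nonzeros per row
mask = rng.random((8, 128)) < row_density[:, None]
tensor *= mask

# Compressed size of each region (here, each row) = number of nonzeros it holds.
nnz_per_row = np.count_nonzero(tensor, axis=1)
print("nonzeros per region:", nnz_per_row)

# Sizing storage for the worst-case region wastes space on all the others.
worst_case = nnz_per_row.max()
utilization = nnz_per_row / worst_case
print("average buffer utilization: {:.0%}".format(utilization.mean()))
```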

The researchers developed two solutions to these problems. In the first, the hardware can efficiently locate the nonzero values for a much wider range of sparsity patterns. In the second, the hardware can handle the case where the data do not fit in memory, which raises the utilization of the storage buffer and reduces off-chip memory traffic. Both methods improve the performance and energy efficiency of hardware accelerators built to process sparse tensors.

A tensor can become sparse for several reasons. For example, researchers sometimes "prune" unnecessary parts of a machine-learning model by replacing some of the values in the tensor with zeros, which creates sparsity. The degree of sparsity (the percentage of zeros) and the positions of those zeros can differ from model to model.
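For instance, a simple magnitude-based pruning pass, shown below as a generic illustration rather than the specific pruning method used for these models, replaces the smallest weights with zeros:

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(4, 6))

# Prune: zero out every weight whose magnitude falls below a threshold.
threshold = 0.5
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"sparsity after pruning: {sparsity:.0%}")
```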

Single sparsity pattern

To make it easier to locate the nonzero values in a model with billions of individual values, researchers often restrict their positions so they fall into a predefined pattern. A hardware accelerator can exploit that pattern to move and process data on the chip more efficiently. However, because each accelerator is usually designed for a single sparsity pattern, its flexibility is limited.
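One widely used predefined pattern, shown here only as an illustration since the article does not say which pattern a given accelerator assumes, is N:M structured sparsity: every block of M consecutive values may hold at most N nonzeros, so the hardware always knows how many values to look for and where. Checking conformance is cheap:

```python
import numpy as np

def obeys_n_m(values: np.ndarray, n: int = 2, m: int = 4) -> bool:
    """Return True if every group of m consecutive values has at most n nonzeros."""
    flat = values.reshape(-1, m)    # assumes the size is a multiple of m
    return bool((np.count_nonzero(flat, axis=1) <= n).all())

structured = np.array([0.0, 1.2, 0.0, -0.7,  0.0, 0.0, 3.1, 0.5])
unstructured = np.array([1.0, 1.2, 0.9, -0.7,  0.0, 0.0, 0.0, 0.0])

print(obeys_n_m(structured))    # True: each group of 4 holds exactly 2 nonzeros
print(obeys_n_m(unstructured))  # False: the first group of 4 holds 4 nonzeros
```

Both arrays above are 50 percent sparse overall, yet only one fits the pattern, which is exactly the kind of inflexibility the first technique aims to remove.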

On-chip memory buffer

Because tensors are often larger than the on-chip memory buffer, the chip fetches and processes only a portion of the tensor at a time; these pieces are called tiles. To make the most of the buffer and limit how often the chip must reach out to off-chip memory, which dominates energy consumption and caps processing speed, researchers aim to use the largest tile that fits in the buffer.
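Under a conservative dense assumption, choosing the tile is simple arithmetic; the buffer and tensor sizes below are made up purely for illustration:

```python
# Hypothetical numbers purely for illustration.
buffer_capacity = 16_384          # values the on-chip buffer can hold
tensor_rows, tensor_cols = 4_096, 4_096

# Assume every value must be stored (no zeros): the largest tile is
# simply as many full rows as fit in the buffer.
rows_per_tile = buffer_capacity // tensor_cols
num_tiles = -(-tensor_rows // rows_per_tile)      # ceiling division

print(f"{rows_per_tile} rows per tile -> {num_tiles} tiles, "
      f"{num_tiles} trips to off-chip memory")
```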

Sparse tensor

With a sparse tensor, however, many of the data values are zero and do not need to be stored, so a larger tile can fit into the buffer than its raw capacity would suggest. But the number of zeros varies across different sections of the tensor, and therefore from tile to tile, which makes it hard to choose a tile size that is guaranteed to fit. Existing methods often fall back on the cautious assumption that there are no zeros at all, selecting a smaller tile and leaving empty space in the buffer unused.
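Extending the same made-up numbers, the sketch below shows how much buffer space the cautious dense-sized tile wastes once the tiles are actually compressed, and how the waste varies because each tile's sparsity differs:

```python
import numpy as np

rng = np.random.default_rng(2)
buffer_capacity = 16_384

# Compressed footprint of each conservatively (dense) sized tile:
# only the nonzeros need buffer space, and the count varies per tile.
dense_tile_size = 16_384                       # sized assuming no zeros at all
tile_sparsity = rng.uniform(0.5, 0.9, size=8)  # fraction of zeros, varies per tile
compressed_size = ((1.0 - tile_sparsity) * dense_tile_size).astype(int)

wasted = buffer_capacity - compressed_size
for i, (used, free) in enumerate(zip(compressed_size, wasted)):
    print(f"tile {i}: {used:6d} values stored, {free:6d} buffer slots unused")
```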

Conclusion

The researchers' solution is to overbook the buffer: they pick a tile size on the assumption that most tiles contain enough zeros to fit. Occasionally a tile holds more nonzero values than the buffer can accommodate, and that excess data is bumped out of the buffer. The hardware can then re-fetch only the bumped data without reprocessing the entire tile; their technique, called Tailors, handles this by modifying the buffer's "tail end."
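A minimal software sketch of that overbooking idea follows; the real mechanism is in hardware, and the tile counts, sparsity range, and spill policy here are invented for illustration. Each tile is sized assuming a typical level of sparsity, and when a tile turns out to have more nonzeros than fit, only the excess is fetched again rather than the whole tile being reprocessed:

```python
import numpy as np

rng = np.random.default_rng(3)
buffer_capacity = 1_000           # compressed values the buffer can hold

# Overbook: assume tiles are roughly 75-80% sparse, so a dense tile of
# 4,000 values should usually compress to under 1,000 nonzeros and fit.
dense_tile_size = 4_000
tile_sparsity = rng.uniform(0.70, 0.90, size=10)   # actual sparsity varies per tile
tiles_nnz = ((1.0 - tile_sparsity) * dense_tile_size).astype(int)

refetched = 0
for i, nnz in enumerate(tiles_nnz):
    if nnz <= buffer_capacity:
        continue                          # compressed tile fits: no extra traffic
    overflow = nnz - buffer_capacity      # values bumped from the buffer's tail end
    refetched += overflow                 # only the bumped values are fetched again
    print(f"tile {i}: {nnz} nonzeros, {overflow} bumped values re-fetched")

print(f"total re-fetched values across all tiles: {refetched}")
```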

In future work, the researchers hope to apply overbooking to other components of the computer architecture and to improve the technique for estimating the ideal amount of overbooking.

