Imagine sitting with a red pen and a tall stack of exam papers, flipping through answers, ticking this, crossing that. The moment one paper is graded, you record it: you type that grade into the university site, and then again into the college site, before picking up the next paper. It's like being in an echo chamber, the same thing over and over. Frankly, it's a snooze and not very smart.
But wait, there's another way! Gather all the papers, focus, and mark them one by one without a single break. With every paper marked, the stack gets shorter, and you start feeling pretty relaxed. Now with all the grades on one big list, you visit the university site, upload the whole batch and—bam!—they're in. Do the same with the college site. Just like that, what seemed like a mountain of work turns into a little hill. You're done before you know it, more time to enjoy a nice cup of coffee or just kick back. It's batch processing in real life, and it feels like the smart way to go. So there you have it, a whole batch of work, finished in what feels like a shortcut. By marking all the papers first, then moving on to entering all the grades at once, you've turned a tedious task into something a little more manageable. It’s a small change, but it makes a huge difference.
We encounter a similar scenario when working with machine learning models to process data. Consider the traditional approach: you input a single file into the model, it computes and stores the results in RAM, and you repeat this for each of the 100 files you need to process. The monotony is palpable—it's like grading those papers one by one and entering grades individually all over again.
Move to batch processing, and the picture changes. You gather all 100 files and upload them to the model in one go. The system processes the entire batch in a single session and outputs the predictions simultaneously. The result? Drastically reduced processing time and an enormous saving in manual effort. It's not just about efficiency; it's about revolutionizing your workflow to optimize both computational resources and your valuable time.
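Here is a minimal sketch of the difference, using NumPy and a toy "model" (just a fixed weight vector) standing in for a real ML model; the file loading is faked for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=32)                 # toy "learned" weights

def load_file(i):
    # Stand-in for reading one file from disk: returns a 32-feature vector.
    return rng.normal(size=32)

def predict(x):
    return x @ weights                        # toy inference: a single dot product

# Traditional approach: one load + one inference call per file.
results_sequential = [predict(load_file(i)) for i in range(100)]

# Batch approach: stack all inputs into one matrix and run a single inference pass.
batch = np.stack([load_file(i) for i in range(100)])    # shape: (100, 32)
results_batch = batch @ weights                          # all 100 predictions at once
```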
Batch processing is an automated method of executing multiple jobs on a computer without the need for human intervention while those jobs run. This approach requires that all the necessary inputs be defined in advance, typically through a script, and the system then executes the predefined processes in one uninterrupted sequence.
It involves handling data in aggregate, making coordinated use of system resources such as the CPU, GPU (if available), memory, and storage. By processing large volumes of data collectively, the system reduces the repetitive overhead of executing many similar jobs and improves workflow efficiency.
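As a rough illustration (the input folder and the "job" here are purely hypothetical), a batch run might look like this:

```python
import glob

def process(path):
    # Placeholder "job" for one input: here, just count the characters in the file.
    with open(path) as f:
        return len(f.read())

# All inputs are listed up front; the script then runs through them without pause.
input_files = sorted(glob.glob("inputs/*.txt"))     # hypothetical input folder
results = {path: process(path) for path in input_files}
print(f"Processed {len(results)} files in one run")
```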
Batch Processing in Machine Learning Models
When we talk about batch processing in machine learning models, all of the inputs are first loaded into RAM, converted into a matrix, pre-processed, and then passed through the ML model for inference to generate predictions. I know it can feel overwhelming to see all of these steps at once, so let's walk through them with an example.
Picture a group of tourists ready to go on a tour. Before they start, they all have to get ready, which includes making sure every tourist is dressed comfortably for the destination. Similarly, in machine learning, a batch of input data is first loaded into RAM. Once the data is in RAM, it may need to be preprocessed; this includes operations such as tokenizing text, normalizing values, and resizing images.
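A tiny, illustrative preprocessing step might look like the following; the target size, the crude resize, and the 8-bit grayscale assumption are choices made for this sketch, not details from the post:

```python
import numpy as np

def preprocess(image, target=(32, 32)):
    # Crude nearest-neighbour resize via index sampling, just for illustration.
    rows = np.linspace(0, image.shape[0] - 1, target[0]).astype(int)
    cols = np.linspace(0, image.shape[1] - 1, target[1]).astype(int)
    resized = image[np.ix_(rows, cols)]
    return resized.astype(np.float32) / 255.0     # normalize pixel values to [0, 1]

# Two fake 8-bit grayscale images of different sizes.
raw_images = [np.random.randint(0, 256, size=(40, 50)),
              np.random.randint(0, 256, size=(28, 28))]
ready = [preprocess(img) for img in raw_images]   # every "tourist" is now dressed alike
```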
Now, rather than sending the tourists one by one, which would be slow, the tour guide gathers them onto the bus - the 'batch'. In machine learning, an array of input data points is compiled into a single batch. Each element of the array corresponds to a tourist, and the entire array represents the bus filled with tourists.
Each tourist has their seat on the bus, allocated by the guide. In ML, the batch of input data is transformed into a suitable mathematical representation, such as a 2D or 3D matrix (for example, images characterized by height, width, and depth); this transformation is performed by the data pipeline. In this matrix, the first row represents the first input, the second row represents the second input, and the nth row represents the nth input. The model's configuration determines the type of input data it requires and whether it supports batch processing.
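For example, a minimal sketch of this "seating" step with NumPy (the image sizes are chosen arbitrarily for illustration):

```python
import numpy as np

# Eight fake RGB images: each input is already 3D (height x width x channels),
# so the batch becomes a 4D array whose first axis indexes the inputs.
images = [np.random.rand(32, 32, 3) for _ in range(8)]
batch = np.stack(images)                 # shape: (8, 32, 32, 3)
# batch[0] is the first tourist/input, batch[1] the second, ..., batch[7] the eighth.
```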
Before heading to the destination, there is one last check to ensure that all tourists fit on the bus and are comfortable. Similarly, in the ML model, all input data should fit in the matrix so that the processing journey can proceed smoothly. Processes such as padding with zeroes happen here, which we will discuss more in the next section.
On every journey, the tourists may have many good experiences or sometimes bad ones too. When they complete the journey, each tourist has their own story and experiences. In the same way, in ML, the input matrix is processed with the pre-trained model's weight matrix: a calculation between the two matrices produces an output matrix. In the output matrix, each row is the output corresponding to the same row of the input matrix. The main calculation between the two matrices is matrix multiplication.
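A bare-bones sketch of this step, using random numbers in place of real inputs and learned weights:

```python
import numpy as np

batch = np.random.rand(10, 32)      # 10 inputs, 32 features each
weights = np.random.rand(32, 4)     # toy pre-trained weights: 32 features -> 4 outputs

outputs = batch @ weights           # matrix multiplication, shape: (10, 4)
# Row i of `outputs` is the prediction for row i of `batch`.
```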
How is the input matrix adjusted to match the weight matrix for multiplication?
In machine learning, when we're ready to process our batch of input data through a model, we need to ensure the input matrix matches the dimensions required for multiplication with the weight matrix. It's like making sure tourists can get on a bus through a door of the right size. If the matrices don't match, we have a couple of methods for adjusting the input matrix. Some of them are:
1. Padding with Zeros (Zero-padding): Padding is the process of adding rows and/or columns of zeros to an input matrix to match the dimensions necessary for multiplication with the weight matrix. For example, if the input matrix is smaller than required, you can add zeros around the original matrix until it reaches the required size.
2. Cutting (Truncating): Cutting, or truncating, is the reverse process, where you remove rows and/or columns from an input matrix that is larger than required to make it compatible with the weight matrix. However, this may result in a loss of data. (A short sketch of both adjustments follows below.)
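Here is a small sketch of both adjustments on 1D inputs, assuming (purely for illustration) that the model expects vectors of length 8:

```python
import numpy as np

TARGET = 8

def fit_to_target(x, target=TARGET):
    if len(x) < target:
        # Zero-padding: append zeros until the vector reaches the required size.
        return np.pad(x, (0, target - len(x)))
    # Truncating: drop the extra values (note: this loses information).
    return x[:target]

short_input = np.arange(5)              # too short -> padded with three zeros
long_input = np.arange(12)              # too long  -> cut down to eight values
print(fit_to_target(short_input))       # [0 1 2 3 4 0 0 0]
print(fit_to_target(long_input))        # [0 1 2 3 4 5 6 7]
```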
How does batch processing work?
Batch processing is a technique used in machine learning to compute matrix multiplications efficiently. Since matrix multiplication requires significant computational resources, a method known as 'tiling' is often used to optimize it. Tiling involves breaking the matrices down into smaller sub-matrices, or 'tiles', that fit into cache memory, which is much faster than RAM. This allows for more efficient computation because it reduces the time spent on memory access.
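A simplified sketch of tiled matrix multiplication (the tile size of 16 and the 64x64 matrices are arbitrary choices for illustration):

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    # Accumulate the result tile by tile so each small sub-matrix can stay
    # resident in fast cache memory while it is being reused.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Multiply one pair of tiles and accumulate into the output tile.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
assert np.allclose(tiled_matmul(A, B), A @ B)    # same result as a direct multiply
```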
If you're interested in delving deeper into tiled matrix multiplication, I recommend checking out my dedicated blog on this subject.
Blog: Optimizing Matrix Multiplication: Unveiling the Power of Tiles
Batch processing and SIMD
One significant benefit of tiled multiplication in batch processing is that it utilizes SIMD (Single Instruction, Multiple Data) within each core. For example, consider two operations:
c1 = a1 * b1 + a2 * b3
c2 = a1 * b2 + a2 * b4
With multiple cores, the computations of c1 and c2 can run in parallel on different cores. Through SIMD, the multiplications a1 * b1 and a2 * b3 can be completed in a single instruction. However, it's important to note that a SIMD operation works within a single core, not across cores.
Tiling makes efficient use of multiple cores by processing different tiles on different cores, thereby increasing parallel processing efficiency.
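To make the example above concrete: the two expressions are exactly one small matrix product, and writing them that way lets a library such as NumPy (via BLAS) use SIMD instructions internally instead of computing each scalar multiply-add separately. The numeric values below are just examples:

```python
import numpy as np

# [c1, c2] = [a1, a2] @ [[b1, b2],
#                        [b3, b4]]
a = np.array([1.0, 2.0])                    # [a1, a2]
B = np.array([[3.0, 4.0],                   # [[b1, b2],
              [5.0, 6.0]])                  #  [b3, b4]]
c = a @ B                                    # [c1, c2] = [1*3 + 2*5, 1*4 + 2*6] = [13, 16]
```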
Suppose we have 10 files, each a 1x32 input matrix, to be multiplied by a 32x32 weight matrix. Each of the 32 output values requires 32 multiplications and 31 additions (63 operations), so one file takes 32 x 63 = 2,016 operations. For 10 files processed sequentially, this would be: 2,016 operations/file * 10 files = 20,160 operations in total.
When processed sequentially, these 20,160 operations happen one after another, file by file. If the SIMD width of a core is 'q', then 'q' operations can be performed simultaneously within that core, but on its own this only affects how quickly each core works through its share of the operations.
Now, with batch processing and tiled matrix multiplication, the computation time can be reduced by parallelization. If 'p' is the number of cores and 'q' is the SIMD width, the operations are distributed across the cores, and each core can use SIMD to process 'q' operations simultaneously. The ideal computation time reduction could be expressed as:
Total time = (Total operational time) / (p × q)
However, it's important to note that this is a highly theoretical and optimistic view, because in practice there are limitations such as load balancing and parallelization overhead.
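For completeness, here is the arithmetic above worked out in a few lines (the core count and SIMD width are example values, not measurements):

```python
# 1x32 input times a 32x32 weight matrix: 32 outputs, each needing 32 mults + 31 adds.
ops_per_output = 32 + 31                 # 63 operations per output element
ops_per_file = 32 * ops_per_output       # 2,016 operations per file
total_ops = 10 * ops_per_file            # 20,160 operations for 10 files

p, q = 4, 8                              # example: 4 cores, SIMD width of 8
ideal_time_units = total_ops / (p * q)   # idealized: 20,160 / 32 = 630 "operation slots"
print(ops_per_file, total_ops, ideal_time_units)
```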
It is also worth noting that batch processing with tiled matrix multiplication reduces the manual effort of loading input files one at a time, improves throughput, makes effective use of cache memory, and reduces bottlenecks.
Important Considerations for Batch Processing Implementation
A critical aspect to keep in mind is that not all systems inherently condense a batch of inputs into a single matrix. Whether this happens largely depends on the underlying configuration of the model in question; some models are explicitly designed to support batch processing, while others are not.
Despite this variance in model behavior, batch processing is advantageous even when dealing with models that process each item within a batch individually. It speeds up pre-processing and post-processing, reduces input/output (I/O) transactions, and enables batch-level operations, such as normalization and encoding, which save time and reduce computational overhead. It also enhances overall data throughput and ensures more effective resource utilization through aggregated handling.
Conclusion
Batch processing, then, is the silent accelerant to our productivity, transforming daunting tasks into manageable endeavors and allowing both machines and humans to work smarter, not harder. This efficient approach not only streamlines workflows but also optimizes system performance, ensuring valuable resources are utilized effectively.
This blog details my learnings from my internship at Softude while working with Mradul Kanugo.