Pragmatic Coding: Tips and tricks for software developers.

12 Feb 2013

How to: Use TPL Dataflow for multithreaded file compression

In this short tutorial I will show you how to use the TPL Dataflow library on a fairly simple task: multithreaded file compression.

PreInit

We need to implement efficient file compression with the GZipStream class from the System.IO.Compression namespace. We assume the files are large and cannot fit entirely in memory, so they have to be read and compressed in chunks.

TPL Dataflow

TPL Dataflow (TDF) is a library built on top of the Task Parallel Library (TPL). The TPL was introduced in .NET Framework 4 and provides core building blocks and algorithms for parallel computation and asynchrony. That work centered on the System.Threading.Tasks.Task type, plus a few higher-level constructs that address specific common parallel patterns, e.g. Parallel.For/ForEach for "delightfully parallel" problems that can be expressed as parallelized loops. TPL Dataflow (the System.Threading.Tasks.Dataflow namespace) builds on this infrastructure and adds dataflow blocks: components that buffer and process messages and can be linked into pipelines, which is exactly what our task needs.

Solution

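To solve this problem we need only three blocks:

  1. A buffer block that holds the chunks of data read from the source file.
  2. A compression block that compresses each chunk.

    The compression itself is done by a separate function that gzips a single chunk.
  3. A writer block that writes the compressed chunks to the output stream.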
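The original code for these blocks is not preserved. Here is a minimal sketch, assuming the three blocks are a `BufferBlock<byte[]>`, a `TransformBlock<byte[], byte[]>` and an `ActionBlock<byte[]>` (our assumption; the class and factory names are hypothetical). Each chunk is compressed into its own self-contained gzip member, so the chunks can be compressed independently of each other.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks.Dataflow;

// Hypothetical names: CompressionBlocks, CreateBuffer, CreateCompressor, CreateWriter.
public static class CompressionBlocks
{
    // 1. Buffer for the raw chunks read from the source file.
    public static BufferBlock<byte[]> CreateBuffer() => new BufferBlock<byte[]>();

    // 2. Compression block: turns each raw chunk into a compressed chunk.
    public static TransformBlock<byte[], byte[]> CreateCompressor() =>
        new TransformBlock<byte[], byte[]>(chunk => Compress(chunk));

    // The compression function: gzips one chunk into a self-contained gzip member.
    public static byte[] Compress(byte[] data)
    {
        using (var compressed = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream open after the GZipStream is
            // disposed; disposing the GZipStream writes the gzip footer.
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return compressed.ToArray();
        }
    }

    // 3. Writer block: appends each compressed chunk to the output stream.
    public static ActionBlock<byte[]> CreateWriter(Stream output) =>
        new ActionBlock<byte[]>(chunk => output.Write(chunk, 0, chunk.Length));
}
```

Because every chunk becomes a separate gzip member, the output file is a sequence of members. The gzip format (RFC 1952) allows this and the `gzip` command-line tool decompresses such files, but not every decompressor reads past the first member, so it is worth checking the one you use.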

Next, we link the blocks together so that the output of each block flows into the next one.
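Linking is done with `LinkTo`. The sketch below uses hypothetical stand-ins so the linking can be tried in isolation: a `compress` delegate instead of the real compression function, and a `ConcurrentQueue<byte[]>` instead of the output stream.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

public static class PipelineLinking
{
    // Links buffer -> compressor -> writer and returns the head of the pipeline.
    // 'compress' and 'sink' are hypothetical stand-ins for the real compression
    // function and output stream.
    public static ITargetBlock<byte[]> Build(Func<byte[], byte[]> compress, ConcurrentQueue<byte[]> sink)
    {
        var buffer = new BufferBlock<byte[]>();
        var compressor = new TransformBlock<byte[], byte[]>(compress);
        var writer = new ActionBlock<byte[]>(chunk => sink.Enqueue(chunk));

        // Every message that leaves one block is offered to the next one.
        buffer.LinkTo(compressor);
        compressor.LinkTo(writer);

        return buffer;
    }
}
```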

We also have to tell the blocks when the data has run out, so that they can finish their work. This is done by calling the Complete method of each downstream block once the block before it has completed.
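One way to do this is a continuation on each block's `Completion` task that calls `Complete` on the next block. In the sketch below, `ForwardCompletion` and `Run` are hypothetical names, and the compressor just passes chunks through to keep the example short.

```csharp
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompletionForwarding
{
    // Hypothetical helper: once 'source' has finished, call Complete on 'target',
    // or pass the error on if 'source' failed.
    public static void ForwardCompletion(IDataflowBlock source, IDataflowBlock target)
    {
        source.Completion.ContinueWith(t =>
        {
            if (t.IsFaulted) target.Fault(t.Exception);
            else target.Complete();
        }, TaskScheduler.Default);
    }

    // Builds buffer -> compressor -> writer with completion forwarded along the chain,
    // pushes 'count' chunks through and returns how many chunks the writer handled.
    public static int Run(int count)
    {
        int written = 0;
        var buffer = new BufferBlock<byte[]>();
        var compressor = new TransformBlock<byte[], byte[]>(chunk => chunk); // placeholder for real compression
        var writer = new ActionBlock<byte[]>(chunk => Interlocked.Increment(ref written));

        buffer.LinkTo(compressor);
        compressor.LinkTo(writer);
        ForwardCompletion(buffer, compressor);
        ForwardCompletion(compressor, writer);

        for (int i = 0; i < count; i++) buffer.Post(new byte[16]);
        buffer.Complete();          // no more data
        writer.Completion.Wait();   // finishes only after the whole chain has drained
        return written;
    }
}
```

The library also offers a shortcut for the same thing: linking with `new DataflowLinkOptions { PropagateCompletion = true }` makes `LinkTo` forward both completion and faults to the target.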

While reading the file, we send each chunk of data to the buffer block with the Post method.
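A minimal sketch of the reading loop, assuming the file is read through a `Stream` in chunks of a fixed size (`ChunkReader`, `PostChunks` and the chunk size parameter are hypothetical). Note the copy of each chunk: the block holds on to the array it is given, so the read buffer cannot be reused for it.

```csharp
using System;
using System.IO;
using System.Threading.Tasks.Dataflow;

public static class ChunkReader
{
    // Reads 'source' in pieces of up to 'chunkSize' bytes and posts each piece
    // to 'target'. Returns the number of chunks the target accepted.
    public static int PostChunks(Stream source, ITargetBlock<byte[]> target, int chunkSize)
    {
        var readBuffer = new byte[chunkSize];
        int accepted = 0;
        int read;
        while ((read = source.Read(readBuffer, 0, readBuffer.Length)) > 0)
        {
            // Copy the bytes out: the block keeps a reference to the array it receives,
            // so the read buffer must not be reused for a chunk that is still queued.
            var chunk = new byte[read];
            Buffer.BlockCopy(readBuffer, 0, chunk, 0, read);

            // Post returns true if the block accepted the chunk. An unbounded
            // BufferBlock only declines after Complete has been called on it.
            if (target.Post(chunk))
                accepted++;
        }
        return accepted;
    }
}
```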

Here we have to account for the case where the block is full and will not accept more data: Post does not wait in that case, it simply returns false, so the chunk has to be offered again.
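A sketch of such a construction, assuming a simple retry loop with a short sleep (the helper name `PostWithRetry` is hypothetical). It keeps offering the chunk until the block has room for it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class Backpressure
{
    // Hypothetical helper: keeps offering 'item' until 'target' accepts it.
    // Post returns false instead of waiting when a bounded block is full.
    public static void PostWithRetry<T>(ITargetBlock<T> target, T item)
    {
        while (!target.Post(item))
        {
            // A completed or faulted block declines forever; stop retrying then.
            if (target.Completion.IsCompleted)
                throw new InvalidOperationException("The target block no longer accepts data.");

            // Give the downstream blocks a moment to drain their queues.
            Thread.Sleep(1);
        }
    }
}
```

An alternative that avoids polling is `SendAsync`, which returns a `Task<bool>` that completes once the block has accepted the message (or declined it for good).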

When reading is finished, we notify the buffer block that there is no more data.
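Calling `Complete` does not throw away data that is already queued: the block stops accepting new messages, but its `Completion` task only finishes after the queued messages have been consumed. The sketch below (hypothetical names) shows both effects on a `BufferBlock`.

```csharp
using System.Threading.Tasks.Dataflow;

public static class CompletionSemantics
{
    // Posts the given chunks, then signals that no more data will come.
    public static BufferBlock<byte[]> FillAndComplete(params byte[][] chunks)
    {
        var buffer = new BufferBlock<byte[]>();
        foreach (var chunk in chunks)
            buffer.Post(chunk);

        // Reading is finished: the buffer will not accept new messages,
        // but the ones already queued can still be received.
        buffer.Complete();
        return buffer;
    }
}
```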

Now we just have to wait until the block that writes the compressed data to the output stream has completed.
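Waiting can be done by blocking on the writer's `Completion` task. The sketch below uses an `ActionBlock` that writes to a `MemoryStream` (a hypothetical setup) to show that all queued chunks have been written once `Wait` returns.

```csharp
using System.IO;
using System.Threading.Tasks.Dataflow;

public static class WriterCompletion
{
    // Writes the chunks through an ActionBlock and returns the number of bytes written.
    public static long WriteAll(params byte[][] chunks)
    {
        var output = new MemoryStream();
        var writer = new ActionBlock<byte[]>(chunk => output.Write(chunk, 0, chunk.Length));

        foreach (var chunk in chunks)
            writer.Post(chunk);
        writer.Complete();

        // Blocks the calling thread until every queued chunk has been written.
        writer.Completion.Wait();
        return output.Length;
    }
}
```

In asynchronous code, `await writer.Completion` does the same without blocking a thread. If the block failed, `Wait` throws an `AggregateException` wrapping the original error, while `await` rethrows the original exception.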

Our updated method:
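The article's method itself is not preserved. A sketch that puts the steps together, assuming a `Compress(Stream input, Stream output, int chunkSize)` signature (hypothetical, like the names `GzipPipeline` and `CompressChunk`), could look like this. To keep it short it uses `PropagateCompletion` on the links instead of calling `Complete` on each block by hand.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks.Dataflow;

public static class GzipPipeline
{
    // Hypothetical signature: compresses 'input' into 'output' chunk by chunk.
    public static void Compress(Stream input, Stream output, int chunkSize = 1024 * 1024)
    {
        var buffer = new BufferBlock<byte[]>();
        var compressor = new TransformBlock<byte[], byte[]>(chunk => CompressChunk(chunk));
        var writer = new ActionBlock<byte[]>(chunk => output.Write(chunk, 0, chunk.Length));

        // PropagateCompletion passes Complete (and errors) down the chain.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        buffer.LinkTo(compressor, link);
        compressor.LinkTo(writer, link);

        var readBuffer = new byte[chunkSize];
        int read;
        while ((read = input.Read(readBuffer, 0, readBuffer.Length)) > 0)
        {
            var chunk = new byte[read];
            System.Buffer.BlockCopy(readBuffer, 0, chunk, 0, read);
            // The buffer is unbounded, so Post always succeeds here; if reading is
            // faster than compressing, the whole file may queue up in memory.
            buffer.Post(chunk);
        }

        buffer.Complete();          // no more data
        writer.Completion.Wait();   // wait until every compressed chunk is written
    }

    // Gzips one chunk into a self-contained gzip member.
    private static byte[] CompressChunk(byte[] data)
    {
        using (var compressed = new MemoryStream())
        {
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
                gzip.Write(data, 0, data.Length);
            return compressed.ToArray();
        }
    }
}
```

Everything still runs one chunk at a time in each block, so this version is correct but not yet faster than a plain loop.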

We could stop here, except for one catch: this code is no faster than a synchronous version, because by default a dataflow block processes only one message at a time. To make it faster, we have to allow the compression block to process several chunks in parallel. This is done by passing the appropriate options to the block.
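In TPL Dataflow this setting is `ExecutionDataflowBlockOptions.MaxDegreeOfParallelism`, whose default value is 1. A sketch of a compression block that may process one chunk per logical processor (the class and method names are hypothetical):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class ParallelCompression
{
    // Creates the compression block with parallel processing enabled.
    public static TransformBlock<byte[], byte[]> CreateCompressor(Func<byte[], byte[]> compress)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            // The default is 1: one chunk at a time, no faster than a plain loop.
            // Here up to one chunk per logical processor is compressed at once.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        return new TransformBlock<byte[], byte[]>(compress, options);
    }
}
```

A `TransformBlock` keeps its output in input order by default, even when several chunks are processed at once, so the compressed chunks still reach the writer in the order they were read.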

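We also have to handle the case where data is read faster than it can be compressed, or compressed faster than it can be written. Otherwise the unprocessed chunks pile up in the blocks' queues and the whole file may end up in memory, which is exactly what we wanted to avoid. Setting the BoundedCapacity option limits how many chunks each block may hold.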
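A sketch of bounded options for the three blocks, assuming a capacity of 16 chunks per block (an arbitrary, hypothetical value to tune for the chunk size and the memory available):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class BoundedPipelineOptions
{
    // Hypothetical default: at most this many chunks held by each block.
    public const int DefaultCapacity = 16;

    // Buffer block: Post is declined once 'capacity' chunks are waiting.
    public static DataflowBlockOptions ForBuffer(int capacity = DefaultCapacity) =>
        new DataflowBlockOptions { BoundedCapacity = capacity };

    // Compressor: parallel and bounded. BoundedCapacity also counts chunks that are
    // being processed, so it is kept >= MaxDegreeOfParallelism.
    public static ExecutionDataflowBlockOptions ForCompressor(int capacity = DefaultCapacity) =>
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount,
            BoundedCapacity = Math.Max(capacity, Environment.ProcessorCount)
        };

    // Writer: writes stay sequential (default parallelism of 1), but bounded.
    public static ExecutionDataflowBlockOptions ForWriter(int capacity = DefaultCapacity) =>
        new ExecutionDataflowBlockOptions { BoundedCapacity = capacity };
}
```

With bounded blocks, `LinkTo` handles back-pressure between the blocks by itself: a full target postpones the offered message and takes it later, once it has room. Only the reading loop, which uses `Post`, has to deal with a declined message explicitly.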

Finally, the method looks like this:
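A sketch of the final version, assuming a hypothetical `Compress(Stream input, Stream output, int chunkSize, int capacity)` signature. Two things are our own choices rather than the article's: `SendAsync` replaces the `Post` retry loop, and a continuation faults the buffer if writing fails, so that a waiting `SendAsync` does not block forever on a pipeline that no longer drains.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParallelGzip
{
    // Hypothetical signature and defaults.
    public static void Compress(Stream input, Stream output, int chunkSize = 1024 * 1024, int capacity = 16)
    {
        int parallelism = Environment.ProcessorCount;

        var buffer = new BufferBlock<byte[]>(new DataflowBlockOptions { BoundedCapacity = capacity });
        var compressor = new TransformBlock<byte[], byte[]>(chunk => CompressChunk(chunk),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = parallelism,
                BoundedCapacity = Math.Max(capacity, parallelism)
            });
        var writer = new ActionBlock<byte[]>(chunk => output.Write(chunk, 0, chunk.Length),
            new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        buffer.LinkTo(compressor, link);
        compressor.LinkTo(writer, link);

        // If writing fails, fault the buffer too so a pending SendAsync returns false.
        writer.Completion.ContinueWith(t => ((IDataflowBlock)buffer).Fault(t.Exception),
            TaskContinuationOptions.OnlyOnFaulted);

        var readBuffer = new byte[chunkSize];
        int read;
        while ((read = input.Read(readBuffer, 0, readBuffer.Length)) > 0)
        {
            var chunk = new byte[read];
            Buffer.BlockCopy(readBuffer, 0, chunk, 0, read);

            // SendAsync waits while the bounded buffer is full instead of spinning on Post.
            // It returns false only if the pipeline stopped accepting data.
            if (!buffer.SendAsync(chunk).Result)
                break;
        }

        buffer.Complete();
        // Rethrows (wrapped in AggregateException) any error raised by the blocks.
        writer.Completion.Wait();
    }

    // Gzips one chunk into a self-contained gzip member.
    private static byte[] CompressChunk(byte[] data)
    {
        using (var compressed = new MemoryStream())
        {
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
                gzip.Write(data, 0, data.Length);
            return compressed.ToArray();
        }
    }
}
```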

Using our Compress method

Sample code:
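The sample code is not preserved. A minimal sketch of calling the method on files, assuming it has the shape `Compress(Stream input, Stream output)` (an assumption) and writing the result next to the source file with a `.gz` extension (also our choice). The method is passed in as a delegate so the snippet stands on its own; `CompressFileSample` and `CompressFile` are hypothetical names.

```csharp
using System;
using System.IO;

public static class CompressFileSample
{
    // Compresses 'sourcePath' into 'sourcePath + ".gz"' and returns the new path.
    // 'compress' stands for a Compress(Stream input, Stream output) method such as
    // the pipeline described in this article (its exact signature is an assumption).
    public static string CompressFile(string sourcePath, Action<Stream, Stream> compress)
    {
        string targetPath = sourcePath + ".gz";
        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var output = new FileStream(targetPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            compress(input, output);
        }
        return targetPath;
    }
}
```

For example: `CompressFileSample.CompressFile(@"C:\data\big.log", (input, output) => Compress(input, output));`, where the path is only an illustration.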

Conclusion

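As you can see, TPL Dataflow can greatly simplify multithreaded code: we describe the work as a chain of linked blocks, and the library takes care of running it on multiple threads.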
Happy coding!
