This blog post was originally published at https://www.polidea.com/blog.

Maintaining different technologies is always a big challenge for both developers and businesses. This is especially true in the constantly expanding big data world. There are so many big data technologies, such as Hadoop, Apache Spark, and Apache Flink, that it is easy to get lost. Which tool is the best for real-time streaming? Is one particular tool fast enough for our use case? How should you integrate different data sources? If these questions often come up in your company, you may want to consider Apache Beam.


Some time ago I wrote an article about the asyncio library in Python. The article describes the benefits of asynchronous operations over OS threads, as well as how to use the async/await syntax that comes with asyncio. If you have not seen it yet, I strongly recommend having a look at it. Although the article covers the most fundamental concepts, there are still plenty of things that you can do with asyncio, and they deserve more attention. This is why I am back with the second installment of a Guide to Python asyncio. This time…


It is hard to imagine modern programming in Python without the asyncio library. The package that lets Python programmers write concurrent code is one of the largest and most ambitious libraries ever added to Python. In this asyncio tutorial, we will examine the biggest advantages of using it. I will begin with a few basic examples that describe asyncio’s fundamentals. After that, I will present a bigger example: a script for downloading files asynchronously that is written with asyncio. …


The big data industry has seen the emergence of a variety of new data processing frameworks in the last decade. One of them is Apache Spark, a data processing engine that offers in-memory cluster computing with built-in extensions for SQL, streaming, and machine learning. Apache Spark was open-sourced in 2010 and donated to the Apache Software Foundation in 2013. Since then, the project has become one of the most widely used big data technologies. According to the results of a survey conducted by AtScale, Cloudera and ODPi.org, …

Kamil Wasilewski
