Comparing Apache Storm and Apache Flink – Message Delivery and Windowing

In stream processing, “message delivery” semantics define how messages in a stream are sent and received, while “windowing” is the time- or count-based grouping of events for accumulation and processing. Both are critical design decisions with respect to latency and processing guarantees, and Apache Storm as well as Apache Flink offer different ways of implementing them. This article provides a brief overview of the delivery and windowing semantics available in both real-time computing systems.

Message Delivery Semantics

Before comparing windowing semantics, it is important to understand how messages are delivered in Apache Storm and Apache Flink. When comparing delivery semantics, we are usually looking for the best compromise between delivery guarantee and delivery overhead. A stronger guarantee usually comes either at the cost of message de-duplication or at the cost of delivery validation (acknowledgements).

  • At most once
    • Message delivered at most once (may be lost)
    • Message delivery guarantee: low
    • Duplicates possible: no
    • Message recovery: no
    • Delivery overhead: no
  • At least once (supported in Storm and Flink)
    • Message delivered at least once (replayed on failure)
    • Message delivery guarantee: high
    • Duplicates possible: yes
    • Message recovery: yes
    • Delivery overhead: yes (acknowledgements)
  • Exactly once (supported in Storm with Trident and in Flink)
    • Message delivered exactly once
    • Message delivery guarantee: highest
    • Duplicates possible: no
    • Message recovery: yes
    • Delivery overhead: yes (acknowledgements and de-duplication)
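To make the de-duplication cost mentioned above concrete: an at-least-once channel can be turned into effectively exactly-once processing by de-duplicating on a message ID at the consumer. A minimal sketch (the message format and the in-memory `seen` store are assumptions for illustration; a real system would use a persistent store):

```python
def process_exactly_once(messages, handler):
    """Apply handler to each message only once, even if the
    at-least-once channel delivers duplicates."""
    seen = set()  # in practice a persistent store, not an in-memory set
    for msg in messages:
        msg_id = msg["id"]
        if msg_id in seen:
            continue  # duplicate delivery: skip
        seen.add(msg_id)
        handler(msg["payload"])

# An at-least-once channel may deliver id=1 twice:
out = []
process_exactly_once(
    [{"id": 1, "payload": "a"}, {"id": 1, "payload": "a"}, {"id": 2, "payload": "b"}],
    out.append,
)
print(out)  # ['a', 'b']
```

The overhead is visible here: every message ID has to be tracked, which is exactly the bookkeeping that the "at most once" mode avoids.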

Window Semantics

Tumbling Windows (window assigner)
  • Predefined window size
  • A tuple belongs to exactly one window
  • Defined by time or event count
  • Storm: implemented / Flink: implemented

Sliding Windows (window assigner)
  • Predefined window size
  • A tuple can belong to multiple windows (slides)
  • Defined by time or event count
  • Storm: implemented / Flink: implemented

Session Windows (window assigner)
  • Predefined maximum gap size
  • No fixed start/end time
  • Storm: not implemented / Flink: implemented

Global Windows (window assigner)
  • Key based
  • Requires a trigger
  • Storm: not implemented / Flink: implemented

Window Functions (window function)
  • Ability to perform functions on assigned windows
  • Flink: ReduceFunction, FoldFunction, WindowFunction (with incremental aggregation, or the generic case)
  • Storm: requires writing custom “bolts” to perform the computation

Source timestamp (window option)
  • Ability to window based on a source-generated timestamp
  • Storm: implemented / Flink: implemented

Handling unordered events (window option)
  • Ability to handle unordered and delayed events
  • Storm: implemented / Flink: implemented
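The difference between tumbling and sliding windows can be made concrete by computing which windows a tuple’s timestamp falls into. A simplified sketch (window size and slide are in the same unit as the timestamps; the function names are illustrative, not an API of either system):

```python
def tumbling_window(ts, size):
    """A timestamp belongs to exactly one tumbling window [start, start + size)."""
    start = (ts // size) * size
    return [(start, start + size)]

def sliding_windows(ts, size, slide):
    """A timestamp can belong to several overlapping sliding windows."""
    windows = []
    # earliest window start (a multiple of slide) whose window still contains ts
    first_start = ((ts - size) // slide + 1) * slide
    start = max(0, first_start)
    while start <= ts:
        windows.append((start, start + size))
        start += slide
    return windows

print(tumbling_window(7, 5))      # [(5, 10)]          -> exactly one window
print(sliding_windows(7, 10, 5))  # [(0, 10), (5, 15)] -> tuple is in two windows
```

This is the core of the “a tuple belongs to only one window” versus “a tuple can belong to multiple windows” distinction in the table above.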


Posted by db, 0 comments

Getting started with Consul on Docker

Consul allows you to do service discovery, but also comes with a straightforward template engine run by an agent. Consul’s key/value store can be used to manage your configuration, and its API lets you query your registered services in a well-structured way across datacenters. This is just one of the reasons why it is often used in microservice architecture environments.

In order to get started and test Consul’s functionality along with the template agent, you can use Docker to run Consul including the API, a UI and a DNS server, providing service discovery and configuration management functionality.

docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp -h node1 progrium/consul -server -bootstrap -ui-dir /ui

Once the container is started you can browse the UI at http://localhost:8500/ui/, and you can connect to port 8500 to use Consul’s REST API to store and get key/value pairs. Here is an example of how this looks in Python after installing the Consul module with pip install python-consul:

import consul
index = None
# Connect to Consul
c = consul.Consul(host='localhost', port=8500)
# Write a key/value pair to Consul's key/value store 
c.kv.put('foo', 'bar')
# Get the value back from the API
index, data = c.kv.get('foo', index=index)
print(index, data)

To start the template engine, go onto one of your Linux boxes (e.g. a Docker container), download the latest consul-template release and unpack it. Then create a sample template called conf.ctmpl with the following content:

My value: {{key "foo"}}

After running the template agent with

./consul-template -consul ip-or-name-of-your-consul-server:8500 -template "conf.ctmpl:/tmp/conf" -dry

you can see that the stored value would, based on the template conf.ctmpl, get written into the file /tmp/conf. If you remove the -dry option when running the agent this way, it actually writes the file; for testing, the dry mode is a good way to see that values are exposed instantly as soon as they change in Consul.

It becomes very powerful when you query Consul from your application, or when you build a template that queries a list of all endpoints of a particular service and stores it as an NginX configuration, so that NginX adds or removes backends automatically.
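As a sketch of that idea, a consul-template snippet like the following could render all instances of a service into an NginX upstream block (the service name "web" and the upstream name are assumptions for illustration):

```
upstream web_backends {
{{ range service "web" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}
```

Rendered to a file watched by NginX (plus a reload command in consul-template’s configuration), backends then follow the service registrations in Consul.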

See the Consul template documentation for all of Consul template’s possibilities.

Posted by db, 2 comments

Getting started with Docker

You probably want to get started with Docker to deploy applications, provide development environments or run tests on your local box. Getting started with Docker is easy. After following the installation instructions for your OS, you are able to download your first Docker image. An example:

docker pull centos

This triggers a download of the latest official CentOS image from Docker Hub and stores it on your local box for you to launch containers based on that image. Run

docker images

to get a list and information about your local images.

Running Containers

Now that you have the CentOS image downloaded, you can run a container based on that image with

docker run centos sleep 120

This runs a CentOS container until the command (sleep 120) finishes. While the command is running (you have 2 minutes :) ), you can list your running containers with

docker ps

and get information about your “running” container, for example the container id and name, the executed command, the status, as well as the image it is based on. After the 2 minutes have passed and the command has finished, your container won’t show up in the list of running containers shown by “docker ps” anymore. This is a typical point of confusion for beginners. When you run

docker ps -a

you can see “all” containers, not only the ones actually running. So your previously executed (and exited) container should appear in this list. You could start it again, or remove it in case you don’t need it anymore (which is usually the case). To remove the container, run

docker rm <container_id or container_name>

with the container_id or container_name listed by “docker ps -a”. Your image will of course remain on your local box for further runs based on that image. If you want your container to be removed automatically after its execution, you can use

docker run --rm centos <your_command_to_execute>

If for some reason you’d like to get rid of your Docker image (for example the centos image), you can do that with:

docker rmi centos

If you’re wondering how you can log in to your container, you can run the container in interactive mode with

docker run -it centos bash

(Bash is the default command in the CentOS image, so this would work even without adding the bash command.) Now you can make changes to your container (for example create a file /test.txt) and commit those changes with

docker commit -m "Commit Message" <container_id> yournewimage

Based on your new image name you can now launch containers with

docker run yournewimage cat /test.txt

Building and managing images

Now that you know the difference between an image and a container, you can start building your own images. There are several options to build them. One option you have already explored: running a container and committing changes. The most commonly used option, though, is building an image from a Dockerfile. Besides some metadata, a Dockerfile describes which image to base yours on, which files to copy into the image, which commands to run to build the image, as well as information about how to run the container. You can find good practices on building Dockerfiles in the Docker documentation.

For example, if you want to build an image based on CentOS 7 and install Apache so that the httpd process runs when a container is launched from your new image, you can do it like this:

FROM centos:7

RUN yum -y update && yum clean all
RUN yum -y install httpd && yum clean all

EXPOSE 80

CMD /usr/sbin/apachectl -D FOREGROUND

Put this snippet into a file named “Dockerfile” and run

docker build -t "myhttpimage" .

in the same folder as the Dockerfile. This will build an image based on CentOS 7, with all packages updated by yum, httpd installed and the yum caches cleaned up. The CMD directive describes what is executed when running a container with

docker run myhttpimage

based on the image after it was built. The EXPOSE directive exposes port 80 of your Docker container, so that you can access the webserver. You can always use the docker run option “-p” to customize the port mapping when running a container. (If you’re on a Mac using docker-machine or boot2docker, you need to take care of forwarding the ports from the VM to your host.)
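For example, to publish the container’s port 80 on port 8080 of your host (assuming the image above was built as “myhttpimage”):

```shell
# Map host port 8080 to container port 80, then browse http://localhost:8080/
docker run -p 8080:80 myhttpimage
```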

A Docker registry

A Docker registry makes sense if you need a central place to host and organise all your Docker images. A registry can easily be set up as a Docker container itself with

docker run -d -p 5000:5000 registry

This will set up a registry exposed on port 5000 for you. The Docker documentation describes how to pull and push images from and to a registry. As long as your registry runs locally, Docker considers it secure and trusted, but as soon as you place the registry on a remote host, Docker either expects you to set up a valid SSL certificate or you need to run your Docker daemon in “insecure mode”. It is also good practice to add HTTP auth to your registry for an additional layer of security; the Docker documentation provides more information on how to set up a secure registry.

Posted by db in Docker, 0 comments

tcpdump piped through ssh into wireshark

To analyze network traffic, wireshark is always a good choice, as it allows a pretty easy way to filter and trace TCP streams. Usually you run wireshark on your desktop PC or notebook to capture your local traffic, but how do you view traffic from a remote host that way? You can use tcpdump on the remote host, write its output to a stream and pipe it through ssh into wireshark:
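A typical pipeline for this looks as follows (a sketch: user, host and interface are placeholders; port 22 is excluded so the capture does not include the ssh session carrying it):

```shell
# Capture on the remote host, stream the raw pcap over ssh into a local wireshark
ssh user@remote-host "tcpdump -i eth0 -U -s0 -w - not port 22" | wireshark -k -i -
```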

Posted by db in Common, Linux, 0 comments

Sending files via netcat

Just a quick win from one of my colleagues to send files via netcat. Start a netcat listener like this:

nc -l 1234 | tar xv

And pipe a file or directory from the sender host to it:

tar cv /folder | nc <listener_host> 1234

That’s it!

Posted by db in Common, Linux, 0 comments

A Guide through Puppet Smoke Tests

This article is good for Puppet beginners, as you can start developing, testing and even executing your modules without having a full Puppet infrastructure set up. All you need is an editor and a Puppet client installed. You can later use these Puppet smoke tests to bring better quality to your Puppet infrastructure by executing module tests in dummy environments before the modules are actually deployed to your Puppet master.

Posted by db in Linux, Puppet, 0 comments

Cookie based redirects on F5 load balancer

To enable cookie-based redirects on the F5 load balancer you can use an iRule to implement a redirect to a particular node or pool. This way you’re able to run A/B testing, which means that you can run two versions of your application while each user sticks to a particular version (which is probably served by a node or a pool of nodes).


Posted by db in Loadbalancing, 0 comments

Yamledit – A commandline editor for yaml hashes

In my daily work with Puppet’s ENC (external node classifier) I’m heavily using hashes in YAML files to configure nodes, classes and variables. When the same change has to be applied to a mass of YAML files, you might want a command-line YAML editor which can easily update hashes in YAML files recursively. That’s what yamledit does. Before explaining the usage of yamledit, a few words about how to use YAML in Ruby.


Posted by db in Ruby, 0 comments

AWK with multiple separators

Just a quickwin. If you have a file, each line like this…

constant_mykey2: myvalue

…and you’d like to extract mykey2, you need to use two separators in awk, which is easily possible with the -F option, separating each separator with a pipe |

cat yourfile.txt | awk -F '_|:' '{print $2}'
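To see it in action, you can feed a sample line straight into the same awk call (here via printf instead of a file):

```shell
# Split on "_" or ":" and print the second field
printf 'constant_mykey2: myvalue\n' | awk -F '_|:' '{print $2}'
# prints: mykey2
```

With -F taking a regular expression, the line splits into the fields "constant", "mykey2" and " myvalue", so $2 is the key you’re after.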

That’s it :)

Posted by db in Linux, Shell, 0 comments