Adventures in ZeroMQ
This post is mostly for myself: a reminder of what to watch out for if I ever need to set up something like this again.
A little background. I run a behavioral neuroscience lab. I have been designing and building high-throughput training facilities for rodents since about 2007. I helped build one in Princeton, then I built another one at NYU-Shanghai, and I am currently building a new one at UCL. The core piece of software for the previous builds was a MariaDB server. All the configuration information, settings, metadata and data were stored in the DB. MariaDB is a rock solid piece of software, and for anyone else building something on MariaDB I would just recommend the following:
- Follow an “insert-only” model. Never update; updates lose information. (You can now use system-versioned tables, which keep the history of each row, so an update no longer discards the old values.)
- Have two DBs. Write to main and do analysis from a replica. That way if someone writes a strange query that locks up your replica, it doesn’t affect the core function of the facility.
- Pay attention to table size — partition tables that will grow very large.
The main limitation of MariaDB that has been a thorn in my side is that it has no push-notification or PUB/SUB-like mechanism for message passing. I am aware that Postgres has NOTIFY, but I’m much more familiar with MariaDB, so I did not want to make the switch. Also, it seemed like our facility would benefit from having a messaging layer to avoid code that was heavily dependent on the specific layout of database tables.
So, how to choose a library for a messaging layer? My main factors in choosing are/were:
- Ease of use and setup
- MATLAB compatibility (which usually just means Java)
The contenders were ZeroMQ, Redis, Kafka, and RabbitMQ.
I previously had some familiarity with zmq and RabbitMQ, so I thought that I would try to implement those first and avoid the learning curve for Redis and Kafka (note: it may well be worth learning to use those tools).
I started off with the most basic zmq XSUB/XPUB proxy config. I had the proxy running on a Linux machine and logging messages through a capture port. There were two problems:
- Messages from Python on a Windows machine were getting dropped.
- I was having a lot of issues with idle sockets closing.
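For reference, a basic XSUB/XPUB proxy with a capture socket might look roughly like the sketch below in pyzmq. The function name and the port numbers are placeholders made up for illustration, not the values used in the facility.

```python
import zmq


def run_xsub_xpub_proxy(
    frontend_addr="tcp://*:5559",  # publishers connect here (hypothetical port)
    backend_addr="tcp://*:5560",   # subscribers connect here (hypothetical port)
    capture_addr="tcp://*:5561",   # a logger can subscribe here (hypothetical port)
):
    """Forward messages from publishers to subscribers and copy traffic to a capture socket."""
    ctx = zmq.Context.instance()
    frontend = ctx.socket(zmq.XSUB)
    backend = ctx.socket(zmq.XPUB)
    capture = ctx.socket(zmq.PUB)
    try:
        frontend.bind(frontend_addr)
        backend.bind(backend_addr)
        capture.bind(capture_addr)
        # Blocks until the context is terminated; every message that passes
        # through the proxy is also sent out on the capture socket.
        zmq.proxy(frontend, backend, capture)
    finally:
        for sock in (frontend, backend, capture):
            sock.close(linger=0)
```

Publishers connect a PUB socket to the frontend address, subscribers connect a SUB socket to the backend address, and a logging process connects a SUB socket to the capture address.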
I tested out so many theories about what was going wrong with the dropped messages. I still do not know why messages are dropping. On my todo list is to make a minimal working example and submit an issue.
The solution for problem 1 was to run a PULL-PUB proxy (written in Python) on the Windows machine itself and PUSH messages from Python to that proxy. Then I subscribed to the proxy from a remote machine for message processing. This seems to be reliable. It has been running for about a week without problems.
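A minimal sketch of what such a local PULL-PUB relay could look like in pyzmq is below. This is not the exact code running in the facility; the function names, the addresses passed in, and the stop_event argument are assumptions added to keep the example self-contained.

```python
import zmq


def run_pull_pub_proxy(pull_addr, pub_addr, stop_event):
    """Relay every message PUSHed to pull_addr out to subscribers on pub_addr."""
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pub = ctx.socket(zmq.PUB)
    try:
        pull.bind(pull_addr)  # local producers PUSH to this address
        pub.bind(pub_addr)    # remote machines SUBSCRIBE to this address
        while not stop_event.is_set():
            # Poll with a timeout (in ms) so the loop can notice stop_event.
            if pull.poll(100):
                pub.send_multipart(pull.recv_multipart())
    finally:
        pull.close(linger=0)
        pub.close(linger=0)


def make_pusher(pull_addr):
    """Return a long-lived PUSH socket that local code can use to send messages."""
    push = zmq.Context.instance().socket(zmq.PUSH)
    push.connect(pull_addr)
    return push
```

The relay would typically run in its own process or thread with a threading.Event as stop_event, while the code producing messages on the same machine calls make_pusher once and then send on the returned socket.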
To deal with problem 2, I started fiddling around with the zmq.TCP_KEEPALIVE family of socket options in pyzmq. However, their effectiveness was highly OS dependent: things that worked fine on macOS didn’t work on Linux.
For example, setting zmq.TCP_KEEPALIVE_CNT to values larger than 100 worked well on macOS for keeping a publisher alive, but on Linux it prevented the publisher from working at all. After reading more about TCP keepalive, I realized that, given our mix of Linux, Windows and Mac machines, using these options would be a challenge.
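For context, the TCP keepalive options are set on a socket roughly as in the sketch below. The helper name and the specific values (60 s idle, 10 s interval, 5 probes) are illustrative assumptions, not settings that were tested in the facility, and how the OS interprets them is exactly the platform-dependent part.

```python
import zmq


def make_keepalive_publisher(addr, idle_s=60, interval_s=10, probes=5):
    """Create a PUB socket with TCP keepalive enabled (illustrative values)."""
    pub = zmq.Context.instance().socket(zmq.PUB)
    # Set the options before bind so they apply to connections made afterwards.
    pub.setsockopt(zmq.TCP_KEEPALIVE, 1)                 # turn keepalive on
    pub.setsockopt(zmq.TCP_KEEPALIVE_IDLE, idle_s)       # idle time before the first probe
    pub.setsockopt(zmq.TCP_KEEPALIVE_INTVL, interval_s)  # time between probes
    pub.setsockopt(zmq.TCP_KEEPALIVE_CNT, probes)        # failed probes before giving up
    pub.bind(addr)
    return pub
```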
Finally, I decided to use the newish zmq.HEARTBEAT_IVL socket option. So far, it looks like that is working for us.
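A minimal sketch of enabling heartbeats on a subscriber follows. The helper name and the interval and timeout values are assumptions chosen for illustration; the post does not say which values are in use. Heartbeats happen at the ZeroMQ protocol level rather than in the OS TCP stack, and need libzmq 4.2 or later.

```python
import zmq


def make_heartbeat_subscriber(addr, interval_ms=5000, timeout_ms=15000):
    """Create a SUB socket that sends ZMTP heartbeats to its peer (illustrative values)."""
    sub = zmq.Context.instance().socket(zmq.SUB)
    # Set the options before connect so they apply to the new connection.
    sub.setsockopt(zmq.HEARTBEAT_IVL, interval_ms)     # send a PING every interval_ms
    sub.setsockopt(zmq.HEARTBEAT_TIMEOUT, timeout_ms)  # drop the connection if no reply in time
    sub.setsockopt(zmq.SUBSCRIBE, b"")                 # receive every topic
    sub.connect(addr)
    return sub
```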
Will update this post as we do more testing.