A new open source counting and throttling server: say hello to dracula

dracula is a high performance, low latency, low resource counting server with auto-expiration.

The Mailsac engineering team recently open sourced our internal throttling service, dracula, under an open source license. Check it out on Github. In the repo we prebuild server and CLI binaries for mac and linux, and provide a client library for Go.

Dracula has performed extremely well in AWS on ARM64 in production for us. It handles thousands of requests per second without noticeable CPU spikes, while maintaining low memory.

In this blog post we’re going to give an overview of why it was necessary, explain how it works, and describe dracula’s limitations.

Why we made it

For the past few years Mailsac tracked throttling in a PostgreSQL unlogged table. By using an unlogged table we didn’t have to worry about lots of disk writes, nor the safety provided by having the write-ahead-log. Throttling records are only kept for a few minutes. We figured if Postgres was rebooting, losing throttling records from the past minutes would be the least of our worries.

In the months leading up to replacing this unlogged table with dracula we began having performance bottlenecks. Mailsac is experiencing fast growth in the past few years. Heavy sustained inbound mail was resulting in big CPU time while Postgres vacuumed the throttling tables. The throttling table started eating too many CPU credits in AWS RDS – credits the we needed for more important stuff like processing emails.

We needed a better throttling solution. One that could independently protect inbound mail processing and REST API services. Postgres was also the primary data store for parsed emails. The Postgres-based solution was a multi-tiered approach to throttling – especially against bad actors – and helped our website and REST API snappy, even when receiving a lot of mail from questionable sources. The throttling layer also caches customer data so we can filter out the paying users from unknown users. Separating this layer from the primary data store would help them scale independently.

Can Redis do it?

So it was time to add a dedicated throttle cache. We reached for Redis, the beloved data structure server.

We were surprised to find our use case – counting quickly-expiring entries – is not something Redis does very well.

Redis can count items in a hash or list. Redis can return keys matching a pattern. Redis can expire keys. But it can’t expire list or hash item entries. And Redis can’t count the number of keys matching a pattern – it can only return those keys which you count yourself.

What we needed Redis to do was count items matching a pattern while also automatically expiring old entries. Since Redis couldn’t do this combination of things, we looked elsewhere.

Other utility services seemed too heavy and full-of-features for our needs. We could have stood up a separate Postgres instance, used MongoDB, Elasticache, or Prometheus. The team has experience running all these services. But the team is also aware that the more features and knobs a service has, the more context is needed to debug it – the more expertise to understand its nuances, the more risk you’ll actually use additional features, and the more risk you’ll be slow responding to issues under heavy load.

All we wanted to do was put some values in a store, have them expired automatically, and frequently count them. We’d need application level logic to do at least some of this, so we made a service for it – dracula. Please check it out and give it a try!

How it works under the hood

Dracula is a server where you can put entries, count the entries, and have the entries automatically expire.

The dracula packet layout is as follows. See protocol.go for the implementation.

Section DescriptionCommand characterPut, Count, Errorspacexxhashpre shared key + id + namespace + dataspaceClient Message IDspaceNamespacespaceEntry data
Size1 byte1 byte8 bytes1 byte4 bytesunsigned 32 bit integer (Little Endian)1 byte64 bytes1 byteremaining 1419 bytes
Examplebyte(‘P’), ‘C’, ‘E’byte(‘ ‘)0x1c330fb2d66be179byte(‘ ‘)6 or []byte{6, 0, 0, 0}byte(‘ ‘)“Default” or “anything” up to 64 bytesbyte(‘ ‘)192.169.0.1, or any string up to end of packet
500 byte dracula packet byte order

Here’s roughly how the dracula client-server model works:

  1. The client constructs a 1500 byte packet containing a client-message ID, the namespace, and the value they want to store in the namespace (to be counted later).
  2. A hash of the pre-shared secret + message ID + namespace + entry data is set inside the front part of the message.
  3. A handler is registered under the client message ID.
  4. The bytes are sent over UDP to the dracula server.
  5. Client is listening on a response port.
  6. If no response is received before the message times out, a timeout error is returned and the handler is destroyed. If the response comes after the timeout, it’s ignored.
  7. Server receives packet, decodes it and checks the hash which contains a pre-shared secret.
  8. Server performs the action. There are only two commands – either Put a namespace + entry key, or Count a namespace + entry key.
  9. Server responds to the client using the same command (Put or Count). The entry data is replaced with a 32 bit unsigned integer in the case of a Count command. The hash is computed similarly to before.
  10. Client receives the packed, decodes it and confirms the response hash.

Data structures

Dracula uses a few data structures for storing data.

Namespaces are stored in a hashmap provided by github.com/emirpasic/gods, and we use a simple mutex to sync multithreaded access. Entries in each namespace are stored in wrapped AVL tree from the same repo, which we added garbage collection and thread safety. Each node of the AVL tree has an array of sorted dates.

Here’s another view:

  • dracula server
    • Namespaces (hashmap)
      • Entries (avltree)
        • sorted dates (go slice / dynamic array of int64)

Server configuration

When using dracula, the client has a different receiving port than the server. By default the dracula server uses port 3509. The server will write responses back to the same UDP port it received messages from on the client.

Messages are stored in a “namespace” which is pretty much just a container for stored values. The namespace is like a top-level key in Redis. The CLI has a default namespace if you don’t provide one. The go client requires choosing a namespace.

Namespaces and entries in namespaces are exact – dracula does not offer any matching on namespaces.

At Mailsac, we use uses the namespaces to separate messages on a per-customer basis, and to separate free traffic. Namespaces are intentionally generic. You could just use one namespace if you like, but performance under load improves if entries are bucketed into namespaces.

Production Performance

Dracula is fast and uses minimal resources by today’s standards.

While we develop it on Intel, and in production we run dracula on Arm64 architecture under Amazon Linux for a significant savings.

In its first months of use, dracula did not spike above 1% CPU usage and 19 MB of RAM, even when handling single-digit-thousands of requests simultaneously.

Tradeoffs

By focusing on a small subset of needs, we designed a service with sharp edges. Some of these may be unexpected features so we want to enumerate what we know.

It only counts

It’s called dracula in an allusion to Count Dracula. There’s no way to list namespaces, keys, nor return stored values. Entries in a namespace can be counted, and the number of namespaces can be counted. That’s it! If we provided features like listing keys or namespace, we would have needed to change the name to List Dracula.

No persistence

Dracula is designed for short-lived ephemeral data. If dracula restarts, nothing is currently saved. This may considered for the future, though. Storing metrics or session data in dracula is an interesting idea. On the other hand, we see no need to reinvent Redis or Prometheus.

Small messages

An entire dracula protocol message is 1500 bytes. If that sounds familiar, it’s because 1500 bytes is the normal maximum-transmission-unit for UDP. Namespaces are capped at 64 bytes and values can be up to 1419. After that they’re cut off.

Same expiry

All namespaces and entries in the entire server have the same expire time (in seconds). It shouldn’t be too difficult to run multiple draculas on other ports f you have different expiry needs.

HA

The approach to high-availability assumes short-lived expiry of entries. A pool of dracula servers can replicate to one another, and dracula clients track health of pool members, automatically handling failover. Any client can read from any server, but in the case of network partitioning, consistency won’t be perfect.

Retries

Messages that fail or timeout are not retried by the dracula client right now. There’s nothing stopping the application level from handling this. It may be added as an option later.

Garbage

While we have not yet experienced issues with dracula’s garbage collection, it’s worth noting that it exists. A subset of entries are crawled and expired on a schedule. On “count” commands, old entries are expired. The entire store is typically not locked, but it’s likely you would see a little slowdown when counting entires in very large namespaces, or when there are a lot of old entires to cleanup, while GC is running. In our testing it’s on the order of single digit miliseconds, but this can be expected to grow linearly with size.

Unknown scale

We’re working with low-tens-of-thousands entries per namespace, maximum. Above that, we’re unsure how it will perform.

Language support

Upon release, dracula has a reference client implementation in Golang. Node.js support is on our radar, but not finished. Please open an issue in the dracula repo to request support for a specific language. We’d be thrilled to receive links to community written drivers as well.

What’s next?

Hopefully you enjoyed learning a little bit about dracula and are ready to give it a try. Head over to Github https://github.com/mailsac/dracula where added examples of using the server, client library, and CLI.

Check out the roadmap of features, currently tracked in the Readme.

Finally, Mailsac would love your feedback. Open a Github issue or head to forum.mailsac.com. If you’d like to see additional library languages supported, let us know.

Happy counting!

Open Source Spam Processing

Mailsac has added a new spamminess indicator in the API.

New messages will have a spam property that’s a score between 0.0 and 1.0. 1.0 indicates a high likelihood of spam. The system will get smarter over time.

The main component of spam detection is a time tested approach called Naive Bayes Classifier. The core of the code is open source at github.com/mailsac/spam-classifier, though a little special sauce is applied, too. This classifier library is ready to use and includes a simple API server and a terminal training tool.

In the works are additional spam classification projects using three different deep learning techniques: random forest, a long short term neural network, and a combination recurrent network + liquid state machine with spike dependent plasticity. The goal is to open source the useful pieces of the coming spam classifiers.