
I'm trying to implement a simple event table in MySQL that I want other processes to poll for new information. The design is pretty simple:

  1. I have a table event with an AUTO_INCREMENT id column and then additional columns describing the event
  2. Via triggers on other tables, rows are inserted
  3. Other processes poll this table and, by using the id column, know where in the "event stream" they are and select accordingly for new information.

[2] is always performed inside a transaction.
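For concreteness, here is a minimal sketch of that setup. All table, column, and trigger names here are illustrative, not our real schema:

```sql
-- Hypothetical event table; columns describing the event are illustrative.
CREATE TABLE event (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_type VARCHAR(64)     NOT NULL,
    payload    JSON            NULL,
    created_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
);

-- Example trigger on a hypothetical source table `x` (step 2 above).
DELIMITER $$
CREATE TRIGGER x_after_insert
AFTER INSERT ON x
FOR EACH ROW
BEGIN
    INSERT INTO event (event_type, payload)
    VALUES ('x_inserted', JSON_OBJECT('x_id', NEW.id));
END$$
DELIMITER ;
```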

The problem we are seeing, and I should have foreseen this, is that this really doesn't work when writes are done within transactions. The id of the next row in the event table is assigned within the transaction, but there is no guarantee that, from an external point of view, events become visible in ascending id order. This means that you cannot simply select for the next highest id, because there may still be rows with a lower id that are not yet committed.

Example:

Transaction 1

start transaction;
insert into event (...) values (...);
commit;

Transaction 2

start transaction;
insert into event (...) values (...);
commit;

Consider the case where transaction 1 is first to insert into event, followed by transaction 2. Transaction 1's event will be given 1 as the value of its auto-incrementing column (assuming the table is empty and these are the first writes), whereas transaction 2's will be given 2. If transaction 2 then commits first, external processes polling this table will only see the event with id 2, process it, and incorrectly assume that from now on they should select for rows having id > 2. The event from transaction 1 is effectively lost.
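The naive polling query is exactly what breaks in this scenario (`:last_seen_id` stands for whatever id the poller remembered from its previous run):

```sql
-- If transaction 2 commits before transaction 1, a poll at that moment
-- returns only id 2; the poller then remembers last_seen_id = 2 and will
-- never see event 1, even after it commits.
SELECT id, event_type, payload
FROM event
WHERE id > :last_seen_id
ORDER BY id;
```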

Is there a clever way to work around this? In practice, I'd like to be able to poll for committed rows in ascending order. Unfortunately, I cannot expect the ids in the table to form an unbroken sequence; transactions may be rolled back, effectively creating gaps of "unused" ids.

I'm stumped. Should I include a timestamp with each event, and instead always select rows whose timestamp falls within a window reaching back past the longest-running transaction we expect, then ignore events that were already processed?
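That timestamp idea would look roughly like this, assuming an added created_at column and an assumed upper bound on transaction duration (30 seconds here is a placeholder). Because the window overlaps previous polls, the poller must deduplicate on the client side:

```sql
-- Re-scan a window reaching back past the longest transaction we expect,
-- then filter out events that were already processed on the client.
SELECT id, event_type, payload, created_at
FROM event
WHERE created_at >= NOW() - INTERVAL 30 SECOND  -- assumed max transaction duration
ORDER BY created_at, id;
```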

Edit

I'm adding more information to try to explain the problem I want to solve. A common requirement is that whenever table X is updated, we want to notify external processes. A simple design would then be: add trigger to table X, write row to event table, have external process poll this table and act accordingly.

In a perfect world, I'd like "event" to instead be pushed to a topic and not involve the database at all, but since writes to table X are done all over the place it would be very error-prone to get right; having a trigger that writes a row solved that problem very nicely. In fact, if the polling of the event table were simple to get right we could just add a simple bridge solution that polls and pushes to a topic.

Edit 2

A possible solution, expanding on the previous edit, would be to have a single process polling the event table, selecting rows in ascending order by id. Information about each event is pushed to a topic, and the row is then deleted from the table. This way we are not dependent on rows receiving strictly increasing ids as they are committed.
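A sketch of that single-consumer loop, assuming this poller is the only reader of the table (`:max_batch` and the id placeholders are illustrative):

```sql
-- One dedicated process, run repeatedly in a loop:
START TRANSACTION;

SELECT id, event_type, payload
FROM event
ORDER BY id
LIMIT 100
FOR UPDATE;   -- lock the batch in case a second poller is ever started

-- ... push each selected row to the topic ...

-- Delete exactly the ids that were pushed, not "id <= max", so that a
-- lower-id event committing after the SELECT is never silently dropped.
DELETE FROM event
WHERE id IN (/* ids from the batch just pushed */);

COMMIT;
```

Since every row still present is, by construction, unprocessed, gaps from rolled-back transactions no longer matter: as long as more than zero rows exist, there is something to publish.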

  • This whole idea just feels so wrong. Regrettably you do not explain which problem this "event" table is supposed to solve. All you say is: "I want other processes to poll (it) for new information.". That is not saying much. I can therefore not suggest a better solution, but I would strongly advise not to go further down this route. – KIKO Software Jun 26 '19 at 09:09
  • Thank you for the reply @KIKOSoftware. I tried to expand the initial post. I'm not sure if I'm just digging a deeper hole here, but in our initial design we wanted to allow multiple processes to "poll" this event table. A possible redesign would be to have one single polling process, owned by the application that otherwise "owns" this database, that continuously polls for new rows and posts on a topic. Whenever a row/event has been posted, it is *deleted*. This way we work around the initial problem: as long as >0 rows exist, we have rows to publish. – Matthias Jun 26 '19 at 09:30
  • After reading your two edits and comment, I still only know there are some 'external processes' that need to know about 'events' in a database. I still have no clue as to which problem you're trying to solve, I only know about the solution you chose. My idea is that your solution just adds an extra layer of (yet unknown) complications and that it should be avoided. How about backup restoration? However, I don't have the information needed to suggest an alternative solution. – KIKO Software Jun 26 '19 at 10:05
  • Some references: [Reasons to Avoid Triggers](https://www.itprotoday.com/development-techniques-and-management/reasons-avoid-triggers), [Are database triggers evil?](https://stackoverflow.com/questions/460316/are-database-triggers-evil) "it's easy to forget they are there until they hurt you with unintended (and very mysterious) consequences.", [The Problem With Triggers](https://www.red-gate.com/simple-talk/sql/oracle/the-problem-with-triggers/)... as you can see, they all basically say the same... however, your situation is different and you _will_ be able to keep them under control. – KIKO Software Jun 26 '19 at 10:21
  • @KIKOSoftware I agree, we don't really like triggers as well. We have had problems in the past with triggers having too much logic. Nowadays we have a few, but they all perform very trivial inserts. Having discussed the problem I'm describing more in-house, we will attempt a redesign that does not require us to depend on auto increment columns being written in sequence. – Matthias Jun 26 '19 at 10:30
  • Yes, the obvious solution is using timestamps.... but still... you're only repairing one problem with this now, and not all possible future problems this solution might have. – KIKO Software Jun 26 '19 at 10:32
  • We decided to not go with timestamps. Instead we will treat the event table as a backlog. Events are selected in ascending order by id guaranteeing that they arrive in the same order as each transaction wrote them. Then events will be deleted as soon as they have been processed. – Matthias Jun 26 '19 at 10:38
