blog-banner

The Halloween Problem: a spooky (and true) SQL story

Last edited on October 31, 2023

0 minute read

    Let’s start by ripping the band-aid off: there are no ghosts, ghouls, or goblins in the story of the Halloween problem. But for anyone who works with databases, it does tell a scary story about how a simple SQL operation can haunt your database in a pretty unexpected way.

    So, let’s talk about the “Halloween Problem,” and whether it’s truly as spooky as it sounds.

    What is the Halloween Problem?Copy Icon

    The Halloween Problem describes a phenomenon in which INSERT, UPDATE, DELETE, and MERGE queries in SQL can, under certain circumstances, result in rows being updated multiple times during an operation in which they should only be updated once.

    Why is it called the Halloween Problem?Copy Icon

    The Halloween Problem was first discovered by Don Chamberlin, Patricia Selinger, and Morton Astrahan.

    According to a 2001 interview with Chamberlin, it was uncovered while Selinger and Astrahan were working on query optimization. They were working with an example problem: giving 10% raises to every employee in a relational database table whose yearly salary was less than $25,000. They discovered that, if executed in a certain way, that query could end up running over and over again until every employee’s salary was at least $25,000.

    Chamberlin says that Selinger and Astrahan discovered this on October 31, a Friday. Realizing that the issue was too big for them to solve before the weekend, the group decided to give it a name – the Halloween Problem, since it was discovered on Halloween – so that they could more easily refer to it in the future when they had time to address it. (Note: this discovery probably happened in 1975, but it’s not entirely certain. See our note at the end for additional details).

    How does the Halloween Problem happen?Copy Icon

    In short, the Halloween Problem can occur in relational databases if a write operation is executed on a table using a non-clustered index on a column that is being updated. In non-clustered indexes, the logical order of the index may not match the order in which the rows are stored physically — meaning that an update to a row can change its location. This, in turn, can move it back ahead of the index scan the query is using, resulting in the same row being updated multiple times…or possibly even in an infinite loop. Talk about scary!

    We can demonstrate this using the same example Selinger and Astrahan were looking at when they uncovered it. Imagine we have a table of employees and salaries in which the first few rows look something like this:

    If we wanted to update the table to give a 10% raise to anyone making under $25,000 per year, we might run a query like this:

    UPDATE employee SET salary = salary * 1.1 WHERE salary < 25000;

    We would expect the result of that query to be an updated table in which Ellen’s salary does not change, but Parker’s salary is updated to $19,360 and Ash’s salary is updated to $13,530, and so on.

    If, however, our query is processed using a non-clustered index, Parker and Ash’s salaries may be updated continually until they reach $25,000 or higher.

    How? Imagine this is our index, a non-clustered index sorted (ascending) by the values in the salary column:

    If our query is executed iteratively using this index, then it will look first at the row for ash. Since this value is below 25000, the value will be updated – as per our query, multiplied by 1.1, so the new value is 13530. Updating that row will also update the index, so our index is now:

    As the query is proceeding iteratively, the next step will be to update the parker row, but we can already see the problem here: our update to the ash row also changed its position in the index. And since its value is still less than 25000, the update will be applied to it again when the index scan reaches that point in the index.

    And with these specific values, the same thing will happen with parker’s salary, such that ash and parker will continually leapfrog each other in the index, remaining one step ahead of the index scan, until both values pass 25000.

    It’s important to note that this but is one version of the Halloween problem. It does have other forms, and can also impact INSERT, DELETE, and MERGE statements in addition to UPDATE statements.

    Do YOU need to be scared of the Halloween problem?Copy Icon

    Long story short: probably not. This problem, after all, was discovered by some of the core developers of relational database systems, and has been a known potential issue since the 1970s. Many modern relational database management systems thus have query optimizers that are designed to detect queries that require “Halloween protection” and execute them accordingly.

    However, don’t let that prevent you from getting spooked by the Halloween problem! As mentioned, what we’ve described above is just one relatively simple example of the Halloween problem, taken from Chamberlin, Selinger, and Astrahan’s original example back in the System R days. The Halloween problem can potentially occur in other ways depending on the specifics of your DBMS. Many DBMS still have the potential for “Halloween-like” problems to occur under specific circumstances, and Halloween Protection strategies can sometimes have an impact on performance.

    So, while the Halloween Problem might only have been named that because of the coincidental date on which it was discovered, the idea of it should still send a shiver down your spine… at least until you’ve checked your database’s documentation to see how it’s handled.

    (If you’re a CockroachDB user, you can read a bit about the investigation of this problem in the context of our distributed SQL database here, and see how that issue was resolved here).

    Endnote: The Halloween Problem and memory issues

    When was the original Halloween Problem discovered? Probably 1975, but the issue is clouded somewhat by the fallibility of (human) memory.

    To be clear, Chamberlin and Selinger have both said the problem was discovered in 1976 or 1977. Wikipedia (and a plethora of other online sources citing it) also say it was 1976.

    However, Chamberlin’s explanation for the name hinges on the problem having been discovered on a Friday. Halloween did not fall on a Friday in 1976 or 1977. So, Chamberlin must either be misremembering the reason for the name or misremembering the year. The latter seems more likely.

    Halloween did fall on a Friday in 1975, and that was the only Friday Halloween of the 1970s, so if Chamberlin is remembering the reason for the naming correctly, 1975 seems to be the most likely candidate.

    sql

    Keep reading

    View all posts