Issues with Cartesian products in MySQL
Guillaume Briday
2 minutes
Recently, while working on a web development project, I needed to use the SUM
aggregation function with multiple JOIN
operations. I encountered strange behavior: MySQL was doubling or tripling the expected results. After spending some time finding a solution, I wanted to share what I discovered along with a few details.
In my case, I needed to retrieve a list of unpaid invoices by comparing invoice lines and their associated payments. Here’s a simplified schema of my tables:
invoices
id | reference |
---|---|
1 | AAA |
invoice_lines
id | amount | invoice_id |
---|---|---|
1 | 500 | 1 |
2 | 1000 | 1 |
3 | 2000 | 1 |
payments
id | amount | invoice_id |
---|---|---|
1 | 2000 | 1 |
The structure is straightforward: an invoice can have multiple invoice lines and associated payments. Payments and invoice lines are not directly related to each other.
In total, we can see that invoice AAA
has received a payment of 2000
against a total due of 3500
, so the invoice remains unpaid.
To create my list, I started with the following query:
/* The SUM functions in the SELECT are temporary and help to understand what's happening. */
SELECT invoices.*, SUM(payments.amount) AS paid_amount, SUM(invoice_lines.amount) AS due_amount
FROM invoices
JOIN payments ON invoices.id = payments.invoice_id
JOIN invoice_lines ON invoices.id = invoice_lines.invoice_id
GROUP BY invoices.id
HAVING SUM(payments.amount) < SUM(invoice_lines.amount);
When I executed this seemingly correct query, oddly, it returned no rows. Removing the HAVING
condition to inspect the results, I got this:
id | reference | paid_amount | due_amount |
---|---|---|---|
1 | AAA | 6000 | 3500 |
Here, it shows that 6000
has been paid instead of the expected 2000
, which is three times the correct amount, corresponding to the number of invoice lines.
Adding another invoice line increases the paid_amount
to 8000
, and similarly, adding another payment doubles the due_amount
.
What's happening?
Without a direct relationship between the payments
and invoice_lines
tables, but with a GROUP BY
on the invoices
, MySQL performs a Cartesian product between all rows in the joined tables linked to the invoices
.
In simple terms, for each row in payments
, MySQL multiplies the amount
value by the number of rows in invoice_lines
and vice versa. For each row in invoice_lines
, it multiplies the amount
value by the number of rows in payments
.
I’m not sure why this is MySQL’s default behavior, but if you know the reason, feel free to share it in the comments.
How to resolve this issue?
There are several solutions. One option is to use subqueries in the JOIN
clauses to isolate the tables, but I find this less readable.
The official documentation explains that we can use the DISTINCT
keyword in the SUM
function to force it to use distinct values, ensuring each row is counted only once.
Here’s the updated query with DISTINCT
added to all the SUM
functions:
/* The SUM functions in the SELECT are temporary and help to understand what's happening. */
SELECT invoices.*, SUM(DISTINCT payments.amount) AS paid_amount, SUM(DISTINCT invoice_lines.amount) AS due_amount
FROM invoices
JOIN payments ON invoices.id = payments.invoice_id
JOIN invoice_lines ON invoices.id = invoice_lines.invoice_id
GROUP BY invoices.id
HAVING SUM(DISTINCT payments.amount) < SUM(DISTINCT invoice_lines.amount);
This time, the unpaid invoice appears correctly with the right amounts:
id | reference | paid_amount | due_amount |
---|---|---|---|
1 | AAA | 2000 | 3500 |
Of course, if each table contains only one field for the aggregation, you won’t notice this issue. This was the case for me for a long time, and it was sheer luck that I encountered the problem before fixing it.
Conclusion
If you’re using multiple JOIN
operations with aggregation functions, make sure to check all possible scenarios to avoid such issues. Add DISTINCT
as needed to ensure accurate results.
Since this article covers a specific case, feel free to share any suggestions, feedback, or improvements in the comments.
Thank you!