Dev

Issues with Cartesian products in MySQL

Profil Picture

Guillaume Briday

2 minutes

Recently, while working on a web development project, I needed to use the SUM aggregation function with multiple JOIN operations. I encountered strange behavior: MySQL was doubling or tripling the expected results. After spending some time finding a solution, I wanted to share what I discovered along with a few details.

In my case, I needed to retrieve a list of unpaid invoices by comparing invoice lines and their associated payments. Here’s a simplified schema of my tables:

invoices

id reference
1 AAA

invoice_lines

id amount invoice_id
1 500 1
2 1000 1
3 2000 1

payments

id amount invoice_id
1 2000 1

The structure is straightforward: an invoice can have multiple invoice lines and associated payments. Payments and invoice lines are not directly related to each other.

In total, we can see that invoice AAA has received a payment of 2000 against a total due of 3500, so the invoice remains unpaid.

To create my list, I started with the following query:

/* The SUM functions in the SELECT are temporary and help to understand what's happening. */

SELECT invoices.*, SUM(payments.amount) AS paid_amount, SUM(invoice_lines.amount) AS due_amount
FROM invoices
JOIN payments ON invoices.id = payments.invoice_id
JOIN invoice_lines ON invoices.id = invoice_lines.invoice_id
GROUP BY invoices.id
HAVING SUM(payments.amount) < SUM(invoice_lines.amount);

When I executed this seemingly correct query, oddly, it returned no rows. Removing the HAVING condition to inspect the results, I got this:

id reference paid_amount due_amount
1 AAA 6000 3500

Here, it shows that 6000 has been paid instead of the expected 2000, which is three times the correct amount, corresponding to the number of invoice lines.

Adding another invoice line increases the paid_amount to 8000, and similarly, adding another payment doubles the due_amount.

What's happening?

Without a direct relationship between the payments and invoice_lines tables, but with a GROUP BY on the invoices, MySQL performs a Cartesian product between all rows in the joined tables linked to the invoices.

In simple terms, for each row in payments, MySQL multiplies the amount value by the number of rows in invoice_lines and vice versa. For each row in invoice_lines, it multiplies the amount value by the number of rows in payments.

I’m not sure why this is MySQL’s default behavior, but if you know the reason, feel free to share it in the comments.

How to resolve this issue?

There are several solutions. One option is to use subqueries in the JOIN clauses to isolate the tables, but I find this less readable.

The official documentation explains that we can use the DISTINCT keyword in the SUM function to force it to use distinct values, ensuring each row is counted only once.

Here’s the updated query with DISTINCT added to all the SUM functions:

/* The SUM functions in the SELECT are temporary and help to understand what's happening. */

SELECT invoices.*, SUM(DISTINCT payments.amount) AS paid_amount, SUM(DISTINCT invoice_lines.amount) AS due_amount
FROM invoices
JOIN payments ON invoices.id = payments.invoice_id
JOIN invoice_lines ON invoices.id = invoice_lines.invoice_id
GROUP BY invoices.id
HAVING SUM(DISTINCT payments.amount) < SUM(DISTINCT invoice_lines.amount);

This time, the unpaid invoice appears correctly with the right amounts:

id reference paid_amount due_amount
1 AAA 2000 3500

Of course, if each table contains only one field for the aggregation, you won’t notice this issue. This was the case for me for a long time, and it was sheer luck that I encountered the problem before fixing it.

Conclusion

If you’re using multiple JOIN operations with aggregation functions, make sure to check all possible scenarios to avoid such issues. Add DISTINCT as needed to ensure accurate results.

Since this article covers a specific case, feel free to share any suggestions, feedback, or improvements in the comments.

Thank you!

Simplify your time tracking with Timecop

Timecop is a time tracking app that brings simplicity in your day to day life.

Timecop projects