Sequelize: What is the value of counting DISTINCT on a primary key?
jodysimpson opened this issue · 5 comments
This change was introduced to add DISTINCT(<primaryKey>)
to the counts. With Postgres in particular, this dramatically slows down queries on large tables because the planner will try to sort the values. Link to Stackoverflow discussing this issue. I'm not sure if I'm missing something, but wouldn't adding DISTINCT
in this case be pointless because you're always counting the primary key which is unique to begin with?
This change is added by the #304
@DenisFerrero could you explain it?
Ok take for example a user-auths binding through an M-N, I applied the below defaultScope to the users
User.addScope('defaultScope', {
include: [{
model: models.auths, as: 'auths',
through: { attributes: [] }
attributes: ['id', 'name'],
order: [['name', 'ASC']]
}]
});
Then in your database, you have the following data
AUTHS
id | name |
---|---|
1 | application |
2 | administration |
3 | debugging |
USERS
id | name |
---|---|
1 | John Doe |
USERS_AUTHS (M-N table)
user_id | auth_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
When you try to call the list
action for the users the data query will look like this
SELECT
user.id AS "user.id", user.name AS "user.name",
auth.id AS "auth.id", auth.name AS "auth.name"
FROM users AS user
JOIN users_auths AS ua ON ua.user_id = user.id
JOIN auths AS auth ON au.auth_id = auth.id
and the results like this
user.id | user.name | auths.id | auths.id |
---|---|---|---|
1 | John Doe | 1 | application |
1 | John Doe | 2 | administration |
1 | John Doe | 3 | debugging |
The count will have a similar pattern but will count each row so the results will be 3 and not 1.
The DISTINCT(user.id)
just helps to count just once the user and get the right result
This defaultScope pattern's pretty much unconventional as services should be independent of each other (errors as a newbie moleculer dev).
The same result can be performed with a basic default scope and calling a populate method on request.
If you decide to remove it just let me know so I can dedicate some time to fix this behavior in my code
@DenisFerrero I feel like your example breaks the intent of Moleculer DB having one model per service.
Additionally, a count query designed to paginate a larger query should return the same overall count as the original query. Adding behavior to accommodate an unintended pattern does not seem like the best choice to me, especially when the change dramatically slows down the intended usage pattern for larger tables.
Yeah, you're right it was my mistake when I started designing the application. Currently, I don't have much time but I'll revert it to the previous version and create a PR
Sorry if I disappeared for a while but I have been on other works. I created a Pull Request (#377) with the revert of the changes as we discussed