Counting distinct number of items in multiple group by query efficiently

I have the following query that counts the number of vessels in each zone for each week:



SELECT zone, DATE_FORMAT(creation_date, '%Y%u') AS date, COUNT(DISTINCT vessel_imo) AS vessel_count
FROM vessel_position
WHERE zone IS NOT NULL
AND creation_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
GROUP BY zone, date;


The table has about 40 million rows. The execution plan for this is:



+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+
| id | select_type | table           | partitions | type  | possible_keys      | key  | key_len | ref  | rows     | filtered | Extra                                    |
+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+
|  1 | SIMPLE      | vessel_position | NULL       | range | creation_date,zone | zone |       5 | NULL | 21190904 |    50.00 | Using where; Using index; Using filesort |
+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+


The columns vessel_imo, zone and creation_date are each indexed individually, and the primary key is the composite key (vessel_imo, creation_date).
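For reference, a minimal table definition consistent with that description might look like the sketch below. It is hypothetical: the column types, the InnoDB engine and the name of the vessel_imo index are assumptions, and any other columns of the real table are left out. The index names creation_date and zone are taken from the possible_keys column of the plan, and a key_len of 5 for zone fits a nullable 4-byte INT.

```sql
-- Hypothetical DDL: types, ENGINE and idx_vessel_imo are assumptions;
-- other columns of the real table are omitted.
CREATE TABLE vessel_position (
    vessel_imo    INT      NOT NULL,
    creation_date DATETIME NOT NULL,
    zone          INT      NULL,      -- key_len 5 = 4-byte INT + 1 NULL flag byte
    PRIMARY KEY (vessel_imo, creation_date),
    KEY idx_vessel_imo (vessel_imo),  -- redundant: leftmost prefix of the PK
    KEY creation_date (creation_date),
    KEY zone (zone)
) ENGINE=InnoDB;
```

With InnoDB, every secondary index also stores the primary key columns, so the zone index effectively holds (zone, vessel_imo, creation_date). That would explain why the plan can answer the query from the index alone ("Using index"), while the rows still come out in zone, vessel_imo order rather than grouped by week, so a sort is still needed for the GROUP BY ("Using filesort").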



When I look at the query profile, I can see that a large amount of time is spent in the "Creating sort index" state.



Is there anything I can do to improve this query further?
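One direction that may be worth testing is sketched below under stated assumptions: MySQL 5.7 or later (the partitions and filtered columns in the EXPLAIN output suggest this), and the hypothetical names creation_week and zone_week_imo. Storing the week as a generated column and indexing it right after zone would let the index return rows already in GROUP BY order, which may remove the filesort. YEAR(creation_date) * 100 + WEEK(creation_date, 1) gives the same digits as DATE_FORMAT(creation_date, '%Y%u'), because %u corresponds to WEEK() mode 1.

```sql
-- Hypothetical generated column and index. Adding a STORED column
-- typically rebuilds the table, so try this on a copy of the data first.
ALTER TABLE vessel_position
    ADD COLUMN creation_week INT
        AS (YEAR(creation_date) * 100 + WEEK(creation_date, 1)) STORED,
    ADD INDEX zone_week_imo (zone, creation_week, vessel_imo);

-- Same groups as the original query, but the week key is now an integer
-- taken from the index. InnoDB appends creation_date (from the primary
-- key) to the new index, so the date filter can still be checked from it.
SELECT zone, creation_week AS date, COUNT(DISTINCT vessel_imo) AS vessel_count
FROM vessel_position
WHERE zone IS NOT NULL
  AND creation_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
GROUP BY zone, creation_week;
```

Whether the optimizer actually picks zone_week_imo and drops "Using filesort", and whether that is faster on the real data, would need to be confirmed with EXPLAIN and the query profile.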







  • The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. The site standard is for the title to simply state the task accomplished by the code. Please see How to Ask for examples, and revise the title accordingly.
    – Mathias Ettinger
    May 25 at 9:30

edited May 28 at 1:26 by Jamal♦

asked May 25 at 8:55 by swdon










