Find how much reputation a user had on a given date

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
5
down vote

favorite












I saw this question on MSE and went ahead and wrote a solution to it:
https://meta.stackexchange.com/questions/313561/determining-users-reputation-as-of-particular-date



This calculates the reputation a user should have had at a certain date. I attempt to factor in everything that SEDE allows me to see. So -1 events from downvoting posts and serial voting reversal as well as the documentation reputation aren't factored in.



I'm curious about what could be improved here



SELECT
-- Total Reputation
(
SUM(CASE WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200 ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits END)
+ SUM(r2d.ReputationFromBounties)
+ COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0)
+ COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0)
+ SUM(r2d.ReputationFromAccepts)
) AS TotalReputation,
-- Rep Capped Activities with the Cap Factored in
SUM(
CASE
WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200
ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits
END) AS ReputationFromRepCap,
-- Total Bounties recieved
SUM(r2d.ReputationFromBounties) AS ReputationFromBounties,
-- Total Bounties given
COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0) AS ReputationGivenAsBounties,
-- Total Reputation from Accepting Answers
COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0) AS ReputationFromAcceptingAnswers,
-- Total Reputation from Accepted Answers
SUM(r2d.ReputationFromAccepts) AS ReputationFromAcceptedAnswers


FROM

(

SELECT
v.CreationDate AS VoteDate,

-- Total Reputation from Post Upvotes
-- PostTypeId 1 = Question, 2 = Answer
-- VoteTypeId 2 = Upvote, 3 = Downvote
-- CommunityOwnedDate is when a post was made CW.
-- Votes before that count, after not.
-- Vote Date is truncated to full days only so grouping works
SUM((CASE
WHEN (p.PostTypeId = 1 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 5
WHEN (p.PostTypeId = 2 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 10
WHEN (v.VoteTypeId = 3 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN -2
ELSE 0
END)) AS ReputationFromVotes,

-- Total Reputation from Answer Bounties
-- VoteTypeId 9 = Bounty Close (Bounty Awarded)
-- BountyAmount = Amount of Reputation awarded
SUM(CASE
WHEN v.VoteTypeId = 9 THEN v.BountyAmount
ELSE 0
END) AS ReputationFromBounties,

-- Total Reputation from Answer Accepts
-- VoteTypeId 1 = AcceptedByOriginator (Answer Accepted)
SUM(CASE
WHEN (v.VoteTypeId = 1 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 15
ELSE 0
END) AS ReputationFromAccepts,

-- Total Reputation from Suggested Edits
-- if ApprovalDate isn't NULL and RejectionDate is NULL it's been approved and not overriden
-- Group by the same Date as Votes for Rep-Cap evaluation (They count towards it)
COALESCE((SELECT
SUM(CASE WHEN (se.ApprovalDate IS NOT NULL AND se.RejectionDate IS NULL) THEN 2 ELSE 0 END)
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
AND YEAR(v.CreationDate) = YEAR(se.ApprovalDate)
AND MONTH(v.CreationDate) = MONTH(se.ApprovalDate)
AND DAY(v.CreationDate) = DAY(se.ApprovalDate) ),0) AS ReputationFromSuggestedEdits

FROM Posts AS p
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

) as r2d






share|improve this question



















  • For comparison, see @rolfl's Reputation Races query.
    – 200_success
    Aug 3 at 9:24
















up vote
5
down vote

favorite












I saw this question on MSE and went ahead and wrote a solution to it:
https://meta.stackexchange.com/questions/313561/determining-users-reputation-as-of-particular-date



This calculates the reputation a user should have had at a certain date. I attempt to factor in everything that SEDE allows me to see. So -1 events from downvoting posts and serial voting reversal as well as the documentation reputation aren't factored in.



I'm curious about what could be improved here



SELECT
-- Total Reputation
(
SUM(CASE WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200 ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits END)
+ SUM(r2d.ReputationFromBounties)
+ COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0)
+ COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0)
+ SUM(r2d.ReputationFromAccepts)
) AS TotalReputation,
-- Rep Capped Activities with the Cap Factored in
SUM(
CASE
WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200
ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits
END) AS ReputationFromRepCap,
-- Total Bounties recieved
SUM(r2d.ReputationFromBounties) AS ReputationFromBounties,
-- Total Bounties given
COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0) AS ReputationGivenAsBounties,
-- Total Reputation from Accepting Answers
COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0) AS ReputationFromAcceptingAnswers,
-- Total Reputation from Accepted Answers
SUM(r2d.ReputationFromAccepts) AS ReputationFromAcceptedAnswers


FROM

(

SELECT
v.CreationDate AS VoteDate,

-- Total Reputation from Post Upvotes
-- PostTypeId 1 = Question, 2 = Answer
-- VoteTypeId 2 = Upvote, 3 = Downvote
-- CommunityOwnedDate is when a post was made CW.
-- Votes before that count, after not.
-- Vote Date is truncated to full days only so grouping works
SUM((CASE
WHEN (p.PostTypeId = 1 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 5
WHEN (p.PostTypeId = 2 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 10
WHEN (v.VoteTypeId = 3 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN -2
ELSE 0
END)) AS ReputationFromVotes,

-- Total Reputation from Answer Bounties
-- VoteTypeId 9 = Bounty Close (Bounty Awarded)
-- BountyAmount = Amount of Reputation awarded
SUM(CASE
WHEN v.VoteTypeId = 9 THEN v.BountyAmount
ELSE 0
END) AS ReputationFromBounties,

-- Total Reputation from Answer Accepts
-- VoteTypeId 1 = AcceptedByOriginator (Answer Accepted)
SUM(CASE
WHEN (v.VoteTypeId = 1 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 15
ELSE 0
END) AS ReputationFromAccepts,

-- Total Reputation from Suggested Edits
-- if ApprovalDate isn't NULL and RejectionDate is NULL it's been approved and not overriden
-- Group by the same Date as Votes for Rep-Cap evaluation (They count towards it)
COALESCE((SELECT
SUM(CASE WHEN (se.ApprovalDate IS NOT NULL AND se.RejectionDate IS NULL) THEN 2 ELSE 0 END)
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
AND YEAR(v.CreationDate) = YEAR(se.ApprovalDate)
AND MONTH(v.CreationDate) = MONTH(se.ApprovalDate)
AND DAY(v.CreationDate) = DAY(se.ApprovalDate) ),0) AS ReputationFromSuggestedEdits

FROM Posts AS p
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

) as r2d






share|improve this question



















  • For comparison, see @rolfl's Reputation Races query.
    – 200_success
    Aug 3 at 9:24












up vote
5
down vote

favorite









up vote
5
down vote

favorite











I saw this question on MSE and went ahead and wrote a solution to it:
https://meta.stackexchange.com/questions/313561/determining-users-reputation-as-of-particular-date



This calculates the reputation a user should have had at a certain date. I attempt to factor in everything that SEDE allows me to see. So -1 events from downvoting posts and serial voting reversal as well as the documentation reputation aren't factored in.



I'm curious about what could be improved here



SELECT
-- Total Reputation
(
SUM(CASE WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200 ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits END)
+ SUM(r2d.ReputationFromBounties)
+ COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0)
+ COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0)
+ SUM(r2d.ReputationFromAccepts)
) AS TotalReputation,
-- Rep Capped Activities with the Cap Factored in
SUM(
CASE
WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200
ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits
END) AS ReputationFromRepCap,
-- Total Bounties recieved
SUM(r2d.ReputationFromBounties) AS ReputationFromBounties,
-- Total Bounties given
COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0) AS ReputationGivenAsBounties,
-- Total Reputation from Accepting Answers
COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0) AS ReputationFromAcceptingAnswers,
-- Total Reputation from Accepted Answers
SUM(r2d.ReputationFromAccepts) AS ReputationFromAcceptedAnswers


FROM

(

SELECT
v.CreationDate AS VoteDate,

-- Total Reputation from Post Upvotes
-- PostTypeId 1 = Question, 2 = Answer
-- VoteTypeId 2 = Upvote, 3 = Downvote
-- CommunityOwnedDate is when a post was made CW.
-- Votes before that count, after not.
-- Vote Date is truncated to full days only so grouping works
SUM((CASE
WHEN (p.PostTypeId = 1 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 5
WHEN (p.PostTypeId = 2 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 10
WHEN (v.VoteTypeId = 3 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN -2
ELSE 0
END)) AS ReputationFromVotes,

-- Total Reputation from Answer Bounties
-- VoteTypeId 9 = Bounty Close (Bounty Awarded)
-- BountyAmount = Amount of Reputation awarded
SUM(CASE
WHEN v.VoteTypeId = 9 THEN v.BountyAmount
ELSE 0
END) AS ReputationFromBounties,

-- Total Reputation from Answer Accepts
-- VoteTypeId 1 = AcceptedByOriginator (Answer Accepted)
SUM(CASE
WHEN (v.VoteTypeId = 1 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 15
ELSE 0
END) AS ReputationFromAccepts,

-- Total Reputation from Suggested Edits
-- if ApprovalDate isn't NULL and RejectionDate is NULL it's been approved and not overriden
-- Group by the same Date as Votes for Rep-Cap evaluation (They count towards it)
COALESCE((SELECT
SUM(CASE WHEN (se.ApprovalDate IS NOT NULL AND se.RejectionDate IS NULL) THEN 2 ELSE 0 END)
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
AND YEAR(v.CreationDate) = YEAR(se.ApprovalDate)
AND MONTH(v.CreationDate) = MONTH(se.ApprovalDate)
AND DAY(v.CreationDate) = DAY(se.ApprovalDate) ),0) AS ReputationFromSuggestedEdits

FROM Posts AS p
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

) as r2d






share|improve this question











I saw this question on MSE and went ahead and wrote a solution to it:
https://meta.stackexchange.com/questions/313561/determining-users-reputation-as-of-particular-date



This calculates the reputation a user should have had at a certain date. I attempt to factor in everything that SEDE allows me to see. So -1 events from downvoting posts and serial voting reversal as well as the documentation reputation aren't factored in.



I'm curious about what could be improved here



SELECT
-- Total Reputation
(
SUM(CASE WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200 ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits END)
+ SUM(r2d.ReputationFromBounties)
+ COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0)
+ COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0)
+ SUM(r2d.ReputationFromAccepts)
) AS TotalReputation,
-- Rep Capped Activities with the Cap Factored in
SUM(
CASE
WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200
ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits
END) AS ReputationFromRepCap,
-- Total Bounties recieved
SUM(r2d.ReputationFromBounties) AS ReputationFromBounties,
-- Total Bounties given
COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0) AS ReputationGivenAsBounties,
-- Total Reputation from Accepting Answers
COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0) AS ReputationFromAcceptingAnswers,
-- Total Reputation from Accepted Answers
SUM(r2d.ReputationFromAccepts) AS ReputationFromAcceptedAnswers


FROM

(

SELECT
v.CreationDate AS VoteDate,

-- Total Reputation from Post Upvotes
-- PostTypeId 1 = Question, 2 = Answer
-- VoteTypeId 2 = Upvote, 3 = Downvote
-- CommunityOwnedDate is when a post was made CW.
-- Votes before that count, after not.
-- Vote Date is truncated to full days only so grouping works
SUM((CASE
WHEN (p.PostTypeId = 1 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 5
WHEN (p.PostTypeId = 2 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 10
WHEN (v.VoteTypeId = 3 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN -2
ELSE 0
END)) AS ReputationFromVotes,

-- Total Reputation from Answer Bounties
-- VoteTypeId 9 = Bounty Close (Bounty Awarded)
-- BountyAmount = Amount of Reputation awarded
SUM(CASE
WHEN v.VoteTypeId = 9 THEN v.BountyAmount
ELSE 0
END) AS ReputationFromBounties,

-- Total Reputation from Answer Accepts
-- VoteTypeId 1 = AcceptedByOriginator (Answer Accepted)
SUM(CASE
WHEN (v.VoteTypeId = 1 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 15
ELSE 0
END) AS ReputationFromAccepts,

-- Total Reputation from Suggested Edits
-- if ApprovalDate isn't NULL and RejectionDate is NULL it's been approved and not overriden
-- Group by the same Date as Votes for Rep-Cap evaluation (They count towards it)
COALESCE((SELECT
SUM(CASE WHEN (se.ApprovalDate IS NOT NULL AND se.RejectionDate IS NULL) THEN 2 ELSE 0 END)
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
AND YEAR(v.CreationDate) = YEAR(se.ApprovalDate)
AND MONTH(v.CreationDate) = MONTH(se.ApprovalDate)
AND DAY(v.CreationDate) = DAY(se.ApprovalDate) ),0) AS ReputationFromSuggestedEdits

FROM Posts AS p
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

) as r2d








share|improve this question










share|improve this question




share|improve this question









asked Aug 3 at 8:59









Magisch

209210




209210











  • For comparison, see @rolfl's Reputation Races query.
    – 200_success
    Aug 3 at 9:24
















  • For comparison, see @rolfl's Reputation Races query.
    – 200_success
    Aug 3 at 9:24















For comparison, see @rolfl's Reputation Races query.
– 200_success
Aug 3 at 9:24




For comparison, see @rolfl's Reputation Races query.
– 200_success
Aug 3 at 9:24










2 Answers
2






active

oldest

votes

















up vote
5
down vote













The main improvements I would suggest are:



  1. Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

  2. Break the query into multiple steps.

  3. Reduce or eliminate code repetition.

Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.



Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.



Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.



I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:



Initialization





DECLARE
@UserId integer = 22656, -- Jon Skeet, why not
@UntilDate datetime = CURRENT_TIMESTAMP;

DROP TABLE IF EXISTS
#Posts, #Data, #Edits;


Posts data



Just the columns we are going to need from the Posts table:



CREATE TABLE #Posts
(
Id integer NOT NULL,
AcceptedAnswerId integer NULL,
CommunityOwnedDate datetime NULL,
PostTypeId tinyint NOT NULL
);


Load the minimal number of rows needed:



INSERT #Posts
(
Id,
AcceptedAnswerId,
CommunityOwnedDate,
PostTypeId
)
SELECT
P.Id,
P.AcceptedAnswerId,
P.CommunityOwnedDate,
P.PostTypeId
FROM dbo.Posts AS P
WHERE
P.OwnerUserId = @UserId
AND P.CreationDate <= @UntilDate
AND
(
P.CommunityOwnedDate IS NULL
OR P.CommunityOwnedDate <= @UntilDate
);


This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.



The execution plan (for Jon Skeet) is:



load posts



This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.



The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.



Adding votes data



The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.



CREATE TABLE #Data
(
AcceptedAnswerId integer NULL,
CommunityOwnedDate datetime NULL,
PostTypeId tinyint NOT NULL,
BountyAmount integer NULL,
VoteTypeId tinyint NOT NULL,
CreationDate datetime NOT NULL
)
WITH (DATA_COMPRESSION = ROW);


Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.



INSERT #Data WITH (TABLOCK)
(
AcceptedAnswerId,
CommunityOwnedDate,
PostTypeId,
BountyAmount,
VoteTypeId,
CreationDate
)
SELECT
P.AcceptedAnswerId,
P.CommunityOwnedDate,
P.PostTypeId,
V.BountyAmount,
V.VoteTypeId,
V.CreationDate
FROM #Posts AS P
JOIN dbo.Votes AS V
ON V.PostId = P.Id
WHERE
V.CreationDate <= @UntilDate
AND
(
P.CommunityOwnedDate IS NULL
OR P.CommunityOwnedDate > V.CreationDate
);


The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).



load post and vote data



The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).



Suggested Edit Data



We only need the date (no time component) and total reputation earned for suggested edits:



CREATE TABLE #Edits
(
ApprovalDate datetime PRIMARY KEY,
Reputation integer NOT NULL
);




INSERT #Edits
(
ApprovalDate,
Reputation
)
SELECT
G.ApprovalDate,
EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
FROM dbo.SuggestedEdits AS SE
CROSS APPLY
(
VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
) AS G (ApprovalDate)
WHERE
SE.OwnerUserId = @UserId
AND SE.ApprovalDate <= @UntilDate
AND SE.RejectionDate IS NULL
GROUP BY
G.ApprovalDate;


Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.



It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.



A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.



load suggested edit data



The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.



Reputation Per Day



This table largely replaces the original r2d subquery, computing reputation totals per day:



CREATE TABLE #DayTotals
(
CreationDate datetime PRIMARY KEY,
ReputationFromVotes integer NOT NULL,
ReputationFromBounties integer NOT NULL,
ReputationFromAccepts integer NOT NULL,
ReputationFromSuggestedEdits integer NOT NULL
);




INSERT #DayTotals WITH (TABLOCK)
(
CreationDate,
ReputationFromVotes,
ReputationFromBounties,
ReputationFromAccepts,
ReputationFromSuggestedEdits
)
SELECT
D.CreationDate,
ReputationFromVotes =
ISNULL
(
SUM
(
CASE
WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
WHEN D.VoteTypeId = 3 THEN -2
ELSE 0
END
), 0
),
ReputationFromBounties =
ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
ReputationFromAccepts =
ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
FROM #Data AS D
LEFT JOIN #Edits AS E
ON E.ApprovalDate = D.CreationDate
GROUP BY
D.CreationDate;


The execution plan shows efficient parallel processing using only the relatively small temporary tables:



day totals



Other reputation totals



These are single values, stored in variables for convenience:



DECLARE 
@RepBountiesGiven integer =
(
SELECT
ISNULL(SUM(-D.BountyAmount), 0)
FROM #Data AS D
WHERE
D.VoteTypeId = 8 -- BountyStart
),
@RepFromAcceptingAnswers integer =
(
SELECT
2 * COUNT(*) -- 2 rep per answer accepted
FROM #Posts AS P
WHERE
P.PostTypeId = 1
AND P.AcceptedAnswerId IS NOT NULL
),
@ReputationFromRepCap integer =
(
SELECT
SUM
(
CASE
WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
THEN ReputationFromVotes + ReputationFromSuggestedEdits
ELSE 200
END
)
FROM #DayTotals
);


variable totals



Final query



Now we have all the elements we need to produce the final output:



SELECT
TotalReputation =
@ReputationFromRepCap +
SUM(DT.ReputationFromBounties) +
@RepBountiesGiven +
@RepFromAcceptingAnswers +
SUM(DT.ReputationFromAccepts),
ReputationFromRepCap = @ReputationFromRepCap,
ReputationFromBounties = SUM(DT.ReputationFromBounties),
ReputationGivenAsBounties = @RepBountiesGiven,
ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
FROM #DayTotals AS DT;


final query



Try the complete script at: SEDE Demo






share|improve this answer



















  • 2




    This is at least 5x faster then mine for the test cases I tried. Impressive!
    – Magisch
    10 hours ago

















up vote
4
down vote













This is a partial answer as these were the bits I could verify easily.



In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.



My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.



With above mentioned changes that part of your query would look like this:



COALESCE((SELECT count(*) * 2
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
and se.Approvaldate between v.creationdate
and dateadd(d,1,v.creationdate)
), 0) AS ReputationFromSuggestedEdits


If I compare the two Execution Plans I see I've eliminated an Filter step.



Your outer query is basically



SELECT 
-- aggregate columns for a votettypeid

FROM Posts AS p
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate


You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows



WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##
and p.posttypeid in (1,2)
and v.votetypeid in (1,2,3,9)


The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.






share|improve this answer





















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200880%2ffind-how-much-reputation-a-user-had-on-a-given-date%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    5
    down vote













    The main improvements I would suggest are:



    1. Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

    2. Break the query into multiple steps.

    3. Reduce or eliminate code repetition.

    Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.



    Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.



    Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.



    I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:



    Initialization





    DECLARE
    @UserId integer = 22656, -- Jon Skeet, why not
    @UntilDate datetime = CURRENT_TIMESTAMP;

    DROP TABLE IF EXISTS
    #Posts, #Data, #Edits;


    Posts data



    Just the columns we are going to need from the Posts table:



    CREATE TABLE #Posts
    (
    Id integer NOT NULL,
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL
    );


    Load the minimal number of rows needed:



    INSERT #Posts
    (
    Id,
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId
    )
    SELECT
    P.Id,
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId
    FROM dbo.Posts AS P
    WHERE
    P.OwnerUserId = @UserId
    AND P.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate <= @UntilDate
    );


    This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.



    The execution plan (for Jon Skeet) is:



    load posts



    This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.



    The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.



    Adding votes data



    The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.



    CREATE TABLE #Data
    (
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL,
    BountyAmount integer NULL,
    VoteTypeId tinyint NOT NULL,
    CreationDate datetime NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);


    Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.



    INSERT #Data WITH (TABLOCK)
    (
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId,
    BountyAmount,
    VoteTypeId,
    CreationDate
    )
    SELECT
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId,
    V.BountyAmount,
    V.VoteTypeId,
    V.CreationDate
    FROM #Posts AS P
    JOIN dbo.Votes AS V
    ON V.PostId = P.Id
    WHERE
    V.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate > V.CreationDate
    );


    The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).



    load post and vote data



    The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).



    Suggested Edit Data



    We only need the date (no time component) and total reputation earned for suggested edits:



    CREATE TABLE #Edits
    (
    ApprovalDate datetime PRIMARY KEY,
    Reputation integer NOT NULL
    );




    INSERT #Edits
    (
    ApprovalDate,
    Reputation
    )
    SELECT
    G.ApprovalDate,
    EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
    FROM dbo.SuggestedEdits AS SE
    CROSS APPLY
    (
    VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
    ) AS G (ApprovalDate)
    WHERE
    SE.OwnerUserId = @UserId
    AND SE.ApprovalDate <= @UntilDate
    AND SE.RejectionDate IS NULL
    GROUP BY
    G.ApprovalDate;


    Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.



    It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.



    A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.



    load suggested edit data



    The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.



    Reputation Per Day



    This table largely replaces the original r2d subquery, computing reputation totals per day:



    CREATE TABLE #DayTotals
    (
    CreationDate datetime PRIMARY KEY,
    ReputationFromVotes integer NOT NULL,
    ReputationFromBounties integer NOT NULL,
    ReputationFromAccepts integer NOT NULL,
    ReputationFromSuggestedEdits integer NOT NULL
    );




    INSERT #DayTotals WITH (TABLOCK)
    (
    CreationDate,
    ReputationFromVotes,
    ReputationFromBounties,
    ReputationFromAccepts,
    ReputationFromSuggestedEdits
    )
    SELECT
    D.CreationDate,
    ReputationFromVotes =
    ISNULL
    (
    SUM
    (
    CASE
    WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
    WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
    WHEN D.VoteTypeId = 3 THEN -2
    ELSE 0
    END
    ), 0
    ),
    ReputationFromBounties =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
    ReputationFromAccepts =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
    ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
    FROM #Data AS D
    LEFT JOIN #Edits AS E
    ON E.ApprovalDate = D.CreationDate
    GROUP BY
    D.CreationDate;


    The execution plan shows efficient parallel processing using only the relatively small temporary tables:



    day totals



    Other reputation totals



    These are single values, stored in variables for convenience:



    DECLARE 
    @RepBountiesGiven integer =
    (
    SELECT
    ISNULL(SUM(-D.BountyAmount), 0)
    FROM #Data AS D
    WHERE
    D.VoteTypeId = 8 -- BountyStart
    ),
    @RepFromAcceptingAnswers integer =
    (
    SELECT
    2 * COUNT(*) -- 2 rep per answer accepted
    FROM #Posts AS P
    WHERE
    P.PostTypeId = 1
    AND P.AcceptedAnswerId IS NOT NULL
    ),
    @ReputationFromRepCap integer =
    (
    SELECT
    SUM
    (
    CASE
    WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
    THEN ReputationFromVotes + ReputationFromSuggestedEdits
    ELSE 200
    END
    )
    FROM #DayTotals
    );


    variable totals



    Final query



    Now we have all the elements we need to produce the final output:



    SELECT
    TotalReputation =
    @ReputationFromRepCap +
    SUM(DT.ReputationFromBounties) +
    @RepBountiesGiven +
    @RepFromAcceptingAnswers +
    SUM(DT.ReputationFromAccepts),
    ReputationFromRepCap = @ReputationFromRepCap,
    ReputationFromBounties = SUM(DT.ReputationFromBounties),
    ReputationGivenAsBounties = @RepBountiesGiven,
    ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
    ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
    FROM #DayTotals AS DT;


    final query



    Try the complete script at: SEDE Demo






    share|improve this answer



















    • 2




      This is at least 5x faster then mine for the test cases I tried. Impressive!
      – Magisch
      10 hours ago














    up vote
    5
    down vote













    The main improvements I would suggest are:



    1. Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

    2. Break the query into multiple steps.

    3. Reduce or eliminate code repetition.

    Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.



    Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.



    Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.



    I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:



    Initialization





    DECLARE
    @UserId integer = 22656, -- Jon Skeet, why not
    @UntilDate datetime = CURRENT_TIMESTAMP;

    DROP TABLE IF EXISTS
    #Posts, #Data, #Edits;


    Posts data



    Just the columns we are going to need from the Posts table:



    CREATE TABLE #Posts
    (
    Id integer NOT NULL,
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL
    );


    Load the minimal number of rows needed:



    INSERT #Posts
    (
    Id,
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId
    )
    SELECT
    P.Id,
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId
    FROM dbo.Posts AS P
    WHERE
    P.OwnerUserId = @UserId
    AND P.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate <= @UntilDate
    );


    This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.



    The execution plan (for Jon Skeet) is:



    load posts



    This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.



    The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.



    Adding votes data



    The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.



    CREATE TABLE #Data
    (
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL,
    BountyAmount integer NULL,
    VoteTypeId tinyint NOT NULL,
    CreationDate datetime NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);


    Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.



    INSERT #Data WITH (TABLOCK)
    (
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId,
    BountyAmount,
    VoteTypeId,
    CreationDate
    )
    SELECT
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId,
    V.BountyAmount,
    V.VoteTypeId,
    V.CreationDate
    FROM #Posts AS P
    JOIN dbo.Votes AS V
    ON V.PostId = P.Id
    WHERE
    V.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate > V.CreationDate
    );


    The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).



    load post and vote data



    The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).



    Suggested Edit Data



    We only need the date (no time component) and total reputation earned for suggested edits:



    CREATE TABLE #Edits
    (
    ApprovalDate datetime PRIMARY KEY,
    Reputation integer NOT NULL
    );




    INSERT #Edits
    (
    ApprovalDate,
    Reputation
    )
    SELECT
    G.ApprovalDate,
    EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
    FROM dbo.SuggestedEdits AS SE
    CROSS APPLY
    (
    VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
    ) AS G (ApprovalDate)
    WHERE
    SE.OwnerUserId = @UserId
    AND SE.ApprovalDate <= @UntilDate
    AND SE.RejectionDate IS NULL
    GROUP BY
    G.ApprovalDate;


    Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.



    It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.



    A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.



    load suggested edit data



    The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.



    Reputation Per Day



    This table largely replaces the original r2d subquery, computing reputation totals per day:



    CREATE TABLE #DayTotals
    (
    CreationDate datetime PRIMARY KEY,
    ReputationFromVotes integer NOT NULL,
    ReputationFromBounties integer NOT NULL,
    ReputationFromAccepts integer NOT NULL,
    ReputationFromSuggestedEdits integer NOT NULL
    );




    INSERT #DayTotals WITH (TABLOCK)
    (
    CreationDate,
    ReputationFromVotes,
    ReputationFromBounties,
    ReputationFromAccepts,
    ReputationFromSuggestedEdits
    )
    SELECT
    D.CreationDate,
    ReputationFromVotes =
    ISNULL
    (
    SUM
    (
    CASE
    WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
    WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
    WHEN D.VoteTypeId = 3 THEN -2
    ELSE 0
    END
    ), 0
    ),
    ReputationFromBounties =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
    ReputationFromAccepts =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
    ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
    FROM #Data AS D
    LEFT JOIN #Edits AS E
    ON E.ApprovalDate = D.CreationDate
    GROUP BY
    D.CreationDate;


    The execution plan shows efficient parallel processing using only the relatively small temporary tables:



    day totals



    Other reputation totals



    These are single values, stored in variables for convenience:



    DECLARE 
    @RepBountiesGiven integer =
    (
    SELECT
    ISNULL(SUM(-D.BountyAmount), 0)
    FROM #Data AS D
    WHERE
    D.VoteTypeId = 8 -- BountyStart
    ),
    @RepFromAcceptingAnswers integer =
    (
    SELECT
    2 * COUNT(*) -- 2 rep per answer accepted
    FROM #Posts AS P
    WHERE
    P.PostTypeId = 1
    AND P.AcceptedAnswerId IS NOT NULL
    ),
    @ReputationFromRepCap integer =
    (
    SELECT
    SUM
    (
    CASE
    WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
    THEN ReputationFromVotes + ReputationFromSuggestedEdits
    ELSE 200
    END
    )
    FROM #DayTotals
    );


    variable totals



    Final query



    Now we have all the elements we need to produce the final output:



    SELECT
    TotalReputation =
    @ReputationFromRepCap +
    SUM(DT.ReputationFromBounties) +
    @RepBountiesGiven +
    @RepFromAcceptingAnswers +
    SUM(DT.ReputationFromAccepts),
    ReputationFromRepCap = @ReputationFromRepCap,
    ReputationFromBounties = SUM(DT.ReputationFromBounties),
    ReputationGivenAsBounties = @RepBountiesGiven,
    ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
    ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
    FROM #DayTotals AS DT;


    final query



    Try the complete script at: SEDE Demo






    share|improve this answer



















    • 2




      This is at least 5x faster then mine for the test cases I tried. Impressive!
      – Magisch
      10 hours ago












    up vote
    5
    down vote










    up vote
    5
    down vote









    The main improvements I would suggest are:



    1. Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

    2. Break the query into multiple steps.

    3. Reduce or eliminate code repetition.

    Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.



    Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.



    Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.



    I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:



    Initialization





    DECLARE
    @UserId integer = 22656, -- Jon Skeet, why not
    @UntilDate datetime = CURRENT_TIMESTAMP;

    DROP TABLE IF EXISTS
    #Posts, #Data, #Edits;


    Posts data



    Just the columns we are going to need from the Posts table:



    CREATE TABLE #Posts
    (
    Id integer NOT NULL,
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL
    );


    Load the minimal number of rows needed:



    INSERT #Posts
    (
    Id,
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId
    )
    SELECT
    P.Id,
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId
    FROM dbo.Posts AS P
    WHERE
    P.OwnerUserId = @UserId
    AND P.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate <= @UntilDate
    );


    This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.



    The execution plan (for Jon Skeet) is:



    load posts



    This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.



    The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.



    Adding votes data



    The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.



    CREATE TABLE #Data
    (
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL,
    BountyAmount integer NULL,
    VoteTypeId tinyint NOT NULL,
    CreationDate datetime NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);


    Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.



    INSERT #Data WITH (TABLOCK)
    (
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId,
    BountyAmount,
    VoteTypeId,
    CreationDate
    )
    SELECT
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId,
    V.BountyAmount,
    V.VoteTypeId,
    V.CreationDate
    FROM #Posts AS P
    JOIN dbo.Votes AS V
    ON V.PostId = P.Id
    WHERE
    V.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate > V.CreationDate
    );


    The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).



    load post and vote data



    The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).



    Suggested Edit Data



    We only need the date (no time component) and total reputation earned for suggested edits:



    CREATE TABLE #Edits
    (
    ApprovalDate datetime PRIMARY KEY,
    Reputation integer NOT NULL
    );




    INSERT #Edits
    (
    ApprovalDate,
    Reputation
    )
    SELECT
    G.ApprovalDate,
    EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
    FROM dbo.SuggestedEdits AS SE
    CROSS APPLY
    (
    VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
    ) AS G (ApprovalDate)
    WHERE
    SE.OwnerUserId = @UserId
    AND SE.ApprovalDate <= @UntilDate
    AND SE.RejectionDate IS NULL
    GROUP BY
    G.ApprovalDate;


    Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.



    It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.



    A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.



    load suggested edit data



    The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.



    Reputation Per Day



    This table largely replaces the original r2d subquery, computing reputation totals per day:



    CREATE TABLE #DayTotals
    (
    CreationDate datetime PRIMARY KEY,
    ReputationFromVotes integer NOT NULL,
    ReputationFromBounties integer NOT NULL,
    ReputationFromAccepts integer NOT NULL,
    ReputationFromSuggestedEdits integer NOT NULL
    );




    INSERT #DayTotals WITH (TABLOCK)
    (
    CreationDate,
    ReputationFromVotes,
    ReputationFromBounties,
    ReputationFromAccepts,
    ReputationFromSuggestedEdits
    )
    SELECT
    D.CreationDate,
    ReputationFromVotes =
    ISNULL
    (
    SUM
    (
    CASE
    WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
    WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
    WHEN D.VoteTypeId = 3 THEN -2
    ELSE 0
    END
    ), 0
    ),
    ReputationFromBounties =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
    ReputationFromAccepts =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
    ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
    FROM #Data AS D
    LEFT JOIN #Edits AS E
    ON E.ApprovalDate = D.CreationDate
    GROUP BY
    D.CreationDate;


    The execution plan shows efficient parallel processing using only the relatively small temporary tables:



    day totals



    Other reputation totals



    These are single values, stored in variables for convenience:



    DECLARE 
    @RepBountiesGiven integer =
    (
    SELECT
    ISNULL(SUM(-D.BountyAmount), 0)
    FROM #Data AS D
    WHERE
    D.VoteTypeId = 8 -- BountyStart
    ),
    @RepFromAcceptingAnswers integer =
    (
    SELECT
    2 * COUNT(*) -- 2 rep per answer accepted
    FROM #Posts AS P
    WHERE
    P.PostTypeId = 1
    AND P.AcceptedAnswerId IS NOT NULL
    ),
    @ReputationFromRepCap integer =
    (
    SELECT
    SUM
    (
    CASE
    WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
    THEN ReputationFromVotes + ReputationFromSuggestedEdits
    ELSE 200
    END
    )
    FROM #DayTotals
    );


    variable totals



    Final query



    Now we have all the elements we need to produce the final output:



    SELECT
    TotalReputation =
    @ReputationFromRepCap +
    SUM(DT.ReputationFromBounties) +
    @RepBountiesGiven +
    @RepFromAcceptingAnswers +
    SUM(DT.ReputationFromAccepts),
    ReputationFromRepCap = @ReputationFromRepCap,
    ReputationFromBounties = SUM(DT.ReputationFromBounties),
    ReputationGivenAsBounties = @RepBountiesGiven,
    ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
    ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
    FROM #DayTotals AS DT;


    final query



    Try the complete script at: SEDE Demo






    share|improve this answer















    The main improvements I would suggest are:



    1. Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

    2. Break the query into multiple steps.

    3. Reduce or eliminate code repetition.

    Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.



    Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.



    Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.



    I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:



    Initialization





    DECLARE
    @UserId integer = 22656, -- Jon Skeet, why not
    @UntilDate datetime = CURRENT_TIMESTAMP;

    DROP TABLE IF EXISTS
    #Posts, #Data, #Edits;


    Posts data



    Just the columns we are going to need from the Posts table:



    CREATE TABLE #Posts
    (
    Id integer NOT NULL,
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL
    );


    Load the minimal number of rows needed:



    INSERT #Posts
    (
    Id,
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId
    )
    SELECT
    P.Id,
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId
    FROM dbo.Posts AS P
    WHERE
    P.OwnerUserId = @UserId
    AND P.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate <= @UntilDate
    );


    This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.



    The execution plan (for Jon Skeet) is:



    load posts



    This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.



    The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.



    Adding votes data



    The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.



    CREATE TABLE #Data
    (
    AcceptedAnswerId integer NULL,
    CommunityOwnedDate datetime NULL,
    PostTypeId tinyint NOT NULL,
    BountyAmount integer NULL,
    VoteTypeId tinyint NOT NULL,
    CreationDate datetime NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);


    Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.



    INSERT #Data WITH (TABLOCK)
    (
    AcceptedAnswerId,
    CommunityOwnedDate,
    PostTypeId,
    BountyAmount,
    VoteTypeId,
    CreationDate
    )
    SELECT
    P.AcceptedAnswerId,
    P.CommunityOwnedDate,
    P.PostTypeId,
    V.BountyAmount,
    V.VoteTypeId,
    V.CreationDate
    FROM #Posts AS P
    JOIN dbo.Votes AS V
    ON V.PostId = P.Id
    WHERE
    V.CreationDate <= @UntilDate
    AND
    (
    P.CommunityOwnedDate IS NULL
    OR P.CommunityOwnedDate > V.CreationDate
    );


    The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).



    load post and vote data



    The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).



    Suggested Edit Data



    We only need the date (no time component) and total reputation earned for suggested edits:



    CREATE TABLE #Edits
    (
    ApprovalDate datetime PRIMARY KEY,
    Reputation integer NOT NULL
    );




    INSERT #Edits
    (
    ApprovalDate,
    Reputation
    )
    SELECT
    G.ApprovalDate,
    EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
    FROM dbo.SuggestedEdits AS SE
    CROSS APPLY
    (
    VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
    ) AS G (ApprovalDate)
    WHERE
    SE.OwnerUserId = @UserId
    AND SE.ApprovalDate <= @UntilDate
    AND SE.RejectionDate IS NULL
    GROUP BY
    G.ApprovalDate;


    Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.



    It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.



    A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.



    load suggested edit data



    The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.



    Reputation Per Day



    This table largely replaces the original r2d subquery, computing reputation totals per day:



    CREATE TABLE #DayTotals
    (
    CreationDate datetime PRIMARY KEY,
    ReputationFromVotes integer NOT NULL,
    ReputationFromBounties integer NOT NULL,
    ReputationFromAccepts integer NOT NULL,
    ReputationFromSuggestedEdits integer NOT NULL
    );




    INSERT #DayTotals WITH (TABLOCK)
    (
    CreationDate,
    ReputationFromVotes,
    ReputationFromBounties,
    ReputationFromAccepts,
    ReputationFromSuggestedEdits
    )
    SELECT
    D.CreationDate,
    ReputationFromVotes =
    ISNULL
    (
    SUM
    (
    CASE
    WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
    WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
    WHEN D.VoteTypeId = 3 THEN -2
    ELSE 0
    END
    ), 0
    ),
    ReputationFromBounties =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
    ReputationFromAccepts =
    ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
    ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
    FROM #Data AS D
    LEFT JOIN #Edits AS E
    ON E.ApprovalDate = D.CreationDate
    GROUP BY
    D.CreationDate;


    The execution plan shows efficient parallel processing using only the relatively small temporary tables:



    day totals



    Other reputation totals



    These are single values, stored in variables for convenience:



    DECLARE 
    @RepBountiesGiven integer =
    (
    SELECT
    ISNULL(SUM(-D.BountyAmount), 0)
    FROM #Data AS D
    WHERE
    D.VoteTypeId = 8 -- BountyStart
    ),
    @RepFromAcceptingAnswers integer =
    (
    SELECT
    2 * COUNT(*) -- 2 rep per answer accepted
    FROM #Posts AS P
    WHERE
    P.PostTypeId = 1
    AND P.AcceptedAnswerId IS NOT NULL
    ),
    @ReputationFromRepCap integer =
    (
    SELECT
    SUM
    (
    CASE
    WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
    THEN ReputationFromVotes + ReputationFromSuggestedEdits
    ELSE 200
    END
    )
    FROM #DayTotals
    );


    variable totals



    Final query



    Now we have all the elements we need to produce the final output:



    SELECT
    TotalReputation =
    @ReputationFromRepCap +
    SUM(DT.ReputationFromBounties) +
    @RepBountiesGiven +
    @RepFromAcceptingAnswers +
    SUM(DT.ReputationFromAccepts),
    ReputationFromRepCap = @ReputationFromRepCap,
    ReputationFromBounties = SUM(DT.ReputationFromBounties),
    ReputationGivenAsBounties = @RepBountiesGiven,
    ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
    ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
    FROM #DayTotals AS DT;


    final query



    Try the complete script at: SEDE Demo







    share|improve this answer















    share|improve this answer



    share|improve this answer








    edited 6 hours ago


























    answered 22 hours ago









    Paul White

    1515




    1515







    • 2




      This is at least 5x faster then mine for the test cases I tried. Impressive!
      – Magisch
      10 hours ago












    • 2




      This is at least 5x faster then mine for the test cases I tried. Impressive!
      – Magisch
      10 hours ago







    2




    2




    This is at least 5x faster then mine for the test cases I tried. Impressive!
    – Magisch
    10 hours ago




    This is at least 5x faster then mine for the test cases I tried. Impressive!
    – Magisch
    10 hours ago












    up vote
    4
    down vote













    This is a partial answer as these were the bits I could verify easily.



    In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.



    My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.



    With above mentioned changes that part of your query would look like this:



    COALESCE((SELECT count(*) * 2
    FROM SuggestedEdits AS se
    WHERE se.OwnerUserId = ##UserId##
    and se.Approvaldate between v.creationdate
    and dateadd(d,1,v.creationdate)
    ), 0) AS ReputationFromSuggestedEdits


    If I compare the two Execution Plans I see I've eliminated an Filter step.



    Your outer query is basically



    SELECT 
    -- aggregate columns for a votettypeid

    FROM Posts AS p
    INNER JOIN Votes AS v ON v.PostId = p.Id

    WHERE p.OwnerUserId = ##UserId:int##
    AND v.CreationDate <= ##UntilDate:string##

    GROUP BY v.CreationDate


    You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows



    WHERE p.OwnerUserId = ##UserId:int##
    AND v.CreationDate <= ##UntilDate:string##
    and p.posttypeid in (1,2)
    and v.votetypeid in (1,2,3,9)


    The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.






    share|improve this answer

























      up vote
      4
      down vote













      This is a partial answer as these were the bits I could verify easily.



      In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.



      My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.



      With above mentioned changes that part of your query would look like this:



      COALESCE((SELECT count(*) * 2
      FROM SuggestedEdits AS se
      WHERE se.OwnerUserId = ##UserId##
      and se.Approvaldate between v.creationdate
      and dateadd(d,1,v.creationdate)
      ), 0) AS ReputationFromSuggestedEdits


      If I compare the two Execution Plans I see I've eliminated an Filter step.



      Your outer query is basically



      SELECT 
      -- aggregate columns for a votettypeid

      FROM Posts AS p
      INNER JOIN Votes AS v ON v.PostId = p.Id

      WHERE p.OwnerUserId = ##UserId:int##
      AND v.CreationDate <= ##UntilDate:string##

      GROUP BY v.CreationDate


      You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows



      WHERE p.OwnerUserId = ##UserId:int##
      AND v.CreationDate <= ##UntilDate:string##
      and p.posttypeid in (1,2)
      and v.votetypeid in (1,2,3,9)


      The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.






      share|improve this answer























        up vote
        4
        down vote










        up vote
        4
        down vote









        This is a partial answer as these were the bits I could verify easily.



        In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.



        My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.



        With above mentioned changes that part of your query would look like this:



        COALESCE((SELECT count(*) * 2
        FROM SuggestedEdits AS se
        WHERE se.OwnerUserId = ##UserId##
        and se.Approvaldate between v.creationdate
        and dateadd(d,1,v.creationdate)
        ), 0) AS ReputationFromSuggestedEdits


        If I compare the two Execution Plans I see I've eliminated an Filter step.



        Your outer query is basically



        SELECT 
        -- aggregate columns for a votettypeid

        FROM Posts AS p
        INNER JOIN Votes AS v ON v.PostId = p.Id

        WHERE p.OwnerUserId = ##UserId:int##
        AND v.CreationDate <= ##UntilDate:string##

        GROUP BY v.CreationDate


        You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows



        WHERE p.OwnerUserId = ##UserId:int##
        AND v.CreationDate <= ##UntilDate:string##
        and p.posttypeid in (1,2)
        and v.votetypeid in (1,2,3,9)


        The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.






        share|improve this answer













        This is a partial answer as these were the bits I could verify easily.



        In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.



        My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.



        With above mentioned changes that part of your query would look like this:



        COALESCE((SELECT count(*) * 2
        FROM SuggestedEdits AS se
        WHERE se.OwnerUserId = ##UserId##
        and se.Approvaldate between v.creationdate
        and dateadd(d,1,v.creationdate)
        ), 0) AS ReputationFromSuggestedEdits


        If I compare the two Execution Plans I see I've eliminated an Filter step.



        Your outer query is basically



        SELECT 
        -- aggregate columns for a votettypeid

        FROM Posts AS p
        INNER JOIN Votes AS v ON v.PostId = p.Id

        WHERE p.OwnerUserId = ##UserId:int##
        AND v.CreationDate <= ##UntilDate:string##

        GROUP BY v.CreationDate


        You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows



        WHERE p.OwnerUserId = ##UserId:int##
        AND v.CreationDate <= ##UntilDate:string##
        and p.posttypeid in (1,2)
        and v.votetypeid in (1,2,3,9)


        The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.







        share|improve this answer













        share|improve this answer



        share|improve this answer











        answered 2 days ago









        rene

        1415




        1415






















             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200880%2ffind-how-much-reputation-a-user-had-on-a-given-date%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            Python Lists

            Aion

            JavaScript Array Iteration Methods