trjhtr

Question

I saw this question on MSE and went ahead and wrote a solution to it:
https://meta.stackexchange.com/questions/313561/determining-users-reputation-as-of-particular-date

This calculates the reputation a user should have had at a certain date. I attempt to factor in everything that SEDE allows me to see. So -1 events from downvoting posts and serial voting reversal as well as the documentation reputation aren't factored in.

I'm curious about what could be improved here

SELECT
-- Total Reputation
(
SUM(CASE WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200 ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits END) 
+ SUM(r2d.ReputationFromBounties) 
+ COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0)
+ COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0)
+ SUM(r2d.ReputationFromAccepts)
) AS TotalReputation,
-- Rep Capped Activities with the Cap Factored in
SUM(
CASE 
WHEN r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits > 200 THEN 200
ELSE r2d.ReputationFromVotes + r2d.ReputationFromSuggestedEdits 
END) AS ReputationFromRepCap,
-- Total Bounties recieved
SUM(r2d.ReputationFromBounties) AS ReputationFromBounties,
-- Total Bounties given
COALESCE((SELECT SUM(v4.BountyAmount * -1) FROM Votes AS v4 WHERE v4.VoteTypeId = 8 AND v4.UserId = ##UserId## AND v4.CreationDate < ##UntilDate:string## ),0) AS ReputationGivenAsBounties,
-- Total Reputation from Accepting Answers
COALESCE((SELECT COUNT(*) * 2 FROM Posts AS p3 WHERE p3.OwnerUserId = ##UserId## AND p3.AcceptedAnswerId IS NOT NULL),0) AS ReputationFromAcceptingAnswers,
-- Total Reputation from Accepted Answers
SUM(r2d.ReputationFromAccepts) AS ReputationFromAcceptedAnswers


FROM

(

SELECT 
v.CreationDate AS VoteDate,

-- Total Reputation from Post Upvotes
-- PostTypeId 1 = Question, 2 = Answer
-- VoteTypeId 2 = Upvote, 3 = Downvote
-- CommunityOwnedDate is when a post was made CW. 
-- Votes before that count, after not.
-- Vote Date is truncated to full days only so grouping works
SUM((CASE 
 WHEN (p.PostTypeId = 1 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 5 
 WHEN (p.PostTypeId = 2 AND v.VoteTypeId = 2 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 10 
 WHEN (v.VoteTypeId = 3 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN -2
 ELSE 0 
END)) AS ReputationFromVotes,

-- Total Reputation from Answer Bounties
-- VoteTypeId 9 = Bounty Close (Bounty Awarded)
-- BountyAmount = Amount of Reputation awarded
SUM(CASE 
 WHEN v.VoteTypeId = 9 THEN v.BountyAmount
 ELSE 0
END) AS ReputationFromBounties,

-- Total Reputation from Answer Accepts
-- VoteTypeId 1 = AcceptedByOriginator (Answer Accepted)
SUM(CASE
 WHEN (v.VoteTypeId = 1 AND (p.CommunityOwnedDate > v.CreationDate OR p.CommunityOwnedDate IS NULL)) THEN 15
 ELSE 0 
END) AS ReputationFromAccepts,

-- Total Reputation from Suggested Edits
-- if ApprovalDate isn't NULL and RejectionDate is NULL it's been approved and not overriden
-- Group by the same Date as Votes for Rep-Cap evaluation (They count towards it)
COALESCE((SELECT
SUM(CASE WHEN (se.ApprovalDate IS NOT NULL AND se.RejectionDate IS NULL) THEN 2 ELSE 0 END)
FROM SuggestedEdits AS se
WHERE se.OwnerUserId = ##UserId##
AND YEAR(v.CreationDate) = YEAR(se.ApprovalDate)
AND MONTH(v.CreationDate) = MONTH(se.ApprovalDate)
AND DAY(v.CreationDate) = DAY(se.ApprovalDate) ),0) AS ReputationFromSuggestedEdits

FROM Posts AS p 
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

) as r2d

For comparison, see @rolfl's Reputation Races query.
â€“Â 200_success
Aug 3 at 9:24 — Aug 3 at 9:24

score 5 · Answer 1 · 2018-08-06 11:33:08Z

The main improvements I would suggest are:

Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

Break the query into multiple steps.

Reduce or eliminate code repetition.

Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.

Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.

Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.

I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:

Initialization

DECLARE
 @UserId integer = 22656, -- Jon Skeet, why not
 @UntilDate datetime = CURRENT_TIMESTAMP;

DROP TABLE IF EXISTS 
 #Posts, #Data, #Edits;

Posts data

Just the columns we are going to need from the Posts table:

CREATE TABLE #Posts
(
 Id integer NOT NULL,
 AcceptedAnswerId integer NULL,
 CommunityOwnedDate datetime NULL,
 PostTypeId tinyint NOT NULL
);

Load the minimal number of rows needed:

INSERT #Posts
(
 Id,
 AcceptedAnswerId,
 CommunityOwnedDate,
 PostTypeId
)
SELECT
 P.Id,
 P.AcceptedAnswerId,
 P.CommunityOwnedDate,
 P.PostTypeId
FROM dbo.Posts AS P
WHERE 
 P.OwnerUserId = @UserId
 AND P.CreationDate <= @UntilDate
 AND 
 (
 P.CommunityOwnedDate IS NULL
 OR P.CommunityOwnedDate <= @UntilDate
 );

This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.

The execution plan (for Jon Skeet) is:

load posts

This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.

The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.

Adding votes data

The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.

CREATE TABLE #Data
(
 AcceptedAnswerId integer NULL,
 CommunityOwnedDate datetime NULL,
 PostTypeId tinyint NOT NULL,
 BountyAmount integer NULL,
 VoteTypeId tinyint NOT NULL,
 CreationDate datetime NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.

INSERT #Data WITH (TABLOCK)
(
 AcceptedAnswerId,
 CommunityOwnedDate,
 PostTypeId,
 BountyAmount,
 VoteTypeId,
 CreationDate
)
SELECT
 P.AcceptedAnswerId,
 P.CommunityOwnedDate,
 P.PostTypeId,
 V.BountyAmount,
 V.VoteTypeId,
 V.CreationDate
FROM #Posts AS P
JOIN dbo.Votes AS V
 ON V.PostId = P.Id
WHERE 
 V.CreationDate <= @UntilDate
 AND 
 (
 P.CommunityOwnedDate IS NULL
 OR P.CommunityOwnedDate > V.CreationDate
 );

The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).

load post and vote data

The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).

Suggested Edit Data

We only need the date (no time component) and total reputation earned for suggested edits:

CREATE TABLE #Edits
(
 ApprovalDate datetime PRIMARY KEY,
 Reputation integer NOT NULL
);

INSERT #Edits
(
 ApprovalDate,
 Reputation
)
SELECT
 G.ApprovalDate,
 EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
FROM dbo.SuggestedEdits AS SE
CROSS APPLY
(
 VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
) AS G (ApprovalDate)
WHERE
 SE.OwnerUserId = @UserId
 AND SE.ApprovalDate <= @UntilDate
 AND SE.RejectionDate IS NULL
GROUP BY
 G.ApprovalDate;

Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.

It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.

A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.

load suggested edit data

The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.

Reputation Per Day

This table largely replaces the original r2d subquery, computing reputation totals per day:

CREATE TABLE #DayTotals
(
 CreationDate datetime PRIMARY KEY,
 ReputationFromVotes integer NOT NULL,
 ReputationFromBounties integer NOT NULL,
 ReputationFromAccepts integer NOT NULL,
 ReputationFromSuggestedEdits integer NOT NULL
);

INSERT #DayTotals WITH (TABLOCK)
(
 CreationDate,
 ReputationFromVotes,
 ReputationFromBounties,
 ReputationFromAccepts,
 ReputationFromSuggestedEdits
)
SELECT
 D.CreationDate,
 ReputationFromVotes = 
 ISNULL
 (
 SUM
 (
 CASE 
 WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
 WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
 WHEN D.VoteTypeId = 3 THEN -2
 ELSE 0
 END
 ), 0
 ),
 ReputationFromBounties =
 ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
 ReputationFromAccepts =
 ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
 ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
FROM #Data AS D
LEFT JOIN #Edits AS E
 ON E.ApprovalDate = D.CreationDate
GROUP BY 
 D.CreationDate;

The execution plan shows efficient parallel processing using only the relatively small temporary tables:

day totals

Other reputation totals

These are single values, stored in variables for convenience:

DECLARE 
 @RepBountiesGiven integer =
 (
 SELECT
 ISNULL(SUM(-D.BountyAmount), 0)
 FROM #Data AS D
 WHERE
 D.VoteTypeId = 8 -- BountyStart
 ),
 @RepFromAcceptingAnswers integer =
 (
 SELECT
 2 * COUNT(*) -- 2 rep per answer accepted
 FROM #Posts AS P
 WHERE
 P.PostTypeId = 1
 AND P.AcceptedAnswerId IS NOT NULL
 ),
 @ReputationFromRepCap integer =
 (
 SELECT
 SUM
 (
 CASE
 WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
 THEN ReputationFromVotes + ReputationFromSuggestedEdits
 ELSE 200
 END
 )
 FROM #DayTotals
 );

variable totals

Final query

Now we have all the elements we need to produce the final output:

SELECT
 TotalReputation = 
 @ReputationFromRepCap +
 SUM(DT.ReputationFromBounties) +
 @RepBountiesGiven +
 @RepFromAcceptingAnswers +
 SUM(DT.ReputationFromAccepts),
 ReputationFromRepCap = @ReputationFromRepCap,
 ReputationFromBounties = SUM(DT.ReputationFromBounties),
 ReputationGivenAsBounties = @RepBountiesGiven,
 ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
 ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
FROM #DayTotals AS DT;

final query

Try the complete script at: SEDE Demo

This is at least 5x faster then mine for the test cases I tried. Impressive! — 10 hours ago

rene 1415 · Answer 2 · 2018-08-03 18:23:46Z

This is a partial answer as these were the bits I could verify easily.

In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.

My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.

With above mentioned changes that part of your query would look like this:

COALESCE((SELECT count(*) * 2
 FROM SuggestedEdits AS se
 WHERE se.OwnerUserId = ##UserId##
 and se.Approvaldate between v.creationdate 
 and dateadd(d,1,v.creationdate)
 ), 0) AS ReputationFromSuggestedEdits

If I compare the two Execution Plans I see I've eliminated an Filter step.

Your outer query is basically

SELECT 
 -- aggregate columns for a votettypeid

FROM Posts AS p 
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##
and p.posttypeid in (1,2)
and v.votetypeid in (1,2,3,9)

The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.

score 5 · Answer 3 · 2018-08-06 11:33:08Z

The main improvements I would suggest are:

Minimize the number of accesses made to the large, minimally indexed, SEDE tables.

Break the query into multiple steps.

Reduce or eliminate code repetition.

Temporary tables are allowed on SEDE. Use these to fetch the minimal data needed from the large base tables to perform the calculations. Referring to these smaller data sets will be much more efficient than interrogating the base tables multiple times.

Properly used, temporary tables enable more accurate cardinality estimation, automatic statistics on intermediate results, and provide indexing opportunities - all of which can improve final plan quality and performance.

Breaking the query up also makes the logic easier to comprehend and debug (initially, and when maintaining it in future). Errors and redundancies are much easier to spot with smaller queries.

I have not attempted to improve the reputation calculation logic itself, except in a few minor ways, but the following illustrates a possible re-implementation of the provided code using temporary tables:

Initialization

DECLARE
 @UserId integer = 22656, -- Jon Skeet, why not
 @UntilDate datetime = CURRENT_TIMESTAMP;

DROP TABLE IF EXISTS 
 #Posts, #Data, #Edits;

Posts data

Just the columns we are going to need from the Posts table:

CREATE TABLE #Posts
(
 Id integer NOT NULL,
 AcceptedAnswerId integer NULL,
 CommunityOwnedDate datetime NULL,
 PostTypeId tinyint NOT NULL
);

Load the minimal number of rows needed:

INSERT #Posts
(
 Id,
 AcceptedAnswerId,
 CommunityOwnedDate,
 PostTypeId
)
SELECT
 P.Id,
 P.AcceptedAnswerId,
 P.CommunityOwnedDate,
 P.PostTypeId
FROM dbo.Posts AS P
WHERE 
 P.OwnerUserId = @UserId
 AND P.CreationDate <= @UntilDate
 AND 
 (
 P.CommunityOwnedDate IS NULL
 OR P.CommunityOwnedDate <= @UntilDate
 );

This applies the common predicates in the original query as early as possible, including setting a boundary of the values of CommunityOwnedDate that might impact the reputation calculation up to the desired date. Other filtering could be added here, for example to restrict the PostTypeId values to just those of interest.

The execution plan (for Jon Skeet) is:

load posts

This plan is relatively efficient, being primarily based on a seek to the OwnerUserId specified. The Merge Interval subtree is concerned with finding the range of CommunityOwnedDate values used as a secondary seek predicate.

The Key Lookup is sadly unavoidable, given the SEDE view dbo.Posts wrapping the underlying table dbo.PostsWithDeleted (filtering on DeletionDate IS NULL without a supporting index). Nevertheless, it does mean we can filter CreationDate in the same lookup without adding significant cost.

Adding votes data

The next step adds the columns we need from the voting data associated with the qualifying posts. This is performed as a second step rather than joining all in one step because the #Posts table provides useful accurate cardinality and statistical information. A join query is quite likely to mis-estimate, resulting in an inappropriate plan selection or hash spill, for example.

CREATE TABLE #Data
(
 AcceptedAnswerId integer NULL,
 CommunityOwnedDate datetime NULL,
 PostTypeId tinyint NOT NULL,
 BountyAmount integer NULL,
 VoteTypeId tinyint NOT NULL,
 CreationDate datetime NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

Row compression is not necessarily super-useful here, but it does illustrate the extra flexibility of using discrete result sets.

INSERT #Data WITH (TABLOCK)
(
 AcceptedAnswerId,
 CommunityOwnedDate,
 PostTypeId,
 BountyAmount,
 VoteTypeId,
 CreationDate
)
SELECT
 P.AcceptedAnswerId,
 P.CommunityOwnedDate,
 P.PostTypeId,
 V.BountyAmount,
 V.VoteTypeId,
 V.CreationDate
FROM #Posts AS P
JOIN dbo.Votes AS V
 ON V.PostId = P.Id
WHERE 
 V.CreationDate <= @UntilDate
 AND 
 (
 P.CommunityOwnedDate IS NULL
 OR P.CommunityOwnedDate > V.CreationDate
 );

The query above again applies predicates as early as possible, and gives us a combined view of the posts and votes that we can interrogate in multiple ways later on at low cost (certainly much cheaper than accessing the base tables again).

load post and vote data

The execution plan features accurate cardinality estimates, appropriate use of parallelism, and a hashtable build-side bitmap filter used to perform semi-join reduction on the Votes table scan. This removes rows that cannot possibly join even before they are surfaced from the storage engine to the query processor (as indicated by the INROW attribute).

Suggested Edit Data

We only need the date (no time component) and total reputation earned for suggested edits:

CREATE TABLE #Edits
(
 ApprovalDate datetime PRIMARY KEY,
 Reputation integer NOT NULL
);

INSERT #Edits
(
 ApprovalDate,
 Reputation
)
SELECT
 G.ApprovalDate,
 EditRep = 2 * COUNT(*) -- 2 rep per approved suggested edit
FROM dbo.SuggestedEdits AS SE
CROSS APPLY
(
 VALUES (CONVERT(datetime, CONVERT(date, SE.ApprovalDate)))
) AS G (ApprovalDate)
WHERE
 SE.OwnerUserId = @UserId
 AND SE.ApprovalDate <= @UntilDate
 AND SE.RejectionDate IS NULL
GROUP BY
 G.ApprovalDate;

Using CROSS APPLY with VALUES allows us to avoid repeating the datetime to date conversion in the SELECT list and GROUP BY clause. The converted value is put back in a datetime type for consistency with the other SEDE-derived tables.

It is generally best practice to pay careful attention to data types, avoiding implicit conversions. This has myriad benefits, not least for cardinality estimation and join performance & optimizations.

A possible improvement to the code here would be to use the date type everywhere only the date portion is needed.

load suggested edit data

The parallel scan of the SuggestedEdits table is the best we can do here, without better indexing in SEDE.

Reputation Per Day

This table largely replaces the original r2d subquery, computing reputation totals per day:

CREATE TABLE #DayTotals
(
 CreationDate datetime PRIMARY KEY,
 ReputationFromVotes integer NOT NULL,
 ReputationFromBounties integer NOT NULL,
 ReputationFromAccepts integer NOT NULL,
 ReputationFromSuggestedEdits integer NOT NULL
);

INSERT #DayTotals WITH (TABLOCK)
(
 CreationDate,
 ReputationFromVotes,
 ReputationFromBounties,
 ReputationFromAccepts,
 ReputationFromSuggestedEdits
)
SELECT
 D.CreationDate,
 ReputationFromVotes = 
 ISNULL
 (
 SUM
 (
 CASE 
 WHEN D.PostTypeId = 1 AND D.VoteTypeId = 2 THEN 5
 WHEN D.PostTypeId = 2 AND D.VoteTypeId = 2 THEN 10
 WHEN D.VoteTypeId = 3 THEN -2
 ELSE 0
 END
 ), 0
 ),
 ReputationFromBounties =
 ISNULL(SUM(CASE WHEN D.VoteTypeId = 9 THEN D.BountyAmount ELSE 0 END), 0),
 ReputationFromAccepts =
 ISNULL(SUM(CASE WHEN D.VoteTypeId = 1 THEN 15 ELSE 0 END), 0),
 ReputationFromSuggestedEdits = ISNULL(SUM(E.Reputation), 0)
FROM #Data AS D
LEFT JOIN #Edits AS E
 ON E.ApprovalDate = D.CreationDate
GROUP BY 
 D.CreationDate;

The execution plan shows efficient parallel processing using only the relatively small temporary tables:

day totals

Other reputation totals

These are single values, stored in variables for convenience:

DECLARE 
 @RepBountiesGiven integer =
 (
 SELECT
 ISNULL(SUM(-D.BountyAmount), 0)
 FROM #Data AS D
 WHERE
 D.VoteTypeId = 8 -- BountyStart
 ),
 @RepFromAcceptingAnswers integer =
 (
 SELECT
 2 * COUNT(*) -- 2 rep per answer accepted
 FROM #Posts AS P
 WHERE
 P.PostTypeId = 1
 AND P.AcceptedAnswerId IS NOT NULL
 ),
 @ReputationFromRepCap integer =
 (
 SELECT
 SUM
 (
 CASE
 WHEN ReputationFromVotes + ReputationFromSuggestedEdits <= 200
 THEN ReputationFromVotes + ReputationFromSuggestedEdits
 ELSE 200
 END
 )
 FROM #DayTotals
 );

variable totals

Final query

Now we have all the elements we need to produce the final output:

SELECT
 TotalReputation = 
 @ReputationFromRepCap +
 SUM(DT.ReputationFromBounties) +
 @RepBountiesGiven +
 @RepFromAcceptingAnswers +
 SUM(DT.ReputationFromAccepts),
 ReputationFromRepCap = @ReputationFromRepCap,
 ReputationFromBounties = SUM(DT.ReputationFromBounties),
 ReputationGivenAsBounties = @RepBountiesGiven,
 ReputationFromAcceptingAnswers = @RepFromAcceptingAnswers,
 ReputationFromAcceptedAnswers = SUM(DT.ReputationFromAccepts)
FROM #DayTotals AS DT;

final query

Try the complete script at: SEDE Demo

This is at least 5x faster then mine for the test cases I tried. Impressive! — 10 hours ago

rene 1415 · Answer 4 · 2018-08-03 18:23:46Z

This is a partial answer as these were the bits I could verify easily.

In your column query for ReputationFromSuggestedEdits you can move all of the projection to the where clause of that query and then use a count(*) and a multiplication to get you final result. It is key to know here that ApprovalDate contains a full datetime where as votes.creationdate is sanatized on a date only. To compensate for that I use the between operator and calculate the date of the next day with dateadd function. An alternative could be to convert approvaldate to a date (with convert(date,approvaldate)) but that guarantees you'll never have the benefit of an index on approvaldate. Unfortunately SEDE doesn't have lots of indexes on non-key fields so the point is moot here.

My additions are lower-case so you can more easily spot them. I'm impartial if you use upper or lowercase. Use a convetion that suits you.

With above mentioned changes that part of your query would look like this:

COALESCE((SELECT count(*) * 2
 FROM SuggestedEdits AS se
 WHERE se.OwnerUserId = ##UserId##
 and se.Approvaldate between v.creationdate 
 and dateadd(d,1,v.creationdate)
 ), 0) AS ReputationFromSuggestedEdits

If I compare the two Execution Plans I see I've eliminated an Filter step.

Your outer query is basically

SELECT 
 -- aggregate columns for a votettypeid

FROM Posts AS p 
INNER JOIN Votes AS v ON v.PostId = p.Id

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##

GROUP BY v.CreationDate

You can limit the number of rows in both Posts and Votes to add for which posttype or votetype you're selecting rows

WHERE p.OwnerUserId = ##UserId:int##
AND v.CreationDate <= ##UntilDate:string##
and p.posttypeid in (1,2)
and v.votetypeid in (1,2,3,9)

The Votes table is huge but each row that you don't fetch, don't need to be projected and as you do as lot of work in that projection phase every row you don't fetch is a few nano-seconds profit.

Find how much reputation a user had on a given date

2 Answers 2

Initialization

Posts data

Adding votes data

Suggested Edit Data

Reputation Per Day

Other reputation totals

Final query

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Initialization

Posts data

Adding votes data

Suggested Edit Data

Reputation Per Day

Other reputation totals

Final query

Initialization

Posts data

Adding votes data

Suggested Edit Data

Reputation Per Day

Other reputation totals

Final query

Initialization

Posts data

Adding votes data

Suggested Edit Data

Reputation Per Day

Other reputation totals

Final query

Initialization

Posts data

Adding votes data

Suggested Edit Data

Reputation Per Day

Other reputation totals

Final query

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Python Lists

Aion

JavaScript Array Iteration Methods

2 Answers
2

2 Answers
2

2 Answers
2