EF Core: reduce the number of query calls and the overhead they cause [closed]

To give a bit of background on the application:

There is a blob trigger that picks up .csv files provided by users.

These CSV files usually contain between 50k and 80k rows and are dynamic by nature, so a given file may adhere to any one of the xxx structures we accept.



Within these CSVs there is a subset of fields, identifiable by name, that has to be 'standardized' and 'verified'.



A sample scenario:

A transaction is made, and provider A communicates a client name: Kees B B.

Provider B communicates the same client under the name Kees B Barn.

We already know that providers A and B communicate the same client under different names, and we have a 'mapping' table that standardizes the names A and B provide to the client That one guy living in a barn in our database.



The row obtained from such a CSV may, for example's sake, adhere to the following model:



public class Transaction : IHasFileName, IStandardizeModels
{
    public string ClientName;
    public string TransactionType;

    public async Task&lt;bool&gt; StandardizeAsync(ILogger logger, DatabaseContext context)
    {
        try
        {
            if (context == null)
            {
                // Logging line
                return false;
            }

            // Client Name
            var clientNameResult = context.ClientsNamesMapping
                .Where(y => y.Name == ClientName.RemoveWhitespace().ToUpperInvariant())
                .Select(x => x.Client)
                .FirstOrDefault();

            if (clientNameResult == default)
            {
                // Logging line
                return false;
            }

            ClientName = clientNameResult.Name;
            return true;
        }
        catch (Exception e)
        {
            // Logging line
            return false;
        }
    }
}





The above model is used in the following chain of execution.



// This function handles the process from input stream to storage in the db.
public async Task StoreAsync&lt;T&gt;(Stream fileStream, string fileName)
    where T : IHasFileName, IStandardizeModels
{
    // Some preprocessing, not relevant to the question
    var preprocessedStream = _fileParser.PreprocessAsync(fileStream);

    // Parse the file stream into a standardizable model, i.e. the Transaction above
    var parseResult = await _fileParser.ParseAsync&lt;T&gt;(await preprocessedStream);

    // Iterate over the models to add the origin file name
    parseResult = parseResult.Select(x => { x.FileName = fileName; return x; });

    // Group objects based on whether they could be verified and standardized
    var separatedParseResult = parseResult.GroupBy(async x =>
        (await x.StandardizeAsync(_logger, _context) && await x.VerifyAsync(_logger, _context))
    ).ToList();

    // More code, not relevant to the issue
}
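As a side note, passing an async lambda to GroupBy keys the grouping on the Task&lt;bool&gt; instances themselves rather than on their boolean results, so every element lands in its own group. A minimal sketch of awaiting the checks first and then grouping, reusing the names from the snippet above:

```csharp
// Caution: an async lambda in GroupBy produces Task<bool> keys, and each
// distinct task becomes its own group. Materialize the results first:
var flags = new List<(T Item, bool Ok)>();
foreach (var item in parseResult)
{
    var ok = await item.StandardizeAsync(_logger, _context)
          && await item.VerifyAsync(_logger, _context);
    flags.Add((item, ok));
}

// Now group on the materialized bool, which behaves as intended.
var separatedParseResult = flags
    .GroupBy(x => x.Ok, x => x.Item)
    .ToList();
```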



Now, the painful part of the above code is the following line:



var separatedParseResult = parseResult.GroupBy(async x =>
    (await x.StandardizeAsync(_logger, _context) && await x.VerifyAsync(_logger, _context))
).ToList();


In the case of 50k rows, this leads to 50k executions of the query inside the StandardizeAsync() call.
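One way to collapse those per-row round trips into a single query is to fetch the mapping table once and do the lookups in memory. This is only a sketch, under the assumptions that ClientsNamesMapping fits comfortably in memory and that Name is stored in the normalized (whitespace-stripped, upper-invariant) form the query compares against:

```csharp
// Fetch the mapping table once; one query instead of one per row.
// Assumes the mapping fits in memory and Name holds the normalized key.
var nameMap = await context.ClientsNamesMapping
    .Include(m => m.Client)
    .ToDictionaryAsync(m => m.Name, m => m.Client.Name);

// Standardization is then an in-memory dictionary lookup per row.
foreach (var row in parseResult)
{
    var key = row.ClientName.RemoveWhitespace().ToUpperInvariant();
    if (nameMap.TryGetValue(key, out var standardizedName))
        row.ClientName = standardizedName;
}
```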



Although the execution time of an individual query is pretty good (query id 409): [screenshot: per-query execution time]



The sheer number of queries, however, causes significant overhead (timestamps are GMT+2, hence an execution time of nearly an hour, and still not done): [screenshot: query log]



Note that the measurements above were taken with three such standardizations within a model.



The DbContext is generated from an existing SQL database. The client-name scenario above involves the following tables:



CREATE TABLE [dbo].[ClientsNamesMapping]
(
    [Id] SMALLINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(MAX) NOT NULL,
    [ClientId] SMALLINT NOT NULL,

    CONSTRAINT FK_Clients_Id FOREIGN KEY ([ClientId])
        REFERENCES [dbo].[Clients] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
)

CREATE TABLE [dbo].[Clients]
(
    [Id] SMALLINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(50) NOT NULL,
    [Address] NVARCHAR(MAX) NOT NULL,
    [PostalCode] NVARCHAR(50) NOT NULL,
    [CityId] INT NOT NULL,
    [CountryId] SMALLINT NOT NULL,

    CONSTRAINT FK_Customers_CountryId FOREIGN KEY (CountryId)
        REFERENCES [dbo].[CountryCodes] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,

    CONSTRAINT FK_Customers_CityId FOREIGN KEY (CityId)
        REFERENCES [dbo].[Cities] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
)


With that, my question is: how can I reduce the number of queries while maintaining the flexibility of accepting different models, which may have to adhere to different rules?







closed as off-topic by t3chb0t, Stephen Rauch, Dannnno, yuri, Sam Onela Jul 30 at 22:53


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – t3chb0t, Stephen Rauch, Dannnno, yuri, Sam Onela
If this question can be reworded to fit the rules in the help center, please edit the question.








  • You have commented and removed a lot of lines as not relevant. How are we supposed to see the big picture of your solution if so many parts are missing or probably changed? Please post your real code.
    – t3chb0t, Jul 30 at 17:28











  • @t3chb0t Nothing is changed, just redacted non-relevant code. This is an MCVE I use myself to test and debug said issue as well. I did seem to have missed the VerifyAsync(); however, in the test scenario this function just returns true, as it's not relevant to my test case.
    – MX D, Jul 30 at 20:58
















asked Jul 30 at 14:54 by MX D



