EF Core: reduce the number of query calls and the overhead they cause [closed]

To give a bit of background on the application:

There is a blob trigger that picks up .csv files provided by users.

These CSV files usually contain between 50k and 80k rows and are dynamic by nature, so a given file may adhere to any one of the xxx structures we accept.



Within these CSVs there is a subset of fields, identifiable by name, that has to be 'standardized' and 'verified'.



A sample scenario:

A transaction is made, and provider A communicates a client name: Kees B B.

Provider B communicates the same client under the name Kees B Barn.

We already know that providers A and B communicate the same client under different names, and we have a 'mapping' table that standardizes the names A and B provide to the client That one guy living in a barn in our database.



The row obtained from such a CSV may, for example's sake, adhere to the following model:



public class Transaction : IHasFileName, IStandardizeModels
{
    public string ClientName;
    public string TransactionType;

    public async Task&lt;bool&gt; StandardizeAsync(ILogger logger, DatabaseContext context)
    {
        try
        {
            if (context == null)
            {
                // Logging line
                return false;
            }

            // Client Name
            var clientNameResult = context.ClientsNamesMapping
                .Where(y => y.Name == ClientName.RemoveWhitespace().ToUpperInvariant())
                .Select(x => x.Client)
                .FirstOrDefault();

            if (clientNameResult == default)
            {
                // Logging line
                return false;
            }

            ClientName = clientNameResult.Name;
            return true;
        }
        catch (Exception e)
        {
            // Logging line
            return false;
        }
    }
}





The above model is used in the following chain of execution.



// This function handles the process from input stream to storage in the db.
public async Task StoreAsync&lt;T&gt;(Stream fileStream, string fileName)
    where T : IHasFileName, IStandardizeModels
{
    // Some preprocessing, not relevant to the question
    var preprocessedStream = _fileParser.PreprocessAsync(fileStream);

    // Parse the file stream into a standardizable model, i.e. the Transaction above
    var parseResult = await _fileParser.ParseAsync&lt;T&gt;(await preprocessedStream);

    // Iterate over the models to add the origin file name
    parseResult = parseResult.Select(x => { x.FileName = fileName; return x; });

    // Group objects based on whether they could be verified and standardized
    var separatedParseResult = parseResult.GroupBy(async x =>
        (await x.StandardizeAsync(_logger, _context) && await x.VerifyAsync(_logger, _context))
    ).ToList();

    // More code, not relevant to the issue
}
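As a side note, passing an async lambda to GroupBy keys the grouping on the Task&lt;bool&gt; instances themselves rather than on their boolean results, so every element lands in its own group. A minimal sketch of awaiting the checks first and then grouping, reusing the names from the snippet above:

```csharp
// Caution: an async lambda in GroupBy produces Task<bool> keys, and each
// distinct task becomes its own group. Materialize the results first:
var flags = new List<(T Item, bool Ok)>();
foreach (var item in parseResult)
{
    var ok = await item.StandardizeAsync(_logger, _context)
          && await item.VerifyAsync(_logger, _context);
    flags.Add((item, ok));
}

// Now group on the materialized bool, which behaves as intended.
var separatedParseResult = flags
    .GroupBy(x => x.Ok, x => x.Item)
    .ToList();
```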



Now, the painful part of the above code is the following line:



var separatedParseResult = parseResult.GroupBy(async x =>
    (await x.StandardizeAsync(_logger, _context) && await x.VerifyAsync(_logger, _context))
).ToList();


In the case of 50k rows, this leads to 50k executions of the query inside the StandardizeAsync() call.
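One way to collapse those per-row round trips into a single query is to fetch the mapping table once and do the lookups in memory. This is only a sketch, under the assumptions that ClientsNamesMapping fits comfortably in memory and that Name is stored in the normalized (whitespace-stripped, upper-invariant) form the query compares against:

```csharp
// Fetch the mapping table once; one query instead of one per row.
// Assumes the mapping fits in memory and Name holds the normalized key.
var nameMap = await context.ClientsNamesMapping
    .Include(m => m.Client)
    .ToDictionaryAsync(m => m.Name, m => m.Client.Name);

// Standardization is then an in-memory dictionary lookup per row.
foreach (var row in parseResult)
{
    var key = row.ClientName.RemoveWhitespace().ToUpperInvariant();
    if (nameMap.TryGetValue(key, out var standardizedName))
        row.ClientName = standardizedName;
}
```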



Although the execution time of an individual query is pretty good (query id 409): [screenshot: per-query execution time]



The sheer number of queries, however, causes significant overhead (timestamps are GMT+2, hence an execution time of nearly an hour, and still not done): [screenshot: query log]



Note that the measurements above were taken with three such standardizations within a model.



The DbContext is generated from an existing SQL database. The client-name scenario above involves the following tables:



CREATE TABLE [dbo].[ClientsNamesMapping]
(
    [Id] SMALLINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(MAX) NOT NULL,
    [ClientId] SMALLINT NOT NULL,

    CONSTRAINT FK_Clients_Id FOREIGN KEY ([ClientId])
        REFERENCES [dbo].[Clients] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
)

CREATE TABLE [dbo].[Clients]
(
    [Id] SMALLINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(50) NOT NULL,
    [Address] NVARCHAR(MAX) NOT NULL,
    [PostalCode] NVARCHAR(50) NOT NULL,
    [CityId] INT NOT NULL,
    [CountryId] SMALLINT NOT NULL,

    CONSTRAINT FK_Customers_CountryId FOREIGN KEY (CountryId)
        REFERENCES [dbo].[CountryCodes] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,

    CONSTRAINT FK_Customers_CityId FOREIGN KEY (CityId)
        REFERENCES [dbo].[Cities] (Id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
)


With that, my question is: how can I reduce the number of queries while maintaining the flexibility of accepting different models, which may have to adhere to different rules?







closed as off-topic by t3chb0t, Stephen Rauch, Dannnno, yuri, Sam Onela Jul 30 at 22:53


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – t3chb0t, Stephen Rauch, Dannnno, yuri, Sam Onela
If this question can be reworded to fit the rules in the help center, please edit the question.








  • You have commented and removed a lot of lines as not relevant. How are we supposed to see the big picture of your solution if so many parts are missing or probably changed? Please post your real code.
    – t3chb0t, Jul 30 at 17:28











  • @t3chb0t Nothing is changed, just redacted non-relevant code. This is an MCVE I use myself to test and debug said issue as well. I did seem to have missed the VerifyAsync(); however, in the test scenario this function just returns true, as it's not relevant to my test case.
    – MX D, Jul 30 at 20:58
















asked Jul 30 at 14:54 by MX D



