Memory management - Large amount of data

Context

I created a bundle to run asynchronous actions (using RabbitMQ) from data extracted by a service named extract_rule in the bundle context.

Here are some definitions from the documentation:

An extract_rule refers to a Symfony service that will retrieve an array of data. A task will be created for each item of this array.

An action is a service doing any work you want. It can be triggered by other previous actions in a definable and predictable order (composing what we call a workflow).

A workflow refers to the way actions are linked together. You can use conditions depending on the results of previous actions to trigger one action or another.

A task refers to multiple actions linked together for one extracted data item. It's a MongoDB document, and can be used to resume actions if they failed.

With all these, we can compose a task configuration to define how tasks are created and processed.

Here is a simple schema that will help get a picture of how tasks are created and processed. Each arrow can represent a RabbitMQ message that is sent and will be consumed.

[Image: schema showing how tasks are created and processed]

The data extraction is the entry point of the execution.

I am creating a custom extract rule and I'm facing a memory problem.

This is a code snippet of the handler that calls the extract rule when needed:

<?php

namespace IDCI\Bundle\TaskBundle\Handler;

use Symfony\Component\EventDispatcher\EventDispatcherInterface;
use IDCI\Bundle\TaskBundle\Model\AbstractTaskConfiguration;
use IDCI\Bundle\TaskBundle\Event\DataExtractedEvent;
use IDCI\Bundle\TaskBundle\ExtractRule\ExtractRuleRegistry;

class ExtractRuleHandler
{
    /**
     * @var ExtractRuleRegistry
     */
    protected $registry;

    /**
     * @var EventDispatcherInterface
     */
    protected $dispatcher;

    /**
     * Constructor.
     *
     * @param ExtractRuleRegistry      $registry
     * @param EventDispatcherInterface $dispatcher
     */
    public function __construct(
        ExtractRuleRegistry $registry,
        EventDispatcherInterface $dispatcher
    ) {
        $this->registry = $registry;
        $this->dispatcher = $dispatcher;
    }

    /**
     * Execute all extract rules and log for each
     *
     * @param AbstractTaskConfiguration $taskConfiguration
     */
    public function execute(AbstractTaskConfiguration $taskConfiguration)
    {
        $extractRuleConfiguration = json_decode($taskConfiguration->getExtractRule(), true);

        // Extract data: the whole result set is loaded into memory at once
        $extractedData = $this->registry
            ->getRule($extractRuleConfiguration['service'])
            ->extract($extractRuleConfiguration['parameters'])
        ;

        // Dispatch event with extractedData and taskConfiguration
        $this->dispatcher->dispatch(
            DataExtractedEvent::NAME,
            new DataExtractedEvent($taskConfiguration, $extractedData)
        );
    }
}

As you can see above, the extract method is called directly, without any optimization. If someone creates an extract rule that extracts millions of items, it will take a long time and use a lot of memory. Data can be extracted from any kind of source (API, file, etc.).

I am sure I'm missing something in my extract rule management. I want to find a generic way to handle memory and performance properly. I think this concern should be abstracted away from anyone who wants to create a custom extract rule.
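To make the concern concrete, here is a minimal sketch of one direction that might fit: the extract rule returns a PHP generator instead of an array, and a small helper groups the yielded items into fixed-size batches. This is only an illustration under assumptions. `LazyRangeExtractRule`, its `count` parameter, and the `batch()` helper are hypothetical names that do not exist in the bundle, and the loop at the end merely stands in for what the handler could do with each batch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extract rule (not part of the bundle) that yields items
 * one at a time instead of building a full array in memory.
 */
final class LazyRangeExtractRule
{
    /**
     * @param array $parameters expects a 'count' key (hypothetical parameter)
     */
    public function extract(array $parameters): \Generator
    {
        for ($i = 1; $i <= $parameters['count']; $i++) {
            // A real rule would yield one file row, one API result, etc.
            yield ['id' => $i];
        }
    }
}

/**
 * Hypothetical helper: lazily groups any iterable into arrays
 * of at most $size items.
 */
function batch(iterable $items, int $size): \Generator
{
    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    // Yield the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

$rule = new LazyRangeExtractRule();
$batchSizes = [];
foreach (batch($rule->extract(['count' => 2500]), 1000) as $chunk) {
    // Assumption: the handler could dispatch one event per chunk here.
    $batchSizes[] = count($chunk);
}
// $batchSizes is now [1000, 1000, 500]
```

With this shape, only one batch is held in memory at a time, and the author of a custom extract rule only has to `yield` items. Whether a DataExtractedEvent can be dispatched once per batch rather than once per extraction is an assumption about the bundle that the sketch does not verify.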

Questions

Are there concepts (design patterns, for example) to handle this problem?

Can someone enlighten me? I'm really lost.

I really hope I have been clear,
Thanks a lot for your answers :)

edited Jun 6 at 13:56

asked Jun 6 at 12:32

BwaBwa
