Memory management - Large amount of data

Context



I created a bundle to run asynchronous actions (using RabbitMQ) from data extracted by a service named extract_rule in the bundle context.



Here are some definitions from the documentation:




  • An extract_rule refers to a Symfony service that retrieves an array of data. A task is created for each item of this array.

  • An action is a service doing any work you want. It can be triggered by previous actions in a definable and predictable order (composing what we call a workflow).

  • A workflow refers to the way actions are linked together. You can use conditions depending on the results of previous actions to trigger one action or another.

  • A task refers to multiple actions linked together for one extracted item. It is a MongoDB document and can be used to resume actions if they failed.

  • With all of these, we can compose a task configuration to define how tasks are created and processed.
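To make the vocabulary above concrete, here is a rough sketch of how an extract rule and the "one task per extracted item" idea might fit together. `ExtractRuleInterface`, `LinesExtractRule`, `createTasks` and the action names are hypothetical names made up for illustration; the bundle's real interface and task document may well differ. It assumes PHP 7.1 or later.

```php
<?php

/** Hypothetical contract for an extract rule (the bundle's real interface may differ). */
interface ExtractRuleInterface
{
    /** @return array one entry per future task */
    public function extract(array $parameters): array;
}

/** Toy rule that "extracts" the lines of a string passed in the parameters. */
class LinesExtractRule implements ExtractRuleInterface
{
    public function extract(array $parameters): array
    {
        return explode("\n", $parameters['content']);
    }
}

/**
 * Builds one task description per extracted item.
 * $actions stands in for the ordered workflow of the task configuration.
 */
function createTasks(ExtractRuleInterface $rule, array $parameters, array $actions): array
{
    $tasks = [];
    foreach ($rule->extract($parameters) as $item) {
        $tasks[] = ['data' => $item, 'actions' => $actions, 'status' => 'pending'];
    }

    return $tasks;
}

$tasks = createTasks(
    new LinesExtractRule(),
    ['content' => "a\nb\nc"],
    ['download', 'resize'] // hypothetical action names
);

printf("%d tasks created\n", count($tasks));
```

Note that in this shape the whole extracted array exists in memory before the first task is created.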

Here is a simple schema to help picture how tasks are created and processed. Each arrow can represent a RabbitMQ message that is sent and will be consumed.



[Image: schema of how tasks are created and processed]



The data extraction is the entry point of the execution.



I am creating a custom extract rule and I'm facing a memory problem.



This is a code snippet of the handler that calls the extract rule when needed:



    <?php

    namespace IDCI\Bundle\TaskBundle\Handler;

    use Symfony\Component\EventDispatcher\EventDispatcherInterface;
    use IDCI\Bundle\TaskBundle\Model\AbstractTaskConfiguration;
    use IDCI\Bundle\TaskBundle\Event\DataExtractedEvent;
    use IDCI\Bundle\TaskBundle\ExtractRule\ExtractRuleRegistry;

    class ExtractRuleHandler
    {
        /**
         * @var ExtractRuleRegistry
         */
        protected $registry;

        /**
         * @var EventDispatcherInterface
         */
        protected $dispatcher;

        /**
         * Constructor.
         *
         * @param ExtractRuleRegistry      $registry
         * @param EventDispatcherInterface $dispatcher
         */
        public function __construct(
            ExtractRuleRegistry $registry,
            EventDispatcherInterface $dispatcher
        ) {
            $this->registry = $registry;
            $this->dispatcher = $dispatcher;
        }

        /**
         * Execute all extract rules and log for each
         *
         * @param AbstractTaskConfiguration $taskConfiguration
         */
        public function execute(AbstractTaskConfiguration $taskConfiguration)
        {
            $extractRuleConfiguration = json_decode($taskConfiguration->getExtractRule(), true);

            // Extract data: the whole result set is loaded into memory at once
            $extractedData = $this->registry
                ->getRule($extractRuleConfiguration['service'])
                ->extract($extractRuleConfiguration['parameters'])
            ;

            // Dispatch event with extractedData and taskConfiguration
            $this->dispatcher->dispatch(
                DataExtractedEvent::NAME,
                new DataExtractedEvent($taskConfiguration, $extractedData)
            );
        }
    }




As you can see above, the extract method is called directly, without any optimization. If someone creates an extract rule that extracts millions of items, it will take a long time and use a lot of memory. Data can be extracted from any source (API, file, etc.).
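To illustrate the concern, here is a minimal, self-contained sketch (not bundle code) contrasting an extraction that returns a full array with one written as a PHP generator. The function names `extractEager`, `extractLazy` and `retainedBytes` are made up for this example, and the row shape is arbitrary. It assumes PHP 7.0 or later.

```php
<?php

/** Eager: builds the complete result set before returning it. */
function extractEager(int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = ['id' => $i];
    }

    return $rows;
}

/** Lazy: yields one row at a time; nothing is built until iteration. */
function extractLazy(int $count): \Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield ['id' => $i];
    }
}

/** Memory still held by the value that $factory returned, in bytes. */
function retainedBytes(callable $factory): int
{
    $before = memory_get_usage();
    $value = $factory();
    $after = memory_get_usage();
    unset($value);

    return $after - $before;
}

$eagerBytes = retainedBytes(function () {
    return extractEager(50000);
});
$lazyBytes = retainedBytes(function () {
    return extractLazy(50000);
});

printf("eager: %d bytes, lazy: %d bytes\n", $eagerBytes, $lazyBytes);
```

The exact figures depend on the PHP version and build, but the array version holds every row at once while the generator holds only its own state until it is iterated.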



I am sure I'm missing something in my extract rule management. I want to find a "generic" way to handle memory and performance properly. I think this concern should be abstracted away from anyone who wants to create a custom extract rule.
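As one possible direction, and only as a sketch under assumptions, the handler side could accept any iterable from a rule and forward it in fixed-size batches, so that rule authors would not have to think about memory themselves. The `batch` helper and `extractNumbers` rule below are hypothetical, and the `$dispatched` array merely stands in for dispatching one event per batch. It assumes PHP 7.1 or later for the `iterable` type.

```php
<?php

/**
 * Groups any iterable (array or generator) into arrays of at most $size items.
 * Hypothetical helper, not part of the bundle.
 */
function batch(iterable $items, int $size): \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $chunk = [];
    foreach ($items as $item) {
        $chunk[] = $item;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    // Emit the last, possibly smaller, batch.
    if ($chunk !== []) {
        yield $chunk;
    }
}

/** Toy extract rule that yields items lazily instead of returning an array. */
function extractNumbers(int $count): \Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield $i;
    }
}

// Stand-in for dispatching one event per batch.
$dispatched = [];
foreach (batch(extractNumbers(7), 3) as $chunk) {
    $dispatched[] = $chunk;
}

print_r($dispatched);
```

With this shape, at most one batch of extracted items is held by the loop at any time, and a rule that still returns a plain array keeps working because arrays are iterable too.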



Questions



  • Are there concepts (design patterns, for example) to handle this problem?

  • Can someone enlighten me? I'm really lost.

I really hope I have been clear,
Thanks a lot for your answers :)







asked Jun 6 at 12:32 by BwaBwa, edited Jun 6 at 13:56
























