Fast implementation to output nested dict to JSON

























Each line of the JSON output has the following structure:



|- Sentence
| |-- WordPosition:
| |-- |-- Candidate
| |-- |-- |-- index:
| |-- |-- |-- retrievals


Example of a line in the JSON output (parameter num_retrieval=2):
[image not available]
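Since the example image is not available, here is a minimal sketch of what one output line could look like under the tree above. Every value in it (the sentence, the candidate strings, the index values, and the retrieval strings) is made up for illustration; only the nesting follows the structure described in the question.

```python
import json

# Hypothetical values: the sentence, candidates, indices and retrieval
# strings are invented purely to illustrate the nesting.
sentence = ["Paris", "is", "nice"]
line = {
    " ".join(sentence): {              # Sentence
        "Paris": {                     # WordPosition (keyed by the word)
            "Paris": {                 # Candidate (joined mention string)
                "index": [0, 1],       # assumed shape of mc.mention
                "retrievals": ["Paris_(France)", "Paris_(Texas)"],
            }
        },
        "is": {},                      # positions without candidates
        "nice": {},
    }
}

# json.dumps emits a single line by default, so one sentence is one line.
encoded = json.dumps(line)
print(encoded)
```

With num_retrieval=2, each candidate carries two entries in its retrievals list.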



Current implementation



import json

# `self` added to the signature: the body calls self.get_one_mc_retrievals,
# so this is presumably a method of a class that is not shown.
def output_conll_retrievals_to_json(self, output_file, conll_mcs, conll_sentences, num_retrieval):
    num_sentence = len(conll_sentences)

    with open(output_file, 'w') as f:
        for i in range(num_sentence):
            num_position = len(conll_sentences[i])
            level2 = dict()  # sentence -> word positions
            level1 = dict()  # word -> candidates
            for j in range(num_position):
                level0 = dict()  # candidate -> index and retrievals
                for mc in conll_mcs[i][j]:
                    mc_str = ' '.join(mc.mention_string)
                    level0[mc_str] = dict()
                    level0[mc_str]['index'] = mc.mention
                    level0[mc_str]['retrievals'] = self.get_one_mc_retrievals(mc, num_retrieval)
                level1[conll_sentences[i][j]] = level0
            level2[' '.join(conll_sentences[i])] = level1
            # One JSON object per line, one line per sentence.
            json.dump(level2, f)
            f.write('\n')


Note that self.get_one_mc_retrievals is implemented in parallel; it accesses a memory map and performs a hierarchical search. When I run this code, the system status shows that the CPUs are under-used (about 10-20% per core, since the writing is done serially), and generating the whole JSON output takes 8 days (approximately 3000 sentences with num_retrieval=100). Would it be possible to boost the speed (e.g., with Cython, parallelism, or something else)?
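At about 8 days for roughly 3000 sentences, each sentence takes on the order of 230 seconds, so it may be worth checking first whether the time goes into the retrieval call or into JSON serialization. The sketch below times the two steps separately. fake_retrievals is a hypothetical stand-in for get_one_mc_retrievals (whose code is not shown), and timed is a helper introduced only for this illustration.

```python
import json
import time


def timed(fn, *args):
    """Call fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fake_retrievals(num_retrieval):
    """Hypothetical stand-in for get_one_mc_retrievals."""
    return [f"doc_{k}" for k in range(num_retrieval)]


# Time the retrieval step and the serialization step independently.
retrievals, t_retrieve = timed(fake_retrievals, 100)
encoded, t_dump = timed(json.dumps, {"retrievals": retrievals})
print(f"retrieval: {t_retrieve:.6f}s, json.dumps: {t_dump:.6f}s")
```

Wrapping the real retrieval call the same way would show which of the two steps dominates the per-sentence time.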



** Edited **



This is just off the top of my head: would it be possible to use multiprocessing to speed up the JSON writing? Right now, my code serially repeats one step: it builds a dict from conll_mcs[i] and writes it as one line to the JSON file. If I could run many such steps simultaneously, the speed would very likely improve, but I am still trying to work out how to put this idea into practice. Any help would be greatly appreciated.
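One way this idea could be put into practice is sketched below: worker processes build and serialize one sentence each, and only the parent process writes to the file, which avoids concurrent writes. This is a sketch under assumptions, not the author's method. get_retrievals is a hypothetical stand-in for get_one_mc_retrievals, and build_line, write_jsonl_parallel, and the task tuple layout are names invented here. It also assumes that whatever is passed to the workers can be pickled, and that running several retrievals at once does not oversubscribe a retrieval function that is already parallel internally.

```python
import json
from multiprocessing import Pool


def get_retrievals(candidate, num_retrieval):
    """Hypothetical stand-in for the author's get_one_mc_retrievals."""
    return [f"{candidate}_{k}" for k in range(num_retrieval)]


def build_line(task):
    """Build the JSON text for one sentence (runs in a worker process).

    task is (words, candidates_per_word, num_retrieval), where
    candidates_per_word[j] is a list of (candidate_string, index) pairs.
    """
    words, candidates_per_word, num_retrieval = task
    level1 = {}
    for word, candidates in zip(words, candidates_per_word):
        level1[word] = {
            cand: {"index": idx,
                   "retrievals": get_retrievals(cand, num_retrieval)}
            for cand, idx in candidates
        }
    return json.dumps({" ".join(words): level1})


def write_jsonl_parallel(output_file, tasks, processes=None):
    """Fan sentences out to workers; only the parent writes the file."""
    with Pool(processes) as pool, open(output_file, "w") as f:
        # imap yields results in input order, so line i is sentence i.
        for line in pool.imap(build_line, tasks, chunksize=1):
            f.write(line + "\n")
```

A call such as write_jsonl_parallel("retrievals.jsonl", tasks) should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. If output order does not matter, imap_unordered would let finished sentences be written as soon as they are ready.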




























  • What is the relationship between conll_mcs and conll_sentences?
    – Austin Hastings
    Jan 9 at 17:59






  • You are asking questions about code that you are not showing us, or referencing. There are certainly performance-enhancing options available to you. But you need to tell us (in detail) what you are doing, how you are doing it, and what constraints you are under. If get_one_mc_retrievals is slow, then show us that code.
    – Austin Hastings
    Jan 9 at 18:09










  • I found this link pretty helpful stackoverflow.com/questions/13446445/… [multiprocessing safely writing to a file].
    – Logan
    Jan 10 at 16:46
















edited Jan 10 at 16:48

asked Jan 9 at 16:55

Logan













































