Fast implementation to output nested dict to JSON

Each line of the JSON output has the following structure:

|- Sentence
| |-- WordPosition:
| |-- |-- Candidate
| |-- |-- |-- index:
| |-- |-- |-- retrievals

Example of a line in the JSON output (parameter num_retrieval=2): [image not reproduced]
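To make the nesting concrete, here is a minimal sketch of what one line could look like when built and serialized in Python. The sentence, the candidate string, the index value and the retrieval entries are all made-up placeholders; only the nesting mirrors the structure described above.

```python
import json

# Hypothetical values; only the nesting follows the structure in the question.
line = {
    "the cat sat": {                  # Sentence
        "cat": {                      # WordPosition
            "cat": {                  # Candidate
                "index": 1,           # placeholder for mc.mention
                "retrievals": ["entry_a", "entry_b"],  # num_retrieval=2
            }
        }
    }
}

# json.dumps without indent= produces a single line, so each sentence
# occupies exactly one line of the output file.
serialized = json.dumps(line)
print(serialized)
```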

Current implementation

    import json

    # Method of the class that provides get_one_mc_retrievals (hence `self`).
    def output_conll_retrievals_to_json(self, output_file, conll_mcs, conll_sentences, num_retrieval):
        num_sentence = len(conll_sentences)

        with open(output_file, 'w') as f:
            for i in range(num_sentence):
                num_position = len(conll_sentences[i])
                level2 = dict()  # sentence -> word positions
                level1 = dict()  # word -> candidates
                for j in range(num_position):
                    level0 = dict()  # candidate string -> index and retrievals
                    for mc in conll_mcs[i][j]:
                        mc_str = ' '.join(mc.mention_string)
                        level0[mc_str] = dict()
                        level0[mc_str]['index'] = mc.mention
                        level0[mc_str]['retrievals'] = self.get_one_mc_retrievals(mc, num_retrieval)
                    level1[conll_sentences[i][j]] = level0
                level2[' '.join(conll_sentences[i])] = level1
                json.dump(level2, f)
                f.write('\n')  # one JSON object per line

Note that self.get_one_mc_retrievals is implemented in parallel; it accesses a memory map and performs a hierarchical search. When I run this code, the CPUs are under-used (about 10-20% per core, since the writing is done serially), and generating the whole JSON output takes 8 days (approximately 3000 sentences with num_retrieval set to 100). Would it be possible to boost the speed (e.g., with Cython, parallelism, or something else)?

** Edited **

This just came off the top of my head: would it be possible to use multiprocessing to speed up the JSON writing? Right now, my code serially runs a process that builds a dict from conll_mcs[i] and writes one line to the JSON file. If I could run many such processes simultaneously, the speed would very likely improve; however, I am still trying to work out how to put this idea into practice. Any help would be greatly appreciated.

edited Jan 10 at 16:48

asked Jan 9 at 16:55

Logan


What is the relationship between conll_mcs and conll_sentences?
– Austin Hastings
Jan 9 at 17:59

You are asking questions about code that you are not showing us, or referencing. There are certainly performance-enhancing options available to you. But you need to tell us (in detail) what you are doing, how you are doing it, and what constraints you are under. If get_one_mc_retrievals is slow, then show us that code.
– Austin Hastings
Jan 9 at 18:09

I found this link pretty helpful: stackoverflow.com/questions/13446445/… [multiprocessing safely writing to a file].
– Logan
Jan 10 at 16:46

