Fast implementation to output nested dict to JSON

Each line of the JSON output has the following structure:
|- Sentence
| |-- WordPosition:
| |-- |-- Candidate
| |-- |-- |-- index:
| |-- |-- |-- retrievals
Example of a line in JSON output (parameter num_retrieval=2)
Current implementation
import json

# Method of the class that provides get_one_mc_retrievals (class not shown),
# hence the `self` parameter, which the original signature omitted.
def output_conll_retrievals_to_json(self, output_file, conll_mcs, conll_sentences, num_retrieval):
    num_sentence = len(conll_sentences)
    with open(output_file, 'w') as f:
        for i in range(num_sentence):
            num_position = len(conll_sentences[i])
            level2 = dict()  # sentence -> word positions
            level1 = dict()  # word at position -> candidates
            for j in range(num_position):
                level0 = dict()  # candidate string -> index and retrievals
                for mc in conll_mcs[i][j]:
                    mc_str = ' '.join(mc.mention_string)
                    level0[mc_str] = dict()
                    level0[mc_str]['index'] = mc.mention
                    level0[mc_str]['retrievals'] = self.get_one_mc_retrievals(mc, num_retrieval)
                level1[conll_sentences[i][j]] = level0
            level2[' '.join(conll_sentences[i])] = level1
            json.dump(level2, f)
            f.write('\n')  # one JSON document per line
Note that self.get_one_mc_retrievals is implemented in parallel, accesses a memory map, and performs a hierarchical search. When I run this code, the CPUs are under-used (around 10-20% per core, since the writing is done serially), and generating the whole JSON output takes 8 days (approximately 3000 sentences with num_retrieval set to 100). Would it be possible to speed this up (e.g., with Cython, parallelism, or something else)?
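Before choosing between those options, it may help to measure where the time actually goes. The following is only a rough sketch: `timed` and `profile_sentence` are hypothetical helper names, and `build_dict` stands for whatever callable builds the nested dict for one sentence (including the retrieval calls). It times dict construction and JSON serialisation separately.

```python
import json
import time


def timed(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_sentence(build_dict, sentence):
    """Time dict construction (retrievals included) and serialisation separately.

    build_dict is an assumed callable: sentence -> nested dict for that sentence.
    """
    nested, build_seconds = timed(build_dict, sentence)
    text, dump_seconds = timed(json.dumps, nested)
    return {
        "build_seconds": build_seconds,
        "dump_seconds": dump_seconds,
        "chars": len(text),
    }
```

If `build_seconds` dominates for a handful of sentences, the retrieval step rather than the file write is the part worth optimising.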
** Edited **
This just came to mind: would it be possible to use multiprocessing to speed up the JSON writing? Right now, my code serially retrieves a dict by reading conll_mcs[i] and then writes one line to the JSON file. If I could run many of these steps simultaneously, the speed would very likely improve, but I am still trying to work out how to put the idea into practice. Any help would be greatly appreciated.
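One way this idea could look, as a minimal sketch under stated assumptions: each sentence becomes an independent job, worker processes build the JSON string for their sentence, and only the parent process writes to the file. `get_retrievals` is a hypothetical stand-in for self.get_one_mc_retrievals (whose code is not shown), and the plain-tuple job layout is likewise an assumption made so that jobs can be pickled and sent to workers.

```python
import json
import multiprocessing


def get_retrievals(mention, num_retrieval):
    """Hypothetical stand-in for self.get_one_mc_retrievals (not shown in the question)."""
    return ["%s_%d" % (mention, k) for k in range(num_retrieval)]


def build_line(job):
    """Build the nested dict for one sentence and return it as a JSON string.

    job is (tokens, candidates_per_position, num_retrieval); each candidate is
    a (mention_string, index) tuple. This layout is an assumption.
    """
    tokens, candidates_per_position, num_retrieval = job
    by_position = {}
    for token, candidates in zip(tokens, candidates_per_position):
        by_position[token] = {
            mention: {
                "index": index,
                "retrievals": get_retrievals(mention, num_retrieval),
            }
            for mention, index in candidates
        }
    return json.dumps({" ".join(tokens): by_position})


def write_lines(output_file, jobs, map_fn=map):
    """Write one JSON document per line; map_fn must preserve input order."""
    with open(output_file, "w") as f:
        for line in map_fn(build_line, jobs):
            f.write(line + "\n")


def write_lines_parallel(output_file, jobs, processes=None):
    """Same output as write_lines, but lines are built in worker processes."""
    with multiprocessing.Pool(processes) as pool:
        # Pool.imap yields results in input order, so the file order is unchanged.
        write_lines(output_file, jobs, map_fn=pool.imap)
```

`write_lines_parallel` should be called from under an `if __name__ == "__main__":` guard. Because only the parent writes, no file locking is needed. Two caveats: if the retrieval function is already parallel internally, adding worker processes may oversubscribe the cores, and each worker would probably need to open the memory map itself.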
python python-3.x json
What is the relationship between conll_mcs and conll_sentences?
– Austin Hastings
Jan 9 at 17:59
2
You are asking questions about code that you are not showing us, or referencing. There are certainly performance-enhancing options available to you. But you need to tell us (in detail) what you are doing, how you are doing it, and what constraints you are under. If get_one_mc_retrievals is slow, then show us that code.
– Austin Hastings
Jan 9 at 18:09
I found this link pretty helpful: stackoverflow.com/questions/13446445/… [multiprocessing safely writing to a file].
– Logan
Jan 10 at 16:46
edited Jan 10 at 16:48
asked Jan 9 at 16:55
Logan
1877