Iterate over a list of list names as file names
I'm interrogating a web API using query URLs. For each query I can get either zero hits, one hit, or multiple hits. Each of those categories needs to go into a separate CSV file for manual review or processing later. (More on this project here and here).
The input data (from a 14K-line CSV, one line per artist) is full of holes. A name is always given, but it may be misspelled or in a form that the API does not recognise. Birth and death dates may or may not be known, and when they are, the precision varies, for example 'not before may 1533, not after january 1534'. There may also be exact dates in ISO format.
Using those three different output CSVs, the user may go back to their source, try to refine their data, and run the script again to get a better match. Exactly one hit is what we go for: a persistent identifier for this specific artist.
In the code below, `df` is a Pandas dataframe that has all the information in a form that is easiest to interrogate the API with.
First, I try to get an exact match with `best_q` (exact match of name string + any of the available input fields elsewhere in the record). If that yields zero hits, I try a slightly looser match with `bracket_q` (any of the words in the literal name string + any of the available input fields elsewhere in the record).
I output the dataframe as a separate CSV, and each list of zero hits, single hits, or multiple hits also in a separate CSV.
I'm seeking advice on two specific things.
Is there a more Pythonic way of handling the lists? Right now, I think the code is readable enough, but I have one line to generate the lists, another to put them in a list of lists, and another to put them in a list of list names.
The second thing is the nested `if..elif` on zero hits for the first query. I know it ain't pretty, but it's still quite readable (to me), and I don't see how I could do that any other way. That is: I have to try `best_q` first, and only if it yields zero, try again with `bracket_q`.
I have omitted what goes before. It works, it's been reviewed, I'm happy with it.
A final note: I'm not very concerned about performance, because the API is the bottleneck. I am concerned about readability. Users may want to tweak the script, somewhere down the line.
    singles, multiples, zeroes = ([] for i in range(3))

    for row in df.itertuples():
        query = best_q(row)
        hits, uri = ask_rkd(query)
        if hits == 1:
            singles.append([row.priref, row.name, hits, uri])
        elif hits > 1:
            multiples.append([row.priref, row.name, hits])
        elif hits == 0:
            query = bracket_q(row)
            hits, uri = ask_rkd(query)
            if hits == 1:
                singles.append([row.priref, row.name, hits, uri])
            elif hits > 1:
                multiples.append([row.priref, row.name, hits])
            elif hits == 0:
                zeroes.append([row.priref, str(row.name)])  # PM: str!!

    lists = singles, multiples, zeroes
    listnames = ['singles','multiples','zeroes']
    for s, l in zip(listnames, lists):
        listfile = '{}_{}.csv'.format(input_fname, s)
        writelist(list=l, fname=listfile)

    outfile = fname + '_out' + ext
    df.to_csv(outfile, sep='|', encoding='utf-8-sig')
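For readers who want to run the snippet, here is a minimal stand-in for the `writelist` helper. The real helper is part of the omitted code, so everything below is an assumption: the signature is inferred from the keyword call `writelist(list=..., fname=...)`, and the `|` delimiter and `utf-8-sig` encoding are simply copied from the `df.to_csv` call.

```python
import csv


def writelist(list, fname):
    """Hypothetical stand-in for the omitted helper: write rows to a CSV file."""
    # The parameter is called `list` only to match the keyword used in the
    # call above; it shadows the built-in `list` inside this function.
    with open(fname, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f, delimiter='|')  # assumed delimiter
        writer.writerows(list)
```

Each inner list becomes one line in the output file, so the differing row lengths of the three lists are not a problem.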
python python-3.x csv pandas
edited Aug 3 at 13:42 by Sam Onela
asked Aug 2 at 10:31 by RolfBly
2 Answers
accepted
1. You can simplify your `if` structure. You duplicate the code for `hits == 1` and `hits > 1`. To avoid this, move the `if hits == 0` code into a 'guard statement' that updates the state to the correct one.
2. You should create a class to help ease the use of your code. A simple class with an internal list, a name, a selection and a size would allow you to significantly reduce the amount of code you'd have to write.
3. All the appends are the same, except you perform a slice to get the size that you'd like. You can do this in the list class made in 2.
4. You only change what list you append to in your ifs, and so you can use a dictionary to reduce the amount of code needed for this. You'd need to have a 'default' list and to use `dict.get`.
5. You won't need to use `zip` if you make the list contain the name, leaving a basic `for`.
I don't really know what the rest of your functions are, and so I'll leave it at this:
    class List:
        def __init__(self, name, selection, size):
            self._list = []
            self.name = name
            self.selection = selection
            self.size = size

        def add(self, value):
            # Keep only the first `size` fields for this kind of list.
            self._list.append(value[:self.size])


    lists = [
        List('zeroes', 0, 2),
        List('single', 1, 4),
        List('multiples', None, 3),
    ]
    # Map the number of hits to its list; None marks the fallback list.
    list_selections = {l.selection: l for l in lists}
    default = list_selections.pop(None)

    for row in df.itertuples():
        hits, uri = ask_rkd(best_q(row))
        if hits == 0:
            hits, uri = ask_rkd(bracket_q(row))

        list_ = list_selections.get(hits, default)
        list_.add([row.priref, str(row.name), hits, uri])

    for list_ in lists:
        listfile = '{}_{}.csv'.format(input_fname, list_.name)
        # Pass the collected rows, not the wrapper object.
        writelist(list=list_._list, fname=listfile)

    outfile = fname + '_out' + ext
    df.to_csv(outfile, sep='|', encoding='utf-8-sig')
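To see the guard statement and the `dict.get` routing in isolation, here is a small self-contained sketch without the class. It is only an illustration: `Row`, `FAKE_API`, `BUCKETS`, `classify` and the bodies of `best_q`, `bracket_q` and `ask_rkd` are hypothetical stand-ins, since the real functions are not shown in the question.

```python
from collections import namedtuple

Row = namedtuple('Row', 'priref name')  # stand-in for a df.itertuples() row

# Hypothetical canned responses: query string -> (hits, uri)
FAKE_API = {
    'exact:Rembrandt': (1, 'uri-rembrandt'),
    'loose:Jan Steen': (3, None),
}


def best_q(row):      # stand-in for the strict query builder
    return 'exact:' + row.name


def bracket_q(row):   # stand-in for the looser query builder
    return 'loose:' + row.name


def ask_rkd(query):   # stand-in for the API call
    return FAKE_API.get(query, (0, None))


# hits -> (list name, number of fields to keep); None is the fallback entry
BUCKETS = {0: ('zeroes', 2), 1: ('singles', 4), None: ('multiples', 3)}


def classify(rows):
    results = {name: [] for name, _ in BUCKETS.values()}
    for row in rows:
        hits, uri = ask_rkd(best_q(row))
        if hits == 0:  # guard: retry once with the looser query
            hits, uri = ask_rkd(bracket_q(row))
        # Any hit count other than 0 or 1 falls through to the fallback.
        name, size = BUCKETS.get(hits, BUCKETS[None])
        results[name].append([row.priref, str(row.name), hits, uri][:size])
    return results
```

Calling `classify([Row(1, 'Rembrandt'), Row(2, 'Jan Steen'), Row(3, 'Unknown')])` puts one row in each list, sliced to 4, 3 and 2 fields respectively.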
edited Aug 3 at 14:32 by Malachi
answered Aug 3 at 8:03 by Peilonrayz
As you already noticed, you have repeated code. You have variable names for your lists and also string definitions for list names, which most probably should match. While this is no big problem for just 3 lists, it could get cumbersome when adding another list. A simple way to avoid such name-to-string matching edits is to hold such variables in a `dict()` and have the string definition only.
The second problem is having different iterables which must match in length and order to be zipped later on. Avoid this by holding tuples (or other containers) in a single iterable from the beginning. Key-value pairs in a `dict()` also provide this binding.
In your case I'd recommend using the strings as keys:
    # avoid named variables
    lists = {name: [] for name in ('singles', 'multiples', 'zeroes')}

    # access lists via name
    lists['singles'].append(0)

    # access via temporary
    l = lists['singles']
    l.append(0)

    # iterate for saving
    for s, l in lists.items():
        writelist(list=l, fname=s + '.csv')
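A closely related variant, which is not part of the answer above, is `collections.defaultdict(list)`. It creates each list on first use, so the names do not have to be listed up front. The rows appended below are made-up sample data.

```python
from collections import defaultdict

lists = defaultdict(list)  # a missing key starts out as an empty list

lists['singles'].append([1, 'Rembrandt', 1, 'uri-rembrandt'])
lists['zeroes'].append([3, 'Unknown'])

# Only the names that actually received a row exist as keys.
names = sorted(lists)
```

The trade-off is that a category without any rows never becomes a key, so no file would be written for it. If an empty CSV per category is wanted, the explicit dict comprehension is the safer choice.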
EDIT:
The above answer applies to the first version of the code, where all the list initialisation was skipped. While it is all still valid, it can now be applied to the real code, concise and following the KISS principle. Names could be improved but are left as they are here to outline the changes only.
    lists = {name: [] for name in ('singles', 'multiples', 'zeroes')}

    for row in df.itertuples():
        query = best_q(row)
        hits, uri = ask_rkd(query)
        if hits == 0:
            # no exact match: retry once with the looser query
            query = bracket_q(row)
            hits, uri = ask_rkd(query)
        if hits == 1:
            lists['singles'].append([row.priref, row.name, hits, uri])
        elif hits > 1:
            lists['multiples'].append([row.priref, row.name, hits])
        elif hits == 0:
            lists['zeroes'].append([row.priref, str(row.name)])  # PM: str!!

    for s, l in lists.items():
        listfile = '{}_{}.csv'.format(input_fname, s)
        writelist(list=l, fname=listfile)

    outfile = fname + '_out' + ext
    df.to_csv(outfile, sep='|', encoding='utf-8-sig')
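If users later want to add a third, even looser query, the "try again only on zero hits" step can be generalised to a loop over query builders. This is a sketch under assumptions: `first_hit` is a hypothetical helper name, and it only relies on `ask_rkd` returning a `(hits, uri)` pair as in the code above.

```python
def first_hit(row, builders, ask):
    """Try each query builder in order and stop at the first one with hits."""
    hits, uri = 0, None
    for build in builders:
        hits, uri = ask(build(row))
        if hits:  # any non-zero hit count ends the search
            break
    return hits, uri


# Intended use inside the loop over df.itertuples():
#     hits, uri = first_hit(row, (best_q, bracket_q), ask_rkd)
```

The order of the tuple decides which query is tried first, so the strictest builder should come first.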
edited Aug 3 at 13:53
answered Aug 2 at 11:02 by stefan