Append an annual total to all Excel files
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
5
down vote
favorite
I am new to all this and figured, after doing some tutorials, the best way to get some proper experience is to simply start a little project.
I have yearly folders with Excel files in it, those are roughly the same, only last year I made a change to the structure of them.
My aim was to cycle through them all and add one value of them to a list, and then compare those results.
So far, this all works out as I hoped.
I have not figured out yet however, how to define the part that reads the excel files into a callable object ((?)hope that's the correct term).
My main question is, how efficient, or inefficient is this script I wrote?
I assume, one could reduce the length of this code a lot.
import os
import openpyxl
my_list_2015 =
my_list_2016 =
my_list_2017 =
year_show = 2015
def brutto_total(netto):
brutto = netto + netto * 0.2
return '%.2f' % brutto
def show_result(year):
global year_show
print("-" * 14 + str(year_show) + "-" * 14)
print("Total Sum Netto 2015: " + str('%.2f' % year))
print("Total Sum Brutto 2015: " + str(brutto_total(year)))
year_show = year_show + 1
return year_show
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2015.append(c)
my_list_total_2015 = sum(my_list_2015)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2016.append(c)
my_list_total_2016 = sum(my_list_2016)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Zusammen']
c=sheet['C11'].value
my_list_2017.append(c)
my_list_total_2017 = sum(my_list_2017)
show_result(my_list_total_2015)
show_result(my_list_total_2016)
show_result(my_list_total_2017)
python beginner excel
add a comment |Â
up vote
5
down vote
favorite
I am new to all this and figured, after doing some tutorials, the best way to get some proper experience is to simply start a little project.
I have yearly folders with Excel files in it, those are roughly the same, only last year I made a change to the structure of them.
My aim was to cycle through them all and add one value of them to a list, and then compare those results.
So far, this all works out as I hoped.
I have not figured out yet however, how to define the part that reads the excel files into a callable object ((?)hope that's the correct term).
My main question is, how efficient, or inefficient is this script I wrote?
I assume, one could reduce the length of this code a lot.
import os
import openpyxl
my_list_2015 =
my_list_2016 =
my_list_2017 =
year_show = 2015
def brutto_total(netto):
brutto = netto + netto * 0.2
return '%.2f' % brutto
def show_result(year):
global year_show
print("-" * 14 + str(year_show) + "-" * 14)
print("Total Sum Netto 2015: " + str('%.2f' % year))
print("Total Sum Brutto 2015: " + str(brutto_total(year)))
year_show = year_show + 1
return year_show
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2015.append(c)
my_list_total_2015 = sum(my_list_2015)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2016.append(c)
my_list_total_2016 = sum(my_list_2016)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Zusammen']
c=sheet['C11'].value
my_list_2017.append(c)
my_list_total_2017 = sum(my_list_2017)
show_result(my_list_total_2015)
show_result(my_list_total_2016)
show_result(my_list_total_2017)
python beginner excel
Is'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?
â Mathias Ettinger
Feb 7 at 10:06
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52
add a comment |Â
up vote
5
down vote
favorite
up vote
5
down vote
favorite
I am new to all this and figured, after doing some tutorials, the best way to get some proper experience is to simply start a little project.
I have yearly folders with Excel files in it, those are roughly the same, only last year I made a change to the structure of them.
My aim was to cycle through them all and add one value of them to a list, and then compare those results.
So far, this all works out as I hoped.
I have not figured out yet however, how to define the part that reads the excel files into a callable object ((?)hope that's the correct term).
My main question is, how efficient, or inefficient is this script I wrote?
I assume, one could reduce the length of this code a lot.
import os
import openpyxl
my_list_2015 =
my_list_2016 =
my_list_2017 =
year_show = 2015
def brutto_total(netto):
brutto = netto + netto * 0.2
return '%.2f' % brutto
def show_result(year):
global year_show
print("-" * 14 + str(year_show) + "-" * 14)
print("Total Sum Netto 2015: " + str('%.2f' % year))
print("Total Sum Brutto 2015: " + str(brutto_total(year)))
year_show = year_show + 1
return year_show
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2015.append(c)
my_list_total_2015 = sum(my_list_2015)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2016.append(c)
my_list_total_2016 = sum(my_list_2016)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Zusammen']
c=sheet['C11'].value
my_list_2017.append(c)
my_list_total_2017 = sum(my_list_2017)
show_result(my_list_total_2015)
show_result(my_list_total_2016)
show_result(my_list_total_2017)
python beginner excel
I am new to all this and figured, after doing some tutorials, the best way to get some proper experience is to simply start a little project.
I have yearly folders with Excel files in it, those are roughly the same, only last year I made a change to the structure of them.
My aim was to cycle through them all and add one value of them to a list, and then compare those results.
So far, this all works out as I hoped.
I have not figured out yet however, how to define the part that reads the excel files into a callable object ((?)hope that's the correct term).
My main question is, how efficient, or inefficient is this script I wrote?
I assume, one could reduce the length of this code a lot.
import os
import openpyxl
my_list_2015 =
my_list_2016 =
my_list_2017 =
year_show = 2015
def brutto_total(netto):
brutto = netto + netto * 0.2
return '%.2f' % brutto
def show_result(year):
global year_show
print("-" * 14 + str(year_show) + "-" * 14)
print("Total Sum Netto 2015: " + str('%.2f' % year))
print("Total Sum Brutto 2015: " + str(brutto_total(year)))
year_show = year_show + 1
return year_show
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2015.append(c)
my_list_total_2015 = sum(my_list_2015)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Blatt1']
c=sheet['L46'].value
my_list_2016.append(c)
my_list_total_2016 = sum(my_list_2016)
for i in os.listdir(os.chdir('path_to_excel_files')):
if i.endswith(".xlsx"):
workbook = openpyxl.load_workbook(i, data_only=True)
sheet = workbook['Zusammen']
c=sheet['C11'].value
my_list_2017.append(c)
my_list_total_2017 = sum(my_list_2017)
show_result(my_list_total_2015)
show_result(my_list_total_2016)
show_result(my_list_total_2017)
python beginner excel
edited Feb 7 at 3:51
200_success
123k14143401
123k14143401
asked Feb 6 at 20:30
jego
283
283
Is'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?
â Mathias Ettinger
Feb 7 at 10:06
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52
add a comment |Â
Is'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?
â Mathias Ettinger
Feb 7 at 10:06
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52
Is
'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?â Mathias Ettinger
Feb 7 at 10:06
Is
'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?â Mathias Ettinger
Feb 7 at 10:06
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
4
down vote
accepted
You are onto a good start, since you already use some functions already; yes they do look like callable objects, but functions in general are referred as such to disambiguate from that more involved construct.
As you noted, you have a number of duplicate code that can be shortened into a single function that take different arguments. Before doing that, let's take one of the for loops and find out what could be improved first.
The show_results
function should take in a parameter, but we can fix this later
Calling os.chdir
haphazardly just to list some files is a bit overkill. I am guessing you want to allow the singular i
variable be opened without having to use absolute paths, but doing so can be dangerous if this code is executed in a multi-threaded context (say you want to process multiple files at once), as a thread changing the working directory will change them for all, thus potentially breaking code running on other threads.
Fortunately, Python offers a much better way to glob these files from a path, using a module incidentally named glob
. Now the loop is simply
# insert these into the imports
from os.path import join
from glob import glob
for path in glob(join('path_to_excel_files'. '*.xlsx')):
The reason why we use os.path.join
is for cross-platform compatibility. Also try to avoid single letter names if there is a more expressive name available.
Lastly, the sum
call in each loop could be done once after all the values have been gathered rather than every loop as the result would not matter at all.
To simplify code, look for parts that are common, and from that parts that take values that define the specific usage. From the three for loops we can see that it's just simply sheet identifier within the workbook, the cell identifier from the sheet. This is what it might look like (please note that I don't have the library installed, nor your Excel files, so I can't exactly test the following)
def process_workbooks_in_dir(excel_root, sheet_id, cell_id):
values =
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
sheet = workbook[sheet_id]
value = sheet[cell_id].value
values.append(value)
return values
Note that sum
can be called later to sum up the list of values returned, this function now simply process the list of excel files within the provided base path, and the load_workbook
function will be given the absolute path of each excel files for the processing to be done. To use this, the following may be done.
my_list_2015 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2016 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2017 = process_workbooks_in_dir('path_to_excel_files', 'Zusammen', 'C11')
my_list_total_2015 = sum(my_list_2015)
my_list_total_2016 = sum(my_list_2016)
my_list_total_2017 = sum(my_list_2017)
Of course, verify that all this still working as intended. You may note that the 2015 and 2016 lists are using the same parameters - this is now much more visible with the repeated code moved out of the way to show how these parameters can become accidentally (or intentionally) repeated in the remaining code.
While this is basically done, we can still do better with more advanced techniques.
As there is significant overhead with opening and closing files (especially for a complicated file format such as excel spreadsheets), it would be great if this is done only once for all the operations required. Let's explore for way to streamline the flow better so that this can be done. Note that the actual act of opening/closing files can be decoupled from the usage of the workbook, and that the workbook can be passed as a parameter. Let's move that workbook usage logic to a separate function:
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
Now this is a much more simple function, but we still need the other bits such as opening the files, specify how to get the data through this function and to store it.
def process_workbooks_by_spec(excel_root, specification):
results =
# predefine the results with the year lists as required
for year, sheet_id, cell_id in specification:
results[year] =
# for the actual work
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
Note that Python can unpack tuples into multiple variables so we can have a list of tuples put into friendly variable names within the for loop to assign the year
, sheet_id
and cell_id
for usage. Also another thing that can simplify/eliminate the initial assignment loop is to use collections.defaultdict
, which would reduce the code to simply this:
from collections import defaultdict
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
This is now much more efficient in terms of overall execution. All the required operation needed to extract data can be passed in as a single list that might look something like this:
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
Note that the top level loop still provide the path to the files, which then open the workbook, then the inner loop unpacks the specification and pass the ids along with the workbook to the inner function, extract the required data and append that to results dictionary under the respective key (by year).
As this is relatively simple, the inner loop remain as is within that function, however for much more complicated business logic, it is wise to consider moving the inner loop outside to a separate function so that the logic is contained in a more concise way.
Now to use it:
results = process_workbooks_by_spec('path_to_excel_files', spec)
Accessing results['2015']
should now return the list of values extracted, and one can simply pass the results into a function that will process this. Consider modifying show_results
to this:
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + str(year) + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %s" % (year, brutto_total(result)))
Now rather than relying on some global value outside of the function and some hard-coded value defined at the module, this only take in the results dictionary produced by the process_workbooks_by_spec
function and iterate through every item and produce an output. Of course, the sum is done at the time of calculation and if this is undesirable, it can be moved as part of the results as another dictionary, but this is up to you.
I would also not prematurely have some math processing function to return something that is not a number, so brutto_total
should return the number as is, and format it later to keep the summary string consistent. Putting it all together
from os.path import join
from glob import glob
from collections import defaultdict
import openpyxl
def brutto_total(netto):
return netto + netto * 0.2
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + year + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %.2f" % (year, brutto_total(result)))
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
results = process_workbooks_by_spec('path_to_excel_files', spec)
show_results(results)
This is how I might have done this (also reminder: I don't have your files and library installed, but this complete version at least pass the syntax check); hopefully this shows how you might be able to make use of the rich standard library functions that are available in Python to simplify certain tasks, and to also think about how one might break a program down and restructure it into a more clear, reusable form.
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the originalprocess_workbooks_in_dir
, so you can invoke it likeprocess_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and thenprocess_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!
â metatoaster
Feb 9 at 11:42
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
accepted
You are onto a good start, since you already use some functions already; yes they do look like callable objects, but functions in general are referred as such to disambiguate from that more involved construct.
As you noted, you have a number of duplicate code that can be shortened into a single function that take different arguments. Before doing that, let's take one of the for loops and find out what could be improved first.
The show_results
function should take in a parameter, but we can fix this later
Calling os.chdir
haphazardly just to list some files is a bit overkill. I am guessing you want to allow the singular i
variable be opened without having to use absolute paths, but doing so can be dangerous if this code is executed in a multi-threaded context (say you want to process multiple files at once), as a thread changing the working directory will change them for all, thus potentially breaking code running on other threads.
Fortunately, Python offers a much better way to glob these files from a path, using a module incidentally named glob
. Now the loop is simply
# insert these into the imports
from os.path import join
from glob import glob
for path in glob(join('path_to_excel_files'. '*.xlsx')):
The reason why we use os.path.join
is for cross-platform compatibility. Also try to avoid single letter names if there is a more expressive name available.
Lastly, the sum
call in each loop could be done once after all the values have been gathered rather than every loop as the result would not matter at all.
To simplify code, look for parts that are common, and from that parts that take values that define the specific usage. From the three for loops we can see that it's just simply sheet identifier within the workbook, the cell identifier from the sheet. This is what it might look like (please note that I don't have the library installed, nor your Excel files, so I can't exactly test the following)
def process_workbooks_in_dir(excel_root, sheet_id, cell_id):
values =
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
sheet = workbook[sheet_id]
value = sheet[cell_id].value
values.append(value)
return values
Note that sum
can be called later to sum up the list of values returned, this function now simply process the list of excel files within the provided base path, and the load_workbook
function will be given the absolute path of each excel files for the processing to be done. To use this, the following may be done.
my_list_2015 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2016 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2017 = process_workbooks_in_dir('path_to_excel_files', 'Zusammen', 'C11')
my_list_total_2015 = sum(my_list_2015)
my_list_total_2016 = sum(my_list_2016)
my_list_total_2017 = sum(my_list_2017)
Of course, verify that all this still working as intended. You may note that the 2015 and 2016 lists are using the same parameters - this is now much more visible with the repeated code moved out of the way to show how these parameters can become accidentally (or intentionally) repeated in the remaining code.
While this is basically done, we can still do better with more advanced techniques.
As there is significant overhead with opening and closing files (especially for a complicated file format such as excel spreadsheets), it would be great if this is done only once for all the operations required. Let's explore for way to streamline the flow better so that this can be done. Note that the actual act of opening/closing files can be decoupled from the usage of the workbook, and that the workbook can be passed as a parameter. Let's move that workbook usage logic to a separate function:
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
Now this is a much more simple function, but we still need the other bits such as opening the files, specify how to get the data through this function and to store it.
def process_workbooks_by_spec(excel_root, specification):
results =
# predefine the results with the year lists as required
for year, sheet_id, cell_id in specification:
results[year] =
# for the actual work
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
Note that Python can unpack tuples into multiple variables so we can have a list of tuples put into friendly variable names within the for loop to assign the year
, sheet_id
and cell_id
for usage. Also another thing that can simplify/eliminate the initial assignment loop is to use collections.defaultdict
, which would reduce the code to simply this:
from collections import defaultdict
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
This is now much more efficient in terms of overall execution. All the required operation needed to extract data can be passed in as a single list that might look something like this:
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
Note that the top level loop still provide the path to the files, which then open the workbook, then the inner loop unpacks the specification and pass the ids along with the workbook to the inner function, extract the required data and append that to results dictionary under the respective key (by year).
As this is relatively simple, the inner loop remain as is within that function, however for much more complicated business logic, it is wise to consider moving the inner loop outside to a separate function so that the logic is contained in a more concise way.
Now to use it:
results = process_workbooks_by_spec('path_to_excel_files', spec)
Accessing results['2015']
should now return the list of values extracted, and one can simply pass the results into a function that will process this. Consider modifying show_results
to this:
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + str(year) + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %s" % (year, brutto_total(result)))
Now rather than relying on some global value outside of the function and some hard-coded value defined at the module, this only take in the results dictionary produced by the process_workbooks_by_spec
function and iterate through every item and produce an output. Of course, the sum is done at the time of calculation and if this is undesirable, it can be moved as part of the results as another dictionary, but this is up to you.
I would also not prematurely have some math processing function to return something that is not a number, so brutto_total
should return the number as is, and format it later to keep the summary string consistent. Putting it all together
from os.path import join
from glob import glob
from collections import defaultdict
import openpyxl
def brutto_total(netto):
return netto + netto * 0.2
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + year + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %.2f" % (year, brutto_total(result)))
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
results = process_workbooks_by_spec('path_to_excel_files', spec)
show_results(results)
This is how I might have done this (also reminder: I don't have your files and library installed, but this complete version at least pass the syntax check); hopefully this shows how you might be able to make use of the rich standard library functions that are available in Python to simplify certain tasks, and to also think about how one might break a program down and restructure it into a more clear, reusable form.
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the originalprocess_workbooks_in_dir
, so you can invoke it likeprocess_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and thenprocess_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!
â metatoaster
Feb 9 at 11:42
add a comment |Â
up vote
4
down vote
accepted
You are onto a good start, since you already use some functions already; yes they do look like callable objects, but functions in general are referred as such to disambiguate from that more involved construct.
As you noted, you have a number of duplicate code that can be shortened into a single function that take different arguments. Before doing that, let's take one of the for loops and find out what could be improved first.
The show_results
function should take in a parameter, but we can fix this later
Calling os.chdir
haphazardly just to list some files is a bit overkill. I am guessing you want to allow the singular i
variable be opened without having to use absolute paths, but doing so can be dangerous if this code is executed in a multi-threaded context (say you want to process multiple files at once), as a thread changing the working directory will change them for all, thus potentially breaking code running on other threads.
Fortunately, Python offers a much better way to glob these files from a path, using a module incidentally named glob
. Now the loop is simply
# insert these into the imports
from os.path import join
from glob import glob
for path in glob(join('path_to_excel_files'. '*.xlsx')):
The reason why we use os.path.join
is for cross-platform compatibility. Also try to avoid single letter names if there is a more expressive name available.
Lastly, the sum
call in each loop could be done once after all the values have been gathered rather than every loop as the result would not matter at all.
To simplify code, look for parts that are common, and from that parts that take values that define the specific usage. From the three for loops we can see that it's just simply sheet identifier within the workbook, the cell identifier from the sheet. This is what it might look like (please note that I don't have the library installed, nor your Excel files, so I can't exactly test the following)
def process_workbooks_in_dir(excel_root, sheet_id, cell_id):
values =
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
sheet = workbook[sheet_id]
value = sheet[cell_id].value
values.append(value)
return values
Note that sum
can be called later to sum up the list of values returned, this function now simply process the list of excel files within the provided base path, and the load_workbook
function will be given the absolute path of each excel files for the processing to be done. To use this, the following may be done.
my_list_2015 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2016 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2017 = process_workbooks_in_dir('path_to_excel_files', 'Zusammen', 'C11')
my_list_total_2015 = sum(my_list_2015)
my_list_total_2016 = sum(my_list_2016)
my_list_total_2017 = sum(my_list_2017)
Of course, verify that all this still working as intended. You may note that the 2015 and 2016 lists are using the same parameters - this is now much more visible with the repeated code moved out of the way to show how these parameters can become accidentally (or intentionally) repeated in the remaining code.
While this is basically done, we can still do better with more advanced techniques.
As there is significant overhead with opening and closing files (especially for a complicated file format such as excel spreadsheets), it would be great if this is done only once for all the operations required. Let's explore for way to streamline the flow better so that this can be done. Note that the actual act of opening/closing files can be decoupled from the usage of the workbook, and that the workbook can be passed as a parameter. Let's move that workbook usage logic to a separate function:
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
Now this is a much more simple function, but we still need the other bits such as opening the files, specify how to get the data through this function and to store it.
def process_workbooks_by_spec(excel_root, specification):
results =
# predefine the results with the year lists as required
for year, sheet_id, cell_id in specification:
results[year] =
# for the actual work
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
Note that Python can unpack tuples into multiple variables so we can have a list of tuples put into friendly variable names within the for loop to assign the year
, sheet_id
and cell_id
for usage. Also another thing that can simplify/eliminate the initial assignment loop is to use collections.defaultdict
, which would reduce the code to simply this:
from collections import defaultdict
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
This is now much more efficient in terms of overall execution. All the required operation needed to extract data can be passed in as a single list that might look something like this:
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
Note that the top level loop still provide the path to the files, which then open the workbook, then the inner loop unpacks the specification and pass the ids along with the workbook to the inner function, extract the required data and append that to results dictionary under the respective key (by year).
As this is relatively simple, the inner loop remain as is within that function, however for much more complicated business logic, it is wise to consider moving the inner loop outside to a separate function so that the logic is contained in a more concise way.
Now to use it:
results = process_workbooks_by_spec('path_to_excel_files', spec)
Accessing results['2015']
should now return the list of values extracted, and one can simply pass the results into a function that will process this. Consider modifying show_results
to this:
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + str(year) + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %s" % (year, brutto_total(result)))
Now rather than relying on some global value outside of the function and some hard-coded value defined at the module, this only take in the results dictionary produced by the process_workbooks_by_spec
function and iterate through every item and produce an output. Of course, the sum is done at the time of calculation and if this is undesirable, it can be moved as part of the results as another dictionary, but this is up to you.
I would also not prematurely have some math processing function to return something that is not a number, so brutto_total
should return the number as is, and format it later to keep the summary string consistent. Putting it all together
from os.path import join
from glob import glob
from collections import defaultdict
import openpyxl
def brutto_total(netto):
return netto + netto * 0.2
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + year + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %.2f" % (year, brutto_total(result)))
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
results = process_workbooks_by_spec('path_to_excel_files', spec)
show_results(results)
This is how I might have done this (also reminder: I don't have your files and library installed, but this complete version at least pass the syntax check); hopefully this shows how you might be able to make use of the rich standard library functions that are available in Python to simplify certain tasks, and to also think about how one might break a program down and restructure it into a more clear, reusable form.
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the originalprocess_workbooks_in_dir
, so you can invoke it likeprocess_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and thenprocess_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!
â metatoaster
Feb 9 at 11:42
add a comment |Â
up vote
4
down vote
accepted
up vote
4
down vote
accepted
You are onto a good start, since you already use some functions already; yes they do look like callable objects, but functions in general are referred as such to disambiguate from that more involved construct.
As you noted, you have a number of duplicate code that can be shortened into a single function that take different arguments. Before doing that, let's take one of the for loops and find out what could be improved first.
The show_results
function should take in a parameter, but we can fix this later
Calling os.chdir
haphazardly just to list some files is a bit overkill. I am guessing you want to allow the singular i
variable be opened without having to use absolute paths, but doing so can be dangerous if this code is executed in a multi-threaded context (say you want to process multiple files at once), as a thread changing the working directory will change them for all, thus potentially breaking code running on other threads.
Fortunately, Python offers a much better way to glob these files from a path, using a module incidentally named glob
. Now the loop is simply
# insert these into the imports
from os.path import join
from glob import glob
for path in glob(join('path_to_excel_files'. '*.xlsx')):
The reason why we use os.path.join
is for cross-platform compatibility. Also try to avoid single letter names if there is a more expressive name available.
Lastly, the sum
call in each loop could be done once after all the values have been gathered rather than every loop as the result would not matter at all.
To simplify code, look for parts that are common, and from that parts that take values that define the specific usage. From the three for loops we can see that it's just simply sheet identifier within the workbook, the cell identifier from the sheet. This is what it might look like (please note that I don't have the library installed, nor your Excel files, so I can't exactly test the following)
def process_workbooks_in_dir(excel_root, sheet_id, cell_id):
values =
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
sheet = workbook[sheet_id]
value = sheet[cell_id].value
values.append(value)
return values
Note that sum
can be called later to sum up the list of values returned, this function now simply process the list of excel files within the provided base path, and the load_workbook
function will be given the absolute path of each excel files for the processing to be done. To use this, the following may be done.
my_list_2015 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2016 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2017 = process_workbooks_in_dir('path_to_excel_files', 'Zusammen', 'C11')
my_list_total_2015 = sum(my_list_2015)
my_list_total_2016 = sum(my_list_2016)
my_list_total_2017 = sum(my_list_2017)
Of course, verify that all this still working as intended. You may note that the 2015 and 2016 lists are using the same parameters - this is now much more visible with the repeated code moved out of the way to show how these parameters can become accidentally (or intentionally) repeated in the remaining code.
While this is basically done, we can still do better with more advanced techniques.
As there is significant overhead with opening and closing files (especially for a complicated file format such as excel spreadsheets), it would be great if this is done only once for all the operations required. Let's explore for way to streamline the flow better so that this can be done. Note that the actual act of opening/closing files can be decoupled from the usage of the workbook, and that the workbook can be passed as a parameter. Let's move that workbook usage logic to a separate function:
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
Now this is a much more simple function, but we still need the other bits such as opening the files, specify how to get the data through this function and to store it.
def process_workbooks_by_spec(excel_root, specification):
results =
# predefine the results with the year lists as required
for year, sheet_id, cell_id in specification:
results[year] =
# for the actual work
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
Note that Python can unpack tuples into multiple variables so we can have a list of tuples put into friendly variable names within the for loop to assign the year
, sheet_id
and cell_id
for usage. Also another thing that can simplify/eliminate the initial assignment loop is to use collections.defaultdict
, which would reduce the code to simply this:
from collections import defaultdict
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
This is now much more efficient in terms of overall execution. All the required operation needed to extract data can be passed in as a single list that might look something like this:
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
Note that the top level loop still provide the path to the files, which then open the workbook, then the inner loop unpacks the specification and pass the ids along with the workbook to the inner function, extract the required data and append that to results dictionary under the respective key (by year).
As this is relatively simple, the inner loop remain as is within that function, however for much more complicated business logic, it is wise to consider moving the inner loop outside to a separate function so that the logic is contained in a more concise way.
Now to use it:
results = process_workbooks_by_spec('path_to_excel_files', spec)
Accessing results['2015']
should now return the list of values extracted, and one can simply pass the results into a function that will process this. Consider modifying show_results
to this:
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + str(year) + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %s" % (year, brutto_total(result)))
Now rather than relying on some global value outside of the function and some hard-coded value defined at the module, this only take in the results dictionary produced by the process_workbooks_by_spec
function and iterate through every item and produce an output. Of course, the sum is done at the time of calculation and if this is undesirable, it can be moved as part of the results as another dictionary, but this is up to you.
I would also not prematurely have some math processing function to return something that is not a number, so brutto_total
should return the number as is, and format it later to keep the summary string consistent. Putting it all together
from os.path import join
from glob import glob
from collections import defaultdict
import openpyxl
def brutto_total(netto):
return netto + netto * 0.2
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + year + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %.2f" % (year, brutto_total(result)))
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
results = process_workbooks_by_spec('path_to_excel_files', spec)
show_results(results)
This is how I might have done this (also reminder: I don't have your files and library installed, but this complete version at least pass the syntax check); hopefully this shows how you might be able to make use of the rich standard library functions that are available in Python to simplify certain tasks, and to also think about how one might break a program down and restructure it into a more clear, reusable form.
You are onto a good start, since you already use some functions already; yes they do look like callable objects, but functions in general are referred as such to disambiguate from that more involved construct.
As you noted, you have a number of duplicate code that can be shortened into a single function that take different arguments. Before doing that, let's take one of the for loops and find out what could be improved first.
The show_results
function should take in a parameter, but we can fix this later
Calling os.chdir
haphazardly just to list some files is a bit overkill. I am guessing you want to allow the singular i
variable be opened without having to use absolute paths, but doing so can be dangerous if this code is executed in a multi-threaded context (say you want to process multiple files at once), as a thread changing the working directory will change them for all, thus potentially breaking code running on other threads.
Fortunately, Python offers a much better way to glob these files from a path, using a module incidentally named glob
. Now the loop is simply
# insert these into the imports
from os.path import join
from glob import glob
for path in glob(join('path_to_excel_files'. '*.xlsx')):
The reason why we use os.path.join
is for cross-platform compatibility. Also try to avoid single letter names if there is a more expressive name available.
Lastly, the sum
call in each loop could be done once after all the values have been gathered rather than every loop as the result would not matter at all.
To simplify code, look for parts that are common, and from that parts that take values that define the specific usage. From the three for loops we can see that it's just simply sheet identifier within the workbook, the cell identifier from the sheet. This is what it might look like (please note that I don't have the library installed, nor your Excel files, so I can't exactly test the following)
def process_workbooks_in_dir(excel_root, sheet_id, cell_id):
values =
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
sheet = workbook[sheet_id]
value = sheet[cell_id].value
values.append(value)
return values
Note that sum
can be called later to sum up the list of values returned, this function now simply process the list of excel files within the provided base path, and the load_workbook
function will be given the absolute path of each excel files for the processing to be done. To use this, the following may be done.
my_list_2015 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2016 = process_workbooks_in_dir('path_to_excel_files', 'Blatt1', 'L46')
my_list_2017 = process_workbooks_in_dir('path_to_excel_files', 'Zusammen', 'C11')
my_list_total_2015 = sum(my_list_2015)
my_list_total_2016 = sum(my_list_2016)
my_list_total_2017 = sum(my_list_2017)
Of course, verify that all this still working as intended. You may note that the 2015 and 2016 lists are using the same parameters - this is now much more visible with the repeated code moved out of the way to show how these parameters can become accidentally (or intentionally) repeated in the remaining code.
While this is basically done, we can still do better with more advanced techniques.
As there is significant overhead with opening and closing files (especially for a complicated file format such as excel spreadsheets), it would be great if this is done only once for all the operations required. Let's explore for way to streamline the flow better so that this can be done. Note that the actual act of opening/closing files can be decoupled from the usage of the workbook, and that the workbook can be passed as a parameter. Let's move that workbook usage logic to a separate function:
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
Now this is a much more simple function, but we still need the other bits such as opening the files, specify how to get the data through this function and to store it.
def process_workbooks_by_spec(excel_root, specification):
results =
# predefine the results with the year lists as required
for year, sheet_id, cell_id in specification:
results[year] =
# for the actual work
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
Note that Python can unpack tuples into multiple variables so we can have a list of tuples put into friendly variable names within the for loop to assign the year
, sheet_id
and cell_id
for usage. Also another thing that can simplify/eliminate the initial assignment loop is to use collections.defaultdict
, which would reduce the code to simply this:
from collections import defaultdict
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
This is now much more efficient in terms of overall execution. All the required operation needed to extract data can be passed in as a single list that might look something like this:
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
Note that the top level loop still provide the path to the files, which then open the workbook, then the inner loop unpacks the specification and pass the ids along with the workbook to the inner function, extract the required data and append that to results dictionary under the respective key (by year).
As this is relatively simple, the inner loop remain as is within that function, however for much more complicated business logic, it is wise to consider moving the inner loop outside to a separate function so that the logic is contained in a more concise way.
Now to use it:
results = process_workbooks_by_spec('path_to_excel_files', spec)
Accessing results['2015']
should now return the list of values extracted, and one can simply pass the results into a function that will process this. Consider modifying show_results
to this:
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + str(year) + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %s" % (year, brutto_total(result)))
Now rather than relying on some global value outside of the function and some hard-coded value defined at the module, this only take in the results dictionary produced by the process_workbooks_by_spec
function and iterate through every item and produce an output. Of course, the sum is done at the time of calculation and if this is undesirable, it can be moved as part of the results as another dictionary, but this is up to you.
I would also not prematurely have some math processing function to return something that is not a number, so brutto_total
should return the number as is, and format it later to keep the summary string consistent. Putting it all together
from os.path import join
from glob import glob
from collections import defaultdict
import openpyxl
def brutto_total(netto):
return netto + netto * 0.2
def read_value_from_workbook(workbook, sheet_id, cell_id):
sheet = workbook[sheet_id]
return sheet[cell_id].value
def process_workbooks_by_spec(excel_root, specification):
results = defaultdict(list)
for path in glob(join(excel_root, '*.xlsx')):
workbook = openpyxl.load_workbook(path, data_only=True)
for year, sheet_id, cell_id in specification:
value = read_value_from_workbook(workbook, sheet_id, cell_id)
results[year].append(value)
return results
def show_results(results):
for year, values in sorted(results.items()):
result = sum(values)
print("-" * 14 + year + "-" * 14)
print("Total Sum Netto %s: %.2f" % (year, result))
print("Total Sum Brutto %s: %.2f" % (year, brutto_total(result)))
spec = [
('2015', 'Blatt1', 'L46'),
('2016', 'Blatt1', 'L46'),
('2017', 'Zusammen', 'C11'),
]
results = process_workbooks_by_spec('path_to_excel_files', spec)
show_results(results)
This is how I might have done this (also reminder: I don't have your files and library installed, but this complete version at least pass the syntax check); hopefully this shows how you might be able to make use of the rich standard library functions that are available in Python to simplify certain tasks, and to also think about how one might break a program down and restructure it into a more clear, reusable form.
edited Feb 7 at 9:11
answered Feb 7 at 3:41
metatoaster
23614
23614
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the originalprocess_workbooks_in_dir
, so you can invoke it likeprocess_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and thenprocess_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!
â metatoaster
Feb 9 at 11:42
add a comment |Â
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the originalprocess_workbooks_in_dir
, so you can invoke it likeprocess_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and thenprocess_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!
â metatoaster
Feb 9 at 11:42
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
This is all very great and thanks for the excellent explanation on how you would go and do this. The final result of what you would do is still too advanced for me thought. As for me, it does not compile correctly because I have the Excel files in separate folders, according to year. When looking at your code, it appears that it is assuming that the files are within the same folder. I tried to work out how I would make it access the individual folders, however I can not get my head around what to even do to get started. :) I will keep this in mind though and will come back to this.
â jego
Feb 9 at 10:31
Ah, you can pass in the specific directories by the first argument of the original
process_workbooks_in_dir
, so you can invoke it like process_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and then process_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!â metatoaster
Feb 9 at 11:42
Ah, you can pass in the specific directories by the first argument of the original
process_workbooks_in_dir
, so you can invoke it like process_workbooks_in_dir('2016_excel_files', 'Blatt1', 'L46')
for 2016, and then process_workbooks_in_dir('2017_excel_files', 'Zusammen', 'C11')
for 2017, and so on. The advanced example did assume the files are all in one directory. Assumptions can increase the difficulty in understanding for both of us, and this can be hard to avoid with limited information. You were on a good track and I hope I didn't cause more confusion for you!â metatoaster
Feb 9 at 11:42
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f186945%2fappend-an-annual-total-to-all-excel-files%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Is
'path_to_excel_files'
really the same for each year or does it actually change per year (as you have yearly folders with Excel files in it)?â Mathias Ettinger
Feb 7 at 10:06
You are correct to assume that there are yearly folders. I already thought about implementing a automation to check what folders are available and store those in variables and work with those instead.
â jego
Feb 8 at 14:52