Python program to transform an Excel document, with some parsing

 Clash Royale CLAN TAG#URR8PPP
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
5
down vote
favorite
I wrote this to parse through an Excel doc, extract desired columns, parse certain cells and output another formatted Excel doc.
This is the original Excel doc I have condensed most of the columns for space

Here are the the results of the program that I am mostly happy with, it parses the name column and writes a unit identifier(i.e 83) along with a tower identifier(i.e 5) on a new line along with a completed date if the completed column is TRUE. I think the only thing I would change would be getting rid of the blank line under the row that has H 7

And here is the code
import xlrd
import xlwt
import re
# open and create workbook
workbookread = xlrd.open_workbook('seedtestexcelbytask.xls')
workbookwrite = xlwt.Workbook('output.xlsx')
# open and create sheets
worksheetread = workbookread.sheet_by_index(0)
worksheetwrite = workbookwrite.add_sheet('Sheet1')
timelist = 
towerlist = 
i = 0
l = 0
t = 0
# iterate through rows in worksheet, get columns completed, completed_at and name.
# add to timelist and towerlist if completed == true
for rows in range(worksheetread.nrows):
 cell0 = worksheetread.cell(rows, 2).value
 cell1 = worksheetread.cell(rows, 3).value
 cell2 = worksheetread.cell(rows, 15).value
 if cell0 == 1:
 parsedvalue = re.split(', |/', cell2)
 timelist.append(cell1)
 towerlist.append(parsedvalue)
b = 2
g = 1
for rows in range(len(timelist)):
 row = worksheetwrite.row(i)
# get current index from timelist and towerlist and assign to currenttime and currentvalue
 currenttime = timelist[t]
 currentname = towerlist[t]
# get the length of the current index this will help us decide how to parse
 length = len(towerlist[t])
# if length <= 2 the index is not formatted properly so we only need index 1 which is the cafe and towers together
 if length <= 2:
 row.write(0, currenttime)
 row.write(1, currentname[1])
 i += 1
 l += 1
# if length = 4 index is formatted properly and we just grab the columns we need
 elif length == 4:
 a = 2
 for values in range(0, 2):
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentname[1])
 row.write(2, currentname[a])
 a += 1
 i += 1
 l += 1
# if length > 4 we need to iterate through each index and write them
 else:
 # define the length of the current cell of the excel sheet we have
 lengthd = (len(currentname)+1)/2
 length = int(lengthd)
 for values in range(length):
 # set variables that we will need to hold on to
 currentcafe = currentname[g]
 currenttower = currentname[b]
 # if the value of the current tower is 2,4,5,6,7 we need to write the current cafe and current tower
 # then move on to the next value and set that as the current cafe, we also need to jump to the
 # value after the current cafe and set that as the current tower
 if currenttower == "2" or currenttower == "4" or currenttower == "5" or currenttower == "6" or currenttower == "7":
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 g = b
 b = g + 1
 # otherwise we can just write the value fo the current cafe and current tower, advance to the next value
 # which is another tower and set current tower to that
 else:
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 l += 1
 i += 1
 t += 1
workbookwrite.save('output.xls')
This is the first time I have written anything with Python and I am fairly new to programming, any comments would be greatly appreciated.
python beginner parsing excel
add a comment |Â
up vote
5
down vote
favorite
I wrote this to parse through an Excel doc, extract desired columns, parse certain cells and output another formatted Excel doc.
This is the original Excel doc I have condensed most of the columns for space

Here are the the results of the program that I am mostly happy with, it parses the name column and writes a unit identifier(i.e 83) along with a tower identifier(i.e 5) on a new line along with a completed date if the completed column is TRUE. I think the only thing I would change would be getting rid of the blank line under the row that has H 7

And here is the code
import xlrd
import xlwt
import re
# open and create workbook
workbookread = xlrd.open_workbook('seedtestexcelbytask.xls')
workbookwrite = xlwt.Workbook('output.xlsx')
# open and create sheets
worksheetread = workbookread.sheet_by_index(0)
worksheetwrite = workbookwrite.add_sheet('Sheet1')
timelist = 
towerlist = 
i = 0
l = 0
t = 0
# iterate through rows in worksheet, get columns completed, completed_at and name.
# add to timelist and towerlist if completed == true
for rows in range(worksheetread.nrows):
 cell0 = worksheetread.cell(rows, 2).value
 cell1 = worksheetread.cell(rows, 3).value
 cell2 = worksheetread.cell(rows, 15).value
 if cell0 == 1:
 parsedvalue = re.split(', |/', cell2)
 timelist.append(cell1)
 towerlist.append(parsedvalue)
b = 2
g = 1
for rows in range(len(timelist)):
 row = worksheetwrite.row(i)
# get current index from timelist and towerlist and assign to currenttime and currentvalue
 currenttime = timelist[t]
 currentname = towerlist[t]
# get the length of the current index this will help us decide how to parse
 length = len(towerlist[t])
# if length <= 2 the index is not formatted properly so we only need index 1 which is the cafe and towers together
 if length <= 2:
 row.write(0, currenttime)
 row.write(1, currentname[1])
 i += 1
 l += 1
# if length = 4 index is formatted properly and we just grab the columns we need
 elif length == 4:
 a = 2
 for values in range(0, 2):
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentname[1])
 row.write(2, currentname[a])
 a += 1
 i += 1
 l += 1
# if length > 4 we need to iterate through each index and write them
 else:
 # define the length of the current cell of the excel sheet we have
 lengthd = (len(currentname)+1)/2
 length = int(lengthd)
 for values in range(length):
 # set variables that we will need to hold on to
 currentcafe = currentname[g]
 currenttower = currentname[b]
 # if the value of the current tower is 2,4,5,6,7 we need to write the current cafe and current tower
 # then move on to the next value and set that as the current cafe, we also need to jump to the
 # value after the current cafe and set that as the current tower
 if currenttower == "2" or currenttower == "4" or currenttower == "5" or currenttower == "6" or currenttower == "7":
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 g = b
 b = g + 1
 # otherwise we can just write the value fo the current cafe and current tower, advance to the next value
 # which is another tower and set current tower to that
 else:
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 l += 1
 i += 1
 t += 1
workbookwrite.save('output.xls')
This is the first time I have written anything with Python and I am fairly new to programming, any comments would be greatly appreciated.
python beginner parsing excel
 
 
 2
 
 
 
 
 Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
 â 200_success
 Jun 7 at 20:55
 
 
 
 
 
 
 
 
 
 I updated the the pic to better show relevant columns
 â azryel6500
 Jun 8 at 14:27
 
 
 
 
 
 
 
 
 
 
 check out- pandas
 â Maarten Fabré
 Jun 8 at 14:30
 
 
 
add a comment |Â
up vote
5
down vote
favorite
up vote
5
down vote
favorite
I wrote this to parse through an Excel doc, extract desired columns, parse certain cells and output another formatted Excel doc.
This is the original Excel doc I have condensed most of the columns for space

Here are the the results of the program that I am mostly happy with, it parses the name column and writes a unit identifier(i.e 83) along with a tower identifier(i.e 5) on a new line along with a completed date if the completed column is TRUE. I think the only thing I would change would be getting rid of the blank line under the row that has H 7

And here is the code
import xlrd
import xlwt
import re
# open and create workbook
workbookread = xlrd.open_workbook('seedtestexcelbytask.xls')
workbookwrite = xlwt.Workbook('output.xlsx')
# open and create sheets
worksheetread = workbookread.sheet_by_index(0)
worksheetwrite = workbookwrite.add_sheet('Sheet1')
timelist = 
towerlist = 
i = 0
l = 0
t = 0
# iterate through rows in worksheet, get columns completed, completed_at and name.
# add to timelist and towerlist if completed == true
for rows in range(worksheetread.nrows):
 cell0 = worksheetread.cell(rows, 2).value
 cell1 = worksheetread.cell(rows, 3).value
 cell2 = worksheetread.cell(rows, 15).value
 if cell0 == 1:
 parsedvalue = re.split(', |/', cell2)
 timelist.append(cell1)
 towerlist.append(parsedvalue)
b = 2
g = 1
for rows in range(len(timelist)):
 row = worksheetwrite.row(i)
# get current index from timelist and towerlist and assign to currenttime and currentvalue
 currenttime = timelist[t]
 currentname = towerlist[t]
# get the length of the current index this will help us decide how to parse
 length = len(towerlist[t])
# if length <= 2 the index is not formatted properly so we only need index 1 which is the cafe and towers together
 if length <= 2:
 row.write(0, currenttime)
 row.write(1, currentname[1])
 i += 1
 l += 1
# if length = 4 index is formatted properly and we just grab the columns we need
 elif length == 4:
 a = 2
 for values in range(0, 2):
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentname[1])
 row.write(2, currentname[a])
 a += 1
 i += 1
 l += 1
# if length > 4 we need to iterate through each index and write them
 else:
 # define the length of the current cell of the excel sheet we have
 lengthd = (len(currentname)+1)/2
 length = int(lengthd)
 for values in range(length):
 # set variables that we will need to hold on to
 currentcafe = currentname[g]
 currenttower = currentname[b]
 # if the value of the current tower is 2,4,5,6,7 we need to write the current cafe and current tower
 # then move on to the next value and set that as the current cafe, we also need to jump to the
 # value after the current cafe and set that as the current tower
 if currenttower == "2" or currenttower == "4" or currenttower == "5" or currenttower == "6" or currenttower == "7":
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 g = b
 b = g + 1
 # otherwise we can just write the value fo the current cafe and current tower, advance to the next value
 # which is another tower and set current tower to that
 else:
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 l += 1
 i += 1
 t += 1
workbookwrite.save('output.xls')
This is the first time I have written anything with Python and I am fairly new to programming, any comments would be greatly appreciated.
python beginner parsing excel
I wrote this to parse through an Excel doc, extract desired columns, parse certain cells and output another formatted Excel doc.
This is the original Excel doc I have condensed most of the columns for space

Here are the the results of the program that I am mostly happy with, it parses the name column and writes a unit identifier(i.e 83) along with a tower identifier(i.e 5) on a new line along with a completed date if the completed column is TRUE. I think the only thing I would change would be getting rid of the blank line under the row that has H 7

And here is the code
import xlrd
import xlwt
import re
# open and create workbook
workbookread = xlrd.open_workbook('seedtestexcelbytask.xls')
workbookwrite = xlwt.Workbook('output.xlsx')
# open and create sheets
worksheetread = workbookread.sheet_by_index(0)
worksheetwrite = workbookwrite.add_sheet('Sheet1')
timelist = 
towerlist = 
i = 0
l = 0
t = 0
# iterate through rows in worksheet, get columns completed, completed_at and name.
# add to timelist and towerlist if completed == true
for rows in range(worksheetread.nrows):
 cell0 = worksheetread.cell(rows, 2).value
 cell1 = worksheetread.cell(rows, 3).value
 cell2 = worksheetread.cell(rows, 15).value
 if cell0 == 1:
 parsedvalue = re.split(', |/', cell2)
 timelist.append(cell1)
 towerlist.append(parsedvalue)
b = 2
g = 1
for rows in range(len(timelist)):
 row = worksheetwrite.row(i)
# get current index from timelist and towerlist and assign to currenttime and currentvalue
 currenttime = timelist[t]
 currentname = towerlist[t]
# get the length of the current index this will help us decide how to parse
 length = len(towerlist[t])
# if length <= 2 the index is not formatted properly so we only need index 1 which is the cafe and towers together
 if length <= 2:
 row.write(0, currenttime)
 row.write(1, currentname[1])
 i += 1
 l += 1
# if length = 4 index is formatted properly and we just grab the columns we need
 elif length == 4:
 a = 2
 for values in range(0, 2):
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentname[1])
 row.write(2, currentname[a])
 a += 1
 i += 1
 l += 1
# if length > 4 we need to iterate through each index and write them
 else:
 # define the length of the current cell of the excel sheet we have
 lengthd = (len(currentname)+1)/2
 length = int(lengthd)
 for values in range(length):
 # set variables that we will need to hold on to
 currentcafe = currentname[g]
 currenttower = currentname[b]
 # if the value of the current tower is 2,4,5,6,7 we need to write the current cafe and current tower
 # then move on to the next value and set that as the current cafe, we also need to jump to the
 # value after the current cafe and set that as the current tower
 if currenttower == "2" or currenttower == "4" or currenttower == "5" or currenttower == "6" or currenttower == "7":
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 g = b
 b = g + 1
 # otherwise we can just write the value fo the current cafe and current tower, advance to the next value
 # which is another tower and set current tower to that
 else:
 row = worksheetwrite.row(l)
 row.write(0, currenttime)
 row.write(1, currentcafe)
 row.write(2, currenttower)
 b += 1
 i += 1
 l += 1
 i += 1
 t += 1
workbookwrite.save('output.xls')
This is the first time I have written anything with Python and I am fairly new to programming, any comments would be greatly appreciated.
python beginner parsing excel
edited Jun 8 at 14:26
asked Jun 7 at 20:13
azryel6500
263
263
 
 
 2
 
 
 
 
 Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
 â 200_success
 Jun 7 at 20:55
 
 
 
 
 
 
 
 
 
 I updated the the pic to better show relevant columns
 â azryel6500
 Jun 8 at 14:27
 
 
 
 
 
 
 
 
 
 
 check out- pandas
 â Maarten Fabré
 Jun 8 at 14:30
 
 
 
add a comment |Â
 
 
 2
 
 
 
 
 Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
 â 200_success
 Jun 7 at 20:55
 
 
 
 
 
 
 
 
 
 I updated the the pic to better show relevant columns
 â azryel6500
 Jun 8 at 14:27
 
 
 
 
 
 
 
 
 
 
 check out- pandas
 â Maarten Fabré
 Jun 8 at 14:30
 
 
 
2
2
Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
â 200_success
Jun 7 at 20:55
Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
â 200_success
Jun 7 at 20:55
I updated the the pic to better show relevant columns
â azryel6500
Jun 8 at 14:27
I updated the the pic to better show relevant columns
â azryel6500
Jun 8 at 14:27
check out
pandasâ Maarten Fabré
Jun 8 at 14:30
check out
pandasâ Maarten Fabré
Jun 8 at 14:30
add a comment |Â
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f196066%2fpython-program-to-transform-an-excel-document-with-some-parsing%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
2
Could you expand the column width to fully show the "notes" column? Also, could you summarize what this transformation is for?
â 200_success
Jun 7 at 20:55
I updated the the pic to better show relevant columns
â azryel6500
Jun 8 at 14:27
check out
pandasâ Maarten Fabré
Jun 8 at 14:30