Updating data file records

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












I learnt to code a long, long time ago.



Concerns



Existing approach is messy and will be hard to maintain. System errors are not trapped and the error messages could be better implemented. Final hashcode calculation occurs after the data file is written.



I have been through a few python tutorials some time ago but it is hard without a real-world problem. I'd like to use this as an OOP learning opportunity. This code actually does the job and is well on the way to finalising a file validation feature yet to be implemented.



Everything is up for review here.



Situation



Data files are created by a road model and this program rapidly creates scenarios for option testing. It involves overwriting two 'barrier' fields and and one of two overtaking lane fields in the road (ROD) file. I don't understand or need to care about the many other fields but they must not be changed as some involve complex calculations. A hashcode is being introduced to prevent inadvertent ROD file changes but they must remain human readable. Two other files form part of the data set but the changes are trivial - renaming both files and replacing the first line of one with the filename.



So, a database file properties are a filename, with the filename and description on the first two lines then eight other lines before the 109 character records start. The first six characters in each record are the distance (starting at some arbitrary value near 0.0) incremented by 0.1 in each record. Whole numbers have no decimal or trailing zero. There are two other files in the dataset with different extensions.



The program provides a service of creating a new option from an existing road file set. (New option could be updating existing dataset.) The file and command line are checked and the road file data structure analysed. The hashcode register provides a validation and updating service.



#!/usr/bin/python
# With existing road file create new overtaking lane option file(s).
#
# Copy file, insert overtaking lane and barrier line codes for specified
# chainage, replace first line with new filename, second line with description
# and update MD5 HashCode (future).
#
# eg python OptionEditor.py Demo.ROD 1.5 2.2 P Option.ROD Destination_Description
#
# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


import sys
import os
import shutil
import tempfile
import hashlib

def main():

def copyFile(src, dest):
# From https://www.pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
try:
shutil.copy(src, dest)
# eg. src and dest are the same file
except shutil.Error as e:
print('Error: %s' % e)
# eg. source or destination doesn't exist
except IOError as e:
print('Error: %s' % e.strerror)
return

def HashCode_register(HASHFILE, src, action):
# with open (hashfile) as hf:
# hash, fn = (open(HASHFILE).read().split('n').split(' *')
hashcode = hashlib.md5(open(src, 'rb').read()).hexdigest()
print hashcode+ " *" + src # Debug ****
lines = open(HASHFILE).read().split('n')
for item in lines:
if action == 'validate':
if hashcode in item: # Maybe uppercase compare
return True
elif action == "update":
if src in item: # Maybe uppercase compare
print lines # Debug - no-op
# if item_number_hashcode != hashcode: # FIX THIS ***
# item_hashcode(update_number) = hashcode # need write access here FIX THIS ***
# else:
# write_append_HASFILE(hashcode + " *" + src) # need write access here FIX THIS ***
# return True
return False


def replace_textfile_line(text_filename, new_text, description):
# Replace first line with new_text.
#
# Adapted from ReplFHdr.py
# By Neale Irons version 14/02/2018 (CC BY-SA 4.0)
# import sys, os, tempfile
with open(text_filename) as src:
with tempfile.NamedTemporaryFile('w',
dir=os.path.dirname(text_filename), delete=False) as dst:
line = src.readline() # Read first line
dst.write(new_text + 'n') # Replace first line
if description: # If description not empty
line = src.readline() # Read second line
dst.write(description + 'n') # Replace second line
shutil.copyfileobj(src, dst) # Copy rest of file
os.unlink(text_filename) # remove old version
os.rename(dst.name, text_filename) # rename new version
return()


#Constants
ODO_STEP_KM = 0.1
RECORD_SIZE = 109
BARRIER_COLUMN1 = 11
BARRIER_COLUMN2 = 16
BARRIER_VALUE = '+' # Debug - use '+' instead of '-'
OTLANE_COLUMN1 = 22
OTLANE_COLUMN2 = 27
OTLANE_VALUE = 'Y' # Debug - use 'Y' instead of 'T'
OTLANE_DIRECTION_CODES = "PC"
HASHFILE = "Hashcode.MD5"
True = not False

# Validate command line
if len(sys.argv) < 6 or len(sys.argv) > 7:
format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')
print(format_string.format(sys.argv[0]))
print(' Use existing road files to create new overtaking lane option and optionally insert second line description.)')
sys.exit()

src = sys.argv[1]
otl_start_km = float(sys.argv[2])
otl_end_km = float(sys.argv[3])
otl_direction = sys.argv[4]
dest = sys.argv[5]
if len(sys.argv) < 6:
description = sys.argv[6]
else:
description = ''

if otl_start_km > otl_end_km:
print("Invalid start/end overtaking lane chainage - start must be less than end. Exiting...")
sys.exit()

if otl_direction not in OTLANE_DIRECTION_CODES:
print("Direction is invalid, must be P or C. Exiting..."
.format(sys.argv[4]))
sys.exit()


# Validate file
if not os.path.isfile(src):
print("File path does not exist. Exiting..."
.format(src))
sys.exit()

if src <> dest:
# Validate support files
src_name, src_ext = os.path.splitext(src)
if not os.path.isfile(src_name+'.MLT'):
print("Requires support file . Exiting..."
.format(src_name+'.MLT'))
sys.exit()
if not os.path.isfile(src_name+'.OBS'):
print("Requires support file . Exiting..."
.format(src_name+'.OBS'))
sys.exit()

# Validate HashCode
if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")

# Read header
with open(src) as fp:
# *** if fails raise "Invalid file format - first 10 lines must be file header
for x in range(1, 10):
line = fp.readline()
header_length=fp.tell() # Calculate header length
fp.seek(0,2)
file_length = fp.tell()
fp.close()

if (file_length - header_length) % RECORD_SIZE != 0:
print("Invalid file format - must be 10 lines and character records. Exiting..."
.format((file_length - header_length) % RECORD_SIZE))
sys.exit()

nrecs = (file_length - header_length) // RECORD_SIZE
if nrecs < 1:
print("Invalid file format - file must contain at least one record. Exiting...")
sys.exit()

# dest_tmp = NamedTemporaryFile(delete=False)
dest_tmp = src + ".tmp"
# print("Temp file: ".format(dest_tmp)) # Debug
copyFile(src, dest_tmp)


# Open file for random read/write
with open(dest_tmp, 'r+b') as fp:
fp.seek(header_length,0)
odo_start_km = float((fp.read(6)))
fp.seek(RECORD_SIZE * -1, 2)
odo_end_km = float((fp.read(6)))

if otl_start_km <= odo_start_km:
print("Invalid start overtaking lane chainage - Overtaking lane can't start before first road chainage. Exiting...")
sys.exit()

if otl_end_km >= odo_end_km:
print("Invalid end overtaking lane chainage - Overtaking lane can't ends after last road chainage. Exiting...")
sys.exit()

if int(round((odo_end_km - odo_start_km)/ODO_STEP_KM) + 1) != nrecs:
print("Invalid file format - incorrect number of lines between start and end distances. Exiting...")
sys.exit()

while otl_start_km < otl_end_km:
fpos=int(round(header_length + ((otl_start_km - odo_start_km) / ODO_STEP_KM * RECORD_SIZE) - 1))

fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)
fp.seek(BARRIER_COLUMN2 + fpos)
fp.write(BARRIER_VALUE)

if otl_direction == "P":
fp.seek(OTLANE_COLUMN1 + fpos)
elif otl_direction == "C":
fp.seek(OTLANE_COLUMN2 + fpos)
fp.write(OTLANE_VALUE)
otl_start_km += ODO_STEP_KM

fp.close()

if dest <> src :
# Create new supporting files
dest_name, dest_ext = os.path.splitext(dest)
copyFile(src_name+'.MLT', dest_name+'.MLT')
replace_textfile_line(src_name+'.MLT', src_name+'.MLT', '')
copyFile(src_name+'.OBS', dest_name+'.OBS')

replace_textfile_line(dest_tmp, dest, description)
HashCode_register(HASHFILE, src, "update") # Not working yet ****

from shutil import move
move(dest_tmp, dest)

if __name__ == '__main__':
main()


Data record snippet (not much happening in this sample):



 0.8 -1 -1 F F 9 9
0.9 -1 -1 F F 10 201
1 -1 -1 F F 11 202
1.1 -1 -1 F F 12 12
1.2 1 -1 F F 13 13
1.3 1 -1 F F 14 14
1.4 1 -1 F F 15 15
1.5 -1 1 F F 16 16
1.6 -1 1 F F 17 17
1.7 -1 1 F F 18 18
1.8 -1 -1 F F 19 19
1.9 -1 -1 F F 20 20
2 -1 -1 F F 21 21
2.1 -1 -1 F F 22 22
2.2 -1 -1 F F 23 23
2.3 -1 -1 F F 24 24


Hashcode.MD5:



8d443f2e93a3f0b67f442e4f1d5a4d6d *md5.exe
eb574b236133e60c989c6f472f07827b *md5sum.exe
c1f62b08d050f2a30790c5b2f3d29f5c *Demo.ROD
9f036e40755daaa5a5fc141d3cbfbb99 *nGetPid.exe






share|improve this question





















  • Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
    – Gerrit0
    Feb 25 at 22:17










  • Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
    – nealei
    Feb 26 at 1:37

















up vote
4
down vote

favorite












I learnt to code a long, long time ago.



Concerns



Existing approach is messy and will be hard to maintain. System errors are not trapped and the error messages could be better implemented. Final hashcode calculation occurs after the data file is written.



I have been through a few python tutorials some time ago but it is hard without a real-world problem. I'd like to use this as an OOP learning opportunity. This code actually does the job and is well on the way to finalising a file validation feature yet to be implemented.



Everything is up for review here.



Situation



Data files are created by a road model and this program rapidly creates scenarios for option testing. It involves overwriting two 'barrier' fields and and one of two overtaking lane fields in the road (ROD) file. I don't understand or need to care about the many other fields but they must not be changed as some involve complex calculations. A hashcode is being introduced to prevent inadvertent ROD file changes but they must remain human readable. Two other files form part of the data set but the changes are trivial - renaming both files and replacing the first line of one with the filename.



So, a database file properties are a filename, with the filename and description on the first two lines then eight other lines before the 109 character records start. The first six characters in each record are the distance (starting at some arbitrary value near 0.0) incremented by 0.1 in each record. Whole numbers have no decimal or trailing zero. There are two other files in the dataset with different extensions.



The program provides a service of creating a new option from an existing road file set. (New option could be updating existing dataset.) The file and command line are checked and the road file data structure analysed. The hashcode register provides a validation and updating service.



#!/usr/bin/python
# With existing road file create new overtaking lane option file(s).
#
# Copy file, insert overtaking lane and barrier line codes for specified
# chainage, replace first line with new filename, second line with description
# and update MD5 HashCode (future).
#
# eg python OptionEditor.py Demo.ROD 1.5 2.2 P Option.ROD Destination_Description
#
# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


import sys
import os
import shutil
import tempfile
import hashlib

def main():

def copyFile(src, dest):
# From https://www.pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
try:
shutil.copy(src, dest)
# eg. src and dest are the same file
except shutil.Error as e:
print('Error: %s' % e)
# eg. source or destination doesn't exist
except IOError as e:
print('Error: %s' % e.strerror)
return

def HashCode_register(HASHFILE, src, action):
# with open (hashfile) as hf:
# hash, fn = (open(HASHFILE).read().split('n').split(' *')
hashcode = hashlib.md5(open(src, 'rb').read()).hexdigest()
print hashcode+ " *" + src # Debug ****
lines = open(HASHFILE).read().split('n')
for item in lines:
if action == 'validate':
if hashcode in item: # Maybe uppercase compare
return True
elif action == "update":
if src in item: # Maybe uppercase compare
print lines # Debug - no-op
# if item_number_hashcode != hashcode: # FIX THIS ***
# item_hashcode(update_number) = hashcode # need write access here FIX THIS ***
# else:
# write_append_HASFILE(hashcode + " *" + src) # need write access here FIX THIS ***
# return True
return False


def replace_textfile_line(text_filename, new_text, description):
# Replace first line with new_text.
#
# Adapted from ReplFHdr.py
# By Neale Irons version 14/02/2018 (CC BY-SA 4.0)
# import sys, os, tempfile
with open(text_filename) as src:
with tempfile.NamedTemporaryFile('w',
dir=os.path.dirname(text_filename), delete=False) as dst:
line = src.readline() # Read first line
dst.write(new_text + 'n') # Replace first line
if description: # If description not empty
line = src.readline() # Read second line
dst.write(description + 'n') # Replace second line
shutil.copyfileobj(src, dst) # Copy rest of file
os.unlink(text_filename) # remove old version
os.rename(dst.name, text_filename) # rename new version
return()


#Constants
ODO_STEP_KM = 0.1
RECORD_SIZE = 109
BARRIER_COLUMN1 = 11
BARRIER_COLUMN2 = 16
BARRIER_VALUE = '+' # Debug - use '+' instead of '-'
OTLANE_COLUMN1 = 22
OTLANE_COLUMN2 = 27
OTLANE_VALUE = 'Y' # Debug - use 'Y' instead of 'T'
OTLANE_DIRECTION_CODES = "PC"
HASHFILE = "Hashcode.MD5"
True = not False

# Validate command line
if len(sys.argv) < 6 or len(sys.argv) > 7:
format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')
print(format_string.format(sys.argv[0]))
print(' Use existing road files to create new overtaking lane option and optionally insert second line description.)')
sys.exit()

src = sys.argv[1]
otl_start_km = float(sys.argv[2])
otl_end_km = float(sys.argv[3])
otl_direction = sys.argv[4]
dest = sys.argv[5]
if len(sys.argv) < 6:
description = sys.argv[6]
else:
description = ''

if otl_start_km > otl_end_km:
print("Invalid start/end overtaking lane chainage - start must be less than end. Exiting...")
sys.exit()

if otl_direction not in OTLANE_DIRECTION_CODES:
print("Direction is invalid, must be P or C. Exiting..."
.format(sys.argv[4]))
sys.exit()


# Validate file
if not os.path.isfile(src):
print("File path does not exist. Exiting..."
.format(src))
sys.exit()

if src <> dest:
# Validate support files
src_name, src_ext = os.path.splitext(src)
if not os.path.isfile(src_name+'.MLT'):
print("Requires support file . Exiting..."
.format(src_name+'.MLT'))
sys.exit()
if not os.path.isfile(src_name+'.OBS'):
print("Requires support file . Exiting..."
.format(src_name+'.OBS'))
sys.exit()

# Validate HashCode
if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")

# Read header
with open(src) as fp:
# *** if fails raise "Invalid file format - first 10 lines must be file header
for x in range(1, 10):
line = fp.readline()
header_length=fp.tell() # Calculate header length
fp.seek(0,2)
file_length = fp.tell()
fp.close()

if (file_length - header_length) % RECORD_SIZE != 0:
print("Invalid file format - must be 10 lines and character records. Exiting..."
.format((file_length - header_length) % RECORD_SIZE))
sys.exit()

nrecs = (file_length - header_length) // RECORD_SIZE
if nrecs < 1:
print("Invalid file format - file must contain at least one record. Exiting...")
sys.exit()

# dest_tmp = NamedTemporaryFile(delete=False)
dest_tmp = src + ".tmp"
# print("Temp file: ".format(dest_tmp)) # Debug
copyFile(src, dest_tmp)


# Open file for random read/write
with open(dest_tmp, 'r+b') as fp:
fp.seek(header_length,0)
odo_start_km = float((fp.read(6)))
fp.seek(RECORD_SIZE * -1, 2)
odo_end_km = float((fp.read(6)))

if otl_start_km <= odo_start_km:
print("Invalid start overtaking lane chainage - Overtaking lane can't start before first road chainage. Exiting...")
sys.exit()

if otl_end_km >= odo_end_km:
print("Invalid end overtaking lane chainage - Overtaking lane can't ends after last road chainage. Exiting...")
sys.exit()

if int(round((odo_end_km - odo_start_km)/ODO_STEP_KM) + 1) != nrecs:
print("Invalid file format - incorrect number of lines between start and end distances. Exiting...")
sys.exit()

while otl_start_km < otl_end_km:
fpos=int(round(header_length + ((otl_start_km - odo_start_km) / ODO_STEP_KM * RECORD_SIZE) - 1))

fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)
fp.seek(BARRIER_COLUMN2 + fpos)
fp.write(BARRIER_VALUE)

if otl_direction == "P":
fp.seek(OTLANE_COLUMN1 + fpos)
elif otl_direction == "C":
fp.seek(OTLANE_COLUMN2 + fpos)
fp.write(OTLANE_VALUE)
otl_start_km += ODO_STEP_KM

fp.close()

if dest <> src :
# Create new supporting files
dest_name, dest_ext = os.path.splitext(dest)
copyFile(src_name+'.MLT', dest_name+'.MLT')
replace_textfile_line(src_name+'.MLT', src_name+'.MLT', '')
copyFile(src_name+'.OBS', dest_name+'.OBS')

replace_textfile_line(dest_tmp, dest, description)
HashCode_register(HASHFILE, src, "update") # Not working yet ****

from shutil import move
move(dest_tmp, dest)

if __name__ == '__main__':
main()


Data record snippet (not much happening in this sample):



 0.8 -1 -1 F F 9 9
0.9 -1 -1 F F 10 201
1 -1 -1 F F 11 202
1.1 -1 -1 F F 12 12
1.2 1 -1 F F 13 13
1.3 1 -1 F F 14 14
1.4 1 -1 F F 15 15
1.5 -1 1 F F 16 16
1.6 -1 1 F F 17 17
1.7 -1 1 F F 18 18
1.8 -1 -1 F F 19 19
1.9 -1 -1 F F 20 20
2 -1 -1 F F 21 21
2.1 -1 -1 F F 22 22
2.2 -1 -1 F F 23 23
2.3 -1 -1 F F 24 24


Hashcode.MD5:



8d443f2e93a3f0b67f442e4f1d5a4d6d *md5.exe
eb574b236133e60c989c6f472f07827b *md5sum.exe
c1f62b08d050f2a30790c5b2f3d29f5c *Demo.ROD
9f036e40755daaa5a5fc141d3cbfbb99 *nGetPid.exe






share|improve this question





















  • Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
    – Gerrit0
    Feb 25 at 22:17










  • Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
    – nealei
    Feb 26 at 1:37













up vote
4
down vote

favorite









up vote
4
down vote

favorite











I learnt to code a long, long time ago.



Concerns



Existing approach is messy and will be hard to maintain. System errors are not trapped and the error messages could be better implemented. Final hashcode calculation occurs after the data file is written.



I have been through a few python tutorials some time ago but it is hard without a real-world problem. I'd like to use this as an OOP learning opportunity. This code actually does the job and is well on the way to finalising a file validation feature yet to be implemented.



Everything is up for review here.



Situation



Data files are created by a road model and this program rapidly creates scenarios for option testing. It involves overwriting two 'barrier' fields and and one of two overtaking lane fields in the road (ROD) file. I don't understand or need to care about the many other fields but they must not be changed as some involve complex calculations. A hashcode is being introduced to prevent inadvertent ROD file changes but they must remain human readable. Two other files form part of the data set but the changes are trivial - renaming both files and replacing the first line of one with the filename.



So, a database file properties are a filename, with the filename and description on the first two lines then eight other lines before the 109 character records start. The first six characters in each record are the distance (starting at some arbitrary value near 0.0) incremented by 0.1 in each record. Whole numbers have no decimal or trailing zero. There are two other files in the dataset with different extensions.



The program provides a service of creating a new option from an existing road file set. (New option could be updating existing dataset.) The file and command line are checked and the road file data structure analysed. The hashcode register provides a validation and updating service.



#!/usr/bin/python
# With existing road file create new overtaking lane option file(s).
#
# Copy file, insert overtaking lane and barrier line codes for specified
# chainage, replace first line with new filename, second line with description
# and update MD5 HashCode (future).
#
# eg python OptionEditor.py Demo.ROD 1.5 2.2 P Option.ROD Destination_Description
#
# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


import sys
import os
import shutil
import tempfile
import hashlib

def main():

def copyFile(src, dest):
# From https://www.pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
try:
shutil.copy(src, dest)
# eg. src and dest are the same file
except shutil.Error as e:
print('Error: %s' % e)
# eg. source or destination doesn't exist
except IOError as e:
print('Error: %s' % e.strerror)
return

def HashCode_register(HASHFILE, src, action):
# with open (hashfile) as hf:
# hash, fn = (open(HASHFILE).read().split('n').split(' *')
hashcode = hashlib.md5(open(src, 'rb').read()).hexdigest()
print hashcode+ " *" + src # Debug ****
lines = open(HASHFILE).read().split('n')
for item in lines:
if action == 'validate':
if hashcode in item: # Maybe uppercase compare
return True
elif action == "update":
if src in item: # Maybe uppercase compare
print lines # Debug - no-op
# if item_number_hashcode != hashcode: # FIX THIS ***
# item_hashcode(update_number) = hashcode # need write access here FIX THIS ***
# else:
# write_append_HASFILE(hashcode + " *" + src) # need write access here FIX THIS ***
# return True
return False


def replace_textfile_line(text_filename, new_text, description):
# Replace first line with new_text.
#
# Adapted from ReplFHdr.py
# By Neale Irons version 14/02/2018 (CC BY-SA 4.0)
# import sys, os, tempfile
with open(text_filename) as src:
with tempfile.NamedTemporaryFile('w',
dir=os.path.dirname(text_filename), delete=False) as dst:
line = src.readline() # Read first line
dst.write(new_text + 'n') # Replace first line
if description: # If description not empty
line = src.readline() # Read second line
dst.write(description + 'n') # Replace second line
shutil.copyfileobj(src, dst) # Copy rest of file
os.unlink(text_filename) # remove old version
os.rename(dst.name, text_filename) # rename new version
return()


#Constants
ODO_STEP_KM = 0.1
RECORD_SIZE = 109
BARRIER_COLUMN1 = 11
BARRIER_COLUMN2 = 16
BARRIER_VALUE = '+' # Debug - use '+' instead of '-'
OTLANE_COLUMN1 = 22
OTLANE_COLUMN2 = 27
OTLANE_VALUE = 'Y' # Debug - use 'Y' instead of 'T'
OTLANE_DIRECTION_CODES = "PC"
HASHFILE = "Hashcode.MD5"
True = not False

# Validate command line
if len(sys.argv) < 6 or len(sys.argv) > 7:
format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')
print(format_string.format(sys.argv[0]))
print(' Use existing road files to create new overtaking lane option and optionally insert second line description.)')
sys.exit()

src = sys.argv[1]
otl_start_km = float(sys.argv[2])
otl_end_km = float(sys.argv[3])
otl_direction = sys.argv[4]
dest = sys.argv[5]
if len(sys.argv) < 6:
description = sys.argv[6]
else:
description = ''

if otl_start_km > otl_end_km:
print("Invalid start/end overtaking lane chainage - start must be less than end. Exiting...")
sys.exit()

if otl_direction not in OTLANE_DIRECTION_CODES:
print("Direction is invalid, must be P or C. Exiting..."
.format(sys.argv[4]))
sys.exit()


# Validate file
if not os.path.isfile(src):
print("File path does not exist. Exiting..."
.format(src))
sys.exit()

if src <> dest:
# Validate support files
src_name, src_ext = os.path.splitext(src)
if not os.path.isfile(src_name+'.MLT'):
print("Requires support file . Exiting..."
.format(src_name+'.MLT'))
sys.exit()
if not os.path.isfile(src_name+'.OBS'):
print("Requires support file . Exiting..."
.format(src_name+'.OBS'))
sys.exit()

# Validate HashCode
if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")

# Read header
with open(src) as fp:
# *** if fails raise "Invalid file format - first 10 lines must be file header
for x in range(1, 10):
line = fp.readline()
header_length=fp.tell() # Calculate header length
fp.seek(0,2)
file_length = fp.tell()
fp.close()

if (file_length - header_length) % RECORD_SIZE != 0:
print("Invalid file format - must be 10 lines and character records. Exiting..."
.format((file_length - header_length) % RECORD_SIZE))
sys.exit()

nrecs = (file_length - header_length) // RECORD_SIZE
if nrecs < 1:
print("Invalid file format - file must contain at least one record. Exiting...")
sys.exit()

# dest_tmp = NamedTemporaryFile(delete=False)
dest_tmp = src + ".tmp"
# print("Temp file: ".format(dest_tmp)) # Debug
copyFile(src, dest_tmp)


# Open file for random read/write
with open(dest_tmp, 'r+b') as fp:
fp.seek(header_length,0)
odo_start_km = float((fp.read(6)))
fp.seek(RECORD_SIZE * -1, 2)
odo_end_km = float((fp.read(6)))

if otl_start_km <= odo_start_km:
print("Invalid start overtaking lane chainage - Overtaking lane can't start before first road chainage. Exiting...")
sys.exit()

if otl_end_km >= odo_end_km:
print("Invalid end overtaking lane chainage - Overtaking lane can't ends after last road chainage. Exiting...")
sys.exit()

if int(round((odo_end_km - odo_start_km)/ODO_STEP_KM) + 1) != nrecs:
print("Invalid file format - incorrect number of lines between start and end distances. Exiting...")
sys.exit()

while otl_start_km < otl_end_km:
fpos=int(round(header_length + ((otl_start_km - odo_start_km) / ODO_STEP_KM * RECORD_SIZE) - 1))

fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)
fp.seek(BARRIER_COLUMN2 + fpos)
fp.write(BARRIER_VALUE)

if otl_direction == "P":
fp.seek(OTLANE_COLUMN1 + fpos)
elif otl_direction == "C":
fp.seek(OTLANE_COLUMN2 + fpos)
fp.write(OTLANE_VALUE)
otl_start_km += ODO_STEP_KM

fp.close()

if dest <> src :
# Create new supporting files
dest_name, dest_ext = os.path.splitext(dest)
copyFile(src_name+'.MLT', dest_name+'.MLT')
replace_textfile_line(src_name+'.MLT', src_name+'.MLT', '')
copyFile(src_name+'.OBS', dest_name+'.OBS')

replace_textfile_line(dest_tmp, dest, description)
HashCode_register(HASHFILE, src, "update") # Not working yet ****

from shutil import move
move(dest_tmp, dest)

if __name__ == '__main__':
main()


Data record snippet (not much happening in this sample):



 0.8 -1 -1 F F 9 9
0.9 -1 -1 F F 10 201
1 -1 -1 F F 11 202
1.1 -1 -1 F F 12 12
1.2 1 -1 F F 13 13
1.3 1 -1 F F 14 14
1.4 1 -1 F F 15 15
1.5 -1 1 F F 16 16
1.6 -1 1 F F 17 17
1.7 -1 1 F F 18 18
1.8 -1 -1 F F 19 19
1.9 -1 -1 F F 20 20
2 -1 -1 F F 21 21
2.1 -1 -1 F F 22 22
2.2 -1 -1 F F 23 23
2.3 -1 -1 F F 24 24


Hashcode.MD5:



8d443f2e93a3f0b67f442e4f1d5a4d6d *md5.exe
eb574b236133e60c989c6f472f07827b *md5sum.exe
c1f62b08d050f2a30790c5b2f3d29f5c *Demo.ROD
9f036e40755daaa5a5fc141d3cbfbb99 *nGetPid.exe






share|improve this question













I learnt to code a long, long time ago.



Concerns



Existing approach is messy and will be hard to maintain. System errors are not trapped and the error messages could be better implemented. Final hashcode calculation occurs after the data file is written.



I have been through a few python tutorials some time ago but it is hard without a real-world problem. I'd like to use this as an OOP learning opportunity. This code actually does the job and is well on the way to finalising a file validation feature yet to be implemented.



Everything is up for review here.



Situation



Data files are created by a road model and this program rapidly creates scenarios for option testing. It involves overwriting two 'barrier' fields and and one of two overtaking lane fields in the road (ROD) file. I don't understand or need to care about the many other fields but they must not be changed as some involve complex calculations. A hashcode is being introduced to prevent inadvertent ROD file changes but they must remain human readable. Two other files form part of the data set but the changes are trivial - renaming both files and replacing the first line of one with the filename.



So, a database file properties are a filename, with the filename and description on the first two lines then eight other lines before the 109 character records start. The first six characters in each record are the distance (starting at some arbitrary value near 0.0) incremented by 0.1 in each record. Whole numbers have no decimal or trailing zero. There are two other files in the dataset with different extensions.



The program provides a service of creating a new option from an existing road file set. (New option could be updating existing dataset.) The file and command line are checked and the road file data structure analysed. The hashcode register provides a validation and updating service.



#!/usr/bin/python
# With existing road file create new overtaking lane option file(s).
#
# Copy file, insert overtaking lane and barrier line codes for specified
# chainage, replace first line with new filename, second line with description
# and update MD5 HashCode (future).
#
# eg python OptionEditor.py Demo.ROD 1.5 2.2 P Option.ROD Destination_Description
#
# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


import sys
import os
import shutil
import tempfile
import hashlib

def main():

def copyFile(src, dest):
# From https://www.pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
try:
shutil.copy(src, dest)
# eg. src and dest are the same file
except shutil.Error as e:
print('Error: %s' % e)
# eg. source or destination doesn't exist
except IOError as e:
print('Error: %s' % e.strerror)
return

def HashCode_register(HASHFILE, src, action):
# with open (hashfile) as hf:
# hash, fn = (open(HASHFILE).read().split('n').split(' *')
hashcode = hashlib.md5(open(src, 'rb').read()).hexdigest()
print hashcode+ " *" + src # Debug ****
lines = open(HASHFILE).read().split('n')
for item in lines:
if action == 'validate':
if hashcode in item: # Maybe uppercase compare
return True
elif action == "update":
if src in item: # Maybe uppercase compare
print lines # Debug - no-op
# if item_number_hashcode != hashcode: # FIX THIS ***
# item_hashcode(update_number) = hashcode # need write access here FIX THIS ***
# else:
# write_append_HASFILE(hashcode + " *" + src) # need write access here FIX THIS ***
# return True
return False


def replace_textfile_line(text_filename, new_text, description):
# Replace first line with new_text.
#
# Adapted from ReplFHdr.py
# By Neale Irons version 14/02/2018 (CC BY-SA 4.0)
# import sys, os, tempfile
with open(text_filename) as src:
with tempfile.NamedTemporaryFile('w',
dir=os.path.dirname(text_filename), delete=False) as dst:
line = src.readline() # Read first line
dst.write(new_text + 'n') # Replace first line
if description: # If description not empty
line = src.readline() # Read second line
dst.write(description + 'n') # Replace second line
shutil.copyfileobj(src, dst) # Copy rest of file
os.unlink(text_filename) # remove old version
os.rename(dst.name, text_filename) # rename new version
return()


#Constants
ODO_STEP_KM = 0.1
RECORD_SIZE = 109
BARRIER_COLUMN1 = 11
BARRIER_COLUMN2 = 16
BARRIER_VALUE = '+' # Debug - use '+' instead of '-'
OTLANE_COLUMN1 = 22
OTLANE_COLUMN2 = 27
OTLANE_VALUE = 'Y' # Debug - use 'Y' instead of 'T'
OTLANE_DIRECTION_CODES = "PC"
HASHFILE = "Hashcode.MD5"
True = not False

# Validate command line
if len(sys.argv) < 6 or len(sys.argv) > 7:
format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')
print(format_string.format(sys.argv[0]))
print(' Use existing road files to create new overtaking lane option and optionally insert second line description.)')
sys.exit()

src = sys.argv[1]
otl_start_km = float(sys.argv[2])
otl_end_km = float(sys.argv[3])
otl_direction = sys.argv[4]
dest = sys.argv[5]
if len(sys.argv) < 6:
description = sys.argv[6]
else:
description = ''

if otl_start_km > otl_end_km:
print("Invalid start/end overtaking lane chainage - start must be less than end. Exiting...")
sys.exit()

if otl_direction not in OTLANE_DIRECTION_CODES:
print("Direction is invalid, must be P or C. Exiting..."
.format(sys.argv[4]))
sys.exit()


# Validate file
if not os.path.isfile(src):
print("File path does not exist. Exiting..."
.format(src))
sys.exit()

if src <> dest:
# Validate support files
src_name, src_ext = os.path.splitext(src)
if not os.path.isfile(src_name+'.MLT'):
print("Requires support file . Exiting..."
.format(src_name+'.MLT'))
sys.exit()
if not os.path.isfile(src_name+'.OBS'):
print("Requires support file . Exiting..."
.format(src_name+'.OBS'))
sys.exit()

# Validate HashCode
if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")

# Read header
with open(src) as fp:
# *** if fails raise "Invalid file format - first 10 lines must be file header
for x in range(1, 10):
line = fp.readline()
header_length=fp.tell() # Calculate header length
fp.seek(0,2)
file_length = fp.tell()
fp.close()

if (file_length - header_length) % RECORD_SIZE != 0:
print("Invalid file format - must be 10 lines and character records. Exiting..."
.format((file_length - header_length) % RECORD_SIZE))
sys.exit()

nrecs = (file_length - header_length) // RECORD_SIZE
if nrecs < 1:
print("Invalid file format - file must contain at least one record. Exiting...")
sys.exit()

# dest_tmp = NamedTemporaryFile(delete=False)
dest_tmp = src + ".tmp"
# print("Temp file: ".format(dest_tmp)) # Debug
copyFile(src, dest_tmp)


# Open file for random read/write
with open(dest_tmp, 'r+b') as fp:
fp.seek(header_length,0)
odo_start_km = float((fp.read(6)))
fp.seek(RECORD_SIZE * -1, 2)
odo_end_km = float((fp.read(6)))

if otl_start_km <= odo_start_km:
print("Invalid start overtaking lane chainage - Overtaking lane can't start before first road chainage. Exiting...")
sys.exit()

if otl_end_km >= odo_end_km:
print("Invalid end overtaking lane chainage - Overtaking lane can't ends after last road chainage. Exiting...")
sys.exit()

if int(round((odo_end_km - odo_start_km)/ODO_STEP_KM) + 1) != nrecs:
print("Invalid file format - incorrect number of lines between start and end distances. Exiting...")
sys.exit()

while otl_start_km < otl_end_km:
fpos=int(round(header_length + ((otl_start_km - odo_start_km) / ODO_STEP_KM * RECORD_SIZE) - 1))

fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)
fp.seek(BARRIER_COLUMN2 + fpos)
fp.write(BARRIER_VALUE)

if otl_direction == "P":
fp.seek(OTLANE_COLUMN1 + fpos)
elif otl_direction == "C":
fp.seek(OTLANE_COLUMN2 + fpos)
fp.write(OTLANE_VALUE)
otl_start_km += ODO_STEP_KM

fp.close()

if dest <> src :
# Create new supporting files
dest_name, dest_ext = os.path.splitext(dest)
copyFile(src_name+'.MLT', dest_name+'.MLT')
replace_textfile_line(src_name+'.MLT', src_name+'.MLT', '')
copyFile(src_name+'.OBS', dest_name+'.OBS')

replace_textfile_line(dest_tmp, dest, description)
HashCode_register(HASHFILE, src, "update") # Not working yet ****

from shutil import move
move(dest_tmp, dest)

if __name__ == '__main__':
main()


Data record snippet (not much happening in this sample):



 0.8 -1 -1 F F 9 9
0.9 -1 -1 F F 10 201
1 -1 -1 F F 11 202
1.1 -1 -1 F F 12 12
1.2 1 -1 F F 13 13
1.3 1 -1 F F 14 14
1.4 1 -1 F F 15 15
1.5 -1 1 F F 16 16
1.6 -1 1 F F 17 17
1.7 -1 1 F F 18 18
1.8 -1 -1 F F 19 19
1.9 -1 -1 F F 20 20
2 -1 -1 F F 21 21
2.1 -1 -1 F F 22 22
2.2 -1 -1 F F 23 23
2.3 -1 -1 F F 24 24


Hashcode.MD5:



8d443f2e93a3f0b67f442e4f1d5a4d6d *md5.exe
eb574b236133e60c989c6f472f07827b *md5sum.exe
c1f62b08d050f2a30790c5b2f3d29f5c *Demo.ROD
9f036e40755daaa5a5fc141d3cbfbb99 *nGetPid.exe








share|improve this question












share|improve this question




share|improve this question








edited Mar 1 at 21:31
























asked Feb 25 at 20:40









nealei

213




213











  • Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
    – Gerrit0
    Feb 25 at 22:17










  • Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
    – nealei
    Feb 26 at 1:37

















  • Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
    – Gerrit0
    Feb 25 at 22:17










  • Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
    – nealei
    Feb 26 at 1:37
















Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
– Gerrit0
Feb 25 at 22:17




Is this code working? FIX THIS and Not working yet make it seem that this code isn't ready for review yet.
– Gerrit0
Feb 25 at 22:17












Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
– nealei
Feb 26 at 1:37





Yes - I am using this code and guarantee it works. New functionality to extend Hashcode_Register for update action is under development and not implemented yet. Those are the comments you are referring to. Making these changes prompted me to seek a code review.
– nealei
Feb 26 at 1:37











1 Answer
1






active

oldest

votes

















up vote
5
down vote













#!/usr/bin/python


Consider using #! /usr/bin/env python instead (or python3). Then you can manage import dependencies with venv or, better, conda env update. Conda will adjust PATH to point at the appropriate python interpreter which has the necessary pypi libraries that inevitably get added to a project.



# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


You probably wanted to use the word 'Copyright', if you want to assert IP rights and modify them using Creative Commons.



def main():


The main function is entirely too long. Limit yourself to one screenful per function. At a minimum, promote any helper functions that might possibly be useful for other functions to call or re-use.



def copyFile(src, dest):


Please follow PEP-8 naming conventions, and call it copy_file.



 try:
shutil.copy(src, dest)


Joey Payne was just illustrating exceptions that one might want to catch. Here, it's not clear that you really want to catch them, given that you don't actually "handle" the error, you merely swallow it after emitting some noise on stdout. If copy_file was merely an alias for copy, that would be fine. But here, it appears to me that copy_file is actually worse, it manages to do a disservice for the caller, by swallowing exceptions. The usual post-condition would be "this method guarantees dest file exists upon return", but as written the post-condition is empty set, the method makes no correctness guarantees at all. That makes things harder for maintenance programmers reading any calling code a few months down the road.



def HashCode_register(HASHFILE, src, action):


Ok, now you're just going crazy with case. Use lowercase with underscores, please. Yes, I know you're passing in a manifest constant in the caller code. But the function definition should use lower case for the formal parameter.



# with open (hashfile) as hf:


I like with! Why did you choose to take on the responsibility of issuing a close(), and then failed to close hf?



 print hashcode+ " *" + src


You appear to be using python2. I urge you to use python3 instead. It makes some things simpler, for example getting I/O encode / decode right. Run flake8 *.py on your sources, and follow its advice regarding whitespace around the + operator.



 lines = open(HASHFILE).read().split('n')


This is equivalent to the .readlines() function. But it would be more natural (and less memory intensive) to phrase it like this:



 with open (hashfile) as hf:
for line in hf:
line = line.rstrip()
...


To loop you write:



 for item in lines:


This is a little odd. The usual idiom would be:



 for line in lines:


In general, if you have some plural iterable xs, iterate with for x in xs.



 if action == 'validate': ...
elif action == "update":


There are only two values, which suggests you might choose to pass in a boolean. If three values are needed, consider using enum: https://docs.python.org/3/library/enum.html



Ok, I read the function, and I still don't know what it does. Please write a docstring explaining what promise it makes about its boolean return value. The 'update' case, given the current non-comment code, appears to be a constant False function.



In replace_textfile_line, delete the return() line.



True = not False


That's not funny. Do not redefine builtin values with your own computed values. And don't even think about assigning 3.2 to math.pi as the Indiana legislature once did.



if len(sys.argv) < 6 or len(sys.argv) > 7:


Your command line appears to be sufficiently complex to warrant the use of a library: https://docs.python.org/3/library/argparse.html Plus, then you get --help for free!



 format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')


The identifier you were looking for was: usage = ...



if not os.path.isfile(src):


This is correct. But, consider letting open do the validation for you. If the file can't be read, we'll find out soon enough. Same remarks for your subsequent file validations.



if src <> dest:


This is correct but odd. Please use the modern != operator.



if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")


I don't believe you. I don't see an immediate exit. Consider turning that function into one that raises a fatal exception upon finding a file that doesn't validate.



 ...
header_length=fp.tell()


This logic is convoluted. Please write a helper function for it. If reading the entire file is acceptable, then this might suffice:



 header_length = len(''.join(fp.readlines()[:10]))
file_length = os.path.getsize(src)


The with handler is a good style - keep using it.



fp.close()


fp is already closed at this point, thanks to the with.



if (file_length - header_length) % RECORD_SIZE != 0:


This is OK as far as it goes, but I would prefer to see a more helpful diagnostic, e.g.



lines = fp.readlines()
for line in lines[:10]:
if len(line) != RECORD_SIZE: # then show some helpful diagnostic


Oh, wait, I see the whole thing must conform. So use for line in lines:. Also, nrec might come from len(lines).



 fp.seek(RECORD_SIZE * -1, 2)


You're making me nervous here. Using lines[i] and lines[j] would be more robust, in case short or long lines crept into the input file. You seem to believe that every 109th character will be a newline, but you don't validate that.



 fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)


Maybe you have a giant file and it's import to patch it up in-place. But your code would be more testable / readable / maintainable if it read lines from a source file and wrote modified lines to a dest file, perhaps with a file rename upon success. You used that pattern very nicely in a previous code fragment. I'm concerned about the possibility of silently corrupting a file you haven't validated. The current code might work great, but after months go by, along with personnel, someone will be faced with a maintenance task that is harder than it needs to be. After all, sometimes file formats change, and fixed record size of 109 might not last until the end of time.



if dest <> src :


Please use !=, and pay attention to flake8 when it tells you to delete the blank before the colon.



from shutil import move


Please keep your imports at top of file. The identifier is generic enough that it might be clearer to phrase it as import shutil and then shutil.move(...)



Keep at it! Getting people to read your code will have a good effect on the clarity of code you write in future.






share|improve this answer























  • Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
    – nealei
    Mar 1 at 21:29











  • @nealei Python 3 should be an option. Especially for new code.
    – Mathias Ettinger
    May 7 at 8:29










  • No - restricted to 2.7 on corporate system.
    – nealei
    May 7 at 8:56










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f188335%2fupdating-data-file-records%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
5
down vote













#!/usr/bin/python


Consider using #! /usr/bin/env python instead (or python3). Then you can manage import dependencies with venv or, better, conda env update. Conda will adjust PATH to point at the appropriate python interpreter which has the necessary pypi libraries that inevitably get added to a project.



# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


You probably wanted to use the word 'Copyright', if you want to assert IP rights and modify them using Creative Commons.



def main():


The main function is entirely too long. Limit yourself to one screenful per function. At a minimum, promote any helper functions that might possibly be useful for other functions to call or re-use.



def copyFile(src, dest):


Please follow PEP-8 naming conventions, and call it copy_file.



 try:
shutil.copy(src, dest)


Joey Payne was just illustrating exceptions that one might want to catch. Here, it's not clear that you really want to catch them, given that you don't actually "handle" the error, you merely swallow it after emitting some noise on stdout. If copy_file was merely an alias for copy, that would be fine. But here, it appears to me that copy_file is actually worse, it manages to do a disservice for the caller, by swallowing exceptions. The usual post-condition would be "this method guarantees dest file exists upon return", but as written the post-condition is empty set, the method makes no correctness guarantees at all. That makes things harder for maintenance programmers reading any calling code a few months down the road.



def HashCode_register(HASHFILE, src, action):


Ok, now you're just going crazy with case. Use lowercase with underscores, please. Yes, I know you're passing in a manifest constant in the caller code. But the function definition should use lower case for the formal parameter.



# with open (hashfile) as hf:


I like with! Why did you choose to take on the responsibility of issuing a close(), and then failed to close hf?



 print hashcode+ " *" + src


You appear to be using python2. I urge you to use python3 instead. It makes some things simpler, for example getting I/O encode / decode right. Run flake8 *.py on your sources, and follow its advice regarding whitespace around the + operator.



 lines = open(HASHFILE).read().split('n')


This is equivalent to the .readlines() function. But it would be more natural (and less memory intensive) to phrase it like this:



 with open (hashfile) as hf:
for line in hf:
line = line.rstrip()
...


To loop you write:



 for item in lines:


This is a little odd. The usual idiom would be:



 for line in lines:


In general, if you have some plural iterable xs, iterate with for x in xs.



 if action == 'validate': ...
elif action == "update":


There are only two values, which suggests you might choose to pass in a boolean. If three values are needed, consider using enum: https://docs.python.org/3/library/enum.html



Ok, I read the function, and I still don't know what it does. Please write a docstring explaining what promise it makes about its boolean return value. The 'update' case, given the current non-comment code, appears to be a constant False function.



In replace_textfile_line, delete the return() line.



True = not False


That's not funny. Do not redefine builtin values with your own computed values. And don't even think about assigning 3.2 to math.pi as the Indiana legislature once did.



if len(sys.argv) < 6 or len(sys.argv) > 7:


Your command line appears to be sufficiently complex to warrant the use of a library: https://docs.python.org/3/library/argparse.html Plus, then you get --help for free!



 format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')


The identifier you were looking for was: usage = ...



if not os.path.isfile(src):


This is correct. But, consider letting open do the validation for you. If the file can't be read, we'll find out soon enough. Same remarks for your subsequent file validations.



if src <> dest:


This is correct but odd. Please use the modern != operator.



if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")


I don't believe you. I don't see an immediate exit. Consider turning that function into one that raises a fatal exception upon finding a file that doesn't validate.



 ...
header_length=fp.tell()


This logic is convoluted. Please write a helper function for it. If reading the entire file is acceptable, then this might suffice:



 header_length = len(''.join(fp.readlines()[:10]))
file_length = os.path.getsize(src)


The with handler is a good style - keep using it.



fp.close()


fp is already closed at this point, thanks to the with.



if (file_length - header_length) % RECORD_SIZE != 0:


This is OK as far as it goes, but I would prefer to see a more helpful diagnostic, e.g.



lines = fp.readlines()
for line in lines[:10]:
if len(line) != RECORD_SIZE: # then show some helpful diagnostic


Oh, wait, I see the whole thing must conform. So use for line in lines:. Also, nrec might come from len(lines).



 fp.seek(RECORD_SIZE * -1, 2)


You're making me nervous here. Using lines[i] and lines[j] would be more robust, in case short or long lines crept into the input file. You seem to believe that every 109th character will be a newline, but you don't validate that.



 fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)


Maybe you have a giant file and it's import to patch it up in-place. But your code would be more testable / readable / maintainable if it read lines from a source file and wrote modified lines to a dest file, perhaps with a file rename upon success. You used that pattern very nicely in a previous code fragment. I'm concerned about the possibility of silently corrupting a file you haven't validated. The current code might work great, but after months go by, along with personnel, someone will be faced with a maintenance task that is harder than it needs to be. After all, sometimes file formats change, and fixed record size of 109 might not last until the end of time.



if dest <> src :


Please use !=, and pay attention to flake8 when it tells you to delete the blank before the colon.



from shutil import move


Please keep your imports at top of file. The identifier is generic enough that it might be clearer to phrase it as import shutil and then shutil.move(...)



Keep at it! Getting people to read your code will have a good effect on the clarity of code you write in future.






share|improve this answer























  • Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
    – nealei
    Mar 1 at 21:29











  • @nealei Python 3 should be an option. Especially for new code.
    – Mathias Ettinger
    May 7 at 8:29










  • No - restricted to 2.7 on corporate system.
    – nealei
    May 7 at 8:56














up vote
5
down vote













#!/usr/bin/python


Consider using #! /usr/bin/env python instead (or python3). Then you can manage import dependencies with venv or, better, conda env update. Conda will adjust PATH to point at the appropriate python interpreter which has the necessary pypi libraries that inevitably get added to a project.



# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


You probably wanted to use the word 'Copyright', if you want to assert IP rights and modify them using Creative Commons.



def main():


The main function is entirely too long. Limit yourself to one screenful per function. At a minimum, promote any helper functions that might possibly be useful for other functions to call or re-use.



def copyFile(src, dest):


Please follow PEP-8 naming conventions, and call it copy_file.



 try:
shutil.copy(src, dest)


Joey Payne was just illustrating exceptions that one might want to catch. Here, it's not clear that you really want to catch them, given that you don't actually "handle" the error, you merely swallow it after emitting some noise on stdout. If copy_file was merely an alias for copy, that would be fine. But here, it appears to me that copy_file is actually worse, it manages to do a disservice for the caller, by swallowing exceptions. The usual post-condition would be "this method guarantees dest file exists upon return", but as written the post-condition is empty set, the method makes no correctness guarantees at all. That makes things harder for maintenance programmers reading any calling code a few months down the road.



def HashCode_register(HASHFILE, src, action):


Ok, now you're just going crazy with case. Use lowercase with underscores, please. Yes, I know you're passing in a manifest constant in the caller code. But the function definition should use lower case for the formal parameter.



# with open (hashfile) as hf:


I like with! Why did you choose to take on the responsibility of issuing a close(), and then failed to close hf?



 print hashcode+ " *" + src


You appear to be using python2. I urge you to use python3 instead. It makes some things simpler, for example getting I/O encode / decode right. Run flake8 *.py on your sources, and follow its advice regarding whitespace around the + operator.



 lines = open(HASHFILE).read().split('n')


This is equivalent to the .readlines() function. But it would be more natural (and less memory intensive) to phrase it like this:



 with open (hashfile) as hf:
for line in hf:
line = line.rstrip()
...


To loop you write:



 for item in lines:


This is a little odd. The usual idiom would be:



 for line in lines:


In general, if you have some plural iterable xs, iterate with for x in xs.



 if action == 'validate': ...
elif action == "update":


There are only two values, which suggests you might choose to pass in a boolean. If three values are needed, consider using enum: https://docs.python.org/3/library/enum.html



Ok, I read the function, and I still don't know what it does. Please write a docstring explaining what promise it makes about its boolean return value. The 'update' case, given the current non-comment code, appears to be a constant False function.



In replace_textfile_line, delete the return() line.



True = not False


That's not funny. Do not redefine builtin values with your own computed values. And don't even think about assigning 3.2 to math.pi as the Indiana legislature once did.



if len(sys.argv) < 6 or len(sys.argv) > 7:


Your command line appears to be sufficiently complex to warrant the use of a library: https://docs.python.org/3/library/argparse.html Plus, then you get --help for free!



 format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')


The identifier you were looking for was: usage = ...



if not os.path.isfile(src):


This is correct. But, consider letting open do the validation for you. If the file can't be read, we'll find out soon enough. Same remarks for your subsequent file validations.



if src <> dest:


This is correct but odd. Please use the modern != operator.



if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")


I don't believe you. I don't see an immediate exit. Consider turning that function into one that raises a fatal exception upon finding a file that doesn't validate.



 ...
header_length=fp.tell()


This logic is convoluted. Please write a helper function for it. If reading the entire file is acceptable, then this might suffice:



 header_length = len(''.join(fp.readlines()[:10]))
file_length = os.path.getsize(src)


The with handler is a good style - keep using it.



fp.close()


fp is already closed at this point, thanks to the with.



if (file_length - header_length) % RECORD_SIZE != 0:


This is OK as far as it goes, but I would prefer to see a more helpful diagnostic, e.g.



lines = fp.readlines()
for line in lines[:10]:
if len(line) != RECORD_SIZE: # then show some helpful diagnostic


Oh, wait, I see the whole thing must conform. So use for line in lines:. Also, nrec might come from len(lines).



 fp.seek(RECORD_SIZE * -1, 2)


You're making me nervous here. Using lines[i] and lines[j] would be more robust, in case short or long lines crept into the input file. You seem to believe that every 109th character will be a newline, but you don't validate that.



 fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)


Maybe you have a giant file and it's import to patch it up in-place. But your code would be more testable / readable / maintainable if it read lines from a source file and wrote modified lines to a dest file, perhaps with a file rename upon success. You used that pattern very nicely in a previous code fragment. I'm concerned about the possibility of silently corrupting a file you haven't validated. The current code might work great, but after months go by, along with personnel, someone will be faced with a maintenance task that is harder than it needs to be. After all, sometimes file formats change, and fixed record size of 109 might not last until the end of time.



if dest <> src :


Please use !=, and pay attention to flake8 when it tells you to delete the blank before the colon.



from shutil import move


Please keep your imports at top of file. The identifier is generic enough that it might be clearer to phrase it as import shutil and then shutil.move(...)



Keep at it! Getting people to read your code will have a good effect on the clarity of code you write in future.






share|improve this answer























  • Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
    – nealei
    Mar 1 at 21:29











  • @nealei Python 3 should be an option. Especially for new code.
    – Mathias Ettinger
    May 7 at 8:29










  • No - restricted to 2.7 on corporate system.
    – nealei
    May 7 at 8:56












up vote
5
down vote










up vote
5
down vote









#!/usr/bin/python


Consider using #! /usr/bin/env python instead (or python3). Then you can manage import dependencies with venv or, better, conda env update. Conda will adjust PATH to point at the appropriate python interpreter which has the necessary pypi libraries that inevitably get added to a project.



# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


You probably wanted to use the word 'Copyright', if you want to assert IP rights and modify them using Creative Commons.



def main():


The main function is entirely too long. Limit yourself to one screenful per function. At a minimum, promote any helper functions that might possibly be useful for other functions to call or re-use.



def copyFile(src, dest):


Please follow PEP-8 naming conventions, and call it copy_file.



 try:
shutil.copy(src, dest)


Joey Payne was just illustrating exceptions that one might want to catch. Here, it's not clear that you really want to catch them, given that you don't actually "handle" the error, you merely swallow it after emitting some noise on stdout. If copy_file was merely an alias for copy, that would be fine. But here, it appears to me that copy_file is actually worse, it manages to do a disservice for the caller, by swallowing exceptions. The usual post-condition would be "this method guarantees dest file exists upon return", but as written the post-condition is empty set, the method makes no correctness guarantees at all. That makes things harder for maintenance programmers reading any calling code a few months down the road.



def HashCode_register(HASHFILE, src, action):


Ok, now you're just going crazy with case. Use lowercase with underscores, please. Yes, I know you're passing in a manifest constant in the caller code. But the function definition should use lower case for the formal parameter.



# with open (hashfile) as hf:


I like with! Why did you choose to take on the responsibility of issuing a close(), and then failed to close hf?



 print hashcode+ " *" + src


You appear to be using python2. I urge you to use python3 instead. It makes some things simpler, for example getting I/O encode / decode right. Run flake8 *.py on your sources, and follow its advice regarding whitespace around the + operator.



 lines = open(HASHFILE).read().split('n')


This is equivalent to the .readlines() function. But it would be more natural (and less memory intensive) to phrase it like this:



 with open (hashfile) as hf:
for line in hf:
line = line.rstrip()
...


To loop you write:



 for item in lines:


This is a little odd. The usual idiom would be:



 for line in lines:


In general, if you have some plural iterable xs, iterate with for x in xs.



 if action == 'validate': ...
elif action == "update":


There are only two values, which suggests you might choose to pass in a boolean. If three values are needed, consider using enum: https://docs.python.org/3/library/enum.html



Ok, I read the function, and I still don't know what it does. Please write a docstring explaining what promise it makes about its boolean return value. The 'update' case, given the current non-comment code, appears to be a constant False function.



In replace_textfile_line, delete the return() line.



True = not False


That's not funny. Do not redefine builtin values with your own computed values. And don't even think about assigning 3.2 to math.pi as the Indiana legislature once did.



if len(sys.argv) < 6 or len(sys.argv) > 7:


Your command line appears to be sufficiently complex to warrant the use of a library: https://docs.python.org/3/library/argparse.html Plus, then you get --help for free!



 format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')


The identifier you were looking for was: usage = ...



if not os.path.isfile(src):


This is correct. But, consider letting open do the validation for you. If the file can't be read, we'll find out soon enough. Same remarks for your subsequent file validations.



if src <> dest:


This is correct but odd. Please use the modern != operator.



if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")


I don't believe you. I don't see an immediate exit. Consider turning that function into one that raises a fatal exception upon finding a file that doesn't validate.



 ...
header_length=fp.tell()


This logic is convoluted. Please write a helper function for it. If reading the entire file is acceptable, then this might suffice:



 header_length = len(''.join(fp.readlines()[:10]))
file_length = os.path.getsize(src)


The with handler is a good style - keep using it.



fp.close()


fp is already closed at this point, thanks to the with.



if (file_length - header_length) % RECORD_SIZE != 0:


This is OK as far as it goes, but I would prefer to see a more helpful diagnostic, e.g.



lines = fp.readlines()
for line in lines[:10]:
if len(line) != RECORD_SIZE: # then show some helpful diagnostic


Oh, wait, I see the whole thing must conform. So use for line in lines:. Also, nrec might come from len(lines).



 fp.seek(RECORD_SIZE * -1, 2)


You're making me nervous here. Using lines[i] and lines[j] would be more robust, in case short or long lines crept into the input file. You seem to believe that every 109th character will be a newline, but you don't validate that.



 fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)


Maybe you have a giant file and it's import to patch it up in-place. But your code would be more testable / readable / maintainable if it read lines from a source file and wrote modified lines to a dest file, perhaps with a file rename upon success. You used that pattern very nicely in a previous code fragment. I'm concerned about the possibility of silently corrupting a file you haven't validated. The current code might work great, but after months go by, along with personnel, someone will be faced with a maintenance task that is harder than it needs to be. After all, sometimes file formats change, and fixed record size of 109 might not last until the end of time.



if dest <> src :


Please use !=, and pay attention to flake8 when it tells you to delete the blank before the colon.



from shutil import move


Please keep your imports at top of file. The identifier is generic enough that it might be clearer to phrase it as import shutil and then shutil.move(...)



Keep at it! Getting people to read your code will have a good effect on the clarity of code you write in future.






share|improve this answer















#!/usr/bin/python


Consider using #! /usr/bin/env python instead (or python3). Then you can manage import dependencies with venv or, better, conda env update. Conda will adjust PATH to point at the appropriate python interpreter which has the necessary pypi libraries that inevitably get added to a project.



# By Neale Irons version 25/02/2018 (CC BY-SA 4.0)


You probably wanted to use the word 'Copyright', if you want to assert IP rights and modify them using Creative Commons.



def main():


The main function is entirely too long. Limit yourself to one screenful per function. At a minimum, promote any helper functions that might possibly be useful for other functions to call or re-use.



def copyFile(src, dest):


Please follow PEP-8 naming conventions, and call it copy_file.



 try:
shutil.copy(src, dest)


Joey Payne was just illustrating exceptions that one might want to catch. Here, it's not clear that you really want to catch them, given that you don't actually "handle" the error, you merely swallow it after emitting some noise on stdout. If copy_file was merely an alias for copy, that would be fine. But here, it appears to me that copy_file is actually worse, it manages to do a disservice for the caller, by swallowing exceptions. The usual post-condition would be "this method guarantees dest file exists upon return", but as written the post-condition is empty set, the method makes no correctness guarantees at all. That makes things harder for maintenance programmers reading any calling code a few months down the road.



def HashCode_register(HASHFILE, src, action):


Ok, now you're just going crazy with case. Use lowercase with underscores, please. Yes, I know you're passing in a manifest constant in the caller code. But the function definition should use lower case for the formal parameter.



# with open (hashfile) as hf:


I like with! Why did you choose to take on the responsibility of issuing a close(), and then failed to close hf?



 print hashcode+ " *" + src


You appear to be using python2. I urge you to use python3 instead. It makes some things simpler, for example getting I/O encode / decode right. Run flake8 *.py on your sources, and follow its advice regarding whitespace around the + operator.



 lines = open(HASHFILE).read().split('n')


This is equivalent to the .readlines() function. But it would be more natural (and less memory intensive) to phrase it like this:



 with open (hashfile) as hf:
for line in hf:
line = line.rstrip()
...


To loop you write:



 for item in lines:


This is a little odd. The usual idiom would be:



 for line in lines:


In general, if you have some plural iterable xs, iterate with for x in xs.



 if action == 'validate': ...
elif action == "update":


There are only two values, which suggests you might choose to pass in a boolean. If three values are needed, consider using enum: https://docs.python.org/3/library/enum.html



Ok, I read the function, and I still don't know what it does. Please write a docstring explaining what promise it makes about its boolean return value. The 'update' case, given the current non-comment code, appears to be a constant False function.



In replace_textfile_line, delete the return() line.



True = not False


That's not funny. Do not redefine builtin values with your own computed values. And don't even think about assigning 3.2 to math.pi as the Indiana legislature once did.



if len(sys.argv) < 6 or len(sys.argv) > 7:


Your command line appears to be sufficiently complex to warrant the use of a library: https://docs.python.org/3/library/argparse.html Plus, then you get --help for free!



 format_string = (' Usage: python Source.ROD Start_km End_km P|C Destination.ROD ["Destination_Description"]')


The identifier you were looking for was: usage = ...



if not os.path.isfile(src):


This is correct. But, consider letting open do the validation for you. If the file can't be read, we'll find out soon enough. Same remarks for your subsequent file validations.



if src <> dest:


This is correct but odd. Please use the modern != operator.



if not HashCode_register(HASHFILE, src, "validate"):
print("Invalid file - failed HashCode. Exiting...")


I don't believe you. I don't see an immediate exit. Consider turning that function into one that raises a fatal exception upon finding a file that doesn't validate.



 ...
header_length=fp.tell()


This logic is convoluted. Please write a helper function for it. If reading the entire file is acceptable, then this might suffice:



 header_length = len(''.join(fp.readlines()[:10]))
file_length = os.path.getsize(src)


The with handler is a good style - keep using it.



fp.close()


fp is already closed at this point, thanks to the with.



if (file_length - header_length) % RECORD_SIZE != 0:


This is OK as far as it goes, but I would prefer to see a more helpful diagnostic, e.g.



lines = fp.readlines()
for line in lines[:10]:
if len(line) != RECORD_SIZE: # then show some helpful diagnostic


Oh, wait, I see the whole thing must conform. So use for line in lines:. Also, nrec might come from len(lines).



 fp.seek(RECORD_SIZE * -1, 2)


You're making me nervous here. Using lines[i] and lines[j] would be more robust, in case short or long lines crept into the input file. You seem to believe that every 109th character will be a newline, but you don't validate that.



 fp.seek(BARRIER_COLUMN1 + fpos)
fp.write(BARRIER_VALUE)


Maybe you have a giant file and it's import to patch it up in-place. But your code would be more testable / readable / maintainable if it read lines from a source file and wrote modified lines to a dest file, perhaps with a file rename upon success. You used that pattern very nicely in a previous code fragment. I'm concerned about the possibility of silently corrupting a file you haven't validated. The current code might work great, but after months go by, along with personnel, someone will be faced with a maintenance task that is harder than it needs to be. After all, sometimes file formats change, and fixed record size of 109 might not last until the end of time.



if dest <> src :


Please use !=, and pay attention to flake8 when it tells you to delete the blank before the colon.



from shutil import move


Please keep your imports at top of file. The identifier is generic enough that it might be clearer to phrase it as import shutil and then shutil.move(...)



Keep at it! Getting people to read your code will have a good effect on the clarity of code you write in future.







share|improve this answer















share|improve this answer



share|improve this answer








edited Feb 26 at 3:51


























answered Feb 26 at 3:46









J_H

4,317129




4,317129











  • Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
    – nealei
    Mar 1 at 21:29











  • @nealei Python 3 should be an option. Especially for new code.
    – Mathias Ettinger
    May 7 at 8:29










  • No - restricted to 2.7 on corporate system.
    – nealei
    May 7 at 8:56
















  • Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
    – nealei
    Mar 1 at 21:29











  • @nealei Python 3 should be an option. Especially for new code.
    – Mathias Ettinger
    May 7 at 8:29










  • No - restricted to 2.7 on corporate system.
    – nealei
    May 7 at 8:56















Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
– nealei
Mar 1 at 21:29





Python 3 is not an option and I don't know usual python so your comments, especially flake8 advice, are excellent guidance. Exactly which copy do you recommend? Why wouldn't you read hashcode file into memory to update (replace) hashcode? Is there a nicer way to handle those verification errors? (Say msg strings and all exit in same function.) I suppose system errors should generate 'sys.exit(1)'. The datafile and hashcode files are objects - main question: Is there a better OOP way?
– nealei
Mar 1 at 21:29













@nealei Python 3 should be an option. Especially for new code.
– Mathias Ettinger
May 7 at 8:29




@nealei Python 3 should be an option. Especially for new code.
– Mathias Ettinger
May 7 at 8:29












No - restricted to 2.7 on corporate system.
– nealei
May 7 at 8:56




No - restricted to 2.7 on corporate system.
– nealei
May 7 at 8:56












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f188335%2fupdating-data-file-records%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation