Merging two files' lines of text for output

Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
5
down vote
favorite
My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:
a b c d e f g h i j k l m
n o p q r s t u v w x y z
and two.txt:
0 1 2 3 4
5 6 7 8 9
The output is:
a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9
However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.
// Program to merge lines from two files and output results
#include <stdio.h>
int main(int argc, char *argv)
char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;
// Ensure correct usage
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName1);
return 2;
if ((in2 = fopen(inName2, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName2);
return 3;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
else
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
if (d == EOF)
while (c != EOF)
putc(c, stdout);
c = getc(in1);
if (c == EOF)
while ((d = getc(in2)) != EOF)
putc(d, stdout);
fclose(in1);
fclose(in2);
printf("nProcess ended successfully.n");
return 0;
c file file-system
add a comment |Â
up vote
5
down vote
favorite
My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:
a b c d e f g h i j k l m
n o p q r s t u v w x y z
and two.txt:
0 1 2 3 4
5 6 7 8 9
The output is:
a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9
However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.
// Program to merge lines from two files and output results
#include <stdio.h>
int main(int argc, char *argv)
char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;
// Ensure correct usage
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName1);
return 2;
if ((in2 = fopen(inName2, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName2);
return 3;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
else
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
if (d == EOF)
while (c != EOF)
putc(c, stdout);
c = getc(in1);
if (c == EOF)
while ((d = getc(in2)) != EOF)
putc(d, stdout);
fclose(in1);
fclose(in2);
printf("nProcess ended successfully.n");
return 0;
c file file-system
So it's supposed to be exactly likepaste -d ''?
â Toby Speight
Mar 26 at 10:38
Corner cases not specified: Considerone.txtandtwo.extexist like above, but 1 of them lacks a final'n'. Should output contain a final'n'? (IMO, it should)
â chux
Mar 27 at 12:30
add a comment |Â
up vote
5
down vote
favorite
up vote
5
down vote
favorite
My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:
a b c d e f g h i j k l m
n o p q r s t u v w x y z
and two.txt:
0 1 2 3 4
5 6 7 8 9
The output is:
a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9
However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.
// Program to merge lines from two files and output results
#include <stdio.h>
int main(int argc, char *argv)
char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;
// Ensure correct usage
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName1);
return 2;
if ((in2 = fopen(inName2, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName2);
return 3;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
else
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
if (d == EOF)
while (c != EOF)
putc(c, stdout);
c = getc(in1);
if (c == EOF)
while ((d = getc(in2)) != EOF)
putc(d, stdout);
fclose(in1);
fclose(in2);
printf("nProcess ended successfully.n");
return 0;
c file file-system
My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:
a b c d e f g h i j k l m
n o p q r s t u v w x y z
and two.txt:
0 1 2 3 4
5 6 7 8 9
The output is:
a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9
However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.
// Program to merge lines from two files and output results
#include <stdio.h>
int main(int argc, char *argv)
char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;
// Ensure correct usage
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName1);
return 2;
if ((in2 = fopen(inName2, "r")) == NULL)
fprintf(stderr, "Can't open %s.n", inName2);
return 3;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
else
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
if (d == EOF)
while (c != EOF)
putc(c, stdout);
c = getc(in1);
if (c == EOF)
while ((d = getc(in2)) != EOF)
putc(d, stdout);
fclose(in1);
fclose(in2);
printf("nProcess ended successfully.n");
return 0;
c file file-system
asked Mar 26 at 4:01
Areg Sarvazyan
284
284
So it's supposed to be exactly likepaste -d ''?
â Toby Speight
Mar 26 at 10:38
Corner cases not specified: Considerone.txtandtwo.extexist like above, but 1 of them lacks a final'n'. Should output contain a final'n'? (IMO, it should)
â chux
Mar 27 at 12:30
add a comment |Â
So it's supposed to be exactly likepaste -d ''?
â Toby Speight
Mar 26 at 10:38
Corner cases not specified: Considerone.txtandtwo.extexist like above, but 1 of them lacks a final'n'. Should output contain a final'n'? (IMO, it should)
â chux
Mar 27 at 12:30
So it's supposed to be exactly like
paste -d ''?â Toby Speight
Mar 26 at 10:38
So it's supposed to be exactly like
paste -d ''?â Toby Speight
Mar 26 at 10:38
Corner cases not specified: Consider
one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)â chux
Mar 27 at 12:30
Corner cases not specified: Consider
one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)â chux
Mar 27 at 12:30
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
3
down vote
accepted
Some good things
This compiles cleanly with all my usual compiler warnings enabled - well done!
You've used stderr appropriately for the output (with the exception of the final message); that's good practice.
Don't dereference argv before you've checked its bounds
This is wrong:
char *inName1 = argv[1], *inName2 = argv[2];
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
We can't read from argv[1] or argv[2] until after we know that argc is big enough:
if (argc != 3)
fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;
char *inName1 = argv[1], *inName2 = argv[2];
I've enhanced your error message to use the actual name of the program, as passed in argv[0].
Report actual error type on failure
"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:
FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;
I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.
A slight tweak to the logic
The loops look mostly correct; there's a small duplication we can avoid here:
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
We can bring the putc() outside the if-else like this:
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
A redundant test
while ((c = getc(in1)) != EOF)
/* ... */
/* c can only be EOF here */
if (c == EOF)
/* ... */
Consider some edge cases
You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.
Alternative logic
I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:
int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
if (d == EOF)
break;
while ((c = getc(in1)) != EOF)
putc(c, stdout);
while ((d = getc(in2)) != EOF)
putc(d, stdout);
I've kept both c and d variables for this, although we only really need one for both files.
Error checking of output
None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.
Error checking of input
We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().
Final message
printf("nProcess ended successfully.n");
This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value ofputc()andgetc()? And why would usingperror()make the program more specific? Does the fact that theFILE *points toNULLmean that the permission to access the file was denied?
â Areg Sarvazyan
Mar 26 at 16:08
1
Error checking -if (putc(c, stdout) == EOF)then something went wrong, andcwas not written.perror()uses the contents oferrno(which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
â Toby Speight
Mar 26 at 16:19
But wouldn't(putc(c, stdout) == EOF)be redundant when the character I am scanning withgetc()is always being evaluated againsEOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
â Areg Sarvazyan
Mar 27 at 0:24
Read the man page forputc()again - it will returncwhen it succeeds, butEOFwhen it fails (and the reason will be noted inerrno).
â Toby Speight
Mar 27 at 7:05
1
"we have an error of reading, and we should check that using errno." -->errnois not specified to be set. Usefeof()/ferror()to distinguish end-of-file from error.
â chux
Mar 27 at 12:46
add a comment |Â
up vote
2
down vote
Use functions to avoid repeating code
You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:
int copyline(FILE *in, FILE *out)
int c;
// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);
// This will return either 'n' or EOF
return c;
Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:
// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);
// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);
Try to read whole lines instead of characters at a time
Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.
If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.
Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.
When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.
Close stdout and check for errors
You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.
If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:
mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt
By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.
Try to make your program as generic as possible
Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.
You might also want to allow specifying only one filename on the command line, or even none at all!
2
Er, your firstcopyline()copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
â Toby Speight
Mar 27 at 7:09
1
Bug:copyline(FILE *in, FILE *out)only returnsEOF. Change towhile ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
â chux
Mar 27 at 12:33
1
copyline()(viafgets()) would make more sense to loop if thelinewas full and no'n'found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
â chux
Mar 27 at 12:40
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, thenlinewill get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall thatfgets()stops reading when it encounters a'n'. If it reads'', it saves that null character and continues like any other non-'n'. Sincestrlen(line)then returns 0,line[len - 1]is UB.
â chux
Mar 27 at 19:35
1
@G.Sliepenline[strcspn(line, "n")] = '';is a one-line alternative to lop off the potential'n'without UB.fgets()and null character input is problematic though. My sample alternative
â chux
Mar 28 at 1:22
 |Â
show 9 more comments
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
Some good things
This compiles cleanly with all my usual compiler warnings enabled - well done!
You've used stderr appropriately for the output (with the exception of the final message); that's good practice.
Don't dereference argv before you've checked its bounds
This is wrong:
char *inName1 = argv[1], *inName2 = argv[2];
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
We can't read from argv[1] or argv[2] until after we know that argc is big enough:
if (argc != 3)
fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;
char *inName1 = argv[1], *inName2 = argv[2];
I've enhanced your error message to use the actual name of the program, as passed in argv[0].
Report actual error type on failure
"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:
FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;
I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.
A slight tweak to the logic
The loops look mostly correct; there's a small duplication we can avoid here:
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
We can bring the putc() outside the if-else like this:
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
A redundant test
while ((c = getc(in1)) != EOF)
/* ... */
/* c can only be EOF here */
if (c == EOF)
/* ... */
Consider some edge cases
You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.
Alternative logic
I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:
int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
if (d == EOF)
break;
while ((c = getc(in1)) != EOF)
putc(c, stdout);
while ((d = getc(in2)) != EOF)
putc(d, stdout);
I've kept both c and d variables for this, although we only really need one for both files.
Error checking of output
None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.
Error checking of input
We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().
Final message
printf("nProcess ended successfully.n");
This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value ofputc()andgetc()? And why would usingperror()make the program more specific? Does the fact that theFILE *points toNULLmean that the permission to access the file was denied?
â Areg Sarvazyan
Mar 26 at 16:08
1
Error checking -if (putc(c, stdout) == EOF)then something went wrong, andcwas not written.perror()uses the contents oferrno(which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
â Toby Speight
Mar 26 at 16:19
But wouldn't(putc(c, stdout) == EOF)be redundant when the character I am scanning withgetc()is always being evaluated againsEOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
â Areg Sarvazyan
Mar 27 at 0:24
Read the man page forputc()again - it will returncwhen it succeeds, butEOFwhen it fails (and the reason will be noted inerrno).
â Toby Speight
Mar 27 at 7:05
1
"we have an error of reading, and we should check that using errno." -->errnois not specified to be set. Usefeof()/ferror()to distinguish end-of-file from error.
â chux
Mar 27 at 12:46
add a comment |Â
up vote
3
down vote
accepted
Some good things
This compiles cleanly with all my usual compiler warnings enabled - well done!
You've used stderr appropriately for the output (with the exception of the final message); that's good practice.
Don't dereference argv before you've checked its bounds
This is wrong:
char *inName1 = argv[1], *inName2 = argv[2];
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
We can't read from argv[1] or argv[2] until after we know that argc is big enough:
if (argc != 3)
fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;
char *inName1 = argv[1], *inName2 = argv[2];
I've enhanced your error message to use the actual name of the program, as passed in argv[0].
Report actual error type on failure
"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:
FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;
I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.
A slight tweak to the logic
The loops look mostly correct; there's a small duplication we can avoid here:
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
We can bring the putc() outside the if-else like this:
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
A redundant test
while ((c = getc(in1)) != EOF)
/* ... */
/* c can only be EOF here */
if (c == EOF)
/* ... */
Consider some edge cases
You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.
Alternative logic
I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:
int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
if (d == EOF)
break;
while ((c = getc(in1)) != EOF)
putc(c, stdout);
while ((d = getc(in2)) != EOF)
putc(d, stdout);
I've kept both c and d variables for this, although we only really need one for both files.
Error checking of output
None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.
Error checking of input
We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().
Final message
printf("nProcess ended successfully.n");
This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value ofputc()andgetc()? And why would usingperror()make the program more specific? Does the fact that theFILE *points toNULLmean that the permission to access the file was denied?
â Areg Sarvazyan
Mar 26 at 16:08
1
Error checking -if (putc(c, stdout) == EOF)then something went wrong, andcwas not written.perror()uses the contents oferrno(which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
â Toby Speight
Mar 26 at 16:19
But wouldn't(putc(c, stdout) == EOF)be redundant when the character I am scanning withgetc()is always being evaluated againsEOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
â Areg Sarvazyan
Mar 27 at 0:24
Read the man page forputc()again - it will returncwhen it succeeds, butEOFwhen it fails (and the reason will be noted inerrno).
â Toby Speight
Mar 27 at 7:05
1
"we have an error of reading, and we should check that using errno." -->errnois not specified to be set. Usefeof()/ferror()to distinguish end-of-file from error.
â chux
Mar 27 at 12:46
add a comment |Â
up vote
3
down vote
accepted
up vote
3
down vote
accepted
Some good things
This compiles cleanly with all my usual compiler warnings enabled - well done!
You've used stderr appropriately for the output (with the exception of the final message); that's good practice.
Don't dereference argv before you've checked its bounds
This is wrong:
char *inName1 = argv[1], *inName2 = argv[2];
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
We can't read from argv[1] or argv[2] until after we know that argc is big enough:
if (argc != 3)
fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;
char *inName1 = argv[1], *inName2 = argv[2];
I've enhanced your error message to use the actual name of the program, as passed in argv[0].
Report actual error type on failure
"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:
FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;
I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.
A slight tweak to the logic
The loops look mostly correct; there's a small duplication we can avoid here:
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
We can bring the putc() outside the if-else like this:
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
A redundant test
while ((c = getc(in1)) != EOF)
/* ... */
/* c can only be EOF here */
if (c == EOF)
/* ... */
Consider some edge cases
You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.
Alternative logic
I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:
int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
if (d == EOF)
break;
while ((c = getc(in1)) != EOF)
putc(c, stdout);
while ((d = getc(in2)) != EOF)
putc(d, stdout);
I've kept both c and d variables for this, although we only really need one for both files.
Error checking of output
None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.
Error checking of input
We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().
Final message
printf("nProcess ended successfully.n");
This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.
Some good things
This compiles cleanly with all my usual compiler warnings enabled - well done!
You've used stderr appropriately for the output (with the exception of the final message); that's good practice.
Don't dereference argv before you've checked its bounds
This is wrong:
char *inName1 = argv[1], *inName2 = argv[2];
if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;
We can't read from argv[1] or argv[2] until after we know that argc is big enough:
if (argc != 3)
fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;
char *inName1 = argv[1], *inName2 = argv[2];
I've enhanced your error message to use the actual name of the program, as passed in argv[0].
Report actual error type on failure
"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:
FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;
I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.
A slight tweak to the logic
The loops look mostly correct; there's a small duplication we can avoid here:
while ((d = getc(in2)) != EOF)
if (d != 'n')
putc(d, stdout);
else
putc('n', stdout);
break;
We can bring the putc() outside the if-else like this:
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
A redundant test
while ((c = getc(in1)) != EOF)
/* ... */
/* c can only be EOF here */
if (c == EOF)
/* ... */
Consider some edge cases
You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.
Alternative logic
I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:
int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;
while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;
if (d == EOF)
break;
while ((c = getc(in1)) != EOF)
putc(c, stdout);
while ((d = getc(in2)) != EOF)
putc(d, stdout);
I've kept both c and d variables for this, although we only really need one for both files.
Error checking of output
None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.
Error checking of input
We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().
Final message
printf("nProcess ended successfully.n");
This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.
edited Apr 2 at 10:21
answered Mar 26 at 11:18
Toby Speight
17.6k13490
17.6k13490
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value ofputc()andgetc()? And why would usingperror()make the program more specific? Does the fact that theFILE *points toNULLmean that the permission to access the file was denied?
â Areg Sarvazyan
Mar 26 at 16:08
1
Error checking -if (putc(c, stdout) == EOF)then something went wrong, andcwas not written.perror()uses the contents oferrno(which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
â Toby Speight
Mar 26 at 16:19
But wouldn't(putc(c, stdout) == EOF)be redundant when the character I am scanning withgetc()is always being evaluated againsEOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
â Areg Sarvazyan
Mar 27 at 0:24
Read the man page forputc()again - it will returncwhen it succeeds, butEOFwhen it fails (and the reason will be noted inerrno).
â Toby Speight
Mar 27 at 7:05
1
"we have an error of reading, and we should check that using errno." -->errnois not specified to be set. Usefeof()/ferror()to distinguish end-of-file from error.
â chux
Mar 27 at 12:46
add a comment |Â
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value ofputc()andgetc()? And why would usingperror()make the program more specific? Does the fact that theFILE *points toNULLmean that the permission to access the file was denied?
â Areg Sarvazyan
Mar 26 at 16:08
1
Error checking -if (putc(c, stdout) == EOF)then something went wrong, andcwas not written.perror()uses the contents oferrno(which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
â Toby Speight
Mar 26 at 16:19
But wouldn't(putc(c, stdout) == EOF)be redundant when the character I am scanning withgetc()is always being evaluated againsEOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
â Areg Sarvazyan
Mar 27 at 0:24
Read the man page forputc()again - it will returncwhen it succeeds, butEOFwhen it fails (and the reason will be noted inerrno).
â Toby Speight
Mar 27 at 7:05
1
"we have an error of reading, and we should check that using errno." -->errnois not specified to be set. Usefeof()/ferror()to distinguish end-of-file from error.
â chux
Mar 27 at 12:46
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of
putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?â Areg Sarvazyan
Mar 26 at 16:08
Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of
putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?â Areg Sarvazyan
Mar 26 at 16:08
1
1
Error checking -
if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!â Toby Speight
Mar 26 at 16:19
Error checking -
if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!â Toby Speight
Mar 26 at 16:19
But wouldn't
(putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.â Areg Sarvazyan
Mar 27 at 0:24
But wouldn't
(putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.â Areg Sarvazyan
Mar 27 at 0:24
Read the man page for
putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).â Toby Speight
Mar 27 at 7:05
Read the man page for
putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).â Toby Speight
Mar 27 at 7:05
1
1
"we have an error of reading, and we should check that using errno." -->
errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.â chux
Mar 27 at 12:46
"we have an error of reading, and we should check that using errno." -->
errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.â chux
Mar 27 at 12:46
add a comment |Â
up vote
2
down vote
Use functions to avoid repeating code
You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:
int copyline(FILE *in, FILE *out)
int c;
// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);
// This will return either 'n' or EOF
return c;
Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:
// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);
// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);
Try to read whole lines instead of characters at a time
Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.
If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.
Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.
When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.
Close stdout and check for errors
You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.
If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:
mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt
By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.
Try to make your program as generic as possible
Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.
You might also want to allow specifying only one filename on the command line, or even none at all!
2
Er, your firstcopyline()copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
â Toby Speight
Mar 27 at 7:09
1
Bug:copyline(FILE *in, FILE *out)only returnsEOF. Change towhile ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
â chux
Mar 27 at 12:33
1
copyline()(viafgets()) would make more sense to loop if thelinewas full and no'n'found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
â chux
Mar 27 at 12:40
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, thenlinewill get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall thatfgets()stops reading when it encounters a'n'. If it reads'', it saves that null character and continues like any other non-'n'. Sincestrlen(line)then returns 0,line[len - 1]is UB.
â chux
Mar 27 at 19:35
1
@G.Sliepenline[strcspn(line, "n")] = '';is a one-line alternative to lop off the potential'n'without UB.fgets()and null character input is problematic though. My sample alternative
â chux
Mar 28 at 1:22
 |Â
show 9 more comments
up vote
2
down vote
Use functions to avoid repeating code
You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:
int copyline(FILE *in, FILE *out)
int c;
// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);
// This will return either 'n' or EOF
return c;
Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:
// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);
// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);
Try to read whole lines instead of characters at a time
Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.
If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.
Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.
When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.
Close stdout and check for errors
You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.
If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:
mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt
By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.
Try to make your program as generic as possible
Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.
You might also want to allow specifying only one filename on the command line, or even none at all!
2
Er, your firstcopyline()copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
â Toby Speight
Mar 27 at 7:09
1
Bug:copyline(FILE *in, FILE *out)only returnsEOF. Change towhile ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
â chux
Mar 27 at 12:33
1
copyline()(viafgets()) would make more sense to loop if thelinewas full and no'n'found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
â chux
Mar 27 at 12:40
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, thenlinewill get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall thatfgets()stops reading when it encounters a'n'. If it reads'', it saves that null character and continues like any other non-'n'. Sincestrlen(line)then returns 0,line[len - 1]is UB.
â chux
Mar 27 at 19:35
1
@G.Sliepenline[strcspn(line, "n")] = '';is a one-line alternative to lop off the potential'n'without UB.fgets()and null character input is problematic though. My sample alternative
â chux
Mar 28 at 1:22
 |Â
show 9 more comments
up vote
2
down vote
up vote
2
down vote
Use functions to avoid repeating code
You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:
int copyline(FILE *in, FILE *out)
int c;
// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);
// This will return either 'n' or EOF
return c;
Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:
// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);
// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);
Try to read whole lines instead of characters at a time
Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.
If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.
Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.
When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.
Close stdout and check for errors
You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.
If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:
mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt
By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.
Try to make your program as generic as possible
Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.
You might also want to allow specifying only one filename on the command line, or even none at all!
Use functions to avoid repeating code
You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:
int copyline(FILE *in, FILE *out)
int c;
// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);
// This will return either 'n' or EOF
return c;
Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:
// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);
// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);
Try to read whole lines instead of characters at a time
Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.
If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.
Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.
When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.
Close stdout and check for errors
You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.
If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:
mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt
By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.
Try to make your program as generic as possible
Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.
You might also want to allow specifying only one filename on the command line, or even none at all!
edited Mar 27 at 17:49
answered Mar 26 at 20:26
G. Sliepen
62638
62638
2
Er, your firstcopyline()copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
â Toby Speight
Mar 27 at 7:09
1
Bug:copyline(FILE *in, FILE *out)only returnsEOF. Change towhile ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
â chux
Mar 27 at 12:33
1
copyline()(viafgets()) would make more sense to loop if thelinewas full and no'n'found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
â chux
Mar 27 at 12:40
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, thenlinewill get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall thatfgets()stops reading when it encounters a'n'. If it reads'', it saves that null character and continues like any other non-'n'. Sincestrlen(line)then returns 0,line[len - 1]is UB.
â chux
Mar 27 at 19:35
1
@G.Sliepenline[strcspn(line, "n")] = '';is a one-line alternative to lop off the potential'n'without UB.fgets()and null character input is problematic though. My sample alternative
â chux
Mar 28 at 1:22
 |Â
show 9 more comments
2
Er, your firstcopyline()copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
â Toby Speight
Mar 27 at 7:09
1
Bug:copyline(FILE *in, FILE *out)only returnsEOF. Change towhile ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
â chux
Mar 27 at 12:33
1
copyline()(viafgets()) would make more sense to loop if thelinewas full and no'n'found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
â chux
Mar 27 at 12:40
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, thenlinewill get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall thatfgets()stops reading when it encounters a'n'. If it reads'', it saves that null character and continues like any other non-'n'. Sincestrlen(line)then returns 0,line[len - 1]is UB.
â chux
Mar 27 at 19:35
1
@G.Sliepenline[strcspn(line, "n")] = '';is a one-line alternative to lop off the potential'n'without UB.fgets()and null character input is problematic though. My sample alternative
â chux
Mar 28 at 1:22
2
2
Er, your first
copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).â Toby Speight
Mar 27 at 7:09
Er, your first
copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).â Toby Speight
Mar 27 at 7:09
1
1
Bug:
copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out); â chux
Mar 27 at 12:33
Bug:
copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out); â chux
Mar 27 at 12:33
1
1
copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.â chux
Mar 27 at 12:40
copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.â chux
Mar 27 at 12:40
1
1
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then
line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.â chux
Mar 27 at 19:35
@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then
line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.â chux
Mar 27 at 19:35
1
1
@G.Sliepen
line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternativeâ chux
Mar 28 at 1:22
@G.Sliepen
line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternativeâ chux
Mar 28 at 1:22
 |Â
show 9 more comments
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f190468%2fmerging-two-files-lines-of-text-for-output%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
So it's supposed to be exactly like
paste -d ''?â Toby Speight
Mar 26 at 10:38
Corner cases not specified: Consider
one.txtandtwo.extexist like above, but 1 of them lacks a final'n'. Should output contain a final'n'? (IMO, it should)â chux
Mar 27 at 12:30