Merging two files' lines of text for output

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
5
down vote

favorite












My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:



a b c d e f g h i j k l m
n o p q r s t u v w x y z


and two.txt:



0 1 2 3 4
5 6 7 8 9


The output is:



a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9


However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.



// Program to merge lines from two files and output results
#include <stdio.h>

int main(int argc, char *argv)

char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;

// Ensure correct usage
if (argc != 3)

fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;


// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName1);
return 2;


if ((in2 = fopen(inName2, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName2);
return 3;


// Take care of output
while ((c = getc(in1)) != EOF)

if (c != 'n')
putc(c, stdout);
else

while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;



if (d == EOF)

while (c != EOF)

putc(c, stdout);
c = getc(in1);





if (c == EOF)

while ((d = getc(in2)) != EOF)

putc(d, stdout);



fclose(in1);
fclose(in2);

printf("nProcess ended successfully.n");

return 0;







share|improve this question



















  • So it's supposed to be exactly like paste -d ''?
    – Toby Speight
    Mar 26 at 10:38










  • Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
    – chux
    Mar 27 at 12:30

















up vote
5
down vote

favorite












My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:



a b c d e f g h i j k l m
n o p q r s t u v w x y z


and two.txt:



0 1 2 3 4
5 6 7 8 9


The output is:



a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9


However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.



// Program to merge lines from two files and output results
#include <stdio.h>

int main(int argc, char *argv)

char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;

// Ensure correct usage
if (argc != 3)

fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;


// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName1);
return 2;


if ((in2 = fopen(inName2, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName2);
return 3;


// Take care of output
while ((c = getc(in1)) != EOF)

if (c != 'n')
putc(c, stdout);
else

while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;



if (d == EOF)

while (c != EOF)

putc(c, stdout);
c = getc(in1);





if (c == EOF)

while ((d = getc(in2)) != EOF)

putc(d, stdout);



fclose(in1);
fclose(in2);

printf("nProcess ended successfully.n");

return 0;







share|improve this question



















  • So it's supposed to be exactly like paste -d ''?
    – Toby Speight
    Mar 26 at 10:38










  • Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
    – chux
    Mar 27 at 12:30













up vote
5
down vote

favorite









up vote
5
down vote

favorite











My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:



a b c d e f g h i j k l m
n o p q r s t u v w x y z


and two.txt:



0 1 2 3 4
5 6 7 8 9


The output is:



a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9


However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.



// Program to merge lines from two files and output results
#include <stdio.h>

int main(int argc, char *argv)

char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;

// Ensure correct usage
if (argc != 3)

fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;


// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName1);
return 2;


if ((in2 = fopen(inName2, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName2);
return 3;


// Take care of output
while ((c = getc(in1)) != EOF)

if (c != 'n')
putc(c, stdout);
else

while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;



if (d == EOF)

while (c != EOF)

putc(c, stdout);
c = getc(in1);





if (c == EOF)

while ((d = getc(in2)) != EOF)

putc(d, stdout);



fclose(in1);
fclose(in2);

printf("nProcess ended successfully.n");

return 0;







share|improve this question











My program is supposed to merge two text files' lines together. For instance, if I have two files, one.txt:



a b c d e f g h i j k l m
n o p q r s t u v w x y z


and two.txt:



0 1 2 3 4
5 6 7 8 9


The output is:



a b c d e f g h i j k l m0 1 2 3 4
n o p q r s t u v w x y z5 6 7 8 9


However, I feel as though some of the conditions are redundant. However, I'm not sure how I can better design this.



// Program to merge lines from two files and output results
#include <stdio.h>

int main(int argc, char *argv)

char *inName1 = argv[1], *inName2 = argv[2];
FILE *in1, *in2;
int c, d;

// Ensure correct usage
if (argc != 3)

fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;


// Open input files and return if unable to open
if ((in1 = fopen(inName1, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName1);
return 2;


if ((in2 = fopen(inName2, "r")) == NULL)

fprintf(stderr, "Can't open %s.n", inName2);
return 3;


// Take care of output
while ((c = getc(in1)) != EOF)

if (c != 'n')
putc(c, stdout);
else

while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;



if (d == EOF)

while (c != EOF)

putc(c, stdout);
c = getc(in1);





if (c == EOF)

while ((d = getc(in2)) != EOF)

putc(d, stdout);



fclose(in1);
fclose(in2);

printf("nProcess ended successfully.n");

return 0;









share|improve this question










share|improve this question




share|improve this question









asked Mar 26 at 4:01









Areg Sarvazyan

284




284











  • So it's supposed to be exactly like paste -d ''?
    – Toby Speight
    Mar 26 at 10:38










  • Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
    – chux
    Mar 27 at 12:30

















  • So it's supposed to be exactly like paste -d ''?
    – Toby Speight
    Mar 26 at 10:38










  • Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
    – chux
    Mar 27 at 12:30
















So it's supposed to be exactly like paste -d ''?
– Toby Speight
Mar 26 at 10:38




So it's supposed to be exactly like paste -d ''?
– Toby Speight
Mar 26 at 10:38












Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
– chux
Mar 27 at 12:30





Corner cases not specified: Consider one.txt and two.ext exist like above, but 1 of them lacks a final 'n'. Should output contain a final 'n'? (IMO, it should)
– chux
Mar 27 at 12:30











2 Answers
2






active

oldest

votes

















up vote
3
down vote



accepted










Some good things



This compiles cleanly with all my usual compiler warnings enabled - well done!



You've used stderr appropriately for the output (with the exception of the final message); that's good practice.



Don't dereference argv before you've checked its bounds



This is wrong:



 char *inName1 = argv[1], *inName2 = argv[2];

if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;



We can't read from argv[1] or argv[2] until after we know that argc is big enough:



 if (argc != 3)

fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;

char *inName1 = argv[1], *inName2 = argv[2];


I've enhanced your error message to use the actual name of the program, as passed in argv[0].



Report actual error type on failure



"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:



 FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;



I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.



A slight tweak to the logic



The loops look mostly correct; there's a small duplication we can avoid here:



 while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;




We can bring the putc() outside the if-else like this:



 while ((d = getc(in2)) != EOF) 
putc(d, stdout);
if (d == 'n')
break;




A redundant test



while ((c = getc(in1)) != EOF)

/* ... */


/* c can only be EOF here */
if (c == EOF)

/* ... */



Consider some edge cases



You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.



Alternative logic



I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:



int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;


while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;



if (d == EOF)
break;



while ((c = getc(in1)) != EOF)
putc(c, stdout);

while ((d = getc(in2)) != EOF)
putc(d, stdout);



I've kept both c and d variables for this, although we only really need one for both files.



Error checking of output



None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.



Error checking of input



We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().



Final message



printf("nProcess ended successfully.n");


This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.






share|improve this answer























  • Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
    – Areg Sarvazyan
    Mar 26 at 16:08







  • 1




    Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
    – Toby Speight
    Mar 26 at 16:19










  • But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
    – Areg Sarvazyan
    Mar 27 at 0:24










  • Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
    – Toby Speight
    Mar 27 at 7:05






  • 1




    "we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
    – chux
    Mar 27 at 12:46

















up vote
2
down vote













Use functions to avoid repeating code



You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:



int copyline(FILE *in, FILE *out) 
int c;

// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);


// This will return either 'n' or EOF
return c;



Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:



// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);


// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);



Try to read whole lines instead of characters at a time



Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.



If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.



Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.



When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.



Close stdout and check for errors



You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.



If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:



mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt


By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.



Try to make your program as generic as possible



Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.



You might also want to allow specifying only one filename on the command line, or even none at all!






share|improve this answer



















  • 2




    Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
    – Toby Speight
    Mar 27 at 7:09






  • 1




    Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
    – chux
    Mar 27 at 12:33







  • 1




    copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
    – chux
    Mar 27 at 12:40







  • 1




    @G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
    – chux
    Mar 27 at 19:35







  • 1




    @G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
    – chux
    Mar 28 at 1:22










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f190468%2fmerging-two-files-lines-of-text-for-output%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote



accepted










Some good things



This compiles cleanly with all my usual compiler warnings enabled - well done!



You've used stderr appropriately for the output (with the exception of the final message); that's good practice.



Don't dereference argv before you've checked its bounds



This is wrong:



 char *inName1 = argv[1], *inName2 = argv[2];

if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;



We can't read from argv[1] or argv[2] until after we know that argc is big enough:



 if (argc != 3)

fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;

char *inName1 = argv[1], *inName2 = argv[2];


I've enhanced your error message to use the actual name of the program, as passed in argv[0].



Report actual error type on failure



"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:



 FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;



I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.



A slight tweak to the logic



The loops look mostly correct; there's a small duplication we can avoid here:



 while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;




We can bring the putc() outside the if-else like this:



 while ((d = getc(in2)) != EOF) 
putc(d, stdout);
if (d == 'n')
break;




A redundant test



while ((c = getc(in1)) != EOF)

/* ... */


/* c can only be EOF here */
if (c == EOF)

/* ... */



Consider some edge cases



You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.



Alternative logic



I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:



int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;


while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;



if (d == EOF)
break;



while ((c = getc(in1)) != EOF)
putc(c, stdout);

while ((d = getc(in2)) != EOF)
putc(d, stdout);



I've kept both c and d variables for this, although we only really need one for both files.



Error checking of output



None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.



Error checking of input



We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().



Final message



printf("nProcess ended successfully.n");


This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.






share|improve this answer























  • Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
    – Areg Sarvazyan
    Mar 26 at 16:08







  • 1




    Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
    – Toby Speight
    Mar 26 at 16:19










  • But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
    – Areg Sarvazyan
    Mar 27 at 0:24










  • Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
    – Toby Speight
    Mar 27 at 7:05






  • 1




    "we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
    – chux
    Mar 27 at 12:46














up vote
3
down vote



accepted










Some good things



This compiles cleanly with all my usual compiler warnings enabled - well done!



You've used stderr appropriately for the output (with the exception of the final message); that's good practice.



Don't dereference argv before you've checked its bounds



This is wrong:



 char *inName1 = argv[1], *inName2 = argv[2];

if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;



We can't read from argv[1] or argv[2] until after we know that argc is big enough:



 if (argc != 3)

fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;

char *inName1 = argv[1], *inName2 = argv[2];


I've enhanced your error message to use the actual name of the program, as passed in argv[0].



Report actual error type on failure



"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:



 FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;



I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.



A slight tweak to the logic



The loops look mostly correct; there's a small duplication we can avoid here:



 while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;




We can bring the putc() outside the if-else like this:



 while ((d = getc(in2)) != EOF) 
putc(d, stdout);
if (d == 'n')
break;




A redundant test



while ((c = getc(in1)) != EOF)

/* ... */


/* c can only be EOF here */
if (c == EOF)

/* ... */



Consider some edge cases



You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.



Alternative logic



I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:



int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;


while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;



if (d == EOF)
break;



while ((c = getc(in1)) != EOF)
putc(c, stdout);

while ((d = getc(in2)) != EOF)
putc(d, stdout);



I've kept both c and d variables for this, although we only really need one for both files.



Error checking of output



None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.



Error checking of input



We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().



Final message



printf("nProcess ended successfully.n");


This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.






share|improve this answer























  • Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
    – Areg Sarvazyan
    Mar 26 at 16:08







  • 1




    Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
    – Toby Speight
    Mar 26 at 16:19










  • But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
    – Areg Sarvazyan
    Mar 27 at 0:24










  • Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
    – Toby Speight
    Mar 27 at 7:05






  • 1




    "we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
    – chux
    Mar 27 at 12:46












up vote
3
down vote



accepted







up vote
3
down vote



accepted






Some good things



This compiles cleanly with all my usual compiler warnings enabled - well done!



You've used stderr appropriately for the output (with the exception of the final message); that's good practice.



Don't dereference argv before you've checked its bounds



This is wrong:



 char *inName1 = argv[1], *inName2 = argv[2];

if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;



We can't read from argv[1] or argv[2] until after we know that argc is big enough:



 if (argc != 3)

fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;

char *inName1 = argv[1], *inName2 = argv[2];


I've enhanced your error message to use the actual name of the program, as passed in argv[0].



Report actual error type on failure



"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:



 FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;



I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.



A slight tweak to the logic



The loops look mostly correct; there's a small duplication we can avoid here:



 while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;




We can bring the putc() outside the if-else like this:



 while ((d = getc(in2)) != EOF) 
putc(d, stdout);
if (d == 'n')
break;




A redundant test



while ((c = getc(in1)) != EOF)

/* ... */


/* c can only be EOF here */
if (c == EOF)

/* ... */



Consider some edge cases



You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.



Alternative logic



I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:



int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;


while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;



if (d == EOF)
break;



while ((c = getc(in1)) != EOF)
putc(c, stdout);

while ((d = getc(in2)) != EOF)
putc(d, stdout);



I've kept both c and d variables for this, although we only really need one for both files.



Error checking of output



None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.



Error checking of input



We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().



Final message



printf("nProcess ended successfully.n");


This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.






share|improve this answer















Some good things



This compiles cleanly with all my usual compiler warnings enabled - well done!



You've used stderr appropriately for the output (with the exception of the final message); that's good practice.



Don't dereference argv before you've checked its bounds



This is wrong:



 char *inName1 = argv[1], *inName2 = argv[2];

if (argc != 3)
fprintf(stderr, "Usage: ./merge <file1> <file2>n");
return 1;



We can't read from argv[1] or argv[2] until after we know that argc is big enough:



 if (argc != 3)

fprintf(stderr, "Usage: %s <file1> <file2>n", argv[0]);
return 1;

char *inName1 = argv[1], *inName2 = argv[2];


I've enhanced your error message to use the actual name of the program, as passed in argv[0].



Report actual error type on failure



"Can't open " at least specifies which file couldn't be opened, but we can be more specific (file not found, permission denied, ...) using the perror() function:



 FILE *const in1 = fopen(inName1, "r");
if (!in1)
perror(inName1);
return 1;



I also changed this to return 1 from any error - there isn't any great benefit in distinguishing the different causes in the error code.



A slight tweak to the logic



The loops look mostly correct; there's a small duplication we can avoid here:



 while ((d = getc(in2)) != EOF)

if (d != 'n')
putc(d, stdout);
else

putc('n', stdout);
break;




We can bring the putc() outside the if-else like this:



 while ((d = getc(in2)) != EOF) 
putc(d, stdout);
if (d == 'n')
break;




A redundant test



while ((c = getc(in1)) != EOF)

/* ... */


/* c can only be EOF here */
if (c == EOF)

/* ... */



Consider some edge cases



You've taken good care to ensure that when one file ends, the rest of the other is output. Have you thought about when to output a newline, if the ending file doesn't itself end with one. It's not clear what you intend to happen in that case - comments should explain the desired output.



Alternative logic



I prefer a less deeply nested set of loops, where the first loop is the "normal" case where we're reading from both files, and we leave that loop when either file is exhausted, and finish copying from the other:



int c, d;
// Take care of output
while ((c = getc(in1)) != EOF)
if (c != 'n')
putc(c, stdout);
continue;


while ((d = getc(in2)) != EOF)
putc(d, stdout);
if (d == 'n')
break;



if (d == EOF)
break;



while ((c = getc(in1)) != EOF)
putc(c, stdout);

while ((d = getc(in2)) != EOF)
putc(d, stdout);



I've kept both c and d variables for this, although we only really need one for both files.



Error checking of output



None of the above code ever checks the return value of putc(). Filesystems do fill up, and users do disconnect streams and worse, so we really ought to ensure that the writes succeed.



Error checking of input



We currently assume that EOF returned from getc() means that we've reached the end of the file. However, it could mean that we have an error of reading, and we should distinguish those cases using feof() or ferror().



Final message



printf("nProcess ended successfully.n");


This isn't part of the output (and you don't want it to be passed along in a pipeline). Send status messages to stderr - or better still, be silent whilst successful, like the standard utilities.







share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 2 at 10:21


























answered Mar 26 at 11:18









Toby Speight

17.6k13490




17.6k13490











  • Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
    – Areg Sarvazyan
    Mar 26 at 16:08







  • 1




    Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
    – Toby Speight
    Mar 26 at 16:19










  • But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
    – Areg Sarvazyan
    Mar 27 at 0:24










  • Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
    – Toby Speight
    Mar 27 at 7:05






  • 1




    "we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
    – chux
    Mar 27 at 12:46
















  • Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
    – Areg Sarvazyan
    Mar 26 at 16:08







  • 1




    Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
    – Toby Speight
    Mar 26 at 16:19










  • But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
    – Areg Sarvazyan
    Mar 27 at 0:24










  • Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
    – Toby Speight
    Mar 27 at 7:05






  • 1




    "we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
    – chux
    Mar 27 at 12:46















Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
– Areg Sarvazyan
Mar 26 at 16:08





Thanks a lot for this! I, however, have some doubts. I haven't learned much about error checking in I/O, so how would I go about checking the return value of putc() and getc()? And why would using perror() make the program more specific? Does the fact that the FILE * points to NULL mean that the permission to access the file was denied?
– Areg Sarvazyan
Mar 26 at 16:08





1




1




Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
– Toby Speight
Mar 26 at 16:19




Error checking - if (putc(c, stdout) == EOF) then something went wrong, and c was not written. perror() uses the contents of errno (which is written by the failing function) to produce an error message (in the user's preferred language), such as "permission denied", or "file not found". Read the man pages for all three of those functions, and you'll learn how to spot these details!
– Toby Speight
Mar 26 at 16:19












But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
– Areg Sarvazyan
Mar 27 at 0:24




But wouldn't (putc(c, stdout) == EOF) be redundant when the character I am scanning with getc() is always being evaluated agains EOF? I always end up outputting the character I read, don't I? And yes, I'll be reading the man pages for those! Thanks.
– Areg Sarvazyan
Mar 27 at 0:24












Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
– Toby Speight
Mar 27 at 7:05




Read the man page for putc() again - it will return c when it succeeds, but EOF when it fails (and the reason will be noted in errno).
– Toby Speight
Mar 27 at 7:05




1




1




"we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
– chux
Mar 27 at 12:46




"we have an error of reading, and we should check that using errno." --> errno is not specified to be set. Use feof()/ferror() to distinguish end-of-file from error.
– chux
Mar 27 at 12:46












up vote
2
down vote













Use functions to avoid repeating code



You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:



int copyline(FILE *in, FILE *out) 
int c;

// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);


// This will return either 'n' or EOF
return c;



Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:



// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);


// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);



Try to read whole lines instead of characters at a time



Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.



If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.



Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.



When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.



Close stdout and check for errors



You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.



If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:



mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt


By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.



Try to make your program as generic as possible



Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.



You might also want to allow specifying only one filename on the command line, or even none at all!






share|improve this answer



















  • 2




    Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
    – Toby Speight
    Mar 27 at 7:09






  • 1




    Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
    – chux
    Mar 27 at 12:33







  • 1




    copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
    – chux
    Mar 27 at 12:40







  • 1




    @G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
    – chux
    Mar 27 at 19:35







  • 1




    @G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
    – chux
    Mar 28 at 1:22














up vote
2
down vote













Use functions to avoid repeating code



You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:



int copyline(FILE *in, FILE *out) 
int c;

// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);


// This will return either 'n' or EOF
return c;



Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:



// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);


// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);



Try to read whole lines instead of characters at a time



Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.



If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.



Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.



When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.



Close stdout and check for errors



You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.



If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:



mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt


By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.



Try to make your program as generic as possible



Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.



You might also want to allow specifying only one filename on the command line, or even none at all!






share|improve this answer



















  • 2




    Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
    – Toby Speight
    Mar 27 at 7:09






  • 1




    Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
    – chux
    Mar 27 at 12:33







  • 1




    copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
    – chux
    Mar 27 at 12:40







  • 1




    @G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
    – chux
    Mar 27 at 19:35







  • 1




    @G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
    – chux
    Mar 28 at 1:22












up vote
2
down vote










up vote
2
down vote









Use functions to avoid repeating code



You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:



int copyline(FILE *in, FILE *out) 
int c;

// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);


// This will return either 'n' or EOF
return c;



Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:



// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);


// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);



Try to read whole lines instead of characters at a time



Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.



If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.



Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.



When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.



Close stdout and check for errors



You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.



If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:



mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt


By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.



Try to make your program as generic as possible



Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.



You might also want to allow specifying only one filename on the command line, or even none at all!






share|improve this answer















Use functions to avoid repeating code



You have one big main() function that does everything. While this is a small program, it still is better to split off some logical pieces of code into their own functions. In this case, an operation that you repeat several times is copying one line of input from one input file to stdout. So write a function like this:



int copyline(FILE *in, FILE *out) 
int c;

// Copy all characters up to a 'n' or EOF
while ((c = getc(in)) != 'n' && c != EOF)
putc(c, out);


// This will return either 'n' or EOF
return c;



Note how this function returns the last result from getc().
In main, you can use it to simplify your code like this:



// Copy all lines from in1
while (copyline(in1, stdout) != EOF)
copyline(in2, stdout);
putc('n', stdout);


// Copy any remaining lines from in2
while (copyline(in2, stdout) != EOF)
putc('n', stdout);



Try to read whole lines instead of characters at a time



Now that you already have working code, you can think of optimizing it.
Your code reads and writes one character at a time. This might not be ideal.
Function calls like getc() and putc() are not free, and since they cannot be inlined, the compiler will have a hard time optimizing your loops.



If you can read and write whole lines at once, that would be better.
If you split of your line copying code into a function like copyline() above, then you only need to upgrade that function. One possibility is to use
the fgets() and fputs() functions that read and write whole lines at once.
Be aware though that fgets() can only read as many characters as the size of the buffer you supply (minus one), so you need to detect and handle longer lines somehow. Also, these functions assume you are working with C strings,
which use NUL bytes as terminators, and as such they will give you problems if
you want your program to be able to handle lines that contain NUL bytes.



Another option is to fread() the whole file into memory, or to use mmap() to map it into memory, and implement your own function to scan the memory for newline characters.



When optimizing code, make sure that you run benchmarks on the old and new implementations, never assume that a particular implementation is faster or slower than another one without having measured it.



Close stdout and check for errors



You should call fclose(stdout) as well at the end, and check for errors. File output is buffered, and the fclose() call will ensure any buffers are flushed. Errors can definitely happen; for example, the user might have redirected stdout to a file, and the filesystem did not have enough space to hold the output.



If you silently ignore these errors, data corruption might happen more easily. For example, if you want to automate a process where two intermediate output files are generated, but you are only interested in the merged output, and will delete the temporary files afterwards, you might have a shell script that contains this line:



mergelines temp1.txt temp2.txt >result.txt && rm temp*.txt


By returning an error, you will have prevented the loss of the original files.
Note, GNU Coreutils, the package that contains all the basic UNIX text processing utilities like cat, does exactly that.



Try to make your program as generic as possible



Your program can merge the lines from two files, but what if someone wants to merge the lines from three files? This sounds like a very logical extension, and if you made your program a little bit more generic, you can easily handle an arbitrary number of input files. Just make an array of FILE * pointers, as big as argc - 1, and for each filename on the command line, open that file. Then copy lines from all open files, until all of them return EOF.



You might also want to allow specifying only one filename on the command line, or even none at all!







share|improve this answer















share|improve this answer



share|improve this answer








edited Mar 27 at 17:49


























answered Mar 26 at 20:26









G. Sliepen

62638




62638







  • 2




    Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
    – Toby Speight
    Mar 27 at 7:09






  • 1




    Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
    – chux
    Mar 27 at 12:33







  • 1




    copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
    – chux
    Mar 27 at 12:40







  • 1




    @G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
    – chux
    Mar 27 at 19:35







  • 1




    @G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
    – chux
    Mar 28 at 1:22












  • 2




    Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
    – Toby Speight
    Mar 27 at 7:09






  • 1




    Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
    – chux
    Mar 27 at 12:33







  • 1




    copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
    – chux
    Mar 27 at 12:40







  • 1




    @G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
    – chux
    Mar 27 at 19:35







  • 1




    @G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
    – chux
    Mar 28 at 1:22







2




2




Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
– Toby Speight
Mar 27 at 7:09




Er, your first copyline() copies more than just a line... And the second may not be any faster than the first (that's the whole point of stdio - buffering to make char-by-char processing as fast as reading chunks at a time). The other issue with that version is that it will treat any NUL character as line terminator (and drop the rest of the line).
– Toby Speight
Mar 27 at 7:09




1




1




Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
– chux
Mar 27 at 12:33





Bug: copyline(FILE *in, FILE *out) only returns EOF. Change to while ((c = getc(in)) != 'n' && c != EOF) putc(c, out);
– chux
Mar 27 at 12:33





1




1




copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
– chux
Mar 27 at 12:40





copyline() (via fgets()) would make more sense to loop if the line was full and no 'n' found - yet I now see the "You will need to detect and handle the case of longer lines". Better to comment such code limitations in the code.
– chux
Mar 27 at 12:40





1




1




@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
– chux
Mar 27 at 19:35





@G.Sliepen "because fgets() never returns an empty line." is not so empty concerning null character input. If user input is null character, line feed, then line will get written with the 2 characters: null character, line feed, and an appended null character. A lead null character is not as uncommon as one may think. Recall that fgets() stops reading when it encounters a 'n'. If it reads '', it saves that null character and continues like any other non-'n'. Since strlen(line) then returns 0, line[len - 1] is UB.
– chux
Mar 27 at 19:35





1




1




@G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
– chux
Mar 28 at 1:22




@G.Sliepen line[strcspn(line, "n")] = ''; is a one-line alternative to lop off the potential 'n' without UB. fgets() and null character input is problematic though. My sample alternative
– chux
Mar 28 at 1:22












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f190468%2fmerging-two-files-lines-of-text-for-output%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Python Lists

Aion

JavaScript Array Iteration Methods