Determining the similarity between two documents













I've written some code that reads in text files, each holding a fairly large vector of word frequencies, and stores each element of a vector in the ArrayList for its team (e.g. the Arsenal vector in the Arsenal ArrayList, the Chelsea vector in the Chelsea ArrayList, etc.). It then uses a cosine similarity function to determine the similarity between two documents and writes the result to a file.



What I would like is to make the code that reads in the text files and stores them in their corresponding ArrayLists more efficient, so that I don't have to change the parameters of the while loop each time I need to use it.



    public static Double cosineSimilarity(ArrayList<Integer> vectorOne, ArrayList<Integer> vectorTwo) {
        Double dotProduct = 0.0;
        Double normVecA = 0.0;
        Double normVecB = 0.0;
        for (int i = 0; i < vectorOne.size(); i++) {
            dotProduct += vectorOne.get(i) * vectorTwo.get(i);
            normVecA += Math.pow(vectorOne.get(i), 2);
            normVecB += Math.pow(vectorTwo.get(i), 2);
        }

        return dotProduct / (Math.sqrt(normVecA) * Math.sqrt(normVecB));
    }

    public static void main(String[] args) throws IOException {
        ArrayList<Integer> arsenal = new ArrayList<Integer>();
        ArrayList<Integer> chelsea = new ArrayList<Integer>();
        ArrayList<Integer> liverpool = new ArrayList<Integer>();
        ArrayList<Integer> manchesterCity = new ArrayList<Integer>();
        ArrayList<Integer> manchesterUnited = new ArrayList<Integer>();
        ArrayList<Integer> tottenham = new ArrayList<Integer>();

        Scanner textFile = new Scanner(new File("Enter textfile here"));

        while (textFile.hasNext()) {
            arsenal.add(textFile.nextInt());
        }

        Double output = cosineSimilarity(arsenal, chelsea);

        File fileName;
        FileWriter fw;

        // Create a new textfile for listOfWords
        fileName = new File("arsenalCosineSimilarities.txt");
        fw = new FileWriter(fileName, true);

        fw.write(String.valueOf("Chelsea: " + output + "\n"));

        fw.close();
    }











In the current example code, chelsea never gets any values. Do you have an example text file, and can you make this code runnable and testable so that we can try it out ourselves?
– Simon Forsberg♦
Jun 24 at 15:56
















edited Jun 24 at 16:38 by 200_success

asked Jun 24 at 15:39 by FeelingLikeAJabroni







1 Answer



accepted










For flexibility, your cosineSimilarity() method should take List<Integer> arguments instead of ArrayList<Integer> arguments. The method doesn't care how the list is stored, only that it is a list that implements the .size() and .get(i) methods.



For efficiency, you should use double variables, not Double objects, in the method:



    public static double cosineSimilarity(List<Integer> vectorOne,
                                          List<Integer> vectorTwo) {
        double dotProduct = 0.0;
        double normVecA = 0.0;
        double normVecB = 0.0;
        // ... rest of the method unchanged ...
    }
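To show the two suggestions together, here is a minimal sketch of the whole method with List<Integer> parameters and primitive double accumulators. The class name CosineSketch, the length check, and the replacement of Math.pow(x, 2) with a plain multiplication are assumptions added for illustration; they are not part of the original code.

```java
import java.util.List;

// Hypothetical holder class; only cosineSimilarity() mirrors the reviewed method.
final class CosineSketch {

    /** Cosine similarity of two equal-length integer vectors. */
    static double cosineSimilarity(List<Integer> vectorOne, List<Integer> vectorTwo) {
        // Assumption: vectors of different lengths are treated as a caller error.
        if (vectorOne.size() != vectorTwo.size()) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normVecA = 0.0;
        double normVecB = 0.0;
        for (int i = 0; i < vectorOne.size(); i++) {
            // Unbox each element once; the arithmetic then runs on primitives.
            double a = vectorOne.get(i);
            double b = vectorTwo.get(i);
            dotProduct += a * b;
            normVecA += a * a;
            normVecB += b * b;
        }
        // If either vector is all zeros, this evaluates 0.0 / 0.0, which is NaN.
        return dotProduct / (Math.sqrt(normVecA) * Math.sqrt(normVecB));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(List.of(1, 2, 3), List.of(2, 4, 6)));
        System.out.println(cosineSimilarity(List.of(1, 0), List.of(0, 1)));
    }
}
```

Because the parameters are declared as List, callers can pass an ArrayList, an immutable List.of(...), or any other List implementation without changing the method.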



When you open a Scanner, you should .close() it, to prevent resource leaks. The "try-with-resources" construct will automatically close resources that it opens. So instead of:



    Scanner textFile = new Scanner(new File("Enter textfile here"));

use:

    try (Scanner textFile = new Scanner(new File("Enter textfile here"))) {
        // use textFile inside this block.
    }
    // textFile is automatically closed when the block is exited.


Ditto with FileWriter. Use "try-with-resources" to automatically close the writer.



    try (FileWriter fw = new FileWriter(fileName, true)) {
        fw.write( ... );
    }
    // fw has been automatically closed at this point.
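As a concrete illustration of the writer advice, the following sketch wraps the append in a small helper. The names ResultAppender and appendResult, and the "label: value" line format, are hypothetical and only modeled on the output line in the question.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper class, not part of the original code.
final class ResultAppender {

    /** Appends one "label: value" line to the given file, creating the file if needed. */
    static void appendResult(String fileName, String label, double value) throws IOException {
        // The second constructor argument (true) opens the file in append mode.
        try (FileWriter fw = new FileWriter(fileName, true)) {
            fw.write(label + ": " + value + System.lineSeparator());
        } // fw is closed here, even if write() threw an exception.
    }
}
```

A call such as appendResult("arsenalCosineSimilarities.txt", "Chelsea", output) would then replace the manual open, write, and close sequence. Opening the file on every call is simple but wasteful in a tight loop, which is why the placement of the writer relative to the loops matters.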



If you are using .nextInt(), you should loop on .hasNextInt(), not .hasNext().



    while (textFile.hasNextInt()) {
        arsenal.add(textFile.nextInt());
    }
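Putting the try-with-resources and hasNextInt() points together, the file-reading step could be pulled into one reusable method, so the loop no longer needs editing for each team. This is a sketch under assumptions: the names VectorReader and readVector are hypothetical, and the file is assumed to contain whitespace-separated integers.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original code.
final class VectorReader {

    /** Reads whitespace-separated integers until the first token that is not an int. */
    static List<Integer> readVector(Path file) throws IOException {
        List<Integer> vector = new ArrayList<>();
        try (Scanner in = new Scanner(file)) {
            // hasNextInt() guards nextInt(), so a stray word ends the loop
            // instead of causing an InputMismatchException.
            while (in.hasNextInt()) {
                vector.add(in.nextInt());
            }
        } // the Scanner and the underlying file are closed here
        return vector;
    }
}
```

Note that reading stops silently at the first non-integer token. If malformed input should be reported rather than ignored, a check on in.hasNext() after the loop would be one way to detect it.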




It sounds like you want a Map<String,List<Integer>> to store an ArrayList for each team.



    List<String> team_names = List.of("Arsenal", "Chelsea", "Liverpool",
            "ManchesterCity", "ManchesterUnited", "Tottenham");

    Map<String,List<Integer>> stats = new HashMap<>();

    // Read in all team stats from (for example) "<TeamName>_stats.txt" files

    for (String team_name : team_names) {
        List<Integer> team_stats = new ArrayList<>();
        try (Scanner textFile = new Scanner(new File(team_name + "_stats.txt"))) {
            while (textFile.hasNextInt()) {
                team_stats.add(textFile.nextInt());
            }
        }
        stats.put(team_name, team_stats);
    }



Then you can use stats.get(team_name) to get each team's stats for comparison / analysis



    for (String team1_name : team_names) {
        List<Integer> team1_stats = stats.get(team1_name);

        for (String team2_name : team_names) {
            // Skip comparing a team against itself
            if (team1_name.equals(team2_name))
                continue;

            List<Integer> team2_stats = stats.get(team2_name);

            double output = cosineSimilarity(team1_stats, team2_stats);

            // ... display "output", or write to file, or ...
        }
    }
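Cosine similarity is symmetric, so the nested loop above computes every pair twice. If only one value per pair is needed, a variant along the following lines visits each unordered pair once. This is a swapped-in alternative rather than the loop shown above; the class name PairwiseSketch, the method allPairs, and the "A vs B" key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original code.
final class PairwiseSketch {

    static double cosineSimilarity(List<Integer> a, List<Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Computes the similarity of every unordered pair of teams exactly once. */
    static Map<String, Double> allPairs(Map<String, List<Integer>> stats) {
        List<String> names = new ArrayList<>(stats.keySet());
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            // Starting j at i + 1 skips self-comparison and mirrored pairs.
            for (int j = i + 1; j < names.size(); j++) {
                String first = names.get(i);
                String second = names.get(j);
                result.put(first + " vs " + second,
                        cosineSimilarity(stats.get(first), stats.get(second)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up vectors, purely for illustration.
        Map<String, List<Integer>> stats = new LinkedHashMap<>();
        stats.put("Arsenal", List.of(1, 2, 0));
        stats.put("Chelsea", List.of(2, 4, 0));
        stats.put("Liverpool", List.of(0, 0, 3));
        allPairs(stats).forEach((pair, value) -> System.out.println(pair + ": " + value));
    }
}
```

For six teams this performs 15 comparisons instead of 30. The order of the result keys follows the iteration order of the input map, so a LinkedHashMap is used in the demo to keep the output order predictable.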




Depending on how you want your output written (one file for all comparisons, or one file per team holding its comparisons with all other teams), you'll want to open the FileWriter before the outer loop or before the inner loop, respectively.
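For the one-file-per-team case, the writer is opened inside the outer loop, just before the inner loop. The sketch below assumes a file name pattern of "<TeamName>CosineSimilarities.txt", modeled on the file name in the question; the names PerTeamReport and writeReports and the outputDir parameter are hypothetical. Unlike the original code, it overwrites each file rather than appending to it.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original code.
final class PerTeamReport {

    static double cosineSimilarity(List<Integer> a, List<Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Writes one "<TeamName>CosineSimilarities.txt" file per team into outputDir. */
    static void writeReports(Map<String, List<Integer>> stats, Path outputDir) throws IOException {
        for (Map.Entry<String, List<Integer>> team : stats.entrySet()) {
            Path file = outputDir.resolve(team.getKey() + "CosineSimilarities.txt");
            // Opened once per team, i.e. before the inner loop.
            try (FileWriter fw = new FileWriter(file.toFile())) {
                for (Map.Entry<String, List<Integer>> other : stats.entrySet()) {
                    if (team.getKey().equals(other.getKey())) {
                        continue; // skip comparing a team against itself
                    }
                    double similarity = cosineSimilarity(team.getValue(), other.getValue());
                    fw.write(other.getKey() + ": " + similarity + System.lineSeparator());
                }
            } // this team's file is closed before the next team starts
        }
    }
}
```

For a single combined file, the try-with-resources statement would instead wrap the outer loop, and each line would need to name both teams.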























True that it's better to use List instead of ArrayList. Important note though is that a LinkedList would be much slower here, as the .get(index) method is O(n) instead of O(1) as in ArrayList.
– Simon Forsberg♦
Jun 24 at 19:45

@SimonForsberg Yes, but at least it would work, instead of being a compiler error. Of course, the efficiency issue can be fixed by using an iterator, instead of .get(i).
– AJNeufeld
Jun 24 at 20:40

@AJNeufeld Yup, totally agreed. In fact, I would definitely recommend using an Iterator here.
– Simon Forsberg♦
Jun 24 at 20:55
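The iterator approach mentioned in these comments could look roughly like the sketch below. Walking both lists with iterators keeps the loop linear for a LinkedList as well, where repeated .get(i) calls would make it quadratic. The class name IteratorCosine is hypothetical, and the loop assumes both lists have the same length.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
final class IteratorCosine {

    static double cosineSimilarity(List<Integer> vectorOne, List<Integer> vectorTwo) {
        double dotProduct = 0.0;
        double normVecA = 0.0;
        double normVecB = 0.0;
        Iterator<Integer> itOne = vectorOne.iterator();
        Iterator<Integer> itTwo = vectorTwo.iterator();
        // Assumption: equal lengths. The loop stops at the end of the shorter list.
        while (itOne.hasNext() && itTwo.hasNext()) {
            double a = itOne.next();
            double b = itTwo.next();
            dotProduct += a * b;
            normVecA += a * a;
            normVecB += b * b;
        }
        return dotProduct / (Math.sqrt(normVecA) * Math.sqrt(normVecB));
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>(List.of(3, 4));
        System.out.println(cosineSimilarity(linked, List.of(3, 4)));
    }
}
```

Each element is visited once regardless of the List implementation, so the method no longer depends on callers passing a random-access list.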











answered Jun 24 at 17:55 by AJNeufeld






