Title capitalization of strings in Java
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
4
down vote
favorite
I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.
import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle
public static String CapFirstTitle()
String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
Set<String> whiteList = new HashSet<String>();
whiteList.add("a");
whiteList.add("above");
whiteList.add("after");
whiteList.add("ain't");
whiteList.add("among");
whiteList.add("an");
whiteList.add("and");
whiteList.add("as");
whiteList.add("at");
whiteList.add("below");
whiteList.add("but");
whiteList.add("by");
whiteList.add("can't");
whiteList.add("don't");
whiteList.add("for");
whiteList.add("from");
whiteList.add("if");
whiteList.add("in");
whiteList.add("into");
whiteList.add("it's");
whiteList.add("nor");
whiteList.add("of");
whiteList.add("off");
whiteList.add("on");
whiteList.add("onto");
whiteList.add("or");
whiteList.add("over");
whiteList.add("since");
whiteList.add("the");
whiteList.add("to");
whiteList.add("under");
whiteList.add("until");
whiteList.add("up");
whiteList.add("with");
whiteList.add("won't");
Set<String> alwaysCapitalize = new HashSet<String>();
alwaysCapitalize.add("II");
alwaysCapitalize.add("III");
alwaysCapitalize.add("IV");
alwaysCapitalize.add("V");
alwaysCapitalize.add("VI");
alwaysCapitalize.add("VII");
alwaysCapitalize.add("VIII");
alwaysCapitalize.add("IX");
alwaysCapitalize.add("X");
alwaysCapitalize.add("XI");
alwaysCapitalize.add("XII");
alwaysCapitalize.add("XIII");
alwaysCapitalize.add("XIV");
alwaysCapitalize.add("XV");
alwaysCapitalize.add("XVI");
alwaysCapitalize.add("XVII");
alwaysCapitalize.add("XVIII");
alwaysCapitalize.add("XIX");
alwaysCapitalize.add("XX");
alwaysCapitalize.add("XXI");
alwaysCapitalize.add("OD");
alwaysCapitalize.add("ID");
alwaysCapitalize.add("PH");
alwaysCapitalize.add("XH");
alwaysCapitalize.add("UV");
alwaysCapitalize.add("DOM");
alwaysCapitalize.add("GS");
alwaysCapitalize.add("ii");
alwaysCapitalize.add("iii");
alwaysCapitalize.add("iv");
alwaysCapitalize.add("v");
alwaysCapitalize.add("vi");
alwaysCapitalize.add("vii");
alwaysCapitalize.add("viii");
alwaysCapitalize.add("ix");
alwaysCapitalize.add("x");
alwaysCapitalize.add("xi");
alwaysCapitalize.add("xii");
alwaysCapitalize.add("xiii");
alwaysCapitalize.add("xiv");
alwaysCapitalize.add("xv");
alwaysCapitalize.add("xvi");
alwaysCapitalize.add("xvii");
alwaysCapitalize.add("xviii");
alwaysCapitalize.add("xix");
alwaysCapitalize.add("xx");
alwaysCapitalize.add("xxi");
alwaysCapitalize.add("od");
alwaysCapitalize.add("id");
alwaysCapitalize.add("ph");
alwaysCapitalize.add("xh");
alwaysCapitalize.add("uv");
alwaysCapitalize.add("dom");
alwaysCapitalize.add("gs");
StringBuilder capitalizedString = new StringBuilder();
if (inputText.contains(" "))
inputText = inputText.toLowerCase().trim();
String parts = inputText.split(" ");
for (int i = 0; i < parts.length; i++) i == parts.length - 1)
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
if (!whiteList.contains(parts[i]))
if (!alwaysCapitalize.contains(parts[i]))
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
capitalizedString.append(parts[i].toUpperCase()).append(" ");
else
capitalizedString.append(parts[i]).append(" ");
return capitalizedString.toString();
java performance strings
add a comment |Â
up vote
4
down vote
favorite
I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.
import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle
public static String CapFirstTitle()
String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
Set<String> whiteList = new HashSet<String>();
whiteList.add("a");
whiteList.add("above");
whiteList.add("after");
whiteList.add("ain't");
whiteList.add("among");
whiteList.add("an");
whiteList.add("and");
whiteList.add("as");
whiteList.add("at");
whiteList.add("below");
whiteList.add("but");
whiteList.add("by");
whiteList.add("can't");
whiteList.add("don't");
whiteList.add("for");
whiteList.add("from");
whiteList.add("if");
whiteList.add("in");
whiteList.add("into");
whiteList.add("it's");
whiteList.add("nor");
whiteList.add("of");
whiteList.add("off");
whiteList.add("on");
whiteList.add("onto");
whiteList.add("or");
whiteList.add("over");
whiteList.add("since");
whiteList.add("the");
whiteList.add("to");
whiteList.add("under");
whiteList.add("until");
whiteList.add("up");
whiteList.add("with");
whiteList.add("won't");
Set<String> alwaysCapitalize = new HashSet<String>();
alwaysCapitalize.add("II");
alwaysCapitalize.add("III");
alwaysCapitalize.add("IV");
alwaysCapitalize.add("V");
alwaysCapitalize.add("VI");
alwaysCapitalize.add("VII");
alwaysCapitalize.add("VIII");
alwaysCapitalize.add("IX");
alwaysCapitalize.add("X");
alwaysCapitalize.add("XI");
alwaysCapitalize.add("XII");
alwaysCapitalize.add("XIII");
alwaysCapitalize.add("XIV");
alwaysCapitalize.add("XV");
alwaysCapitalize.add("XVI");
alwaysCapitalize.add("XVII");
alwaysCapitalize.add("XVIII");
alwaysCapitalize.add("XIX");
alwaysCapitalize.add("XX");
alwaysCapitalize.add("XXI");
alwaysCapitalize.add("OD");
alwaysCapitalize.add("ID");
alwaysCapitalize.add("PH");
alwaysCapitalize.add("XH");
alwaysCapitalize.add("UV");
alwaysCapitalize.add("DOM");
alwaysCapitalize.add("GS");
alwaysCapitalize.add("ii");
alwaysCapitalize.add("iii");
alwaysCapitalize.add("iv");
alwaysCapitalize.add("v");
alwaysCapitalize.add("vi");
alwaysCapitalize.add("vii");
alwaysCapitalize.add("viii");
alwaysCapitalize.add("ix");
alwaysCapitalize.add("x");
alwaysCapitalize.add("xi");
alwaysCapitalize.add("xii");
alwaysCapitalize.add("xiii");
alwaysCapitalize.add("xiv");
alwaysCapitalize.add("xv");
alwaysCapitalize.add("xvi");
alwaysCapitalize.add("xvii");
alwaysCapitalize.add("xviii");
alwaysCapitalize.add("xix");
alwaysCapitalize.add("xx");
alwaysCapitalize.add("xxi");
alwaysCapitalize.add("od");
alwaysCapitalize.add("id");
alwaysCapitalize.add("ph");
alwaysCapitalize.add("xh");
alwaysCapitalize.add("uv");
alwaysCapitalize.add("dom");
alwaysCapitalize.add("gs");
StringBuilder capitalizedString = new StringBuilder();
if (inputText.contains(" "))
inputText = inputText.toLowerCase().trim();
String parts = inputText.split(" ");
for (int i = 0; i < parts.length; i++) i == parts.length - 1)
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
if (!whiteList.contains(parts[i]))
if (!alwaysCapitalize.contains(parts[i]))
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
capitalizedString.append(parts[i].toUpperCase()).append(" ");
else
capitalizedString.append(parts[i]).append(" ");
return capitalizedString.toString();
java performance strings
1
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
Looks like it doesn't
â ozfive
May 1 at 23:09
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.
import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle
public static String CapFirstTitle()
String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
Set<String> whiteList = new HashSet<String>();
whiteList.add("a");
whiteList.add("above");
whiteList.add("after");
whiteList.add("ain't");
whiteList.add("among");
whiteList.add("an");
whiteList.add("and");
whiteList.add("as");
whiteList.add("at");
whiteList.add("below");
whiteList.add("but");
whiteList.add("by");
whiteList.add("can't");
whiteList.add("don't");
whiteList.add("for");
whiteList.add("from");
whiteList.add("if");
whiteList.add("in");
whiteList.add("into");
whiteList.add("it's");
whiteList.add("nor");
whiteList.add("of");
whiteList.add("off");
whiteList.add("on");
whiteList.add("onto");
whiteList.add("or");
whiteList.add("over");
whiteList.add("since");
whiteList.add("the");
whiteList.add("to");
whiteList.add("under");
whiteList.add("until");
whiteList.add("up");
whiteList.add("with");
whiteList.add("won't");
Set<String> alwaysCapitalize = new HashSet<String>();
alwaysCapitalize.add("II");
alwaysCapitalize.add("III");
alwaysCapitalize.add("IV");
alwaysCapitalize.add("V");
alwaysCapitalize.add("VI");
alwaysCapitalize.add("VII");
alwaysCapitalize.add("VIII");
alwaysCapitalize.add("IX");
alwaysCapitalize.add("X");
alwaysCapitalize.add("XI");
alwaysCapitalize.add("XII");
alwaysCapitalize.add("XIII");
alwaysCapitalize.add("XIV");
alwaysCapitalize.add("XV");
alwaysCapitalize.add("XVI");
alwaysCapitalize.add("XVII");
alwaysCapitalize.add("XVIII");
alwaysCapitalize.add("XIX");
alwaysCapitalize.add("XX");
alwaysCapitalize.add("XXI");
alwaysCapitalize.add("OD");
alwaysCapitalize.add("ID");
alwaysCapitalize.add("PH");
alwaysCapitalize.add("XH");
alwaysCapitalize.add("UV");
alwaysCapitalize.add("DOM");
alwaysCapitalize.add("GS");
alwaysCapitalize.add("ii");
alwaysCapitalize.add("iii");
alwaysCapitalize.add("iv");
alwaysCapitalize.add("v");
alwaysCapitalize.add("vi");
alwaysCapitalize.add("vii");
alwaysCapitalize.add("viii");
alwaysCapitalize.add("ix");
alwaysCapitalize.add("x");
alwaysCapitalize.add("xi");
alwaysCapitalize.add("xii");
alwaysCapitalize.add("xiii");
alwaysCapitalize.add("xiv");
alwaysCapitalize.add("xv");
alwaysCapitalize.add("xvi");
alwaysCapitalize.add("xvii");
alwaysCapitalize.add("xviii");
alwaysCapitalize.add("xix");
alwaysCapitalize.add("xx");
alwaysCapitalize.add("xxi");
alwaysCapitalize.add("od");
alwaysCapitalize.add("id");
alwaysCapitalize.add("ph");
alwaysCapitalize.add("xh");
alwaysCapitalize.add("uv");
alwaysCapitalize.add("dom");
alwaysCapitalize.add("gs");
StringBuilder capitalizedString = new StringBuilder();
if (inputText.contains(" "))
inputText = inputText.toLowerCase().trim();
String parts = inputText.split(" ");
for (int i = 0; i < parts.length; i++) i == parts.length - 1)
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
if (!whiteList.contains(parts[i]))
if (!alwaysCapitalize.contains(parts[i]))
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
capitalizedString.append(parts[i].toUpperCase()).append(" ");
else
capitalizedString.append(parts[i]).append(" ");
return capitalizedString.toString();
java performance strings
I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.
import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle
public static String CapFirstTitle()
String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
Set<String> whiteList = new HashSet<String>();
whiteList.add("a");
whiteList.add("above");
whiteList.add("after");
whiteList.add("ain't");
whiteList.add("among");
whiteList.add("an");
whiteList.add("and");
whiteList.add("as");
whiteList.add("at");
whiteList.add("below");
whiteList.add("but");
whiteList.add("by");
whiteList.add("can't");
whiteList.add("don't");
whiteList.add("for");
whiteList.add("from");
whiteList.add("if");
whiteList.add("in");
whiteList.add("into");
whiteList.add("it's");
whiteList.add("nor");
whiteList.add("of");
whiteList.add("off");
whiteList.add("on");
whiteList.add("onto");
whiteList.add("or");
whiteList.add("over");
whiteList.add("since");
whiteList.add("the");
whiteList.add("to");
whiteList.add("under");
whiteList.add("until");
whiteList.add("up");
whiteList.add("with");
whiteList.add("won't");
Set<String> alwaysCapitalize = new HashSet<String>();
alwaysCapitalize.add("II");
alwaysCapitalize.add("III");
alwaysCapitalize.add("IV");
alwaysCapitalize.add("V");
alwaysCapitalize.add("VI");
alwaysCapitalize.add("VII");
alwaysCapitalize.add("VIII");
alwaysCapitalize.add("IX");
alwaysCapitalize.add("X");
alwaysCapitalize.add("XI");
alwaysCapitalize.add("XII");
alwaysCapitalize.add("XIII");
alwaysCapitalize.add("XIV");
alwaysCapitalize.add("XV");
alwaysCapitalize.add("XVI");
alwaysCapitalize.add("XVII");
alwaysCapitalize.add("XVIII");
alwaysCapitalize.add("XIX");
alwaysCapitalize.add("XX");
alwaysCapitalize.add("XXI");
alwaysCapitalize.add("OD");
alwaysCapitalize.add("ID");
alwaysCapitalize.add("PH");
alwaysCapitalize.add("XH");
alwaysCapitalize.add("UV");
alwaysCapitalize.add("DOM");
alwaysCapitalize.add("GS");
alwaysCapitalize.add("ii");
alwaysCapitalize.add("iii");
alwaysCapitalize.add("iv");
alwaysCapitalize.add("v");
alwaysCapitalize.add("vi");
alwaysCapitalize.add("vii");
alwaysCapitalize.add("viii");
alwaysCapitalize.add("ix");
alwaysCapitalize.add("x");
alwaysCapitalize.add("xi");
alwaysCapitalize.add("xii");
alwaysCapitalize.add("xiii");
alwaysCapitalize.add("xiv");
alwaysCapitalize.add("xv");
alwaysCapitalize.add("xvi");
alwaysCapitalize.add("xvii");
alwaysCapitalize.add("xviii");
alwaysCapitalize.add("xix");
alwaysCapitalize.add("xx");
alwaysCapitalize.add("xxi");
alwaysCapitalize.add("od");
alwaysCapitalize.add("id");
alwaysCapitalize.add("ph");
alwaysCapitalize.add("xh");
alwaysCapitalize.add("uv");
alwaysCapitalize.add("dom");
alwaysCapitalize.add("gs");
StringBuilder capitalizedString = new StringBuilder();
if (inputText.contains(" "))
inputText = inputText.toLowerCase().trim();
String parts = inputText.split(" ");
for (int i = 0; i < parts.length; i++) i == parts.length - 1)
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
if (!whiteList.contains(parts[i]))
if (!alwaysCapitalize.contains(parts[i]))
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
else
capitalizedString.append(parts[i].toUpperCase()).append(" ");
else
capitalizedString.append(parts[i]).append(" ");
return capitalizedString.toString();
java performance strings
edited May 2 at 0:31
Jamalâ¦
30.1k11114225
30.1k11114225
asked May 1 at 22:42
ozfive
233
233
1
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
Looks like it doesn't
â ozfive
May 1 at 23:09
add a comment |Â
1
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
Looks like it doesn't
â ozfive
May 1 at 23:09
1
1
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
Looks like it doesn't
â ozfive
May 1 at 23:09
Looks like it doesn't
â ozfive
May 1 at 23:09
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
4
down vote
accepted
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>();
static
whitelist.addAll(Arrays.asList("a", "above", "after", ...);The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself.
YMMV. Since you're extremely consistent about it, I don't see any need to change it.You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use thejoin
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦
â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
accepted
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>();
static
whitelist.addAll(Arrays.asList("a", "above", "after", ...);The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself.
YMMV. Since you're extremely consistent about it, I don't see any need to change it.You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use thejoin
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦
â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
add a comment |Â
up vote
4
down vote
accepted
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>();
static
whitelist.addAll(Arrays.asList("a", "above", "after", ...);The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself.
YMMV. Since you're extremely consistent about it, I don't see any need to change it.You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use thejoin
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦
â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
add a comment |Â
up vote
4
down vote
accepted
up vote
4
down vote
accepted
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>();
static
whitelist.addAll(Arrays.asList("a", "above", "after", ...);The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself.
YMMV. Since you're extremely consistent about it, I don't see any need to change it.You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>();
static
whitelist.addAll(Arrays.asList("a", "above", "after", ...);The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself.
YMMV. Since you're extremely consistent about it, I don't see any need to change it.You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
edited May 2 at 11:45
answered May 1 at 23:28
Vogel612â¦
20.9k345124
20.9k345124
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use thejoin
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦
â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
add a comment |Â
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use thejoin
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦
â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
â ozfive
May 2 at 0:19
The binary or | was a typo
â ozfive
May 2 at 0:28
The binary or | was a typo
â ozfive
May 2 at 0:28
Very good answer! You can also use the
join
method of String
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦â Koekje
May 2 at 11:40
Very good answer! You can also use the
join
method of String
, see docs.oracle.com/javase/8/docs/api/java/lang/â¦â Koekje
May 2 at 11:40
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)
â Vogel612â¦
May 2 at 11:45
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f193405%2ftitle-capitalization-of-strings-in-java%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.
â Koekje
May 1 at 22:52
It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.
â ozfive
May 1 at 23:04
Looks like it doesn't
â ozfive
May 1 at 23:09