Class Strings

java.lang.Object
com.pnfsoftware.jeb.util.format.Strings

public class Strings extends Object
Utility methods for Strings and CharSequences.
  • Field Details

    • LINESEP

      public static final String LINESEP
      Line-separator for *this* platform.
  • Constructor Details

    • Strings

      public Strings()
  • Method Details

    • hasLength

      public static boolean hasLength(CharSequence s)
      Determine if a string is non-null and non-empty.
      Parameters:
      s - character sequence to test
      Returns:
      true iff the sequence contains at least one character
    • replaceNewLines

      public static String replaceNewLines(String s, String repl)
      Replace newline sequences. This method accepts null strings as input.
      Parameters:
      s - a string or null; in the latter case, null will be returned
      repl - the non-null substitution string, which must not contain new-line characters
      Returns:
      the string with all newline sequences replaced, or null if s is null
    • normalizeNewLines

      public static String normalizeNewLines(String s)
      Replace all newline sequences by the standard \n LF character.
      Parameters:
      s - a string
      Returns:
      the normalized string
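      The two new-line helpers above can be approximated with the JDK alone. This is a minimal sketch, not the library's implementation: the class name NewLineSketch is hypothetical, and returning null from normalizeNewLines for a null input is an assumption (the text above only states null handling for replaceNewLines).

```java
import java.util.regex.Matcher;

class NewLineSketch {
    // Replace every CRLF, lone CR, or lone LF by the substitution string.
    // CRLF is listed first so that it is consumed as one sequence, not two.
    static String replaceNewLines(String s, String repl) {
        if (s == null) {
            return null;
        }
        // quoteReplacement keeps '$' and '\' in repl from being read as group references
        return s.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(repl));
    }

    // Assumption: null in, null out (not stated in the reference text).
    static String normalizeNewLines(String s) {
        return replaceNewLines(s, "\n");
    }

    public static void main(String[] args) {
        System.out.println(normalizeNewLines("a\r\nb\rc\nd").split("\n").length);
    }
}
```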
    • safe

      public static String safe(Object s)
      Get the string representation of the parameter object, or the empty string if the object is null.
      Parameters:
      s - an object, possibly null
      Returns:
      the object toString(java.lang.Object) representation, or the empty string
    • safe

      public static String safe(Object s, String def)
      Get the string representation of the parameter object, or the provided string if the object is null.
      Parameters:
      s - an object, possibly null
      def - a non-null string
      Returns:
      a non-null string, possibly empty
    • safe2

      public static String safe2(Object s, String def)
      Get the string representation of the parameter object, or the provided non-empty string if the object is null or its string representation is the empty string.
      Parameters:
      s - an object, possibly null
      def - a non-null, non-empty string
      Returns:
      a string guaranteed to be non-empty
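      The difference between the three safe variants is easiest to see side by side. The following is a stand-alone sketch of the documented contracts only; the class name SafeSketch is hypothetical and the real methods may be implemented differently.

```java
class SafeSketch {
    // Empty string for null, otherwise the object's string form.
    static String safe(Object o) {
        return o == null ? "" : o.toString();
    }

    // Caller-provided default for null only; an empty string form is returned as-is.
    static String safe(Object o, String def) {
        return o == null ? def : o.toString();
    }

    // Caller-provided default for null AND for an empty string form.
    static String safe2(Object o, String def) {
        String s = o == null ? "" : o.toString();
        return s.isEmpty() ? def : s;
    }

    public static void main(String[] args) {
        System.out.println(safe("", "n/a").isEmpty());
        System.out.println(safe2("", "n/a"));
    }
}
```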
    • joinList

      public static String joinList(Iterable<?> objects)
      Join the elements of a list using "," as a separator and surround the resulting string with square brackets. Careful: this method does not abide by the common semantics of join.
      Parameters:
      objects - a list of objects
      Returns:
      the resulting string
    • joinv

      public static String joinv(String separator, Object... objects)
      Join the string representations of a sequence of objects using the provided separator. Null objects will be formatted as "null".
      Parameters:
      separator - a non-null separator
      objects - an array of objects
      Returns:
      the resulting string
    • joinv

      public static String joinv(String separator, String defaultValue, Object... objects)
      Join the string representations of a sequence of objects using the provided separator.
      Parameters:
      separator - a non-null separator
      defaultValue - String representation for null Objects
      objects - an array of objects
      Returns:
      the resulting string
    • join

      public static String join(String separator, Iterable<?> iterator)
      Join the string representations of a sequence of objects using the provided separator. Null objects will be formatted as "null".
      Parameters:
      separator - a non-null separator
      iterator - an iterable collection of objects
      Returns:
      the resulting string
    • join

      public static <T> String join(String separator, Iterable<T> iterator, Function<T,CharSequence> f)
      Join a series of items. Format items using the function. For example, to display a list of long as hexadecimal separated by comma: Strings.join(", ", Arrays.asList(0x10L, 0x20L), l -> Long.toHexString(l))
      Type Parameters:
      T - Any Object
      Parameters:
      separator - a non-null separator
      iterator - items to join
      f - toString() equivalent method to be applied to objects from list.
      Returns:
      the resulting string
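      The hexadecimal example given in the description can be reproduced with a plain loop. This is a sketch of the documented behavior under a hypothetical class name (JoinSketch); it does not call the library.

```java
import java.util.Arrays;
import java.util.function.Function;

class JoinSketch {
    // Join items with a separator, formatting each item through f.
    static <T> String join(String separator, Iterable<T> items, Function<T, CharSequence> f) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (T item : items) {
            if (!first) {
                sb.append(separator);
            }
            sb.append(f.apply(item));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same call shape as the example in the description above
        System.out.println(join(", ", Arrays.asList(0x10L, 0x20L), l -> Long.toHexString(l)));
    }
}
```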
    • join

      public static String join(String separator, String[] elts, int begin, int end)
      Join a series of non-null strings.
      Parameters:
      separator - a non-null separator
      elts - string array to join
      begin - inclusive start index
      end - exclusive end index
      Returns:
      the joined string
    • splitLines

      public static String[] splitLines(String s, boolean doNotReturnFinalEmptyLine)
      Split a text into an array of lines. Empty lines are returned. The final new-line character(s) are trimmed off. Works for all new-line characters (\r, \n) and character sequences (\r\n).
      Parameters:
      s - mandatory input string
      doNotReturnFinalEmptyLine - true to discard a final empty line
      Returns:
      the lines
    • splitLines

      public static String[] splitLines(String s)
      Split a text into an array of lines. Empty lines are returned. The final new-line character(s) are trimmed off. Works for all new-line characters (\r, \n) and character sequences (\r\n).
      Parameters:
      s - mandatory input string
      Returns:
      the lines
    • splitall

      public static String[] splitall(String s, String delim)
      Split a string and preserve trailing empty tokens.
      Parameters:
      s - input string
      delim - regular-expression delimiter
      Returns:
      the split tokens
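      For context, the JDK's one-argument String.split(String) drops trailing empty tokens, while a negative limit keeps them. The sketch below shows that JDK behavior; treating it as equivalent to splitall is an assumption, and SplitAllSketch is a hypothetical name.

```java
class SplitAllSketch {
    // A negative limit tells String.split to keep trailing empty tokens.
    static String[] splitall(String s, String delim) {
        return s.split(delim, -1);
    }

    public static void main(String[] args) {
        System.out.println("a,b,,".split(",").length);      // trailing empties dropped
        System.out.println(splitall("a,b,,", ",").length);  // trailing empties kept
    }
}
```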
    • firstLine

      public static String firstLine(String s)
      Retrieve the first line of a string.
      Parameters:
      s - input string
      Returns:
      the portion of the string before the first CR or LF character
    • splitCsv

      public static String[] splitCsv(String buf)
      Split a text into an array of tokens using comma as a separator. Empty tokens are returned.
      Parameters:
      buf - input string
      Returns:
      an array of tokens; the array is empty if the input string is null or empty
    • search

      public static int search(CharSequence data, int index, String pattern, boolean regex, boolean caseSensitive, boolean reverseSearch)
      Search for a sub-string.

      Note: on JDK 11+, this implementation for regex=false, caseSensitive=false, reverseSearch=false may be slower than doing data.toLowerCase().indexOf(pattern.toLowerCase()).

      Parameters:
      data - buffer to be searched (aka, the haystack)
      index - in the case of a regular (forward) search, the search range is [index,EOS); in the case of a reverse (backward) search, the search range is [0,index)
      pattern - text that is being searched (aka, the needle)
      regex - if true, the pattern will be treated as a regular expression; if the regex is invalid, it will be treated as a regular string and no error will be reported
      caseSensitive - search is case-sensitive
      reverseSearch - search is done in reverse
      Returns:
      index where the substring was found, or -1 if nothing was found
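      The forward and reverse search ranges can be illustrated for the plain-text case (regex=false). This sketch is an interpretation, not the library's code: it assumes that a reverse match must lie entirely inside [0,index), it lower-cases both inputs for case-insensitive matching, and SearchSketch is a hypothetical name.

```java
import java.util.Locale;

class SearchSketch {
    // Plain-text search only; the regex mode of the documented method is not covered.
    static int search(String data, int index, String pattern,
                      boolean caseSensitive, boolean reverseSearch) {
        String haystack = caseSensitive ? data : data.toLowerCase(Locale.ROOT);
        String needle = caseSensitive ? pattern : pattern.toLowerCase(Locale.ROOT);
        if (!reverseSearch) {
            // Forward: first match starting at or after index, i.e. within [index, EOS)
            return haystack.indexOf(needle, index);
        }
        // Reverse (assumption): last match that ends at or before index, i.e. within [0, index)
        return haystack.lastIndexOf(needle, index - needle.length());
    }

    public static void main(String[] args) {
        System.out.println(search("abcabc", 2, "bc", true, false));
        System.out.println(search("abcabc", 5, "bc", true, true));
    }
}
```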
    • isContainedIn

      public static boolean isContainedIn(String s, String... elts)
      Determine if a string is contained in a var-arg list of provided strings.
      Parameters:
      s - string to be searched
      elts - the list of elements
      Returns:
      true iff the input string was not null and found in the list of elements
    • contains

      public static boolean contains(String s, String... elts)
      A many-element variant of String.contains.
      Parameters:
      s - the string
      elts - a list of string elements
      Returns:
      true if the string contains at least one of the provided elements
    • startsWith

      public static boolean startsWith(String s, String... elts)
      A many-element variant of String.startsWith.
      Parameters:
      s - the string
      elts - a list of string elements
      Returns:
      true if the string starts with one of the provided elements
    • containsAt

      public static boolean containsAt(String s, int index, String elt)
      Indicates if a String s contains a particular substring at a specified index. Semantically equivalent to s.substring(index).startsWith(elt), without the intermediate substring creation.
      Parameters:
      s - the string
      index - String index to look at
      elt - element to identify
      Returns:
      true if s.substring(index).startsWith(elt) returns true
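      The JDK already offers an allocation-free check of this kind through String.startsWith(String, int). The sketch below uses it to illustrate the equivalence stated above; whether the library delegates to it is not known, and ContainsAtSketch is a hypothetical name.

```java
class ContainsAtSketch {
    // startsWith(prefix, offset) compares in place, without building a substring.
    static boolean containsAt(String s, int index, String elt) {
        return s.startsWith(elt, index);
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(containsAt(s, 6, "world") == s.substring(6).startsWith("world"));
    }
}
```

      One difference worth noting: for an out-of-range index, String.startsWith returns false, whereas s.substring(index) would throw.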
    • endsWith

      public static boolean endsWith(String s, String... elts)
      A many-element variant of String.endsWith.
      Parameters:
      s - the string
      elts - a list of string elements
      Returns:
      true if the string ends with one of the provided elements
    • equals

      public static boolean equals(String a, String b)
      A safer version of String.equals(Object).
      Parameters:
      a - first string, may be null
      b - second string, may be null
      Returns:
      true iff both strings are non-null and equal
    • equalsIgnoreCase

      public static boolean equalsIgnoreCase(String a, String b)
      Parameters:
      a - first string, may be null
      b - second string, may be null
      Returns:
      true iff both strings are non-null and equal, ignoring case
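      As documented, both comparison helpers return false when either argument is null, which differs from java.util.Objects.equals (true for two nulls). A minimal sketch of that contract, under the hypothetical name NullSafeEqualsSketch:

```java
import java.util.Objects;

class NullSafeEqualsSketch {
    // False if a is null; a.equals(null) is false, which covers a null b.
    static boolean equals(String a, String b) {
        return a != null && a.equals(b);
    }

    static boolean equalsIgnoreCase(String a, String b) {
        return a != null && a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(equals(null, null));          // documented contract: false
        System.out.println(Objects.equals(null, null));  // JDK helper: true
    }
}
```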
    • toString

      public static String toString(Object o)
      A safe version of String.toString.
      Parameters:
      o - an object, could be null
      Returns:
      the String representation of the provided object, or "null"
    • toString

      public static String toString(Object o, String defaultValue)
      A safe version of String.toString.
      Parameters:
      o - an object, could be null
      defaultValue - default String representation if o is null
      Returns:
      the String representation of the provided object, or the default value
    • generate

      public static String generate(char c, int count)
      Generate a repeated-character String. For CharSequence generation, use pad(char, int).
      Parameters:
      c - character to repeat
      count - repeat count (i.e. string length)
      Returns:
      the string
    • generate

      public static String generate(CharSequence s, int count)
      Generate a repeated string.
      Parameters:
      s - string to repeat
      count - repeat count
      Returns:
      the resulting string
    • spaces

      public static String spaces(int count)
      Generate a repeated string of spaces.
      Parameters:
      count - number of spaces to generate
      Returns:
      a string containing count spaces, or the empty string if count is not positive
    • isBlank

      public static boolean isBlank(CharSequence s)
      Determine if a character sequence is null, empty, or contains WSP chars exclusively.
      Parameters:
      s - the character sequence
      Returns:
      true if the sequence is null or blank
    • countNonBlankCharacters

      public static int countNonBlankCharacters(CharSequence s)
      Count the number of non-blank characters in the provided string.
      Parameters:
      s - character sequence to scan
      Returns:
      the number of non-whitespace characters
    • indexOf

      public static int indexOf(CharSequence text, char c)
      Implementation of indexOf for CharSequence. Same behavior as String.indexOf(int).
      Parameters:
      text - string
      c - char
      Returns:
      the index position, or -1 if not found
    • indexOf2

      public static int indexOf2(CharSequence text, char c0, char c1)
      Find the first one of two characters and return its position.

      This is a 2-element implementation of String.indexOf(int).

      Parameters:
      text - string
      c0 - first char
      c1 - second char
      Returns:
      the position of the first occurrence of c0 or c1 (whichever came first), -1 if not found
    • indexOf2

      public static int indexOf2(CharSequence text, int from, char c0, char c1)
      Find the first one of two characters and return its position.
      Parameters:
      text - string
      from - start index
      c0 - first char
      c1 - second char
      Returns:
      the position of the first occurrence of c0 or c1 (whichever came first), -1 if not found
    • indexOfAny

      public static int indexOfAny(CharSequence text, Set<Character> cset)
      Find the first one of any of the provided characters and return its position.

      This is a N-element implementation of String.indexOf(int).

      Parameters:
      text - string
      cset - a set of characters
      Returns:
      the position of the first occurrence of any character in the set, or -1 if not found
    • indexOfNotInGroup

      public static int indexOfNotInGroup(CharSequence text, char c, int fromIndex, char[]... ingoreInGroups)
      Find the index of a character, ignoring some groups. For example:
      • ignore some text in parentheses: indexOfNotInGroup("it is (almost) done", 'o', 0, ['(', ')']) will return 16
      • ignore generics: indexOfNotInGroup("std::myclass<a,b>::mymethod(type a, type b)", ',', 0, ['<', '>']) will return 34
      Parameters:
      text - string
      c - character to find
      fromIndex - start index, use 0 by default
      ingoreInGroups - list of character groups to be ignored, for example {'(', ')'} or {'<', '>'}. Each character group must contain at least 2 elements (one for the open element, one for the close element)
      Returns:
      the index (zero or greater) if found, -1 if not found, -2 if the input is malformed
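      The two examples above can be reproduced with a simple nesting counter. This is a sketch under stated assumptions, not the library's algorithm: it uses a single depth counter for all groups, treats an unbalanced open or close delimiter (or a group with fewer than 2 characters) as the "malformed" case, and uses the hypothetical class name GroupIndexSketch.

```java
class GroupIndexSketch {
    static int indexOfNotInGroup(CharSequence text, char c, int fromIndex, char[]... groups) {
        for (char[] g : groups) {
            if (g == null || g.length < 2) {
                return -2; // each group needs an open and a close character
            }
        }
        int depth = 0; // how many groups are currently open
        for (int i = fromIndex; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean delimiter = false;
            for (char[] g : groups) {
                if (ch == g[0]) {
                    depth++;
                    delimiter = true;
                    break;
                }
                if (ch == g[1]) {
                    if (depth == 0) {
                        return -2; // close without a matching open (assumed malformed)
                    }
                    depth--;
                    delimiter = true;
                    break;
                }
            }
            if (!delimiter && depth == 0 && ch == c) {
                return i;
            }
        }
        return depth == 0 ? -1 : -2; // a group left open is assumed malformed
    }

    public static void main(String[] args) {
        System.out.println(indexOfNotInGroup("it is (almost) done", 'o', 0, new char[] {'(', ')'}));
    }
}
```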
    • lastIndexOf2

      public static int lastIndexOf2(CharSequence text, char c0, char c1)
      Find the last one of two characters and return its position.

      This is a 2-element implementation of String.lastIndexOf(int).

      Parameters:
      text - string
      c0 - first char
      c1 - second char
      Returns:
      the position of the last occurrence of c0 or c1 (whichever comes last), or -1 if not found
    • lastIndexOf2

      public static int lastIndexOf2(CharSequence text, int from, char c0, char c1)
      Find the last one of two characters and return its position.
      Parameters:
      text - string
      from - start index
      c0 - first char
      c1 - second char
      Returns:
      the position of the last occurrence of c0 or c1 (whichever comes last), or -1 if not found
    • lastIndexOfAny

      public static int lastIndexOfAny(CharSequence text, Set<Character> cset)
      Find the last one of any of the provided characters and return its position.

      This is a N-element implementation of String.lastIndexOf(int).

      Parameters:
      text - string
      cset - a set of characters
      Returns:
      the position of the last occurrence of any character in the set, or -1 if not found
    • hasBlank

      public static boolean hasBlank(CharSequence s)
      Determine if a string contains one or more WSP characters.
      Parameters:
      s - character sequence to test
      Returns:
      true if the sequence contains at least one whitespace character
    • isWhitespace

      public static boolean isWhitespace(char c)
      Determine if a character is a white-space, per the Unicode standard. This method differs from Character.isWhitespace(char) (Java language definition of a WSP).
      Parameters:
      c - character to test
      Returns:
      true if the character is one of the recognized Unicode whitespace characters
    • isAsciiWhitespace

      public static boolean isAsciiWhitespace(int b, char... extraWhitespaceCharacters)
      Determine if a character is a white-space, per the ASCII standard. It only processes regular space, tab, CR and LF characters.
      Parameters:
      b - the int to test
      extraWhitespaceCharacters - additional ASCII characters considered as whitespace
      Returns:
      true if the value is an ASCII whitespace character
    • replaceWhitespaces

      public static String replaceWhitespaces(String str, char repl)
      Efficiently replace all Unicode white-spaces by the provided char.
      Parameters:
      str - input string
      repl - replacement character
      Returns:
      the string with all recognized whitespace characters replaced
    • trimWhitespaces

      public static String trimWhitespaces(String s)
      Trim (left and right) all characters considered to be white-space by the Unicode standard.
      Parameters:
      s - the input string
      Returns:
      the trimmed string
    • trim

      public static String trim(String s)
      Trim (left and right) all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
      Parameters:
      s - a string
      Returns:
      the trimmed string
    • trim

      public static String trim(String s, char c)
      Trim (left and right) all occurrences of the provided character.
      Parameters:
      s - a string
      c - the character to be removed
      Returns:
      the trimmed string
    • ltrim

      public static String ltrim(String s)
      Left trim all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
      Parameters:
      s - a string
      Returns:
      the left-trimmed string
    • rtrim

      public static String rtrim(String s)
      Right trim all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
      Parameters:
      s - a string
      Returns:
      the right-trimmed string
    • ltrim

      public static String ltrim(String s, char c)
      Left trim on a given character.
      Parameters:
      s - a string
      c - character to remove
      Returns:
      the left-trimmed string
    • rtrim

      public static String rtrim(String s, char c)
      Right trim on a given character.
      Parameters:
      s - a string
      c - character to remove
      Returns:
      the right-trimmed string
    • getAsciiLength

      public static int getAsciiLength(byte[] data, int maxlen)
      Retrieve the length of a potentially ASCII-encoded string. The allowed characters are CR, LF, TAB, and any character in the [0x20, 0x7E] range.
      Parameters:
      data - a byte array
      maxlen - maximum length
      Returns:
      the length of the string
    • getAsciiLength

      public static int getAsciiLength(byte[] data)
      Parameters:
      data - a byte array
      Returns:
      the length of the string
    • determinePotentialEncoding

      public static Charset determinePotentialEncoding(byte[] data, int offset, int size)
      Heuristically determine the encoding of a string.
      Parameters:
      data - byte buffer to analyze
      offset - start offset in data
      size - number of bytes to analyze
      Returns:
      null if unknown, else one of ASCII, UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE
    • isNumber

      public static boolean isNumber(String text)
      Check that every character of the text parameter is a digit.
      Parameters:
      text - string to test
      Returns:
      true if text is a valid decimal number
    • isConsistentHexNumberString

      public static boolean isConsistentHexNumberString(String text)
      Check that every character of a non-empty input string is a hexadecimal digit. Letters may be upper-case or lower-case, but the case must be consistent (only lower or only upper).
      Parameters:
      text - the input string
      Returns:
      true if the string is a valid hexadecimal number
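      The "consistent case" rule can be sketched as follows. This is an interpretation of the description (digits are case-neutral; letters must be all lower-case or all upper-case), with a hypothetical class name and an assumed false result for null input.

```java
class HexCheckSketch {
    static boolean isConsistentHexNumberString(String text) {
        if (text == null || text.isEmpty()) {
            return false; // assumption: null is rejected like the empty string
        }
        boolean sawLower = false;
        boolean sawUpper = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= '0' && ch <= '9') {
                continue; // digits do not affect case consistency
            } else if (ch >= 'a' && ch <= 'f') {
                sawLower = true;
            } else if (ch >= 'A' && ch <= 'F') {
                sawUpper = true;
            } else {
                return false; // not a hexadecimal digit
            }
        }
        return !(sawLower && sawUpper); // reject mixed-case letters
    }

    public static void main(String[] args) {
        System.out.println(isConsistentHexNumberString("DeadBeef"));
    }
}
```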
    • f

      public static String f(String format, Object... args)
      Format using the US locale.
      Parameters:
      format - format string
      args - format arguments
      Returns:
      the formatted string
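      Pinning the locale matters because decimal and grouping separators are locale-dependent. A minimal JDK-only sketch of US-locale formatting (FormatSketch is a hypothetical name; the library method may do more than this):

```java
import java.util.Locale;

class FormatSketch {
    // Always format with Locale.US, regardless of the platform default locale.
    static String f(String format, Object... args) {
        return String.format(Locale.US, format, args);
    }

    public static void main(String[] args) {
        System.out.println(f("%.2f", 1.5));
        System.out.println(f("%,d", 1234567));
    }
}
```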
    • getFastFormatInvocationCount

      public static int getFastFormatInvocationCount()
    • getFastFormatFailureCount

      public static int getFastFormatFailureCount()
    • resetFastFormatCounts

      public static void resetFastFormatCounts()
    • ff

      public static Appendable ff(Locale l, Appendable sink, String format, Object... args)
      A faster version of String.format(String, Object...).

      Implementation note: currently limited to formatters %b %c %d %0{2,4,8}{x,X} %[N]s %[-N]s %% %n. If any other formatter is used, the implementation reverts to String.format.

      Parameters:
      l - locale to be used
      sink - optional recipient (if null, a new builder will be created; the formatted string is appended to the sink)
      format - format string
      args - format arguments
      Returns:
      the sink, never null
    • ff

      public static Appendable ff(Appendable sink, String format, Object... args)
      A faster version of String.format(String, Object...).

      Implementation note: currently limited to formatters %b %c %d %0{2,4,8}{x,X} %[N]s %[-N]s %% %n. If any other formatter is used, the implementation reverts to String.format.

      Parameters:
      sink - optional recipient (if null, a new builder will be created; the formatted string is appended to the sink)
      format - format string
      args - format arguments
      Returns:
      the sink, never null
    • ff

      public static String ff(Locale l, String format, Object... args)
      A faster version of String.format(String, Object...).

      Implementation note: currently limited to formatters %b %c %d %0{2,4,8}{x,X} %[N]s %[-N]s %% %n. If any other formatter is used, the implementation reverts to String.format.

      Parameters:
      l - locale to be used
      format - format string
      args - format arguments
      Returns:
      the formatted string
    • ff

      public static String ff(String format, Object... args)
      A faster version of String.format(String, Object...).

      Implementation note: currently limited to formatters %b %c %d %0{2,4,8}{x,X} %[N]s %[-N]s %% %n. If any other formatter is used, the implementation reverts to String.format.

      Parameters:
      format - format string
      args - format arguments
      Returns:
      the formatted string
    • replaceLast

      public static String replaceLast(String str, String target, String replacement)
      Replace the last occurrence of target in str by the replacement.
      Parameters:
      str - the string to search in
      target - the string to search for
      replacement - the replacement part
      Returns:
      a new string in which the last occurrence of target is replaced by replacement, or the original string if target was not found
    • substring

      public static String substring(String s, int begin, int end)
      Flexible version of String.substring(int, int). Allow Python-like negative indexes for convenience.
      Parameters:
      s - a string
      begin - index in the [-s_length, +s_length] range
      end - index in the [-s_length, +s_length] range
      Returns:
      the substring
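      Python-like negative indexes count from the end of the string. The sketch below shows one way to map them; it is an assumption about the semantics (the text above does not say what happens when begin ends up after end), and SubstringSketch is a hypothetical name.

```java
class SubstringSketch {
    // A negative index i is resolved to length + i, as in Python slicing.
    static String substring(String s, int begin, int end) {
        int n = s.length();
        int b = begin < 0 ? n + begin : begin;
        int e = end < 0 ? n + end : end;
        // Delegates range checks to String.substring, which throws if b > e
        return s.substring(b, e);
    }

    public static void main(String[] args) {
        System.out.println(substring("abcdef", -3, -1));
    }
}
```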
    • truncate

      public static String truncate(String s, int maxLength)
      Truncate a string.
      Parameters:
      s - a string
      maxLength - positive length
      Returns:
      the truncated string, which will contain at most `maxLength` characters
    • truncateWithSuffix

      public static String truncateWithSuffix(String s, int maxLength, String suffix)
      Truncate a string and append an optional suffix to it if it was actually truncated.
      Parameters:
      s - a string
      maxLength - positive length, which must be greater than or equal to the length of the suffix, if one was provided
      suffix - optional suffix appended to a string that is actually truncated
      Returns:
      the original string, or a truncated string ending with the suffix
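      One plausible reading of the maxLength constraint is that the suffix counts toward the maximum length. The sketch below follows that reading; this is an assumption, the library may instead append the suffix after maxLength characters, and TruncateSketch is a hypothetical name.

```java
class TruncateSketch {
    static String truncate(String s, int maxLength) {
        return s.length() <= maxLength ? s : s.substring(0, maxLength);
    }

    // Assumption: the result, suffix included, never exceeds maxLength characters.
    static String truncateWithSuffix(String s, int maxLength, String suffix) {
        if (s.length() <= maxLength) {
            return s; // not truncated, so no suffix is appended
        }
        if (suffix == null) {
            return s.substring(0, maxLength);
        }
        if (suffix.length() > maxLength) {
            throw new IllegalArgumentException("suffix is longer than maxLength");
        }
        return s.substring(0, maxLength - suffix.length()) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(truncateWithSuffix("hello world", 8, "..."));
    }
}
```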
    • indentBlock

      public static String indentBlock(String blk, String indent)
      Indent a buffer.
      Parameters:
      blk - input block
      indent - indentation string prepended to every line
      Returns:
      the indented block
    • indentBlock

      public static String indentBlock(String blk)
      Indent a buffer using a 4-space indentation.
      Parameters:
      blk - input block
      Returns:
      the indented block
    • urlencodeUTF8

      public static String urlencodeUTF8(String s)
      Urlencode a string. The resulting string will have the following characteristics:
      • a-z, A-Z, 0-9 remain the same
      • ., -, *, _ remain the same
      • space is converted to +
      • all other characters are UTF8 encoded using the "%xx" scheme
      Parameters:
      s - the string to be encoded
      Returns:
      the encoded string
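      The characteristics listed above match those of the JDK's java.net.URLEncoder (application/x-www-form-urlencoded). The sketch below uses the JDK classes to illustrate the rules; that the library delegates to them is an assumption, UrlCodecSketch is a hypothetical name, and the Charset overloads require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlCodecSketch {
    static String urlencodeUTF8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String urldecodeUTF8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Space becomes '+', reserved and non-ASCII characters become %xx sequences
        String encoded = urlencodeUTF8("a b&c=\u00e9");
        System.out.println(encoded);
        System.out.println(urldecodeUTF8(encoded).equals("a b&c=\u00e9"));
    }
}
```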
    • urldecodeUTF8

      public static String urldecodeUTF8(String s)
      Decode a URL-encoded string.
      Parameters:
      s - the encoded string
      Returns:
      the decoded string
    • parseUrlParameters

      public static String[] parseUrlParameters(String s, String... entries)
      Extract the parameters of a URL-like encoded string. No decoding is taking place. Example:
       - s: "type=home&subtype=house&[another_key]=[another_value]"
       - entries: "type", "subtype"
       - returns: ["home", "house"]
       
      Parameters:
      s - the string to be parsed
      entries - the entries, whose count must match the number of key-value pairs
      Returns:
      the array of parameter values, as they were (i.e. without any decoding applied); an element is null if it could not be parsed
    • parseUrlParameter

      public static String parseUrlParameter(String s, String entry)
      Parameters:
      s - the URL-like string to be parsed, containing a single key-value pair, eg hometype=house
      entry - parameter name to retrieve
      Returns:
      the parameter (without decoding applied), null on error
    • encodeArray

      public static String encodeArray(Object... array)
      Encode an array of objects.
      Parameters:
      array - the array of objects
      Returns:
      the encoded array as a string
    • decodeArray

      public static String[] decodeArray(String s)
      Decode an encoded array of objects.
      Parameters:
      s - the encoded array
      Returns:
      the array of decoded strings
    • encodeList

      public static String encodeList(List<?> list)
      Encode a list of objects.
      Parameters:
      list - the list of objects
      Returns:
      the encoded list as a string
    • decodeList

      public static List<String> decodeList(String s)
      Decode an encoded list of objects.
      Parameters:
      s - optional encoded list
      Returns:
      the list of decoded strings
    • encodeMap

      public static String encodeMap(Map<?,?> map)
      Encode a dictionary. The encoding scheme will produce strings like: encodedKey1=encodedValue1&encodedKey2=encodedValue2&...
      Parameters:
      map - the map of key/values
      Returns:
      the encoded map as a string
    • decodeMap

      public static Map<String,String> decodeMap(String s)
      Decode an encoded map.
      Parameters:
      s - optional encoded map
      Returns:
      the decoded map
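      The key=value&key=value layout described for encodeMap can be sketched as a round trip. The per-item encoding is not specified above; this sketch assumes URL-encoding of keys and values, which is a guess, and MapCodecSketch is a hypothetical name.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class MapCodecSketch {
    static String encodeMap(Map<?, ?> map) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    static Map<String, String> decodeMap(String s) {
        Map<String, String> out = new LinkedHashMap<>(); // keeps insertion order
        if (s == null || s.isEmpty()) {
            return out;
        }
        for (String pair : s.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that are not key=value pairs
            }
            out.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
        }
        return out;
    }

    // Assumed item encoding: URL-encoding, so '&' and '=' inside items cannot clash
    private static String enc(Object o) {
        return URLEncoder.encode(String.valueOf(o), StandardCharsets.UTF_8);
    }

    private static String dec(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a b", "1");
        m.put("k", "x&y");
        System.out.println(decodeMap(encodeMap(m)).equals(m));
    }
}
```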
    • encodeUTF8

      public static byte[] encodeUTF8(String s)
      Encode a string using a UTF-8 encoder. If the encoder is not available, the string is encoded using the system's default encoder. This should never happen.
      Parameters:
      s - mandatory string
      Returns:
      the encoded byte buffer
    • decodeUTF8

      public static String decodeUTF8(byte[] bytes, int offset, int length)
      Decode a byte buffer using a UTF-8 decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.
      Parameters:
      bytes - byte buffer
      offset - start offset
      length - count of bytes to be decoded
      Returns:
      the decoded string
    • decodeUTF8

      public static String decodeUTF8(byte[] bytes)
      Decode a byte buffer using a UTF-8 decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.
      Parameters:
      bytes - mandatory byte buffer
      Returns:
      the decoded string
    • encodeASCII

      public static byte[] encodeASCII(String s)
      Encode a string using an ASCII encoder. If the encoder is not available, the string is encoded using the system's default encoder. This should never happen.
      Parameters:
      s - mandatory string
      Returns:
      the encoded byte buffer
    • decodeASCII

      public static String decodeASCII(byte[] bytes, int offset, int length)
      Decode a byte buffer using an ASCII decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.
      Parameters:
      bytes - byte buffer
      offset - start offset
      length - count of bytes to be decoded
      Returns:
      the decoded string
    • decodeASCII

      public static String decodeASCII(byte[] bytes)
      Decode a byte buffer using an ASCII decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.
      Parameters:
      bytes - mandatory byte buffer
      Returns:
      the decoded string
    • encodeLocal

      public static byte[] encodeLocal(String s)
      Encode a string using the local platform's default charset. This method is potentially dangerous.
      Parameters:
      s - mandatory string
      Returns:
      the encoded byte buffer
    • decodeLocal

      public static String decodeLocal(byte[] bytes, int offset, int length)
      Decode a byte buffer using the local platform's default charset. This method is potentially dangerous.
      Parameters:
      bytes - byte buffer
      offset - start offset
      length - count of bytes to be decoded
      Returns:
      the decoded string
    • decodeLocal

      public static String decodeLocal(byte[] bytes)
      Decode a byte buffer using the local platform's default charset. This method is potentially dangerous.
      Parameters:
      bytes - mandatory byte buffer
      Returns:
      the decoded string
    • encodeBinary

      public static byte[] encodeBinary(String s)
      Generate a byte array consisting of the low-bytes of the input string characters.
      Parameters:
      s - string to encode
      Returns:
      a byte array containing the low byte of each input character
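      Keeping only the low byte of each character amounts to a narrowing cast from char to byte. A minimal sketch of that operation (LowByteSketch is a hypothetical name):

```java
class LowByteSketch {
    // The cast (byte) keeps the low 8 bits of each 16-bit char and discards the rest.
    static byte[] encodeBinary(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // U+0141 has low byte 0x41, the same as 'A'
        byte[] b = encodeBinary("A\u0141");
        System.out.println(b[0] == b[1]);
    }
}
```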
    • getComparator

      public static Comparator<String> getComparator()
      Get a case-sensitive string comparator that treats hexadecimal sequences as numbers, and orders them accordingly, instead of as simple strings.

      Refer to NumberComparator and AlphanumCharComparator for details.

      Returns:
      the comparator
    • getComparator

      public static Comparator<String> getComparator(boolean caseSensitive, boolean scanHexadecimal)
      Get a string comparator that can treat hexadecimal sequences as numbers (and order them accordingly) instead of as simple strings.

      Refer to NumberComparator and AlphanumCharComparator for details.

      Parameters:
      caseSensitive - true to sort upper-case and lower-case letters separately
      scanHexadecimal - true to recognize hexadecimal number chunks
      Returns:
      the comparator
    • makeNewLine

      public static void makeNewLine(StringBuilder sb)
      Append a new-line character to the provided buffer unless the buffer is empty or the last character in the buffer is a new-line.
      Parameters:
      sb - a string builder
    • randomUniqueId

      public static String randomUniqueId()
      Generate a 32-character long random unique identifier. The UID returned consists of the digits 0 to 9 and letters a to f (lower-case).
      Returns:
      a random lowercase hexadecimal identifier
    • pad

      public static CharSequence pad(char c, int count)
      Repeat character c count times and build a CharSequence from it. For example, pad('0', 4) will return "0000".
      Parameters:
      c - inner character
      count - times to repeat character.
      Returns:
      CharSequence
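
      A minimal sketch of the repetition, using only the JDK. The class name PadSketch is hypothetical, and returning an empty sequence for a non-positive count is an assumption of this sketch.

```java
// Hypothetical sketch: build a sequence made of 'count' copies of one character.
final class PadSketch {
    static CharSequence pad(char c, int count) {
        StringBuilder sb = new StringBuilder(Math.max(count, 0));
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb; // assumption: a non-positive count yields an empty sequence
    }

    public static void main(String[] args) {
        System.out.println(pad('0', 4)); // prints 0000
    }
}
```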
    • capitalizeFirst

      public static String capitalizeFirst(String s)
      Capitalize the first character of a string.
      Parameters:
      s - input string
      Returns:
      the input string with its first character converted to upper-case, when applicable
    • camelCaseToString

      public static String camelCaseToString(String s, boolean breakOnDigits, boolean keepUppercaseAcronyms) throws ParseException
      Convert a camel-case string to a sentence. Example:
       ThisIsACamelCaseString    -> This is a camel case string
       ThisIsACamel44CaseString  -> This is a camel44 case string
       CountryUSA                -> Country u s a
      
       with breakOnDigits=true:
       ThisIsACamel44CaseString  -> This is a camel 44 case string
      
       with keepUppercaseAcronyms=true:
       CountryUSA                -> Country USA
       
      A legal camel-case string always starts with an upper-case letter, and does not contain whitespace characters.
      Parameters:
      s - the input camel-case string
      breakOnDigits - if true, base-10 numbers will also be used as breaks
      keepUppercaseAcronyms - if true, acronyms of two or more upper-case letters are kept intact, e.g. CountryUSA is converted to Country USA instead of Country u s a
      Returns:
      the result sentence
      Throws:
      ParseException - if the input string was not camel-case formatted
    • camelCaseToString

      public static String camelCaseToString(String s) throws ParseException
      Convert a camel-case string to a sentence. Example:
       ThisIsACamelCaseString -> This is a camel case string
       
      A legal camel-case string always starts with an upper-case letter, and does not contain whitespace characters.
      Parameters:
      s - the input camel-case string
      Returns:
      the result sentence
      Throws:
      ParseException - if the input string was not camel-case formatted
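
      The default conversion (no digit breaks, acronyms not preserved) can be approximated by the sketch below, which reproduces the documented examples. It is not the library's implementation: the class and method names are hypothetical, and the use of java.text.ParseException is an assumption, since the page does not say which ParseException type is thrown.

```java
import java.text.ParseException;

// Hypothetical sketch of the default camel-case to sentence conversion.
final class CamelCaseSketch {
    static String camelCaseToSentence(String s) throws ParseException {
        if (s.isEmpty() || !Character.isUpperCase(s.charAt(0))) {
            throw new ParseException("Input must start with an upper-case letter", 0);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(s.charAt(0)); // the leading capital is kept as-is
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                throw new ParseException("Whitespace is not allowed", i);
            }
            if (Character.isUpperCase(c)) {
                // Every later upper-case letter starts a new lower-cased word.
                sb.append(' ').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(camelCaseToSentence("ThisIsACamelCaseString")); // prints This is a camel case string
        System.out.println(camelCaseToSentence("CountryUSA"));             // prints Country u s a
    }
}
```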
    • hasRtl

      public static boolean hasRtl(CharSequence s)
      Determine if a string contains right-to-left (RTL) characters, e.g. Arabic or Hebrew characters.
      Parameters:
      s - character sequence to scan
      Returns:
      true if the sequence contains an RTL character
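
      One way to perform such a scan with the JDK is to inspect the Unicode directionality of each code point, as sketched below. Which directionality classes the library actually checks is not stated on this page; the choice of R and AL here is an assumption, as is the class name RtlSketch. The sketch expects a non-null sequence.

```java
// Hypothetical sketch: detect right-to-left characters via Unicode directionality.
final class RtlSketch {
    static boolean hasRtl(CharSequence s) {
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            byte d = Character.getDirectionality(cp);
            // Assumption: only strong RTL classes (Hebrew-like R, Arabic-like AL) count.
            if (d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRtl("hello"));        // prints false
        System.out.println(hasRtl("abc \u05D0"));   // prints true (Hebrew letter alef)
    }
}
```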
    • parseCommandline

      public static String[] parseCommandline(String s)
      Parse a string as a command line. Source: ant.jar.
      Parameters:
      s - the command line to process.
      Returns:
      the command line broken into strings
    • isWellFormedUTF8

      public static boolean isWellFormedUTF8(byte[] bytes)
      Determine if a byte array is a well-formed UTF-8 byte sequence.
      Parameters:
      bytes - bytes to validate
      Returns:
      true if the bytes round-trip through UTF-8 decoding and encoding
    • isWellFormedUTF8

      public static boolean isWellFormedUTF8(byte[] bytes, int off, int len)
      Determine if a byte range is a well-formed UTF-8 byte sequence.
      Parameters:
      bytes - bytes to validate
      off - start offset
      len - number of bytes to validate
      Returns:
      true if the bytes round-trip through UTF-8 decoding and encoding
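
      The round-trip test mentioned in the return description can be sketched with the standard JDK decoder: malformed input is replaced by U+FFFD during decoding, so re-encoding no longer reproduces the original bytes. This is an illustration only; the class name Utf8CheckSketch is hypothetical, and the library's own validation may differ in edge cases.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: validate UTF-8 by decoding and re-encoding.
final class Utf8CheckSketch {
    static boolean isWellFormedUtf8(byte[] bytes, int off, int len) {
        // Malformed sequences decode to U+FFFD, which re-encodes to different bytes.
        String decoded = new String(bytes, off, len, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(reencoded, Arrays.copyOfRange(bytes, off, off + len));
    }

    static boolean isWellFormedUtf8(byte[] bytes) {
        return isWellFormedUtf8(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] ok = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = { (byte) 0xC3 }; // lead byte without its continuation byte
        System.out.println(isWellFormedUtf8(ok));        // prints true
        System.out.println(isWellFormedUtf8(truncated)); // prints false
    }
}
```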
    • isPrintableUTF8Header

      public static boolean isPrintableUTF8Header(byte[] headerBytes)
      Check whether some starting bytes may be considered a printable UTF-8 character header.
      Parameters:
      headerBytes - starting bytes. The buffer may be truncated without adverse effect, although more bytes give a more accurate result.
      Returns:
      true if the bytes appear to represent UTF-8.
    • isPrintableCharsetHeader

      public static boolean isPrintableCharsetHeader(byte[] headerBytes, Charset charset)
      Check whether some starting bytes may be encoded with a particular charset.
      Parameters:
      headerBytes - starting bytes. The buffer may be truncated without adverse effect, although more bytes give a more accurate result.
      charset - charset to detect. Use isPrintableUTF8Header(byte[]) for UTF-8.
      Returns:
      true if the bytes appear to represent the provided charset.
    • decodeUTF8Ex

      public static String decodeUTF8Ex(byte[] bytes, boolean useStandardDecoderFirst)
      Decode UTF-8 bytes back to a string. Sub-optimal encodings of characters are processed as normal.
      Parameters:
      bytes - bytes to decode
      useStandardDecoderFirst - true to attempt standard UTF-8 decoding before the permissive decoder
      Returns:
      the decoded string
    • decodeUTF8Ex

      public static String decodeUTF8Ex(byte[] bytes, int off, int len, boolean useStandardDecoderFirst)
      Decode UTF-8 bytes back to a string. Sub-optimal encodings of characters are processed as normal.
      Parameters:
      bytes - bytes to decode
      off - start offset
      len - number of bytes to decode
      useStandardDecoderFirst - true to attempt standard UTF-8 decoding before the permissive decoder
      Returns:
      the decoded string
    • getBOMSize

      public static int getBOMSize(byte[] input)
      Retrieve the size taken by the BOM or equivalent encoding mark. Detects UTF-8, UTF-16 and UTF-32.
      Parameters:
      input - byte array. Provide at least 4 bytes so that every supported BOM can be detected.
      Returns:
      the size of the BOM in bytes, or 0 if no BOM was detected
    • readBOM

      public static String readBOM(byte[] input)
      Retrieve the charset from the BOM in the starting bytes. Detects UTF-8, UTF-16LE/BE and UTF-32LE/BE.
      Parameters:
      input - first bytes of a string
      Returns:
      the detected charset or null if no BOM was detected.
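
      BOM detection for the encodings listed above can be sketched as follows, using the standard Unicode byte-order marks. The exact charset names returned by the library are not given on this page, so the strings below are assumptions, as are the names BomSketch, detectBomCharset and bomSize.

```java
// Hypothetical sketch: recognize Unicode byte-order marks at the start of a buffer.
final class BomSketch {
    static String detectBomCharset(byte[] in) {
        // The UTF-32 marks are tested first: the UTF-32LE mark begins with the UTF-16LE mark.
        if (startsWith(in, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(in, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(in, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(in, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(in, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM found
    }

    static int bomSize(byte[] in) {
        String cs = detectBomCharset(in);
        if (cs == null) return 0;
        if (cs.startsWith("UTF-32")) return 4;
        return cs.equals("UTF-8") ? 3 : 2;
    }

    private static boolean startsWith(byte[] in, int... prefix) {
        if (in.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((in[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i' };
        System.out.println(detectBomCharset(utf8) + " " + bomSize(utf8)); // prints UTF-8 3
    }
}
```

      With fewer than 4 bytes available, a UTF-32LE mark cannot be told apart from a UTF-16LE mark, which is why a buffer of at least 4 bytes is recommended.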
    • getInitialBlankSize

      public static int getInitialBlankSize(InputStream in, boolean includeBOM, char... extraWSPChars) throws IOException
      Retrieve the number of initial blank bytes at the beginning of a stream. This method is a destructive probe and the caller must not continue reading from the same stream after it returns.
      Parameters:
      in - input stream (when this method returns, some bytes have been read from the stream; callers should no longer use the stream)
      includeBOM - if true, a BOM at the start of the stream is counted among the initial blank bytes
      extraWSPChars - optional set of characters to be treated as whitespace, in addition to Space, Tab, CR, LF
      Returns:
      the number of bytes considered blank and that should be skipped
      Throws:
      IOException - if the stream cannot be read
    • count

      public static int count(String str, char ch)
      Count the number of occurrences of a character within a string.
      Parameters:
      str - haystack
      ch - needle
      Returns:
      the number of occurrences
    • count

      public static int count(String str, String sub, boolean countOverlaps)
      Count the number of occurrences of a sub-string within a string.

      Note: when countOverlaps is true, a search for 'aaa' inside 'aaaaaa' returns 4, not 2.

      Parameters:
      str - haystack
      sub - needle (if empty, the method returns 0)
      countOverlaps - if true, a search for 'aaa' inside 'aaaaaa' will return 4 instead of 2
      Returns:
      the number of occurrences
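
      The difference between overlapping and non-overlapping counts can be sketched as below. This is an illustration built on String.indexOf, not the library's code; the class name CountSketch is hypothetical.

```java
// Hypothetical sketch: count sub-string occurrences with or without overlaps.
final class CountSketch {
    static int count(String str, String sub, boolean countOverlaps) {
        if (sub.isEmpty()) {
            return 0; // an empty needle never matches, as documented
        }
        // Advance by one character to allow overlaps, or by the needle length to forbid them.
        int step = countOverlaps ? 1 : sub.length();
        int n = 0;
        for (int i = str.indexOf(sub); i >= 0; i = str.indexOf(sub, i + step)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("aaaaaa", "aaa", true));  // prints 4
        System.out.println(count("aaaaaa", "aaa", false)); // prints 2
    }
}
```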
    • like

      public static boolean like(String str, String pat)
      Check whether an input string matches a provided regex pattern. This method is case-sensitive.
      Parameters:
      str - a string
      pat - a regular expression
      Returns:
      true if the pattern matches the entire string
    • likei

      public static boolean likei(String str, String pat)
      Check whether an input string matches a provided regex pattern. This method is case-insensitive.
      Parameters:
      str - a string
      pat - a regular expression
      Returns:
      true if the pattern matches the entire string, ignoring case
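
      Whole-string regex matching of this kind maps directly onto java.util.regex, as in the sketch below. Whether the library also enables Unicode-aware case folding is not stated here; this sketch uses Pattern.CASE_INSENSITIVE alone, which by default folds only ASCII letters. The class name LikeSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: whole-string regex matching, case-sensitive and case-insensitive.
final class LikeSketch {
    static boolean like(String str, String pat) {
        // matches() requires the pattern to cover the entire input.
        return Pattern.compile(pat).matcher(str).matches();
    }

    static boolean likei(String str, String pat) {
        // Assumption: ASCII-only case folding (no UNICODE_CASE flag).
        return Pattern.compile(pat, Pattern.CASE_INSENSITIVE).matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("ABC123", "[a-z]+\\d+"));  // prints false
        System.out.println(likei("ABC123", "[a-z]+\\d+")); // prints true
    }
}
```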
    • starMatches

      public static boolean starMatches(String str, String pat)
      Check whether an input string matches a provided pattern using a StarMatcher.
      Parameters:
      str - a string
      pat - a wildcard pattern
      Returns:
      true if the wildcard pattern matches the string
    • findWordBoundaries

      public static int[] findWordBoundaries(String str, int offset)
      Find the boundaries of the word located at a given offset in a string.
      Parameters:
      str - a string
      offset - offset in the string, for which the underlying word should be found
      Returns:
      a tuple (start, end) in the string, specifying the word boundaries; if nothing is found, the tuple returned will be (provided_offset, provided_offset)
    • findWordBoundaries

      public static int[] findWordBoundaries(String str, int offset, Predicate<Character> boundaryTester)
      Find the boundaries of the word located at a given offset in a string, optionally using a custom boundary tester.
      Parameters:
      str - a string
      offset - offset in the string, for which the underlying word should be found
      boundaryTester - optional custom boundary tester; leave null to use the default boundary tester (in that case, characters considered as boundaries are: white-space characters, punctuation characters except dash and underscore)
      Returns:
      a tuple (start, end) in the string, specifying the word boundaries; if nothing is found, the tuple returned will be (provided_offset, provided_offset)
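
      A word-boundary search of this kind can be sketched by expanding left and right from the offset until a boundary character is met. Several details below are assumptions, since the page does not specify them: the end index is treated as exclusive, "punctuation" is approximated as any character that is neither a letter nor a digit, and the class name WordBoundsSketch is hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical sketch: locate the word surrounding an offset.
final class WordBoundsSketch {
    // Assumption: boundaries are whitespace and non-alphanumerics other than '-' and '_'.
    static boolean isDefaultBoundary(char c) {
        return Character.isWhitespace(c)
                || (!Character.isLetterOrDigit(c) && c != '-' && c != '_');
    }

    static int[] findWordBoundaries(String str, int offset, Predicate<Character> tester) {
        Predicate<Character> isBoundary =
                tester != null ? tester : ch -> isDefaultBoundary(ch);
        if (offset < 0 || offset >= str.length() || isBoundary.test(str.charAt(offset))) {
            return new int[] { offset, offset }; // nothing found at this offset
        }
        int start = offset;
        while (start > 0 && !isBoundary.test(str.charAt(start - 1))) {
            start--;
        }
        int end = offset + 1; // assumption: end index is exclusive
        while (end < str.length() && !isBoundary.test(str.charAt(end))) {
            end++;
        }
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        String text = "foo bar-baz, qux";
        int[] b = findWordBoundaries(text, 5, null);
        System.out.println(text.substring(b[0], b[1])); // prints bar-baz
    }
}
```

      Passing a custom predicate, for example one that also treats '-' as a boundary, would split "bar-baz" into two words.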