Class WildcardStringParser
- java.lang.Object
  - com.twelvemonkeys.util.regex.WildcardStringParser
@Deprecated public class WildcardStringParser extends java.lang.Object
Deprecated. Will probably be removed in the near future.
This class parses arbitrary strings against a wildcard string mask provided. The wildcard characters are '*' and '?'. The string masks provided are treated as case sensitive.
Null-valued string masks, as well as null-valued strings to be parsed, will lead to rejection. This class is custom designed for wildcard string parsing and is several times faster than the implementation based on the Jakarta Regexp package.
This task is performed based on regular expression techniques. The strings that can be generated with the well-known wildcard characters stated above are a subset of the strings that can be generated with regular expressions.
The '*' corresponds to ([Union of all characters in the alphabet])*
The '?' corresponds to ([Union of all characters in the alphabet])
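The correspondence above can be illustrated by translating a mask into a regular expression. The following is a minimal sketch using the standard java.util.regex package, not this class and not the Jakarta Regexp implementation mentioned above. The class name WildcardToRegex and the method toPattern are hypothetical. Note one assumption: the regex '.' (with DOTALL) matches any character at all, which is broader than the hard-coded alphabet of this class.

```java
import java.util.regex.Pattern;

final class WildcardToRegex {
    // Hypothetical helper: builds a regex equivalent to the wildcard mask.
    // '*' becomes ".*" (any sequence), '?' becomes "." (any single character),
    // and every other character is quoted so it matches literally.
    static Pattern toPattern(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '.' match line terminators too (an assumption of this sketch).
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = toPattern("*_28????.jp*");
        System.out.println(p.matcher("gupu_280915.jpg").matches()); // prints true
        System.out.println(p.matcher("gupu_28.jpg").matches());     // prints false
    }
}
```

Matcher.matches() requires the whole input to match, which mirrors accepting or rejecting a complete string against the mask.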
These expressions are not suited for textual representation at all, I must say. Are there any math tags included in HTML?
The complete meta-language for regular expressions is much larger. This fact makes it fairly straightforward to build data structures for parsing, because the number of rules for building these structures is quite limited, as stated below.
To bring this over to mathematical terms: The parser is a nondeterministic finite automaton (latin) representing the grammar which is stated by the string mask. The language of this automaton is the set of all strings that it accepts.
The formal automaton quintuple consists of:
- A finite set of states, depending on the wildcard string mask. For each character in the mask a state representing that character is created. The number of states therefore coincides with the length of the mask.
- An alphabet consisting of all legal filename characters, including the two wildcard characters '*' and '?'. This alphabet is hard-coded in this class. It contains {a .. å}, {A .. Å}, {0 .. 9}, {.}, {_}, {-}, {*} and {?}.
- A finite set of initial states, here only consisting of the state corresponding to the first character in the mask.
- A finite set of final states, here only consisting of the state corresponding to the last character in the mask.
- A transition relation that is a finite set of transitions satisfying some formal rules.
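For comparison, a nondeterministic automaton of this kind can be simulated directly by tracking the set of states that are active after each input character. The sketch below is an assumption-laden illustration and not the code of this class: the name WildcardNfa is hypothetical, and it uses one state per position in the mask plus one accepting end state, which is one more state than the description above.

```java
final class WildcardNfa {
    // State i means "the first i mask characters have been matched".
    // State mask.length() is the single accepting state.
    static boolean accepts(String mask, String input) {
        if (mask == null || input == null) {
            return false; // null values lead to rejection
        }
        int n = mask.length();
        boolean[] current = new boolean[n + 1];
        current[0] = true;
        closure(mask, current);
        for (int k = 0; k < input.length(); k++) {
            char c = input.charAt(k);
            boolean[] next = new boolean[n + 1];
            for (int i = 0; i < n; i++) {
                if (!current[i]) {
                    continue;
                }
                char m = mask.charAt(i);
                if (m == '*') {
                    next[i] = true;       // '*' consumes c and stays put
                } else if (m == '?' || m == c) {
                    next[i + 1] = true;   // '?' or a literal consumes c and advances
                }
            }
            closure(mask, next);
            current = next;
        }
        return current[n];
    }

    // A '*' may match the empty string, so an active '*' state
    // also activates the state that follows it.
    private static void closure(String mask, boolean[] states) {
        for (int i = 0; i < mask.length(); i++) {
            if (states[i] && mask.charAt(i) == '*') {
                states[i + 1] = true;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("*_28????.jp*", "gupu_280915.jpg")); // prints true
        System.out.println(accepts("a?c", "ac"));                       // prints false
    }
}
```

Because every active state is followed at once, this simulation needs no backtracking; its running time is proportional to the mask length times the input length.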
This implementation, on the other hand, only uses ad-hoc rules, which start with an initial setup of the states as a sequence according to the string mask.
Additionally, the following rules complete the building of the automaton:
- If the next state represents the same character as the next character in the string to test, go to this next state.
- If the next state represents '*', go to this next state.
- If the next state represents '?', go to this next state.
- If a '*' is followed by one or more '?', the last of these '?' states counts as a '*' state. Some extra checks regarding the number of characters read must be imposed if this is the case...
- If the next character in the string to test does not coincide with the next state, go to the last state representing '*'. If there is none, reject.
- If there is no subsequent state (final state) and the state represents '*', accept.
- If there is no subsequent state (final state) and the end of the string to test is reached, accept.
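The idea of falling back to the last '*' state on a mismatch can be sketched with the well-known greedy backtracking technique for wildcard matching. This is not the source of this class, and it does not reproduce the special handling of a '*' followed by '?' described above. The class name BacktrackingWildcardMatcher and all variable names are hypothetical.

```java
final class BacktrackingWildcardMatcher {
    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false; // null values lead to rejection
        }
        int m = 0;          // current position in the mask
        int t = 0;          // current position in the text
        int lastStar = -1;  // mask index of the most recent '*', or -1 if none
        int starText = 0;   // text index up to which that '*' currently extends
        while (t < text.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                // Remember the '*' and first try letting it match nothing.
                lastStar = m++;
                starText = t;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == text.charAt(t))) {
                // '?' or an equal literal: consume one character on both sides.
                m++;
                t++;
            } else if (lastStar >= 0) {
                // Mismatch: go back to the last '*' and let it absorb one more character.
                m = lastStar + 1;
                t = ++starText;
            } else {
                return false; // mismatch and no '*' to fall back to
            }
        }
        // The text is exhausted; only trailing '*' characters may remain in the mask.
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;
        }
        return m == mask.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("*_28????.jp*", "gupu_280915.jpg")); // prints true
        System.out.println(matches("a*", "b"));                         // prints false
    }
}
```

Only the most recent '*' needs to be remembered, because any earlier '*' could not rescue a match that the later one cannot.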
Disclaimer: This class does not build a finite automaton according to formal mathematical rules. The proper way of implementation would be finding the complete set of transition relations, decomposing these into rules accepted by a deterministic finite automaton and finally building this automaton to be used for string parsing. Instead, this class is ad-hoc implemented based on the informal transition rules stated above. Therefore the correctness cannot be guaranteed before extensive testing has been performed on this class... anyway, I think I have succeeded. Parsing faults must be reported to the author.
Examples of usage:
This example will print "Accepted!".
    WildcardStringParser parser = new WildcardStringParser("*_28????.jp*");
    if (parser.parseString("gupu_280915.jpg")) {
        System.out.println("Accepted!");
    } else {
        System.out.println("Not accepted!");
    }
Theories and concepts are based on the book Elements of the Theory of Computation, by Harry R. Lewis and Christos H. Papadimitriou, (c) 1981 by Prentice Hall.
- Author:
- Eirik Torske

Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| static char[] | ALPHABET | Deprecated. Field ALPHABET |
| static char | FREE_PASS_CHARACTER | Deprecated. Field FREE_PASS_CHARACTER |
| static char | FREE_RANGE_CHARACTER | Deprecated. Field FREE_RANGE_CHARACTER |

Constructor Summary

| Constructor | Description |
|---|---|
| WildcardStringParser(java.lang.String pStringMask) | Deprecated. Creates a wildcard string parser. |
| WildcardStringParser(java.lang.String pStringMask, boolean pDebugging) | Deprecated. Creates a wildcard string parser. |
| WildcardStringParser(java.lang.String pStringMask, boolean pDebugging, java.io.PrintStream pDebuggingPrintStream) | Deprecated. Creates a wildcard string parser. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected java.lang.Object | clone() | Deprecated. |
| boolean | equals(java.lang.Object pObject) | Deprecated. Method equals |
| protected void | finalize() | Deprecated. |
| java.lang.String | getStringMask() | Deprecated. Gets the string mask that was used when building the parser automaton. |
| int | hashCode() | Deprecated. Method hashCode |
| static boolean | isFreePassCharacter(char pCharToCheck) | Deprecated. Tests if a certain character is the designated "free-pass" character ('?'). |
| static boolean | isFreeRangeCharacter(char pCharToCheck) | Deprecated. Tests if a certain character is the designated "free-range" character ('*'). |
| static boolean | isInAlphabet(char pCharToCheck) | Deprecated. Tests if a certain character is a valid character in the alphabet that applies to this automaton. |
| static boolean | isWildcardCharacter(char pCharToCheck) | Deprecated. Tests if a certain character is a wildcard character ('*' or '?'). |
| boolean | parseString(java.lang.String pStringToParse) | Deprecated. Parses a string according to the rules stated above. |
| java.lang.String | toString() | Deprecated. Method toString |

Field Detail

ALPHABET
public static final char[] ALPHABET
Deprecated. Field ALPHABET

FREE_RANGE_CHARACTER
public static final char FREE_RANGE_CHARACTER
Deprecated. Field FREE_RANGE_CHARACTER
- See Also:
- Constant Field Values

FREE_PASS_CHARACTER
public static final char FREE_PASS_CHARACTER
Deprecated. Field FREE_PASS_CHARACTER
- See Also:
- Constant Field Values

Constructor Detail

WildcardStringParser
public WildcardStringParser(java.lang.String pStringMask)
Deprecated. Creates a wildcard string parser.
- Parameters:
- pStringMask - the wildcard string mask.

WildcardStringParser
public WildcardStringParser(java.lang.String pStringMask, boolean pDebugging)
Deprecated. Creates a wildcard string parser.
- Parameters:
- pStringMask - the wildcard string mask.
- pDebugging - true will cause debug messages to be emitted to System.out.

WildcardStringParser
public WildcardStringParser(java.lang.String pStringMask, boolean pDebugging, java.io.PrintStream pDebuggingPrintStream)
Deprecated. Creates a wildcard string parser.
- Parameters:
- pStringMask - the wildcard string mask.
- pDebugging - true will cause debug messages to be emitted.
- pDebuggingPrintStream - the java.io.PrintStream to which the debug messages will be emitted.

Method Detail

isInAlphabet
public static boolean isInAlphabet(char pCharToCheck)
Deprecated. Tests if a certain character is a valid character in the alphabet that applies to this automaton.

isFreeRangeCharacter
public static boolean isFreeRangeCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is the designated "free-range" character ('*').

isFreePassCharacter
public static boolean isFreePassCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is the designated "free-pass" character ('?').

isWildcardCharacter
public static boolean isWildcardCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is a wildcard character ('*' or '?').
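
The four static tests above are simple character predicates. As a rough illustration only, they could look like the sketch below. The class WildcardChars, its constants and the helper isValid are hypothetical names, and the alphabet check is an ASCII-only approximation: it assumes the wildcard characters count as alphabet members and leaves out any non-ASCII letters the real ALPHABET array may contain.

```java
final class WildcardChars {
    static final char FREE_RANGE = '*'; // matches any sequence of characters
    static final char FREE_PASS = '?';  // matches exactly one character

    static boolean isFreeRange(char c) {
        return c == FREE_RANGE;
    }

    static boolean isFreePass(char c) {
        return c == FREE_PASS;
    }

    static boolean isWildcard(char c) {
        return isFreeRange(c) || isFreePass(c);
    }

    // ASCII-only approximation of the alphabet: letters, digits,
    // '.', '_', '-' and the two wildcard characters.
    static boolean isInAlphabet(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '.' || c == '_' || c == '-'
                || isWildcard(c);
    }

    // Hypothetical helper: true if every character of s is in the alphabet.
    static boolean isValid(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isInAlphabet(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("*_28????.jp*")); // prints true
        System.out.println(isValid("a b"));          // prints false (space is not in the alphabet)
    }
}
```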

getStringMask
public java.lang.String getStringMask()
Deprecated. Gets the string mask that was used when building the parser automaton.
- Returns:
- the string mask used for building the parser automaton.

parseString
public boolean parseString(java.lang.String pStringToParse)
Deprecated. Parses a string according to the rules stated above.
- Parameters:
- pStringToParse - the string to parse.
- Returns:
- true if and only if the string is accepted by the automaton.

toString
public java.lang.String toString()
Deprecated. Method toString
- Overrides:
- toString in class java.lang.Object
- Returns:

equals
public boolean equals(java.lang.Object pObject)
Deprecated. Method equals
- Overrides:
- equals in class java.lang.Object
- Parameters:
- pObject
- Returns:

hashCode
public int hashCode()
Deprecated. Method hashCode
- Overrides:
- hashCode in class java.lang.Object
- Returns:

clone
protected java.lang.Object clone() throws java.lang.CloneNotSupportedException
Deprecated.
- Overrides:
- clone in class java.lang.Object
- Throws:
- java.lang.CloneNotSupportedException

finalize
protected void finalize() throws java.lang.Throwable
Deprecated.
- Overrides:
- finalize in class java.lang.Object
- Throws:
- java.lang.Throwable