java.lang.Object FreeCBR.CBR
public class CBR
This is a CBR (Case-Based Reasoning) API implementation. It finds the
closest match among cases in a case set. Each case consists of
a predefined set of features. Each feature is defined by a
name and a datatype, where the datatype may be any of String, MultiString, Float, Int and Bool.
The closest match is calculated using weighted Euclidean distance, like Pythagoras' theorem in n dimensions.
The returned "hit percentage" is calculated as
100 * (1 - sqrt(case distance / sum(weights)))
and takes a value between 0 and 100.
The distance between the search and a case is a floating point number (between 0 and 1 once divided by sum(weights), as in the hit percentage) and is calculated as:
case distance = weight[1] * dist[1]^2 + weight[2] * dist[2]^2 + ... + weight[n] * dist[n]^2
where
dist[i] is the distance between the searched feature and the actual case feature. This value is a float between 0 and 1, where 0 means exact hit and 1 means maximum distance.
weight[i] is the weight for feature number i. It is an integer >= 0, default = 5.
This means that the total case distance is >= 0 (0 means exact match) and <= sum(i = 1 to n) weight[i], so its square root is <= sqrt(sum(i = 1 to n) weight[i]), where n is the number of features searched for.
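The two formulas above can be sketched as follows. This is only an illustration under stated assumptions, not the FreeCBR source: the class and method names are hypothetical, and rejecting a weight sum of zero is an assumption, since the text does not say what happens in that case.

```java
/** Illustrative sketch of the weighted distance and hit percentage formulas (not FreeCBR code). */
final class HitPercentageSketch {

    /** Returns weight[1]*dist[1]^2 + ... + weight[n]*dist[n]^2; each dist[i] is expected in [0, 1]. */
    static double caseDistance(int[] weights, double[] dists) {
        if (weights.length != dists.length) {
            throw new IllegalArgumentException("weights and dists must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * dists[i] * dists[i];
        }
        return sum;
    }

    /** Returns 100 * (1 - sqrt(caseDistance / sum(weights))). */
    static double hitPercentage(int[] weights, double[] dists) {
        long weightSum = 0;
        for (int w : weights) {
            weightSum += w;
        }
        // Assumption: a search where every weight is 0 is rejected here.
        if (weightSum == 0) {
            throw new IllegalArgumentException("at least one weight must be > 0");
        }
        return 100.0 * (1.0 - Math.sqrt(caseDistance(weights, dists) / weightSum));
    }

    public static void main(String[] args) {
        int[] weights = {5, 5};          // two features with the default weight
        double[] dists = {0.0, 1.0};     // one exact hit, one maximum distance
        System.out.println(hitPercentage(weights, dists)); // roughly 29.29
    }
}
```

With two equally weighted features, one exact hit and one maximum miss give a hit percentage of about 29.29 rather than 50, because the square root is taken after the weighted sum.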
The distance between the searched feature and the actual case feature
is calculated as follows.
If the case value or the searched value is "?", the case feature is disqualified
and not included in the result.
If every searched value is "?", there are no hits.
The "normal" algorithm (let's call it the NormalDistance algorithm) is
distance = min(1, diff(searchedvalue, casevalue)/((maxvalue - minvalue) * infinity_constant))
or in other words
if diff(searchedvalue, casevalue) > (maxvalue - minvalue) * infinity_constant then distance = 1, else
distance = diff(searchedvalue, casevalue)/((maxvalue - minvalue) * infinity_constant)
where "infinity_constant" is a constant that defines what distance is regarded as infinity.
The "logarithmic" algorithm (let's call it the LogarithmicDistance algorithm) is
ln(NormalDistance * (e-1) + 1)
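A minimal sketch of the two algorithms, assuming that diff(searchedvalue, casevalue) means the absolute numeric difference (the text does not define diff). The class and method names are hypothetical, and the handling of a zero-width value range is an assumption added only to avoid division by zero.

```java
/** Illustrative sketch of the NormalDistance and LogarithmicDistance formulas (not FreeCBR code). */
final class DistanceSketch {

    /** Returns min(1, diff / ((maxValue - minValue) * infinityConstant)). */
    static double normalDistance(double searched, double caseValue,
                                 double minValue, double maxValue, int infinityConstant) {
        // Assumption: diff is the absolute difference between the two values.
        double diff = Math.abs(searched - caseValue);
        double range = (maxValue - minValue) * infinityConstant;
        // Assumption: with a zero-width range, anything but an exact hit is maximum distance.
        if (range <= 0.0) {
            return diff == 0.0 ? 0.0 : 1.0;
        }
        return Math.min(1.0, diff / range);
    }

    /** Returns ln(normal * (e - 1) + 1), which maps 0 to 0 and 1 to 1. */
    static double logarithmicDistance(double normal) {
        return Math.log(normal * (Math.E - 1.0) + 1.0);
    }

    public static void main(String[] args) {
        double normal = normalDistance(10, 15, 0, 10, 1);
        System.out.println(normal);                       // 0.5
        System.out.println(logarithmicDistance(normal));  // larger than 0.5, below 1
    }
}
```

The logarithmic variant keeps the endpoints 0 and 1 but lies above the linear value in between, so small differences are penalised more heavily.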
If the search is done for:
"=" and fuzzy linear: if exact match then distance = 0 else use the NormalDistance algorithm
"=" and fuzzy logarithmic: if exact match then distance = 0 else use the LogarithmicDistance algorithm
"=" and strict: if exact match then distance = 0 otherwise the entire case is disqualified and not included in the result
"=" and flat: if exact match then distance = 0 otherwise distance = 1
"!=" and fuzzy linear: if exact match then distance = 1 else use the NormalDistance algorithm inverted
"!=" and fuzzy logarithmic: if exact match then distance = 1 else use the LogarithmicDistance algorithm inverted
"!=" and strict: if exact match then the entire case is disqualified and not included in the result, otherwise distance = 0
"!=" and flat: if exact match then distance = 1 otherwise distance = 0
">=" and fuzzy linear: if searched value >= case value then 0 otherwise use the NormalDistance algorithm
">=" and fuzzy logarithmic: if searched value >= case value then 0 otherwise use the LogarithmicDistance algorithm
">=" and strict: if searched value >= case value then 0 otherwise the entire case is disqualified and not included in the result
">=" and flat: if searched value >= case value then 0 otherwise 1
">" and fuzzy linear: if searched value > case value then 0 otherwise use the NormalDistance algorithm
">" and fuzzy logarithmic: if searched value > case value then 0 otherwise use the LogarithmicDistance algorithm
">" and strict: if searched value > case value then 0 otherwise the entire case is disqualified and not included in the result
">" and flat: if searched value > case value then 0 otherwise 1
"<=" and fuzzy linear: if searched value <= case value then 0 otherwise use the NormalDistance algorithm
"<=" and fuzzy logarithmic: if searched value <= case value then 0 otherwise use the LogarithmicDistance algorithm
"<=" and strict: if searched value <= case value then 0 otherwise the entire case is disqualified and not included in the result
"<=" and flat: if searched value <= case value then 0 otherwise 1
"<" and fuzzy linear: if searched value < case value then 0 otherwise use the NormalDistance algorithm
"<" and fuzzy logarithmic: if searched value < case value then 0 otherwise use the LogarithmicDistance algorithm
"<" and strict: if searched value < case value then 0 otherwise the entire case is disqualified and not included in the result
"<" and flat: if searched value < case value then 0 otherwise 1
"max" and fuzzy linear: the NormalDistance algorithm between current case value and the max case value
"max" and fuzzy logarithmic: the LogarithmicDistance algorithm between current case value and the max case value
"max" and strict: if searched value is the max case value then 0, otherwise the entire case is disqualified and not included in the result
"max" and flat: if searched value is the max case value then 0, otherwise 1
"min" and fuzzy linear: the NormalDistance algorithm between current case value and the min case value
"min" and fuzzy logarithmic: the LogarithmicDistance algorithm between current case value and the min case value
"min" and strict: if searched value is the min case value then 0, otherwise the entire case is disqualified and not included in the result
"min" and flat: if searched value is the min case value then 0, otherwise 1
Field Summary | |
---|---|
static int | DEFAULT_WEIGHT - Default weight |
protected int | INFINITY_CONSTANT - Values further away than this are considered infinity |
static int | SEARCH_OPTION_INVERTED - Should the search result be inverted? |
static short | SEARCH_SCALE_FLAT - Search with a "flat" scale - if the hit is not exact it is treated as maximum distance |
static short | SEARCH_SCALE_FUZZY_LINEAR - Search with a linear scale. |
static short | SEARCH_SCALE_FUZZY_LOGARITHMIC - Search with a logarithmic scale |
static short | SEARCH_SCALE_STRICT - Search "strict" - if the hit is not exact the case is not included in the result at all |
static short | SEARCH_TERM_EQUAL - Search for closest value. |
static short | SEARCH_TERM_GREATER - Search for greater values. |
static short | SEARCH_TERM_GREATER_OR_EQUAL - Search for greater or equal values. |
static short | SEARCH_TERM_LESS - Search for smaller values. |
static short | SEARCH_TERM_LESS_OR_EQUAL - Search for smaller or equal values. |
static short | SEARCH_TERM_MAX - Search for maximum values, the higher the better. |
static short | SEARCH_TERM_MIN - Search for minimum values, the lower the better. |
static short | SEARCH_TERM_NOT_EQUAL - Search for non-equal values. |
Constructor Summary | |
---|---|
CBR() | Constructor that initiates the CBR with no data. |
CBR(java.lang.String logfile, boolean verbose, boolean silent) | Constructor that initiates the CBR with no data |
CBR(java.lang.String datafile, java.lang.String logfile, boolean verbose, boolean silent) | Constructor that initiates the CBR with data |
Method Summary | |
---|---|
void | addCase(Feature[] features) - Adds a case to the set |
void | addCase(java.lang.String caseString) - Adds a case to the set |
void | addFeature(java.lang.String name, short type) - Adds a feature (column) to the set. |
Feature[] | editCase(int caseNum, Feature[] features) - Replaces specified case with another |
Feature[] | getCase(int caseNum) - Returns the case at the specified position |
java.lang.String | getDatafile() - Returns the name of the data file currently in use |
java.lang.String | getFeatureName(int featureNum) - Returns the name of the specified feature |
int | getFeatureNum(java.lang.String featureName) - Returns the number of the feature that carries the specified name |
short | getFeatureType(int featureNum) - Returns the datatype of the specified feature |
Feature | getFeatureValue(int caseNum, int featureNum) - Returns the specified feature of the specified case |
java.lang.String | getFeatureValueAX(int caseNum, int featureNum) - Returns the specified feature of the specified case. |
int | getINFINITY_CONSTANT() - Returns the current infinity constant |
java.lang.String | getLogfile() - Returns the name of the current log file |
double | getMaxFloatValue(int featureNum) - Returns the maximum floating point value of all cases for the specified feature |
long | getMaxIntValue(int featureNum) - Returns the maximum integer value of all cases for the specified feature |
double | getMinFloatValue(int featureNum) - Returns the minimum floating point value of all cases for the specified feature |
long | getMinIntValue(int featureNum) - Returns the minimum integer value of all cases for the specified feature |
int | getNumCases() - Returns the number of cases in current set |
int | getNumFeatures() - Returns the number of features that each case has. |
boolean | getSilent() - Returns the silence state |
java.lang.String[] | getUsedStringValues(int featureNum) - Returns all of the string values used at specified feature, works for String and MultiString features |
java.lang.String | getUsedStringValuesAX(int featureNum, java.lang.String separator) - Returns all of the string values used at specified feature, works for String and MultiString features. |
boolean | getVerbose() - Returns the verbose state |
void | initialize(java.lang.String datafile, java.lang.String logfile) - Initializes the CBR if not already done. |
void | loadSet(java.lang.String filename) - Loads a case set to memory |
void | newSet(java.lang.String[] featureNames, java.lang.String[] featureTypeNames) - Empties the memory - deletes the current set from memory and creates a new empty set with the specified feature names and feature data types |
void | readData() - Reads the data from the datafile. |
Feature[] | removeCase(int caseNum) - Removes the specified case from the set |
void | removeFeature(int featureNumber) - Deletes a feature (column) from the set. |
void | saveSet(java.lang.String filename, boolean setDefault) - Saves the entire case set |
CBRResult[] | search(int[] searchFeatureNumbers, Feature[] searchValues, int[] searchWeights, int[] searchTerms, int[] searchScales, int[] searchOptions) - Performs a search for the best match. |
WebResult | search(java.lang.Object req) - Performs a search for the best match in a "web" way. |
CBRResult[] | search(java.lang.String[] searchFeatureNames, java.lang.String[] searchValues, int[] searchWeights, int[] searchTerms, int[] searchScales, int[] searchOptions) - Performs a search for the best match. |
java.lang.String | searchAX(java.lang.Object[] searchFeatureNames, java.lang.Object[] searchValues, java.lang.Object[] searchWeights, java.lang.Object[] searchTerms, java.lang.Object[] searchScales, java.lang.Object[] searchOptions, java.lang.String resultSeparator, java.lang.String caseSeparator) - Performs a search for the best match, used primarily by ActiveX components. |
void | setDatafile(java.lang.String datafile) - Sets the data file. |
void | setFeatureName(int featureNum, java.lang.String newName) - Sets the name of the specified feature to the specified value |
void | setFeatureType(int featureNum, short newType) - Sets the datatype of the specified feature |
void | setFeatureValue(int caseNum, int featureNum, java.lang.String value) - Sets the specified feature of the specified case to the specified value |
void | setINFINITY_CONSTANT(int infinity) - Sets the infinity constant |
void | setLogfile(java.lang.String logfile) - Sets the log file to the specified path |
void | setSilent(boolean silent) - Sets the silence state |
void | setVerbose(boolean verbose) - Sets the verbose state |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int INFINITY_CONSTANT
public static final int DEFAULT_WEIGHT
public static final short SEARCH_TERM_EQUAL
public static final short SEARCH_TERM_NOT_EQUAL
public static final short SEARCH_TERM_GREATER_OR_EQUAL
public static final short SEARCH_TERM_GREATER
public static final short SEARCH_TERM_LESS_OR_EQUAL
public static final short SEARCH_TERM_LESS
public static final short SEARCH_TERM_MAX
public static final short SEARCH_TERM_MIN
public static final short SEARCH_SCALE_FUZZY_LINEAR
public static final short SEARCH_SCALE_FUZZY_LOGARITHMIC
public static final short SEARCH_SCALE_FLAT
public static final short SEARCH_SCALE_STRICT
public static final int SEARCH_OPTION_INVERTED
Constructor Detail |
---|
public CBR()
public CBR(java.lang.String logfile, boolean verbose, boolean silent)
logfile - path to the file to write log information to. May be set to "null", which means no logging.
verbose - if true then extra verbose information is added to the logfile
silent - if true then no information is output to standard error
public CBR(java.lang.String datafile, java.lang.String logfile, boolean verbose, boolean silent) throws java.io.IOException
datafile - path to the datafile
logfile - path to the file to write log information to. May be set to "null", which means no logging.
verbose - if true then extra verbose information is added to the logfile
silent - if true then no information is output to standard error
java.io.IOException - if unable to read file
Method Detail |
---|
public boolean getVerbose()
public void setVerbose(boolean verbose)
verbose - the verbose state to assume
public boolean getSilent()
public void setSilent(boolean silent)
silent - the silence state to assume
public java.lang.String getLogfile()
public void setLogfile(java.lang.String logfile)
logfile - file to use as log file
public java.lang.String getDatafile()
public void setDatafile(java.lang.String datafile)
initialize()
datafile - file to use as input data file
See Also: readData(), initialize(String, String)
public void readData() throws java.io.IOException, FreeCBR.NoDataException
initialize() instead.
java.io.IOException - if unable to read from the current data file
NoDataException - if no fileHandler was previously specified
See Also: setDatafile(String), initialize(String, String)
public void initialize(java.lang.String datafile, java.lang.String logfile) throws java.io.IOException
datafile - file to use as input data file. If null then a new empty case set is created. If the datafile was already specified with the same value, nothing happens.
logfile - file to use for logging. If null or the log file was already specified with the same value, nothing happens.
java.io.IOException - if an error occurs when reading the data file
See Also: setLogfile(String), setDatafile(String), readData()
public int getINFINITY_CONSTANT()
See Also: INFINITY_CONSTANT
public void setINFINITY_CONSTANT(int infinity)
infinity - integer to use as infinity
See Also: INFINITY_CONSTANT
public int getNumCases()
public int getNumFeatures()
public void addCase(java.lang.String caseString)
caseString - a string describing the case to add: a tab-separated string with the feature values in the correct order. MultiString values are separated by semicolons. An example might be
"HP[tab]1000.5[tab]CD-RW;DVD;Scanner"
public void addCase(Feature[] features)
features - an array of features for the case to add
See Also: Feature
public Feature[] getCase(int caseNum) throws FreeCBR.NoDataException
caseNum - the number of the case to retrieve (0-based)
NoDataException - when no data is in the case base
public Feature[] editCase(int caseNum, Feature[] features)
caseNum - the number of the case to replace
features - the features of the new case
public Feature[] removeCase(int caseNum)
caseNum - the number of the case to delete
public void addFeature(java.lang.String name, short type)
name - the name of the new feature
type - the type of the new feature
public Feature getFeatureValue(int caseNum, int featureNum) throws FreeCBR.NoDataException
caseNum - the number of the case to retrieve
featureNum - the number of the feature to retrieve
NoDataException - when no data is read
public java.lang.String getFeatureValueAX(int caseNum, int featureNum) throws FreeCBR.NoDataException
caseNum - the number of the case to retrieve
featureNum - the number of the feature to retrieve
NoDataException - when no data is read
public void setFeatureValue(int caseNum, int featureNum, java.lang.String value) throws FreeCBR.NoDataException
caseNum - the number of the case to change
featureNum - the number of the feature to change
value - new value to use
NoDataException - when no data is read
public java.lang.String getFeatureName(int featureNum) throws FreeCBR.NoDataException
featureNum - the number of the feature whose name to retrieve
NoDataException - when no data is read
public void setFeatureName(int featureNum, java.lang.String newName) throws FreeCBR.NoDataException
featureNum - the number of the feature whose name to change
newName - the new feature name to use
NoDataException - when no data is read
public int getFeatureNum(java.lang.String featureName) throws FreeCBR.NoDataException
featureName - the name of the feature
NoDataException - when no data is read
public short getFeatureType(int featureNum) throws FreeCBR.NoDataException
featureNum - the number of the feature whose type to retrieve
NoDataException - when no data is read
See Also: Feature
public void setFeatureType(int featureNum, short newType) throws FreeCBR.NoDataException
featureNum - the number of the feature whose type to change
newType - the feature type
NoDataException - when no data is read
See Also: Feature
public void removeFeature(int featureNumber)
featureNumber - the number of the feature to delete
public java.lang.String[] getUsedStringValues(int featureNum) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose string values to retrieve
NoDataException - when no data is read
IllegalTypeException - if the feature type is not String or MultiString
public java.lang.String getUsedStringValuesAX(int featureNum, java.lang.String separator) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose string values to retrieve
separator - the separator to use
NoDataException - when no data is read
IllegalTypeException - if the feature type is not String or MultiString
public long getMinIntValue(int featureNum) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose minimum value is to be retrieved
IllegalTypeException - if the feature is not of type Int
NoDataException - when no data is read
public long getMaxIntValue(int featureNum) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose maximum value is to be retrieved
IllegalTypeException - if the feature is not of type Int
NoDataException - when no data is read
public double getMinFloatValue(int featureNum) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose minimum value is to be retrieved
IllegalTypeException - if the feature is not of type Float
NoDataException - when no data is read
public double getMaxFloatValue(int featureNum) throws FreeCBR.IllegalTypeException, FreeCBR.NoDataException
featureNum - the number of the feature whose maximum value is to be retrieved
IllegalTypeException - if the feature is not of type Float
NoDataException - when no data is read
public void saveSet(java.lang.String filename, boolean setDefault) throws java.io.IOException
filename - name of the file to save as. If null then save to the current file.
setDefault - if true, makes the specified filename the default. Otherwise saves under the specified file name this time only.
java.io.IOException - if an error occurs when saving the set
public void loadSet(java.lang.String filename) throws java.lang.Exception
filename - name of the file to use.
java.lang.Exception - if an error occurs when loading the set
public void newSet(java.lang.String[] featureNames, java.lang.String[] featureTypeNames)
featureNames - an array of feature names to use
featureTypeNames - an array of data type names to use (such as "String", "Float" and so on)
See Also: Feature
public WebResult search(java.lang.Object req) throws FreeCBR.NoDataException, java.lang.Exception
featX, weightX, termX, scaleX and optionX where X is the number of the feature corresponding to this value
An example could be CBRBean.search(req) where req.getQueryString() might look like
feat0=Compaq&scale0=0&feat3=1000&weight3=10
req - the servlet (or jsp) request. Must be javax.servlet.http.HttpServletRequest
NoDataException - when not enough data is present
java.lang.Exception
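A query string of the shape shown above could be assembled as in the following sketch. The builder class is hypothetical and not part of FreeCBR; only the parameter prefixes (feat, weight, term, scale, option) and the trailing feature number come from the description. URL-encoding the values is an assumption based on ordinary servlet conventions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative builder for the featX/weightX/termX/scaleX/optionX query string (not FreeCBR code). */
final class WebQuerySketch {

    // LinkedHashMap keeps the parameters in insertion order.
    private final Map<String, String> params = new LinkedHashMap<>();

    /** Adds one parameter, for example put("feat", 0, "Compaq") gives feat0=Compaq. */
    WebQuerySketch put(String prefix, int featureNum, String value) {
        params.put(prefix + featureNum, value);
        return this;
    }

    /** Joins all parameters with '&', URL-encoding each value. */
    String build() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        String query = new WebQuerySketch()
                .put("feat", 0, "Compaq")
                .put("scale", 0, "0")
                .put("feat", 3, "1000")
                .put("weight", 3, "10")
                .build();
        System.out.println(query); // feat0=Compaq&scale0=0&feat3=1000&weight3=10
    }
}
```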
public java.lang.String searchAX(java.lang.Object[] searchFeatureNames, java.lang.Object[] searchValues, java.lang.Object[] searchWeights, java.lang.Object[] searchTerms, java.lang.Object[] searchScales, java.lang.Object[] searchOptions, java.lang.String resultSeparator, java.lang.String caseSeparator) throws FreeCBR.NoDataException
searchFeatureNames - array of names of the features. Must be an array of Strings.
searchValues - array of strings describing the features to search for. Must be an array of Strings.
searchWeights - array of weights for the search, valid values are >= 0 where 0 means don't care. May be set to null, which means all features are equally important. Must be Null or an array of Integers.
searchTerms - array of terms of the search. May be any of the CBR.SEARCH_TERM_* constants. Must be Null or an array of Integers.
searchScales - array of the scale to use. May be any of CBR.SEARCH_SCALE_FUZZY_LINEAR, CBR.SEARCH_SCALE_FUZZY_LOGARITHMIC, CBR.SEARCH_SCALE_FLAT and CBR.SEARCH_SCALE_STRICT. Default (when set to 0 or null) is CBR.SEARCH_SCALE_FUZZY_LINEAR. Must be Null or an array of Integers.
searchOptions - array of options on how to perform the search. Default is no options. Must be Null or an array of Integers.
resultSeparator - string to use to separate the case number and the match percentage in the result
caseSeparator - string to use to separate the cases in the result
If the resultSeparator is ":" and the caseSeparator is ";", the result might look like:
"3:33.3;0:25;2:12.5;1:12.5;4:0"
which would mean that the best match is case number 3 with a search hit of 33.3%, case number 0 has a hit
rate of 25% and case numbers 1 and 2 have a hit rate of 12.5% each. Case
number 4 has the lowest hit rate, 0%. The cases are always returned in
decreasing hit order.
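A result string in this format could be split back into case numbers and hit percentages as sketched below. The class and its Hit type are hypothetical and not part of FreeCBR; the sketch assumes the percentage uses a "." decimal point, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative parser for the searchAX result string (not FreeCBR code). */
final class AxResultSketch {

    /** One entry of the result: a case number and its hit percentage. */
    static final class Hit {
        final int caseNum;
        final double percent;

        Hit(int caseNum, double percent) {
            this.caseNum = caseNum;
            this.percent = percent;
        }
    }

    /** Splits the result on caseSeparator, then each entry on resultSeparator. */
    static List<Hit> parse(String result, String resultSeparator, String caseSeparator) {
        List<Hit> hits = new ArrayList<>();
        if (result == null || result.isEmpty()) {
            return hits;
        }
        // Pattern.quote keeps separators such as "|" or "." from being read as regex syntax.
        for (String entry : result.split(Pattern.quote(caseSeparator))) {
            String[] fields = entry.split(Pattern.quote(resultSeparator), 2);
            if (fields.length != 2) {
                throw new IllegalArgumentException("malformed entry: " + entry);
            }
            hits.add(new Hit(Integer.parseInt(fields[0].trim()),
                             Double.parseDouble(fields[1].trim())));
        }
        return hits;
    }

    public static void main(String[] args) {
        for (Hit hit : parse("3:33.3;0:25;2:12.5;1:12.5;4:0", ":", ";")) {
            System.out.println("case " + hit.caseNum + " -> " + hit.percent + "%");
        }
    }
}
```

Because the cases arrive in decreasing hit order, the first element of the parsed list is the best match.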
NoDataException - when not enough data is present
public CBRResult[] search(java.lang.String[] searchFeatureNames, java.lang.String[] searchValues, int[] searchWeights, int[] searchTerms, int[] searchScales, int[] searchOptions) throws FreeCBR.NoDataException
searchFeatureNames - array of names of the features
searchValues - array of strings describing the features to search for
searchWeights - array of weights for the search, valid values are >= 0 where 0 means don't care. May be set to null, which means all features are equally important.
searchTerms - array of terms of the search. May be any of the CBR.SEARCH_TERM_* constants.
searchScales - array of the scale to use. May be any of CBR.SEARCH_SCALE_FUZZY_LINEAR, CBR.SEARCH_SCALE_FUZZY_LOGARITHMIC, CBR.SEARCH_SCALE_FLAT and CBR.SEARCH_SCALE_STRICT. Default (when set to 0 or null) is CBR.SEARCH_SCALE_FUZZY_LINEAR.
searchOptions - array of options on how to perform the search. Default is no options.
NoDataException - when not enough data is present
public CBRResult[] search(int[] searchFeatureNumbers, Feature[] searchValues, int[] searchWeights, int[] searchTerms, int[] searchScales, int[] searchOptions)
searchFeatureNumbers - array of numbers of the features
searchValues - array of features to search for
searchWeights - array of weights for the search, valid values are 0 to 10 where 0 means don't care and 10 means "must match". May be set to null, which means all features are equally important.
searchTerms - array of terms of the search. May be any of the CBR.SEARCH_TERM_* constants.
searchScales - array of the scale to use. May be any of CBR.SEARCH_SCALE_FUZZY_LINEAR, CBR.SEARCH_SCALE_FUZZY_LOGARITHMIC, CBR.SEARCH_SCALE_FLAT and CBR.SEARCH_SCALE_STRICT. Default (when set to 0 or null) is CBR.SEARCH_SCALE_FUZZY_LINEAR.
searchOptions - array of options on how to perform the search. Default is no options.
See Also: search(String[], String[], int[], int[], int[], int[])