Complexity modelling for case knowledge maintenance in case-based reasoning.
MetadataShow full item record
Case-based reasoning solves new problems by re-using the solutions of previously solved similar problems and is popular because many of the knowledge engineering demands of conventional knowledge-based systems are removed. The content of the case knowledge container is critical to the performance of case-based classification systems. However, the knowledge engineer is given little support in the selection of suitable techniques to maintain and monitor the case base. This research investigates the coverage, competence and problem-solving capacity of case knowledge with the aim of developing techniques to model and maintain the case base. We present a novel technique that creates a model of the case base by measuring the uncertainty in local areas of the problem space based on the local mix of solutions present. The model provides an insight into the structure of a case base by means of a complexity profile that can assist maintenance decision-making and provide a benchmark to assess future changes to the case base. The distribution of cases in the case base is critical to the performance of a case-based reasoning system. We argue that classification boundaries represent important regions of the problem space and develop two complexity-guided algorithms which use boundary identification techniques to actively discover cases close to boundaries. We introduce a complexity-guided redundancy reduction algorithm which uses a case complexity threshold to retain cases close to boundaries and delete cases that form single class clusters. The algorithm offers control over the balance between maintaining competence and reducing case base size. The performance of a case-based reasoning system relies on the integrity of its case base but in real life applications the available data invariably contains erroneous, noisy cases. Automated removal of these noisy cases can improve system accuracy. In addition, error rates can often be reduced by removing cases to give smoother decision boundaries between classes. We show that the optimal level of boundary smoothing is domain dependent and, therefore, our approach to error reduction reacts to the characteristics of the domain by setting an appropriate level of smoothing. We introduce a novel algorithm which identifies and removes both noisy and boundary cases with the aid of a local distance ratio. A prototype interface has been developed that shows how the modelling and maintenance approaches can be used in practice in an interactive manner. The interface allows the knowledge engineer to make informed maintenance choices without the need for extensive evaluation effort while, at the same time, retaining control over the process. One of the strengths of our approach is in applying a consistent, integrated method to case base maintenance to provide a transparent process that gives a degree of explanation.