
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Elyan, Eyad | |
| dc.contributor.author | Gaber, Mohamed Medhat | |
| dc.date.accessioned | 2016-08-09T07:33:55Z | |
| dc.date.available | 2016-08-09T07:33:55Z | |
| dc.date.issued | 2016-08-04 | |
| dc.identifier.citation | ELYAN, E. and GABER, M.M. 2017. A genetic algorithm approach to optimising random forests applied to class engineered data. Information sciences [online], 384, pages 220-234. Available from: https://dx.doi.org/10.1016/j.ins.2016.08.007 | en |
| dc.identifier.issn | 0020-0255 | en |
| dc.identifier.issn | 1872-6291 | en |
| dc.identifier.uri | http://hdl.handle.net/10059/1555 | |
| dc.description.abstract | In numerous applications and especially in the life science domain, examples are labelled at a higher level of granularity. For example, binary classification is dominant in many of these datasets, with the positive class denoting the existence of a particular disease in medical diagnosis applications. Such labelling does not depict the reality of having different categories of the same disease; a fact evidenced in the continuous research in root causes and variations of symptoms in a number of diseases. In a quest to enhance such diagnosis, datasets were decomposed using clustering of each class to reveal hidden categories. We then apply the widely adopted ensemble classification technique Random Forests. Such class decomposition has two advantages: (1) diversification of the input that enhances the ensemble classification; and (2) improving class separability, easing the follow-up classification process. However, to be able to apply Random Forests on such class decomposed data, three main parameters need to be set: number of trees forming the ensemble, number of features to split on at each node, and a vector representing the number of clusters in each class. The large search space for tuning these parameters has motivated the use of Genetic Algorithm to optimise the solution. A thorough experimental study on 22 real datasets was conducted, predominantly in a variety of life science applications. To prove the applicability of the method to other areas of application, the proposed method was tested on a number of datasets from other domains. Three variations of Random Forests including the proposed method as well as a boosting ensemble classifier were used in the experimental study. The results prove the superiority of the proposed method in boosting up the accuracy. | en |
| dc.language.iso | en | en |
| dc.publisher | Elsevier | en |
| dc.rights | https://creativecommons.org/licenses/by-nc-nd/4.0 | en |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
| dc.subject | Random forests | en |
| dc.subject | Genetic algorithm | en |
| dc.subject | Class decomposition | en |
| dc.subject | Life science | en |
| dc.title | A genetic algorithm approach to optimising random forests applied to class engineered data. | en |
| dc.type | Journal articles | en |
| dc.publisher.uri | https://dx.doi.org/10.1016/j.ins.2016.08.007 | en |
| dcterms.dateAccepted | 2016-08-03 | |
| dcterms.publicationdate | 2017-04-01 | |
| refterms.accessException | NA | en |
| refterms.dateDeposit | 2016-08-09 | |
| refterms.dateEmbargoEnd | 2017-08-04 | |
| refterms.dateFCA | 2017-08-04 | |
| refterms.dateFCD | 2016-08-09 | |
| refterms.dateFreeToDownload | 2017-08-04 | |
| refterms.dateFreeToRead | 2017-08-04 | |
| refterms.dateToSearch | 2017-08-04 | |
| refterms.depositException | NA | en |
| refterms.panel | B | en |
| refterms.technicalException | NA | en |
| refterms.version | AM | en |
| rioxxterms.publicationdate | 2016-08-04 | |
| rioxxterms.type | Journal Article/Review | en |
| rioxxterms.version | AM | |
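
The class decomposition step described in the abstract (clustering each class separately and treating every cluster as its own sub-class) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmeans`, `decompose` and `original_class`, the naive deterministic k-means, the toy data and the example parameter values are all assumptions made for illustration. The record does not state which clustering algorithm the paper uses.

```python
from typing import Dict, List, Sequence


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Naive k-means (illustrative only); returns a cluster index per point.

    Centres are initialised to the first k points so the result is deterministic.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def decompose(X: List[Sequence[float]], y: List[str],
              clusters_per_class: Dict[str, int]) -> List[str]:
    """Relabel each example as '<class>_<cluster>' by clustering each class on its own."""
    new_y = list(y)
    for label, k in clusters_per_class.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i, cluster in zip(idx, kmeans([X[i] for i in idx], k)):
            new_y[i] = f"{label}_{cluster}"
    return new_y


def original_class(sub_label: str) -> str:
    """Map a predicted sub-class label back to the original class label."""
    return sub_label.rsplit("_", 1)[0]


# Hypothetical candidate solution covering the three parameters named in the
# abstract; the values are arbitrary and would be chosen by the genetic algorithm.
candidate = {"n_trees": 100, "n_features": 1, "clusters_per_class": {"pos": 2, "neg": 1}}

# Toy one-feature data: the "pos" class hides two well-separated groups.
X = [[0.0], [0.1], [5.0], [5.1], [10.0], [10.2]]
y = ["pos", "pos", "pos", "pos", "neg", "neg"]

print(decompose(X, y, candidate["clusters_per_class"]))
# prints ['pos_0', 'pos_0', 'pos_1', 'pos_1', 'neg_0', 'neg_0']
```

A classifier such as a random forest would then be trained on the relabelled data, with its sub-class predictions mapped back through `original_class` before accuracy is measured against the original labels.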


Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0