Image analysis of HDL molecules for risk estimation of coronary heart disease and decision support
A number of learning schemes were applied and the results recorded. Counting each distinct parameter set as a separate learner, a total of over 2800 learners were applied to each of the data sets. For the data set bayesw, 79 learners had an accuracy over 80%. Most of them were of the SVM f...
Main author: | |
---|---|
Other authors: | |
Format: | Thesis |
Language: | Greek |
Published: | 2016 |
Subjects: | |
Available online: | http://hdl.handle.net/10889/9244 |
id |
nemertes-10889-9244 |
---|---|
record_format |
dspace |
institution |
UPatras |
collection |
Nemertes |
language |
Greek |
topic |
High Density Lipoproteins (HDL); Decision support; Coronary heart disease; Image analysis; Risk estimation; Statistical learning; Machine learning; Diagnostic support; Heart diseases; 616.123 075 072 4 |
description |
A number of learning schemes were applied and the results recorded. Counting each distinct
parameter set as a separate learner, a total of over 2800 learners were applied to each of the data sets.
For the data set bayesw, 79 learners had an accuracy over 80%. Most of them were of the SVM
family (POLY: 35, PUK: 38, RBF: 1), whilst five were KNN classifiers.
For the data set matchedw, 248 learners yielded accuracies over 80%. They belong to the families SVM
(POLY: 75, PUK: 75, RBF: 4), decision trees (J48: 7), rules (JRIP: 71, PART: 8) and KNN (8).
For the data set bayes, a total of 23 learners yielded an accuracy over 90%. All of them belong to the
category of kernel machines, specifically to the polynomial kernel or to the Pearson universal function
kernel.
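The two kernel families named above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the PUK form follows the commonly cited Pearson VII function kernel with parameters sigma and omega, the polynomial form with an additive constant is one common variant, and all default parameter values are assumptions, since the record does not state which settings were used.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (PUK) between two feature vectors.

    sigma and omega control the width and tailing of the peak; the
    defaults here are illustrative assumptions, not values from the thesis.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Euclidean distance between the two vectors
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree (assumed variant)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree
```

Identical vectors give a PUK value of 1, and the value decays towards 0 as the vectors move apart, which is what lets the kernel act as a similarity measure inside an SVM.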
For the data set matched, 19 learners yielded an accuracy over 90%; here, in addition to the PUK
and POLY schemes, the JRIP algorithm had a high accuracy.
If one considers accuracies over 80%, there are 200 learners of all except three types (see table 6)
for the matched set and 109 for the bayes set (all kernel machines and k-nearest neighbours). This goes
to show that there is room for improvement in the object detection of the HDL particles, which in turn
could improve classification accuracy for CHD risk. Another reason why improving the HDL particle
detection is crucial is the inclusion of the feature n_hoo (number of human observer objects) in many
subsets (see tables 12 and 13 in the appendix). The inclusion of this feature is problematic, as it
would generally not be available in an automated application for decision support, and thus one cannot
rely on it. However, with more efficient HDL detection, the feature n_ado can approximate this number
and thus substitute for n_hoo.
Furthermore, the classification of objects in the images was done using Matlab, as the Weka data mining
suite proved too restrictive for further processing of the image after classification. On the other
hand, Matlab does not have the versatility of Weka when it comes to machine learning, and thus it
can reasonably be assumed that the classification method used for objects in the binary images is not
optimal. This might be remedied by using a more integrated approach for image processing,
image analysis and machine learning.
So far, classification of CHD risk has been undertaken using only the EM images of the HDL particles.
No clinical information such as BMI, smoking, age or sex was used. Using clinical information might
improve the results of classification.
It should also be mentioned that the domains of high accuracy for the different learners have not yet
been examined. If there turns out to be diversity among a sufficiently large number of
learners, combining them should be considered to improve classification accuracy and make it
more robust.
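One simple way to combine diverse learners is a majority vote over their predictions. The sketch below is only an illustration of that idea under stated assumptions: the thesis does not specify a combination scheme, and the function name and the "high"/"low" risk labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-learner predictions by majority vote.

    predictions: one list of labels per learner, all of the same length
    (one label per sample). Returns one combined label per sample.
    """
    if not predictions:
        raise ValueError("need at least one learner")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all learners must predict the same samples")
    combined = []
    # zip(*predictions) yields, per sample, the labels from every learner
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical predictions of three learners for three samples
votes = [
    ["high", "low", "low"],
    ["high", "high", "low"],
    ["low", "high", "low"],
]
result = majority_vote(votes)
```

Using an odd number of learners avoids ties in a two-class problem. A vote only helps when the learners make different errors, which is why the domains of high accuracy would need to be examined first.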
The best results for each learner on the different data sets are presented in tables 5, 6, 7
and 8. A summary of frequently appearing features in the optimal subsets, as determined by univariate
best first search with classification accuracy under tenfold cross validation as the criterion,
is given in tables 9 and 10. That the frequencies are not quite congruent is to be expected,
as the datasets are quite different due to the different methods of selecting the HDL particles,
which are the basis for the subsequent classification. It is worth noting that for the matched dataset,
in which the number of HDL particles can be expected to be reflected most accurately, features
that depend on that number (number_hoo, number_ado and hdl_concentration) seem to play a minor
role in comparison with other features that might depend on HDL quality, like av_Eccentricity,
std_unfiltered_min_intensity or std_Extent.
If one were to design an automated system for decision support for similar images using these results
and wanted to pinpoint the ’best’ classifier, it would be the SVM classifier utilizing the Pearson
Universal Kernel that is listed in table 5. It has the highest classification accuracy, and its feature
subset 12g, with seven features, is just small enough to provide a Sample Feature Ratio SFR = 22/7 greater than
three. However, it has to be kept in mind that the chosen filtering method, the Laplacian of Gaussian
filter with its specific parameters, has become part of the dataset used for training this classifier.
Thus, using one specific classifier with one specific dataset presupposes using the filtering method
used to train the classifier.
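The Sample Feature Ratio argument can be checked with a short calculation. This sketch assumes that the SFR is the number of samples divided by the number of features, and reads the ratio given above as 22 samples to seven features; the function names are hypothetical.

```python
def sample_feature_ratio(n_samples, n_features):
    """Samples per feature (assumed definition of the SFR)."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return n_samples / n_features


def max_features(n_samples, min_ratio=3.0):
    """Largest feature count whose SFR stays strictly above min_ratio."""
    k = 0
    # Keep adding features while the ratio would still exceed the threshold
    while n_samples / (k + 1) > min_ratio:
        k += 1
    return k


sfr = sample_feature_ratio(22, 7)   # about 3.14
limit = max_features(22)            # 7
```

With 22 samples, an eighth feature would push the ratio down to 2.75, which is why a subset of seven features is "just small enough" to stay above three.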
The methods developed in this study are far from efficient or elegant; however, judging by the
results, it seems feasible to utilize EM images of HDL particles for risk estimation of coronary heart
disease, and further study seems justified. |
author2 |
Σακελλαρόπουλος, Γεώργιος |
author_facet |
Σακελλαρόπουλος, Γεώργιος Kraus, Benedikt |
format |
Thesis |
author |
Kraus, Benedikt |
author_sort |
Kraus, Benedikt |
title |
Image analysis of HDL molecules for risk estimation of coronary heart disease and decision support |
publishDate |
2016 |
url |
http://hdl.handle.net/10889/9244 |
spelling |
nemertes-10889-9244 2022-09-05T20:36:36Z Image analysis of HDL molecules for risk estimation of coronary heart disease and decision support Kraus, Benedikt Σακελλαρόπουλος, Γεώργιος Κάβουρας, Διονύσιος Κυπραίος, Κυριάκος Automated analysis of electron microscopy images of HDL particles in blood samples from young people who survived a myocardial infarction. This analysis enables the creation of variables for the classification of cardiac risk, and that classification is carried out. 2016-05-20T13:24:06Z 2016-05-20T13:24:06Z 2016 Thesis http://hdl.handle.net/10889/9244 gr 0 application/pdf |