Digital signal processing and artificial intelligence for the automated classification of food allergy

Niall Twomey

As a by-product of the ‘information revolution’ which is currently unfolding, lifetimes of man-hours (and indeed computer hours) are being devoted to the automated and intelligent interpretation of data. This is particularly true in medical and clinical settings, where research into machine-assisted diagnosis of physiological conditions is gaining momentum daily. Automated classification of allergy, however, has not yet been investigated, even though the number of allergic people is rising and undiagnosed allergies are the most likely to have fatal consequences. Based on the observations of allergists who conduct oral food challenges (OFCs), activity-based analyses of allergy tests were performed. The algorithms were first investigated and validated in a pilot study, which verified that accelerometer-based analysis of human movement is well suited to the objective appraisal of activity. However, when these analyses were applied to OFCs, accelerometer-based measures were found to provide very poor separation between allergic and non-allergic subjects, and it was concluded that the accelerometer-based avenues explored in this thesis are inadequate for the classification of allergy. Heart rate variability (HRV) analysis is known to provide significant diagnostic information for many conditions. Electrocardiograms (ECGs) were therefore recorded during OFCs to assess the effect that allergy induces on HRV features. It was found that, with appropriate analysis, excellent separation between allergic and non-allergic subjects can be obtained. These results were, however, obtained with manual QRS annotations, which are not a viable methodology for real-time diagnostic applications. Even so, this was the first work to categorically correlate changes in HRV features with the onset of allergic events, and the results obtained with manual annotations provide clear confirmation of this.
Encouraged by the results obtained with manual QRS annotations, automatic QRS detection algorithms were investigated to enable fully automated classification of allergy. The results obtained in this way are very promising. Most importantly, no false positive classifications were obtained in the work presented in this thesis. This is a highly desirable result for OFC classification, as it allows complete confidence to be placed in classifications of allergy. Furthermore, these results could be particularly advantageous in clinical settings, as machine-based classification can detect the onset of allergy and so allow OFCs to be terminated early. Consequently, this work has shown that machine-based monitoring of OFCs has the capacity to significantly and safely advance the current clinical state of the art in allergy diagnosis.

Title: Digital signal processing and artificial intelligence for the automated classification of food allergy
Author(s): Twomey, Niall Joseph
Publication date: 2013
Original citation: Twomey, N.J. 2013. Digital signal processing and artificial intelligence for the automated classification of food allergy. PhD Thesis, University College Cork.
Type of publication: Doctoral thesis
Rights: © 2013, Niall J. Twomey http://creativecommons.org/licenses/by-nc-nd/3.0/
Embargo information: No embargo required
Handle: http://hdl.handle.net/10468/1236

Digital Signal Processing and Artificial Intelligence for the Automated Classification of Food Allergy

Niall Twomey

A thesis submitted to the National University of Ireland in fulfilment of the requirements for the Degree of Doctor of Philosophy

Supervisor: Dr. William P. Marnane
Head of Department: Prof. Nabeel Riza

Department of Electrical and Electronic Engineering,
National University of Ireland, Cork.
Acknowledgements

I must first and foremost thank and acknowledge my supervisor, Dr. Liam Marnane, for taking me on in this project and for his advice during my incarceration in UCC. I would also like to thank the Irish Research Council, the Tril Centre and Intel for funding this research. Next, I would like to thank my internal and external examiners, Dr. Bill Wright and Dr. Fernando Schlindwein, for a surprisingly enjoyable viva.

I would like to thank Prof. Jonathan O’B Hourihane, Deirdre Daly and Claire Cullinane from the Department of Paediatrics and Child Health for their patience and support during my time recording the data which forms the basis of this thesis. Your feedback, patient answers and advice were very much appreciated! I would also like to extend my thanks to Stephen Faul and Andrey Temko for their guidance in all aspects of signal processing and machine learning, and for hitting me over the ear when I deserved it. I would also like to thank all of the postgraduate students that I came to know during the course of my research; thank you all for your company, friendship and especially for the caffeine and card game indulgences during the years; they kept me relatively sane.

Finally, I must also thank my friends and family, specifically Jen and Chengde, for your constant support throughout this time. Sincerely, I appreciate your consistent encouragement, sporadic berating and occasional willingness to seem interested in my research more than I can say.

Statement of Originality

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a university or other institute of higher learning, except where due acknowledgement is made in the text.
Niall Twomey
September 2013

Contents

1 Allergy and Allergic Reactions
1.1 Introduction
1.2 Allergy
1.2.1 Varieties and symptoms of allergy
1.2.1.1 Variety
1.2.1.2 Symptoms
1.2.2 Management and treatment of allergic reactions
1.2.2.1 Mild reactions
1.2.2.2 Severe reactions
1.2.3 Risk factors, and quality of life
1.2.3.1 Risk and protection factors
1.2.3.2 Quality of life
1.3 Requirement for clinical diagnosis of allergy
1.4 Diagnosis of allergy
1.4.1 Blood testing
1.4.2 Skin testing
1.4.3 Challenge testing
1.4.3.1 Preliminary tests
1.4.3.2 Checkup
1.4.3.3 Observation
1.4.3.4 Failure protocol
1.4.3.5 Supplementary stages
1.5 Challenge-testing clinical experience
1.6 Machine-assisted classification of allergy
1.7 Layout of this thesis

2 Algorithms, methods and data collection
2.1 Introduction
2.2 Activity analysis
2.2.1 Introduction to inertial measurement
2.2.2 Applications of inertial sensing
2.2.2.1 Activity recognition
2.2.3 Activity-based analysis of OFC
2.3 Heart rate analysis
2.3.1 History and introduction
2.3.2 Applications of HRV analysis
2.4 Machine learning and classification
2.4.1 Introduction to classification
2.4.2 Background to machine learning
2.4.3 Novelty detection
2.4.3.1 One class SVMs and GMMs
2.4.3.2 Example of novelty detection
2.4.4 Applications of novelty detection
2.5 Food challenge data collection
2.5.1 Recording platform
2.5.2 Integration with oral food challenge
2.5.3 Other data
2.5.4 Data recording
2.6 Conclusion

3 Accelerometer-based analysis of oral food challenges
3.1 Introduction
3.2 Accelerometer-based activity analysis
3.2.1 Activity metrics
3.2.2 Energy expenditure estimation algorithms
3.2.2.1 A note on the energy expenditure estimation algorithms
3.2.2.2 Bouten et al.
3.2.2.3 Chen et al.
3.2.2.4 Crouter et al.
3.3 Energy expenditure validation
3.3.1 Experimental setup
3.3.2 Pre-processing
3.3.2.1 Codeword conversion
3.3.2.2 Breath and acceleration synchronisation
3.3.2.3 Normalisation
3.3.3 Performance evaluation
3.3.4 Results
3.3.5 Discussion on energy expenditure estimation algorithms
3.3.6 Conclusion on energy expenditure estimation
3.4 Accelerometer-based analysis during OFCs
3.5 Probability density functions
3.6 Results
3.7 Conclusion

4 ECG-based analysis of OFCs
4.1 Introduction
4.2 ECG and HRV
4.2.1 ECG recording
4.3 HRV feature extraction
4.3.1 Overview
4.3.2 Epochs
4.3.3 Epoch overlap
4.4 Feature normalisation/calibration
4.5 HRV feature categories
4.5.1 Feature categories
4.5.2 Frequency domain feature analysis
4.5.3 Resampling + FFT
4.5.4 Direct PSD estimation of HRV
4.5.5 Comparison of HRV frequency analysis methods
4.6 Features
4.6.1 Notation
4.6.2 Time domain features
4.6.2.1 Mean heart rate
4.6.2.2 Standard deviation
4.6.2.3 Coefficient of variation
4.6.2.4 RMSSD
4.6.2.5 NN/PNN
4.6.2.6 Histogram
4.6.3 Sequential domain features
4.6.4 Poincaré features
4.6.5 Frequency domain features
4.7 Discussion

5 Machine learning for allergy classification
5.1 Introduction
5.2 Novelty detection for OFC
5.2.1 Choice of classification routine
5.2.2 Feature transformation
5.2.3 Gaussian mixture models
5.2.3.1 k-means clustering
5.2.3.2 Expectation maximisation
5.2.4 Postprocessing
5.3 Classification procedures
5.4 Classifier model selection
5.4.1 Performance evaluation
5.4.2 Parameter selection
5.4.2.1 Search space
5.4.2.2 Cost function
5.5 Classification metrics
5.5.1 Sensitivity/specificity
5.5.2 Time gain parameters
5.5.2.1 Time gain
5.5.2.2 Doses saved
5.5.2.3 Activation percentage
5.6 Results
5.6.1 A brief note on the structure of these results
5.6.2 Results obtained at epoch length of 60 seconds
5.6.3 Overall results
5.6.4 Inconsistent classification at different epoch lengths
5.6.4.1 Short-duration signatures of allergy
5.6.4.2 Longer signatures of allergy
5.6.4.3 Tolerance to non-allergic variances
5.6.5 Boosted allergy classification
5.6.5.1 Sensitivity/specificity
5.6.5.2 Time gain parameters
5.7 Discussion
5.7.1 Specificity of OFC classification
5.7.2 Robust classification
5.7.3 Parameter selection
5.7.3.1 Importance of correct parameter selection
5.7.3.2 Alternative parameter selection
5.7.4 Role of classification in OFCs

6 Automatic QRS detection
6.1 Introduction
6.2 QRS detection
6.2.1 QRS validation
6.2.2 Validation databases
6.2.3 Sensitivity and positive predictivity
6.2.4 Good detection window
6.2.5 Feature accuracy
6.2.6 Box-plots
6.3 Choice of QRS detector
6.4 Hilbert transform based QRS detection
6.4.1 Theory of Hilbert transform
6.4.2 Method of QRS detection with Hilbert transform
6.4.3 Beat identification
6.5 Filter-banks based QRS detection
6.5.1 Theory of filter banks
6.5.2 QRS detection with filter banks
6.5.2.1 Pre-processing
6.5.2.2 Beat-classification logic
6.5.2.3 Overall
6.6 Results obtained on MIT-BIH database
6.6.1 Sensitivity and positive predictivity
6.6.2 Percentage RMS difference
6.6.3 Conclusions on QRS detection on MIT-BIH database
6.7 Requirement for artefact detection
6.7.1 Introduction
6.7.2 Artefact detection algorithm
6.7.3 Demonstration of artefact detection
6.8 Results obtained on allergy database
6.8.1 Artefact detection
6.8.2 Sensitivity and positive predictivity
6.8.3 Percentage RMS difference
6.9 Overall discussion

7 Fully automated allergy detection
7.1 Introduction
7.2 Methods
7.3 Unmatched classification results
7.3.1 Results of unmatched classification
7.3.2 Discussion on unmatched classification results
7.4 Matched classification results
7.4.1 Sensitivity and specificity
7.4.2 Artefact detection
7.4.3 Time gain
7.5 Discussion
7.6 Conclusion

8 Overall summary, final conclusions and future work
8.1 Summary of this thesis
8.2 Primary contribution of this thesis
8.3 Possible avenues of future work
8.3.1 Data collection
8.3.2 Alternative novelty detectors
8.3.3 Real-time and portable implementation
8.3.4 Feature and epoch analysis
8.4 Publications resulting from this work

Appendices
A Alternative parameter selection routines
A.1 Introduction and Methods
A.2 Results
A.3 Discussion
A.4 Conclusion
B Investigation into the importance of features
B.1 Introduction and methods
B.2 Results
B.3 Discussion
B.4 Conclusion

References

List of Tables

2.1 Tabulation of the characteristics of the subjects who were recorded for this study.
3.1 MET activity and corrected MET activity values.
3.2 Tabulation of the physical characteristics of the subjects who participated in the accelerometer-based energy expenditure validation test.
3.3 Table of PRD values computed between true energy expenditure and the estimated energy expenditure values obtained from the algorithms investigated.
4.1 Table of HRV diagnostic frequency ranges for children.
5.1 Classification results obtained with the novelty detection routine at epoch length of 60 seconds.
5.2 Tabulation of the classification results of the allergic subjects, where ‘1’ represents an allergic classification (TP) and ‘0’ represents a non-allergic classification (FN).
5.3 Classification result, time gain, doses saved and activation percentages obtained by the classification routine. The results in this table were obtained by fusing the results obtained for the individual epoch lengths.
6.1 Differences in reported and calculated sensitivity and positive predictivity.
6.2 Sensitivity and positive predictivity of QRS detectors on allergy database.
6.3 Distribution of mean and standard deviation of the PRD values calculated from automatically extracted QRS points.
7.1 Sensitivity and specificity of classification results obtained with the manual classification models on the automatically extracted HRV features (i.e. crossover classification results).
7.2 Sensitivity and specificity of classification results obtained by Afonso’s and the Hilbert transform QRS detectors.
7.3 Specific time gain metrics obtained from fully automatic allergy classification based on Afonso and Hilbert transform QRS detectors.
A.1 Tabulation of sensitivity, specificity, and the time gain metrics which were obtained by selecting the mean, median and mode of the set of postprocessing parameters from the training data. In the case of the mean method, imperfect specificity was obtained.
B.1 Classification metrics which were obtained with the time-, frequency-, Poincaré- and sequential-domain classification models, ranked by order of importance of the feature category in question.

List of Acronyms

ADC: analogue to digital converter
ANS: autonomic nervous system
BP: band-pass
BPM: beats per minute
BW: bandwidth
CPET: cardiopulmonary exercise testing
CE: European conformity
CSI: cardiac sympathetic index
CVI: cardiac vagal index
DFT: discrete Fourier transform
DoF: degrees of freedom
DSP: digital signal processing
ECG: electrocardiogram
EE: energy expenditure
$\widehat{EE}$: energy expenditure estimation
$EE_{act}$: energy expenditure due to activity
$\widehat{EE}_{act}$: energy expenditure estimation due to activity
$EE_{true}$: true energy expenditure
EEG: electroencephalogram
EM: expectation maximisation
Epi-pen: epinephrine pen
FFT: fast Fourier transform
FIR: finite impulse response
FN: false negative
FP: false positive
FT: Fourier transform
GMM: Gaussian mixture model
HF: high frequency
HR: heart rate
HRV: heart rate variability
IAA: integral of absolute acceleration
$IAA_t$: total integral of absolute acceleration
$IAA_x$: integral of absolute acceleration in the x-axis
$IAA_y$: integral of absolute acceleration in the y-axis
$IAA_z$: integral of absolute acceleration in the z-axis
IIR: infinite impulse response
IQR: inter-quartile range
KDE: kernel density estimation
km/h: kilometres per hour
kNN: k-nearest neighbours
LF: low frequency
LOO: leave-one-out
MEMS: micro-electro-mechanical systems
MET: metabolic equivalents
MIT-BIH: Massachusetts Institute of Technology Beth Israel Hospital
nesC: network embedded systems C
OFC: oral food challenge
PCA: principal component analysis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 PDF probability density function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 +P positive predictivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 PRD percentage root mean square difference . . . . . . . . . . . . . . . . . . . . . . . . . . 63 PSD power spectral density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 REE resting energy expenditure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 RMR resting metabolic rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 RMS root mean square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 RMSE root mean square error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .63 RMSSD root mean square of successive difference . . . . . . . . . . . . . . . . . . . . . . . . 90 SHIMMER device sensing health with intelligence, modularity, mobility and experimental reusability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 SNR signal to noise ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 SVM support vector machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 TGS specific time gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 TGT total time gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 TN true negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 TP true positive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
120 ULF ultra-low frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 VLF very low frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 xxii List of Figures 1.1 Photograph of an adrenaline pen (theonlineallergist.com, 2013). . . . . . . . 6 1.2 Oral food challenge flowchart which presents the means by which allergy is diagnosed in a clinical environment. . . . . . . . . . . . . . . . . . . . . . . . 12 2.1 Illustration of the approximate growth of microelectromechanical systems accelerometers, gyroscopes and magnetometers in the research market over the past decade. (Google Inc., 2013). . . . . . . . . . . . . . . . . . . . . . . . 21 2.2 Functional diagram of a MEMS accelerometer, identifying the mass and spring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 The original galvanometer developed by Willem Einthoven to record the ECG in the early 20th century. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4 Labelled ECG waveform (Dublin Institute of Technology, 2013). . . . . . . . 28 2.5 Example of how classification is obtained with two-dimensional data (top) with SVM based classification (bottom left) and GMM based classification (bottom right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 xxiii Niall Twomey Chapter 0: 2.6 Example of how novelty detection algorithms can be employed to determine novel and normal data points. The upper image shows the distribution of the labels, and the lower-right figure shows one-class SVM while the lowerleft figure shows the example of GMM-based novelty detection on the same data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.7 Diagram of the SHIMMER device, with the various components annotated (reproduced with permission from SHIMMER-research (2010)). . . . . . . . . 
39 2.8 Modified oral food challenge flowchart which is employed to accommodate the introduction of the SHIMMER monitoring device for data collection during OFCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.1 The x (anterio-posterior), y (medio-lateral), and z (vertical) acceleration directions in relation to a body. . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.2 Demonstration of the conversion process from the raw digital codewords obtained from the accelerometer (a) to a measurement of absolute gravity (b). 60 c and EEtrue values obtained for participant 2. The values obtained 3.3 The EE are overlaid. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.4 Example histogram and probability density function of a feature. . . . . . . . 68 3.5 Illustration of the differences between PDFs that describe two separate classes of data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.6 Histograms plotting the normalised IAA values of the allergic and nonallergic subjects who were investigated. . . . . . . . . . . . . . . . . . . . . . 70 4.1 Einthoven triangle configuration for ECG electrode placement (University of Nottingham, 2013). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 xxiv Niall Twomey Section LIST OF FIGURES 4.2 Illustration of relationship between the ECG and the epoch length for ECG recorded in OFC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.3 Illustration of relationship between the ECG, the epoch length and the epoch overlap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4 Illustration of the raw HR (⇤) which is not periodically sampled, and the HR re-sampled to 10 Hz via cubic spline interpolation. . . . . . . . . . . . . . 81 4.5 PDF of mean heart rate, generated from allergic and non-allergic subjects . . 
88 4.6 PDF of standard deviation of the heart rate, generated from allergic and non-allergic subjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.7 PDF of coefficient of variation of the heart rate, generated from allergic and non-allergic subjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.8 PDF of RMSSD of the heart rate, generated from allergic and non-allergic subjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.9 PDF of PNN50 of the heart rate, generated from allergic and non-allergic subjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.10 Histogram of the relative times between successive QRS complexes. . . . . . 92 4.11 PDF of histogram of the heart rate, generated from allergic and non-allergic subjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.12 Chart of the change between successive QRS complexes. . . . . . . . . . . . . 94 4.13 PDFs derived for the sequential domain features. . . . . . . . . . . . . . . . . 95 4.14 Original and rotated points plotted in a Poincaré Chart. . . . . . . . . . . . . 96 4.15 CSI and CVI PDF from Poincaré features. . . . . . . . . . . . . . . . . . . . . 97 xxv Niall Twomey Chapter 0: 4.16 PDF of the frequency domain features. . . . . . . . . . . . . . . . . . . . . . . 99 5.1 An illustration of PCA in two-dimensional feature space (subplot a) and two-dimensional component-space (subplot b). . . . . . . . . . . . . . . . . . 105 5.2 A mixture of three equally-weighted Gaussians (dashed lines) which combine to represent a multi-modal non-normal distribution (solid black line). . 107 5.3 A demonstration of the difference in clustering which is obtained by k-means clustering and the expectation maximisation algorithm. With subfigures B and C, a line is drawn from each point to its associated cluster. . 
110 5.4 Sample likelihood (subplot a) and histogram of the background likelihood (subplot b) of Subject 23. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.5 Flowchart of classification procedure involving the recording of ECG, annotation of QRS complexes, feature extraction and the classification procedure of OFC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.6 Illustration of the data segmentation and testing routines employed in the allergy classification procedure. . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.7 The confusion matrix showing how sensitivity and specificity are obtained with regard to the ground truth (diagnosis) and predicted (classification) results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.8 A demonstration of early detection of allergy (with Subject 11). The segments at 45, 60 and 80 minutes which fall beneath the threshold were classified as allergy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5.9 Example demonstrating how the generated likelihood for Subject 2 satisfies the allergy criteria at an epoch length of 60 seconds (subplot a) while failing to do so for epoch lengths of 120, 180 and 300 seconds (subplots b — d). . . 126 xxvi Niall Twomey Section LIST OF FIGURES 5.10 Example demonstrating how the generated likelihood for Subject 13 does not satisfy the allergy criteria at an epoch length of 60, 120 and 180 seconds (subplot a — c) but the criteria are then met for the epoch length of and 300 seconds (subplots d). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.11 Example demonstrating how the generated likelihood for Subject 16 surpasses the threshold, but does not satisfy the allergy criteria due to the inclusion of the duration parameter and for all epoch lengths the subject is correctly classified as non-allergic. . . . . . . . . . . . . . . . . . . . . . . . 
130 5.12 Example of arrhythmia on the ECG trace (a) and the effect this has on the heart rate (b) on Subject 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.13 The likelihood series chosen for Subject 2 which does not diverge from the background level significantly enough to classify allergy. PCA preserved 80% of the feature variance which was modelled with a GMM order of 32 at an epoch length of 60 seconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.1 Relationship between a box-plot, and quartile ranges with a normal distribution. The locations marked Ql and Qu are the lower and upper quartiles respectively, and the median is marked as m. . . . . . . . . . . . . . . . . . . 148 6.2 The real and imaginary components resulting from the Hilbert Transform of the ECG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 6.3 The flowchart for QRS detection from the Hilbert Transform. . . . . . . . . . 153 6.4 The stages employed by the Hilbert Transform QRS detection algorithm (the ECG data was obtained from Patient 113 in the MIT-BIH Database). . . . . . 154 6.5 The generic filter banks flow chart incorporating both bandpass and synthesis filters. The # and " symbols represent down– and up-sampling respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 xxvii Niall Twomey Chapter 0: 6.6 The idealised filter response of the filter banks, with M equally-wide subbands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 6.7 Overall simplified flowchart of Afonso’s QRS detection method. . . . . . . . 159 6.8 The effect of the various QRS validation levels from Afonso’s QRS detection algorithm. In these charts, the ◦ symbols represent the candidate QRS points. Charts b — d have been down-sampled. . . . . . . . . . . . . . . . . 161 6.9 Incorrect QRS complex localisation (Patient 8 of MIT-BIH arrhythmia database). 
Manual QRS annotations are marked with ⇥ and automatic detections are marked with ⇤ (Hilbert transform algorithm) and ◦ (Afonso’s algorithm). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.10 Sensitivity and positive predictivity box-plots of QRS detection on the MIT-BIH arrhythmia database. . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 6.11 PRD box-plots of the mean (µ) and standard deviation (σ) of the heart rate over all subjects in the MIT-BIH arrhythmia database. . . . . . . . . . . . . . 167 6.12 Normalised output of high-frequency (a,b) and low-frequency (c,d) energy estimators for artefact detection (data from Subject 8 of allergy database). . . 170 6.13 The breakdown of the number of artefact events which were detected by the artefact detection algorithm for each subject of the allergy database. . . . . . 172 6.14 Sensitivity and positive predictivity box-plots of QRS detection on the allergy database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 6.15 Example of poor quality of the ECG signal after the application of denoising filters which contributed to poor sensitivity and positive predictivity values for Subject 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 xxviii Niall Twomey Section LIST OF FIGURES 6.16 Boxplots of the PRD of the mean and standard deviation of the heart rate between the manual and automatic QRS points extracted. . . . . . . . . . . . 176 7.1 The flow of how the matched (right) and unmatched (left) classification results are obtained. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 7.2 Likelihood plots of Subject 1 for manual and automatic models at epoch lengths of 60 seconds. Subplots (a) — (c) show the likelihoods which were obtained with manual, Afonso and Hilbert models respectively. In all cases the threshold for allergy classification is off the scope of the figures. . . . . . 
184 7.3 Likelihood plots of Subject 7. Subplot (a) shows the threshold which was computed without the aid of artefact detection and how allergy classification does not classify allergy. Subplot (b) shows the threshold which was computed when artefact detection was incorporated and how allergy classification is successful with artefact-aware classification. . . . . . 186 A.1 The estimated distribution of duration and multiplicative post-processing parameters which achieve 100% specificity and maximum sensitivity on the training dataset. The image is limited to d and n parameters of 75. The darker regions indicate a higher density of suitable parameters. xxix . . . . . . . 203 CHAPTER 1 Allergy and Allergic Reactions 1.1 A Introduction LLERGY is defined as an abnormally high acquired sensitivity towards certain substances (Dorland, 1901). To the vast majority of the population allergens are harmless and do not interfere with their day-to-day living. However, for those who suffer from allergy, allergens can provoke acquired, predictable and rapid conditions which are called allergic reactions, and in some cases, these can be fatal. This chapter sets the background to the clinical knowledge of allergy and allergic reactions and it also introduces the scope of this thesis. The chapter focuses on the varieties, symptoms, diagnoses and treatment of food-based allergy. The possibility of automated allergy detection for real-time clinical allergy testing environments is then explored and the benefits which this type of analysis could provide are also introduced. 1 Niall Twomey 1.2 Chapter 1: Allergy Approximately 25 — 30% of people believe that they suffer from allergy (Miles et al., 2005), but it is estimated that only 6% of schoolchildren (Bock and Sampson, 1994) and 2.3% of adolescents (Pereira et al., 2005) suffer from food allergy. 
This disparity between perceived and actual prevalence arises from a lack of public understanding of allergy and from the fact that, in most instances, suspected allergies are never confirmed diagnostically.
1.2.1 Varieties and symptoms of allergy
Allergy can take many forms, but it is diagnosed only if the symptoms present in a predictable manner.
1.2.1.1 Variety
Several types of allergy exist; the primary varieties are listed here.
1. Food allergy: Food allergy is caused by an adverse immune response to a food type. In some cases, when a sufferer is very sensitive to the allergen, physical contact with the food is sufficient to provoke the symptoms of an allergic reaction, but the majority of reactions occur after ingestion. The quantity of food required to provoke a reaction differs from person to person and can be thought of as an individual threshold. Typical food allergens include:
• Milk
• Egg
• Wheat
• Peanut
• Soy
• Cooked egg
2. Environmental allergy: Allergic reactions can be provoked by pets, poor hygiene and insects. Reactions to environmental allergens can manifest due to the presence of the allergen in the air (dust mites, mould spores) or through physical contact (stroking a cat or dog, insect stings, etc.). The length of exposure required to provoke a reaction differs from person to person. Environmental allergens include:
• Cats
• Dust mites
• Mould
• Dogs
• Pollen
• Insect stings/bites
3. Drug/medication allergy: Allergic reactions can also be provoked by the administration of medication. The reaction manifests when the drug enters the sufferer’s bloodstream. These allergies can be very dangerous if the sufferer or medical staff are unaware of them. Often, allergies to medication are caused by the dyes and stabilising agents that accompany the drug rather than by the drug itself; these additives cannot simply be removed, however, because international regulation requires colour coding to reduce counterfeiting.
Drug allergens include:
• Penicillin
• Anaesthetics
1.2.1.2 Symptoms
The severity of the symptoms that allergy sufferers experience depends on the individual subject’s susceptibility to the allergen and on the amount of the allergen with which they have come into contact. For subjects with severe sensitivities, an allergic reaction may be provoked simply by contact between the subject’s skin and the allergen. For less sensitive subjects, oral consumption of the food is required in order to provoke a reaction. The most dangerous symptoms of allergy involve restriction of the airway (wheezing, shortness of breath and asthma attacks) and anaphylaxis. Anaphylaxis is an acute allergic reaction towards an allergen, and 3 to 15% of those who suffer from allergies will experience an anaphylactic reaction at least once (Matasar and Neugut, 2003; Yanishevsky and Hourihane, 2010; Järvinen et al., 2009), indicating that there is a very real risk of a severe reaction amongst sufferers of allergy. Interestingly, there is no universally accepted definition of anaphylaxis, and there is also disagreement about the criteria for the diagnosis of anaphylactic events (Sampson et al., 2006). In order to protect against reactions, sufferers of allergy must remain constantly vigilant and be equipped to treat and manage allergic reactions. Symptoms of allergy can include:
• Hives
• Red eyes
• Vomiting
• Rashes
• Sneezing
• Diarrhoea
• Swelling
• Bloating
• Wheezing
• Runny nose
• Mood change
• Shortness of breath
• Sinus congestion
• Abdominal pain
• Asthma attacks
• Itchy eyes
• Ear pain
• Anaphylaxis
1.2.2 Management and treatment of allergic reactions
The best means of preventing allergic reactions is complete avoidance of the allergen (Pereira et al., 2005; Sicherer and Sampson, 2006). In cases of animal allergy, subjects are strongly encouraged not to keep a pet in the home and to avoid animals elsewhere.
With some environmental allergens, such as dust mites, a high degree of hygiene and cleanliness can help to limit a subject’s exposure, lowering their overall vulnerability to the allergen. However, exposure to some environmental allergens, such as pollen, cannot be managed, and subjects who are allergic to these unavoidable allergens can self-administer precautionary antihistamines, which increase tolerance towards the allergen for a short period of time. This precautionary treatment is only viable in the case of environmental allergens; with food and chemical allergies, it is not appropriate to take medication on the chance of coming into contact with the substance. When a person reacts to a food type, insect bite, etc., the means of countering the reaction depends on its severity.
1.2.2.1 Mild reactions
For mild reactions (hives, rashes, sneezing, abdominal pain, vomiting, etc.), off-the-shelf antihistamines can be sufficient to stop the progression of a reaction. Zyrtec is commonly used in home and hospital environments to halt mild allergic reactions. It is a second-generation antihistamine which does not readily cross the blood-brain barrier, so it is much less likely to induce drowsiness. Zyrtec can be taken in tablet or liquid form and can begin to take effect within 10 to 20 minutes. In 2008, Zyrtec was the highest-grossing non-food product in the United States of America (Elliot, 2010), which indicates both the prevalence of allergy and the chronic nature of the condition.
1.2.2.2 Severe reactions
For severe reactions (wheezing, shortness of breath, anaphylaxis), stronger rescue medications are required. Inhalers and epinephrine pens (Epi-pens) are used in these cases to stop the reaction.
Figure 1.1: Photograph of an adrenaline pen (theonlineallergist.com, 2013).
Epi-pens contain a dose of adrenaline* in a sealed, syringe-like container (see Figure 1.1). If a subject in a state of anaphylaxis requires adrenaline, the top of the Epi-pen is removed and the pen is pressed into the subject’s thigh, where it is held for ten seconds while the auto-injection mechanism releases the adrenaline into the subject’s bloodstream. If the effects of the allergic reaction do not cease within ten minutes of administration, a second dose should be administered. After using an Epi-pen, the subject should go to the nearest hospital for a checkup and to renew the Epi-pen prescription.
1.2.3 Risk factors and quality of life
1.2.3.1 Risk and protection factors
Some researchers have demonstrated that a number of factors can increase or decrease the likelihood of a person having allergic reactions. There are conflicting reports about the effect of rural and urban living on the prevalence of allergy. Some studies (Braun-Fahrländer et al., 1997; Kilpeläinen et al., 2000; Radon et al., 2004; Almqvist et al., 2003) report that children who have grown up on a farm have a lower frequency of allergy than urban and less rural children. In contrast, other researchers were not able to find significant differences between rates of allergy in rural and urban settings (Viinanen et al., 2007; Omenaas et al., 1994; Azpiri et al., 1999).
* The terms adrenaline and epinephrine refer to the same compound and are associated with British and American English, respectively. The term Epi-pen follows the American naming convention and is adopted in this thesis because it is the most common name for the medical device.
Some studies have also reported that breastfeeding (Rautava et al., 2002) and exposure to household pets in the childhood home (Ahlbom et al., 1998) protect against allergy, but other studies demonstrate otherwise (Bergmann et al., 2002). Allergy is therefore difficult to predict, as a number of poorly understood factors determine a person’s sensitivity to allergens. It has been consistently reported, however, that allergy is a risk factor for other diseases. Allergy is strongly linked with asthma in childhood (Roberts et al., 2003; Black et al., 2000; Call et al., 1992), eczema (Bryld et al., 2003), and wheezing and bronchial hyperresponsiveness (Arshad et al., 2005).
1.2.3.2 Quality of life
Allergy has a significant impact on the quality of life of its sufferers. In a quality of life survey, young sufferers of allergy reported much greater fear and anxiety about their condition than people with other chronic diseases, such as insulin-dependent diabetes mellitus (Avery et al., 2003). Owing to the relative ease of contamination, sufferers must remain constantly aware of their current state of health and of everything they consume. They must also live at constant risk of reactions and anaphylaxis, which adds stress and worry for the rest of their lives. Indeed, sufferers of severe allergies are reminded of their affliction daily because they must always carry Epi-pens in case of reactions. Allergy affects more than just the sufferer: the quality of life of their family members is also impaired. Limitations of family activities are reported among affected families (Sicherer et al., 2001), and in social situations the parents of sufferers report significantly more disruption than unaffected families (Oude Elberink et al., 2002).
This is due to the persistent fear of sudden death that follows sufferers of allergy (Primeau et al., 2000). Adolescents are at the highest risk of death from anaphylaxis (Bock et al., 2007, 2001; Sampson et al., 1992), but young children and adults are also in danger, with up to 1% of these sufferers at mortal risk (Matasar and Neugut, 2003; Yanishevsky and Hourihane, 2010; Järvinen et al., 2009). Without a confirmed diagnosis, affected families carry a significant burden of anxiety (Primeau et al., 2000), which remains until a diagnosis is made. Both positive and negative diagnoses of allergy improve the quality of life of the individual and family involved: even when allergy is confirmed, the burden of uncertainty is removed and the family can adapt to living with the situation (DunnGalvin et al., 2010).
1.3 Requirement for clinical diagnosis of allergy
It is very important for a subject to know whether they are allergic to a substance. Precautionary but unsubstantiated avoidance of the substance may be considered a safe option, but it can also leave a subject poorly guarded against allergic events. In Ireland, for example, school-going children suspected of being allergic to a food type are provided with a prescription for four Epi-pens, even before the diagnosis of allergy has been confirmed. One pair of Epi-pens is kept by the subject’s guardians, and the second pair is kept in the subject’s school. While great efforts may be taken to avoid an allergen in the home environment, it is in unregulated environments that the subject is in greatest danger. For example, at school lunchtime, cross-contamination of food (introduction of the allergen from a secondary source through contact) is common. If a subject has not been diagnosed (or is not awaiting diagnosis), he/she will not be able to obtain the Epi-pens required to treat severe allergic reactions.
These subjects are therefore at the highest risk of sudden death, as they lack the appropriate rescue medication should anaphylaxis arise.
For this reason, it is very important that subjects who might be at risk of allergy receive an early clinical diagnosis. Conversely, a very high percentage of adults who believe themselves to be allergic are not, and may therefore be needlessly avoiding foods and impairing their own quality of life and that of their family.
1.4 Diagnosis of allergy
Proof of a subject’s susceptibility towards an allergen can be obtained clinically in controlled environments. Three types of test exist to assess this susceptibility, and these are discussed in the following sections.
1.4.1 Blood testing
Biochemically, the symptoms of allergic reactions arise when antibodies stimulate cells in such a way that the reaction manifests physically. Studies have shown that subjects with higher levels of these antibodies have a higher probability of suffering from allergy. However, high levels of these antibodies are not sufficient for a diagnosis of allergy, and low concentrations are not sufficient to rule it out, so blood tests can be considered an overall ‘likelihood of allergy’ test. With a blood sample, the plasma can be stimulated with a serum which contains a solution of a specific allergen. By analysing the concentrations of antibodies present before and after the serum has been introduced to the plasma, the likelihood of allergy towards that allergen can be assessed. These tests require specific laboratory equipment and trained personnel.
1.4.2 Skin testing
Skin testing (also referred to as ‘puncture’, ‘scratch’ or ‘prick’ testing in the literature) is a quick means of assessing a subject’s allergic susceptibility towards an allergen. A list of potential allergens is drawn up based on the subject’s history.
Samples of each are then obtained and mixed with deionised water, producing a water-based solution of each allergen. Needles are placed in the solutions and used to scratch the subject’s skin. As well as scratching the skin with potential allergens, the skin is also scratched with a needle that was exposed only to deionised water, providing a true-negative scratch. If a subject is allergic to one of the substances, a reaction can occur around the scratch mark, ranging from reddening, to inflammation, to an outbreak of hives on the skin. The extent of each reaction is then measured: the larger the response to an allergen scratch relative to the true-negative scratch, the more likely it is that the subject is allergic to that allergen. While allergic reactions will not present on the true-negative scratch, the scratching process alone can introduce reddening (and sometimes inflammation), which must be accounted for when measuring the other test scratches. A skin reaction to a suspected allergen indicates the subject’s susceptibility towards it but, as with blood tests, skin tests are not conclusive for a diagnosis of allergy, even when the reaction to the scratches is strong (Sampson, 1999).
1.4.3 Challenge testing
The only clinically validated means of diagnosing food allergy is the oral food challenge (OFC) (Yanishevsky and Hourihane, 2010; Järvinen et al., 2009). During the challenge, the subject is required to consume one age-appropriate portion of a food type, which has the potential to act as a poison for some subjects. The portion (e.g. one egg, a glass of milk, eight peanuts) is divided into five sub-portions doubling in size from approximately 1/32 of a portion to 1/2 of a portion. The smallest dose is always administered first.
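The doubling schedule can be checked numerically. The short sketch below uses a hypothetical helper, `ofc_dose_schedule`, and assumes that the doubling is exact (the text says only ‘approximately’); it lists the five sub-portions and the running total consumed after each one.

```python
from fractions import Fraction
from itertools import accumulate


def ofc_dose_schedule(n_doses: int = 5,
                      smallest: Fraction = Fraction(1, 32)) -> list[Fraction]:
    """Sub-portion sizes, as fractions of one full portion, smallest first.

    Hypothetical helper: assumes each dose is exactly double the previous one.
    """
    return [smallest * 2**i for i in range(n_doses)]


doses = ofc_dose_schedule()
cumulative = list(accumulate(doses))  # running total consumed after each dose

for i, (dose, total) in enumerate(zip(doses, cumulative), start=1):
    print(f"dose {i}: {dose!s} of a portion (cumulative {total!s})")
```

Under this assumption the five doses sum to 31/32 of a portion, so a subject who tolerates every dose has consumed just under one full portion.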
In the case of peanuts, a supplementary sub-portion is introduced in which a peanut is rubbed on the subject's lower lip. Depending on the subject's sensitivity to peanuts, the lip may swell after this contact, and this is sufficient for a diagnosis of allergy. The OFC flowchart is shown in Figure 1.2, and every stage of this figure is discussed in this section.

A subject who is allergic to a food-type might be able to consume a small quantity of it without reacting. The amount they must consume to provoke allergic symptoms can be thought of as a 'reaction threshold', and a reaction is induced only once this threshold has been surpassed. Reactions are more severe when a greater amount of the food is consumed. For the comfort and safety of subjects who react during the challenge, it is therefore desirable for them to consume the smallest amount possible. If the subject consumes the full portion of the food-type they are being tested against, they are said to have 'passed' the test and may introduce the food-type into their diet. If a reaction to the food-type occurs during the challenge, the subject is said to have 'failed' the test and must avoid the allergen.

1.4.3.1 Preliminary tests

The subject arrives with their parent or guardian and is admitted to the day ward. Labelled skin tests are performed before the OFC begins. The subject is left for fifteen minutes. If the skin has reacted to these allergens in this time, the size of each reaction is measured and recorded. Depending on the extent of the reactions to the solutions, supplementary food challenges may be scheduled. The food-type the subject is being tested against is always included in the skin test before the OFC, but oral consumption is still required regardless of the extent of the skin-test reactions.
Figure 1.2: Oral food challenge flowchart which presents the means by which allergy is diagnosed in a clinical environment. [Flowchart stages: subject arrives at hospital; preliminary tests; checkup (pass/fail); administer dose; observe for 10–20 minutes; dose remaining? (yes/no); 2 hrs observation; challenge over; subject diagnosed 'allergic' (fail) or 'non-allergic' (pass).]

If the subject has been ill over the preceding two weeks, or has suffered an allergic reaction during the same period, the challenge will not proceed. In either case the subject's immune system might be compromised, and the results of the OFC may be inconclusive.

1.4.3.2 Checkup

The subject is then given a first checkup by the allergists. Allergic reactions can manifest as an outbreak of hives and rashes on the skin. It is therefore important to record any existing skin blemishes a priori, so that they are not mistaken for the physical manifestations of an allergic reaction in later checkups. Indeed, as there is a definite link between allergy and dermatological conditions such as eczema (Bryld et al., 2003), the subjects in question can present with many non-allergy-related skin conditions which might later be mistaken for allergic rashes. A survey of these is therefore required for accurate diagnosis. The subject's heart rate, blood pressure, blood oxygen saturation and respiration rate are measured and logged by the allergist. The first sub-portion of the suspected allergen is then administered to the subject.

1.4.3.3 Observation

After the dose is administered, the subject is observed by the nursing staff from a nearby station for 10–20 minutes. If the subject is thought to be reacting to the allergen during this waiting period, the nursing staff give the subject a checkup.
If the subject fails the checkup, the subject is not required to consume any more food, and the failure protocol (see next subsection) is followed. If the subject passes this intermediate checkup, the remainder of the observation time is allowed to pass. When the observation period has ended, the subject is given another checkup. If the physiological signals recorded during the checkup are within the normal range for the subject's age group, and if no manifestations of an allergic reaction have been observed, the next larger dose is administered. The subject is then observed from the observation station for another 10–20 minutes. The 'checkup', 'sub-portion administration' and 'observation' sequence repeats until all sub-portions have been consumed, or until the subject reacts to the food-type.

1.4.3.4 Failure protocol

If, at any stage during the OFC, a subject reacts to the food-type, rescue medications (e.g. antihistamines) can be administered. At this stage the subject is diagnosed as allergic to the food-type. The subject's guardians are informed about how to avoid the allergen and are shown how to use adrenaline auto-injectors (EpiPens) in case of emergency. Typically the reactions are not severe, presenting with symptoms such as sneezing, itchy eyes and hives. In these situations, the administration of Zyrtec is sufficient to halt the effects of the reaction. Subjects are then monitored for a further two hours, as further allergic reactions can lie dormant for this length of time after the final dose of allergen has been consumed. Further antihistamines are administered during this period if necessary. Approximately 1–3% of OFCs require the administration of adrenaline, even under the close supervision of the allergists.

1.4.3.5 Supplementary stages

Many of the symptoms the allergists look for are subjective and differ from subject to subject.
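The checkup, sub-portion administration and observation loop can be summarised as a small control-flow sketch, together with the failure protocol and the final two-hour observation. This is a deliberately simplified model, not a clinical tool. It represents a subject by a hypothetical 'reaction threshold', following the description in Section 1.4.3: the cumulative fraction of a portion above which the subject reacts. It also ignores the supplementary stages, in which a sub-portion is repeated or the observation extended. The names `run_ofc` and `DOSES` are illustrative only.

```python
from fractions import Fraction

DOSES = (Fraction(1, 32), Fraction(1, 16), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2))

def run_ofc(threshold, delayed_reaction=False, doses=DOSES):
    """Simplified OFC: checkup, dose, observation, repeated for each sub-portion.

    threshold: hypothetical cumulative amount (fraction of a portion) above which
        the subject reacts, or None if the subject never reacts while dosing.
    delayed_reaction: whether a reaction appears in the two-hour observation
        that follows the final sub-portion.
    Returns (diagnosis, number of sub-portions consumed).
    """
    consumed = Fraction(0)
    for n, dose in enumerate(doses, start=1):
        # The preceding checkup was passed, so the next sub-portion is given.
        consumed += dose
        # Observation (10-20 min): a reaction means the failure protocol applies.
        if threshold is not None and consumed > threshold:
            return "allergic", n
    # All sub-portions consumed: a further two hours of observation.
    if delayed_reaction:
        return "allergic", len(doses)
    return "non-allergic", len(doses)

print(run_ofc(Fraction(1, 10)))   # ('allergic', 3): 7/32 is the first total above 1/10
print(run_ofc(None))              # ('non-allergic', 5)
```

Because the threshold is compared with the cumulative amount consumed, a subject whose threshold lies between two running totals fails on the first sub-portion that pushes the total over it. This mirrors the text's point that a reaction is induced only once the threshold has been surpassed.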
For a definitive diagnosis, the allergist must continue to administer the allergen until a reaction occurs. The allergist may be unsure about the cause of a symptom: for example, a subject who seems distressed may simply be restless rather than experiencing the onset of a reaction. In such cases the allergist will wait a further 10 minutes or repeat the previous sub-portion size, as the specific case requires.

If the subject consumes all sub-portions without reacting, they have provisionally passed the food challenge. However, these subjects are also monitored for a further two hours, as delayed reactions can still manifest after the challenge has finished. If the subject reacts during this time, the failure protocol is followed and the subject is diagnosed allergic. If the subject does not react during the waiting period, they are deemed to have passed the food challenge.

1.5 Challenge-testing clinical experience

Over 600 OFCs have been performed by the Department of Paediatrics and Child Health in Cork University Hospital. These controlled OFCs are monitored by staff trained to recognise the changes in behaviour and in physiological signals that are typical of a subject who has reacted to the food-type being tested. However, even with the close supervision provided by the staff, there have been seven cases in which the administration of adrenaline was required. The allergists who conducted the OFCs identified two events which are suggestive of oncoming allergic reactions (Bindslev-Jensen et al., 2002). These are:

Change in activity: There is a tendency for the subject to become quiet, introverted and still before the onset of a reaction (Bindslev-Jensen et al., 2002). The desire to play disappears. Biologically, this is due to inflammatory mediators of the cardiovascular system compensating for the effects of histamine and other mediators of the allergic response.
At this stage, the magnitude of the reaction is not yet sufficient to produce perceivable symptoms. This quietening does not always occur in subjects who fail a food challenge. Depending on the subject's susceptibility to the allergen in question, reactions can occur before the subject begins to feel unwell.

Change in heart rate: There is a general tendency for the heart rate of subjects to increase as a result of a reaction (Bindslev-Jensen et al., 2002). This was observed during the checkups that the allergists performed on the subjects during the OFCs. A change in heart rate during a checkup is one of the factors used to determine whether a subject is allergic to the food-type. However, a change alone is not sufficient to diagnose a subject as allergic, as it has not been definitively linked to the presence of allergy. Conversely, in the presence of a likely allergic reaction, the absence of a change in heart rate is not sufficient to discount allergy. This is because the measure is subjective and is only recorded at twenty-minute intervals.

Blood pressure and blood oxygen saturation are also measured during checkups. These signals are the last to change as a result of a reaction, and other symptoms of allergic reactions will be observable before they change. However, if these signals exceed the normal range for the subject's demographic, the test is immediately halted and the failure protocol is followed.

1.6 Machine-assisted classification of allergy

Based on the observations discussed in the previous section, it was proposed to investigate the effectiveness of machine-assisted monitoring of food challenges. In this approach, the activity and heart rate of the subjects are monitored in a non-invasive and remote manner. The recordings are later interrogated for signatures of allergy.
This monitoring should be performed unobtrusively, without interfering with the subjects, the staff or the usual progression of the challenges. With the recorded data, activity and heart rate will be characterised with the goal of detecting, predicting and classifying a subject as 'allergic' or 'non-allergic'. The classification is designed to complement the allergists' diagnoses during OFCs, as the allergists cannot be replaced. Classification must therefore be 'tuned' in an objective manner to ensure that no false-positive classifications occur. A false positive would impose an unnecessary and indefinite restriction on the subject's quality of life.

Monitoring of OFCs has the potential to introduce many advances over the current state of the art. If a correlation between the measurements and the onset of allergy is discovered, automatic classification introduces the possibility of early detection of allergy. With this, challenges could be stopped earlier and antihistamines administered sooner. Limiting the extent of the reaction in this way could significantly reduce the stress on the subject and their family during these tests. Indeed, with machine-assisted monitoring, real-time and non-invasive visualisation of the heart rate throughout the challenges could be obtained. This would also provide high-resolution clinical information with which the allergists can make their diagnoses.

1.7 Layout of this thesis

The remainder of this thesis takes the following form. Chapter 2 reviews in more detail the algorithms which are employed in later chapters, and also provides an overview of the subjects whose data were recorded during the OFCs. Chapter 3 discusses the collection of accelerometer and energy-expenditure data for the purpose of validating the algorithms which were employed to assess OFCs.
This chapter also presents the results obtained when these algorithms are employed for the classification of allergy. Chapter 4 introduces the heart rate variability features which are extracted from the electrocardiogram (ECG) data recorded during OFCs. This chapter also plots probability density functions which demonstrate how effective the individual features are at discriminating between allergic and non-allergic subjects. Chapter 5 discusses the machine learning algorithms which are employed for the classification of allergy. It also presents the results obtained when classification is based on manual QRS annotations of the ECG. Chapter 6 discusses automatic QRS extraction. Two QRS detection algorithms are described and validated against online ECG databases, and their effectiveness is then assessed with the ECG recorded during OFCs. Chapter 7 presents the results obtained by combining the techniques of Chapters 4, 5 and 6 for the fully automated classification of allergy. Chapter 8 first summarises the work presented in the preceding chapters. The main contributions of this thesis are then presented, and a number of directions for future work are listed.

CHAPTER 2
Algorithms, methods and data collection

2.1 Introduction

This chapter provides an overview of the analytic techniques which will be employed later in this thesis. In the previous chapter, it was stated that the allergists who oversee oral food challenges observed that subjects who react to the food-type tend to become quiet and still. It was also stated that the subject's heart rate tends to change before allergic reactions.
In later chapters, accelerometer-based activity analysis and heart rate variability analysis will be employed to objectively assess the extent to which these changes occur. However, this is the first work to investigate these signals during allergic events, so there are no previous results on allergy classification from other researchers to review. Therefore, the state of the art of each of these disciplines is discussed separately in this chapter. Furthermore, a core aspect of this thesis is the use of machine learning algorithms for automated decision making, so classification algorithms are also introduced and discussed in this chapter. This chapter discusses four separate areas: activity, heart rate, machine learning and data collection. For the sake of clarity, these are treated as independent and discussed in isolation. Where later chapters require these individual methods to be combined, the means by which they interact will be discussed. Finally, this chapter discusses the hardware which was employed to record data during the OFCs, and outlines how it was integrated into the oral food challenge for data collection.

2.2 Activity analysis

2.2.1 Introduction to inertial measurement

Inertial sensors sense motion. They have been miniaturised through micro-electro-mechanical systems (MEMS) fabrication technologies. Inertial sensors can accurately sense motion in a number of axes (or degrees of freedom (DoF)), and they can be employed to monitor complex movements with good accuracy. The most common types of inertial sensors are accelerometer-, gyroscope- and magnetometer-based. These measure acceleration, angular velocity and magnetic fields (i.e. cardinal orientation with regard to Earth's magnetic poles) respectively. Individually, these sensors typically measure up to three DoFs.
However, when they are combined on one board with aligned axes, a 9-DoF sensor is obtained, and such sensors are capable of assessing full kinematic mobility.

Figure 2.1: Illustration of the approximate growth of microelectromechanical systems accelerometers, gyroscopes and magnetometers in the research market over the past decade (Google Inc., 2013). [Plot: number of new publications (0 to 6,000) against publication year (2003 to 2012) for accelerometers, gyroscopes and magnetometers.]

Figure 2.2: Functional diagram of a MEMS accelerometer, identifying the mass and spring. [Panels: (a) accelerometer experiencing no external acceleration; (b) accelerometer experiencing external acceleration. Labelled parts: fixed plates, spring, mass, applied acceleration.]

Accelerometers are the most popular of these sensors in both production volumes and research markets. Figure 2.1 shows the trends in new research articles published over the past 10 years containing the phrases 'MEMS accelerometers', 'MEMS gyroscopes' and 'MEMS magnetometers' on Google Scholar (Google Inc., 2013). This chart shows steady growth of research for each inertial sensor. In particular, the popularity of accelerometers has grown faster than that of the other inertial sensors.

Figure 2.2 shows a high-level functional diagram which demonstrates how MEMS accelerometers sense changes in acceleration. These devices consist of fixed plates which stay immobile relative to the device, and mobile plates that are held in position by 'springs'. Between the springs is a mass which, under applied acceleration, physically lags behind the motion of the device. This changes the distance between the fixed outer plates and the plate on the moveable mass. When these plates are driven with a voltage, the change in distance generates a change in capacitance between the plates.
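The sensing principle can be illustrated numerically. The sketch below assumes an idealised differential parallel-plate arrangement, with the mass plate sitting between two fixed plates. It also assumes a static spring deflection given by Hooke's law, k x = m a. The device parameters are hypothetical values chosen only to show orders of magnitude, not those of any particular commercial part.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def proof_mass_displacement(accel, mass, stiffness):
    """Static deflection of the spring-held mass (Hooke's law: k x = m a)."""
    return mass * accel / stiffness

def differential_capacitance(x, area, gap):
    """C1 - C2 for a mass plate sitting between two fixed plates.

    Sign convention (an assumption): positive x moves the mass plate
    towards fixed plate 1, shrinking gap 1 and widening gap 2.
    """
    c1 = EPS0 * area / (gap - x)
    c2 = EPS0 * area / (gap + x)
    return c1 - c2

# Hypothetical device parameters, chosen only for illustration.
m, k = 1e-9, 1.0          # 1 microgram proof mass, 1 N/m spring
area, gap = 1e-7, 2e-6    # 0.1 mm^2 plate area, 2 um nominal gap
x = proof_mass_displacement(9.81, m, k)      # about 9.8 nm at 1 g
dc = differential_capacitance(x, area, gap)  # a few femtofarads
linear = 2 * EPS0 * area * x / gap**2        # small-displacement approximation
print(f"x = {x*1e9:.2f} nm, dC = {dc*1e15:.2f} fF (linear approx {linear*1e15:.2f} fF)")
```

Under these assumptions the capacitance change is only a few femtofarads, so the supplementary circuitry must be very sensitive. The differential arrangement also makes the output approximately linear in the displacement, since C1 − C2 = 2ε0Ax/(d² − x²) ≈ 2ε0Ax/d² when x ≪ d.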
The change in capacitance is then measured by supplementary circuitry, and the output of this circuitry provides the measurement of acceleration.

Inertial sensors have a wide range of applications and have been applied to good effect in many areas, including automotive (Galvin et al., 2000), aerospace (Hayton et al., 2001) and crash testing (Castro et al., 1997). However, as the inertial sensors are to be employed to monitor children undergoing OFCs, this section focuses on the application of inertial sensors to human monitoring.

2.2.2 Applications of inertial sensing

The algorithms discussed in this section relate to monitoring of the human body for the purpose of estimating activity, energy expenditure and other metrics. In all cases accelerometers are employed, and in some cases other inertial measurements are also utilised.

2.2.2.1 Activity recognition

Inertial sensing can be applied to the recognition of postures, gait and activities (e.g. sitting, standing, walking, running). This form of sensing has many applications. For example, Minnen et al. (2005) inferred high-level behaviour from low-level gestures for the purpose of detecting behavioural syndromes such as autism and Asperger's syndrome. To perform this, three microphones and two accelerometers were worn, and an on-body computer logged the recorded data. The researchers utilised simple classification routines, but to good effect. The suitability of the hardware for the application was not discussed. However, a set-up of three microphones, two accelerometers and associated data-logging hardware may not be appropriate for persons with behavioural syndromes. Pober et al. (2006) discussed the application of quadratic discriminant analysis to assess activities.
This study showed that the classification of activities is non-trivial, because some activities produce very similar accelerations (for example, walking uphill and walking on flat surfaces). With the measurement of more DoFs, the angle of ascent could possibly have been inferred to discriminate between flat and inclined walking, but this is not discussed. However, good results were obtained with the classification routines utilised. The sample size was also quite small (n = 6). While this number of participants is suitable for the validation of pre-existing algorithms, it is small for the development of new algorithms. The authors concede this, and indicate that there is also value in investigating alternative classification routines.

Staudenmayer et al. (2009) presented a routine in which 48 volunteers performed light, moderate and intense exercise for 10 minutes apiece. The classification effectiveness achieved was very good. Indeed, Staudenmayer et al. (2009) also demonstrate how energy expenditure can be estimated through accelerometer analysis. When compared to the ground truth, their results were competitive with those of other researchers, such as Crouter et al. (2006), Freedson et al. (1998) and Swartz et al. (2000). These results involved activities requiring high levels of energy, such as racquetball and basketball. It is shown elsewhere, however, that tests of this nature should not exceed 20 minutes in length, as the metabolic response of persons after this time is unreliable (Winter et al., 2006). This is because other factors begin to affect the metabolic response once the body is subjected to intense or lengthy physical exertion. Therefore, while the results obtained by Staudenmayer et al. (2009) were accurate, there is a possibility that they were still affected by this factor.
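Pober et al. (2006) used quadratic discriminant analysis (QDA), and a minimal NumPy sketch of the technique may help to clarify it. Each class is modelled as a Gaussian with its own mean and full covariance matrix. A sample is assigned to the class with the largest discriminant score g_k(x) = −½ log|Σ_k| − ½ (x − μ_k)ᵀ Σ_k⁻¹ (x − μ_k) + log π_k. This is a generic illustration, not the implementation used in the cited study. The two 'activity' features and the synthetic data are hypothetical.

```python
import numpy as np

class QDA:
    """Minimal quadratic discriminant analysis: one Gaussian (own mean and
    full covariance) per class, combined with the class prior."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            # Sample covariance plus a small ridge so the matrix stays invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            _, log_det = np.linalg.slogdet(cov)
            self.params_.append((Xc.mean(axis=0), np.linalg.inv(cov),
                                 log_det, np.log(len(Xc) / len(X))))
        return self

    def decision_function(self, X):
        scores = []
        for mean, inv_cov, log_det, log_prior in self.params_:
            d = X - mean
            mahal = np.einsum("ij,jk,ik->i", d, inv_cov, d)   # squared Mahalanobis distance
            scores.append(-0.5 * log_det - 0.5 * mahal + log_prior)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

# Synthetic two-feature windows (hypothetical labels: 0 = "sitting", 1 = "walking").
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], [0.3, 0.3], size=(100, 2)),
               rng.normal([3.0, 3.0], [1.0, 0.5], size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

model = QDA().fit(X[::2], y[::2])                  # even rows for training
acc = np.mean(model.predict(X[1::2]) == y[1::2])  # odd rows for testing
print(f"test accuracy: {acc:.2f}")
```

The per-class covariance is what makes the decision boundary quadratic. Sharing a single covariance matrix across classes would reduce the model to linear discriminant analysis.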
Benbasat and Paradiso (2002) presented a generic motion recognition framework which employed six DoFs to track movements. This platform was designed so that researchers with minimal knowledge of the underlying algorithms could use it. The platform performed very well and adapted appropriately to many situations. However, it occupied a volume of approximately 500 cm³, which is not practical for many applications, in particular in the medical setting.

With the integration of inertial sensors into mobile phones, activity analysis has recently become readily accessible to smartphone users. Interestingly, analysis of a single accelerometer within a smartphone in a user's pocket was found to be sufficient to predict the user's identity (Kwapisz et al., 2010). Movement-based identification had previously been performed with image analysis (Kale et al., 2003) and with multi-sensor gait analysis (Annadhorai et al., 2008). Kwapisz et al. (2010), however, were the first to achieve this with a single sensor in the pocket.

Another active area of research is the estimation of energy expenditure through accelerometer analysis. In particular, mobile phones are popular and now contain accelerometers, so overall daily energy expenditure can be assessed with accelerometer data alone. Indeed, this is a primary application of a number of youth-based anti-obesity initiatives. Through integration with social networking websites, daily activity can be posted as a means of encouraging exercise (Lanningham-Foster et al., 2009).
Research into activity-based intervention monitoring to combat obesity is gaining momentum (Oude Luttikhuis et al., 2009). These methods are applicable not only to countering obesity in youth (Lobstein et al., 2004; Wang and Lobstein, 2006), but also to the adult population, which is also susceptible to this epidemic (WHO, 2000).

Accelerometers can also be employed for the analysis of rehabilitation exercises (after a fall, for example). This is possible because MEMS accelerometers can provide reliable and objective measures of mobility, gait and balance (Culhane et al., 2005). Uswatte et al. (2005, 2006) have also employed accelerometer-based rehabilitation to measure the mobility of stroke patients. Accelerometers have been shown to objectively quantify mobility scores to within 10% of human-recorded scores (Hester et al., 2006). This shows that accelerometry is an accurate measurement tool for both fully mobile and constrained applications.

2.2.3 Activity-based analysis of OFC

In the previous chapter, it was stated that subjects tend to become quiet and to play less in light of allergic reactions. This effect can be objectively measured through the measurement of activity and the estimation of energy expenditure, and accelerometry is an appropriate tool for this. Indeed, it has been shown here that accelerometry has the capacity to analyse both activity and energy expenditure in free-living and constrained environments alike. For accelerometer-based analysis of OFCs, assessment will focus primarily on activity (i.e. movement) and accelerometer-based energy expenditure. Postural and activity classifications are not performed on these data because the subjects are required to stay on a bed throughout the challenges. They will, therefore, typically be lying supine.
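The sketch below illustrates one way to quantify 'activity' from a tri-axial accelerometer when the subject is lying on a bed. It computes one widely used summary, the Euclidean norm minus one (ENMO), averaged over fixed epochs. This choice is an assumption made for illustration and is not necessarily the measure adopted in later chapters. The 50 Hz sampling rate, the 60 s epoch length and the function name `enmo_epochs` are hypothetical.

```python
import numpy as np

def enmo_epochs(acc_g, fs, epoch_s=60):
    """Mean ENMO per epoch for tri-axial acceleration given in units of g.

    acc_g has shape (n_samples, 3). Subtracting 1 g from the vector magnitude
    removes gravity; negative values are truncated to zero.
    """
    magnitude = np.linalg.norm(acc_g, axis=1)
    enmo = np.maximum(magnitude - 1.0, 0.0)
    n = int(fs * epoch_s)                      # samples per epoch
    n_epochs = len(enmo) // n                  # an incomplete final epoch is dropped
    return enmo[: n_epochs * n].reshape(n_epochs, n).mean(axis=1)

# Hypothetical 50 Hz recording: one minute lying still, then one minute of movement.
fs = 50
t = np.arange(60 * fs) / fs
still = np.column_stack([np.zeros_like(t), np.zeros_like(t), np.ones_like(t)])
moving = still.copy()
moving[:, 0] += 0.3 * np.sin(2 * np.pi * 1.0 * t)   # 1 Hz oscillation on one axis
activity = enmo_epochs(np.vstack([still, moving]), fs)
print(activity)   # first epoch is exactly 0; the second is positive
```

A stationary sensor reads only gravity, a magnitude of 1 g, so still epochs score zero regardless of the sensor's orientation. This suits subjects who may lie in different postures, while any movement raises the score.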
Figure 2.3: The original galvanometer developed by Willem Einthoven to record the ECG in the early 20th century.

2.3 Heart rate analysis

2.3.1 History and introduction

Clinically, ECG analysis is one of the most important areas of research for the assessment of cardiac health. The development of modern machines which can record the ECG was pioneered by Willem Einthoven with the invention of the string galvanometer (Barold, 2003). A representation of this machine is shown in Figure 2.3. The subject has placed his hands and legs in buckets containing a salt-water solution, which improves the conduction of the heart's electrical signals to the galvanometer. The signals were then amplified by a large electromagnetic amplifier, and a string was deflected according to the signals obtained. This deflection was transferred to paper and provided a representation of the ECG.

The naming conventions of the ECG waveforms that are used today were initially coined by Einthoven. However, while the terminology is similar, the technology employed in the modern recording of the ECG has changed dramatically in size, power and proficiency. Indeed, with today's technology, the ECG can be recorded for many hours with circuitry no bigger than a coin (Burns et al., 2010). Einthoven's contraption, by contrast, required five assistants to operate it and water to cool its active mechanisms (Churchill, 2008). Even at its primitive beginnings, the ECG was shown to be diagnostically valuable. Doctors such as Einthoven and, later, Thomas Lewis pioneered the investigation of the effects of disease on the ECG. Since this era, cardiology has evolved considerably, and the 12-lead ECG is now the state of the art for preliminary diagnosis of heart disease.
This type of analysis can detect irregularities in the ECG. With training, cardiologists can also localise the precise area of the heart affected by the condition without having to perform more invasive tests (Fuchs et al., 1982), such as contrast-enhanced ultrasound (Martegani et al., 2008; Furlow, 2009). However, upon such diagnoses, more invasive procedures may be scheduled in order to obtain more detailed information about the condition under investigation. Applying and interpreting the 12-lead ECG requires considerable expertise. These analyses are therefore typically only performed by trained cardiologists when clinical diagnosis of heart disease is required, and the recordings are typically quite short.

For medium- to long-term ECG recordings, 3- and 5-lead ECGs are used. These can produce ECG traces similar to that shown in Figure 2.4 (Dublin Institute of Technology, 2013), in which the segments of the ECG are annotated as described by Einthoven. These 3-lead recordings can highlight cardiac arrhythmias and other heart conditions. However, their specificity is quite low, and the recordings are often discarded afterwards. ECGs recorded with 3 and 5 leads are more typically used to assess the variability of the heart rate over time. Statistical measures of the variation within sets of RR intervals are extracted to quantify the state of the heart (see Figure 2.4). The RR interval is the interval between successive R waves, and these statistical measures are known as features. This is known as heart rate variability (HRV) feature analysis.

Figure 2.4: Labelled ECG waveform (Dublin Institute of Technology, 2013).

2.3.2 Applications of HRV analysis

The ECG is the best way of measuring variation in the heart (Sabiston Jr, 1981).
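The HRV features described in Section 2.3.1 can be made concrete with a few standard time-domain measures computed from RR intervals in milliseconds:

- the mean RR interval and mean heart rate;
- SDNN, the standard deviation of the intervals;
- RMSSD, the root mean square of successive differences;
- pNN50, the percentage of successive differences larger than 50 ms.

These are common textbook definitions, offered here as an assumption. The feature set actually used in this thesis is introduced in Chapter 4, and the function name is illustrative.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """Common time-domain HRV features from RR intervals given in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)                                  # successive differences
    return {
        "mean_rr_ms": rr.mean(),
        "mean_hr_bpm": 60000.0 / rr.mean(),              # heart rate from the mean RR
        "sdnn_ms": rr.std(ddof=1),                       # sample standard deviation
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),
        "pnn50_pct": 100.0 * np.mean(np.abs(diffs) > 50.0),
    }

features = time_domain_hrv([800, 810, 790, 870, 800])
print(features)   # mean RR 814 ms, RMSSD ≈ 54.3 ms, pNN50 = 50 %
```

Here the mean heart rate is derived from the mean RR interval. Averaging the instantaneous heart rates (60000/RR for each beat) gives a slightly different value, and either convention may be encountered.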
The ECG has previously been shown to change due to:

- respiration (Baldzer et al., 1989; Yamamoto et al., 1991; Jan et al., Nov; Kemp et al., 2010);
- exercise (Robinson et al., 1966; Tulppo et al., 1996; Cole et al., 2000; Sandercock et al., 2005);
- stress (Falkner et al., 1979; Kostis et al., 1982; Bernardi et al., 2000; Obrist et al., 2007; Riese et al., 2004; Bořil et al., 2012; Bailón et al., 2010);
- hypotension (Hernando et al., 2011);
- heart disease (Dekker et al., 2000; Tsuji et al., 1996; Antelmi et al., 2004; Nolan et al., 1998; Licht et al., 2008);
- anxiety (Licht et al., 2009);
- asphyxia (Boardman et al., 2002).

Later in this thesis, investigations are performed into whether heart rate variability changes due to allergic reactions. HRV features, and in particular the sympathetic and parasympathetic indices, are strong indicators of the risk of sudden adult death (Bradley and Floras, 2003; Lombardi et al., 2001; Seccareccia et al., 2001; Dekker et al., 1997; Huikuri et al., 2001). In this condition, otherwise healthy adults die suddenly. Risk factors include a number of well-known arrhythmias and other heart conditions which can be measured with the ECG. Screening can be performed with the ECG, and persons found to be at risk can be alerted. Medication and exercise regimes can then be adopted so that the risk of death is reduced.

Much work has been performed on digital signal processing (DSP) based investigation and conditioning of the ECG (Sörnmo and Laguna, 2005, 2006; Schlindwein et al., 2006). For example, the effect of breathing can be ascertained from signal processing analysis of the ECG (Bailón et al., 2007). Indeed, HRV analysis has been used to assess seizure in neonatal (Doyle et al., 2010) and adult (Jeppesen et al., 2010) hospitalised patients.
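The sympathetic and parasympathetic indices mentioned earlier in this section are commonly derived from the power spectrum of the RR-interval series. Two bands are used: low frequency (LF, 0.04–0.15 Hz) and high frequency (HF, 0.15–0.4 Hz). HF power is usually associated with parasympathetic (vagal) activity, and the LF/HF ratio is often interpreted as an index of sympathovagal balance. The sketch below is one simple way to estimate these powers. It assumes linear interpolation onto an evenly sampled 4 Hz grid and a plain, unwindowed periodogram. More careful estimators, such as Welch's method or the Lomb–Scargle periodogram, are common in practice. The synthetic RR series is hypothetical.

```python
import numpy as np

def lf_hf_power(rr_ms, fs_resample=4.0):
    """Return (LF, HF) power of an RR-interval series given in milliseconds.

    The irregularly spaced RR series is linearly interpolated onto an even
    grid before a raw periodogram is taken. The periodogram is not windowed
    or doubled for one-sidedness, so absolute powers are only approximate,
    but the LF/HF ratio is unaffected by the missing factor of two.
    """
    rr = np.asarray(rr_ms, dtype=float)
    beat_times = np.cumsum(rr) / 1000.0                      # beat times in seconds
    t = np.arange(beat_times[0], beat_times[-1], 1.0 / fs_resample)
    tachogram = np.interp(t, beat_times, rr)                 # evenly sampled RR series
    tachogram = tachogram - tachogram.mean()                 # remove the DC component
    psd = np.abs(np.fft.rfft(tachogram)) ** 2 / (fs_resample * len(tachogram))
    freqs = np.fft.rfftfreq(len(tachogram), d=1.0 / fs_resample)
    df = freqs[1] - freqs[0]
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum() * df
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum() * df
    return lf, hf

def synthetic_rr(mod_freq_hz, n_beats=600, base_ms=800.0, depth_ms=50.0):
    """Hypothetical RR series whose intervals are modulated sinusoidally."""
    rr, t = [], 0.0
    for _ in range(n_beats):
        r = base_ms + depth_ms * np.sin(2 * np.pi * mod_freq_hz * t)
        rr.append(r)
        t += r / 1000.0
    return rr

lf, hf = lf_hf_power(synthetic_rr(0.25))   # modulation at a typical breathing rate
print(f"LF = {lf:.1f} ms^2, HF = {hf:.1f} ms^2, LF/HF = {lf / hf:.3f}")
```

A modulation at 0.25 Hz corresponds to 15 breaths per minute, so its power falls in the HF band. A slower 0.1 Hz modulation would instead dominate the LF band.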
The results of these seizure studies have shown that HRV-based seizure detection is difficult in general, and that electroencephalogram (EEG) analysis provides the best results. However, the HRV features of some patients react strongly to seizure. For these patients, HRV analysis can offer good subjective seizure appraisal (Doyle et al., 2010). HRV has been described as capable of detecting abnormalities in the ECG. It can also be used to characterise normality in adults (Nunan et al., 2010) and children (Aziz et al., 2012), which can help to rule out certain diseases.

Heart monitoring can be used in many situations, even outside medical diagnostic applications. For example, HRV analysis can be employed to assess the anxiety and stress of persons throughout their daily commute and working day. For medical professionals, stress can impact their work, decision making and vigilance. With HRV analysis, the times at which these personnel become stressed can be assessed automatically (Jovanov et al., 2011). This raises the possibility of alerting them to their state of stress so that, if conditions allow, relaxation methods can be followed. In driver analysis, heart rate can be monitored to identify drivers who might be suffering from road rage (Bořil et al., 2012; Healey and Picard, 2005). This can provide an objective feedback mechanism for drivers and can contribute to safer roads. In both of these examples, however, the participants must comply with the analysis and be receptive to the feedback for any benefit to be obtained.

Therefore, the use of HRV features for the detection of signatures of allergy will be investigated in order to discriminate between allergic and non-allergic subjects.
HRV features have been shown to be of good clinical use for the assessment of many stressful environments and many acquired and chronic medical conditions, and it is believed that signatures of allergy will be uncovered by analysis of these features.

2.4 Machine learning and classification

2.4.1 Introduction to classification

Applications in which activity and HRV analyses have been employed have been described in the previous sections. However, in order to use these analyses in an automated manner, it is generally necessary to employ classification algorithms. A number of machine learning algorithms which would be appropriate for the classification of OFC data are described here. It is important to note that these algorithms are generally independent of their applications and can be used to solve a very wide range of other problems when deployed appropriately. For example, classification has been used for email spam detection (Andersen et al., 2008), fraud detection (Phua et al., 2004) and vibration analysis of bridges, cars and engines (Oh et al., 2009), to name a few. Machine learning has also been applied to activity analysis (Ravi et al., 2005; Mannini and Sabatini, 2010) and HRV (Kononenko, 2001). The use of classification for these will be discussed in a later section.

2.4.2 Background to machine learning

Classification is the process of automatically and intelligently assigning labels to data. The label which is assigned depends on the application being investigated. For example, with accelerometer-based classification the task at hand might be to determine the activity which is being performed, and possible labels might include walking, running, jumping, etc.
Likewise, for ECG analysis, the task might be to determine whether a specific event has occurred, and the labels might be normal rhythm, arrhythmia, ectopic beats, etc. (the meanings of these conditions are discussed later). Two main branches of classification are available: supervised and unsupervised. In supervised learning, all of the data employed are labelled according to the application, whereas in unsupervised learning, supplementary algorithms are first required to assign labels to the data automatically, and classification can then be performed with these labels. In this section (and indeed for the remainder of this thesis) only supervised classification is discussed. Two independent stages are involved in supervised classification: training and testing. The training stage involves ‘feeding’ data with known labels into a learning algorithm. Once this process has completed, other data (which are also labelled) are used to test the classifier. By this process the performance of the classifier can be assessed, as the labels it selects are compared to the ground truth. Keeping the training data and testing data separate is the preferred approach, as it allows the generalisation error of the classifier to be estimated (Vapnik and Kotz, 1982). The most common type of classifier is the discriminative classifier. This type of classifier is trained on data from more than one class and, by the algorithms within its framework, learns to discriminate between the classes it was trained with. It is not guaranteed that classifiers will recover the ground truth, however, and the accuracy of classification depends on the application in question and the data which are available.
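The training and testing stages described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming synthetic two-dimensional features for two hypothetical activity classes (‘walking’ and ‘running’); the k-nearest-neighbour classifier, the helper names (make_synthetic_data, train_test_split, knn_predict, accuracy) and the 70/30 split are illustrative choices, not the procedure used later in this thesis.

```python
import math
import random
from collections import Counter


def make_synthetic_data(n_per_class=50, seed=1):
    """Hypothetical 2-D features for two well-separated classes (synthetic, illustration only)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, centre in (("walking", (0.0, 0.0)), ("running", (5.0, 5.0))):
        for _ in range(n_per_class):
            xs.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            ys.append(label)
    return xs, ys


def train_test_split(xs, ys, test_fraction=0.3, seed=0):
    """Shuffle once, then hold out a fraction of the labelled data for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = round(test_fraction * len(xs))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx])


def knn_predict(train_x, train_y, x, k=3):
    """Training = storing labelled examples; prediction = majority vote of the k nearest."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Compare labels predicted on held-out data with the ground truth."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


xs, ys = make_synthetic_data()
train_x, train_y, test_x, test_y = train_test_split(xs, ys)
print(f"held-out accuracy: {accuracy(train_x, train_y, test_x, test_y):.2f}")
```

Because the test points are never seen during training, the held-out accuracy estimates how well the classifier generalises; real physiological features would rarely be as well separated as these synthetic clusters.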
Many different types of classifier exist, including (but not limited to) support vector machines (SVMs), Gaussian mixture models (GMMs), k-nearest neighbours (kNNs), naive Bayes, etc. When multiple learners cooperate to tackle a machine learning problem, the procedures are termed ensemble methods, of which boosting is one well-known family. Ultimately, all of these algorithms reduce to determining which of a number of labels is the most likely, having been trained on one dataset and being tested on new data. Clearly, for successful application of machine learning, the training dataset must contain a wide range of examples that are representative of the problem under investigation. An example of machine learning can be visualised in Figure 2.5, where SVM and GMM classifiers are employed to classify two classes (i.e. labels) of data, which are marked as × and ◦. Areas on the left-hand side of the images (highlighted in red) are the regions which have been selected as being more indicative of the × class, while those highlighted on the right-hand side (in blue) are more indicative of the ◦ class. It can be seen that, for this simple example, the classification routine attempts to split the feature space into two regions, one for each class of data. An obstacle which is common to all classifiers is deciding how the data should best be modelled in order to minimise (and ideally eliminate) uncertainty between the classes. This example shows that complicated decision boundaries can be obtained even for relatively well-behaved data. The example in Figure 2.5 presents a case in which two classes have been analysed.

Figure 2.5: Example of how classification is obtained with two-dimensional data (top) with SVM based classification (bottom left) and GMM based classification (bottom right).
It can sometimes occur that more than two classes require classification; for example, it was stated previously that activity classification might assign walking, running and standing labels. However, it can also be the case that knowledge of only one class is available. In this case, the classification routine is termed ‘novelty’ or ‘abnormality’ detection.

2.4.3 Novelty detection

The task of novelty detection algorithms is to detect data which are not from the distribution of the training data. These are termed ‘novel’ data points. This is the type of classification by which allergy must be classified, because only one class of labelled data is available: the data which were recorded before the administration of the first dose of the allergen. These data are guaranteed to be non-allergic because the allergen has not yet been administered. Classification of allergy, therefore, involves training classification routines on these non-allergic data and determining the boundaries of novelty from them. Data found to be outside these boundaries would then be considered novel (i.e. allergic). With the allergy data, one could hypothesise about the temporal labels during the OFC and thereby obtain a discriminative problem (e.g. one could assume that features from the 10 minutes prior to a reaction belong to the allergic class). However, such hypotheses would only enable the classifier to learn the assumption rather than facilitate detection of the true signatures of allergy, and this may lead to sub-optimal performance. Consequently, novelty detection is the preferred method for allergy detection. Novelty detection applications are not as popular as multi-class problems, but they have been employed in many applications, as expert labels are often difficult, and sometimes impossible, to obtain.
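The idea of training only on pre-dose (non-allergic) data and flagging anything outside a learnt boundary can be sketched as follows. This is a minimal, hypothetical example: it fits a single diagonal-covariance Gaussian (the simplest special case of a GMM, with one component) to baseline feature vectors and flags new points whose log-density falls below a threshold chosen so that a fixed fraction of the baseline data (outlier_fraction, an assumed parameter) would itself be rejected. The class name, the synthetic two-dimensional features and the 5% fraction are illustrative assumptions, not the features or thresholds used in this thesis.

```python
import math
import random


class GaussianNoveltyDetector:
    """One-component, diagonal-covariance Gaussian density model with a
    log-density threshold (a deliberately simplified stand-in for a GMM)."""

    def __init__(self, outlier_fraction=0.05):
        # Fraction of the *baseline* data that the threshold is allowed to reject.
        self.outlier_fraction = outlier_fraction

    def fit(self, baseline):
        n, d = len(baseline), len(baseline[0])
        self.mean = [sum(x[j] for x in baseline) / n for j in range(d)]
        # Maximum-likelihood variances, floored to avoid division by zero.
        self.var = [max(sum((x[j] - self.mean[j]) ** 2 for x in baseline) / n, 1e-12)
                    for j in range(d)]
        # Threshold taken from the distribution of baseline scores.
        scores = sorted(self.log_density(x) for x in baseline)
        self.threshold = scores[int(self.outlier_fraction * n)]
        return self

    def log_density(self, x):
        """Log of the fitted Gaussian density evaluated at feature vector x."""
        return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, self.mean, self.var))

    def is_novel(self, x):
        """True if x lies outside the learnt 'boundary of novelty'."""
        return self.log_density(x) < self.threshold


# Synthetic stand-in for features computed from pre-dose (baseline) recordings.
rng = random.Random(0)
baseline = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
detector = GaussianNoveltyDetector(outlier_fraction=0.05).fit(baseline)
print(detector.is_novel((0.1, -0.2)), detector.is_novel((6.0, 6.0)))
```

Replacing the single Gaussian with a multi-component mixture, or with a one-class SVM, would follow the same pattern: fit on baseline data only, then threshold a score.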
These algorithms are trained on the labelled data and obtain ‘boundaries of novelty’: data within these boundaries are ‘normal’, while data outside them are novel. The two most popular types of novelty detection routine involve support estimation by one-class SVMs and density estimation by GMMs.

2.4.3.1 One-class SVMs and GMMs

The discriminative SVM was mentioned previously, and an example of how it partitions the data was shown in Figure 2.5. SVMs compute a maximum-margin hyperplane between the distributions of the classes which are investigated (Flach, 2012). With one-class SVMs, however, only one class is available for training, and a maximum-margin hyperplane between classes cannot be obtained. One-class SVMs therefore learn a hypersphere which ‘surrounds’ a specified percentage of the training data, and the radius of this sphere is employed to estimate the support of a distribution (Schölkopf et al., 2000, 2001). Data lying within this sphere are considered normal, and data lying outside it are novel. The extent of the novelty of a data point can be quantified by its distance from the surface of the hypersphere. In effect, the radius of this sphere is a threshold which is selected during the learning process.

The GMM classifier sets out to model a distribution with a mixture of Gaussians. With this, the ‘probability’ that new data are normal can be assessed by ‘reading’ the value off the learnt distribution. It is, however, necessary to apply a threshold for GMM-based novelty detection after the distribution has been modelled. This is acceptable, as a threshold is also applied by the one-class SVM; there, however, the process is ‘hidden’ within the learning algorithm.

2.4.3.2 Example of novelty detection

Figure 2.6 shows an example of novelty detection algorithms being employed.
Here, the same data that were used in Figure 2.5 were employed, and the novelty detection routines were trained on data from Class 1 only. In the images, the red areas are more indicative of normal data, and the blue areas are more novel. The upper image shows the decision surface obtained from the one-class SVM. While this algorithm learns a hypersphere, the shape of the separation is not circular because of the ‘kernel trick’ (Aizerman et al., 1964) that was employed. The kernel implicitly projects the original data into a high-dimensional feature space, and it is in this feature space that the sphere is obtained. While the classifiers in Figure 2.5 partitioned the whole feature plane, novelty detection is more likely to isolate a small region of the plane and define this as ‘normal’. In both examples here, the normal data are well modelled, and the highest probabilities of ‘normal’ data are found at the centre of the distribution; the further a data point lies from this centre, the more novel it is considered. It is difficult to say which routine is superior, as both have been employed very successfully in many applications. In the example in Figure 2.6, the one-class SVM learns the boundary of the normal data, which provides a quick roll-off between ‘normal’ and ‘abnormal’ data. GMM-based novelty detection models the whole distribution, so it has a slower roll-off than the SVM-based procedure.

Figure 2.6: Example of how novelty detection algorithms can be employed to determine novel and normal data points. The upper image shows the distribution of the labels, and the lower-right figure shows one-class SVM while the lower-left figure shows the example of GMM-based novelty detection on the same data.

2.4.4 Applications of novelty detection

GMM- and SVM-based novelty detection algorithms have been employed in a number of machine learning applications to very good effect.
This section will focus on physiological classification, as this is the application under investigation in this thesis. Novelty detection has been used for the automatic and online classification of epileptic seizure with intracranial EEG (Gardner et al., 2006; Gardner, 2004). This classification routine involved the use of one-class SVMs and obtained excellent sensitivity. However, the authors did not report the specificity of the routine. The methodology investigated also used only three energy-based features to describe the EEG, which is a very low number of features in comparison to the state of the art in seizure detection (Temko et al., 2009, 2011b; Thomas et al., 2009). Roberts (1999, 2000) investigated the use of extreme value theory (the branch of statistics which deals with abnormally high or low values in distributions) in conjunction with GMMs to assess a number of different applications in the medical and image processing fields. It was shown that GMM-based novelty detection can be applied to the analysis of hand tremor, epileptic seizure, vigilance (during repetitive, boring or long-term tasks) and anaesthesia. These papers do not provide any metrics of the overall performance of classification, but rather demonstrate the applicability of novelty detection to these problems. However, they do show that GMM-based analysis is useful for movement (i.e. accelerometer-based) applications, and that it is also appropriate in medical studies. Speech processing has also employed novelty detection (Markov and Nakamura, 2008) for improved speaker identification, and until very recently GMMs were the state of the art for speaker identification (Fine et al., 2001; Faundez-Zanuy and Monte-Moreno, 2005).
2.5 Food challenge data collection

2.5.1 Recording platform

The requirements for the recording platform are that acceleration and ECG must be recorded concurrently during OFCs. DSP is then employed to assess the suitability of the acceleration and ECG data for the classification of allergy. The sensing health with intelligence, modularity, mobility and experimental reusability (SHIMMER) device (SHIMMER-research, 2010; Burns et al., 2010) is the wireless sensing platform that was chosen for data collection during the OFCs, and is illustrated in Figure 2.7. This sensing device features Bluetooth and 802.15.4 radios, a tri-axial accelerometer (Freescale MMA7260Q*) with programmable sensitivity (selectable from 1.5 g, 2 g and 4 g), a micro SD card slot (up to 2 GB) and an ultra-low-power microcontroller (MSP430). The microcontroller is programmed with the network embedded systems C (nesC) event-driven programming language, and is supported by the TinyOS component-based operating system. The SHIMMER device also supports ‘daughterboards’ which extend the functionality of the platform, and the ECG daughterboard allows ECG recordings to be made. The SHIMMER device offers a small form factor (53 mm × 32 mm × 25 mm), which is approximately 10 times smaller than the platform by Benbasat and Paradiso (2002) that was discussed previously in this chapter. This is a very important consideration for the monitoring of children during OFCs. The SHIMMER device has been awarded the European conformity (CE) certification mark (SHIMMER-research, 2010), signifying that it fulfils European Union health and safety requirements for clinical use, and it is therefore appropriate for data collection during OFCs. The acceleration signals and the ECG traces were sampled at 256 Hz.

* These accelerometers are sufficiently sensitive for gait- and energy-expenditure-based analysis and have been used in many such application domains.
Figure 2.7: Diagram of the SHIMMER device, with the various components annotated: (a) front of SHIMMER; (b) back of SHIMMER (reproduced with permission from SHIMMER-research (2010)).

2.5.2 Integration with oral food challenge

The Department of Paediatrics and Child Health in Cork University Hospital conducts OFCs on a regular basis. A collaboration between the Department of Electrical and Electronic Engineering of University College Cork and the Department of Paediatrics and Child Health of Cork University Hospital was organised, and ethical approval was sought and obtained from the ethics board to collect accelerometer and ECG data with the SHIMMER device from subjects undergoing OFCs. Participation in data collection was voluntary, and if the parents or the child did not wish to take part, the OFC progressed as normal. With the requirements for data collection defined, the OFC procedure was modified slightly in order to accommodate data collection. Figure 2.8 presents a modified flowchart in which two supplementary stages are introduced to facilitate the use of the SHIMMER device for data collection. The first supplementary step is to obtain informed consent from the subject’s guardians, in accordance with the ethical approval, for monitoring the subject’s physiological signals during the food challenge. Allergists discuss the data-collection procedure with the guardians and provide them with literature which they must read before signing the consent forms. If consent was provided, a chest strap holding the SHIMMER device was fastened around the trunk of the subject and the ECG electrodes were configured to record data. The SHIMMER device then connects to a host computer, to which the data are transmitted. The second supplementary stage involves removing the SHIMMER device from the subject after the test has concluded. The OFC process is otherwise unchanged, and diagnoses are unaffected.

Figure 2.8: Modified oral food challenge flowchart which is employed to accommodate the introduction of the SHIMMER monitoring device for data collection during OFCs. [Flowchart stages: subject arrives at hospital; preliminary tests; consent, SHIMMER applied; observe for 10 to 20 minutes; checkup (pass/fail); administer dose; dose remaining? (yes/no); 2 hours observation; SHIMMER removed; subject diagnosed ‘allergic’ or ‘non-allergic’; challenge over.]

2.5.3 Other data

While the blood pressure and blood oxygen saturation levels are recorded periodically by the allergists who conduct the OFCs, these were not recorded during the data collection stage. The primary reason is that the eventual allergy classification framework is envisioned to require minimal interaction by the allergists, and to record the physiological data in a non-invasive manner which introduces as little discomfort as possible to the subject. It would not be in line with this methodology, therefore, to require the allergist to log the blood pressure and blood oxygen saturation levels on a computer after every checkup. Moreover, it is not feasible to dedicate equipment to recording these (in particular the blood pressure), as these are invasive sensors that require compliance and subject participation for meaningful readings. It is also the case that blood pressure and oxygen saturation are the last physiological signals to change under the influence of allergy, and they only change after perceivable symptoms have manifested.

2.5.4 Data recording

In total, the acceleration and ECG data of 24 subjects were recorded during OFCs.
Of these, 15 subjects reacted to the food during the OFC and were diagnosed allergic. The subjects are tabulated in Table 2.1. All subjects are ten years of age or under.

Table 2.1: Tabulation of the characteristics of the subjects who were recorded for this study.

| Index | Gender | Age       | Recording length (minutes) | Allergen   | Diagnosis (Allergic / Non-allergic) |
|-------|--------|-----------|----------------------------|------------|-------------------------------------|
| 1     | male   | 1.5 years | 14                         | wheat      |                                     |
| 2     | male   | 6 years   | 95                         | peanut     |                                     |
| 3     | male   | 9 years   | 90                         | egg        |                                     |
| 4     | male   | 12 months | 100                        | milk       |                                     |
| 5     | male   | 8 years   | 120                        | peanut     |                                     |
| 6     | female | 9 years   | 33                         | peanut     |                                     |
| 7     | male   | 6 years   | 57                         | soy        |                                     |
| 8     | male   | 5 years   | 100                        | peanut     |                                     |
| 9     | female | 8 years   | 50                         | egg (cake) |                                     |
| 10    | male   | 3 years   | 82                         | milk       |                                     |
| 11    | female | 6 years   | 85                         | peanut     |                                     |
| 12    | female | 5 years   | 40                         | milk       |                                     |
| 13    | female | 3 years   | 105                        | milk       |                                     |
| 14    | male   | 8 years   | 125                        | soy        |                                     |
| 15    | female | 9 months  | 96                         | wheat      |                                     |
| 16    | male   | 6 years   | 69                         | egg        |                                     |
| 17    | male   | 10 years  | 91                         | egg (cake) |                                     |
| 18    | female | 4 years   | 125                        | soy        |                                     |
| 19    | male   | 6 years   | 33                         | peanut     |                                     |
| 20    | female | 1.5 years | 110                        | milk       |                                     |
| 21    | female | 7 months  | 57                         | milk       |                                     |
| 22    | male   | 12 months | 81                         | milk       |                                     |
| 23    | female | 4 years   | 58                         | wheat      |                                     |
| 24    | male   | 2 years   | 90                         | peanut     |                                     |

The shortest test was that of Subject 1, whose OFC lasted 14 minutes, and the longest challenge lasted 130 minutes (Subject 19). While the lengths of the challenges should be consistent, in particular for non-allergic subjects, the OFC is a dynamic test in which delays can occur, and this explains the variation in challenge length.

2.6 Conclusion

It has been shown in this chapter that inertial measurement and ECG analysis can each be employed for the classification of many physiological and clinical conditions. In particular, changes in HRV have been associated with many chronic and acquired medical conditions. It has, therefore, been reasoned here that, although they have not previously been used for allergy detection, acceleration and ECG analyses may be capable of assessing the observations of the allergists in an objective and quantifiable manner.
Later chapters investigate the applicability of these analyses, together with machine-learning-based classification, to the detection of allergy. The work presented in the remainder of this thesis is original and has not been investigated by other researchers before.

CHAPTER 3
Accelerometer-based analysis of oral food challenges

3.1 Introduction

This chapter introduces the concept of accelerometer-based activity and energy expenditure estimation. This is investigated because allergists have reported that the activity and energy levels of subjects tend to change before an allergic reaction presents, and these can be measured with accelerometers. Energy can be measured in a number of ways. Physically, the gold standard of energy expenditure measurement is (direct) calorimetry. This measures the heat that radiates from a body and, in combination with metrics such as the size and mass of that body, calculates true energy expenditure. This process is feasible for the measurement of energy from small immobile objects, but it is not feasible to use it to measure the true energy expenditure of people in a non-invasive and comfortable manner. Therefore, indirect calorimetry was devised. This exploits the fact that, in order to release energy, humans (and indeed all animals) must breathe in oxygen and expel carbon dioxide and other gases. By measuring the volumes of O2 and CO2 entering and leaving a subject as they breathe, and by combining this with standard physiological characteristics (such as mass, height, etc.), the true energy expenditure of a subject can be inferred. This process is called indirect calorimetry because energy is not measured directly. Two indirect calorimetry processes exist for the measurement of energy expenditure in humans. Both of these methods measure the same data with similar accuracy. The first method utilises a mask that must be worn by a subject, who breathes through it into a tube.
The oxygen and carbon dioxide content of the subject’s breath is analysed by specialised and calibrated hardware (Cosmet, 2013). This mechanism is carefully designed to ensure that minimal difficulties are introduced into the subject’s respiratory process by the applied hardware. Based on the recorded gas levels, the system then produces a report of the energy expended every time the subject completes a respiratory cycle, i.e. the subject must both inspire and expire before energy expenditure readings are obtained, as both processes provide metrics that are required for the analysis. The regression process was described in detail by Ferrannini (1988). The second method that can be used is called the doubly labelled water method (Schoeller et al., 1988). This process employs isotopes of hydrogen (deuterium) and oxygen (oxygen-18). These isotopes are not radioactive or dangerous, and can be found naturally in the atmosphere. Here, they are used as chemical markers. The doubly labelled water method is conducted within a sealed room where the air and temperature conditions are monitored and controlled by trained technicians. The isotopes are introduced into the atmosphere of the room in known and controlled volumes. By monitoring the gases expelled by the respiration process which are marked by these isotopes, and by monitoring the change in deuterium and oxygen-18 in the room, true energy expenditure can be computed with equations similar to those employed with the first method. Both of these methods measure true energy expenditure, but the doubly labelled water method is the less used of the two, as it is more expensive to operate and requires a dedicated room with sophisticated controls and experienced personnel to operate the protocol correctly.
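As an illustration of how breath-by-breath gas measurements can be turned into an energy figure, the following sketch uses the abbreviated Weir equation, EE (kcal) = 3.941 × V_O2 (L) + 1.106 × V_CO2 (L), applied to the volumes exchanged over each respiratory cycle. This widely used textbook relation is shown only as an example; it is an assumption that it matches the regression implemented by the calorimeter cited above (whose regression process is the one described by Ferrannini (1988)). The function names and the per-breath volumes are hypothetical.

```python
KCAL_PER_LITRE_O2 = 3.941   # abbreviated Weir coefficient for O2 consumed
KCAL_PER_LITRE_CO2 = 1.106  # abbreviated Weir coefficient for CO2 produced
KJ_PER_KCAL = 4.184


def weir_energy_kcal(vo2_litres, vco2_litres):
    """Energy (kcal) for a period in which vo2_litres of O2 were consumed
    and vco2_litres of CO2 were produced (abbreviated Weir equation)."""
    return KCAL_PER_LITRE_O2 * vo2_litres + KCAL_PER_LITRE_CO2 * vco2_litres


def energy_per_breath(breaths):
    """One estimate per completed respiratory cycle; each breath is a
    (vo2_litres, vco2_litres) pair measured over inspiration plus expiration."""
    return [weir_energy_kcal(vo2, vco2) for vo2, vco2 in breaths]


# Hypothetical per-breath gas-exchange volumes (litres).
breaths = [(0.020, 0.016), (0.022, 0.018), (0.019, 0.015)]
per_breath = energy_per_breath(breaths)
total_kcal = sum(per_breath)
print(f"{total_kcal:.6f} kcal = {total_kcal * KJ_PER_KCAL:.6f} kJ")
```

Producing one value per completed breath mirrors the behaviour described above, where a reading is only available once both inspiration and expiration have been measured.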
It is not practical to employ indirect calorimetry in every situation, however, and it is not suited to recording data during oral food challenges, where subjects are young, often become sick, and require medical assistance and the comfort of their parents. Therefore, a recent area of research has been the estimation of true energy expenditure from signals which can be obtained in a non-invasive, low-power and low-cost manner. Signals obtained from uni-axial (single-dimensional) and tri-axial (three-dimensional) accelerometers have been shown to be highly correlated with true energy expenditure. Figure 3.1 shows a diagram of a body and the three axes of acceleration that are monitored by tri-axial accelerometers. Here, antero-posterior is the x-axis, medio-lateral is the y-axis and vertical is the z-axis. Indeed, accelerometry can provide a number of advantages over indirect calorimetry, and Chapter 2 demonstrated how acceleration signals can also be employed for activity analysis. In this chapter, a distinction is drawn between activity and energy expenditure analysis: activity is the measure of movement and mobility, whereas energy expenditure analysis employs activity analysis to estimate energy. This chapter compares a number of accelerometer-based activity and energy expenditure estimation algorithms. These algorithms are validated with independent data recorded during new experiments in order to ensure that the values which result from them are accurate. The accelerations recorded from subjects undergoing oral food challenges are then assessed, yielding activity and energy expenditure estimates. Based on the results of these, the applicability of accelerometer-based allergy detection is discussed.

Figure 3.1: The x (antero-posterior), y (medio-lateral), and z (vertical) acceleration directions in relation to a body.
3.2 Accelerometer-based activity analysis

3.2.1 Activity metrics

A number of metrics can be calculated on raw acceleration values over a time window of length $T_e$ seconds beginning at time $\tau$. The most common activity analysis metrics, which form the basis of all of the algorithms discussed in this chapter, are described here.

• The integral of absolute acceleration (IAA) integrates the absolute value of acceleration over the window. The integral of absolute acceleration in the x-axis ($\mathrm{IAA}_x$), in the y-axis ($\mathrm{IAA}_y$) and in the z-axis ($\mathrm{IAA}_z$) can be computed for tri-axial accelerometers, and these are defined by

$$\mathrm{IAA}_x(\tau) = \int_{t=\tau}^{\tau+T_e} |a_x(t)|\,dt, \tag{3.1}$$

$$\mathrm{IAA}_y(\tau) = \int_{t=\tau}^{\tau+T_e} |a_y(t)|\,dt, \tag{3.2}$$

$$\mathrm{IAA}_z(\tau) = \int_{t=\tau}^{\tau+T_e} |a_z(t)|\,dt, \tag{3.3}$$

where $a_x$, $a_y$ and $a_z$ are the acceleration values in the x, y and z directions. These are in g-units (i.e. multiples of 9.81 m s$^{-2}$). In subsequent sections, when a number is followed by the letter g (e.g. 1 g), this identifies multiples of this unit.

• The total integral of absolute acceleration ($\mathrm{IAA}_t$) combines the three IAA values. This value is sometimes referred to as activity or acceleration counts (Chen and Bassett Jr, 2005) and is defined by

\begin{align}
\mathrm{IAA}_t(\tau) &= \int_{t=\tau}^{\tau+T_e} \left( |a_x(t)| + |a_y(t)| + |a_z(t)| \right) dt \tag{3.4}\\
&= \mathrm{IAA}_x(\tau) + \mathrm{IAA}_y(\tau) + \mathrm{IAA}_z(\tau). \tag{3.5}
\end{align}

• The magnitude of acceleration is the square root of the sum of the squares of the raw accelerations. This metric provides an indication of overall acceleration, but the directionality of the signal is removed by the squaring and summation. Its integral over the window is computed by

$$\mathrm{IAV}(\tau) = \int_{t=\tau}^{\tau+T_e} \sqrt{a_x(t)^2 + a_y(t)^2 + a_z(t)^2}\,dt. \tag{3.6}$$

• Instantaneous velocities in the x, y and z directions can be computed by

$$v_x(\tau) = \int_{t=0}^{\tau} a_x(t)\,dt + v_x(0), \tag{3.7}$$

$$v_y(\tau) = \int_{t=0}^{\tau} a_y(t)\,dt + v_y(0), \tag{3.8}$$

$$v_z(\tau) = \int_{t=0}^{\tau} a_z(t)\,dt + v_z(0). \tag{3.9}$$

These values must be integrated from $t = 0$, because acceleration is the derivative of velocity, so the velocity at time $\tau$ depends on the entire preceding acceleration history and on the initial velocity.

• The mean kinetic energy can be computed by

$$\mathrm{KE}_{tot}(\tau) = \frac{m}{2T_e} \int_{t=\tau}^{\tau+T_e} \left( v_x(t)^2 + v_y(t)^2 + v_z(t)^2 \right) dt, \tag{3.10}$$

where $m$ is the mass of the body on which acceleration is being measured, and $v_x$, $v_y$ and $v_z$ are the velocities in the x, y and z directions respectively.

• The mean power can be computed by

$$P = \frac{m}{2T_e} \int_{t=\tau}^{\tau+T_e} \left| \frac{d}{dt} \left( v_x(t)^2 + v_y(t)^2 + v_z(t)^2 \right) \right| dt. \tag{3.11}$$

Equations (3.1) to (3.11) are fundamental equations which are employed by researchers in activity analysis and energy expenditure estimation algorithms (Bouten et al., 1994, 1997a,b; Crouter et al., 2006; Chen and Sun, 1997). These equations are presented in the continuous domain, but are employed in the discrete domain for the work here. Replacing the time variable, $t$, with the sample index, $k$, and replacing each integral with a sum over samples multiplied by the sampling period, yields the discrete-domain equivalent of these equations. The continuous equations were presented in this section as they are more intuitively understood than their discrete equivalents.

3.2.2 Energy expenditure estimation algorithms

The goal of accelerometer-based energy expenditure estimation is to obtain an energy expenditure estimate ($\widehat{\mathrm{EE}}$), based on activity analysis, which is as close as possible to true energy expenditure ($\mathrm{EE}_{true}$). Four different algorithms designed to accomplish this are described here. These algorithms are selected for investigation because they encompass a number of different aspects of data modelling. Firstly, Bouten et al. (1994) provided algorithms which performed an overall assessment of energy expenditure based on a single regression equation. Chen and Sun (1997) developed regression equations which were linear and non-linear in nature, and these equations are designed to adapt to the physiological features of the subjects being investigated.
Finally, Crouter et al. (2006) employed a ‘decision tree’-like algorithm to estimate energy expenditure based on the extent of activity which was detected in the preceding minute.

3.2.2.1 A note on the energy expenditure estimation algorithms

In the next three subsections, a number of algorithms are described which can be employed to estimate the energy expended by persons wearing accelerometers. This thesis does not cover the background on how such models are obtained; for information on these processes, and for justification of the regression units, the original publications cited in the subsequent subsections should be consulted. For every algorithm presented below, the models are parameterised with values that were found to minimise a specific cost function (e.g. the squared difference between the true and estimated energy expenditure). Each model is therefore non-trivially parameterised by seemingly arbitrary constants, but it should be noted that these were obtained by employing regression techniques to learn the relationship between acceleration signals and true energy expenditure. By definition of the regression techniques, these parameters therefore carry the appropriate units to convert the input values to the units of the target variables (Bishop et al., 2006; Flach, 2012). The derivation of the specific parameter values is therefore not discussed, but the values are presented here to facilitate speedy reproduction of these algorithms.

3.2.2.2 Bouten et al.

Bouten et al. (1994) provided regression equations for human energy expenditure estimation with a tri-axial accelerometer. Analysis of 30 seconds of accelerometer data is required for this estimation. Based on the data which were available, linear relationships were discovered between acceleration counts and energy expenditure values.
Two regression equations were generated to estimate the energy expenditure due to activity ($\widehat{EE}_{\mathrm{act}}$), and are defined by

$$\widehat{EE}_{\mathrm{act},x} = -0.176 + 0.0851 \times IAA_x, \tag{3.12}$$

$$\widehat{EE}_{\mathrm{act},t} = 0.104 + 0.023 \times IAA_t, \tag{3.13}$$

where $\widehat{EE}_{\mathrm{act},x}$ and $\widehat{EE}_{\mathrm{act},t}$ are the estimates of the energy expenditure due to activity ($EE_{\mathrm{act}}$) based on $IAA_x$ and $IAA_t$ respectively. As these equations model $EE_{\mathrm{act}}$, the resting energy expenditure (REE) must be added to their results to obtain the total energy expenditure. REE is calculated by Equations (3.14) and (3.15) for male and female subjects respectively (Hemokinetics, 1993):

$$REE_m = \frac{215 \times \text{weight (kg)} + 12 \times \text{height (m)} - 513 \times \text{age (years)} + 4687}{100{,}000}, \tag{3.14}$$

$$REE_f = \frac{150 \times \text{weight (kg)} + 9 \times \text{height (m)} - 353 \times \text{age (years)} + 49854}{100{,}000}. \tag{3.15}$$

Therefore, the total energy expenditure estimate is given by

$$\widehat{EE} = \widehat{EE}_{\mathrm{act}} + REE. \tag{3.16}$$

Bouten et al. (1994) generated Equations (3.12) and (3.13) by recruiting subjects who walked on a treadmill at between 3 and 7 kilometres per hour (km/h). The participants wore an indirect calorimeter while their acceleration was recorded. The parameters of these equations were obtained by a least-squares analysis, i.e. by selecting the parameters which minimised the squared difference between $EE_{\mathrm{true}}$ and $\widehat{EE}$. Equation (3.12) estimates the energy expenditure from the acceleration recorded in the x direction only ($IAA_x$) because, according to that publication, the majority of the effort required for movement is expended in this direction. Equation (3.13), however, estimates energy expenditure based on $IAA_t$, which allows the regression to produce estimates that are representative of whole-body movement.
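As an illustration, Equations (3.12) to (3.16) can be written as a short Python sketch. It assumes the integrated acceleration values $IAA_x$ and $IAA_t$ have already been computed over the 30-second analysis window described above; the function names are hypothetical, and the REE coefficients are transcribed directly from Equations (3.14) and (3.15), so the output is in whatever units those equations produce.

```python
def bouten_ee_act_x(iaa_x):
    """Equation (3.12): activity energy expenditure from IAA_x."""
    return -0.176 + 0.0851 * iaa_x


def bouten_ee_act_t(iaa_t):
    """Equation (3.13): activity energy expenditure from IAA_t."""
    return 0.104 + 0.023 * iaa_t


def resting_ee(weight_kg, height_m, age_years, male):
    """Equations (3.14) (male) and (3.15) (female), as transcribed above."""
    if male:
        return (215 * weight_kg + 12 * height_m - 513 * age_years + 4687) / 100_000
    return (150 * weight_kg + 9 * height_m - 353 * age_years + 49854) / 100_000


def bouten_ee_total(iaa_t, weight_kg, height_m, age_years, male):
    """Equation (3.16), using the whole-body IAA_t model of Equation (3.13)."""
    return bouten_ee_act_t(iaa_t) + resting_ee(weight_kg, height_m, age_years, male)


# Hypothetical subject: 80 kg, 1.80 m, 25 years, male, IAA_t = 10 for one window.
print(bouten_ee_total(10.0, 80.0, 1.80, 25.0, male=True))
```

Equation (3.12) is included only for completeness; the whole-body form of Equation (3.13) is the one carried forward in this work.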
In this work, Equation (3.13) is used in preference to Equation (3.12) because it was stated to provide better results, and because it is insensitive to orientation errors, as the overall acceleration is measured (see later).

3.2.2.3 Chen et al.

While Bouten et al. produced two generalised equations for estimating energy expenditure, Chen and Sun (1997) allowed their models to be specialised to the age, gender, mass and height of the subjects whose acceleration was recorded. The doubly labelled water method was employed here to obtain the reference energy levels. The recruited participants stayed within a controlled room for two 24-hour periods, and in total over 6000 hours of data were recorded. The activities which the participants performed during their stay included sedentary, light, moderate and vigorous activities. The horizontal and vertical acceleration vectors were separated, as these researchers reasoned that different effort would be expended in the vertical and horizontal directions (Chen and Sun, 1997). The horizontal and vertical accelerations are computed by Equations (3.17) and (3.19) respectively:

$$H(k) = \sqrt{a_x(k)^2 + a_y(k)^2}, \tag{3.17}$$

$$V(k) = \sqrt{a_z(k)^2}, \tag{3.18}$$

$$V(k) = |a_z(k)|. \tag{3.19}$$

Linear and non-linear algorithms were developed by Chen and Sun (1997), and both allow targeted specialisation of the algorithm towards specific physical characteristics (e.g. age, height and weight) in order to minimise the error of estimation.

• Linear algorithm

The linear algorithm assumes the form

$$\widehat{EE}_{\mathrm{act}}(k) = a_L \times H(k) + b_L \times V(k), \tag{3.20}$$

where H(k) and V(k) are calculated by Equations (3.17) and (3.19) respectively.
The parameters $a_L$ and $b_L$ were computed by minimising the difference between $\widehat{EE}$ and $EE_{\mathrm{true}}$, and the values which were found to achieve this on the available data are given by

$$a_L = \frac{5.76 \times \text{weight (kg)} + 11.95 \times \text{height (cm)} + 6.89 \times \text{age (years)} - 2001}{1000}, \tag{3.21}$$

$$b_L = \frac{5.96 \times \text{mass (kg)} + 349.5}{1000}. \tag{3.22}$$

• Non-linear algorithm

The relationship between acceleration and energy expenditure measurements is not necessarily linear (Chen and Sun, 1997). As with the linear algorithm, the horizontal and vertical accelerations were also separated for the non-linear equations, which take the form

$$\widehat{EE}_{\mathrm{act}}(k) = a_N \times H(k)^{p_1} + b_N \times V(k)^{p_2}. \tag{3.23}$$

The optimal scaling and power parameters ($a_N$, $p_1$, $b_N$ and $p_2$) were computed and are given by

$$p_1 = \frac{2.66 \times \text{mass (kg)} + 146.72}{1000}, \tag{3.24}$$

$$p_2 = \frac{-3.85 \times \text{mass (kg)} + 968.28}{1000}, \tag{3.25}$$

$$a_N = \frac{12.81 \times \text{mass (kg)} + 843.22}{1000}, \tag{3.26}$$

$$b_N = \frac{38.90 \times \text{weight (kg)} - 682.44 \times \text{gender} + 692.44}{1000}, \tag{3.27}$$

where the ‘gender’ parameter of Equation (3.27) is 1 for male subjects and 2 for female subjects.

3.2.2.4 Crouter et al.

Crouter et al. (2006) regressed acceleration readings to a measure of energy expenditure termed the metabolic equivalent (MET). One MET is the metabolic rate, or rate of energy expenditure, of a person at rest (Ainsworth et al., 1993). The ratio of the energy expended performing a task (walking, reading, etc.) to that expended in a resting state is the metabolic equivalent of the task, so the measurement is dimensionless. Ainsworth et al. (1993) defined MET values for a range of activities, some of which are reproduced in Table 3.1.
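The two Chen and Sun (1997) models of Section 3.2.2.3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, H(k) and V(k) are computed per sample from accelerations already converted to g, ‘weight’ and ‘mass’ are both taken to be body mass in kilograms, and gender is coded 1 (male) or 2 (female) as in Equation (3.27).

```python
import math


def horizontal_vertical(ax, ay, az):
    """Equations (3.17) and (3.19): horizontal and vertical components."""
    return math.hypot(ax, ay), abs(az)


def chen_linear(h, v, mass_kg, height_cm, age_years):
    """Equation (3.20) with the subject-specific parameters of (3.21) and (3.22)."""
    a_l = (5.76 * mass_kg + 11.95 * height_cm + 6.89 * age_years - 2001) / 1000
    b_l = (5.96 * mass_kg + 349.5) / 1000
    return a_l * h + b_l * v


def chen_nonlinear(h, v, mass_kg, gender):
    """Equation (3.23) with the parameters of (3.24) to (3.27); gender: 1 male, 2 female."""
    p1 = (2.66 * mass_kg + 146.72) / 1000
    p2 = (-3.85 * mass_kg + 968.28) / 1000
    a_n = (12.81 * mass_kg + 843.22) / 1000
    b_n = (38.90 * mass_kg - 682.44 * gender + 692.44) / 1000
    return a_n * h ** p1 + b_n * v ** p2


# One sample from a hypothetical 80 kg, 180 cm, 25-year-old male.
h, v = horizontal_vertical(0.3, 0.4, -1.1)
print(chen_linear(h, v, 80.0, 180.0, 25.0), chen_nonlinear(h, v, 80.0, 1))
```

Because every coefficient depends on the subject's own characteristics, the same acceleration sample maps to different energy estimates for different subjects, which is the specialisation the text describes.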
Recently, a number of corrections have been made to the MET values of Ainsworth et al. (1993) (Ainsworth et al., 2000, 2011), and the corrected MET values for normal- and over-weight persons are also shown in Table 3.1. The correction factor between the original and revised MET values is given by Equation (3.28). It is a function of the Harris-Benedict resting metabolic rate (RMR), which is calculated with Equations (3.29) and (3.30) for male and female subjects respectively.

Table 3.1: MET activity and corrected MET activity values.

| Activity | Original | Corrected, female <77 kg | Corrected, female ≥77 kg | Corrected, male <91 kg | Corrected, male ≥91 kg | Rating |
|---|---|---|---|---|---|---|
| Rope jumping | 12.3 | 13.5 | 16.5 | 12.9 | 15.4 | Vigorous |
| Running | 9.8 | 10.7 | 13.1 | 10.3 | 12.3 | Vigorous |
| Bicycling | 7.5 | 8.2 | 10.0 | 7.9 | 9.4 | Vigorous |
| Pushing stroller | 4.0 | 4.4 | 5.4 | 4.2 | 5.0 | Moderate |
| Calisthenics | 3.5 | 3.8 | 4.7 | 3.7 | 4.4 | Moderate |
| Shopping | 2.3 | 2.5 | 3.1 | 2.4 | 2.9 | Light |
| Watching TV | 1.3 | 1.4 | 1.7 | 1.4 | 1.6 | Light |

$$\text{Corrected MET value} = \text{MET} \times \frac{3.5}{\mathrm{RMR}} \tag{3.28}$$

$$\mathrm{RMR}_m = 5.0 \times \text{Height (cm)} + 13.7 \times \text{Weight (kg)} - 6.8 \times \text{Age (years)} + 66.5 \tag{3.29}$$

$$\mathrm{RMR}_f = 1.8 \times \text{Height (cm)} + 9.6 \times \text{Weight (kg)} - 4.7 \times \text{Age (years)} + 655.1 \tag{3.30}$$

The algorithm of Crouter et al. (2006) is a function of the acceleration counts calculated over the preceding minute and of the coefficient of variation (CV) of the counts over the preceding ten seconds. Models were generated to predict the energy expenditure under these conditions, and Listing 3.1 details the process (Crouter et al., 2006). The algorithm selects different regression equations depending on the activity that was recorded over the previous time windows.

Listing 3.1: Crouter et al.'s energy expenditure estimation algorithm.

```c
#include <math.h>

/* Listing 3.1: Crouter et al. (2006) energy expenditure in METs.
 * counts_min: acceleration counts over the preceding minute.
 * cv_10s:     coefficient of variation of the 10-second counts. */
double crouter_ee_met(double counts_min, double cv_10s)
{
    if (counts_min <= 50.0) {
        return 1.0;  /* little or no activity: resting value of 1 MET */
    }

    if (cv_10s <= 10.0) {
        /* low variability between 10-second counts: exponential model */
        return 2.379833 * exp(0.00013529 * counts_min);
    }

    /* higher variability: cubic polynomial in the counts per minute */
    return 2.330519
           + 0.001646 * counts_min
           - 1.2017e-7 * counts_min * counts_min
           + 3.3779e-12 * counts_min * counts_min * counts_min;
}
```

In this respect the algorithm is distinctive within the energy expenditure field, as its formulation recognises that it is beneficial to generate multiple regression equations for different operating points. It was tested over a large set of light to vigorous activities (including those listed in Table 3.1), and the authors stated that it was more accurate than the algorithms of Freedson et al. (1998), Swartz et al. (2000) and Hendelman et al. (2000).

3.3 Energy expenditure validation

The authors of the algorithms discussed above state that they perform well. However, these algorithms must be replicated and validated here, because no true energy expenditure data are available when they are employed to assess OFCs. The accelerometer-based estimates are therefore the only energy expenditure metrics available from those recordings, so both the algorithms and their implementation must be validated.

In this section, the validation experiment is discussed. Acceleration and $EE_{\mathrm{true}}$ were recorded on a small set of subjects (n = 5), the $\widehat{EE}$ estimation algorithms were employed on the acceleration data, and the results were compared to the $EE_{\mathrm{true}}$ data. The physical characteristics of each subject were recorded and are presented in Table 3.2.

Table 3.2: Tabulation of the physical characteristics of the subjects who participated in the accelerometer-based energy expenditure validation test.

| ID | Age (years) | Height (m) | Weight (kg) |
|---|---|---|---|
| 1 | 23 | 1.75 | 77.5 |
| 2 | 25 | 1.83 | 81.1 |
| 3 | 22 | 1.87 | 86.7 |
| 4 | 24 | 1.79 | 87 |
| 5 | 30 | 1.70 | 88 |
| µ ± σ | 24.8 ± 3.1 | 1.79 ± 0.07 | 84.2 ± 4.45 |

3.3.1 Experimental setup

A treadmill (Powerjog GX100) was set up to operate from 3 to 7 km/h in 1 km/h increments. The treadmill ran at each speed for four minutes at a gradient of 1%, which has previously been stated to emulate walking on a flat surface (Jones and Doust, 1996). The recruited participants walked on the treadmill while $EE_{\mathrm{true}}$ was recorded by a cardiopulmonary exercise testing (CPET) indirect calorimeter (Cosmet, 2013). The CPET was calibrated before each recording. The subject wore a gas mask, which was secured around the nose and mouth. Care was taken to ensure that the volunteers were comfortable with the mask and that the equipment introduced no difficulty in breathing. A hose leading from the mask was attached directly to the gas analysers in the CPET calorimeter. To ensure that no gas entered or exited the seal around the subject's face, the subject was asked to momentarily block the outlet of the gas mask; if they were unable to exhale during this brief time, the mask was deemed securely fastened and the test continued.

The time for which the subject walked at each speed was set at four minutes on the basis of work published by Chatagnon and Busso (2006) and Winter et al. (2006). Chatagnon and Busso (2006) give between 2 and 3 minutes, depending on the fitness of the individual, as the upper bound on the time for which an exercise must be performed before the body reaches a steady metabolic state. Winter et al. (2006) state that metabolic tests of this nature should not exceed 20 minutes, because the metabolic response becomes unreliable after this time: as physical exertion is prolonged, other physiological attributes begin to affect the subject's metabolic rate. With five treadmill speeds (3, 4, 5, 6 and 7 km/h), four minutes per speed gives a 20-minute test while allowing each stage to exceed the 2 to 3 minutes needed to reach a steady state.

3.3.2 Pre-processing

3.3.2.1 Codeword conversion

The accelerometer on the SHIMMER device produces analogue voltages which the analogue-to-digital converter (ADC) samples at 12 bits of resolution (allowing 4096 possible values). The data obtained by the microprocessor are therefore 12-bit digital codewords, and these codewords must be converted to measurements of acceleration in g. At rest, the magnitude of the acceleration vector measured by the three orthogonal axes of a tri-axial accelerometer is 1 g. The absolute value (magnitude) of acceleration is obtained in the discrete domain by

$$a_{\mathrm{abs}}(k) = \sqrt{a_x(k)^2 + a_y(k)^2 + a_z(k)^2}. \tag{3.31}$$

Figure 3.2a shows the raw codewords ($c_x$, $c_y$ and $c_z$) recorded by the microprocessor when the SHIMMER device was placed on each of its six faces for equal periods of time. These data were spliced so that each face of the SHIMMER device provides an equal portion of the data, which ensures that no axis dominates the calibration process. The signals associated with movements between faces were removed.

Figure 3.2: Demonstration of the conversion process from the raw digital codewords obtained from the accelerometer (a) to a measurement of absolute gravity (b). Panels: (a) accelerometer codewords on the X, Y and Z axes against time; (b) combination of raw accelerometer data obtaining the absolute acceleration (g) against time (s).

On a perfectly flat and still surface, when the SHIMMER device rests on a given face, the axis experiencing gravity will read either 1 g or −1 g (depending on the orientation of the sensor), and the remaining axes should read 0 g. To convert from codewords to gravitational units, each codeword must have its 0 g offset ($c_{x,0g}$, $c_{y,0g}$ and $c_{z,0g}$) removed and must then be divided by the codeword span corresponding to 1 g ($c_{x,1g}$, $c_{y,1g}$ and $c_{z,1g}$). The absolute acceleration in g is then computed from these values by

$$a_{\mathrm{abs}}(k) = \sqrt{\left(\frac{c_x(k) - c_{x,0g}}{c_{x,1g}}\right)^2 + \left(\frac{c_y(k) - c_{y,0g}}{c_{y,1g}}\right)^2 + \left(\frac{c_z(k) - c_{z,0g}}{c_{z,1g}}\right)^2}. \tag{3.32}$$

The optimal conversion values, which transform raw readings to gravitational units, can be found by searching for $c_{0g}$ and $c_{1g}$ in the range [0, 4095] for each acceleration axis. The values selected are those which minimise the standard deviation of the resulting vector of $a_{\mathrm{abs}}$. This results in a search space of $4096^3$ possible combinations for each set of conversion values, so a brute-force search is not feasible. Therefore, a binary search algorithm was implemented to select these optimal values efficiently, with guaranteed convergence in 12 iterations ($\log_2 4096 = 12$). The search yields the acceleration vector shown in Figure 3.2b, which has an average acceleration of 1 ± 0.0082 g (µ ± σ) with the values shown in Figure 3.2a as the input. Because the search is performed on the absolute value of acceleration, the resulting conversion values are tolerant to small orientation errors. This robustness can be seen in Figure 3.2b: although the orientations of the SHIMMER device were not perfectly orthogonal, the variation present in the raw signals is absent from Figure 3.2b.
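The conversion of Equation (3.32), together with the cost it is judged by (the standard deviation of $a_{\mathrm{abs}}$), can be sketched in Python as below. The binary search used in this work is not reproduced; instead, as a plainly different and simpler technique, the sketch estimates each axis's offset and scale from the mean codewords recorded on the six faces (offset as the midpoint of the +1 g and −1 g readings, scale as half their difference). It assumes $c_{1g}$ is the codeword change per 1 g, as Equation (3.32) requires, and all function names and face values are hypothetical.

```python
import math
from statistics import pstdev


def codewords_to_g(sample, offsets, scales):
    """Equation (3.32) for one (c_x, c_y, c_z) sample: absolute acceleration in g."""
    return math.sqrt(sum(((c - c0) / c1) ** 2
                         for c, c0, c1 in zip(sample, offsets, scales)))


def six_face_calibration(face_means):
    """Per axis: offset = midpoint of extreme readings, scale = half their span."""
    offsets, scales = [], []
    for axis in range(3):
        readings = [face[axis] for face in face_means]
        high, low = max(readings), min(readings)
        offsets.append((high + low) / 2)  # c_0g: codeword at 0 g
        scales.append((high - low) / 2)   # c_1g: codewords per 1 g
    return offsets, scales


def conversion_spread(samples, offsets, scales):
    """Standard deviation of a_abs; zero when every resting sample maps to 1 g."""
    return pstdev([codewords_to_g(s, offsets, scales) for s in samples])


# Hypothetical face means (c_x, c_y, c_z) for the six resting orientations.
faces = [(2630, 2050, 2080), (1390, 2050, 2080), (2010, 2650, 2080),
         (2010, 1450, 2080), (2010, 2050, 2690), (2010, 2050, 1470)]
offsets, scales = six_face_calibration(faces)
print(offsets, scales, conversion_spread(faces, offsets, scales))
# -> [2010.0, 2050.0, 2080.0] [620.0, 600.0, 610.0] 0.0
```

The closed-form estimate needs the device to rest squarely on each face, whereas a search on the spread of $a_{\mathrm{abs}}$, as described above, tolerates small orientation errors.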
3.3.2.2 Breath and acceleration synchronisation

The $EE_{\mathrm{true}}$ levels captured by the CPET were recorded on a breath-by-breath basis, and the time at which each energy expenditure value is logged is rounded to the nearest second by the hardware. The breathing rate of a human is non-periodic and non-stationary, particularly when the subject is exercising (Chon et al., 2009). Therefore, the $EE_{\mathrm{true}}$ recordings and the computed acceleration values are not sampled at a common rate, and the two datasets must be synchronised in time to allow comparisons between $EE_{\mathrm{true}}$ and $\widehat{EE}$.

The selected synchronisation method averaged the $EE_{\mathrm{true}}$ values which occurred within the 10 seconds preceding the current $EE_{\mathrm{true}}$ value. The 10-second window was chosen because, at the minimum respiratory rate for healthy adults, a breath typically takes 10 seconds (Lindh et al., 2009), and the respiratory rate increases with exercise. A 10-second window therefore allows for minimum latency and phase difference between the $EE_{\mathrm{true}}$ values obtained, while accommodating the full range of respiratory rates.

Consider, for example, the (unlikely) situation where a subject does not breathe for 20 seconds. With the time-window averaging algorithm, no $EE_{\mathrm{true}}$ values will be logged over this time. This is an intuitive consequence of the absence of breathing: without respiratory effort, the CPET will not log any new energy expenditure data, and the averaging method follows this behaviour. When the subject next breathes after holding their breath, a higher concentration of CO2 will be exhaled and a higher volume of O2 will be inhaled; this is recorded by the CPET and will yield a higher $EE_{\mathrm{true}}$. With the time-window averaging method, the dynamics of the $EE_{\mathrm{true}}$ values are accurately reflected in the synchronised array, whereas with other methods, for example cubic spline interpolation, the true dynamics will not be followed.
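The time-window averaging described above can be sketched in Python as follows. The sketch assumes breath timestamps in seconds and evaluates the 10-second window at arbitrary query times (for example, each breath time or a hypothetical one-second grid aligned with the acceleration data); a window containing no breaths returns None, mirroring the breath-hold example. The function name is hypothetical.

```python
def window_average(breath_times, breath_values, query_times, window=10.0):
    """Average the breath-by-breath EE_true values logged in (t - window, t]."""
    synchronised = []
    for t in query_times:
        in_window = [value for time, value in zip(breath_times, breath_values)
                     if t - window < time <= t]
        # No breaths in the window: no EE_true value is produced, as in a breath hold.
        synchronised.append(sum(in_window) / len(in_window) if in_window else None)
    return synchronised


# Breaths logged at 1, 4, 9, 12 and 30 s (a breath hold between 12 s and 30 s).
print(window_average([1, 4, 9, 12, 30], [2.0, 4.0, 6.0, 8.0, 10.0], [10, 15, 25, 30]))
# -> [4.0, 7.0, None, 10.0]
```

Unlike spline interpolation, the averaging never manufactures values between breaths that were not measured, and the elevated value logged after the pause passes through unchanged.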
3.3.2.3 Normalisation

Crouter's algorithm regresses to METs, which were described previously. As neither Chen nor Bouten regressed to this unit, it was necessary to scale the reference and the regressed energy values to a common standard so that all algorithms can be compared on the same basis. This was accomplished by

$$\hat{f} = \frac{f - \min(f)}{\max\left(f - \min(f)\right)}, \tag{3.33}$$

where f is the signal being normalised to $\hat{f}$, and the min and max functions select the minimum and maximum values of the data series respectively. Equation (3.33) is applied to both the $EE_{\mathrm{true}}$ and $\widehat{EE}$ signals and scales their range to between 0 and 1.

3.3.3 Performance evaluation

The root mean square error (RMSE) can be used to quantify the difference between $EE_{\mathrm{true}}$ and $\widehat{EE}$ mathematically, and it is defined by

$$\mathrm{RMSE} = \sqrt{\operatorname{mean}\left(\left(\frac{\widehat{EE} - EE_{\mathrm{true}}}{EE_{\mathrm{true}}}\right)^2\right)}, \tag{3.34}$$

where mean computes the average of the data array. The RMSE can be converted into the percentage root mean square difference (PRD), which expresses the difference between the two signals as a percentage, by

$$\mathrm{PRD} = \mathrm{RMSE} \times 100\%. \tag{3.35}$$

A PRD of 0 indicates that the arrays are identical, and increasing PRD values indicate that the two arrays are becoming more dissimilar.

Table 3.3: PRD values computed between the true energy expenditure and the estimated energy expenditure obtained from each of the algorithms investigated.

| Algorithm | PRD µ (%) | PRD σ (%) |
|---|---|---|
| Bouten | 7.54 | 1.93 |
| Chen linear | 8.07 | 1.32 |
| Chen non-linear | 6.57 | 0.33 |
| Crouter | 13.50 | 1.41 |

3.3.4 Results

Table 3.3 tabulates the PRD results obtained for each energy expenditure estimation algorithm, in the order in which the algorithms were presented earlier. Chen's non-linear algorithm performed best of all the algorithms investigated, yielding an average PRD of approximately 6.6% over all subjects, while also obtaining the lowest standard deviation of PRD.
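Equations (3.33) to (3.35) translate directly into the Python sketch below (function names hypothetical). One practical caveat follows from the definitions themselves: Equation (3.34) divides by $EE_{\mathrm{true}}$, which is zero at the minimum of a signal normalised by Equation (3.33), so the sketch rejects zero reference samples rather than guessing how such samples should be handled.

```python
import math


def min_max_normalise(f):
    """Equation (3.33): shift to a minimum of 0 and scale to a maximum of 1."""
    low = min(f)
    span = max(x - low for x in f)
    return [(x - low) / span for x in f]


def prd(estimate, reference):
    """Equations (3.34) and (3.35): relative RMSE between the arrays, as a percentage."""
    if any(r == 0 for r in reference):
        raise ValueError("Equation (3.34) is undefined where EE_true is zero")
    relative_sq = [((e - r) / r) ** 2 for e, r in zip(estimate, reference)]
    rmse = math.sqrt(sum(relative_sq) / len(relative_sq))
    return rmse * 100.0


print(min_max_normalise([2.0, 4.0, 6.0]))  # -> [0.0, 0.5, 1.0]
print(prd([1.1, 1.9], [1.0, 2.0]))
```

Because the error is relative to $EE_{\mathrm{true}}$, a given absolute deviation counts more heavily at low energy expenditure than at high energy expenditure.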
Figure 3.3c shows an example where $EE_{\mathrm{true}}$ and $\widehat{EE}$ are overlaid. It can be seen that the estimated values follow the trend of the changes in $EE_{\mathrm{true}}$. It is not entirely surprising that Chen's linear and non-linear algorithms performed well, because they were designed to specialise automatically for persons of a known weight, gender, height, etc., and because they were generated with reference to the largest amount of data of all the algorithms considered. That the non-linear algorithm performed best suggests that the relationship between acceleration and energy expenditure, averaged over all participants, is not linear. However, the differences are small (mean PRDs of 6.57% and 8.07% for the non-linear and linear variants respectively).

Figure 3.3: The $\widehat{EE}$ and $EE_{\mathrm{true}}$ values obtained for participant 2, overlaid (y-axis: normalised EE). Panels: (a) Bouten; (b) Chen (linear); (c) Chen (non-linear); (d) Crouter.

Crouter's algorithm (Figure 3.3d) consistently overestimated the energy expenditure, which contributed heavily to it performing worst of the algorithms investigated, with a mean PRD of 13.50%. However, its energy trace was smooth and did follow the trend of the metabolic response. The smoothness of this trace arises because the algorithm considers not only recent data but also data from the preceding minute, which introduces low-pass filtering behaviour.

3.3.5 Discussion on energy expenditure estimation algorithms

It is very clear from the PRD values and figures presented that accelerometers are well-suited instruments for estimating energy expenditure in the situations investigated here.
The excellent estimates of energy expenditure provided by Chen's algorithms strongly indicate that subject-tailored regression is one of the most important factors for accurate regression to energy expenditure values. Chen's non-linear algorithm resulted in the lowest overall PRD, which supports the argument that accelerometry and energy expenditure are non-linearly related. The algorithms of Bouten and Crouter do not target the individual but were instead developed to accommodate the standard user. This gives these algorithms a convenient ‘plug and play’ quality (i.e. no setup or customisation is required for their use), but at the cost of accuracy, as can be seen from the high PRD values for Crouter's algorithm in Table 3.3.

The reported accuracy of all the algorithms depends on the reference against which their results are compared. As the reference energy expenditure values are not sampled periodically and their time stamps are rounded to the nearest second, the original reference data were synchronised to align them with the values estimated by the $\widehat{EE}$ algorithms. If the synchronisation algorithm is not chosen carefully, this process can introduce undesirable artefacts into the reference data. The method employed here averaged the breath data over a time window, which achieves a low-pass filtering effect and improves the reliability of the synchronised $EE_{\mathrm{true}}$. Indeed, the reference energy levels can themselves be corrupted before any analysis is performed, for example if the subject yawns, coughs or talks on the treadmill. Participants were asked to refrain from doing so, but it is neither possible nor reasonable to stop a subject from coughing during these recordings if they need to.
These artefacts will accentuate, exaggerate or otherwise affect the accuracy of the reference values and the times at which they are measured. As synchronisation achieves low-pass filtering, these artefacts will be reduced in the $EE_{\mathrm{true}}$ reference data.

3.3.6 Conclusion on energy expenditure estimation

This study shows that accelerometer-based energy expenditure estimation algorithms can achieve very accurate estimates of true energy expenditure. The SHIMMER device was also found to be an ideal platform with which to measure the acceleration data. As the results obtained here were very good and followed the dynamics of the changes in $EE_{\mathrm{true}}$, the analysis of accelerometer-based metrics for the classification of allergy during oral food challenges is discussed in the next section.

3.4 Accelerometer-based analysis during OFCs

For the activity- and energy-expenditure-based assessment of OFCs, the SHIMMER device was applied to the torso and dominant wrist before the test began, and streamed tri-axial accelerometer data to a nearby computer. The previously validated activity analysis and energy expenditure estimation algorithms were then applied to these new data.

Figure 3.4: Example histogram and probability density function of a feature (x-axis: feature value; y-axis: probability).

3.5 Probability density functions

Figure 3.4 shows a sample histogram (blocks) and the associated probability density function (PDF) (solid curve) for a randomly generated normal distribution with zero mean and unit standard deviation. In general, the histograms are normalised so that the area under the curve is 1 (i.e. 100%). This diagram is indicative of what the distribution of accelerometer metrics might look like.
The probability density at an arbitrary x-value (within a finite range) can be determined visually by projecting vertically upwards from the selected point until the PDF curve is intersected; the y-value at the intersection gives the probability density of the feature at that value. This is illustrated by the dotted arrow in Figure 3.4 for a feature value of x = 1.5. In all cases in this thesis, kernel density estimation (KDE) (Sheather and Jones, 1991) was employed to compute the histograms. KDE produces ‘smooth’ histograms that can be more representative of the true distribution of the process than centred histogramming (Sheather and Jones, 1991).

This example shows the overall distribution of a feature. By computing PDFs on two subsets of the data (for example, first the allergic and then the non-allergic subjects), the characteristic differences between the two classes can be visualised. Figure 3.5 shows how the PDFs of the two classes might differ.

Figure 3.5: Illustration of the differences between PDFs that describe two separate classes of data (PDF Class 1 and PDF Class 2; x-axis: feature value; y-axis: probability).

Two PDFs are shown in this figure, representing the allergic and non-allergic classes, with means centred at +1 and −1. A large difference between the means of the two curves indicates that the feature might be a suitable candidate for separating allergic and non-allergic subjects, because a smaller overlap implies a smaller probabilistic uncertainty about the class of the data (i.e. does the data originate from an allergic or a non-allergic subject?). In contrast, if a large overlap exists between the PDFs, the metric might not (by itself) facilitate good class separation.
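A minimal Gaussian kernel density estimate, of the kind used to draw the PDFs in this chapter, is sketched below in plain Python. It is not the exact procedure of this thesis: the bandwidth here comes from the normal-reference rule of thumb (1.06 σ n^(−1/5)) rather than the Sheather and Jones (1991) selector, and the two classes are hypothetical samples centred at +1 and −1 to mirror Figure 3.5.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a function evaluating a Gaussian kernel density estimate."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not Sheather-Jones).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


class_1 = [0.0, 0.5, 1.0, 1.5, 2.0]      # hypothetical feature values, mean +1
class_2 = [-2.0, -1.5, -1.0, -0.5, 0.0]  # hypothetical feature values, mean -1
pdf_1, pdf_2 = gaussian_kde(class_1), gaussian_kde(class_2)
for x in (-0.5, 0.5):
    print(x, round(pdf_1(x), 3), round(pdf_2(x), 3))
```

At x = −0.5 the class centred at −1 has the higher density and at x = 0.5 the order reverses, which is the behaviour described for Figure 3.5; a larger bandwidth would smooth both curves and increase their apparent overlap.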
In Figure 3.5, the probability densities of the two classes are marked at x = −0.5, where Class 2 has a higher probability density than Class 1. At x = 0.5, however, the relationship between the two classes is reversed.

3.6 Results

Section 3.2.2 presented the regression equations which were employed when computing activity and $\widehat{EE}$ for the algorithms being considered, and IAA was used by all of these. It can be seen that all of these equations consider $IAA_x$, $IAA_y$ and $IAA_z$, and Bouten and Crouter's algorithms also consider $IAA_t$. PDFs were generated to show the expected values of these metrics during OFCs, and Figure 3.6 shows the set of histograms of the IAA metrics that was obtained.

Figure 3.6: Histograms plotting the normalised IAA values of the allergic and non-allergic subjects who were investigated. Panels: (a) normalised $IAA_x$; (b) normalised $IAA_y$; (c) normalised $IAA_z$; (d) normalised $IAA_t$; each panel compares non-allergic and allergic subjects (y-axis: probability).

In every case, a very strong overlap can be seen between the allergic and non-allergic PDFs. For these metrics to be employed for classification, separation between the allergic and non-allergic curves must be apparent, but these PDFs show very similar distributions. The similarity between the allergic and non-allergic classes is also reflected in the computed $\widehat{EE}$ values, which likewise offered no distinguishable difference between the classes, in a similar manner to the IAA metrics shown in Figure 3.6.
Therefore, as the PDFs shown in Figures 3.6a to 3.6d present as overlapping normal distributions, and because linear functions of normally distributed variables are themselves normally distributed (Leon-Garcia and Leon-Garcia, 2009), it is unsurprising that the $\widehat{EE}$ estimates offered no better separability than the IAA metrics.

It should be stated that there will be differences between the acceleration obtained from adults and from children. The algorithms employed were tested on adults and worked well, but worked poorly when applied to children. However, the PDFs displayed in Figure 3.6 show activity-based metrics, which are the basis of the activity and energy expenditure estimation equations. No separability is obtained with these data, and the same occurs with the energy-expenditure-based algorithms. Therefore, it is believed that the poor performance is not due to age-based discrepancies but to the inadequacy of the accelerometer-based approach for allergy classification.

3.7 Conclusion

The PDFs did not, in any case, present a mean shift or any exploitable anomaly between the allergic and non-allergic classes. This indicates that the subjects cannot be separated by activity analysis using the methods applied here, and therefore classification of allergy cannot be achieved by these means. Accelerometer-based analysis is thus not an appropriate means of classifying allergy, and will not be employed in the remainder of this thesis.

CHAPTER 4 ECG-based analysis of OFCs

4.1 Introduction

Chapter 3 discussed the use of accelerometer-based activity and energy expenditure metrics for the classification of oral food challenges. It was shown that the analysis of these metrics achieved very poor separability between the allergic and non-allergic classes, so they are insufficient measures for the classification of allergy. During the oral food challenges, the SHIMMER device also recorded the ECG of the subjects who underwent these challenges.
While the allergists who conduct the OFCs have observed a tendency for subjects' heart rate to change before the onset of an allergic reaction, the effect of allergy on the heart has not been definitively quantified; this chapter investigates it.

Figure 4.1: Einthoven triangle configuration for ECG electrode placement (University of Nottingham, 2013).

4.2 ECG and HRV

4.2.1 ECG recording

As 12-lead ECG is generally only performed for the diagnosis of cardiac disease and for stress tests, 3-lead ECG is recorded for general heart monitoring (e.g. on a hospital ward). This employs the limb electrodes, arranged in the Einthoven triangle configuration (Wilson et al., 1946) shown in Figure 4.1 (University of Nottingham, 2013). Figure 2.4 shows the P-, Q-, R-, S- and T-waves which characterise the ECG (Dublin Institute of Technology, 2013). While all of the waves can yield diagnostically relevant information, the QRS complex is the principal feature of the ECG used to identify heart beats. The intervals between successive R-peaks (R-R intervals, as shown in Figure 2.4) are employed to describe the heart rate variability mathematically.

Figure 4.2: Illustration of the relationship between the ECG and the epoch length (epoch span) for ECG recorded in an OFC.

4.3 HRV feature extraction

4.3.1 Overview

By considering HRV features extracted from allergic and non-allergic subjects independently, the characteristic differences between the two classes can be assessed, which can provide meaningful descriptors for distinguishing allergic from non-allergic data. In this chapter these differences are analysed and quantified to determine whether HRV-based classification of allergy is worthy of investigation.

4.3.2 Epochs

HRV feature extraction is performed based on the times of QRS complexes found within time windows (known as epochs) of ECG data.
This is illustrated in Figure 4.2, where the QRS points found within the shaded region are used for feature extraction for that epoch.

Longer epochs naturally contain a greater number of QRS points. These measure longer-term variation of the HRV, whereas shorter epochs capture shorter-term characteristics of the heart. Long- and short-term epochs both provide diagnostically interesting measurements, and both are considered in this work.

Various epoch lengths were investigated for this research. The European Society of Cardiology and the North American Society of Pacing and Electrophysiology stated that epochs of between 120 and 300 seconds should be considered when extracting HRV features from adults (van Ravenswaaij-Arts et al., 1993). The subjects recorded during OFCs are children, some of whom have resting heart rates (HRs) exceeding 160 beats per minute (BPM) (Giddens and Kitney, 1985), which is approximately twice the heart rate of the average resting adult; a 60-second epoch in such a child may therefore contain roughly as many beats as a 120-second epoch in an adult. For this reason, 60-second epochs are also considered in this work, and the full set of epoch lengths investigated is {60, 120, 180, 300} seconds. This set was chosen because the effect of allergy on HRV features has not been characterised, and analysing all of these epoch lengths establishes whether signatures of allergy are better obtained with longer- or shorter-duration epochs.

4.3.3 Epoch overlap
Figure 4.3 illustrates the relationship between the ECG, the epoch length and the epoch overlap. The epochs in this illustration are 9 seconds in length with one-second increments; this is for illustrative purposes only, as this epoch length is insufficient for meaningful feature extraction (van Ravenswaaij-Arts et al., 1993).
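The epoch windowing of Section 4.3.2 can be sketched in C as follows. This is a minimal illustration rather than the implementation used in this work. It assumes that the annotated QRS times of a recording are held in a sorted array of seconds, and that an R-R interval belongs to an epoch when both of its QRS points fall in the half-open window [start, start + length). The function name epoch_rr is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: collect the R-R intervals of one epoch.
 *   qrs:    sorted QRS annotation times (s)
 *   n_qrs:  number of QRS annotations
 *   start:  epoch start time (s)
 *   length: epoch length (s), e.g. 60, 120, 180 or 300
 *   rr:     output buffer with room for at least n_qrs - 1 values
 * Returns the number of R-R intervals written to rr. */
size_t epoch_rr(const double *qrs, size_t n_qrs,
                double start, double length, double *rr)
{
    double end = start + length;
    size_t count = 0;
    for (size_t i = 1; i < n_qrs; i++) {
        /* keep an interval only if both of its QRS points lie in [start, end) */
        if (qrs[i - 1] >= start && qrs[i] < end)
            rr[count++] = qrs[i] - qrs[i - 1];
    }
    return count;
}
```

Sliding the window in one-second steps, as described for the epoch overlap, amounts to calling epoch_rr with start values of t0, t0 + 1, t0 + 2 and so on.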
Feature extraction performed on the QRS points found within the bounds of Epoch 1 quantifies the behaviour of the heart between 1 and 10 seconds, while features extracted from the QRS points found within the boundaries of Epoch 2 characterise the heart rhythm between 2 and 11 seconds. One-second increments between epochs were used in this work, which represents approximately 98% overlap for 60-second epochs (consecutive epochs share 59 of their 60 seconds). This was selected in order to increase the number of data points available from each OFC. Subject 1 presented with the shortest challenge, which lasted approximately 3 minutes after the first dose of the allergen was administered.

Figure 4.3: Illustration of the relationship between the ECG, the epoch length and the epoch overlap (Epoch 1 and Epoch 2 shown).

Without any epoch overlap, only three feature vectors (one per 60-second epoch) would be extracted for this subject after the administration of the first dose. With one-second epoch increments, however, 180 points describe this period. Subject 19 presented with the longest OFC recording. Without epoch overlap, this entire challenge would be described by approximately 130 data points, but with the overlap approximately 9000 points are used. Indeed, because the OFCs are short and vary in length, classification algorithms would not have sufficient data with which to perform classification without epoch overlap (Duda et al., 1995; Bishop et al., 2006; Cherkassky and Mulier, 2007; Catal and Diri, 2009). Therefore, one-second increments are employed both for the distribution analysis in this chapter and for the classification in later chapters.

4.4 Feature normalisation/calibration
Chapter 2 presented Table 2.1, which tabulated the subjects investigated in this study. From this table, it can be seen that the ages of the subjects varied from 7 months to 10 years. Infants under one year of age typically have a HR of ≈ 120 BPM (with a range of 80 to 160 BPM), while a ten-year-old child typically has a HR of ≈ 90 BPM (with a range of 70 to 110 BPM) (O'Brien et al., 1986; Tanaka et al., 2001; Kliegman et al., 2007; Aziz et al., 2012). Because of these age-related differences in the baseline resting heart rate, it is not appropriate to compare the features extracted from different subjects directly. Therefore, a calibration procedure is performed in order to allow indirect but valid comparisons between the features extracted from different subjects.

Typically, feature data is normalised by subtracting the mean of the data and dividing by its standard deviation. For OFCs this is not appropriate, because this process would force the features obtained from the allergic and non-allergic subjects to have similar statistical properties, and could reduce, and possibly eliminate, any characteristic signatures of allergy in the HRV. Instead, the calibration process employed here computes the mean and standard deviation of the features before the problem foods were administered to the subjects, and the entire recording is then normalised by these values. This guarantees that the features are normalised by non-allergic HRV data, as no allergen has yet been consumed by the subjects. Therefore, if features deviate from this distribution as a result of allergy, this will be observed in the PDFs generated for the allergic subjects (see Chapter 3). Likewise, for non-allergic subjects, the features obtained for the remainder of the challenge should not change. The time before the first dose of the allergen is administered to the subjects is henceforth termed the 'background' or 'baseline' region, and it is guaranteed to present non-allergic HRV features.
To define this period concretely, it is the time after the skin tests have concluded but before the first dose of the allergen is administered. Normalisation is then achieved by

$$\hat{f} = \frac{f - \mu_b}{\sigma_b} \qquad (4.1)$$

where $f$ is the non-normalised feature vector, $\hat{f}$ is the normalised feature, and $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the background segment of feature $f$. All recorded background segments are approximately ten minutes in length, and all PDFs presented in the following sections are computed from normalised features.

While the normalised background data has zero mean and unit variance, non-background data will not preserve these traits. OFCs induce stress in the recruited subjects, and consequently the HRV features tend to deviate from the background baseline. As a result, the PDFs presented later in this chapter show significant probability mass at up to approximately three times the usual range of a normal distribution (i.e. far beyond $5\sigma$). This is because the x-axis is expressed in units of $\sigma_b$, the background standard deviation, rather than the overall standard deviation, and larger multiples of $\sigma_b$ indicate features that are less similar to the background.

4.5 HRV feature categories
4.5.1 Feature categories
HRV features extracted over an epoch are employed to measure the characteristics of the distribution of R-R intervals within that epoch. Therefore, much of the information extracted from the data relates to the statistical properties of such distributions, and time-domain features such as the mean and standard deviation are extracted. However, other feature categories are also extracted.
For example, sequential domain features measure the relative ‘acceleration’ and ‘deceleration’ that the heart rate has experienced over an epoch, whereas Poincaré features are employed to assess the nonlinear dynamics of the heart rate and have been shown to be closely related to the sympathetic indices of the autonomic nervous system (ANS). These features are also used in many applications in other fields (Cogdell and Piatetski-Shapiro, 1990). Frequency-domain features, which have also been correlated with the ANS, can be extracted as well; frequency-domain analyses are popular methodologies in many engineering and scientific fields.

4.5.2 Frequency domain feature analysis
Care must be taken when computing the frequency spectrum of the heart rate, because the heart does not beat periodically, so the heart rate series is not uniformly sampled (Moody, 1993; Badilini and Blanche, 1996). The Fourier transform (FT) therefore cannot be applied directly to heart rate data (Moody, 1993; Ebden, 2002; Clifford and Tarassenko, 2005), and the spectrum must instead be estimated. Two popular means of estimating the power spectrum of such irregular data series exist in the HRV literature, and they are discussed below.

4.5.3 Resampling + FFT
With this method, the heart rate series is re-sampled at a fixed frequency to provide a uniformly sampled data series (Clifford and Tarassenko, 2005; Laguna et al., 1998). A variety of re-sampling methods exist in the literature, including nearest-neighbour, linear, cubic spline and piecewise cubic Hermite techniques (Srikanth et al., 1998). Figure 4.4 shows the raw HR signal and the signal re-sampled after cubic spline interpolation. Once the data series has been re-sampled, the FT can be used to compute its power spectrum.
Figure 4.4: Illustration of the raw HR (∗), which is not periodically sampled, and the HR re-sampled to 10 Hz via cubic spline interpolation.

The FT, defined by Equation (4.2), translates a real, continuous time-domain signal, $x(t)$, into a frequency-domain representation, $X(\omega)$.

$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt \qquad (4.2)$$

The discrete Fourier transform (DFT) is used to compute the FT of data segments of finite length $N$ in the discrete domain. The data segments analysed must be periodically sampled at a frequency $f_s$, and the DFT yields powers at the discrete frequencies $\omega_n = 2\pi n f_s / N$, i.e. at multiples of the frequency resolution $f_s/N$.

$$X(\omega_n) = \sum_{k=1}^{N} x(t_k)\, e^{-j\omega_n t_k} \qquad (4.3)$$

The periodogram of the DFT is an estimate of the power spectral density (PSD) of a signal, defined by

$$P_x(\omega_n) = \frac{1}{N}\left|\sum_{k=1}^{N} x(t_k)\, e^{-j\omega_n t_k}\right|^2 \qquad (4.4)$$

$$= \frac{1}{N}\left[\left(\sum_{k=1}^{N} x(t_k)\cos(\omega_n t_k)\right)^2 + \left(\sum_{k=1}^{N} x(t_k)\sin(\omega_n t_k)\right)^2\right]. \qquad (4.5)$$

Complexity analysis (Lewis and Papadimitriou, 1997) shows that computing the DFT from first principles scales quadratically with $N$ (i.e. $O(N^2)$). Today the DFT is rarely computed this way: complexity optimisation methods (such as butterflying, memoisation and look-up tables) can reduce the computational cost of the algorithm dramatically, to $O(N \log N)$ operations, as is the case with the fast Fourier transform (FFT) (Cooley and Tukey, 1965; Frigo and Johnson, 1998).

4.5.4 Direct PSD estimation of HRV
The Lomb periodogram (Lomb, 1976; Flannery et al., 1992) is a least-squares technique which minimises the squared error between a signal, $x(t_k)$, and a reference signal, $s(t_k;\omega)$, directly without resampling.
Whereas FT-based estimation weights the results by time interval, the Lomb periodogram weights the data on a per-point basis (Biala et al., 2010). This is achieved by computing the squared error between a signal and its least-squares estimate:

$$\epsilon = \sum_{k=1}^{N}\left(x(t_k) - s(t_k;\omega)\right)^2, \qquad (4.6)$$

where $\epsilon$ is the squared error value, and $s(t_k;\omega)$ is a reference sinusoid defined by

$$s(t_k;\omega) = a_1\cos(\omega t_k) + a_2\sin(\omega t_k), \qquad (4.7)$$

where $a_1$ and $a_2$ are the amplitudes of the constituent components of the sinusoid. The full error equation is written as follows.

$$\epsilon(a_1,a_2) = \sum_{k=1}^{N}\left(x(t_k) - a_1\cos(\omega t_k) - a_2\sin(\omega t_k)\right)^2 \qquad (4.8)$$

The least-squares algorithm optimises the amplitudes $a_1$ and $a_2$ to achieve the minimum squared error against the reference. This is achieved by computing the partial derivatives of Equation (4.8) with respect to the amplitudes of the reference sinusoids and equating the results to 0, i.e.

$$\frac{\partial\epsilon}{\partial a_1} = -2\sum_{k=1}^{N}\cos(\omega t_k)\left(x(t_k) - a_1\cos(\omega t_k) - a_2\sin(\omega t_k)\right) = 0, \qquad (4.9)$$

$$\frac{\partial\epsilon}{\partial a_2} = -2\sum_{k=1}^{N}\sin(\omega t_k)\left(x(t_k) - a_1\cos(\omega t_k) - a_2\sin(\omega t_k)\right) = 0. \qquad (4.10)$$

It is desirable to have

$$\sum_{k=1}^{N}\cos(\omega t_k)\sin(\omega t_k) = 0 \qquad (4.11)$$

for reasons of computational complexity and orthogonality, as Equations (4.9) and (4.10) can be rewritten in matrix form as

$$\begin{pmatrix}\sum_{k=1}^{N} x(t_k)\cos(\omega t_k)\\[4pt] \sum_{k=1}^{N} x(t_k)\sin(\omega t_k)\end{pmatrix} = \begin{pmatrix}\sum_{k=1}^{N}\cos^2(\omega t_k) & \sum_{k=1}^{N}\cos(\omega t_k)\sin(\omega t_k)\\[4pt] \sum_{k=1}^{N}\cos(\omega t_k)\sin(\omega t_k) & \sum_{k=1}^{N}\sin^2(\omega t_k)\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix}. \qquad (4.12)$$

Therefore, when the condition of Equation (4.11) holds, the matrix in Equation (4.12) becomes diagonal, and $a_1$ and $a_2$ can be solved for independently. To obtain this condition, a time delay factor, $\tau$, is introduced, and $\tau$ must satisfy

$$\sum_{k=1}^{N}\cos(\omega(t_k - \tau))\sin(\omega(t_k - \tau)) = 0. \qquad (4.13)$$

Solving Equation (4.13) for $\tau$, using $\cos\alpha\sin\alpha = \tfrac{1}{2}\sin 2\alpha$, yields

$$\tau = \frac{1}{2\omega}\arctan\left(\frac{\sum_{k=1}^{N}\sin(2\omega t_k)}{\sum_{k=1}^{N}\cos(2\omega t_k)}\right). \qquad (4.14)$$

With the $\tau$ correction applied to all sinusoidal components, Equation (4.12) becomes

$$\begin{pmatrix}\sum_{k=1}^{N} x(t_k)\cos(\omega(t_k-\tau))\\[4pt] \sum_{k=1}^{N} x(t_k)\sin(\omega(t_k-\tau))\end{pmatrix} = \begin{pmatrix}\sum_{k=1}^{N}\cos^2(\omega(t_k-\tau)) & 0\\[4pt] 0 & \sum_{k=1}^{N}\sin^2(\omega(t_k-\tau))\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix}. \qquad (4.15)$$

The optimal amplitudes $a_1$ and $a_2$ can then be computed, and the power at a specific frequency can be estimated by

$$P(\omega) = \frac{1}{2}\left(\sum_{k=1}^{N} x^2(t_k) - \epsilon\right), \qquad (4.16)$$

where $\epsilon$ is the minimum squared error obtained with the optimal amplitudes. The full Lomb periodogram is then written as

$$P(\omega) = \frac{1}{2}\left[\frac{\left(\sum_{k=1}^{N} x(t_k)\cos(\omega(t_k-\tau))\right)^2}{\sum_{k=1}^{N}\cos^2(\omega(t_k-\tau))} + \frac{\left(\sum_{k=1}^{N} x(t_k)\sin(\omega(t_k-\tau))\right)^2}{\sum_{k=1}^{N}\sin^2(\omega(t_k-\tau))}\right]. \qquad (4.17)$$

4.5.5 Comparison of HRV frequency analysis methods
As discussed in Sections 4.5.3 and 4.5.4, there are multiple means by which the PSD of the heart rate can be computed. A number of open-source HRV tools that allow users to choose the PSD estimation method are freely available from software repositories (de Carvalho et al., 2002; Hamilton, 2002; Parvin et al., 2002; McSharry and Clifford, 2004). Investigations into the benefits of the direct and indirect PSD estimates indicate that the Lomb periodogram is preferable to the re-sampling methods (Moody, 1993; Laguna et al., 1998; Chang et al., 2001; Clifford and Tarassenko, 2005). Indeed, McSharry and Clifford (2004) designed an open-source ECG generation model in which the true spectrum of the heart signal is set by the user before the ECG signals are generated.
When the Lomb and resampling estimates were compared against this known underlying PSD, the Lomb periodogram was reported to perform better. For these reasons, the frequency domain features described later were extracted with the Lomb periodogram spectral estimation method.

4.6 Features
The QRS points employed in the generation of the PDFs were manually annotated, so that the PDFs would be representative of the true dynamics of the cardiovascular system of the allergic and non-allergic subjects. Two PDFs are shown in each chart in this section. The PDF relating to the allergic subjects can contain a significant portion of non-allergic HRV features, because it is not known when the allergy begins to affect the ECG. Indeed, determining this is the principal focus of this thesis. In all subsequent PDF charts, the more densely filled (green) curves represent the allergic PDFs and the less densely filled (blue) curves are the PDFs generated for non-allergic subjects.

4.6.1 Notation
For the description of features, all data is considered to be in the discrete domain. A subject's OFC is represented by $M$ feature vectors, with one vector per epoch. A specific epoch is identified by the subscript $j$, and the time differences between successive QRS points within the $j$th epoch are defined as the vector $RR_j$. The $i$th element of $RR_j$ is accessed by $RR_j(i)$. Each epoch contains $N$ QRS points, and the value of $N$ is not necessarily the same for all epochs. The elements of $RR_j$ are in units of seconds.

4.6.2 Time domain features
4.6.2.1 Mean heart rate
The mean heart rate (HR) measures the average heart rate over a given epoch. It is defined by

$$\mathrm{HR} = \frac{60}{\mu}, \qquad (4.18)$$

where $\mu$ is the average time between successive heart beats in a given epoch, defined by Equation (4.19).

$$\mu = \frac{1}{N}\sum_{i=1}^{N} RR_j(i) \qquad (4.19)$$

Figure 4.5: PDF of mean heart rate, generated from allergic and non-allergic subjects.

4.6.2.2 Standard deviation
The standard deviation of the R-R intervals measures the variability of the intervals between the QRS complexes found within a given epoch. It is defined by

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(RR_j(i) - \mu\right)^2}, \qquad (4.20)$$

where $\mu$ is defined by Equation (4.19). A higher standard deviation indicates greater variability of the heart beats within an epoch, and conversely a low standard deviation indicates consistency in the heart rhythm. The PDF of the standard deviation is shown in Figure 4.6.

Figure 4.6: PDF of standard deviation of the heart rate, generated from allergic and non-allergic subjects.

4.6.2.3 Coefficient of variation
The coefficient of variation is a normalised measure of the dispersion of a series of data. It is calculated by dividing the standard deviation of the R-R intervals by their mean, and it is defined by

$$\text{Coefficient of Variation} = \frac{\sigma}{\mu}. \qquad (4.21)$$

Figure 4.7: PDF of coefficient of variation of the heart rate, generated from allergic and non-allergic subjects.

The effect of this normalisation on the PDFs of the mean and standard deviation features (Figures 4.5 and 4.6) is shown in Figure 4.7. The distributions of the standard deviation and coefficient of variation features are similar, but the coefficient of variation appears to provide better separation than $\mu$ and $\sigma$ alone.
Figure 4.8: PDF of RMSSD of the heart rate, generated from allergic and non-allergic subjects.

4.6.2.4 RMSSD
The root mean square of successive differences (RMSSD) feature measures the root mean square (RMS) of the differences between successive R-R intervals. It is defined by Equation (4.22).

$$\mathrm{RMSSD} = \sqrt{\frac{1}{N-2}\sum_{i=2}^{N-1}\left(RR_j(i) - RR_j(i-1)\right)^2} \qquad (4.22)$$

Figure 4.8 shows the PDF of this feature. A large overlap between the allergic and non-allergic distributions can be seen in this figure; however, the probability of the allergic class is greater towards the more positive values of the normalised feature. The distribution of this feature is not normal. It is uncertain why this shape occurs, but as the QRS points were manually annotated there is confidence that this distribution is accurate.

Listing 4.1: Calculation of NNx.

```c
#include <math.h>

// Input parameters
//   RR: pointer to the RR array (R-R intervals in seconds)
//   N:  length of the RR array
//   x:  difference threshold in seconds (e.g. 0.05 for 50 ms)
// Returns
//   nn: the number of successive R-R intervals that differ
//       by more than x in the given epoch
int nn(const double *RR, int N, double x)
{
    int n = 0;
    for (int i = 1; i < N; i++)
        if (fabs(RR[i] - RR[i - 1]) > x)  // magnitude of the successive difference
            n++;
    return n;
}
```

Listing 4.2: Calculation of pNNx.
```c
// Input parameters
//   RR: pointer to the RR array (R-R intervals in seconds)
//   N:  length of the RR array
//   x:  difference threshold in seconds
// Returns
//   pnn: the percentage of successive R-R intervals that differ
//        by more than x in the given epoch (uses nn() from Listing 4.1)
double pnn(const double *RR, int N, double x)
{
    if (N < 2)
        return 0.0;  // no successive differences in this epoch
    // an array of N intervals yields N - 1 successive differences
    return 100.0 * (double) nn(RR, N, x) / (double) (N - 1);
}
```

4.6.2.5 NN/pNN
The NNx feature counts the successive R-R intervals within an epoch that differ by more than a threshold x, and the pNNx feature expresses this count as a percentage of all successive differences in the epoch. Their calculation is shown in the 0-indexed C code of Listings 4.1 and 4.2. Values of x of 25 ms and 50 ms are typically used (i.e. x = 0.025 s or 0.05 s, as the elements of RR are in seconds). Figure 4.9 shows the allergic and non-allergic distributions of pNN50.

Figure 4.9: PDF of pNN50 of the heart rate, generated from allergic and non-allergic subjects.

Figure 4.10: Histogram of the relative times between successive QRS complexes (number of occurrences against R-R interval in seconds).

Figure 4.11: PDF of histogram of the heart rate, generated from allergic and non-allergic subjects.

4.6.2.6 Histogram
Features can also be calculated by generating a histogram of the R-R intervals within an epoch. Figure 4.10 shows a typical histogram of R-R intervals, constructed with 10 bins. The feature value is then calculated by

$$\mathrm{hist} = \frac{h}{w}, \qquad (4.23)$$

where $w$ is the width of the histogram, i.e. the difference between the maximum and minimum values on the x-axis, and $h$ is the height of the most frequently occurring bin.
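Equation (4.23) can be sketched in C as follows. The text fixes the number of bins (10, as in Figure 4.10) but not every detail, so the sketch makes two assumptions. First, the bins span the minimum to maximum R-R interval of the epoch, so w is that range in seconds. Second, h is the count of the fullest bin. The function name hist_feature is hypothetical.

```c
#include <stddef.h>

#define HIST_BINS 10  /* as in Figure 4.10 */

/* Hypothetical helper: histogram feature hist = h / w of one epoch.
 * Assumes the bins span [min(rr), max(rr)], so w = max - min (seconds),
 * and h is the count of the most populated bin.
 * Returns -1.0 if the epoch is empty or all intervals are identical. */
double hist_feature(const double *rr, size_t n)
{
    if (n == 0)
        return -1.0;

    double lo = rr[0], hi = rr[0];
    for (size_t i = 1; i < n; i++) {
        if (rr[i] < lo) lo = rr[i];
        if (rr[i] > hi) hi = rr[i];
    }
    double w = hi - lo;
    if (w == 0.0)
        return -1.0;

    size_t counts[HIST_BINS] = {0};
    for (size_t i = 0; i < n; i++) {
        size_t b = (size_t) ((rr[i] - lo) / w * HIST_BINS);
        if (b >= HIST_BINS)  /* the maximum value belongs to the last bin */
            b = HIST_BINS - 1;
        counts[b]++;
    }

    size_t h = 0;
    for (size_t b = 0; b < HIST_BINS; b++)
        if (counts[b] > h)
            h = counts[b];
    return (double) h / w;
}
```

Tightly clustered intervals fill one bin and give a small w, so the feature grows; widely dispersed intervals shrink it, matching the interpretation in the following paragraph.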
Smaller values of this feature indicate a wide dispersal of R-R intervals over the epoch, owing to smaller $h$ and larger $w$ values, while larger values of this feature indicate consistency between R-R intervals over the epoch. This is reflected in the PDF of this feature shown in Figure 4.11, where a higher probability of allergy can be seen at lower values of the feature. Indeed, of all the features discussed so far, this feature shows the clearest difference between the allergic and non-allergic subjects.

Figure 4.12: Chart of the change between successive QRS complexes ($\Delta RR_{j+1}$ against $\Delta RR_j$, in seconds), with the PP and NN quadrants shaded.

4.6.3 Sequential domain features
Features extracted from the sequential domain capture the beat-to-beat variability of the QRS complexes found within an epoch. The relative increases and decreases of the heart rate are computed by differencing the vector of R-R intervals between successive QRS complexes, which effectively yields the relative proportions of ‘acceleration’ and ‘deceleration’ of the heart rate over an epoch (Schechtman et al., 1992). This can be visualised by plotting the $i$th difference of the R-R intervals against the $(i+1)$th. Figure 4.12 shows this plot, which is segmented into four quadrants. The shaded region in the upper right-hand quadrant of Figure 4.12 indicates two consecutive increases in the R-R interval, which represents a slowing heart rate; this region is referred to as the PP quadrant. The shaded region in the lower left-hand quadrant indicates two consecutive decreases in the R-R interval, which represents a speeding heart rate; this quadrant is referred to as the NN quadrant.
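Counting the points in these two quadrants can be sketched in C. This is a minimal sketch under stated assumptions. Each plotted point pairs one successive R-R difference with the next. A point counts as PP only when both differences are strictly positive, and as NN only when both are strictly negative. Each count is then divided by the total number of plotted points. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: fractions of points in the PP and NN quadrants.
 *   rr: R-R intervals of the epoch (s); n: number of intervals
 * Each point is (d[i], d[i+1]) with d[i] = rr[i+1] - rr[i].
 * Writes the fractions to *pp and *nn and returns the number of
 * plotted points (0 if n < 3, in which case both fractions are 0). */
size_t sequential_features(const double *rr, size_t n,
                           double *pp, double *nn)
{
    *pp = 0.0;
    *nn = 0.0;
    if (n < 3)
        return 0;

    size_t points = n - 2;  /* n intervals -> n - 1 differences -> n - 2 pairs */
    size_t n_pp = 0, n_nn = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        double d1 = rr[i + 1] - rr[i];      /* first change in R-R interval */
        double d2 = rr[i + 2] - rr[i + 1];  /* the following change */
        if (d1 > 0.0 && d2 > 0.0)
            n_pp++;  /* two lengthening intervals: slowing heart rate */
        else if (d1 < 0.0 && d2 < 0.0)
            n_nn++;  /* two shortening intervals: speeding heart rate */
    }
    *pp = (double) n_pp / (double) points;
    *nn = (double) n_nn / (double) points;
    return points;
}
```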
Figure 4.13: PDFs derived for the sequential domain features. (a) PDF of PP feature of the heart rate, generated from allergic and non-allergic subjects from the sequential domain. (b) PDF of NN feature of the heart rate, generated from allergic and non-allergic subjects from the sequential domain.

The sequential trend features are calculated by counting the number of points within the PP and NN quadrants and dividing each count by the total number of points in the chart. The PDFs of these features are shown in Figures 4.13a and 4.13b. There is considerable similarity between the two features; however, in both cases, lower values of the feature are more indicative of allergy.

4.6.4 Poincaré features
Poincaré features are another means of assessing the beat-to-beat variability and nonlinear dynamics of R-R intervals. They are obtained from a Poincaré chart, which plots the current R-R interval against the next R-R interval, as shown by the cluster of ×'s in Figure 4.14.

Figure 4.14: Original and rotated points plotted in a Poincaré chart ($RR_{n+1}$ against $RR_n$, in seconds, with SD1 and SD2 indicated).

This cluster is dispersed about a line orientated at 45°, which is plotted as the solid line in Figure 4.14. Features are extracted based on the dispersion of the ×'s perpendicular and parallel to this 45° line. Computing these features is simplified if each point is rotated clockwise by 45° about the origin, which can be achieved with the rotation matrix defined in Equation (4.24), with $\theta = -\pi/4$.
This results in the cluster of ◦'s in Figure 4.14, which is dispersed about the x-axis ($y = 0$).

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (4.24)$$

Figure 4.15: CSI and CVI PDFs from Poincaré features. (a) PDF of CSI of the heart rate, generated from allergic and non-allergic subjects. (b) PDF of CVI of the heart rate, generated from allergic and non-allergic subjects.

With the rotated cluster, two measurements (SD1 and SD2) can easily be computed from the plot. SD1 is the standard deviation of the set of rotated x-values, and SD2 is the standard deviation of the set of rotated y-values. Points close to the line of identity indicate similarity between consecutive beats, and conversely points further from the line of identity indicate that a change in the heart rhythm has occurred. The cardiac sympathetic index (CSI) and the cardiac vagal index (CVI) are computed from SD1 and SD2, and they are defined by Equations (4.25) and (4.26) respectively.

$$\mathrm{CSI} = \frac{SD2}{SD1} \qquad (4.25)$$

$$\mathrm{CVI} = \log\left(SD2 \times SD1\right) \qquad (4.26)$$

Figures 4.15a and 4.15b show the PDFs of the CSI and CVI features, respectively, for the allergic and non-allergic subjects. The CVI feature shows a positive mean shift between the allergic and non-allergic subjects, with higher probabilities for allergic subjects at the positive side of the chart. The CSI feature for allergic subjects presents a wider distribution, but with a mean similar to that of the non-allergic subjects.

4.6.5 Frequency domain features
The frequency spectrum was computed with the Lomb periodogram.
The total powers in the very low frequency (VLF), low frequency (LF) and high frequency (HF) bands were extracted from the PSD estimates of the Lomb periodogram. Gombarska and Horicka (2012) presented Table 4.1, which gives the boundaries of the VLF, LF, HF and ultra-low frequency (ULF) bands. These frequency ranges are indicative of the physiological and cardiac events listed alongside them in Table 4.1. The ULF band was not considered in this work because researchers state (Tulppo and Huikuri, 2004; Rajendra Acharya et al., 2006) that meaningful determination of its power can violate the rules governing PSD estimation. This concern is also valid for the VLF band, but VLF power has been clinically reported to be strongly related to the parasympathetic response, even in short-duration recordings (Carney et al., 2001; Kleiger et al., 2005).

Table 4.1: Table of HRV diagnostic frequency ranges for children.

| Type | Frequency (Hz) | Origin |
|------|----------------|--------|
| HF | 0.15–0.4 | Parasympathetic, respiratory sinus arrhythmia |
| LF | 0.04–0.15 | Sympathetic + parasympathetic |
| VLF | 0.0033–0.04 | Sympathetic, chemo-receptors, thermoregulation, endocrine |
| ULF | < 0.0033 | Circadian rhythms |

Figure 4.16: PDF of the frequency domain features. (a) PDF of the power from the VLF frequency band of the heart rate, generated from allergic and non-allergic subjects. (b) PDF of the power from the LF frequency band of the heart rate, generated from allergic and non-allergic subjects. (c) PDF of the power from the HF frequency band of the heart rate, generated from allergic and non-allergic subjects. (d) PDF of the power from the LF/HF frequency band of the heart rate, generated from allergic and non-allergic subjects.

The ratio between the LF and HF band powers (LF/HF) was computed as another feature; it is a measure of the balance between the sympathetic and parasympathetic responses of the ANS. The PDFs of the frequency-domain HRV features are shown in Figures 4.16a to 4.16d. The non-allergic and allergic PDFs overlap significantly, but in all cases there is a tendency towards a higher probability of allergy at the positive and negative extremities of the horizontal axis. These features are expected to perform well in allergy classification, because the ANS should react to combat allergic reactions when they occur.

4.7 Discussion
This is the first work which has quantified the effect of allergy on HRV features, and it has shown that in a number of cases a clear separation can be obtained between the allergic and non-allergic classes. In the case of the CVI feature, for example, a mean shift and a wider variance of approximately $3\sigma_b$ were obtained. It is interesting to note that no deviation beyond the background distribution was observed for the accelerometer features, whereas such deviation does occur for the HRV features. This is a desirable trait that classification platforms can exploit, as it indicates that class separation, and hence classification, is obtainable. It contrasts with the accelerometer-based features of the previous chapter, and it indicates that HRV-based classification offers a viable avenue for automated allergy classification. The sequential domain features also demonstrated significant differences in probability mass, and bimodalities were introduced into the allergic PDFs.
This modality distortion also occurred, to a lesser extent, with the histogram and PNN features. Interestingly, the histograms of the frequency-domain features are not well partitioned: the mean and variance of the distributions are approximately equal for the allergic and non-allergic classes, even though these features are well documented (Aziz et al., 2012) as being highly correlated with the activity of the ANS. It should be stated that the figures in this chapter present histograms in a single dimension. Later chapters will show that high-dimensional multivariate data modelling can yield superior separability. Such high-dimensional representations, however, cannot be visualised on the page, and the true separability of the classes can only be assessed analytically. The focus of the remaining chapters of this thesis lies in the means of representing and assessing these distributions.

The PDFs shown here also show that a high degree of overlap exists between the allergic and non-allergic subjects. This is to be expected because OFCs are not temporally annotated: it is not known when a reaction began during an allergic subject's challenge, so the allergic PDFs also contain epochs of non-allergic physiology, and supervised class separation is not possible. Nevertheless, this further supports the argument that machine-based classification of allergy through analysis of HRV features is worthwhile, because even with single-dimensional representations of the features, class separability has been obtained in many cases.

CHAPTER 5
Machine learning for allergy classification

5.1 Introduction
In the previous chapter, the eighteen HRV features used to quantify the variability of the heart during OFCs were described. This chapter assesses whether machine learning algorithms can use them for the automatic classification of food allergy. The only temporal label available is that obtained from the ECG data recorded before the allergens were administered.
Therefore, non-discriminative classification is required for allergy detection, as the times of the allergic events are not known. In this approach, models are trained on the normalised background HRV features, and allergy is classified if features anomalous to this distribution are detected. The background data are recorded before any food is ingested by the subjects, so they are guaranteed to represent the normal, non-allergic state. This classification routine is called novelty, or abnormality, detection. As the QRS points used here were manually annotated, this chapter determines the effectiveness of HRV-based allergy detection without the confounding influence of automatic QRS-detection errors. The results obtained here therefore provide an upper bound on the expected efficacy of machine-based allergy classification.

5.2 Novelty detection for OFC
5.2.1 Choice of classification routine
Chapter 2 discussed the one-class SVM and GMM classifiers. For the classification work of this thesis, GMMs were chosen over one-class SVMs and other novelty detection algorithms for a number of reasons. GMMs model the distribution of the training data and have been shown to be robust in a number of classification applications. Effectively, GMMs generate multi-dimensional PDFs which can be used to ascertain the ‘probability’ (more formally, the likelihood; see later) that new data belong to the background class. GMMs also allow for subject-adaptive procedures, which are important for the classification routine employed here, as discussed later. One-class SVMs, by contrast, belong to the boundary-based family of novelty detectors, which estimate a boundary around the normal data rather than its density; this is mainly advantageous for very high input dimensionality when few data points are available. Therefore, on the basis of these arguments, one-class GMMs are preferable to the other options for OFC classification.
5.2.2 Feature transformation
For the classification of allergy, principal component analysis (PCA) (Pearson, 1901) is first performed on the normalised training data. The PCA transformation was performed in order to de-correlate the feature set. This is a requirement for allergy classification because the allergy database is too small to train GMMs with full covariance matrices accurately; once the features are de-correlated, diagonal covariance matrices can be used, which dramatically reduces the number of samples required to train the classification models. It is not always possible to assign an interpretation to the new components obtained with the PCA transform, and this is particularly true where the initial training data have been normalised (Webb et al., 2011).

For a set of N feature vectors x_j, the PCA transform is obtained from the total scatter matrix, S_t, defined by

$$ S_t = \sum_{j=1}^{N} (\mathbf{x}_j - \boldsymbol{\mu})(\mathbf{x}_j - \boldsymbol{\mu})^{T}, \tag{5.1} $$

where

$$ \boldsymbol{\mu} = \frac{1}{N} \sum_{j=1}^{N} \mathbf{x}_j. \tag{5.2} $$

The PCA transformation matrix is formed from the eigenvectors of S_t, which define the orthogonal directions of maximum variance; these can be obtained by singular value decomposition (De Lathauwer et al., 2000) or similar procedures.

Figure 5.1 illustrates what the PCA transform accomplishes in two dimensions. Figure 5.1a shows two-dimensional data plotted in feature space. The histograms associated with each axis show that the range and variance of the feature values are approximately equal in the x- and y-directions. Once the PCA transform has been performed, however, the data are transformed to the distributions shown in Figure 5.1b, which is plotted in ‘component’ space. This is equivalent to viewing the original data along the axes shown as contour lines in Figure 5.1a.

Figure 5.1: An illustration of PCA in two-dimensional feature space (subplot a) and two-dimensional component space (subplot b). (a) Two-dimensional feature space, showing the contour lines of the axis of the first principal component; histograms show the distribution of features along the x- and y-axes with equal bin ranges. (b) Two-dimensional component space after the PCA transformation; histograms show the distribution along the x- and y-axes, with a much greater variance along the x-axis.

In Figure 5.1b, the ranges of the transformed data are not equal. Indeed, the first component accounts for 97% of the total variance of the transformed data, whereas in the original distribution both features are of approximately equal importance. In higher-dimensional cases, if the first C principal components (with C smaller than the number of features) account for the majority of the variance, they may be used to describe the variance profile of the data accurately, and discarding the remaining components achieves dimensionality reduction. Other researchers have shown that using the PCA transform to select a subset of features in a subject-independent manner can improve classification accuracy (Thomas, 2010).

5.2.3 Gaussian mixture models
The Gaussian classifier is a density estimation algorithm that is commonly used for modelling data distributions. For classification purposes, features are often extracted from the data to obtain more information about the data themselves. While the original data may be normally distributed, the extracted features may not preserve this trait. This is a problem for the Gaussian classifier because multi-modal distributions are poorly modelled by a single Gaussian distribution. Gaussian mixture models (GMMs) were devised to solve this problem; they employ weighted mixtures of several Gaussian distributions to model arbitrary feature distributions.
In a similar manner to how the FFT decomposes a data sequence into a weighted sum of complex exponentials to compute the frequency spectrum, GMMs (through the expectation maximisation algorithm; see later) can be thought of as decomposing a data sequence onto a basis of Gaussian distributions to obtain a model of the density of the data. The mixture of Gaussians therefore allows arbitrary and non-trivial distributions to be modelled, provided the algorithm is correctly parameterised.

Figure 5.2: A mixture of three equally-weighted Gaussians, A, B and C (dashed lines), which combine to represent a multi-modal, non-normal distribution (solid black line). Axes: value against probability density.

Figure 5.2 demonstrates how three equally-weighted Gaussian distributions, A, B and C, can be combined to represent the non-Gaussian distribution shown by the solid line. The number of Gaussians required to represent a distribution depends strongly on the data to be represented and is specific to the application in question. For GMMs, the conditional density is modelled by

$$ p(\mathbf{x} \mid \Theta) = \sum_{i=1}^{m} \omega_i \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i), \tag{5.3} $$

where m is the number of Gaussians and the weights, ω_i, satisfy

$$ \sum_{i=1}^{m} \omega_i = 1 \tag{5.4} $$

and

$$ \omega_i \geq 0 \quad \forall i, \tag{5.5} $$

and N(x | µ_i, Σ_i) is the multivariate Gaussian function defined by

$$ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{T} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right\}, \tag{5.6} $$

where µ_i and Σ_i are the mean vector and covariance matrix of the i-th Gaussian (collectively termed its mixture parameters and given the shorthand symbol θ_i for conciseness; Θ denotes the full set of weights and mixture parameters), and d is the dimensionality of the data, x. Σ_i is a symmetric, positive-definite matrix and |Σ_i| is its determinant. Training the GMM involves selecting mixture parameters that represent the training data accurately.
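Equations (5.3) to (5.6) can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis implementation: it assumes diagonal covariance matrices (the case motivated by the PCA de-correlation of Section 5.2.2), so the determinant and inverse in Equation (5.6) reduce to products and reciprocals of per-dimension variances. The function names and the three-component example, loosely echoing Figure 5.2, are hypothetical.

```python
import math


def gaussian_diag(x, mean, var):
    """Equation (5.6) for a diagonal covariance matrix whose diagonal is `var`."""
    d = len(x)
    det = math.prod(var)                                   # |Sigma| of a diagonal matrix
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * math.sqrt(det))


def gmm_density(x, weights, means, variances):
    """Equation (5.3): weighted sum of the m component densities."""
    # Constraints of Equations (5.4) and (5.5).
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_diag(x, mu, var)
               for w, mu, var in zip(weights, means, variances))


# Three equally weighted one-dimensional components (hypothetical values).
weights = [1 / 3, 1 / 3, 1 / 3]
means = [[-3.0], [0.0], [2.5]]
variances = [[1.0], [0.5], [1.5]]
p0 = gmm_density([0.0], weights, means, variances)
```

Storing only the diagonal of Σ_i means each component has d variances to estimate rather than d(d+1)/2 covariance entries, which is why diagonal models need fewer training samples.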
Training is performed using k-means clustering followed by the expectation maximisation algorithm.

5.2.3.1 k-means clustering
For the GMM algorithm, m Gaussians are required to model the training data set, and the value of m is termed the GMM order. The k-means clustering algorithm (Hartigan and Wong, 1979) can be employed to automatically partition the training data set into k clusters and discover the mean (or centroid) of each cluster. The k-means algorithm is iterative in nature, and its computation time is generally not known in advance (Lewis and Papadimitriou, 1997; Aloise et al., 2009). The following steps outline the k-means algorithm:
1. Initialise k cluster centroids randomly in feature space.
2. Calculate the Euclidean distance (Deza and Deza, 2009) between every data point and each of the k cluster centroids.
3. Assign each data point to its closest centroid.
4. Compute new centroids for the k clusters from the members of each cluster.
5. If no centroid has changed between steps 2 and 4, terminate. Otherwise, return to step 2 using the new centroids calculated in step 4.

The k-means algorithm will always segment the data into k clusters, but there is no guarantee that a good partitioning will be obtained. The value of k (which equals the GMM order, m) must therefore be chosen carefully, as the data might otherwise be clustered into an inappropriate and unrepresentative number of clusters. Moreover, k-means clustering partitions data without consideration of the density of the data points under investigation. To partition data in this manner, expectation maximisation (EM) is required.
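The five steps listed above can be sketched as follows. This is a plain Lloyd-style implementation for illustration, not necessarily the Hartigan and Wong (1979) variant cited. As assumptions, step 1 picks k distinct data points with a seeded random generator rather than arbitrary points in feature space, and an empty cluster keeps its previous centroid. The function name and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Partition `points` (sequences of floats) into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]        # step 1
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Steps 2 and 3: Euclidean distance to every centroid, assign to the closest.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Step 4: recompute each centroid as the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[c])                         # empty cluster
        # Step 5: stop when no centroid has moved.
        if new == centroids:
            break
        centroids = new
    return centroids, labels


# Two well-separated one-dimensional groups (hypothetical data).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
centroids, labels = kmeans(points, k=2)
```

On this toy data any pair of distinct starting points converges to the two obvious groups; on real data the outcome depends on the initialisation, which is one reason k must be chosen with care and the partition refined by EM.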
5.2.3.2 Expectation maximisation
EM takes into consideration not only the means of the clusters but also the weights and the covariance between features, and it can provide better data segmentation than k-means alone (Figueiredo and Jain, 2000). EM is typically initialised by the k-means clustering algorithm and seeks to select mixture parameters that model the training data accurately. This is achieved by maximising the likelihood, which is an iterative process because no closed-form solution exists (Bishop et al., 2006). The likelihood is maximised by EM in order to ensure that the distribution of the training data is modelled accurately. To simplify computation, and to preserve floating-point precision, the log-likelihood is used; this requires N floating-point additions rather than N floating-point multiplications (where N is the number of examples of the data):

$$ \mathcal{L}(X \mid \Theta) = \log \prod_{j=1}^{N} p(\mathbf{x}_j \mid \Theta) \tag{5.7} $$

$$ = \sum_{j=1}^{N} \log \sum_{l=1}^{m} \omega_l \, p(\mathbf{x}_j \mid \theta_l), \tag{5.8} $$

where X = {x_1, ..., x_N} is the training data and p(x_j | θ_l) = N(x_j | µ_l, Σ_l).

Figure 5.3: A demonstration of the difference in clustering obtained by k-means clustering and the expectation maximisation algorithm (axes: Feature 1 and Feature 2). (a) The true relationship between the two Gaussian distributions which make up the non-normal distribution. (b) The partitioning obtained by k-means clustering (k = 2). (c) The partitioning obtained by expectation maximisation clustering (m = 2). In subfigures (b) and (c), a line is drawn from each point to its associated cluster.

Like other optimisation algorithms, EM selects parameters so as to minimise a loss function, here the negative log-likelihood. This process repeats until convergence has been reached or until a pre-specified number of iterations has been performed.
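The floating-point motivation for Equation (5.8) can be checked directly. The sketch below assumes diagonal covariances and evaluates the inner sum with the log-sum-exp trick, a common numerical safeguard that the text does not prescribe. It compares the naive log of the product in Equation (5.7) with the sum of logs in Equation (5.8) for 1000 examples; the names and the data are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Logarithm of Equation (5.6) for a diagonal covariance matrix."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_likelihood(data, weights, means, variances):
    """Equation (5.8): sum over examples of the log of the mixture density."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + log_gaussian_diag(x, mu, var)
                 for w, mu, var in zip(weights, means, variances) if w > 0]
        top = max(terms)                                 # log-sum-exp for stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# 1000 one-dimensional examples, each three standard deviations from the mean.
data = [[3.0]] * 1000
weights, means, variances = [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]]

product = 1.0                                            # Equation (5.7), evaluated naively
for x in data:
    product *= sum(w * math.exp(log_gaussian_diag(x, mu, var))
                   for w, mu, var in zip(weights, means, variances))
naive = math.log(product) if product > 0 else float("-inf")
stable = log_likelihood(data, weights, means, variances)
```

Each density here is about 0.0044, so the running product underflows to zero long before the 1000th example, whereas the sum of logs stays finite.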
EM updates all of the mixture parameters simultaneously, iterating until the termination criteria are satisfied. Each iteration consists of two steps. In the expectation (E) step, the posterior probability (or ‘responsibility’) of each Gaussian component for each training example is computed from the current mixture parameters. In the maximisation (M) step, the weights, means and covariances are re-estimated from these responsibilities, yielding the next arguments for Equation (5.8). Although the parameters that maximise the likelihood are unknown at the outset, each iteration is guaranteed not to decrease the likelihood of the training data, and this property guides the algorithm to convergence (in general, to a local maximum).

Figure 5.3 shows a distribution generated from two Gaussian distributions centred at (−2, −2) (marked with ∗) and (3, 2) (marked with △), with standard deviations of 3 and 1 respectively. The two distributions are presented in Figure 5.3 as two classes only to illustrate the difference between the cluster memberships produced by k-means (Figure 5.3b) and EM (Figure 5.3c). k-means partitions the distribution poorly and mis-categorises a number of data points (obtaining an accuracy of less than 90%). The EM partitioning algorithm captures much more information from the original distribution and allows for better modelling of the true data (obtaining over 98% accuracy).

Figure 5.4: Sample likelihood (subplot a) and histogram of the background likelihood (subplot b) of Subject 23. (a) Likelihood series (log scale) against time in minutes, with the background and checkup periods shaded. (b) Sample histogram of the background likelihood data series (number of occurrences), with µ and µ ± σ marked.
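The E and M steps can be sketched for the diagonal-covariance case. This is a minimal illustration under assumptions the text does not specify: the initial means are supplied (for example, by the k-means step of Section 5.2.3.1), every component starts with the overall per-dimension variance, a small variance floor guards against degenerate components, and iteration stops at a fixed cap or when the log-likelihood stops changing. The function name and the toy data are hypothetical. The returned history makes it possible to check that the log-likelihood never decreases.

```python
import math


def em_gmm_diag(data, init_means, n_iter=50, var_floor=1e-6, tol=1e-9):
    """Fit a diagonal-covariance GMM by EM; returns weights, means, variances, LL history."""
    m, d, n = len(init_means), len(data[0]), len(data)
    weights = [1.0 / m] * m
    means = [list(mu) for mu in init_means]
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, var_floor)
              for j in range(d)]
    variances = [spread[:] for _ in range(m)]
    history = []
    for _ in range(n_iter):
        # E step: responsibility of component i for example x, via log-sum-exp.
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(weights[i]) - 0.5 * sum(
                        math.log(2 * math.pi * v) + (xk - mk) ** 2 / v
                        for xk, mk, v in zip(x, means[i], variances[i]))
                    for i in range(m)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(l - top) for l in logs))
            resp.append([math.exp(l - norm) for l in logs])
            ll += norm                                   # one term of Equation (5.8)
        history.append(ll)
        # M step: re-estimate weights, means and variances from the responsibilities.
        for i in range(m):
            nk = max(sum(r[i] for r in resp), 1e-12)     # guard against empty components
            weights[i] = nk / n
            means[i] = [sum(r[i] * x[j] for r, x in zip(resp, data)) / nk
                        for j in range(d)]
            variances[i] = [max(sum(r[i] * (x[j] - means[i][j]) ** 2
                                    for r, x in zip(resp, data)) / nk, var_floor)
                            for j in range(d)]
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return weights, means, variances, history


# Two one-dimensional groups centred near -2 and 3 (hypothetical data).
data = ([[-2.0 + 0.5 * math.sin(k)] for k in range(100)]
        + [[3.0 + 0.3 * math.cos(k)] for k in range(100)])
w, mus, vs, hist = em_gmm_diag(data, [[-1.8], [2.8]], n_iter=200)
```

Unlike k-means, each example contributes fractionally to every component through its responsibilities, so the estimated variances and weights reflect the density of the data rather than hard assignments alone.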
While neither method partitions the data of Figure 5.3 perfectly, EM models the data better because of its density-based analysis.

5.2.4 Postprocessing
After learning the mixture parameters, GMMs produce likelihoods for the features extracted from new epochs of normalised HRV data. The computed likelihoods lie between 0 and 1. The closer the likelihood is to 1, the more likely it is that the data belong to the background class; conversely, the smaller the likelihood, the less likely it is that the feature vector belongs to the background class. The likelihood never reaches 0, but becomes vanishingly small as the data become less similar to the background models.

A sample likelihood series (for Subject 23) is shown in Figure 5.4a. In this figure, the region highlighted between 0 and 11 minutes represents the background period (before any allergen was introduced during the OFC). The remaining highlighted segments represent the times during which the allergist performed checkups on the subject. The solid trace is the likelihood series computed for the subject, plotted on a logarithmic scale.

With the goal of machine-based allergy detection, the criterion for novel data (which are classified allergic) is defined as follows: the likelihood must fall below a specific threshold, th, for a specific duration, d. This definition follows because allergy should present with non-background features (see Chapter 4), and the more unlike the background the new features are, the smaller the computed likelihood becomes. However, the PDFs in Chapter 4 also show a significant amount of overlap between the allergic and non-allergic subjects. Incorporating the duration parameter therefore rejects spurious deviations, which enables superior classification. The specific values of d and of the multiplicative factor n that sets th (Equation (5.9)) are obtained via subject-independent cross-validation.
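Using the threshold th = µ − nσ that Equation (5.9) defines from the subject's own background likelihoods, the decision criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the likelihood series has one value per epoch, d is counted in consecutive epochs, σ is the population standard deviation of the background likelihoods, and checkup epochs are never counted as allergic and (as a further assumption) interrupt a run. The function name and the toy series are hypothetical.

```python
import statistics


def first_allergic_epoch(likelihoods, n_background, n, d, checkup=None):
    """Index of the epoch at which an allergic decision is reached, or None.

    likelihoods  -- one GMM likelihood per epoch
    n_background -- number of leading epochs recorded before any allergen
    n, d         -- multiplicative and duration parameters
    checkup      -- optional per-epoch booleans; True marks a checkup epoch
    """
    background = likelihoods[:n_background]
    th = statistics.mean(background) - n * statistics.pstdev(background)  # Eq. (5.9)
    run = 0
    for i in range(n_background, len(likelihoods)):
        if checkup is not None and checkup[i]:
            run = 0                      # assumption: a checkup interrupts the run
            continue
        run = run + 1 if likelihoods[i] < th else 0
        if run >= d:
            return i
    return None


# Six background epochs followed by seven epochs of the challenge (toy values).
series = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10,
          0.11, 0.05, 0.10, 0.04, 0.03, 0.02, 0.10]
decision = first_allergic_epoch(series, n_background=6, n=2, d=3)
```

With d = 3, the isolated dip at epoch 7 is ignored and the decision is reached only at epoch 11, after three consecutive sub-threshold epochs. This illustrates how the duration parameter rejects spurious deviations.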
The n and d parameters are henceforth referred to as the ‘multiplicative’ and ‘duration’ parameters respectively, and are collectively termed the ‘decision-making’ parameters. Figure 5.4b shows the histogram of likelihood values during the initial background period of Subject 23's OFC, and it can be seen from this example that the likelihood values follow an approximately normal distribution. The threshold is therefore defined as a function of the mean (µ) and standard deviation (σ) of these background data by

$$ th = \mu - n\sigma, \tag{5.9} $$

where n is a multiplicative factor for the standard deviation. This choice makes the modelling and decision making subject-adaptive, as the threshold is a function of the subject's own background distribution. The larger the n parameter, the smaller the likelihood must be to fall below the threshold, i.e. the less similar to the background data the new features must be.

Figure 5.5: Flowchart of the classification procedure involving the recording of ECG, annotation of QRS complexes, feature extraction and the classification procedure of OFC (blocks: ECG, QRS, HRV, Classification, Postprocessing, Decision making, Result).

The second parameter for an allergic decision is d, the time for which the likelihood must remain below the threshold th for an allergic decision to be reached. The purpose of this parameter is to reduce the effect of spurious irregularities in the likelihood series that might be due not to allergy but to the natural variation of the heart. The duration parameter also permits less extreme threshold values, which facilitates better classification results. The signature of abnormality detected by this classification routine can therefore be defined as a substantial and sustained departure from the background HRV levels.

5.3 Classification procedures
Figure 5.5 shows a high-level overview of the allergy classification procedure. The ECG is first recorded.
As mentioned in the introduction, the analyses in this chapter use manually annotated QRS points in order to confirm the presence of allergic signatures in HRV features, so the next step in the process is manual annotation of the QRS points. The features described in Chapter 4 are then extracted and background models are learnt. Following this, post-processing and decision making are performed to determine whether the subject is allergic to the allergen against which they are being tested. The specifics of the classification and post-processing stages are discussed in subsequent sections.

Figure 5.6: Illustration of the data segmentation and testing routines employed in the allergy classification procedure (blocks: training data, data selector, model selection, PCA model, perform PCA, GMM model, generate likelihood, decision making with n and d, machine-based result; testing routine).

For the remainder of this thesis, machine-based detection of allergy is termed classification of allergy, while the result of the OFC is termed diagnosis of allergy. When a subject is stated as being ‘classified allergic’, there is therefore an inherent implication that the classification was made by the statistical modelling and post-processing stages. The classification and post-processing blocks of Figure 5.5 can be expanded as shown in Figure 5.6. In this figure, the model training and testing data are separated. From the training data, the PCA and GMM models are selected by the parameter selection routine. The multiplicative and duration post-processing parameters are also selected on the training data and are used by the system to classify allergy in the test subject. The figure shows that the testing and training sections of this classification routine are completely independent: the testing data have no influence on the model and decision-making parameter selection routines.
This system only classifies allergy, i.e. it detects abnormal HRV features only. If only normal features are detected, the system does not terminate the test, and the OFC continues as normal.

5.4 Classifier model selection
5.4.1 Performance evaluation
Various performance assessment routines, such as bootstrapping and split-sample validation, have been proposed in the literature (Kohavi et al., 1995), and their effect on the reported performance of neonatal seizure detection has been compared in previous studies (Temko et al., 2011b). The split-sample method, in which one fixed partition of the available data is allocated for training and the rest is used for testing, has several disadvantages, chiefly that such a division can result in a large bias. Over-optimistic or over-pessimistic results can be obtained depending on whether a seemingly arbitrary partition of the data yields a ‘good’ or ‘bad’ split.

In this work, leave-one-out (LOO) cross-validation is used to assess the performance of the developed allergy detector. All but one subject are used for training, and the remaining subject is used for testing. The process is repeated until every subject has been tested, and the average performance is reported. LOO is known to be an almost unbiased estimator of the true generalisation error (Vapnik and Kotz, 1982). Additionally, in contrast to randomised re-sampling routines (bootstrapping), LOO eliminates any subjectivity from the testing protocol, so it can be repeated and exactly the same results will be obtained (Thomas et al., 2013). What is examined with the LOO procedure is not a particular model but the methodology used to obtain such a model. The parameters are therefore not fixed for all subjects; rather, the evaluation assesses whether the methodology yields a good model for each unseen subject. Here, the LOO method makes 24 data splits of 23 training subjects versus 1 test subject, so 24 unique GMM models are obtained.
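The leave-one-out protocol reduces to a loop over subjects. The sketch below only generates the subject-level splits described above. Model selection and testing are represented by a hypothetical `evaluate` callback, an assumption for illustration that stands for the whole methodology (PCA, GMM and decision-parameter selection on the training subjects, then classification of the held-out subject).

```python
def leave_one_subject_out(subject_ids):
    """Yield (training subjects, test subject) pairs, one per subject."""
    for test in subject_ids:
        yield [s for s in subject_ids if s != test], test


def run_loo(subject_ids, evaluate):
    """Apply the same methodology to every split.

    `evaluate(train_ids, test_id)` is a hypothetical callback that selects all
    models and parameters on `train_ids` only and returns a result for `test_id`.
    """
    return {test: evaluate(train, test)
            for train, test in leave_one_subject_out(subject_ids)}


subjects = list(range(1, 25))                    # 24 subjects, as in this study
splits = list(leave_one_subject_out(subjects))
```

Because the splits are enumerated deterministically, rerunning the procedure reproduces exactly the same 24 training and test partitions, which is the repeatability property attributed to LOO above.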
5.4.2 Parameter selection
5.4.2.1 Search space
In each of the 24 splits, nested cross-validation model selection was performed on the 23 training subjects' data to choose suitable model parameters. These include:
• Percentage of information (variance) retained by PCA for feature-set reduction. The following set of values was searched: {80%, 90%, 95%, 99%, 99.9%, 100%}. This range was selected because of the complicated nature of diagnostic HRV, and because preliminary analyses showed that the allergic condition was better identified when more than 80% of the feature variance was preserved.
• The number of Gaussians in the GMM model. The following set of values was searched: {1, 2, 4, 8, 16, 32, 64}. This allows the data distributions to be modelled with simple or complicated models depending on the complexity of the training data. This search space has also been employed successfully in EEG and speech processing applications (Reynolds and Rose, 1995; Thomas, 2010).
• The multiplicative factor (n) in decision making. Integer-rounded values logarithmically distributed over the maximum range were searched. This range was selected to accommodate the entire range of likelihoods required; the logarithmic spacing gives more precision at the lower end of the range without affecting the limits of classification.
• The duration parameter (d) in decision making. Integer-rounded values logarithmically distributed over the maximum range were searched.

5.4.2.2 Cost function
Fully automated machine learning cannot replace the allergists who conduct the OFCs, because the allergists are required to administer the doses of the problem food to the subjects throughout the challenge and, should allergy occur, to administer antihistamines.
Therefore, the classification routine discussed here is designed as a diagnostic assistance tool that should complement, and improve, the diagnosis of allergy. The cost function was designed in collaboration with the clinicians to best suit their diagnostic needs. False positive classifications would have unacceptable effects on the quality of life of the subjects (Chapter 1). With this consideration, the parameters within the search space that achieve 100% specificity in the nested cross-validation (i.e. models that correctly classify all non-allergic training subjects) are initially selected. This ensures that parameters are selected only if they produce no false positive classifications in training. From this reduced set, the parameters that lead to the highest sensitivity are selected. If more than one set of parameters satisfies these conditions, the parameters that achieve the maximum total time gain are used. Sensitivity, specificity and time gain are defined in subsequent sections.

The search space was searched in a targeted but exhaustive manner. Only a small portion of the decision-making parameters satisfy the criteria above, so it was possible to avoid evaluating the majority of the parameter combinations in the exhaustive enquiry. Other parameter selection routines, such as receiver operating characteristic (ROC) curves, would require the entire search space to be investigated. With the cost function defined above, however, the computation can be reduced significantly by targeting the exhaustive search at only the range of values that satisfy the decision-making criteria.
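The search space of Section 5.4.2.1 and the cost function above can be sketched together. The PCA and GMM values are those listed in the text. The upper limits and grid sizes for n and d, the candidate scores, and all function names are hypothetical. `components_to_retain` shows one way a retained-variance percentage could map to the number of components C. Unlike the targeted search described above, this sketch simply filters candidates that have already been scored.

```python
import itertools
import math

PCA_RETAINED = [0.80, 0.90, 0.95, 0.99, 0.999, 1.0]   # fraction of variance kept
GMM_ORDERS = [1, 2, 4, 8, 16, 32, 64]


def log_spaced_integers(max_value, count):
    """Integer-rounded values spaced logarithmically from 1 to max_value, duplicates removed."""
    if count < 2:
        return [max_value]
    step = math.log(max_value) / (count - 1)
    return sorted({round(math.exp(k * step)) for k in range(count)})


def components_to_retain(eigenvalues, fraction):
    """Smallest number C of principal components whose variance reaches `fraction`."""
    ordered = sorted(eigenvalues, reverse=True)
    total, cumulative = sum(ordered), 0.0
    for c, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= fraction - 1e-12:     # tolerance for floating-point round-off
            return c
    return len(ordered)


def select_parameters(candidates):
    """Cost function: 100% training specificity, then highest sensitivity,
    then highest total time gain. Returns None if no candidate qualifies."""
    feasible = [c for c in candidates if c["specificity"] == 100.0]
    if not feasible:
        return None
    best = max(feasible, key=lambda c: (c["sensitivity"], c["total_time_gain"]))
    return best["params"]


# Hypothetical upper limits and grid sizes for n and d.
N_VALUES = log_spaced_integers(max_value=100, count=12)
D_VALUES = log_spaced_integers(max_value=60, count=10)
search_space = list(itertools.product(PCA_RETAINED, GMM_ORDERS, N_VALUES, D_VALUES))

# Hypothetical nested cross-validation scores for three (n, d) settings.
candidates = [
    {"params": (2, 3), "specificity": 100.0, "sensitivity": 73.3, "total_time_gain": 20.0},
    {"params": (1, 2), "specificity": 88.9, "sensitivity": 93.3, "total_time_gain": 30.0},
    {"params": (3, 2), "specificity": 100.0, "sensitivity": 73.3, "total_time_gain": 25.0},
]
chosen = select_parameters(candidates)
```

In this toy example the (1, 2) setting has the highest sensitivity but is discarded for producing false positives, and the tie in sensitivity between the remaining two is broken by total time gain.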
Figure 5.7: The confusion matrix showing how sensitivity and specificity are obtained with regard to the ground truth (diagnosis) and predicted (classification) results.

| Diagnosis \ Classification | p | n |
|---|---|---|
| p | True positive | False negative |
| n | False positive | True negative |

5.5 Classification metrics
5.5.1 Sensitivity/specificity
Sensitivity (Se) and specificity (Sp) were computed in order to measure the overall accuracy of the allergy detection framework. These are commonly referred to as the true positive and true negative rates respectively. Both are bounded between 0 and 100%, and higher values indicate better accuracy. Both measure classification accuracy against the ground-truth diagnosis and are defined by

$$ Se = \frac{TP}{TP + FN} \times 100\%, \tag{5.10} $$

$$ Sp = \frac{TN}{TN + FP} \times 100\%, \tag{5.11} $$

where true positives (TP) are the number of allergic subjects classified allergic, false negatives (FN) are the number of allergic subjects misclassified as non-allergic, true negatives (TN) are the number of non-allergic subjects classified non-allergic, and false positives (FP) are the number of non-allergic subjects classified allergic. A confusion matrix can be employed to visualise how these metrics are computed, and is shown in Figure 5.7. In this figure, p is a positive result (i.e. allergic) and n is a negative result (i.e. non-allergic). Sensitivity is computed from the two upper quadrants, while specificity is computed from the two lower quadrants.

5.5.2 Time gain parameters
Other metrics that provide insight into the algorithmic performance are related to the time gain. Three time gain metrics are calculated, for the allergic subset of subjects only, as no time gain can be obtained for the non-allergic subjects.
5.5.2.1 Time gain
Time gain measures the difference in time between the termination of the OFC by the allergists and the classification of allergy. In effect, it demonstrates whether it is possible to conclude an OFC earlier, reducing the overall risk of anaphylaxis and other strong reactions through earlier administration of antihistamines. The average time gain is reported in two ways: first as the sum of time gains divided by the number of allergic subjects (always 15 in this study), and second as the total time gain divided by the number of subjects whose allergy was detected by the framework. These are termed the total time gain (TGT) and the specific time gain (TGS) respectively, and they are equal if perfect sensitivity is obtained. The remaining time gain parameters also have associated total and specific results.

Both metrics are calculated because this allergy detection platform is envisioned as a diagnostic assistance tool. It is therefore appropriate to quantify its effectiveness over the entire set of allergic subjects, but its effectiveness on the subset of correctly classified subjects is also of interest.

5.5.2.2 Doses saved
The time gain can be converted to another metric, which measures the number of doses of the allergen that would not need to be administered if allergy classification were introduced and the OFC were halted when automatic detection classified allergy. This metric is henceforth termed doses saved. It is a measure of the risk reduction that can be achieved when allergy detection is employed, since with fewer doses administered there is a smaller likelihood of allergic reactions presenting.

5.5.2.3 Activation percentage
An additional time gain metric determines the percentage of the allergen that had to be consumed for abnormal HRV features to be detected by the classification routine.
This metric is related to the doses saved metric in a nonlinear manner and therefore also gauges the risk reduction gained by automatic classification. For example, if a subject reacts after consuming five doses of an allergen, but the novelty detection framework detects allergy after three doses, it can be stated retrospectively that only approximately 22.6% of the consumed allergen was required to induce signatures of allergy in the HRV features. This percentage is termed the activation percentage. With N total doses administered to a subject, and s doses saved by the classification, the activation percentage is calculated by

$$ \text{Activation percentage} = \frac{\sum_{i=1}^{N-s} 2^{i-1}}{\sum_{i=1}^{N} 2^{i-1}} \times 100\%, \tag{5.12} $$

in which the i-th dose is weighted by 2^{i−1}. For the example above (N = 5, s = 2), this gives

$$ \text{Activation percentage} = \frac{1 + 2 + 4}{1 + 2 + 4 + 8 + 16} \times 100\% = \frac{7}{31} \times 100\% \approx 22.6\%. $$

5.6 Results
5.6.1 A brief note on the structure of these results
This section first presents the results obtained with an epoch length of 60 seconds. Epoch lengths of 120, 180 and 300 seconds were also investigated; results at these lengths are presented later, before an improved classification routine, which is employed for the remainder of this thesis, is finally discussed. The results are separated in this way to introduce the methodology of performance assessment before discussing the improved routine.

Figure 5.8: A demonstration of early detection of allergy (with Subject 11). The segments at 45, 60 and 80 minutes which fall beneath the threshold were classified as allergy. (Likelihood on a log scale against time in minutes; background and checkup periods shaded; threshold, likelihood and fail times marked.)
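The metrics of Section 5.5 can be checked against Table 5.1 with a short sketch. The lists transcribe the diagnosis, classification and time gain columns of that table. The function names are illustrative, and `activation_percentage` assumes, as the 2^{i−1} terms of Equation (5.12) imply, that each dose doubles the previous one.

```python
def sensitivity_specificity(diagnosis, classification):
    """Equations (5.10) and (5.11); 1 denotes allergic. Returns (Se, Sp) in percent."""
    pairs = list(zip(diagnosis, classification))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn) * 100.0, tn / (tn + fp) * 100.0


def time_gain_summary(time_gains, n_allergic):
    """Return (TGT, TGS); `time_gains` holds None for allergic subjects not detected."""
    detected = [g for g in time_gains if g is not None]
    total = sum(detected)
    return total / n_allergic, total / len(detected)


def activation_percentage(total_doses, doses_saved):
    """Equation (5.12), assuming the i-th dose is proportional to 2**(i - 1)."""
    if not 0 <= doses_saved <= total_doses:
        raise ValueError("doses_saved must lie between 0 and total_doses")
    consumed = sum(2 ** (i - 1) for i in range(1, total_doses - doses_saved + 1))
    administered = sum(2 ** (i - 1) for i in range(1, total_doses + 1))
    return consumed / administered * 100.0


# Columns of Table 5.1 (epoch length of 60 seconds), subjects 1 to 24.
diagnosis = [1] * 15 + [0] * 9
classification = [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1] + [0] * 9
time_gains = [None, 35.0, None, 16.0, 34.0, 12.0, 29.0, 12.0, 25.0, 5.0,
              40.0, 19.0, None, 94.0, 13.0]            # allergic subjects only (minutes)

se, sp = sensitivity_specificity(diagnosis, classification)
tgt, tgs = time_gain_summary(time_gains, n_allergic=15)
example = activation_percentage(total_doses=5, doses_saved=2)  # worked example: 7/31
```

With these transcribed values the sketch gives Se = 80% and Sp = 100%, TGS of about 27.83 minutes, and 700/31 ≈ 22.58% for the worked example of Equation (5.12).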
5.6.2 Results obtained at an epoch length of 60 seconds
The overall results obtained at an epoch length of 60 seconds are presented in Table 5.1. In this table, the first 15 subjects were diagnosed as allergic and the final 9 subjects were diagnosed as non-allergic. The diagnosis column shows the clinical diagnosis for each subject (1 indicates allergic and 0 indicates non-allergic). The classification column presents the output of the classification routine (1 indicates a classification of allergic and 0 a classification of non-allergic). The time gain column presents the time gains obtained; cells marked '-' did not achieve any time gain. The time gain metric is only applicable to subjects who were diagnosed as allergic. It can be seen from Table 5.1 that all of the non-allergic subjects were classified as non-allergic, giving 100% specificity. Subjects 1, 3 and 13, however, were not classified as allergic, which yields a sensitivity of 80% (12/15).

Figure 5.8 shows how allergy was detected before the OFC was terminated. In this figure, the shaded regions represent the background and checkup times. The signal trace is the calculated likelihood, and the segments of the likelihood which satisfied the allergy criteria are indicated with × markers. Likelihoods obtained during checkup times are not considered as allergic. The start of the final checkup period marks the point at which the allergist concluded the OFC.

Table 5.1: Classification results obtained with the novelty detection routine at an epoch length of 60 seconds.

| Subject ID | Diagnosis | Classification | Time gain (minutes) |
|---|---|---|---|
| 1 | 1 | 0 | - |
| 2 | 1 | 1 | 35.0 |
| 3 | 1 | 0 | - |
| 4 | 1 | 1 | 16.0 |
| 5 | 1 | 1 | 34.0 |
| 6 | 1 | 1 | 12.0 |
| 7 | 1 | 1 | 29.0 |
| 8 | 1 | 1 | 12.0 |
| 9 | 1 | 1 | 25.0 |
| 10 | 1 | 1 | 5.0 |
| 11 | 1 | 1 | 40.0 |
| 12 | 1 | 1 | 19.0 |
| 13 | 1 | 0 | - |
| 14 | 1 | 1 | 94.0 |
| 15 | 1 | 1 | 13.0 |
| 16 | 0 | 0 | - |
| 17 | 0 | 0 | - |
| 18 | 0 | 0 | - |
| 19 | 0 | 0 | - |
| 20 | 0 | 0 | - |
| 21 | 0 | 0 | - |
| 22 | 0 | 0 | - |
| 23 | 0 | 0 | - |
| 24 | 0 | 0 | - |
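The time gain metrics of Section 5.5.2 reduce to simple arithmetic, sketched below under stated assumptions. The function names are hypothetical; a missed subject is represented by `None` and contributes zero gain to TGT, and Equation (5.12) is read as weighting dose i by 2^(i-1). Applied to the rounded time gains of Table 5.1, the sketch gives TGT ≈ 22.27 and TGS ≈ 27.83 minutes; the small difference from the reported TGT of 22.26 minutes presumably comes from rounding in the tabulated gains.

```python
def total_and_specific_time_gain(time_gains):
    """Return (TGT, TGS) in the units of the inputs.

    time_gains holds one entry per allergic subject; None marks a subject
    whose allergy was not detected (it contributes zero time gain to TGT).
    """
    detected = [g for g in time_gains if g is not None]
    total = sum(detected)
    tgt = total / len(time_gains)  # divide by all allergic subjects
    tgs = total / len(detected) if detected else float("nan")  # detected subjects only
    return tgt, tgs


def activation_percentage(n_doses, doses_saved):
    """Equation (5.12), assuming dose i holds 2**(i - 1) units of allergen."""
    consumed = sum(2 ** (i - 1) for i in range(1, n_doses - doses_saved + 1))
    administered = sum(2 ** (i - 1) for i in range(1, n_doses + 1))
    return 100.0 * consumed / administered


# Time gains (minutes) of allergic Subjects 1-15 from Table 5.1 (60 s epochs).
table_5_1 = [None, 35.0, None, 16.0, 34.0, 12.0, 29.0, 12.0,
             25.0, 5.0, 40.0, 19.0, None, 94.0, 13.0]

tgt, tgs = total_and_specific_time_gain(table_5_1)
print(f"TGT = {tgt:.2f} min, TGS = {tgs:.2f} min")  # TGT = 22.27 min, TGS = 27.83 min
print(f"{activation_percentage(5, 2):.2f}%")        # 22.58%
```

When no doses are saved the function returns 100%, matching the entries for Subjects 4 and 6 in Table 5.3.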
The challenge shown in Figure 5.8 was concluded by the allergist at approximately the 85th minute, when symptoms of allergy manifested. It can be seen from this example that the system developed here classifies the subject as allergic approximately 40 minutes before the challenge was concluded by the allergist. Over the entire database, a TGT of 22.26 minutes and a TGS of 27.83 minutes were achieved, and two of five doses were saved, which yields an activation percentage of 22.58%.

5.6.3 Overall results
For all epoch lengths, 100% specificity was obtained. Sensitivities of 80% were obtained at epoch lengths of 60 and 180 seconds, and 73% sensitivity was obtained at epoch lengths of 120 and 300 seconds. It should be noted that every subject is classified with different modelling parameters, owing to the leave-one-out (LOO) procedure which was incorporated. For the same reason, different modelling and decision-making parameters are used at the different epoch lengths.

5.6.4 Inconsistent classification at different epoch lengths
Subjects 2, 3, 13, 14 and 15 were not consistently classified across the different epoch lengths.

5.6.4.1 Short-duration signatures of allergy
Figures 5.9a to 5.9d demonstrate this inconsistency with Subject 2. The highlighted regions in these figures represent the background and checkup periods, and the solid trace is the likelihood calculated over the OFC. The allergy criteria were satisfied only once in these figures, at approximately 65 minutes, as marked with an arrow in Figure 5.9a. It can be seen that the departure which classified Subject 2 as allergic in Figure 5.9a is reduced in significance with longer epochs until, in Figures 5.9c and 5.9d, the anomaly is indistinguishable from the rest of the likelihood trace.
With longer epochs, this departure is averaged with 'regular' features, so the likelihoods at these times are less pronounced, which reduces the extent of the departure at these epoch lengths. In effect, longer epochs act as a higher-order moving average filter, which reduces the significance of these deviations relative to the background HRV metrics, while shorter epochs allow the segment to be classified owing to the higher relative importance of novel HRV features.

Figure 5.9: Example demonstrating how the generated likelihood for Subject 2 satisfies the allergy criteria at an epoch length of 60 seconds (subplot a) while failing to do so at epoch lengths of 120, 180 and 300 seconds (subplots b to d). (Each panel: likelihood, log scale, against time; panels (a) to (d) correspond to epoch lengths of 60, 120, 180 and 300 seconds.)

5.6.4.2 Longer signatures of allergy
The opposite phenomenon can also occur, where longer epoch lengths enhance deviations from the background. Figure 5.10 shows how Subject 13 is misclassified as non-allergic at epoch lengths of 60, 120 and 180 seconds. However, it can be seen that with longer epochs certain departures from the background likelihood become more pronounced (e.g. at approximately 70 minutes). With an epoch length of 300 seconds, Subject 13 is correctly classified as allergic at approximately 35 minutes, which is approximately 70 minutes before the challenge was terminated. Nor is this an isolated departure from the background levels: allergy is also detected in the likelihood trace at 70, 90 and 100 minutes.
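The moving-average effect of longer epochs described in Sections 5.6.4.1 and 5.6.4.2 can be illustrated with a toy calculation. This is a sketch under a simplifying assumption: it treats a feature at a longer epoch length as the mean of consecutive 1-minute values (real HRV features are recomputed from the RR intervals of the whole epoch, and measures such as variances or spectral powers do not average linearly). The helper `block_average` is hypothetical.

```python
import numpy as np


def block_average(per_minute, epoch_minutes):
    """Mean of consecutive, non-overlapping blocks of `epoch_minutes` values.

    Values left over at the end that do not fill a whole epoch are dropped.
    """
    x = np.asarray(per_minute, dtype=float)
    n_epochs = len(x) // epoch_minutes
    return x[: n_epochs * epoch_minutes].reshape(n_epochs, epoch_minutes).mean(axis=1)


brief = np.zeros(30)
brief[12] = 1.0            # a one-minute departure from a flat background
sustained = np.zeros(30)
sustained[10:15] = 1.0     # a five-minute departure, aligned with a 5-minute epoch

for minutes in (1, 2, 3, 5):  # i.e. epochs of 60, 120, 180 and 300 s
    print(minutes, block_average(brief, minutes).max(),
          block_average(sustained, minutes).max())

# Averaging also narrows background fluctuations (about 1/sqrt(k) for independent noise).
noise = np.random.default_rng(0).normal(size=3000)
print(block_average(noise, 5).std() / noise.std())
```

The brief departure shrinks to 0.5, about 0.33 and 0.2 at 2-, 3- and 5-minute epochs, while the sustained one keeps its full size; because the background spread also narrows, a sustained departure stands out more clearly at longer epochs, which is consistent with the behaviour described for Subjects 2 and 13.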
5.6.4.3 Tolerance to non-allergic variances
The likelihood traces in Figures 5.9 and 5.10 all cross the threshold at some point in time, e.g. at the very end of every trace for Subject 2, and at approximately 90 minutes for Subject 13. Yet the allergy criteria are satisfied only at certain epoch lengths. This is due to the inclusion of the duration parameter in decision making. Without this parameter, a lower threshold (i.e. a larger departure from the background) would be required to classify allergy, in order to reject the deviations observed in non-allergic subjects.

Figure 5.10: Example demonstrating how the generated likelihood for Subject 13 does not satisfy the allergy criteria at epoch lengths of 60, 120 and 180 seconds (subplots a to c), but the criteria are met at an epoch length of 300 seconds (subplot d). (Each panel: likelihood, log scale, against time; panels (a) to (d) correspond to epoch lengths of 60, 120, 180 and 300 seconds.)

Figure 5.11 shows the likelihood trace for Subject 16, who was diagnosed as non-allergic and was also classified as non-allergic at all epoch lengths. In this figure it can be seen that, with longer epoch lengths, the baseline of the likelihoods departs almost continuously from the background levels of the first checkup. At all epoch lengths (in particular at 50 minutes with an epoch length of 120 seconds in Figure 5.11b), the likelihood crosses the threshold on multiple occasions. However, with the inclusion of the duration parameter, none of these points is flagged as allergic, and the subject is correctly classified as non-allergic by the classification routine.
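The interplay between the threshold and the duration parameter can be sketched as follows. The exact threshold equation is not reproduced in this section, so the form used here is an assumption: a hypothetical multiplicative parameter `k` times the median background likelihood. Allergy is flagged only once the likelihood stays below the threshold for `min_duration` consecutive epochs, and background or checkup epochs (which are never considered allergic) break the run. The function `first_allergy_epoch` is a hypothetical name.

```python
import numpy as np


def first_allergy_epoch(likelihood, background, checkup, k, min_duration):
    """Return the index of the epoch at which allergy is first flagged, or None.

    likelihood   : per-epoch likelihood over the whole recording
    background   : boolean mask of background (reference) epochs
    checkup      : boolean mask of checkup epochs
    k            : hypothetical multiplicative threshold parameter
    min_duration : hypothetical duration parameter, in consecutive epochs
    """
    likelihood = np.asarray(likelihood, dtype=float)
    background = np.asarray(background, dtype=bool)
    checkup = np.asarray(checkup, dtype=bool)

    # Assumed threshold form: a fraction k of the median background likelihood.
    threshold = k * np.median(likelihood[background])

    run = 0
    for i, value in enumerate(likelihood):
        if background[i] or checkup[i] or value >= threshold:
            run = 0  # excluded or in-range epochs break the run
            continue
        run += 1     # one more consecutive sub-threshold epoch
        if run >= min_duration:
            return i
    return None


lik = [0.1] * 5 + [0.1, 0.02, 0.1, 0.02, 0.02, 0.02, 0.1]
bg = [True] * 5 + [False] * 7
no_checkups = [False] * 12
print(first_allergy_epoch(lik, bg, no_checkups, k=0.5, min_duration=1))  # 6
print(first_allergy_epoch(lik, bg, no_checkups, k=0.5, min_duration=3))  # 10
```

With `min_duration=1` the isolated dip at epoch 6 is already flagged, as a pure threshold rule would do; requiring three consecutive epochs ignores it and flags only the sustained run ending at epoch 10, mirroring how the duration parameter suppresses isolated threshold crossings such as those of Subject 16.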
It can be seen in Figure 5.11 that, with longer epoch lengths, the baseline of the likelihood traces becomes less similar to the background recordings. This is because Subject 16 became agitated during the OFC. This agitation was fuelled by the periodic checkups and can be seen directly after the administration of the first sub-portion at approximately 10 minutes. It was noted while the ECG was being recorded, and it demanded a number of extra checkups during the challenge, as the allergists had to verify that it was not a result of allergy. The cause of the agitation is believed to be psychological: subjects who are tested for allergy in this manner have repeatedly been told before the OFC that consuming the allergen may make them sick. Subject 16, for example, was tested against egg in the form of a cake. Cakes are instantly recognisable to a six-year-old child, and repeated exposure to the food they live in fear of (Chapter 1) left the subject continuously agitated, which caused almost all of the likelihoods calculated during the OFC to depart from the 'normal' background. Even though the subject did not react to the food, there is an underlying fear that they might, and this fear is not allayed by the fact that the challenge is closely monitored by allergists.

Without the inclusion of the duration parameter in the classification routine, such non-allergic subjects would increase the value of the multiplicative parameter of the threshold equation needed to satisfy the 100% specificity criterion in training. This would in turn require a larger deviation from the background for allergy classification, and might not permit sensitivities as high as those obtained.
Figure 5.11a may appear to display a substantial departure from the background recording, so it is interesting to note that, even with this variance, the classification routine never classified this subject as allergic, and in all cases correctly classified them as non-allergic. Even in the presence of 'agitation', all non-allergic subjects were classified correctly, which supports the claim that the allergy detection platform demonstrated here is robust in classification, and robust in discriminating between allergic and allergy-like HRV features.

Figure 5.11: Example demonstrating how the generated likelihood for Subject 16 crosses the threshold but does not satisfy the allergy criteria, owing to the inclusion of the duration parameter; at all epoch lengths the subject is correctly classified as non-allergic. (Each panel: likelihood, log scale, against time; panels (a) to (d) correspond to epoch lengths of 60, 120, 180 and 300 seconds.)

Subject 1 was not classified correctly at any epoch length. This is because this challenge was concluded very abruptly, three minutes after it began, and because no signature of allergy detectable by the means discussed here presented in the ECG of this subject. It is therefore believed that perfect sensitivity cannot be achieved on the ECG of the allergy database.

5.6.5 Boosted allergy classification
Here, ensemble-based fusion of the results is performed, which will be shown to yield better classification performance.
5.6.5.1 Sensitivity/specificity
Based on the inconsistent classification results obtained, it appears that the detection of abnormal signatures in the HRV is a dynamic process which is not suited to one particular epoch length. Therefore, to obtain better results, the remainder of this chapter and thesis employs classifier fusion based on logically OR-ing the results obtained at the individual epoch lengths. A similar approach is employed in EEG classification applications, where the decisions of individual channels are OR-ed together. The OR-ing process is justified because it does not violate the subject-independent nature of the model and post-processing parameter selection.

Table 5.2 tabulates the allergy classifications with regard to epoch length. Only Subjects 1 to 15 (i.e. the allergic subjects) are tabulated. Table elements marked '1' indicate that the subject has been correctly classified as allergic, and '0' represents a false negative classification. The final column, 'Logical OR', shows the results obtained by logically OR-ing the classification results obtained at each of the epoch lengths.

Table 5.2: Tabulation of the classification results of the allergic subjects, where '1' represents an allergic classification (TP) and '0' represents a non-allergic classification (FN).

| Subject ID | 60 s | 120 s | 180 s | 300 s | Logical OR |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 1 | 1 |
| 6 | 1 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 | 1 |
| 8 | 1 | 1 | 1 | 1 | 1 |
| 9 | 1 | 1 | 1 | 1 | 1 |
| 10 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
| 12 | 1 | 1 | 1 | 1 | 1 |
| 13 | 0 | 1 | 1 | 1 | 1 |
| 14 | 1 | 0 | 0 | 0 | 1 |
| 15 | 1 | 1 | 1 | 0 | 1 |
| Sensitivity | 80.00% | 73.33% | 80.00% | 73.33% | 93.33% |
| Specificity | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
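The OR-based fusion can be reproduced from the decisions transcribed from Table 5.2. This is a minimal sketch with NumPy; the helper names are hypothetical. The fused time gain is assumed to come from the earliest detection across epoch lengths (time gain being the termination time minus the detection time), which is one reading of "selecting the best time gain" in Section 5.6.5.2.1.

```python
import numpy as np

# Table 5.2 (allergic Subjects 1-15): True = classified allergic, False = missed.
# Columns are the epoch lengths of 60, 120, 180 and 300 seconds.
TABLE_5_2 = np.array([
    [0, 0, 0, 0],  # Subject 1
    [1, 0, 0, 0],  # Subject 2
    [0, 0, 1, 1],  # Subject 3
    [1, 1, 1, 1],  # Subject 4
    [1, 1, 1, 1],  # Subject 5
    [1, 1, 1, 1],  # Subject 6
    [1, 1, 1, 1],  # Subject 7
    [1, 1, 1, 1],  # Subject 8
    [1, 1, 1, 1],  # Subject 9
    [1, 1, 1, 1],  # Subject 10
    [1, 1, 1, 1],  # Subject 11
    [1, 1, 1, 1],  # Subject 12
    [0, 1, 1, 1],  # Subject 13
    [1, 0, 0, 0],  # Subject 14
    [1, 1, 1, 0],  # Subject 15
], dtype=bool)


def fuse_or(decisions):
    """Logical OR across epoch lengths: allergic if any epoch length fired."""
    return np.any(decisions, axis=1)


def sensitivity_percent(decisions):
    """Percentage of allergic subjects classified as allergic."""
    return 100.0 * float(np.mean(decisions))


def fuse_detection_times(times):
    """Earliest detection time across epoch lengths (None = no detection)."""
    fired = [t for t in times if t is not None]
    return min(fired) if fired else None


per_epoch = [round(sensitivity_percent(TABLE_5_2[:, j]), 2) for j in range(4)]
fused = round(sensitivity_percent(fuse_or(TABLE_5_2)), 2)
print(per_epoch, fused)  # [80.0, 73.33, 80.0, 73.33] 93.33
```

Because the fused decision only ever adds detections, it cannot lower sensitivity; it can only lower specificity, which is why the 100% specificity obtained at every individual epoch length matters for this choice.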
Among the individual epoch lengths, the highest sensitivity was 80% (12/15 correct classifications), obtained at 60 and 180 seconds, but with the logical OR-ing process a sensitivity of 93.33% (14/15 correct classifications) is obtained, with only one false negative classification.

5.6.5.2 Time gain parameters
Table 5.3 presents the set of time gain results obtained by the automatic classification of allergy when employing classifier fusion. As no false positive classifications were obtained, this table presents only the allergic subjects. It presents the time gain, doses saved and activation percentage values which were obtained.

Table 5.3: Classification result, time gain, doses saved and activation percentages obtained by the classification routine. The results in this table were obtained by fusing the results obtained for the individual epoch lengths.

| Subject ID | Classification | Time gain (minutes) | Doses saved | Activation percentage (%) |
|---|---|---|---|---|
| 1 | 0 | - | - | - |
| 2 | 1 | 35 | 2 | 22.58 |
| 3 | 1 | 55 | 3 | 9.68 |
| 4 | 1 | 17 | 0 | 100.0 |
| 5 | 1 | 39 | 2 | 22.58 |
| 6 | 1 | 20 | 0 | 100.0 |
| 7 | 1 | 30 | 1 | 42.86 |
| 8 | 1 | 48 | 2 | 22.58 |
| 9 | 1 | 27 | 1 | 33.33 |
| 10 | 1 | 32 | 2 | 14.29 |
| 11 | 1 | 42 | 2 | 22.58 |
| 12 | 1 | 22 | 1 | 33.33 |
| 13 | 1 | 73 | 4 | 3.23 |
| 14 | 1 | 94 | 4 | 3.23 |
| 15 | 1 | 14 | 1 | 48.39 |
| Total µ | 93.33% | 36.53 | 1.66 | 38.58 |
| Total σ | - | 23.91 | 1.30 | 34.26 |
| Specific µ | 93.33% | 39.15 | 1.78 | 34.18 |
| Specific σ | - | 22.50 | 1.25 | 30.88 |

5.6.5.2.1 TGT and TGS
Column three of Table 5.3 tabulates the time gain results obtained for the allergic subjects. In this column the '-' symbol marks cases where the classification routine failed to classify allergy. The mean and standard deviation of the total and specific time gains (TGT and TGS respectively) are also presented. The logical OR-ing classification process benefits from selecting the best time gain from the set of time gains obtained at every epoch length.
Subjects 4 to 12 are classified as allergic at all epoch lengths, and of these, allergy was classified at approximately the same time in every instance for Subjects 4, 5, 7, 9, 11 and 12. However, for Subjects 6, 8 and 10, the OR-ing process adds approximately 60 minutes of time gain in total compared with the results obtained at the individual epoch lengths, which contributes to overall time gain metrics greater than those of the best individual epoch length. For example, between the 60-second results and the fused results, TGT increased from 22.26 to 36.53 minutes and TGS from 27.83 to 39.15 minutes. The significance of obtaining high time gain metrics is that emergency rescue medication could be administered sooner, which could introduce the possibility of reaction-free OFCs for subjects who would suffer a reaction under the current OFC.

5.6.5.2.2 Portions saved
Column four of Table 5.3 presents the number of portions that would have been saved had allergy classification been employed during the OFC recordings. Approximately 1.8 portions are saved when allergy is classified. In three cases, however (Subjects 1, 4 and 6), no portions were saved. For Subjects 4 and 6 allergy was classified, but because no additional portions of the allergen were administered between the classification of allergy and the diagnosis of allergy, the full amount of the food was required for detection.

5.6.5.2.3 Activation percentage
The final time gain parameter calculated is the activation percentage, shown in the fifth column of Table 5.3. The activation percentage is the percentage of the allergen which had to be consumed before signatures of allergy in the HRV features were detected by the allergy classification framework. In Table 5.3, values of 100% indicate that the entire dose of the allergen was required.
For Subjects 1, 4 and 6, as no doses were saved, an activation percentage of 100% was obtained. However, for Subjects 4 and 6, time gains of 17 and 20 minutes were still obtained. Overall, approximately 40% of the allergen administered to the subjects was required for allergic classification based on the HRV features. This figure is reduced to approximately 34% when considering only the subjects who were correctly classified as allergic. This value indicates that, when machine-based allergy classification is achieved, only about one third of the allergen required for a diagnosis of allergy needs to be consumed for classification of allergy (a reduction in exposure to the problem food of roughly two thirds).

5.7 Discussion
A close inspection of several subjects provides additional insight into the robustness of the allergy classification system.

5.7.1 Specificity of OFC classification
The importance of obtaining very high specificity was discussed previously. Misclassifying non-allergic subjects as allergic would have an indefinite negative effect on their quality of life (Sicherer et al., 2006; Cox et al., 2008). As this would be an unacceptable consequence, parameters were selected such that no non-allergic subject in the training data was misclassified as allergic. This characteristic was preserved with the unseen testing subjects. This is a very significant result, as the testing data remains unavailable during parameter selection (see Figure 5.6). Indeed, because perfect specificity was obtained in this section, and because the parameters were selected in a subject-independent manner, it can be stated that a classification of allergy is equivalent to a diagnosis of allergy.

5.7.2 Robust classification
The heart rate of Subject 20 presented with frequent isolated arrhythmia events.
This is a condition in which the heart rate changes from its resting rate to a much higher value and then relaxes back to its resting value (Clarke et al., 1976). Figure 5.12 shows an example of the arrhythmia for this subject. Figure 5.12a shows the raw ECG signal, with the QRS complexes marked with ×, while Figure 5.12b depicts the beat-to-beat heart rate derived from the QRS points. The resting heart rate can be seen to be approximately 100 BPM before the arrhythmia incidents at 2,758 to 2,760 seconds. The heart rate then falls by more than half, to approximately 34 BPM at 2,760 seconds, because no QRS complex has occurred for some time, before rising to approximately 250 BPM soon after. The arrhythmia incidents occur between 2,758 and 2,762 seconds. Subject 20 experienced these arrhythmia events on a number of occasions during their food challenge. However, even in light of these abnormal heart beats, which in some instances occurred over five times per minute, Subject 20 was correctly classified as non-allergic at all epoch lengths.

It was previously discussed how Subject 16 presented with unusual HRV features due to the agitation that the subject felt throughout the challenge. Yet this subject was also correctly classified as non-allergic by the classification routine presented here at all epoch lengths. The fact that these two non-allergic subjects, who presented with abnormal HRV features, were both correctly classified as non-allergic in all routines investigated shows that the classification routine presented in this chapter is robust in discriminating between allergic and non-allergic subjects. Indeed, the system is robust in discriminating between allergic and allergic-like signatures (i.e. HRV signatures which are affected by agitation and arrhythmia).

Figure 5.12: Example of arrhythmia on the ECG trace (a) and the effect this has on the heart rate (b) for Subject 20. (a) Raw ECG trace (amplitude in µV against time in seconds, 2,756 to 2,764 s) presenting with arrhythmia between 2,758 and 2,762 seconds. (b) The calculated beat-to-beat heart rate (BPM against time in seconds) and the effect of the arrhythmia beats on it.

In HRV analysis it is very common to pre-process the extracted QRS points and remove arrhythmic and ectopic beats. This was not performed here for a number of reasons, principally because later chapters employ routines which detect QRS points automatically, and under the influence of artefacts these algorithms can mislabel non-QRS points as QRS points. Therefore, by allowing arrhythmic and ectopic beats in this training set, the developed models should be more tolerant to artefacts (see Chapters 6 and 7).

5.7.3 Parameter selection
Many factors can influence parameter selection. In an attempt to achieve optimal results, post-processing parameters were selected to achieve 100% specificity and the maximum sensitivity on the training data. Yet the chosen values lie in a region where misclassifications are possible. This can be seen in Figure 5.11, where individual likelihood values cross the threshold at regular intervals; it is the selection of appropriate duration parameters that provides the previously stated robustness. This is evidenced by the fact that perfect specificity was obtained at all epoch lengths and that very high sensitivity was also obtained.

5.7.3.1 Importance of correct parameter selection
Subjects 1, 2 and 3 presented with HRV features which did not vary greatly due to their allergic reactions, and because of this, the selection of PCA and GMM parameters is crucial for appropriate emphasis of novel regions.
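To make concrete what the PCA and GMM parameters control, the following is a minimal sketch of likelihood-based novelty scoring with scikit-learn: background epochs are standardised, projected with PCA keeping a fixed fraction of the variance (a float `n_components`), and modelled with a `GaussianMixture` whose `score_samples` gives a per-epoch log-likelihood. The standardisation step, the synthetic data and the small GMM order are assumptions for illustration; the thesis's feature set, training split and likelihood normalisation are not reproduced (Figure 5.13 refers to 80% retained variance and a GMM of order 32). The helper names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def fit_background_model(background, variance_kept=0.8, gmm_order=2, seed=0):
    """Fit scaler -> PCA -> GMM on background epochs (rows) of HRV features."""
    scaler = StandardScaler().fit(background)
    z = scaler.transform(background)
    # A float n_components keeps just enough components to explain that variance fraction.
    pca = PCA(n_components=variance_kept, svd_solver="full").fit(z)
    gmm = GaussianMixture(n_components=gmm_order, random_state=seed)
    gmm.fit(pca.transform(z))
    return scaler, pca, gmm


def epoch_log_likelihood(model, features):
    """Log-likelihood of each epoch under the background model (lower = more novel)."""
    scaler, pca, gmm = model
    return gmm.score_samples(pca.transform(scaler.transform(features)))


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 6))     # synthetic stand-in for background HRV features
typical = rng.normal(size=(20, 6))         # epochs resembling the background
novel = rng.normal(loc=3.0, size=(20, 6))  # epochs with shifted features

model = fit_background_model(background)
print(epoch_log_likelihood(model, typical).mean(),
      epoch_log_likelihood(model, novel).mean())
```

Changing `variance_kept` or `gmm_order` changes which directions of feature space are modelled and how sharply, so a weak departure, such as that of Subject 2, can be emphasised by one setting and hidden by another.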
Figure 5.13 shows the likelihood of Subject 2 with PCA and GMM parameters selected manually, to illustrate how a poor choice of modelling parameters can affect the range of the computed likelihoods. The extent of the anomaly which classified the subject as allergic at the 65th minute in Figure 5.9a is not equalled in Figure 5.13. Indeed, the deviation shown in this figure is no larger than the variation of the likelihoods obtained in the background. This example illustrates the importance of parameter selection and shows that allergy is non-trivial to detect through statistical HRV feature analysis, while also demonstrating the data-driven nature of the classification procedure.

Figure 5.13: The likelihood series computed for Subject 2 with manually selected parameters, which does not diverge from the background level significantly enough to classify allergy. PCA preserved 80% of the feature variance, which was modelled with a GMM of order 32 at an epoch length of 60 seconds. (Plot: likelihood, log scale, against time in minutes.)

5.7.3.2 Alternative parameter selection
Appendix A presents a study of alternative parameter selection routines to those employed in this chapter. It shows that, while the means by which parameters are selected is important, alternative methods can also achieve 93% sensitivity and 100% specificity. However, the selection routine in this chapter is time gain aware (i.e. it selects the parameters which obtained the best time gain results on the training data), whereas those discussed in Appendix A are not. This is reflected in the results obtained: in all cases, the parameter selection routine described in this chapter achieved superior time gain, doses saved and activation percentage results.
This provides evidence that the parameter selection and cost function routines discussed in this chapter are very well suited to the classification problem.

5.7.4 Role of classification in OFCs
The classification routine presented here is well suited as a diagnostic assistance tool. This is because the allergist cannot be replaced: remote monitoring of physiological signals cannot administer allergens or antihistamines when required. The allergist will therefore always be present during the OFC, and if this classification system is used in conjunction with the standard OFC, machine learning algorithms can greatly assist. Excellent time gain metrics were obtained: on average, approximately 37 minutes (a TGT of 36.53 minutes) would have been saved by this classification routine. This time could be used to good advantage for the administration of antihistamines, which could reduce, and in some cases possibly eliminate, allergic symptoms and reactions.

For machine-based allergy classification, false negative classifications (i.e. classifying an allergic subject as non-allergic) can be tolerated, as these challenges reduce to the standard OFC, which is the current clinical state of the art. The classification routine here should complement the diagnosis of allergy and should be used alongside the allergists. Having obtained 100% specificity in a subject-independent manner, machine-based allergy detection can introduce many improvements to the clinical diagnosis of allergy. It is worth noting that allergists diagnose allergy in these patients with access to more subjective signals and information sources, such as mood and temper, which were not available for this automated analysis. These are not employed for machine learning purposes because they are difficult to monitor objectively and non-invasively.
Non-invasive monitoring is important for allergy detection, as the introduction of discomfort might aggravate the subjects and could increase the number of non-allergic subjects presenting with likelihood traces similar to that of Subject 16. Clinically, other signals such as blood pressure, blood oxygen saturation and temperature are the last physiological metrics to change as a result of allergy. As the goal of this research is to detect allergy as early as possible, only the ECG is recorded. However, the system developed here provides additional insight into the temporal nature of subjects' states of allergy during OFCs, and even without the extra data to which the allergists have access, excellent results were obtained.

CHAPTER 6
Automatic QRS detection

6.1 Introduction
The features extracted for classification in Chapter 5 were derived from manually annotated QRS points. These were used in order to definitively assess whether it was possible to identify allergic subjects through the analysis of HRV features, which had not been done before. It was discovered that allergy affects HRV in a manner which allows classification, and therefore automated QRS detection is assessed here before fully automated allergy classification is assessed in Chapter 7.

6.2 QRS detection
Software-based QRS detection has been an important research topic for more than 40 years (Kohler et al., 2002). At the core of QRS detection are algorithms designed specifically to enhance the QRS complex of the ECG while diminishing artefacts and the other features of the heart beat (i.e. the P- and T-waves). All QRS detection algorithms require a thresholding stage, and the underlying algorithms draw on a wide range of DSP disciplines.
The rich set of algorithms which can be employed reflects how technological ability has evolved over the years in which QRS detection has been researched. Popular methods include derivative processing (Okada, 1979; Fraden and Neuman, 1980; Ahlstrom and Tompkins, 1983; Arzeno et al., 2008); digital filter banks and wavelets (Gyaw and Ray, 1994; Di Virgilio et al., 1995; Bahoura et al., 1997; Afonso et al., 1999; Chen et al., 2006; Strang and Nguyen, 1996); template matching (Dobbs et al., 1984); adaptive filtering architectures (Kyrkos et al., 1987; Hamilton and Tompkins, 1988; Thakor and Zhu, 1991); artificial neural networks (Xue et al., 1992; Vijaya et al., 1998; Rajendra Acharya et al., 2003); and transformation methods (Bolton and Westphal, 1981a, 1984; Benitez et al., 2000, 2001), and new methods are continually being investigated. So many different areas of signal processing have been employed for QRS detection because it is very difficult to generalise one algorithm to the complete set of ECG waveform shapes. The number of existing algorithms also indicates that it is not appropriate to cater only for 'well-behaved' ECG, because the shape of the ECG and the resulting HRV features can be affected by age (Nunan et al., 2010; Aziz et al., 2012), gender (Antelmi et al., 2004), heart disease (Rajendra Acharya et al., 2006), physical fitness (Hamer and Steptoe, 2007) and even coffee consumption (Monda et al., 2009). These differences in ECG shape can result in missed or extra QRS points, both of which cause the HRV features to be reported inaccurately.

6.2.1 QRS validation
Each of the previously mentioned algorithms has its own advantages, but in every case it is necessary to validate the QRS points against hard limits which generally cannot be exceeded; e.g. the maximum attainable heart rate is age-dependent and is surpassed only in very exceptional circumstances.
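A hard heart-rate limit of the kind just described can be enforced as a post-processing step on candidate detections. This is a minimal sketch, not a published validation routine: the function name `enforce_max_heart_rate` and the `max_hr_bpm` parameter are hypothetical, and the 200 BPM used below is purely illustrative; in practice an age-dependent value would be chosen, bearing in mind that Subject 20 in Chapter 5 showed beat-to-beat rates near 250 BPM during arrhythmia. When two candidates are closer than the minimum RR interval implied by the limit, only the larger peak is kept.

```python
def enforce_max_heart_rate(detections, max_hr_bpm):
    """Remove detections that would imply a heart rate above max_hr_bpm.

    detections : iterable of (time_s, peak_amplitude) pairs
    Whenever two detections are closer than the minimum RR interval allowed
    by max_hr_bpm, only the one with the larger amplitude is kept.
    """
    min_rr_s = 60.0 / max_hr_bpm
    kept = []
    for t, amp in sorted(detections):
        if kept and t - kept[-1][0] < min_rr_s:
            if amp > kept[-1][1]:
                kept[-1] = (t, amp)  # the larger peak replaces its neighbour
            continue
        kept.append((t, amp))
    return kept


beats = [(0.0, 1.0), (0.1, 0.3), (0.8, 1.1), (0.9, 1.5), (1.6, 1.0)]
print(enforce_max_heart_rate(beats, max_hr_bpm=200))
# [(0.0, 1.0), (0.9, 1.5), (1.6, 1.0)]
```

Replacing a kept detection with a later, larger neighbour can never bring it closer to the detection before it, so the output always respects the minimum RR interval.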
For this reason, routines have been developed to validate the QRS points returned by QRS detection algorithms. A set of rules was defined by Hamilton (2002), and these are reproduced here:

• Ignore all peaks that precede or follow larger peaks by less than 200 ms.
• If a peak occurs, check to see whether the raw signal contained both positive and negative slopes. If not, the peak represents a baseline shift.
• If the peak occurred within 360 ms of a previous detection check to see if the maximum derivative in the raw signal was at least half the maximum derivative of the previous detection. If not, the peak is assumed to be a T-wave.
• If the peak is larger than the detection threshold call it a QRS complex, otherwise call it noise.
• If no QRS has been detected within 1.5 RR intervals, there was a peak that was larger than half the detection threshold, and the peak followed the preceding detection by at least 360 ms, classify that peak as a QRS complex.

These rules can increase the accuracy of the QRS detection performed.

6.2.2 Validation databases
Typically, QRS detection algorithms are validated on databases which provide an abundance of healthy and unhealthy ECG shapes. This strategy allows objective comparisons between QRS detection algorithms with regard to detection accuracy and computational complexity on challenging data. If QRS detection were assessed only on data recorded from healthy volunteers, performance on arrhythmias and other cardiovascular defects would not be quantified, and the reported accuracy would not reflect performance on challenging ECG. One of the more popular databases on which QRS detection is assessed is the Massachusetts Institute of Technology Beth Israel Hospital (MIT-BIH) database (Mark et al., 1982; Goldberger et al., 2000; Moody and Mark, 2001). This database consists of 48 half-hour, two-channel ambulatory ECG recordings, which were recorded between 1975 and 1979.
Twenty-three of the recordings were selected at random from a set of 4,000 recordings of in- and out-patients. The remainder were selected from the same set to include clinically significant arrhythmias which were not well represented by the initial selection. Twice-validated expert annotations accompany this database.
In this chapter, QRS detection results are discussed for both the MIT-BIH and allergy databases, so that the assessment of QRS detection can be verified. Individual cases in the MIT-BIH arrhythmia database are labelled as patient records, while individual cases in the allergy database are labelled as subjects. From the medical perspective, this is because participants in the allergy study were not admitted as patients, whereas those in the MIT-BIH database were. The convention also distinguishes the two databases without naming them each time. References to patients refer to persons from the MIT-BIH database, and references to subjects refer to persons from the allergy database.
6.2.3 Sensitivity and positive predictivity
The sensitivity of QRS detection measures the true positive rate of QRS detection and is defined by
$$Se = \frac{TP}{TP + FN} \times 100\%, \tag{6.1}$$
where TP and FN are as described previously. The specificity was employed in the previous chapter to measure the true negative classification rate. It is an inappropriate measurement for QRS detection because there is no well-defined set of true negatives: every non-QRS sample would count as one. The precision, or positive predictivity (+P), of the automatically extracted QRS points is therefore computed by
$$+P = \frac{TP}{TP + FP} \times 100\%, \tag{6.2}$$
where FP is also as described previously.
6.2.4 Good detection window
In some cases the location of the R-wave can be ambiguous, in particular if the patient suffers from cardiovascular disease or arrhythmia.
In these cases, an automatic QRS detection algorithm might identify the heart beat without localising the point on the apex of the R-wave. Friesen et al. (1990) and Ganong and Ganong (2005) state that the duration of the QRS complex is approximately 88 ms. In this work, therefore, a detected QRS point that lies within this range of the annotated point is flagged as a true positive, because the QRS complex has been identified. A detected point that lies outside this range of the annotated point is flagged as a false positive, because the QRS complex was not identified.
6.2.5 Feature accuracy
Later sections show that under certain conditions the accuracy of QRS detection is reported pessimistically. This happens because the detection algorithms report QRS times which lie outside the 88 ms ‘good detection window’, and these points are therefore counted as false positives. Although these points lie outside the allowed window, the beats themselves are still detected at regular intervals; they are simply localised incorrectly on the S- or T-wave. To assess the effect of this, HRV feature metrics are computed and compared with the same HRV features obtained from the manual QRS annotations. It will be shown that when incorrect QRS localisation occurs, beats are consistently localised on the same part of the waveform. For example, if a QRS point is first located on the T-wave, it will generally be located on the T-wave for the rest of that recording. Insight into the effect of poor localisation can therefore be obtained by computing the normalised differences between the HRV features calculated from the manual and automatic sets of QRS points. The difference is computed with the percentage RMS difference (PRD) metric employed in Chapter 3. It is reproduced below, modified for the HRV case:
$$PRD = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\frac{f_m(n) - f_a(n)}{f_a(n)}\right)^{2}} \times 100\%, \tag{6.3}$$
where $f_m$ is the feature vector obtained from the manual annotations and $f_a$ is the vector obtained from the automatic annotations. Both vectors have the same length, N.
6.2.6 Box-plots
Box-plots are employed to present a number of statistical parameters of a distribution graphically. Figure 6.1 shows an example box-plot (upper) and its relationship to a normal distribution (lower). The box-plot presents five points: the median, the lower and upper quartiles, and the smallest and largest values lying within 1.5 inter-quartile ranges of the lower and upper quartiles respectively.
[Figure 6.1 labels: Ql − 1.5 × IQR, Ql, m, Qu, Qu + 1.5 × IQR; the IQR spans Ql to Qu.]
Figure 6.1: Relationship between a box-plot, and quartile ranges with a normal distribution. The locations marked Ql and Qu are the lower and upper quartiles respectively, and the median is marked as m.
The lower and upper quartiles form the left and right edges of the rectangle in Figure 6.1, and the distance between them is termed the inter-quartile range (IQR). The vertical line within the rectangle is the median of the distribution (also termed the 50th percentile) and is labelled m in this figure. In this plot the median equals the mean of the distribution shown; this will not be the case for general data. To the left and right of the rectangle, dashed lines (whiskers) extend to the smallest and largest values lying within 1.5 IQRs below the lower quartile and above the upper quartile. This region is highlighted in the lighter shade in Figure 6.1. The box-plot is a visual aid which makes the spread of the data easy to see. In Figure 6.1 the box-plot is drawn horizontally to facilitate comparison with a normal distribution, but box-plots are typically presented vertically.
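The following is a minimal Python sketch of how the PRD of Equation (6.3) and the five box-plot values described above could be computed together. It is not the implementation used in this work. Quartile conventions differ between packages; the choice of the `inclusive` method of Python's `statistics.quantiles` is an assumption, as are the function names and the example PRD values.

```python
import math
import statistics


def prd(f_manual, f_auto):
    """Equation (6.3): RMS of the relative feature differences, in percent.

    Each term is normalised by the automatically derived value f_a(n).
    """
    if not f_manual or len(f_manual) != len(f_auto):
        raise ValueError("feature vectors must be non-empty and of equal length N")
    total = sum(((fm - fa) / fa) ** 2 for fm, fa in zip(f_manual, f_auto))
    return math.sqrt(total / len(f_manual)) * 100.0


def box_plot_summary(values):
    """Median, quartiles, IQR and whisker ends (most extreme values within 1.5 IQR)."""
    # Assumption: the 'inclusive' quartile convention; other tools may differ slightly
    q_l, median, q_u = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q_u - q_l
    low_fence, high_fence = q_l - 1.5 * iqr, q_u + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "Ql": q_l,
        "median": median,
        "Qu": q_u,
        "IQR": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical per-patient PRD values (%), one per record
prds = [0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 2.8]
print(box_plot_summary(prds))
```

In this example the 2.8% record lies beyond the upper fence. It would therefore be drawn as an outlier rather than extending the whisker.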
The box-plot is employed in this chapter because the IQR, median, etc. can be read from it immediately, whereas a density plot such as the normal distribution in Figure 6.1 would require the viewer to estimate these values. It also helps in the evaluation of median-based statistics, which are a useful means of assessing the effectiveness of algorithms such as QRS detectors.
6.3 Choice of QRS detector
Two QRS detectors were investigated for this work. The first QRS detection algorithm chosen employs the Hilbert transform (Hilbert, 1912). It was chosen because the Hilbert transform method of QRS detection has been used by many researchers for over 30 years with very good results (Bolton and Westphal, 1981b; Nygårds and Sörnmo, 1983; Bolton and Westphal, 1985). It has also recently been employed in a robust QRS detector which reduces the effect of baseline drift and of muscular and motion artefacts (Benitez et al., 2001). These traits are necessary during OFCs, because the subjects are free to move around on the bed they lie on. The second QRS detector selected was presented by Afonso et al. (1999). This method was selected because it uses filter banks, which can be thought of as analogous to wavelet transforms, to decompose the signal into sub-bands of uniform bandwidth. Wavelet and filter-bank based signal processing have been shown to be very useful in DSP applications, so this approach was also investigated. Both algorithms have been reported to achieve over 99% sensitivity and positive predictivity on the MIT-BIH database. The next two sections discuss their procedures and algorithmic properties. Where possible, the difference equations used by the QRS detection algorithms will be provided.
It is not feasible to provide these in every case, however, because the algorithms often require filters whose order is approximately equal to the sampling rate. In such cases, the digital filter design tools of Matlab® can readily be used to generate the required difference equations.
6.4 Hilbert transform based QRS detection
6.4.1 Theory of Hilbert transform
The first QRS detection algorithm investigated uses the Hilbert transform (Hilbert, 1912). The Hilbert transform is defined by
$$\hat{x}(t) = \mathcal{H}\left[x(t)\right] \tag{6.4}$$
$$= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau}\, d\tau, \tag{6.5}$$
where $\mathcal{H}$ is the Hilbert transform operator and $\hat{x}(t)$ is the Hilbert transform of the signal x(t). The Hilbert transform is a linear, time-invariant operation on x(t). Equation (6.5) can therefore be rewritten as the convolution between the signal and $\frac{1}{\pi t}$, i.e.
$$\hat{x}(t) = x(t) * \frac{1}{\pi t}. \tag{6.6}$$
Convolution can be performed efficiently in the frequency domain. The convolution theorem states that the Fourier transform (FT) of a convolution is the pointwise product of the individual Fourier transforms (Katznelson, 2004). Using this, the Hilbert transform can be rewritten as
$$\mathcal{F}\{\hat{x}(t)\} = \frac{1}{\pi}\,\mathcal{F}\left\{\frac{1}{t}\right\}\mathcal{F}\{x(t)\}. \tag{6.7}$$
The Fourier transform of $\frac{1}{t}$ simplifies to
$$\mathcal{F}\left\{\frac{1}{t}\right\} = \int_{-\infty}^{\infty} \frac{1}{x}\, e^{-j2\pi f x}\, dx \tag{6.8}$$
$$= -j\pi\,\mathrm{sign}(f), \tag{6.9}$$
where sign(f) = 1 for f > 0, 0 when f = 0, and −1 when f < 0. With this result, Equation (6.7) is rewritten as
$$\mathcal{F}\{\hat{x}(t)\} = -j\,\mathrm{sign}(f)\,\mathcal{F}\{x(t)\}. \tag{6.10}$$
Taking the inverse Fourier transform of Equation (6.10) yields the real signal $\hat{x}(t)$. This forms the imaginary part of the complex analytic signal $x(t) + j\hat{x}(t)$, illustrated in Figure 6.2. The envelope of this complex signal is used in many applications and is calculated by
$$B(t) = \sqrt{x^2(t) + \hat{x}^2(t)}, \tag{6.11}$$
and the instantaneous phase angle can be computed by
$$\theta(t) = \arctan\left(\frac{\hat{x}(t)}{x(t)}\right). \tag{6.12}$$
[Figure 6.2 labels: real axis x, imaginary axis jy; the vector with components x(t) and x̂(t) has magnitude B(t) and angle θ(t).]
Figure 6.2: The real and imaginary components resulting from the Hilbert Transform of the ECG.
6.4.2 Method of QRS detection with Hilbert transform
The Hilbert transform was defined above in the continuous domain. A discrete-domain equivalent can be obtained with the FFT, which was discussed in Chapter 4. The kernel of the Hilbert transform, 1/(πt), is an odd function. A locally symmetric peak in the signal is therefore mapped to an approximately antisymmetric waveform centred on that peak. In other words, whenever the slope of the signal changes sign (from positive to negative, or from negative to positive), the Hilbert transformed signal approximately crosses the horizontal axis. This property is favourable for QRS detection, because the R-wave of the complex is characterised by such a change of slope. The Hilbert transform was first used for QRS detection by Bolton and Westphal (1981a; 1981b; 1984; 1985). It has since been employed by a variety of ECG researchers (Nygårds and Sörnmo, 1983; Mietus et al., 2000; Benitez et al., 2000, 2001; de Oliveira and Cortez, 2004; Arzeno et al., 2006). For computational reasons, however, some of the early publications approximated the Hilbert transform by a band-limited finite impulse response (FIR) filter. They also approximated the envelope by
$$\hat{B}(n) = |x(n)| + |\hat{x}_{approx}(n)|. \tag{6.13}$$
[Figure 6.3 stages: ECG → Filtering → d/dt → H → Envelope → Peak Detection → Subset Windowing → QRS Points.]
Figure 6.3: The flowchart for QRS detection from the Hilbert Transform.
The original work required optimisation for computational and power reasons. In this work, however, the Hilbert transform was not approximated and was computed as outlined in Section 6.4.1. The pipeline for QRS identification is summarised in Figure 6.3. The raw ECG signal is first filtered by a high-order Kaiser-Bessel band-pass FIR filter. Its pass-band is set by the bandwidth of typical QRS complexes, i.e. 8–20 Hz (Dinh et al., 2001; Kohler et al., 2002; Schlindwein et al., 2006). This frequency band reduces the effect of muscular artefacts while pre-emphasising the QRS complex. After filtering, the ECG signal is differentiated by
$$\frac{d}{dn}x(n) = \frac{1}{2\Delta t}\left[x(n+1) - x(n-1)\right], \tag{6.14}$$
where x(n) is the nth sample and Δt is the sampling period. The reason for choosing a non-causal filter was not discussed, but it is a popular methodology in QRS detection algorithms (Kohler et al., 2002).
[Figure 6.4 panels (time axis 0–5 s): (a) Raw ECG trace (x(t)); (b) Derivative of ECG trace after filtering (d/dt x(t)); (c) Hilbert transform of derivative of the ECG (H(d/dt x(t))); (d) Result of enveloping of the Hilbert transform (B(H(d/dt x(t)))).]
Figure 6.4: The stages employed by the Hilbert Transform QRS detection algorithm (the ECG data was obtained from Patient 113 in the MIT-BIH Database).
Figure 6.4 shows the progression of the ECG trace from recording to enveloping. The original ECG (Figure 6.4a) is filtered, and the derivative of the result is calculated (Figure 6.4b). The Hilbert transform is then performed on this data (Figure 6.4c), and the envelope is obtained by Equation (6.11) (Figure 6.4d). These figures show how the algorithm reduces the effect of high-amplitude T-waves and accentuates the QRS complex, while other aspects of the ECG are attenuated.
6.4.3 Beat identification
Higher values of the envelope indicate a higher probability of a true QRS peak.
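The envelope computation of Section 6.4.2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this work. The discrete Hilbert transform applies Equation (6.10) bin by bin: the DC bin (and, for an even length, the Nyquist bin) is zeroed, and positive- and negative-frequency bins are multiplied by −j and +j respectively. A direct O(N²) DFT stands in for the FFT to keep the sketch dependency-free. The 8–20 Hz Kaiser-Bessel band-pass stage is omitted, so the input is assumed to be already filtered.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert(x):
    """Discrete analogue of Equation (6.10): X(k) -> -j * sign(f_k) * X(k)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or 2 * k == n:
            spectrum[k] = 0          # DC (sign(0) = 0) and, for even n, the Nyquist bin
        elif 2 * k < n:
            spectrum[k] *= -1j       # positive frequencies
        else:
            spectrum[k] *= 1j        # negative frequencies (bins above n/2)
    return [v.real for v in dft(spectrum, inverse=True)]


def central_difference(x, dt):
    """Equation (6.14); the two end samples have no neighbour and are set to zero."""
    d = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        d[i] = (x[i + 1] - x[i - 1]) / (2.0 * dt)
    return d


def envelope(x):
    """Equation (6.11): B(n) = sqrt(x(n)^2 + xhat(n)^2)."""
    return [math.hypot(a, b) for a, b in zip(x, hilbert(x))]


# Usage (assumption: a synthetic, already band-limited QRS-like pulse centred on sample 32)
fs = 256.0
pulse = [math.exp(-((t - 32) ** 2) / (2 * 3.0 ** 2)) for t in range(64)]
env = envelope(central_difference(pulse, 1.0 / fs))
print(max(range(len(env)), key=env.__getitem__))  # index of the envelope maximum
```

The derivative of the symmetric pulse crosses zero at its centre. The envelope of its Hilbert pair nevertheless peaks there, which is the behaviour the beat-identification stage relies on.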
The P– and T-waves typically occupy a lower frequency band than the QRS complex. Even with T-waves of amplitude comparable to that of the QRS complex (Figure 6.4a), the envelope of the QRS complexes is therefore significantly more pronounced than that of the P– and T-waves (Figure 6.4d). Adaptive thresholding is used to identify QRS complexes, as described by Benitez et al. (2001). Thresholds are calculated dynamically from estimates of signal noise. Here, noise is any non-QRS component of the ECG, approximated by the RMS value of the envelope over a 1024-sample time window. If the RMS value at a particular instant exceeds 18% of the maximum value in the same window, the noise level is considered high. A threshold of 39% of the window maximum is then selected, and points which exceed it are selected as QRS points. If the noise estimate is below 18% of the maximum value, the threshold is set lower, to 1.6 times the RMS noise estimate. If two peaks are detected within 200 ms of each other, one of them is eliminated after reviewing the amplitudes of both peaks and their positions relative to the previous QRS peak. By this process, QRS detection is obtained through the use of the Hilbert transform. In subsequent sections this algorithm is referred to as the Hilbert transform algorithm. Note, however, that this name refers to the underlying algorithm and not to the author of the publication.
[Figure 6.5 structure: x(n) → analysis filters H1(z), H2(z), …, HM(z), each followed by down-sampling (↓) → sub-band signals ω1(n), ω2(n), …, ωM(n) → up-sampling (↑) → synthesis filters F1(z), F2(z), …, FM(z) → summed to y(n).]
Figure 6.5: The generic filter banks flow chart incorporating both bandpass and synthesis filters. The ↓ and ↑ symbols represent down- and up-sampling respectively.
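The beat-identification rules of Section 6.4.3 can be sketched in Python as shown below. This is a simplified reading of the description, not Benitez et al.'s code. Three choices are assumptions: the envelope is split into non-overlapping 1024-sample windows; candidate peaks are local maxima above each window's threshold; and the 200 ms rule is reduced to keeping the larger of two close peaks, without considering their position relative to the previous beat.

```python
import math


def window_threshold(env):
    """Threshold for one envelope window using the 18 % / 39 % / 1.6 x RMS rule."""
    peak = max(env)
    rms = math.sqrt(sum(v * v for v in env) / len(env))
    if rms > 0.18 * peak:
        return 0.39 * peak   # high noise: threshold relative to the window maximum
    return 1.6 * rms         # low noise: threshold relative to the RMS noise estimate


def detect_qrs(env, fs, window=1024, refractory_s=0.2):
    """Return sample indices of QRS peaks found in the envelope env."""
    candidates = []
    # Assumption: non-overlapping windows; peaks on a window edge are not examined
    for start in range(0, len(env), window):
        seg = env[start:start + window]
        if len(seg) < 3:
            continue
        th = window_threshold(seg)
        for i in range(1, len(seg) - 1):
            if seg[i] > th and seg[i - 1] <= seg[i] > seg[i + 1]:
                candidates.append(start + i)
    # Assumption: simplified 200 ms rule, keeping the larger of two close peaks
    min_gap = round(refractory_s * fs)
    beats = []
    for idx in candidates:
        if beats and idx - beats[-1] < min_gap:
            if env[idx] > env[beats[-1]]:
                beats[-1] = idx
        else:
            beats.append(idx)
    return beats


# Usage: synthetic envelope with beats 1 s apart (fs = 256 Hz) and one spurious peak
fs = 256
env = [0.05] * 1024
for i, height in ((100, 1.0), (130, 0.5), (356, 1.0), (612, 1.0)):
    env[i] = height
print(detect_qrs(env, fs))
```

The spurious peak 30 samples (about 117 ms) after the first beat exceeds the threshold. It is discarded by the refractory rule because it is the smaller of the two.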
6.5 Filter-banks based QRS detection
6.5.1 Theory of filter banks
A filter bank is an array of filters whose purpose is to decompose an input signal into M components (Soman et al., 1993; Saramäki and Bregovic, 2002). In the digital domain, this is achieved by an array of FIR or infinite impulse response (IIR) filters, and Figure 6.5 presents a general architecture which achieves this. In this figure, the filters H1 to HM are applied to the input signal x(n). In some filter bank applications the signal is then down-sampled before analysis is performed. The signal can then be up-sampled, resynthesised by the synthesis filters F1 to FM, and combined to generate the representation of the original signal, y(n). The idealised frequency response of the filter bank is shown in Figure 6.6. It contains M frequency responses, each labelled with the decomposing filter (Hm) used to achieve it and the matching synthesis filter (Fm).
[Figure 6.6 labels: magnitude (dB) against ω; M adjacent pass-bands H1/F1, H2/F2, H3/F3, …, HM/FM with edges at π/M, 2π/M, 3π/M, …, Mπ/M.]
Figure 6.6: The idealised filter response of the filter banks, with M equally-wide subbands.
6.5.2 QRS detection with filter banks
Afonso et al. (1999) presented a QRS detection algorithm involving filter banks. In it, the ECG is decomposed into four sub-bands, each with a uniform bandwidth (BW) of 5.6 Hz, and each decomposition filter has fs coefficients. Once the signal has been decomposed, the algorithm down-samples the signal in each sub-band by a ratio R, which is calculated by
$$R = \mathrm{round}\left(\frac{f_s/2}{BW}\right). \tag{6.15}$$
Here, fs is the signal sampling rate and BW is the bandwidth as before. For fs = 256 Hz, R = round(128/5.6) = round(22.86) = 23. Down-sampling the sub-band signals is intended to reduce noise and increase the signal to noise ratio (SNR) of the derived signals (Afonso et al., 1999, 1995). The signals are not reconstructed in this algorithm, so the up-sampling blocks and synthesis filters shown in Figure 6.5 are not used.
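A minimal Python sketch of the decomposition and down-sampling stage follows. The sub-band filters are Hamming-windowed-sinc FIR band-pass filters with fs + 1 taps. They were chosen only because they are easy to write without a DSP library; they are an assumption and not the filters designed by Afonso et al. (1999). The function names are likewise hypothetical.

```python
import math


def downsampling_ratio(fs, bw):
    """Equation (6.15): R = round((fs / 2) / BW)."""
    return round((fs / 2) / bw)


def _ideal_lowpass(fc, fs, t):
    """Ideal low-pass impulse response at offset t samples from the kernel centre."""
    if t == 0:
        return 2.0 * fc / fs
    return math.sin(2.0 * math.pi * fc * t / fs) / (math.pi * t)


def bandpass_fir(f_lo, f_hi, fs, taps):
    """Hamming-windowed difference of two ideal low-pass kernels (pass-band f_lo..f_hi Hz).

    Assumption: a textbook windowed-sinc design, not the published filters.
    """
    centre = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - centre
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append((_ideal_lowpass(f_hi, fs, t) - _ideal_lowpass(f_lo, fs, t)) * window)
    return h


def fir_filter(x, h):
    """Causal direct-form FIR filtering; output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def filter_bank(x, fs, bw=5.6, bands=4):
    """Split x into `bands` uniform sub-bands of width bw and down-sample each by R."""
    r = downsampling_ratio(fs, bw)
    taps = int(fs) + 1          # assumption: fs + 1 taps, odd so the kernel stays symmetric
    subbands = []
    for m in range(bands):
        h = bandpass_fir(m * bw, (m + 1) * bw, fs, taps)
        subbands.append(fir_filter(x, h)[::r])
    return subbands, r
```

For fs = 256 Hz, `downsampling_ratio(256, 5.6)` returns 23, matching the value quoted above. A sinusoid at 8.4 Hz, in the centre of the second sub-band, passes almost unattenuated through that band and is strongly suppressed in the fourth.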
The rest of this QRS detection algorithm involves multi-stage processing with many threshold stages.
6.5.2.1 Pre-processing
Four features are extracted from the sub-bands to aid QRS detection. Each feature combines certain sub-bands of interest, as defined by Equations (6.16)–(6.19):
$$P_1 = \sum_{l=1}^{3} |w_l(z)| \tag{6.16}$$
$$P_2 = \sum_{l=1}^{4} |w_l(z)| \tag{6.17}$$
$$P_3 = \sum_{l=2}^{4} |w_l(z)| \tag{6.18}$$
$$P_4 = \sum_{l=1}^{3} w_l(z)^2 \tag{6.19}$$
where $w_l$ represents the output of the lth bandpass filter. These features effectively measure the energy within the bands of interest by summing the absolute values of the sub-bands (or, for P4, the squared values). For example, feature P1 combines the outputs of the first three bandpass filters, so it represents the energy up to 3 × 5.6 = 16.8 Hz. Two-sample moving window integration is then performed on each feature.
6.5.2.2 Beat-classification logic
Afonso’s algorithm involves a complicated series of filtering and thresholding routines, which Figure 6.7 attempts to describe simply. First, the ECG is filtered into four component bands and down-sampled. Six levels of QRS validation are then executed over all candidate QRS points, resulting in a series of validated QRS points. These levels are described here.
[Figure 6.7 stages: ECG → H1, H2, H3, H4 → ↓ (each band) → Feature extraction and multi-level validation → QRS Points.]
Figure 6.7: Overall simplified flowchart of Afonso’s QRS detection method.
6.5.2.2.1 Level 1
The first validation level identifies all the positive peaks of the moving window integrator of feature P1, Equation (6.16). This stage is designed to identify as many QRS candidate events as possible. The process will produce a large number of false-positive points, but subsequent validation levels eliminate inappropriate candidates.
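The feature definitions (6.16)–(6.19), the two-sample moving window integration and the Level 1 peak picking could be sketched in Python as below. The list `w` is assumed to hold the four down-sampled sub-band signals, with index 0 corresponding to w_1. Treating the moving window integration as a running sum, and counting the first sample of a plateau as a peak, are further assumptions.

```python
def afonso_features(w):
    """Equations (6.16)-(6.19); w[0]..w[3] are the sub-band outputs w_1..w_4."""
    n = len(w[0])
    p1 = [sum(abs(w[l][i]) for l in (0, 1, 2)) for i in range(n)]
    p2 = [sum(abs(w[l][i]) for l in (0, 1, 2, 3)) for i in range(n)]
    p3 = [sum(abs(w[l][i]) for l in (1, 2, 3)) for i in range(n)]
    p4 = [sum(w[l][i] ** 2 for l in (0, 1, 2)) for i in range(n)]
    return p1, p2, p3, p4


def moving_window_integration(p, width=2):
    """Running sum of the current sample and the previous width - 1 samples (assumed form)."""
    return [sum(p[max(0, i - width + 1): i + 1]) for i in range(len(p))]


def level1_candidates(p1_mwi):
    """Level 1: every positive local maximum of the integrated P1 is a QRS candidate."""
    return [i for i in range(1, len(p1_mwi) - 1)
            if p1_mwi[i] > 0 and p1_mwi[i - 1] < p1_mwi[i] >= p1_mwi[i + 1]]


# Usage with a toy set of four identical sub-band signals
w = [[0.0, 1.0, 0.0, -2.0, 0.0]] * 4
p1, p2, p3, p4 = afonso_features(w)
print(p1, p4, level1_candidates(moving_window_integration(p1)))
```

Level 1 deliberately accepts every positive peak. All rejection is left to the later levels, which mirrors the design described above.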
Figure 6.8a shows the raw ECG trace which was recorded, and Figure 6.8b shows the events selected by this feature. Every positive peak in the time-series is marked as a QRS point. In four instances, at t = {141.75, 144.5, 146, 146.75} seconds, T-waves contributed positive peaks and were identified by this level of QRS detection.
6.5.2.2.2 Level 2
The second level involves two single-channel threshold stages. In these, the candidate QRS points from Level 1 are used to estimate the ‘signal’ and ‘noise’ levels of the ECG. Here, ‘signal’ refers to the candidate times from Level 1 that are classified as QRS points, and ‘noise’ refers to candidate points that are classified as noise. The mean value of feature P2 at the candidate QRS times from Level 1 is computed. The initial ‘signal’ level is estimated as 10% above this mean value, and the initial noise level is set 10% below it. The values of the candidate QRS points are compared against these levels, giving two sets of QRS events. A parameter termed the decision strength is introduced to measure confidence in whether these events are truly signal or noise. It is defined by
$$DS(i) = \frac{feature(i) - NL}{SL - NL}, \tag{6.20}$$
where DS(i) is the decision strength for the ith QRS candidate, NL is the noise level estimate, SL is the signal level estimate, and feature(i) is the value of the feature under consideration at the ith candidate. The decision strength is defined as a separate parameter because later levels reuse it for validation. A lower decision strength indicates that the candidate point is more likely to be noise, and a higher value indicates that it is more likely to be a QRS point. The value of the parameter is clipped to the range 0 to 1. The thresholding introduced at this level is performed on the decision strength parameter.
A low and a high threshold are used. The low threshold is set at 0.08 and the high threshold at 0.7. The low threshold is set so as to identify a large number of candidate points, and the high threshold is set to give a high degree of certainty for a smaller set of candidate points. The signal and noise levels are updated throughout the QRS detection stage, according to whether the current event was classified as ‘signal’ or ‘noise’ with regard to these thresholds. Figure 6.8c shows the events classified at this stage for the sample ECG. All of the false positive QRS points have been removed, but the detection algorithm has also removed a true QRS beat at 145.75 seconds.
[Figure 6.8 panels (time axis 142–147 s): (a) Raw ECG; (b) Moving window integration of feature P1 (Level 1); (c) Moving window integration of feature P2 (Level 2); (d) Moving window integration of feature P3 (Level 4); (e) ECG with automatically extracted QRS points (Level 6).]
Figure 6.8: The effect of the various QRS validation levels from Afonso’s QRS detection algorithm. In these charts, the ◦ symbols represent the candidate QRS points. Charts b to d have been down-sampled.
6.5.2.2.3 Level 3
Level 3 of the beat-classification logic fuses the two sets of candidate QRS points from Level 2, i.e. the times at which the decision strengths surpassed the lower and upper thresholds. The results of Level 2 are termed ‘channel 1’ (the QRS candidate times with reference to the low threshold) and ‘channel 2’ (the QRS events with reference to the high threshold). Three outcomes are possible:
1. If channel 2 detected an event, a QRS event is always deemed to have occurred, because its threshold was set very high. In this situation channel 1 will also have been triggered.
2. If neither channel detects a QRS event, the candidate from the previous levels is eliminated from consideration.
3. If channel 1 (low threshold) detects an event but channel 2 does not, the decision strength of each channel is computed, and two further parameters are derived from them:
$$\Delta_1 = \frac{DS_1 - th_1}{1 - th_1}, \tag{6.21}$$
$$\Delta_2 = \frac{th_2 - DS_2}{th_2}, \tag{6.22}$$
where th1 and th2 are the thresholds used when computing the candidate events, and Δ1 and Δ2 relate to channels 1 and 2 respectively. If Δ1 is greater than Δ2, the event is deemed to have been a QRS event; otherwise it is deemed to have been noise.
6.5.2.2.4 Level 4
The fourth level uses another single-channel detection block, this time on feature P3, with a new decision strength threshold of 0.3. If Level 3 removed a candidate QRS event and the decision strength from P3 exceeds 0.3, the beat is re-introduced as a candidate. This level operates only on events removed by Level 3, and it reduces the number of false negative QRS events. Figure 6.8d shows the events classified at this stage. The point removed in Figure 6.8c at 145.75 seconds is re-introduced by Level 4.
6.5.2.2.5 Levels 5 and 6
The fifth level reviews the points still under consideration as QRS points and applies decision logic based on the timing between QRS events. If the time between two consecutive QRS events exceeds 1.5 times the mean interval over the previous 100 QRS events, a lower decision strength threshold of 0.2 is used. Events in that interval which were removed by the validation stages are accepted if they exceed it. In addition, if the interval between two consecutive QRS events is less than 0.24 seconds, the point which produced the smaller peak in the original ECG signal is removed.
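The decision logic of Levels 2 to 6 can be condensed into the Python sketch below. It is a reading of the description above, not Afonso et al.'s code. It assumes that each candidate's decision strengths have already been computed for both Level 2 channels and for the Level 4 feature P3. Signal and noise levels are supplied by the caller, so their adaptive update is not reproduced. The Level 5 rule is read as a search-back over unusually long gaps. Candidates are represented as hypothetical `(time, DS, ECG amplitude)` tuples.

```python
def decision_strength(value, signal_level, noise_level):
    """Equation (6.20), clipped to the range [0, 1]."""
    ds = (value - noise_level) / (signal_level - noise_level)
    return min(1.0, max(0.0, ds))


def levels_3_and_4(ds1, ds2, ds_p3, th1=0.08, th2=0.7, th4=0.3):
    """Fuse the two Level 2 channels (Level 3), then allow a P3-based rescue (Level 4)."""
    if ds2 > th2:
        accepted = True                      # outcome 1: high-threshold channel fired
    elif ds1 <= th1:
        accepted = False                     # outcome 2: neither channel fired
    else:                                    # outcome 3: only the low-threshold channel fired
        delta1 = (ds1 - th1) / (1 - th1)     # Equation (6.21)
        delta2 = (th2 - ds2) / th2           # Equation (6.22)
        accepted = delta1 > delta2
    return accepted or ds_p3 > th4           # Level 4 can only re-introduce removed events


def levels_5_and_6(cands, accepted, history=100, gap_factor=1.5, low_th=0.2, min_rr=0.24):
    """cands: time-sorted (time_s, decision_strength, ecg_amplitude); accepted: parallel flags."""
    beats = [c for c, ok in zip(cands, accepted) if ok]
    rejected = [c for c, ok in zip(cands, accepted) if not ok]
    kept = list(beats)
    # Level 5 (assumed search-back): inside unusually long gaps, re-accept events above low_th
    for k in range(1, len(beats)):
        rr = [beats[j][0] - beats[j - 1][0] for j in range(max(1, k - history), k)]
        gap = beats[k][0] - beats[k - 1][0]
        if rr and gap > gap_factor * sum(rr) / len(rr):
            kept += [c for c in rejected
                     if beats[k - 1][0] < c[0] < beats[k][0] and c[1] > low_th]
    kept.sort()
    # Level 6: of two events closer than min_rr seconds, keep the larger ECG peak
    final = []
    for c in kept:
        if final and c[0] - final[-1][0] < min_rr:
            if c[2] > final[-1][2]:
                final[-1] = c
        else:
            final.append(c)
    return [c[0] for c in final]
```

For example, with beats at 0, 1, 2 and 4 s, the 2 s gap exceeds 1.5 times the 1 s mean interval. A rejected event at 3 s with a decision strength of 0.25 is therefore re-accepted, while a smaller peak 0.1 s after the 4 s beat is removed by the 0.24 s rule.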
6.5.2.3 Overall
The algorithm treats the set of candidate points remaining after Level 6 as the true QRS points, and no further post-processing is performed. Figure 6.8e shows the resulting QRS points on the original ECG trace, marked with the symbol ◦. In subsequent sections, this algorithm is referred to as Afonso’s algorithm.
Table 6.1: Differences in reported and calculated sensitivity and positive predictivity.
                          Afonso                  Hilbert transform
                          Reported   This work    Reported   This work
Sensitivity               99.59%     99.15%       99.87%     99.03%
Positive predictivity     99.56%     98.16%       99.94%     97.43%
6.6 Results obtained on MIT-BIH database
Here, automatic QRS detection with the Hilbert transform algorithm and Afonso’s algorithm is assessed on the MIT-BIH database.
6.6.1 Sensitivity and positive predictivity
Table 6.1 lists the sensitivity and positive predictivity of the QRS detection results obtained on the MIT-BIH database. Afonso’s algorithm achieved a sensitivity of 99.15% and a positive predictivity of 98.16%. The Hilbert transform algorithm yielded a sensitivity of 99.03% and a positive predictivity of 97.43%. The distribution of these results is presented in Figure 6.10, which shows that the majority of the detected QRS points were within the window of acceptance. These results are, however, slightly lower than those reported in the literature, owing to incorrect localisation of some QRS points on the ECG trace. Patient 8, whose ECG is shown in Figure 6.9, has a non-standard QRS complex compared with the example shown in Chapter 1: the S-wave drops significantly below the baseline level of the ECG trace. Figure 6.9 illustrates how poor QRS localisation can occur.
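The sketch below shows why a mis-localised beat lowers both figures. It matches detections to annotations one-to-one within a tolerance and then applies Equations (6.1) and (6.2). The greedy nearest-neighbour matching is an assumption. So is the `tol` parameter: the text fixes the window at 88 ms around the annotated point, but the sketch leaves it to the caller to set `tol` according to whether that value is read as a half-width or a full width.

```python
def match_beats(annotated, detected, tol):
    """Greedy one-to-one matching of detections to annotations within +/- tol seconds.

    Returns (TP, FN, FP).
    """
    used = set()
    tp = 0
    for a in annotated:
        best = None
        for i, d in enumerate(detected):
            if i in used or abs(d - a) > tol:
                continue
            if best is None or abs(d - a) < abs(detected[best] - a):
                best = i
        if best is not None:
            used.add(best)
            tp += 1
    return tp, len(annotated) - tp, len(detected) - tp


def sensitivity(tp, fn):
    """Equation (6.1), in percent."""
    return 100.0 * tp / (tp + fn)


def positive_predictivity(tp, fp):
    """Equation (6.2), in percent."""
    return 100.0 * tp / (tp + fp)


# The second beat is detected 100 ms late, e.g. on the S- or T-wave
tp, fn, fp = match_beats([1.0, 2.0, 3.0], [1.01, 2.10, 3.0], tol=0.088)
print(tp, fn, fp, sensitivity(tp, fn), positive_predictivity(tp, fp))
```

The late detection is counted once as a false negative and once as a false positive. Widening `tol` absorbs it, which is why the width of the acceptance window strongly influences the reported figures.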
In Figure 6.9, the solid trace is the raw ECG and the × markers represent the expert QRS annotations. The ◦ markers identify the QRS points from Afonso’s algorithm, and the ∗ markers show the times that the Hilbert transform algorithm identified as QRS complexes. Afonso’s algorithm misclassified the T-wave as an R-wave, while the Hilbert transform algorithm misinterpreted the S-wave as the R-wave. The positions of these mis-localisations are logical given the nature of the two algorithms.
[Figure 6.9 axes: Amplitude (µV), −400 to 200, against Time (s), 71.6 to 74.4.]
Figure 6.9: Incorrect QRS complex localisation (Patient 8 of MIT-BIH arrhythmia database). Manual QRS annotations are marked with × and automatic detections are marked with ∗ (Hilbert transform algorithm) and ◦ (Afonso’s algorithm).
These points lie outside the window of acceptance. Each is therefore counted both as a false negative (the annotated beat is not matched) and as a false positive (the detection matches no annotation), so both sensitivity and positive predictivity are reduced. For some patients, the incorrect localisation occurred intermittently during the recording. The majority of the QRS complexes extracted for these patients were consistent with the annotations, so excluding these patients from the performance evaluation was not deemed appropriate. Indeed, Figure 6.10 shows that the median sensitivity and positive predictivity are above 99.5% for both algorithms. The superior results reported in the literature are believed to be due to a wider window of acceptance. However, it will be shown later that widening the window is not appropriate for this work.
[Figure 6.10 panels: (a) Sensitivity (%), (b) Positive Predictivity (%); each shows box-plots for the Afonso and Hilbert algorithms.]
Figure 6.10: Sensitivity and positive predictivity box-plots of QRS detection on the MIT-BIH arrhythmia database.
6.6.2 Percentage RMS difference
The mean and standard deviation of the heart rate were computed for each patient in the MIT-BIH database. They were derived from three sets of points: the expert QRS annotations, the points extracted by the Hilbert transform algorithm, and the points extracted by Afonso’s algorithm. The PRD was computed according to Equation (6.3), and Figure 6.11 shows box-plots of the PRD of these features over the MIT-BIH database. The closer the PRD is to 0%, the more similar the automatically generated features are to the manual features. The median PRD is very low in all cases, indicating a high degree of agreement between the manual and automatic QRS points for the majority of patients. Figure 6.11a shows the PRD values calculated from the mean heart rate. Despite the incorrect localisation of some QRS complexes, the mis-localised points do not distort the resulting features. The extracted values are representative of the heart rates obtained from the manual annotations. This is further supported by the low PRD of the heart-rate standard deviation (Figure 6.11b) for the same patients.
[Figure 6.11 panels: (a) Mean HR PRD (%), (b) Standard Deviation PRD (%); each shows box-plots for the Afonso and Hilbert algorithms.]
Figure 6.11: PRD box-plots of the mean (µ) and standard deviation (σ) of the heart rate over all patients in the MIT-BIH arrhythmia database.
6.6.3 Conclusions on QRS detection on MIT-BIH database
This section has shown that the two QRS detection algorithms implemented here achieved results competitive with those published.
Although these results have already been obtained in the literature, it is important to verify that they are reproducible, in particular when the algorithms are intended for diagnostic applications. Both QRS detection algorithms performed slightly better in the literature than in this work. This is believed to be due to the wider window of acceptance used in the published evaluations.
6.7 Requirement for artefact detection
6.7.1 Introduction
The patients recorded in the MIT-BIH database were adults undergoing important medical examinations. They were therefore not inclined to move, and although some artefacts were present, their contribution was generally low. The subjects who participated in the OFCs, however, were children with a mean age of approximately 5 years, who wanted to play games and move during their tests. As a result, artefacts were introduced into the recorded ECG. Artefact detection is therefore required to provide confidence that the extracted QRS points are accurate.
6.7.2 Artefact detection algorithm
The bandwidth of the QRS complex ranges from approximately 10 to 25 Hz (Dinh et al., 2001; Kohler et al., 2002; Schlindwein et al., 2006). For this reason, almost all QRS detectors filter the ECG with a band-pass (BP) filter to attenuate non-QRS components such as artefacts and P– and T-waves. ECG artefacts are signals which interfere with the signal trace. They arise from physiological (motion, muscular spasm, and touching electrodes), electrical (mains interference) and resistive (poor connection with the skin) conditions. Artefacts can lead to false detections and non-detections of the QRS complex; in both cases the accuracy of the features extracted from the ECG signal is affected. Even with de-noising filters which reduce the extent of artefacts on the ECG, noise can still affect the fidelity of the QRS complex.
Therefore, artefact awareness is an important requirement for unconstrained, fully automated biomedical classification platforms. An algorithm incorporating energy detection was adopted to measure the signal strength in the 'noise bands' of the ECG. These are the frequency bands outside the bandwidth of interest of the QRS complex. The lower noise band was set to 0 to 10 Hz, and the upper noise band includes all frequencies above 25 Hz. Energy detection is performed on these frequency bands to assess the extent of noise in the ECG. Energy detection is a popular means of signal detection in communications research (Horgan and Murphy, 2010).

Here, two 50th-order FIR filters were designed, denoted H_h and H_l. H_h is a high-pass filter with a corner frequency of 25 Hz, employed to detect high-frequency artefacts resulting from muscular activity, motion and similar sources. H_l is a low-pass filter with a cutoff frequency of 10 Hz, employed to detect low-frequency artefacts. Energy estimates at a time index k are computed from the outputs of H_h and H_l over a time window of length N, i.e. between k − N and k. In this work, a window length of 0.25 seconds (i.e. 64 points when sampled at 256 Hz) was chosen so that short-duration artefacts would be detected. The energy E(k) of a signal x is computed by the 'square and accumulate' algorithm of Equation (6.23):

$$E(k) = \sum_{i=k-N}^{k} x^2(i) \qquad (6.23)$$

6.7.3 Demonstration of artefact detection

Figures 6.12a and 6.12b show the scaled output of the energy detection algorithm for high-frequency artefact detection. High-frequency muscular artefacts can be observed at t = 45.6 minutes in Figure 6.12a, and they obscure the QRS complexes for the following 6 seconds.
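The H_h/H_l filtering and the square-and-accumulate energy of Equation (6.23) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the implementation used in this work. The text specifies only the filter order (50), the cutoffs (10 and 25 Hz), the sampling rate (256 Hz) and N = 64. The Hamming-windowed sinc design, the spectral-inversion high-pass and all function names are therefore hypothetical choices.

```python
import math
from typing import List, Sequence

FS = 256        # sampling rate (Hz), as stated in the text
ORDER = 50      # 50th-order FIR filter -> 51 taps
WINDOW_N = 64   # N: 0.25 s at 256 Hz


def lowpass_fir(cutoff_hz: float, order: int = ORDER, fs: float = FS) -> List[float]:
    """Hamming-windowed sinc low-pass FIR (assumed design), unity gain at DC."""
    fc = cutoff_hz / fs                  # normalised cutoff (cycles/sample)
    mid = order / 2
    taps = []
    for n in range(order + 1):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def highpass_fir(cutoff_hz: float, order: int = ORDER, fs: float = FS) -> List[float]:
    """High-pass by spectral inversion of the low-pass (needs an even order)."""
    taps = [-t for t in lowpass_fir(cutoff_hz, order, fs)]
    taps[order // 2] += 1.0
    return taps


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y(k) = sum_j h(j) * x(k - j)."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for j in range(min(len(taps), k + 1)):
            acc += taps[j] * x[k - j]
        y.append(acc)
    return y


def windowed_energy(x: Sequence[float], n: int = WINDOW_N) -> List[float]:
    """Square-and-accumulate energy E(k) = sum_{i=k-n}^{k} x(i)^2 (Eq. 6.23).

    The sum spans n + 1 samples and is truncated at the start of the signal.
    A running sum keeps the cost linear in len(x).
    """
    energy: List[float] = []
    acc = 0.0
    for k, v in enumerate(x):
        acc += v * v
        if k - n - 1 >= 0:
            acc -= x[k - n - 1] ** 2     # sample leaving the window
        energy.append(acc)
    return energy


h_h = highpass_fir(25.0)   # H_h: muscular/motion artefacts above 25 Hz
h_l = lowpass_fir(10.0)    # H_l: baseline/electrode artefacts below 10 Hz

# Synthetic check: a 1 Hz component plus a 0.5 s burst of 60 Hz "muscle" noise.
x = [math.sin(2 * math.pi * 1.0 * k / FS) for k in range(5 * FS)]
for k in range(640, 768):
    x[k] += math.sin(2 * math.pi * 60.0 * k / FS)
e_high = windowed_energy(fir_filter(x, h_h))
e_low = windowed_energy(fir_filter(x, h_l))
print(f"high-band energy before burst: {e_high[400]:.2e}, during burst: {e_high[700]:.2f}")
```

In this synthetic example the high-band energy rises by several orders of magnitude during the 60 Hz burst. The 1 Hz component appears almost entirely in the low band.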
The output of the high-frequency artefact detection filter is normalised by the standard deviation measured on all other subjects in the database. This uses a leave-one-out (LOO) procedure similar to the one employed for classification in the previous chapter. This output, shown in Figure 6.12b, rises at the same time as high-frequency noise appears in the ECG. The artefact output was smoothed with a 2-second moving average filter. The filter was centred to remove the phase delay that a causal moving average would introduce.

Dynamic thresholding is applied to the output of the energy detection filters. When the result surpasses the threshold, that epoch is considered 'noisy' and is excluded from feature extraction. The threshold for artefact detection is set dynamically for each subject. A moving window of 1 minute of non-artefact energy estimates is selected, and the mean, µ, and standard deviation, σ, of this window are calculated. If the signal surpasses µ + 3σ, the ECG in that segment is labelled 'noisy.'

[Figure 6.12 panels: (a) ECG with high-frequency artefacts present; (b) output of high-frequency artefact detection filter; (c) ECG with low-frequency artefacts present; (d) output of low-frequency artefact detection filter. Horizontal axes: time (mins), approximately 45.45 to 45.85 for (a, b) and 8 to 8.6 for (c, d); vertical axes: ECG amplitude (a, c) and normalised energy (b, d).]

Figure 6.12: Normalised output of high-frequency (a,b) and low-frequency (c,d) energy estimators for artefact detection (data from Subject 8 of allergy database).

Figures 6.12c and 6.12d are similar to Figures 6.12a and 6.12b, but they present the output of the low-pass artefact detection filter.
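The post-processing of the energy estimates has three steps: LOO normalisation, centred 2-second smoothing, and the per-subject µ + 3σ threshold over a 1-minute window of non-artefact estimates. It could be sketched as follows. The text does not specify how the other subjects' standard deviations are pooled or how the threshold behaves at the very start of a recording. It also leaves open exactly how flagged samples are kept out of the window. The choices below and all function names are assumptions for illustration: pooling by concatenation, a minimum history of two estimates, and updates from clean estimates only.

```python
import statistics
from collections import deque
from typing import Deque, List, Sequence


def loo_normalise(energy: Sequence[float],
                  other_subjects: Sequence[Sequence[float]]) -> List[float]:
    """Divide one subject's energy trace by the standard deviation pooled over
    all *other* subjects (leave-one-out); pooling by concatenation is assumed."""
    pooled = [v for trace in other_subjects for v in trace]
    sd = statistics.pstdev(pooled)
    return [v / sd for v in energy]


def centred_moving_average(x: Sequence[float], half_width: int) -> List[float]:
    """Zero-phase moving average over 2*half_width + 1 samples, truncated at
    the edges; prefix sums keep the cost linear in len(x)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for k in range(len(x)):
        lo, hi = max(0, k - half_width), min(len(x), k + half_width + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def flag_artefacts(energy: Sequence[float], window_len: int,
                   k_sigma: float = 3.0, min_history: int = 2) -> List[bool]:
    """Flag samples exceeding mu + k_sigma * sigma, where mu and sigma come from
    the most recent `window_len` estimates that were *not* flagged."""
    history: Deque[float] = deque(maxlen=window_len)
    flags = []
    for v in energy:
        if len(history) >= min_history:
            clean = list(history)
            noisy = v > statistics.fmean(clean) + k_sigma * statistics.pstdev(clean)
        else:
            noisy = False            # too little clean history yet (assumption)
        flags.append(noisy)
        if not noisy:
            history.append(v)        # only clean estimates update the threshold
    return flags


# For 256 Hz energy estimates: half_width = 256 gives a window of about 2 s,
# and window_len = 60 * 256 gives the 1-minute threshold window.
trace = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 10.0, 1.0]
print(flag_artefacts(trace, window_len=5))
```

Flagged samples are kept out of the history in this sketch. A burst of artefact therefore cannot raise its own threshold.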
At t = 8.28 minutes in Figure 6.12c, a baseline deviation can be observed in the ECG trace. This artefact is likely the result of the subject touching the ECG electrode, or of electrode pop. The internal filtering in the hardware ECG daughterboard of the SHIMMER device gradually compensates for the electrode pop artefact, and this produces the slow wave that can be seen. Once again, the artefact affects the amplitude of the QRS complex to the point where reliable QRS detection is compromised. The normalised output of the energy detection is shown in Figure 6.12d, and the artefact signal changes at the same time as the ECG trace. The outputs of the low- and high-frequency artefact detection filters are fused with a logical OR operation.

6.8 Results obtained on allergy database

This section presents the QRS detection results obtained with the automatic QRS detectors on the allergy database.

6.8.1 Artefact detection

The bar chart in Figure 6.13 shows the number of artefact events flagged during the OFC recordings of the subjects in the allergy database. The subject indices of the allergy database are given on the horizontal axis and the number of artefact events on the vertical axis.

[Figure 6.13: bar chart; horizontal axis: subject ID (1 to 24); vertical axis: number of artefact events detected (0 to 30).]

Figure 6.13: The breakdown of the number of artefact events which were detected by the artefact detection algorithm for each subject of the allergy database.

Subject 3 presented the greatest number of artefacts, with 28 individual events identified. The detected events were similar to the high- and low-frequency artefacts shown in Figures 6.12a to 6.12d.
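The OR fusion of the two detectors and the counting of artefact events shown in Figure 6.13 can be illustrated with the short sketch below. It assumes that one 'event' is a contiguous run of flagged samples. The thesis does not state how flagged samples were grouped into events, so `fuse_flags` and `artefact_events` are hypothetical helpers.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_flags(high: Sequence[bool], low: Sequence[bool]) -> List[bool]:
    """Combine high- and low-frequency artefact flags with a logical OR."""
    if len(high) != len(low):
        raise ValueError("flag sequences must have equal length")
    return [h or l for h, l in zip(high, low)]


def artefact_events(flags: Sequence[bool], fs: float) -> List[Tuple[float, float]]:
    """Group contiguous runs of flagged samples into (start_s, end_s) events."""
    events: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for k, flagged in enumerate(flags):
        if flagged and start is None:
            start = k                          # a new event begins
        elif not flagged and start is not None:
            events.append((start / fs, k / fs))
            start = None                       # the event has ended
    if start is not None:                      # event still open at the end
        events.append((start / fs, len(flags) / fs))
    return events


fused = fuse_flags([False, True, False, False, False, False],
                   [False, False, True, False, True, False])
print(artefact_events(fused, fs=2.0))   # [(0.5, 1.5), (2.0, 2.5)]
```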
In earlier chapters, the most stringent requirement for allergy detection was stated to be 100% specificity (i.e. no false positive allergy classifications), because only under this condition will the OFC be improved. The next chapter will show that artefacts in the ECG can cause segments of data to be mislabelled as allergic. Approximately 50% of the detected artefacts were recorded during the checkup periods of the oral food challenge. During each checkup, the allergist measures the subject's heart rate, blood pressure, blood oxygen saturation and temperature. The subject is required to move so that this equipment can be correctly applied. As a result, high- and low-frequency artefacts were prevalent during checkups. In allergy classification, time periods that satisfy the classification criteria but occur during checkups are not classified as allergy. Checkups interfere with the subjects' heart rate and cause them to move, which introduces artefacts into the ECG. Moreover, artefact detection during the checkup periods is not strictly necessary. If this system were used in a clinical setting, the allergist would be aware that artefacts had been introduced and would likely disregard these data.

Table 6.2: Sensitivity and positive predictivity of QRS detectors on allergy database.

QRS detector       Sensitivity   Positive predictivity
Afonso             92.89%        92.17%
Hilbert transform  93.15%        93.00%

However, in over 50% of cases, artefacts were not due to the allergist's influence on the subject. Every event flagged as an artefact was visually inspected. Most of the remaining artefacts were found to contaminate the ECG to the degree where QRS detection was unreliable.
When QRS annotations were labelled manually for previous chapters, careful visual inspection made it possible to identify the QRS points during such periods. It was during these periods that automatic QRS extraction failed. For approximately 5% of the artefact events, the accuracy of automatic QRS extraction was not affected. Discarding these segments means that short-duration signatures of allergy might be excluded from allergy classification. Nevertheless, a false positive artefact rate below 10% was deemed an acceptable trade-off for preserving 100% specificity in fully automated allergy classification.

6.8.2 Sensitivity and positive predictivity

Table 6.2 tabulates the mean sensitivity and positive predictivity of the two QRS detectors on the allergy database, measured against the manual annotations. The sensitivity and positive predictivity values are of similar overall magnitude (≈ 92 to 93%). For the majority of subjects, however, both values exceed 99%, as the median lines in the box-plots of Figure 6.14 show. In some cases, sensitivity and positive predictivity values of approximately 70% were obtained. For the subjects with poor QRS detection, investigation showed that the quality of the recorded ECG was the primary source of the reduction in accuracy metrics.

[Figure 6.14 panels: (a) Sensitivity; (b) Positive Predictivity. Each panel shows box-plots for the Afonso and Hilbert detectors; vertical axes: sensitivity (%) and positive predictivity (%).]

Figure 6.14: Sensitivity and positive predictivity box-plots of QRS detection on the allergy database.

The exact cause of the poor ECG quality was difficult to determine. It is believed to be predominantly due to high-resistance connections between the ECG electrodes and the subject's skin.
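Sensitivity and positive predictivity are obtained by pairing automatic detections with manual annotations inside an acceptance window:

- Se = TP / (TP + FN);
- +P = TP / (TP + FP).

The sketch below shows one common way of doing this pairing. The greedy two-pointer matching and the ±150 ms default window are assumptions for illustration, since the window used in this work is not restated in this section. The function names are hypothetical. Widening `tolerance_s` in the usage example shows how a more permissive window can inflate both metrics.

```python
from typing import Sequence, Tuple


def match_beats(reference: Sequence[float], detected: Sequence[float],
                tolerance_s: float = 0.15) -> Tuple[int, int, int]:
    """Return (TP, FN, FP), pairing each reference beat with at most one
    detection within +/- tolerance_s (greedy pass over sorted times)."""
    ref, det = sorted(reference), sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance_s:
            tp += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            j += 1           # detection with no nearby reference beat
        else:
            i += 1           # reference beat with no nearby detection
    return tp, len(ref) - tp, len(det) - tp


def se_ppv(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity and positive predictivity in percent."""
    return 100.0 * tp / (tp + fn), 100.0 * tp / (tp + fp)


manual = [1.0, 2.0, 3.0, 4.0]              # hypothetical annotation times (s)
auto = [1.02, 2.3, 3.05, 3.5, 4.01]        # hypothetical detections (s)
print(se_ppv(*match_beats(manual, auto)))                    # (75.0, 60.0)
print(se_ppv(*match_beats(manual, auto, tolerance_s=0.35)))  # (100.0, 80.0)
```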
Figure 6.15 shows an example of the poor-quality ECG obtained over a 6-second period after filtering (Clifford et al., 2006). The ECG in Figure 6.9, for example, makes a useful contrast. Even after filtering, the baseline of the recording in Figure 6.15 contains a significant amount of noise in the same bandwidth as the QRS complexes. The amplitude of the QRS complex is also much diminished in comparison to those shown previously in Figure 6.9. A second cause of poor QRS detection performance was incorrect QRS localisation, previously discussed for the MIT-BIH database in Section 6.6 and illustrated in Figure 6.9. This contributed to the lower sensitivity and positive predictivity values obtained on the allergy database. Interestingly, incorrect localisation had a different cause in each database. In the MIT-BIH database it was caused by the rich set of recorded arrhythmias. On the allergy database it was caused by the poor integrity of the recorded ECG signal.

Widening the window of acceptance on the allergy database could improve the overall sensitivity and positive predictivity. As a consequence, however, many detected points would be labelled as true positives even though they actually originated from artefacts. This would give an optimistic estimate of sensitivity and positive predictivity. Therefore, the criteria for true positive and false positive labelling were not relaxed in this work.

[Figure 6.15: filtered ECG of Subject 3; horizontal axis: time (mins), 83 to 83.1; vertical axis: voltage (µV), approximately −50 to 150.]

Figure 6.15: Example of poor quality of the ECG signal after the application of denoising filters which contributed to poor sensitivity and positive predictivity values for Subject 3.
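Equation (6.3) is not reproduced in this section. Assuming the conventional definition of the percentage RMS difference, PRD = 100·sqrt(Σ(r_i − e_i)² / Σ r_i²), the comparison of manual and automatic heart-rate features could be sketched as follows. For a single scalar feature such as the mean heart rate, this reduces to 100·|r − e|/|r|. The helper names and the example R-peak times are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def prd(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Percentage RMS difference, assuming the conventional definition
    PRD = 100 * sqrt(sum((r - e)^2) / sum(r^2))."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    num = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    den = sum(r * r for r in reference)
    return 100.0 * math.sqrt(num / den)


def hr_features(r_peaks_s: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation (beats/min) of the instantaneous heart rate
    derived from consecutive R-peak times in seconds."""
    rr = [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]
    hr = [60.0 / interval for interval in rr]
    return statistics.mean(hr), statistics.stdev(hr)


manual = [0.00, 0.80, 1.62, 2.40, 3.22, 4.00]      # hypothetical annotations
automatic = [0.00, 0.80, 1.64, 2.40, 3.22, 4.00]   # one peak located 20 ms late
mu_m, sd_m = hr_features(manual)
mu_a, sd_a = hr_features(automatic)
print(f"PRD(mean HR) = {prd([mu_m], [mu_a]):.3f}%")
print(f"PRD(SD of HR) = {prd([sd_m], [sd_a]):.1f}%")
```

A single mislocated peak barely moves the mean heart rate, but it can change the standard deviation much more. This is one reason the two features are assessed separately.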
6.8.3 Percentage RMS difference

The PRD of the mean and standard deviation features was computed. This assesses the effect of approximately 93% sensitivity and positive predictivity on the HRV metrics. Table 6.3 tabulates the results. Excellent mean PRD values were obtained with both the Afonso and the Hilbert transform algorithms (below 1.3% for both algorithms and both features). Figures 6.16a and 6.16b outline the distribution of the PRD values. Interestingly, the PRD results obtained here are slightly better than those obtained on the MIT-BIH arrhythmia database, even though the overall sensitivity and positive predictivity were poorer on the allergy database. The PRD of the mean heart rate obtained with the Hilbert transform detector shows a noticeably lower variance between subjects. Once more, median-based statistics indicate that the automatic features agree closely with the manual features for the majority of subjects. As before, QRS detection is less accurate only for some subjects.

Table 6.3: Distribution of mean and standard deviation of the PRD values calculated from automatically extracted QRS points.

QRS detector       Mean heart rate   Standard deviation
Afonso             0.68%             0.56%
Hilbert transform  1.26%             1.1%

6.9 Overall discussion

The accuracy of the two QRS detection algorithms was assessed on two databases. QRS extraction performance was first assessed on the MIT-BIH arrhythmia database, where sensitivity and positive predictivity were computed. The QRS complex was not always correctly localised on the R-wave, and in certain situations the QRS detection algorithms localised it on the S- and T-waves. This caused the discrepancies between the published results and those obtained here.
These slight discrepancies arose because a number of patients' ECG traces alternated between normal rhythm and arrhythmia. Further accuracy assessment was performed by extracting HRV features from the QRS points reported by the QRS detection algorithms. The differences between the features were calculated with the PRD function. Average discrepancies of ≈ 1% were obtained, and median-based analysis yielded excellent results.

[Figure 6.16 panels: (a) PRD of mean heart rate; (b) PRD of standard deviation. Each panel shows box-plots for the Afonso and Hilbert detectors; vertical axis: PRD (%).]

Figure 6.16: Boxplots of the PRD of the mean and standard deviation of the heart rate between the manual and automatic QRS points extracted.

Unlike the patients of the MIT-BIH arrhythmia database, the subjects of the allergy database were generally of an age where they were unwilling to remain still during their tests. For this reason, artefacts appeared frequently in the allergy database's ECG recordings. Artefact detection was performed on the recorded ECGs to provide confidence that the extracted features derived from heart beats rather than from artefact noise. The following were then extracted from the regions not flagged as artefact: the sensitivity, the positive predictivity, and the mean and standard deviation of the heart rates resulting from automatic QRS detection. The sensitivity and positive predictivity were ≈ 93%, but once more, median values of over 99% were obtained for both QRS detectors. This indicates that excellent QRS detection is obtained in the majority of cases. The extracted features were also very close to the true feature values. The algorithms employed here were adapted directly from the literature, which investigated the accuracy of QRS detection for adults.
The subjects under investigation for allergy classification are children under the age of 10 years. It could therefore be argued that varying certain parameters (e.g. threshold values) would yield better accuracy. However, this tuning was not possible because no independent database of children's ECGs was available to allow objective parameter tuning. The 'stock' values provided in the original publications were therefore employed.

CHAPTER 7

Fully automated allergy detection

7.1 Introduction

In Chapter 5, allergy was classified through statistical modelling of background, non-allergic heart rate variability features. By these means, 93% sensitivity and 100% specificity of classification were achieved. However, these results were obtained on features extracted from manually annotated QRS points. This process provided an indication of the validity of heart rate variability based allergy classification. The manual intervention required to acquire these annotations, however, makes the process unsuitable for a real-time classification environment. Therefore, Chapter 6 discussed the accuracy of fully automated QRS detection algorithms. Two automatic QRS detection algorithms were investigated, and their accuracy was benchmarked on two ECG databases. Based on these results, both QRS detection algorithms were deemed suitable for investigation here.

[Figure 7.1: flow diagram with blocks labelled ECG; QRS; Training data; Testing data; Models from Chapter 5; Automatic model generation and selection; Unmatched decision making; Matched decision making; Unmatched classification result; Matched classification result.]

Figure 7.1: The flow of how the matched (right) and unmatched (left) classification results are obtained.
In this chapter, the heart rate variability features extracted from the automatic QRS points are used with the classification framework to assess fully automated allergy classification.

7.2 Methods

This chapter assesses two means of automatic allergy classification. Both are assessed because this application has two representations of the same data (i.e. manual and automatic HRV features), so the accuracy of machine learning algorithms can be assessed in more than one way. These avenues are termed matched and unmatched classification, and Figure 7.1 shows both processes. The right-hand side shows how the matched classification results are obtained, and the left-hand side presents the unmatched process.

The first method applies the classification models obtained in Chapter 5 ('manual models') to new features. These are the normalised HRV features extracted from the QRS points produced by the automatic QRS detectors discussed in Chapter 6. To simplify the discussion of results in later sections, these feature sets are termed 'automatic data'. Automatic data are further subdivided into 'Afonso' and 'Hilbert' data according to the QRS detection algorithm from which the features were derived. As these results combine manual models with automatic data, they are henceforth termed 'crossover' or 'unmatched' results. The features obtained in Chapter 6 presented low PRD against the features extracted from manual QRS annotations. Crossover classification therefore sheds light on how sensitive the classification framework is to input data of the same nature but different origin.
The second method employed the model generation framework from Chapter 5, but trained new classification models and selected new decision-making parameters based only on the automatic data. This process was investigated because the classification framework has previously been described as data-driven, so unmatched classification might not be appropriate for fully automated allergy classification. Models generated in this process are termed 'automatic' models. They are further subdivided into 'Afonso' and 'Hilbert' models according to the QRS detection algorithm employed in their generation (see Chapter 6). The procedures employed in this chapter utilised the epoch-length fusion introduced in Chapter 5.

Both matched and unmatched methods are investigated because classification of this kind has not previously been performed on allergy data. It is therefore uncertain which procedure will perform best for automatic classification. One might expect unmatched classification to perform better, as it employs the manual models, which were generated from error-free annotations.

Table 7.1: Sensitivity and specificity of classification results obtained with the manual classification models on the automatically extracted HRV features (i.e. crossover classification results).

QRS detector   Sensitivity   Specificity
Afonso         60%           88.88%
Hilbert        66.66%        88.88%
Manual         93.33%        100%

7.3 Unmatched classification results

This section presents the crossover classification results, and then discusses the applicability of this routine for autonomous classification of allergy.

7.3.1 Results of unmatched classification

Table 7.1 presents the sensitivity and specificity results computed by the crossover classification process.
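The percentages in Tables 7.1 and 7.2 follow directly from subject-level counts. This sketch assumes 15 allergic and 9 non-allergic subjects. These counts are inferred here and not stated in this section, but they are consistent with the reported percentages and with the 24 subjects of Figure 6.13. Under that assumption, the false negative counts quoted in the text reproduce the tabulated values. The tables appear to truncate rather than round (e.g. 66.66% for 10/15). The counts below are therefore illustrative.

```python
N_ALLERGIC = 15       # inferred from the reported percentages (assumption)
N_NON_ALLERGIC = 9    # inferred from the reported percentages (assumption)


def sensitivity(false_negatives: int, positives: int = N_ALLERGIC) -> float:
    """Percentage of allergic subjects classified as allergic."""
    return 100.0 * (positives - false_negatives) / positives


def specificity(false_positives: int, negatives: int = N_NON_ALLERGIC) -> float:
    """Percentage of non-allergic subjects classified as non-allergic."""
    return 100.0 * (negatives - false_positives) / negatives


# (false negatives, false positives) implied by Tables 7.1 and 7.2
scenarios = {
    "crossover, Afonso": (6, 1),
    "crossover, Hilbert": (5, 1),
    "matched, Afonso or Hilbert": (3, 0),
    "manual": (1, 0),
}
for name, (fn, fp) in scenarios.items():
    print(f"{name}: Se = {sensitivity(fn):.2f}%, Sp = {specificity(fp):.2f}%")
```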
Table 7.1 shows a specificity of 88.88% in both cases, which indicates that false positive classifications were made. The sensitivity is also poor in comparison to Chapter 5. Afonso data yielded six false negatives (60% sensitivity) and Hilbert data yielded five (66.66% sensitivity).

The time gain metrics obtained with crossover classification will not be discussed here. Previous chapters repeatedly stated that obtaining 100% specificity is the most important aspect of allergy classification, and crossover classification did not achieve this. To use an analogy, discussing time gain metrics when specificity is suboptimal is like discussing a novel feature of a vehicle whose brakes do not function correctly. The priority in such cases should be safety, and this must be addressed before additional features warrant analysis. Without perfect specificity, machine-based classification of allergy should not be trusted, because a false positive diagnosis would have unacceptable consequences for a subject's quality of life.

7.3.2 Discussion on unmatched classification results

Subtle dynamics control the applicability of models for classification of allergy. This is evidenced by the fact that crossover classification obtained both poor sensitivity and poor specificity. From a decision-making point of view, poor sensitivity implies that the duration and threshold parameters were too large, as the allergy classification criteria were not satisfied in a number of cases. However, this process also obtained suboptimal specificity, so in other cases these decision-making parameters were, in fact, too small.
Therefore, classification of allergy depends strongly on the model and parameter selection routine, and poor choices of these parameters can under-represent the novelty of certain segments. This is perhaps not surprising. In Chapter 5, for example, even matched classification under-represented the extent of the 'novelty' of certain regions of the HRV with some choices of parameters. This suggests that model and parameter selection is data-driven, and that crossover classification does not yield optimal automated allergy classification. Every aspect of the classification framework depends on the data employed to generate the models. Even slight variations in these data yield different models, decision-making parameters and results. This explains why 100% specificity and good sensitivity were preserved between training and testing in Chapter 5 but not with unmatched classification.

7.4 Matched classification results

The results in this section employed the boosting ensemble-based classification framework. New classification models were learnt from the HRV data obtained with the QRS detection algorithms. The PCA, GMM, duration and multiplicative parameters used to generate these results are all new to these methods and entirely independent of the manual models (see Figure 7.1).

7.4.1 Sensitivity and specificity

Table 7.2 presents the overall results obtained with the two QRS algorithms investigated. In both cases, the automatic classification algorithms obtained a specificity of 100%, meaning that no false positive classifications were encountered.
Chapter 5 already obtained 100% specificity, but obtaining it with fully automated models is a very significant result. It means that a subject classified as allergic could immediately be diagnosed as allergic at that time, since no false positive classifications were made. The same confidence attributed to a clinical diagnosis of allergy can, therefore, be attributed to machine-based classification of allergy.

Table 7.2: Sensitivity and specificity of classification results obtained by Afonso's and the Hilbert transform QRS detectors.

QRS detector   Sensitivity   Specificity
Afonso         80%           100%
Hilbert        80%           100%
Manual         93.33%        100%

Automatic classification obtained 80% sensitivity with both QRS detection algorithms. Subjects 1, 2 and 3 were misclassified. Neither the manual nor the automatic classification routines classified Subject 1 as allergic, because the subject reacted to the food three minutes after consuming it. Within this short time frame, the extracted features did not change sufficiently to satisfy the allergy classification criteria. Figure 7.2 shows the likelihood traces obtained for Subject 1 with manual (Figure 7.2a), Afonso (Figure 7.2b) and Hilbert transform (Figure 7.2c) models at epoch lengths of 60 seconds.

[Figure 7.2 panels: (a) Manual model; (b) Afonso model; (c) Hilbert model. Vertical axes: likelihood (log scale, 10^-2 to 10^-1); horizontal axes: time (mins), 0 to 14.]

Figure 7.2: Likelihood plots of Subject 1 for manual and automatic models at epoch lengths of 60 seconds. Subplots (a) to (c) show the likelihoods which were obtained with manual, Afonso and Hilbert models respectively. In all cases the threshold for allergy classification is off the scope of the figures.
The recording for Subject 1 lasts approximately 14 minutes, and the first dose of the allergen was administered 11 minutes after the challenge began. Figure 7.2 shows that all three likelihood traces tend to deviate from the background likelihood levels before the challenge was terminated. This is due to the allergic reaction, which caused the subject to vomit at approximately 14 minutes. The manual, Afonso and Hilbert likelihoods all depart visibly from the background likelihood range. However, this departure was too small for correct classification of this subject, and in all cases the threshold lies outside the range of the vertical axes. This is because some subjects (Subject 16, for example) presented non-background-like HRV features throughout the OFC, so larger thresholds are selected to obtain 100% specificity in training. The slight differences between the likelihoods also support the data-driven nature of the classification process. The models selected for each process display the same trends (i.e. departure from background as the allergic reaction manifests), but the extent of these trends varies with the models selected.

Subjects 2 and 3 were correctly classified by the manual models, but not by the automatic models. Chapter 5 showed the likelihoods computed for Subject 2 (Figure 5.9, page 126), and a period of the likelihood at approximately 65 minutes was classified as allergic. Chapter 5 also showed that such departures can disappear with poor choices of classifier model, even with matched classification. The departures obtained with fully automated methods were too small for machine-based classification of allergy for these subjects.
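A deliberately simplified decision rule can illustrate the roles of the threshold and duration parameters discussed above. The exact rule and threshold formula of Chapter 5 are not restated in this section, so the sketch below makes hypothetical assumptions:

- the threshold lies m standard deviations below the mean background log-likelihood;
- allergy is declared once the log-likelihood has stayed below the threshold for a minimum number of consecutive epochs;
- excluded epochs (e.g. checkups) reset the count.

A larger m or duration suppresses detections, as happened for Subject 1, while smaller values risk false positives.

```python
import statistics
from typing import Optional, Sequence


def background_threshold(background: Sequence[float], m: float) -> float:
    """Hypothetical threshold: m standard deviations below the mean of the
    background log-likelihood."""
    return statistics.fmean(background) - m * statistics.pstdev(background)


def first_detection(loglik: Sequence[float], threshold: float, min_epochs: int,
                    excluded: Optional[Sequence[bool]] = None) -> Optional[int]:
    """Index of the epoch at which the log-likelihood has been below
    `threshold` for `min_epochs` consecutive epochs, or None. Excluded epochs
    (e.g. checkups) are never classified and reset the run."""
    run = 0
    for k, value in enumerate(loglik):
        if excluded is not None and excluded[k]:
            run = 0
            continue
        run = run + 1 if value < threshold else 0
        if run >= min_epochs:
            return k
    return None


background = [-2.0, -2.1, -1.9, -2.0]                 # hypothetical values
challenge = [-2.0, -2.5, -2.6, -2.0, -2.5, -2.6, -2.7]
t = background_threshold(background, m=3.0)
print(first_detection(challenge, t, min_epochs=3))     # 6
print(first_detection(challenge, t, min_epochs=2))     # 2
print(first_detection(challenge, background_threshold(background, m=20.0), 3))  # None
```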
7.4.2 Artefact detection

Chapter 6 discussed the requirement for artefact detection in the ECG. This section shows that artefact detection did not reduce the sensitivity. Rather, it allowed Subjects 7 and 10 to be correctly classified as allergic; without artefact detection, they would have been incorrectly classified as non-allergic. Figure 7.3a shows Subject 7's likelihood trace with epoch lengths of 60 seconds, together with the threshold computed when artefact detection was not employed. As before, the shaded regions indicate the background and checkup times.

[Figure 7.3 panels: (a) Allergy detection of Subject 7 without the application of artefact detection; (b) Allergy detection of Subject 7 with the application of artefact detection. Vertical axes: likelihood (log scale); horizontal axes: time (mins), 0 to 60.]

Figure 7.3: Likelihood plots of Subject 7. Subplot (a) shows the threshold computed without the aid of artefact detection, with which allergy is not classified. Subplot (b) shows the threshold computed when artefact detection was incorporated, with which artefact-aware classification successfully classifies allergy.

In Figure 7.3a, the threshold was selected at approximately 7 × 10^-4. The likelihood did not cross this threshold once during the OFC, even though the trace shows two substantial deviations from the background levels, at approximately 30 and 50 minutes. A large threshold was selected because artefact contaminated the background likelihood, as can be seen near t = 0. This increased the mean and standard deviation of the background data. In the threshold calculation, the multiplicative parameter scaled this increase to the point where the allergy criteria could not be satisfied.
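The effect described for Subject 7 can be reproduced with the same kind of hypothetical threshold: m standard deviations below the background mean of the log-likelihood. The actual formula is defined in Chapter 5 and may differ. In this toy example, a single artefactual background epoch raises both the mean and the standard deviation. The threshold then moves so far from the background that a sustained drop is no longer detected, and masking that epoch restores the detection. All values and names are illustrative.

```python
import statistics
from typing import Optional, Sequence


def background_threshold(background: Sequence[float], m: float,
                         artefact: Optional[Sequence[bool]] = None) -> float:
    """Hypothetical threshold m standard deviations below the background mean,
    computed only from background epochs not flagged as artefact."""
    if artefact is None:
        artefact = [False] * len(background)
    clean = [v for v, bad in zip(background, artefact) if not bad]
    return statistics.fmean(clean) - m * statistics.pstdev(clean)


def detected(loglik: Sequence[float], threshold: float, min_epochs: int) -> bool:
    """True if the log-likelihood stays below threshold for min_epochs epochs."""
    run = 0
    for value in loglik:
        run = run + 1 if value < threshold else 0
        if run >= min_epochs:
            return True
    return False


background = [-0.5, -2.0, -2.1, -1.9, -2.0, -2.0]   # epoch 0 hit by an artefact
artefact = [True, False, False, False, False, False]
challenge = [-2.0, -2.6, -2.6, -2.6, -2.0]          # sustained drop (toy values)

raw = background_threshold(background, m=3.0)
masked = background_threshold(background, m=3.0, artefact=artefact)
print(f"threshold without artefact detection: {raw:.2f}")
print(f"threshold with artefact detection:    {masked:.2f}")
print(detected(challenge, raw, 3), detected(challenge, masked, 3))   # False True
```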
Figure 7.3b shows the same likelihood trace, with the region at the beginning of the challenge excluded by the artefact detection algorithm. As a result, the mean and standard deviation of the background were calculated from 'normal' data only. The resulting threshold allowed allergy to be classified for this subject with a time gain of approximately 30 minutes. What occurred at 31 minutes in Figure 7.3 is uncertain. This moment is close to the checkup at the 34th minute, so the subject may have become agitated, with the reduced likelihood resulting from the impending checkup. Alternatively, the deviation may be due to the subject fighting the allergic reaction. The latter explanation is believed more likely in this case. No false positives were obtained in any of the experiments performed on the allergy database, with either the manual or the automatic models.

7.4.3 Time gain

Table 7.3 tabulates the time gain metrics obtained with fully automated allergy classification. The metrics presented are the specific time gain metrics (see Chapter 5). The specific time gain was chosen because a comparison of total time gain metrics would favour the manual models. The total time gain of the manual models would be diluted by only one 0-minute time gain (as 93% sensitivity was obtained), whereas the automatic results would contend with three (as 80% sensitivity was obtained). The specific time gain therefore presents a fairer comparison of performance. Table 7.3 shows that the manual models outperformed the automatic models in every time gain metric.
The time gain metrics obtained with the Hilbert transform QRS detection algorithm also consistently outperformed those obtained with Afonso’s algorithm, even though both algorithms obtained identical sensitivity. However, the mean difference between the Hilbert and Afonso time gains is approximately one minute, which is not a significant difference. The total time gain is the specific time gain scaled linearly by the sensitivity, and with both QRS detectors it is approximately 30 minutes.

Table 7.3: Specific time gain metrics obtained from fully automatic allergy classification based on the Afonso and Hilbert transform QRS detectors.

QRS detector   Time gain (mins)   Doses saved (portions)   Activation percentage
Afonso         34.62              1.47                     53.77%
Hilbert        35.47              1.67                     50.63%
Manual         39.15              1.78                     31.91%

7.5 Discussion

The unmatched classification results were not acceptable, as suboptimal specificity was obtained. This was due to the data-driven nature of the classification and model generation process. With new types of data (i.e. automatic QRS data), the manual models obtained poor sensitivity and specificity. Because of the disparity between the crossover results and the fully manual results, crossover classification is not appropriate for the classification of allergy, and fully automated classification must be employed. Fully automatic models (i.e. automatic models generated with automatic data) obtained 100% specificity. This is a very significant result. Because the results were obtained in a subject-independent manner, machine-based classification of allergy can be regarded as equivalent to clinical diagnosis of allergy. Sensitivity of 80% was also obtained, which corresponds to three false negative classifications.
While this sensitivity is lower than that obtained with the manual models, automatic classification of allergy can still significantly improve the state of the art of allergy diagnosis. With machine-based classification, OFCs could have been terminated over 30 minutes sooner on average. This time gain could be used to good advantage by administering antihistamines, such as Zyrtec, to counter the symptoms of allergy. Zyrtec begins to take effect within 10 to 20 minutes. With a time gain of over 30 minutes, it might therefore be possible to eliminate the effects of allergy for some subjects who underwent OFC.

The thresholds selected for some subjects were too large to obtain classification. Figures 7.2a to 7.2c show the likelihood traces for Subject 1, and in each case the threshold lies outside the limits of the axes. This is due to the subject-independent means by which the threshold is selected, combined with the requirement to obtain 100% specificity in training, as some subjects contributed larger deviations from the background. However, OFC can tolerate false negative classifications. The same vigilance employed during OFC would be employed during monitored OFC. A false negative therefore does not impair the current state of clinical art, and safe diagnosis of allergy will always be obtained.

For Subjects 2 and 3, the models which were selected did not exemplify the signatures of allergy that were obtained by the manual models and data. It was stated in Chapter 5 and earlier in this chapter that the choice of an appropriate classification model is important for the detection of signatures of allergy in likelihood traces. A poor choice of these parameters can under-represent the ‘novelty’ of the HRV data.
This occurred with Subjects 2 and 3 with the automatic models. Upon manual investigation of alternative likelihoods, larger deviations from background levels could have been obtained with other automatic model choices. It is inappropriate to infer whether different model choices would have yielded correct classification, as this would introduce manual analysis, and the aim of this chapter is to assess automated classification of allergy.

7.6 Conclusion

This chapter has demonstrated the means by which fully automated classification of allergy can be achieved. Because of the subject-independent nature of the procedure employed here, the results obtained should be representative of results that will be obtained with future data. Importantly, no false positive classifications were obtained on the unseen data in a fully objective manner. The machine-based classification of allergy presented here is therefore equivalent to clinical diagnosis of allergy.

Consequently, an important question is whether this procedure might be employed during oral food challenges. With 100% specificity and 80% sensitivity demonstrated, machine-assisted classification is appropriate for automated detection of allergy without compromising the quality of medical diagnoses. Manual annotations, which provide an upper bound on the expected performance, have demonstrated that higher sensitivity is achievable. Even so, the time gain results allow the state of clinical art to be improved by over 30 minutes. This introduces the capability of reducing the effect of allergic reactions on subjects through the early administration of antihistamines.
Therefore, the state of the clinical art of food allergy diagnosis can be significantly advanced by monitoring heart rate variability features during oral food challenges with digital signal processing and artificial intelligence.

CHAPTER 8
Overall summary, final conclusions and future work

8.1 Summary of this thesis

This thesis has presented an investigation of the applicability of machine-based analyses of non-invasive on-body sensors for the purpose of automatically detecting signatures of food allergy. This work was inspired by two observations of allergists who have conducted food challenges. First, subjects tend to become quiet before the onset of allergic reactions. Second, their heart rate tends to change before allergic reactions occur (Bindslev-Jensen et al., 2002).

It was shown that the accelerometer-based approaches investigated in this thesis did not provide a viable means of assessing the state of allergy, even though reduced activity is one of the observations that allergists identified as indicative of the onset of allergy. Indeed, this is perhaps the more identifiable of the two observations for the allergists, as it is easily observed. Acceleration-based analysis performed poorly because subjects are required to remain on a bed for the duration of the oral food challenge. This limits the range of activities that subjects can undertake, and therefore the movements and related energy expenditure. As a result, no separability was achieved between allergic and non-allergic subjects with activity-based metrics or with energy expenditure estimation algorithms. It is possible that allowing subjects to play freely would enable a means of allergy detection based on activity and energy analyses.
Furthermore, placebo-based challenges might make more non-allergic data available for model generation. However, these new procedures would require a new type of oral food challenge and were not investigated.

The second observation was then investigated. In order to confirm the existence of signatures of allergy in the ECG, the QRS points were manually annotated. Heart rate variability features were then extracted from these annotations and separated according to whether the data originated from an allergic subject. By these means, it was possible to assess the characteristic differences between allergic and non-allergic heart rate variability features. This is the first work to have investigated this. It was shown that, even in single-dimensional representations, many of the features extracted from the ECG obtained separability between the allergic and non-allergic classes. Compared with the accelerometer data, the heart rate variability data therefore provides a more suitable platform on which to base machine learning algorithms.

GMM-based novelty detection was employed on manually annotated HRV feature data with nested, subject-independent parameter selection and performance assessment routines. This type of classification was used because the only labelled data available was the background data, recorded before the administration of the first dose of the allergen. By these novelty detection means, 93% sensitivity and 100% specificity were obtained. These metrics were accompanied by a time gain of approximately 39 minutes, with allergy detected after consumption of just 30% of the dose, which can drastically reduce the risk of reactions. This is the first research to have been performed on the classification of allergy based on heart rate variability features.
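The novelty detection principle summarised above can be sketched with scikit-learn’s `GaussianMixture`. The model is trained only on the labelled background epochs, and later epochs are scored by their likelihood under it. This is a minimal illustration on synthetic two-dimensional features. The number of components, the diagonal covariance type and the data are assumptions made for the example, not the parameters selected in the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for HRV feature vectors (hypothetical data):
# background epochs recorded before the first dose, and later epochs
# whose features have drifted away from the background.
background = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
later_epochs = rng.normal(loc=4.0, scale=1.0, size=(20, 2))

# One-class training: the mixture model sees background data only.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(background)

# score_samples returns the per-epoch log-likelihood under the model;
# lower values indicate greater novelty relative to the background.
bg_loglik = gmm.score_samples(background)
later_loglik = gmm.score_samples(later_epochs)
print(f"mean background log-likelihood: {bg_loglik.mean():.2f}")
print(f"mean later-epoch log-likelihood: {later_loglik.mean():.2f}")
```

In the thesis, such likelihood traces are then compared against a threshold set by post-processing parameters chosen in a subject-independent manner. That selection step is not reproduced in this sketch.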
The results show that this approach is very suitable, and that excellent classification, perfect specificity and very high time gain can be obtained through these means. Manual QRS annotations give strong confirmation of this, as there is no uncertainty about the validity of the QRS points, and therefore of the features. However, because these results were obtained with manually annotated QRS points, they represent an upper bound on the results that could be expected with automatically extracted heart rate variability features.

Two automatic QRS detection algorithms were therefore investigated and validated against the well-known MIT-BIH arrhythmia database. These algorithms were then tested on the ECG recorded during the oral food challenges. Both algorithms performed satisfactorily on both databases and were therefore employed for the fully automated classification of allergy. Heart rate variability features were extracted from the automatically identified QRS points and used with the subject-independent classification framework described previously. This process obtained 100% specificity. The manual classification models had already achieved this, but the result is very significant here because it was obtained in a fully automated manner. For this fully automated framework, therefore, classification of allergy can be stated to be equivalent to a diagnosis of allergy, because no false positive classifications were obtained. The fully automated routine obtained a sensitivity of 80%. This is lower than the sensitivity obtained with manual annotations, but the time gain metrics obtained with this data were very strong: approximately 30 minutes would have been saved on average.
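The step of extracting heart rate variability features from QRS points can be illustrated with a few standard HRV measures:

- mean RR interval
- SDNN
- RMSSD
- the Poincaré descriptors SD1 and SD2

This is an illustrative subset chosen for the sketch. The thesis’s full set of 18 features is defined in Chapter 4 and is not reproduced here. The R-peak times are hypothetical.

```python
import numpy as np

def hrv_features(qrs_times_s):
    """Compute a few standard HRV features from QRS (R-peak) times in seconds.

    Illustrative subset only: mean RR, SDNN and RMSSD (time domain) and the
    Poincare descriptors SD1/SD2. Interval-based values are in milliseconds.
    """
    rr = np.diff(np.asarray(qrs_times_s, dtype=float)) * 1000.0  # RR intervals (ms)
    drr = np.diff(rr)                                            # successive differences
    x, y = rr[:-1], rr[1:]                                       # Poincare plot axes
    return {
        "mean_rr": rr.mean(),
        "sdnn": rr.std(ddof=1),
        "rmssd": np.sqrt(np.mean(drr ** 2)),
        # Spread perpendicular to, and along, the line of identity.
        "sd1": np.std((y - x) / np.sqrt(2), ddof=1),
        "sd2": np.std((y + x) / np.sqrt(2), ddof=1),
    }

# Hypothetical R-peak times (s) for a short epoch.
qrs = np.cumsum([0.0, 0.80, 0.82, 0.85, 0.83, 0.80, 0.78, 0.81])
features = hrv_features(qrs)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

In a classification pipeline, one such feature vector would be computed per epoch, so the epoch length determines how many QRS points contribute to each vector.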
While this time gain is also lower than that obtained with the manual classification models, achieving it in a fully automated manner is a very significant result. The time gain could be used to administer antihistamines, which could potentially eliminate allergic reactions for some subjects. Alternatively, it could allow subjects to recover from the allergic condition naturally. With no further doses of the allergen administered, subjects may be able to ‘fight’ the reaction and overcome the symptoms naturally.

8.2 Primary contribution of this thesis

Allergy is a chronic disorder which can only be diagnosed in a clinically invasive manner. This thesis is the first to have investigated two approaches to the classification of allergy. It was shown previously that accelerometer-based assessment of activity (Twomey et al., 2010b) during oral food challenges cannot characterise the allergic and non-allergic classes in a manner that allows separability. This thesis has ruled out a number of avenues for applying it to allergy classification. However, it was discovered that allergy affects heart rate variability in a detectable manner that can be exploited to classify the condition (Twomey et al., 2010a, 2011, 2013b). It was also discovered that signatures of allergy are sometimes better exemplified at different epoch lengths. Novel result fusion, applied in a subject-independent manner, allows subtle characteristics of the signatures of allergy to be identified. Allergy can thereby be classified objectively with 100% specificity. This allows equivalence to be drawn between automatic classification and clinical diagnosis of allergy. Signatures of allergy were repeatedly identified in the heart rate variability features before the onset of physical reactions.
In fact, this thesis has shown that signatures of allergy can be classified approximately 30 minutes earlier than by allergists, with the consumption of approximately two-thirds less of the problem food. As a result, subjects could be diagnosed with the same certainty and accuracy but with a reduced risk of allergic reactions and anaphylaxis (Twomey et al., 2013a). The chapters of this thesis described the means to significantly and safely advance the current state of clinical art of allergy diagnosis in an objective and automatic manner. This work therefore supports the case for the inclusion and use of intelligent monitoring for the diagnosis of allergy during oral food challenges.

8.3 Possible avenues of future work

This thesis has explored a number of approaches to classifying food allergy. Other avenues which deserve examination could not be investigated in this thesis, and these are briefly described below.

8.3.1 Data collection

The classification routine obtained excellent and consistent results. Because the routine is subject-independent, these results are indicative of the relationship between allergy and the HRV. In order to further define the effect of allergy on the HRV features, a larger database on which to test this classification routine should be obtained. This should be the primary focus of any researcher who takes on this research. With a larger database, other validation methods, such as bootstrapping, can be investigated with ‘fixed’ models. In contrast to the leave-one-out procedure used in previous chapters, bootstrapping uses multiple test subjects and would validate the robustness of one model over multiple participants in the test set.

8.3.2 Alternative novelty detectors

Chapters 2 and 5 discussed the area of novelty detection and the reasons that GMM-based classification was selected.
However, while GMM-based novelty detection was stated as being preferable for the initial analysis, alternative classification routines should also be investigated. Two alternative novelty detection routines which might be analysed are:

One-class SVM: The principal alternative is the one-class SVM. These classifiers are the single-class equivalent of the popular SVM discriminative classifier (Schölkopf et al., 2000, 2001). They estimate the boundary of the support of the singly-labelled data, effectively making this a discriminative novelty detection classifier. GMM-based novelty detection computes the likelihood of new data belonging to the background class. One-class SVMs instead compute the distance of new data to the surface of a hypersphere which surrounds the background distribution. One-class SVMs can also employ the ‘kernel trick’ (Aizerman et al., 1964), which implicitly performs a nonlinear mapping of the features to a higher-dimensional feature space. This has proven useful for many applications and may prove useful for allergy classification.

Elliptic envelope: The minimum covariance determinant estimator (Rousseeuw and Van Driessen, 1999) can also be employed to detect novelty as described here. This algorithm effectively fits an ellipse about the training distribution, and benefits from a symmetry which one-class SVMs and GMMs may not encompass. This method works well for normal (and normal-like) data. With multimodal distributions, however, elliptic envelopes tend to model the underlying structure poorly, because, unlike one-class SVMs and GMMs, they cannot generalise to multimodal distributions. Even so, the method might yield interesting results for allergy detection, as it should result in a simpler model representation.

8.3.3 Real-time and portable implementation

All of the classification and analyses which were performed here were performed offline, i.e.
the data was recorded and stored, and the data interrogation was performed on a computer. While the results presented at the end of this thesis are fully automated, it would be interesting to verify that the same classification results are obtained with a real-time solution. Preliminary work on this has already been performed (Gutiérrez et al., 2013).

For a fully wireless solution, a mobile device would record the ECG data and compute the features and their novelty. However, the decisions made by the mobile solution might need to be reviewed. The ECG data would therefore need to be logged electronically and transferred via a wireless link. Both operations are heavy consumers of power on mobile platforms. With lossy compression, however, the data to be transmitted can be compressed to a high degree. Twomey et al. (2010c) demonstrated that reliable heart rate variability features can be extracted at a compression ratio of 30, and ECG compression has also been investigated by other researchers (Olmos et al., 1996; Craven et al., 2012). Analysing the effect of compression on classification accuracy and overall time gain would reveal whether lossy compression is applicable to long-term monitoring.

8.3.4 Feature and epoch analysis

It was demonstrated that in some cases different epoch lengths display certain characteristics of allergy better than others. It would be useful to investigate whether features employed in speech classification (Temko et al., 2010, 2011a) or EEG classification (Doyle et al., 2010) could obtain accurate classification of allergy with fusion of fewer epoch lengths.

Additional insight could be gained from quantifying the effectiveness of the individual features which were employed for the classification.
Preliminary work investigating the importance of the categories of features is presented in Appendix B. It concludes that the full set of features is required for the best classification and time gain results. However, a fuller investigation is justified, and the results of algorithms such as recursive feature elimination could be assessed.

8.4 Publications resulting from this work

The publications resulting from this research, whether published, in review or in preparation, are listed below.

Twomey, Niall and Faul, Stephen and Marnane, William P (2010); Comparison of accelerometer-based energy expenditure estimation algorithms; Pervasive Computing Technologies for Healthcare; pages 1–8.

Twomey, Niall and Walsh, Noel and Doyle, Orla and McGinley, Brian and Glavin, Martin and Jones, Edward and Marnane, WP (2010); The effect of lossy ECG compression on QRS and HRV feature extraction; Engineering in Medicine and Biology Society; pages 634–638.

Twomey, Niall and Faul, Stephen and Daly, Deirdre and Hourihane, JO and Marnane, William P (2010); Classification of biophysical changes during food allergy challenges; International Symposium on Applied Sciences in Biomedical and Communication Technologies; pages 1–5.

Twomey, Niall and Temko, Andrey and Hourihane, Jonathan O’B and Marnane, William P (2011); Allergy detection with statistical modelling of HRV-based non-reaction baseline features; International Symposium on Applied Sciences in Biomedical and Communication Technologies; pages 134–138.

Twomey, Niall and Temko, Andrey and Cullinane, Claire and Daly, Deirdre and Marnane, William P and Hourihane, Jonathan O’B (2013); Detection of heart rate variation could improve patient safety and diagnostic yield during oral food challenge; European Academy of Allergology and Clinical Immunology; In press.
Twomey, Niall and Gutiérrez, Raquel and Marnane, William P and Campos-Garcia, Jesús (2013); Real-Time Allergy Detection; IEEE Society of Intelligent Signal Processing; In press.

Twomey, Niall and Temko, Andrey and Hourihane, Jonathan O’B and Marnane, William P (2013); Fully automated allergy detection from paediatric ECG; IEEE Transactions on Information Technology in Biomedicine; In press.

APPENDIX A
Alternative parameter selection routines

A.1 Introduction and methods

This appendix investigates the effect of alternative model selection routines. Previously, decision-making parameters were chosen using a cost function, and candidates which did not satisfy it were eliminated from consideration. This process eliminated many tens of thousands of parameters. While it yields excellent classification and time gain results, this appendix investigates whether alternative parameter selection routines based on less austere cost functions yield better results.

The means by which the parameters were chosen was discussed in Chapter 5. The number of parameters selected is strongly influenced by the constraint of selecting the parameters which obtain the maximum time gain on the training data. In general, only one parameter is selected from the entire search space. The effect of relaxing this restriction, and alternative means of selecting post-processing parameters, were investigated here. In Chapter 5, parameters were selected by the statistical mode (i.e. the most frequently occurring parameter). A cost function assesses the entire search space and reduces it by the following three steps, listed in order of importance:

1. Eliminate parameters which fail to obtain 100% specificity.
2. Eliminate parameters which fail to obtain the maximum sensitivity from the subset resulting from step 1.
3. Eliminate parameters which fail to obtain the maximum time gain from the subset resulting from step 2.

Step 1 is the most important, followed by steps 2 and 3. The numbers of parameters eliminated follow the reverse order: step 3 eliminates the majority of the parameters, while step 1 eliminates the fewest. Approximately ten thousand parameters are removed by step 3*, and the alternative parameter selection is performed on this set of ten thousand parameters. Figure A.1 shows an example density of the parameters obtained before the time gain criterion eliminated parameters. The density is plotted against the multiplicative and duration parameters, and darker regions indicate areas of higher density.

* This number depends on many factors, but 10,000 was calculated as the average number of parameters obtained over all of the internal leave-one-out stages of the simulations which were performed.

Figure A.1: The estimated distribution of duration and multiplicative post-processing parameters which achieve 100% specificity and maximum sensitivity on the training dataset. The image is limited to d and n parameters of 75. The darker regions indicate a higher density of suitable parameters.

Parameter selection which terminates at step 2 is henceforth termed ‘relaxed parameter selection’, as it does not consider the time gain criterion and is viewed as a relaxation of the routine. The parameters which result from it are termed the ‘relaxed parameter set’.
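The three-step cost function above can be sketched as a cascade of filters over candidate (multiplicative, duration) pairs. Each candidate carries its training-set specificity, sensitivity and time gain. Stopping after step 2 gives the relaxed parameter set. The `Candidate` record and all numbers below are hypothetical. The sketch also omits how the thesis aggregates selections across internal leave-one-out folds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    n: int              # multiplicative parameter
    d: int              # duration parameter
    specificity: float  # training specificity (%)
    sensitivity: float  # training sensitivity (%)
    time_gain: float    # training time gain (minutes)

def select(candidates, relaxed=False):
    """Apply step 1, step 2 and (unless relaxed) step 3 of the cost function."""
    # Step 1: keep only candidates with 100% specificity.
    pool = [c for c in candidates if c.specificity == 100.0]
    if not pool:
        return []
    # Step 2: keep only candidates with the maximum remaining sensitivity.
    best_sens = max(c.sensitivity for c in pool)
    pool = [c for c in pool if c.sensitivity == best_sens]
    if relaxed:
        return pool  # the 'relaxed parameter set'
    # Step 3: keep only candidates with the maximum remaining time gain.
    best_tg = max(c.time_gain for c in pool)
    return [c for c in pool if c.time_gain == best_tg]

candidates = [
    Candidate(2, 1, 88.9, 100.0, 45.0),   # fails step 1
    Candidate(5, 1, 100.0, 93.3, 38.0),
    Candidate(6, 3, 100.0, 93.3, 31.0),
    Candidate(9, 2, 100.0, 86.7, 40.0),   # fails step 2
    Candidate(7, 1, 100.0, 93.3, 38.0),
]
print("relaxed set:", [(c.n, c.d) for c in select(candidates, relaxed=True)])
print("original set:", [(c.n, c.d) for c in select(candidates)])
```

Because step 3 keeps only the maximum-time-gain candidates, it usually leaves very few parameters. This is consistent with the observation above that it eliminates the majority.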
For clarity, the parameter selection routine from Chapter 5 is termed the ‘original parameter selection routine’ in this appendix. The ‘original parameter set’ is the set of parameters which results from it.

Figure A.1 shows that a duration of 1 yields the highest density of parameters. This is intuitive: for a constant multiplicative parameter, more points will satisfy a small duration parameter than a larger one. With a constant duration parameter and an increasing multiplicative parameter, the density is expected to increase to a maximum and then reduce, as Figure A.1 shows. Low multiplicative parameters do not satisfy the 100% specificity criterion and are rejected at step 1, so few parameters are chosen. As the multiplicative parameter increases, an increasing number of points are selected. As it continues to rise, however, the number of points satisfying the maximum sensitivity criterion begins to roll off, and fewer points are selected because the maximum sensitivity is not attained.

The ripples seen on the chart for increasing multiplicative values are due to the distribution of the duration and multiplicative parameters. The search space consists of the unique, integer-rounded values which are logarithmically distributed between 1 and 300. The density chart was obtained by KDE. In Figure A.1 the x- and y-axes are continuous, while the search space is not. When the density of the parameters is estimated between the discrete values, it therefore reduces until the next parameter value is encountered.

This appendix investigates the effect of selecting the mean, median and statistical mode of the distribution of the relaxed parameter set.
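The search space described above consists of unique, integer-rounded values logarithmically distributed between 1 and 300, and can be generated as below. The number of logarithmically spaced points before rounding (`num=60`) is an assumption, since this section does not state it. The grid is dense at small values and sparse at large ones. Gaps of this kind are what make a continuous KDE ripple between the discrete parameter values.

```python
import numpy as np

def log_integer_grid(low=1, high=300, num=60):
    """Unique integer-rounded values logarithmically spaced in [low, high].

    num (the number of points before rounding) is an assumed value.
    """
    raw = np.logspace(np.log10(low), np.log10(high), num=num)
    return np.unique(np.round(raw).astype(int))

grid = log_integer_grid()
gaps = np.diff(grid)
print("low end of grid: ", grid[:10])
print("high end of grid:", grid[-5:])
print("smallest gap:", gaps.min(), "largest gap:", gaps.max())
```

Rounding to integers merges several of the closely spaced low-end points, which is why `np.unique` is needed. As a result, the grid contains fewer than `num` values.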
A.2 Results

Table A.1 tabulates the sensitivity, specificity and time gain metrics obtained with the relaxed parameter selection constraints. These results were calculated by the boosting classification procedure described in Chapter 5. The mean selection routine achieved the same sensitivity as the original parameter selection routine. However, it misclassified Subject 16 as allergic. This is the same subject who, as stated in Chapter 5, was physically opposed to consuming the food type. Even though the subject turned out not to be allergic to the food type, allergists performing the challenge would probably not have used automatic classification, given the subject’s attitude during the OFC.

Table A.1: Sensitivity, specificity and time gain metrics obtained by selecting the mean, median and mode of the set of post-processing parameters from the training data. In the case of the mean method, imperfect specificity was obtained.

Selection type   Sensitivity (%)   Specificity (%)   Time gain (mins)
Mean             93.33             88.88             28.36
Median           86.66             100.00            21.75
Mode             93.33             100.00            20.29
Chapter 5        93.33             100.00            36.5

Subject 16 was misclassified because the duration and multiplicative parameters are typically biased towards a region where both allergic and non-allergic subjects can surpass the threshold. The duration parameter rejects spurious deviations from the background, so that only substantiated departures from the background are classified as allergy. However, Figure A.1 shows that a high proportion of the duration parameters take a value of 1. Computing the mean of the distribution therefore selected a lower duration parameter, and Subject 16 was misclassified as allergic as a result.
The median and mode selection methods obtained perfect specificity, with sensitivities of 87% and 93% respectively. These results are competitive with those obtained by the original parameter selection criteria. Interestingly, with both the full and the relaxed selection criteria, the parameter selection routine based on the mode of the distribution obtained the same sensitivity and specificity. Table A.1 also shows the mean specific time gain calculated for these parameter selection methods. Among the relaxed selection methods, selection based on the mean of the distribution obtained the largest time gain. This is also due to the high percentage of low-value duration parameters in the distribution. The selection routines based on the median and mode achieved approximately equal time gain. In all cases, however, the time gain obtained with the original parameter selection criteria outperforms that obtained with the relaxed criteria.

A.3 Discussion

Alternative parameter selection routines can yield classification results as accurate as those of the parameter selection routine described in Chapter 5 (sensitivity of 93.33% and 100% specificity). This is because the full parameter selection routine selects parameters known from the training data to obtain the best time gain results, and this high accuracy was preserved between the training data and the unseen testing data. In many cases the signature of allergy is detected a number of times during the OFC. This is shown in Figure 5.8 (Chapter 5, page 123), where allergy is classified at approximately 45, 60 and 80 minutes. With the original parameter selection routine, the multiplicative and duration parameters were selected so as to specialise towards finding the first signature of allergy in the training data.
When there are several departures from the background likelihoods, each departure was observed to tend to be larger than the previous one. Smaller multiplicative parameters therefore yield better time gain results, as they provide the greatest likelihood of satisfying the failure criteria at the first departure. However, the relaxed parameter selection routine gives no consideration to time gain when selecting parameters. Hence, while its sensitivity and specificity results were excellent and in some cases matched those obtained in Chapter 5, its time gain was lower.

A.4 Conclusion

This appendix investigated whether the parameter selection routine used previously selected the optimal parameters, in comparison with other parameter selection routines. It was shown that alternative selections can yield sensitivity and specificity equal to those obtained in Chapter 5. However, because these selection routines are not time-gain aware, their time gain results were approximately one half of those obtained with time-gain-aware parameter selection. The parameter selection routine used previously is therefore the best of the methodologies considered, as it achieved the best accuracy and time gain results, and it should be used for parameter selection.

APPENDIX B
Investigation into the importance of features

B.1 Introduction and methods

For the classification of allergy, PCA is performed on normalised training data. This transformation de-correlates the features, which is necessary because the allergy database is too small to train GMMs with full covariance matrices. GMMs were then generated with the transformed features. The EM algorithm was used to compute the optimal means, covariances and weights of the mixture model.
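The preprocessing chain described above can be sketched with scikit-learn: normalisation, PCA to de-correlate the features, then a GMM fitted by EM. The sketch checks that the PCA-transformed training data has an essentially diagonal covariance matrix. This property is what makes a diagonal-covariance GMM a reasonable choice when there is too little data for full covariances. The synthetic correlated data, the component count and the pipeline structure are illustrative assumptions, not the thesis configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic 3-D 'features' in which the first two are strongly correlated.
z = rng.normal(size=(500, 1))
X = np.hstack([z, 2 * z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])

# Normalise, then rotate onto the principal components (no reduction).
scaler_pca = make_pipeline(StandardScaler(), PCA())
X_pca = scaler_pca.fit_transform(X)

# The off-diagonal covariances of the transformed training data vanish.
cov = np.cov(X_pca, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print("max |off-diagonal covariance|:", np.abs(off_diag).max())

# A diagonal-covariance GMM, fitted by EM, on the de-correlated features.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(X_pca)
print("log-likelihood of first 3 epochs:", gmm.score_samples(X_pca[:3]))
```

New epochs should be passed through the already-fitted `scaler_pca.transform` before scoring. This ensures that test data never influence the normalisation or the PCA rotation.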
The relative significance of the features could be assessed by examining the explained variances of the PCA matrices (Wold et al., 1987) and the weights attributed to the features. However, these values relate to the PCA components rather than to the features themselves. It is not appropriate to assign interpretations to the transformed features, in particular when normalisation was performed (Webb et al., 2011; Bishop et al., 2006), so explained variances and Gaussian weights cannot be used to assess the importance of the features. It is also difficult to use the probability density functions presented in Chapter 4 to quantify the effectiveness of individual features, because these contain a significant amount of non-allergic data (non-allergic HRV features are present in an allergic subject's challenge until an allergic reaction occurs).

Therefore, to quantify the importance of feature categories, the allergy classification framework used previously was applied to subsets of features, where each subset represents one feature type. The full set of features can be loosely grouped into time, frequency, Poincaré and sequential groups (see Chapter 4 for the members of each group). Classification was then performed once for each feature category, and higher importance is attributed to feature sets which obtain superior results. The results are ordered by importance as follows:
1. 100% specificity.
2. Sensitivity.
3. Time gain.
Ranking is therefore performed first by descending specificity, then by descending sensitivity and finally by descending time gain.

There are two key differences between the classification framework outlined in Chapter 5 and the process outlined here.
Firstly, as the process outlined here performs classification based on a subset of features, the dimensionality of the input to the classification framework is reduced from 18 to N, where N is the number of features associated with a particular category. Secondly, the full set of N features is preserved after the PCA transformation, and dimensionality reduction is not performed. This choice was made because the purpose of the process is to assess the significance of feature categories: allowing dimensionality reduction within a feature category would reveal only the importance of the PCA components, not that of the category itself. The models generated are termed time, frequency, Poincaré and sequential domain models, and these relate directly to the category of features on which each model was trained.

B.2 Results

Table B.1 lists the sensitivity, specificity, total time gain, doses saved and activation percentage for all models investigated (for definitions of these metrics, see Chapter 5). The time gain values presented are the total time gain results obtained in the experiment. This metric was chosen over the specific time gain because the overall performance of classification is of interest, rather than the performance on those subjects who were correctly classified.

Table B.1: Classification metrics obtained with the time-, frequency-, Poincaré- and sequential-domain classification models, ranked by order of importance of the feature category in question.
Feature category | Specificity | Sensitivity | Time gain (mins) | Doses saved (portions) | Activation percentage
Time domain | 100.00% | 80.00% | 30.25 | 1.27 | 54.44%
Poincaré domain | 100.00% | 53.33% | 15.63 | 1.0 | 69.19%
Frequency domain | 77.77% | 6.66% | 0.65 | 0 | 100.00%
Sequential domain | 66.66% | 13.33% | 0.06 | 0.13 | 95.55%
Chapter 5 | 100% | 93.33% | 36.5 | 1.67 | 38.57%

It can be seen that the time domain features achieved good classification metrics overall, obtaining 100% specificity and 80% sensitivity. Interestingly, Subjects 1, 3 and 4 were misclassified by the time domain models. The misclassification of Subjects 1 and 3 is not surprising, as Subject 1 was not classified correctly by any classification routine and Subject 3 was inconsistently classified at different epoch lengths in Chapter 5. However, this is the first instance across all classification runs up to this point in which Subject 4 is misclassified as non-allergic. It is interesting to note that, while the time domain models achieve very good overall performance, one of the most consistently classified subjects in the allergy database is misclassified by the highest-ranked feature category.

The frequency domain models achieved very poor results. This was unexpected because the frequency domain parameters can estimate the sympathetic and parasympathetic responses of the ANS (Kamath et al., 1993; Stein et al., 1999; Hedman et al., 2008). It was postulated in Chapter 4 that these metrics would be good indicators of allergic signatures, as a subject's ANS should react to the onset of an allergic reaction. It is uncertain why such poor classification of allergy was achieved with frequency domain features. However, the histograms obtained in Chapter 4 show poor separation between the allergic and non-allergic categories.
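The ordering of Table B.1 follows the lexicographic ranking defined in Section B.1: descending specificity, then sensitivity, then time gain. The Python sketch below reproduces that ordering from the table's values, and also shows the conversion from total to specific time gain (total time gain averaged over all 15 allergic subjects, rescaled to average over only the correctly classified ones). The names `CategoryResult`, `rank_categories` and `specific_time_gain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    name: str
    specificity: float      # percent
    sensitivity: float      # percent
    total_time_gain: float  # minutes, averaged over all allergic subjects


def rank_categories(results):
    """Order by descending specificity, then sensitivity, then time gain."""
    return sorted(
        results,
        key=lambda r: (r.specificity, r.sensitivity, r.total_time_gain),
        reverse=True,
    )


def specific_time_gain(total: float, n_allergic: int, n_detected: int) -> float:
    """Rescale total time gain to an average over correctly classified subjects."""
    if n_detected == 0:
        raise ValueError("no correctly classified allergic subjects")
    return total * n_allergic / n_detected


# Values from Table B.1, deliberately listed out of order.
table_b1 = [
    CategoryResult("Frequency domain", 77.77, 6.66, 0.65),
    CategoryResult("Sequential domain", 66.66, 13.33, 0.06),
    CategoryResult("Poincaré domain", 100.00, 53.33, 15.63),
    CategoryResult("Time domain", 100.00, 80.00, 30.25),
]
print([r.name for r in rank_categories(table_b1)])
# Time domain: 12 of 15 allergic subjects correctly classified (80% sensitivity).
print(round(specific_time_gain(30.25, 15, 12), 2))
```

The time domain and Poincaré rows tie on specificity, so sensitivity decides their order; the result matches the row order of Table B.1, and the time domain specific time gain comes out at about 37.8 minutes.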
Notably, the only subject correctly classified by the frequency domain models was Subject 4, who was one of the subjects misclassified by the time domain models. The frequency domain models also misclassified two non-allergic subjects as allergic, and achieved very low average time gain because so few subjects were correctly classified.

The Poincaré models performed adequately, yielding 53% sensitivity and 15.63 minutes average time gain. Like the frequency domain features, Poincaré features also measure the response of the ANS (Brennan et al., 2002; Mourot et al., 2004; Piskorski and Guzik, 2005), yet the models generated from these features outperformed the frequency domain models. It is also interesting to note that the Poincaré features identified allergy in Subject 3, who has been inconsistently classified; indeed, this is the only feature category which correctly classified Subject 3 as allergic. The Poincaré features obtained 100% specificity.

The sequential domain models ranked lowest of all: they misclassified three non-allergic subjects, giving the lowest specificity (66.66%), and correctly classified only two allergic subjects (13.33% sensitivity). Very interestingly, one of the two subjects correctly classified as allergic was Subject 1 (with a time gain of approximately 15 seconds). This is the only case in which the selected parameters achieved correct classification of this subject. However, it should be noted that the majority of the remaining allergic subjects were incorrectly classified.

B.3 Discussion

It must be noted that the results obtained in this section are not intended to compete with the classifiers discussed in previous chapters.
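As a retrospective check on the per-subject results above: because the time domain and Poincaré models both reached 100% specificity, combining their decisions with an OR rule (a subject is flagged as allergic if either model flags them) cannot add false positives, and the fused sensitivity is that of the union of the two detected sets. The OR rule is an assumption about what such a fusion would be, not a method taken from this work. In the Python sketch below the subject numbering is hypothetical, except for the roles of Subjects 1, 3 and 4 and the nine non-allergic subjects implied by the specificity values in Table B.1.

```python
def or_fuse(*detected_sets):
    """Flag a subject as allergic if any model flags them (logical OR)."""
    fused = set()
    for detected in detected_sets:
        fused |= detected
    return fused


def sensitivity(detected, allergic):
    """Percentage of allergic subjects flagged as allergic."""
    return 100.0 * len(detected & allergic) / len(allergic)


def specificity(detected, non_allergic):
    """Percentage of non-allergic subjects not flagged as allergic."""
    return 100.0 * len(non_allergic - detected) / len(non_allergic)


# Hypothetical numbering: 15 allergic subjects (1-15), 9 non-allergic (16-24).
allergic = set(range(1, 16))
non_allergic = set(range(16, 25))
time_domain = allergic - {1, 3, 4}      # 12 detected, no false positives
poincare = {3, 4, 5, 6, 7, 8, 9, 10}    # 8 detected, includes 3 and 4 but not 1
fused = or_fuse(time_domain, poincare)
print(round(sensitivity(fused, allergic), 2), specificity(fused, non_allergic))
```

Whichever other subjects the Poincaré models detect, the union misses only Subject 1, giving 14/15 (93.33%) sensitivity at 100% specificity.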
The purpose of this section is to assess the importance of each feature category and to investigate whether the optimal results were obtained in Chapter 5 with the full set of features. When the feature categories are ranked by the importance of their results, the most important category is the time domain features. These features yielded 100% specificity, 80% sensitivity and an average time gain of 30 minutes. Three subjects were misclassified as non-allergic by these features because the likelihoods obtained for them did not deviate from background levels to the same degree as they did with the manual models. Upon investigation of these subjects, it was seen that two were correctly classified by the manual models. The cause of misclassification of these subjects is that the time domain features alone did not exhibit signatures of allergy for them. The specific time gain of the time domain features is approximately 38 minutes (the total time gain of 30.25 minutes, averaged over all 15 allergic subjects, rescaled to the 12 correctly classified subjects: 30.25 × 15/12 ≈ 37.8 minutes), which is very close to the specific time gain obtained by the manual models (39.15 minutes). Therefore, the time domain models are the biggest contributor of time gain to allergy classification, as well as the best-performing individual feature category.

Interestingly, classification based on the sequential domain features correctly classified Subject 1 as allergic. This is the first and only instance in which this subject was correctly classified as allergic. Although this subject was correctly classified, the sensitivity and specificity obtained with the sequential domain models were quite poor. Therefore, it appears that the sequential domain features are suitable for the classification of quick-onset allergic reactions, but they do not appear to distinguish well between allergic and non-allergic subjects. The number of quick-onset allergic reactions in the database is too small to substantiate this assertion with certainty.
However, evidence from previous models supports the argument, as these correctly classified the entire set of allergic subjects except for Subject 1. Indeed, as this feature category achieved poor specificity, it is likely that its contributions were removed by dimensionality reduction during PCA model selection. Subject 16 was one of the three non-allergic subjects misclassified as allergic by the sequential features. This subject was discussed in Chapter 5, where it was stated that they presented with allergy-like signatures in the likelihoods obtained; as a result, the likelihood values computed for this subject resembled those which might be obtained by allergic subjects.

The Poincaré models yielded 100% specificity and approximately 53% sensitivity. Although this feature category did not yield as high a sensitivity as the time domain features, Subjects 3 and 4, who were both misclassified by the time domain models, were correctly classified by the Poincaré models. Fusing the results of the time domain and Poincaré models would therefore achieve the same classification accuracy as was obtained in Chapter 5. That the combination of time domain and Poincaré features matches the sensitivity obtained previously is a retrospective result which highlights the importance of considering combinations of features; it should not be interpreted as identifying the combination of features which will obtain optimal results. Although the time domain and Poincaré features together might yield results comparable to those obtained by the manual models, they may not yield the optimal results, and the inclusion of less important features may yield better results. One of the Poincaré features computes the cardiac sympathetic index (CSI), which is a measure of the sympathetic response of the autonomic nervous system.
Although the frequency domain features also measure this response, the frequency domain parameters did not achieve good classification results, obtaining the second-lowest specificity and the lowest sensitivity.

B.4 Conclusion

This section performed allergy classification on subsets of features in order to assess the importance of each feature category in allergy classification and to identify whether the results obtained in Chapter 5 could be improved. It was found that the time domain features account for the majority of the sensitivity and time gain obtained in Chapter 5. However, the sensitivity obtained with the time domain models alone did not account for the entirety of the results obtained in previous chapters: for optimal results, other features must be included to complement the classification process. However, correct classification of Subject 1 was obtained by the sequential domain features. This indicates that perfect sensitivity might be attainable. It is uncertain whether perfect specificity could also be obtained, as the sequential domain features achieved very poor specificity. However, as the most important criterion for the classification of allergy has consistently been stated as obtaining 100% specificity, it is believed that the overall results obtained using the full set of features cannot be improved upon, and that the full set of features should be employed for allergy classification.

References

Afonso, V., Tompkins, W., Nguyen, T., and Luo, S. (1999). ECG beat detection using filter banks. IEEE Transactions on Biomedical Engineering, 46(2):192–202.
Afonso, V., Tompkins, W., Nguyen, T., Trautmann, S., and Luo, S. (1995). Filter bank-based processing of the stress ECG. In Engineering in Medicine and Biology Society, pages 887–888.
Ahlbom, A., Backman, A., Bakke, J., Foucard, T., Halken, S., Kjellman, N., Malm, L., Skerfving, S., Sundell, J., and Zetterstrom, O. (1998). Pets Indoors–A Risk Factor For or Protection Against Sensitisation/Allergy. Indoor Air, 8(4):219–235.
Ahlstrom, M. and Tompkins, W. (1983). Automated high-speed analysis of Holter tapes with microcomputers. IEEE Transactions on Biomedical Engineering, 30(10):651–657.
Ainsworth, B., Haskell, W., Herrmann, S., Meckes, N., Bassett, D., Tudor-Locke, C., Greer, J., Vezina, J., Whitt-Glover, M., and Leon, A. (2011). 2011 Compendium of Physical Activities: A Second Update of Codes and MET Values. Medicine and Science in Sports and Exercise, 43(8):1575–1581.
Ainsworth, B., Haskell, W., Leon, A., Jacobs, D., Montoye, H., Sallis, J., and Paffenbarger, R. (1993). Compendium of physical activities: classification of energy costs of human physical activities. Medicine and Science in Sports and Exercise, 25(1):71–80.
Ainsworth, B., Haskell, W., Whitt, M., Irwin, M., Swartz, A., Strath, S., O’Brien, W., Bassett Jr, D., Schmitz, K., Emplaincourt, P., et al. (2000). Compendium of physical activities: an update of activity codes and MET intensities. Medicine and Science in Sports and Exercise, 32(9):498–504.
Aizerman, A., Braverman, E. M., and Rozoner, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25(1):821–837.
Almqvist, C., Egmar, A., Hedlin, G., Lundqvist, M., Nordvall, S., Pershagen, G., Svartengren, M., Hage-Hamsten, M., and Wickman, M. (2003). Direct and indirect exposure to pets–risk of sensitization and asthma at 4 years in a birth cohort. Clinical and Experimental Allergy, 33(9):1190–1197.
Aloise, D., Deshpande, A., Hansen, P., and Popat, P. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248.
Andersen, R., Borgs, C., Chayes, J., Hopcroft, J., Jain, K., Mirrokni, V., and Teng, S. (2008). Robust PageRank and locally computable spam detection features. In Adversarial information retrieval on the web, pages 69–76.
Annadhorai, A., Guenterberg, E., Barnes, J., Haraga, K., and Jafari, R. (2008). Human identification by gait analysis. In Workshop on Systems and Networking Support for Health Care and Assisted Living Environments, page 11.
Antelmi, I., De Paula, R. S., Shinzato, A. R., Peres, C. A., Mansur, A. J., and Grupi, C. J. (2004). Influence of age, gender, body mass index, and functional capacity on heart rate variability in a cohort of subjects without heart disease. The American Journal of Cardiology, 93(3):381–385.
Arshad, S., Kurukulaaratchy, R., Fenn, M., and Matthews, S. (2005). Early Life Risk Factors for Current Wheeze, Asthma, and Bronchial Hyperresponsiveness at 10 Years of Age. Official Journal of American College of Chest Physicians, 127(2):502–508.
Arzeno, N., Deng, Z., and Poon, C. (2008). Analysis of first-derivative based QRS detection algorithms. IEEE Transactions on Biomedical Engineering, 55:478–484.
Arzeno, N., Poon, C., and Deng, Z. (2006). Quantitative analysis of QRS detection algorithms based on the first derivative of the ECG. In Engineering in Medicine and Biology Society, pages 1788–1791.
Avery, N., King, R., Knight, S., and Hourihane, J. (2003). Assessment of quality of life in children with peanut allergy. Pediatric Allergy and Immunology, 125(6):1327–1335.
Aziz, W., Schlindwein, F. S., Wailoo, M., Biala, T., and Rocha, F. C. (2012). Heart rate variability analysis of normal and growth restricted children. Clinical Autonomic Research, 261(5):480–487.
Azpiri, A., Alonso, E., Gamboa, P., Jauregui, I., Antepara, I., Fernandez, E., Férnandez de Corres, L., Audicana, M., Munoz, D., and Escobar, A. (1999). Prevalence of pollinosis in the Basque Country. European Journal of Allergy and Clinical Immunology, 54(10):1100–1104.
Badilini, F. and Blanche, P. (1996). HRV Spectral Analysis by the Averaged Periodogram. Annals of Noninvasive Electrocardiology, 1(4):423–429.
Bahoura, M., Hassani, M., and Hubin, M. (1997). DSP implementation of wavelet transform for real time ECG wave forms detection and heart rate analysis. Computer Methods and Programs in Biomedicine, 52(1):35–44.
Bailón, R., Laguna, P., Mainardi, L., and Sornmo, L. (2007). Analysis of heart rate variability using time-varying frequency bands based on respiratory frequency. In Engineering in Medicine and Biology Society, pages 6674–6677.
Bailón, R., Mainardi, L., Orini, M., Sörnmo, L., and Laguna, P. (2010). Analysis of heart rate variability during exercise stress testing using respiratory information. Biomedical Signal Processing and Control, 5(4):299–310.
Baldzer, K., Dykes, F., Jones, S., Brogan, M., Carrigan, T., and Giddens, D. (1989). Heart rate variability analysis in full-term infants: spectral indices for study of neonatal cardiorespiratory control. Pediatric Research, 26(3):188–195.
Barold, S. S. (2003). Willem Einthoven and the birth of clinical electrocardiography a hundred years ago. Cardiac Electrophysiology Review, 7(1):99–104.
Benbasat, A. and Paradiso, J. (2002). An inertial measurement framework for gesture recognition and applications. Gesture and Sign Language in Human-Computer Interaction, 2298(1):9–20.
Benitez, D., Gaydecki, P., Zaidi, A., and Fitzpatrick, A. (2000). A new QRS detection algorithm based on the Hilbert transform. In Computers in Cardiology, pages 379–382.
Benitez, D., Gaydecki, P., Zaidi, A., and Fitzpatrick, A. (2001). The use of the Hilbert transform in ECG signal analysis. Computers in Biology and Medicine, 31(5):399–406.
Bergmann, R., Diepgen, T., Kuss, O., Bergmann, K., Kujat, J., Dudenhausen, J., and Wahn, U. (2002). Breastfeeding duration is a risk factor for atopic eczema. Clinical and Experimental Allergy, 32(2):205–209.
Bernardi, L., Wdowczyk-Szulc, J., Valenti, C., Castoldi, S., Passino, C., Spadacini, G., and Sleight, P. (2000). Effects of controlled breathing, mental activity and mental stress with or without verbalization on heart rate variability. Journal of the American College of Cardiology, 35(6):1462–1469.
Biala, T., Dodge, M., Schlindwein, F. S., and Wailoo, M. (2010). Heart rate variability using Poincaré plots in 10 year old healthy and intrauterine growth restricted children with reference to maternal smoking habits during pregnancy. In Computing in Cardiology, pages 971–974.
Bindslev-Jensen, C., Briggs, D., and Osterballe, M. (2002). Can we determine a threshold level for allergenic foods by statistical analysis of published data in the literature? Allergy, 57(8):741–746.
Bishop, C. et al. (2006). Pattern recognition and machine learning. Springer, New York.
Black, P., Udy, A., and Brodie, S. (2000). Sensitivity to fungal allergens is a risk factor for life-threatening asthma. European Journal of Allergy and Clinical Immunology, 55(5):501–504.
Boardman, M. A., Schlindwein, F. S., Thakor, N. V., Kimura, T., and Geocadin, R. (2002). Detection of asphyxia using heart rate variability. Medical and Biological Engineering and Computing, 40(6):618–624.
Bock, S., Muñoz-Furlong, A., and Sampson, H. (2001). Fatalities due to anaphylactic reactions to foods. Journal of Allergy and Clinical Immunology, 107(1):191–193.
Bock, S., Muñoz-Furlong, A., and Sampson, H. (2007). Further fatalities caused by anaphylactic reactions to food, 2001-2006. The Journal of Allergy and Clinical Immunology, 119(4):1016–1018.
Bock, S. and Sampson, H. (1994). Food allergy in infancy. Pediatric Clinics of North America, 41(5):1047.
Bolton, R. and Westphal, L. (1981a). Hilbert Transform Processing of ECG’s. In IREECON.
Bolton, R. and Westphal, L. (1981b). Preliminary results in display and abnormality recognition of Hilbert transformed ECGs. Medical and Biological Engineering and Computing, 19(3):377–384.
Bolton, R. and Westphal, L. (1984). On the use of the Hilbert Transform for ECG waveform processing. Computers in Cardiology, 19:533–536.
Bolton, R. and Westphal, L. (1985). ECG display and QRS detection using the Hilbert Transform. Computers in Cardiology, 31(1):399–406.
Bořil, H., Boyraz, P., and Hansen, J. (2012). Towards Multimodal Driver’s Stress Detection. Digital Signal Processing for In-Vehicle Systems and Safety, 132(1):3–19.
Bouten, C., Koekkoek, K., Verduin, M., Kodde, R., and Janssen, J. (1997a). A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity. IEEE Transactions on Biomedical Engineering, 44(3):136–147.
Bouten, C., Sauren, A., Verduin, M., and Janssen, J. (1997b). Effects of placement and orientation of body-fixed accelerometers on the assessment of energy expenditure during walking. Medical and Biological Engineering and Computing, 35(1):50–56.
Bouten, C., Westerterp, K., Verduin, M., and Janssen, J. (1994). Assessment of energy expenditure for physical activity using a triaxial accelerometer. Medicine and Science in Sports and Exercise, 23(1):21–27.
Bradley, T. D. and Floras, J. S. (2003). Sleep apnea and heart failure. Circulation, 107(12):1671–1678.
Braun-Fahrlander, C., Vuille, J., Sennhauser, F., Neu, U., Kunzle, T., Grize, L., Gassner, M., Minder, C., Schindler, C., Varonier, H., et al. (1997). Respiratory health and long-term exposure to air pollutants in Swiss schoolchildren. SCARPOL Team. Swiss Study on Childhood Allergy and Respiratory Symptoms with Respect to Air Pollution, Climate and Pollen. American Journal of Respiratory and Critical Care Medicine, 155(3):1042–1049.
Brennan, M., Palaniswami, M., and Kamen, P. (2002). Poincaré plot interpretation using a physiological model of HRV based on a network of oscillators. American Journal of Physiology-Heart and Circulatory Physiology, 283(5):1873–1886.
Bryld, L., Hindsberger, C., Kyvik, K., Agner, T., and Menne, T. (2003). Risk factors influencing the development of hand eczema in a population-based twin sample. British Journal of Dermatology, 149(6):1214–1220.
Burns, A., Greene, B. R., McGrath, M. J., O’Shea, T. J., Kuris, B., Ayer, S. M., Stroiescu, F., and Cionca, V. (2010). SHIMMER–a wireless sensor platform for noninvasive biomedical research. IEEE Journal of Sensors, 10(9):1527–1534.
Call, R., Smith, T., Morris, E., Chapman, M., and Platts-Mills, T. (1992). Risk factors for asthma in inner city children. The Journal of Pediatrics, 121(6):862–866.
Carney, R., Blumenthal, J., Stein, P., Watkins, L., Catellier, D., Berkman, L., and Freedland, K. (2001). Depression, heart rate variability, and acute myocardial infarction. Circulation, 104(17):2024–2028.
Castro, W., Schilgen, M., Meyer, S., Weber, M., Peuker, C., and Wörtler, K. (1997). Do “whiplash injuries” occur in low-speed rear impacts? European Spine Journal, 6(6):366.
Catal, C. and Diri, B. (2009). Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Information Sciences, 179(8):1040–1058.
Chang, K., Monahan, K., Griffin, M., Lake, D., and Moorman, J. (2001). Comparison and clinical application of frequency domain methods in analysis of neonatal heart rate time series. Annals of Biomedical Engineering, 29(9):764–774.
Chatagnon, M. and Busso, T. (2006). Modelling of aerobic and anaerobic energy production during exhaustive exercise on a cycle ergometer. European Journal of Applied Physiology, 97(6):755–760.
Chen, K. and Bassett Jr, D. (2005). The technology of accelerometry-based activity monitors: current and future. Medicine and Science in Sports and Exercise, 37(11):490.
Chen, K. and Sun, M. (1997). Improving energy expenditure estimation by using a triaxial accelerometer. Journal of Applied Physiology, 83(6):2112–2122.
Chen, S., Chen, H., and Chan, H. (2006). A real-time QRS detection method based on moving-averaging incorporating with wavelet denoising. Computer Methods and Programs in Biomedicine, 82(3):187–195.
Cherkassky, V. and Mulier, F. (2007). Learning from data: concepts, theory, and methods. Wiley-IEEE Press.
Chon, K. H., Dash, S., and Ju, K. (2009). Estimation of respiratory rate from photoplethysmogram data using time–frequency spectral estimation. IEEE Transactions on Biomedical Engineering, 56(8):2054–2063.
Churchill, W. (2008). The Magnificent Century of Cardiothoracic Surgery. History of Medicine, 4(3):187–191.
Clarke, J., Shelton, J., Venning, G., Hamer, J., and Taylor, S. (1976). The rhythm of the normal human heart. The Lancet, 308(7984):508–512.
Clifford, G. and Tarassenko, L. (2005). Quantifying errors in spectral estimates of HRV due to beat replacement and resampling. IEEE Transactions on Biomedical Engineering, 52(4):630–638.
Clifford, G. D., Azuaje, F., McSharry, P., et al. (2006). Advanced methods and tools for ECG data analysis. Artech House, London.
Cogdell, J. W. and Piatetski-Shapiro, I. (1990). The arithmetic and spectral analysis of Poincaré series. Academic Press, Boston, MA.
Cole, C. R., Foody, J., Blackstone, E. H., Lauer, M. S., et al. (2000). Heart rate recovery after submaximal exercise testing as a predictor of mortality in a cardiovascularly healthy cohort. Annals of Internal Medicine, 132(7):552–555.
Cooley, J. W. and Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297–301.
Cosmet (2013). CPET Homepage. [Online; accessed March-2013]. http://www.cosmed.it.
Cox, L., Williams, B., Sicherer, S., Oppenheimer, J., Sher, L., Hamilton, R., and Golden, D. (2008). Pearls and pitfalls of allergy diagnostic testing: report from the American College of Allergy, Asthma and Immunology/American Academy of Allergy, Asthma and Immunology Specific IgE Test Task Force. Annals of Allergy, Asthma and Immunology, 101(6):580–592.
Craven, D., Glavin, M., Kilmartin, L., and Jones, E. (2012). Potential for Extended Battery Life in Mobile Healthcare with Bluetooth Low Energy and Signal Compression. Irish Signals and Systems Conference, 42:151–165.
Crouter, S. E., Clowers, K. G., and Bassett, D. R. (2006). A novel method for using accelerometer data to predict energy expenditure. Journal of Applied Physiology, 100(4):1324–1331.
Culhane, K., O’Connor, M., Lyons, D., and GM., L. (2005). Accelerometers in rehabilitation medicine for older adults. Age and Ageing, 34(6):556–560.
de Carvalho, J., da Rocha, A., de Oliveira Nascimento, F., Neto, J., and Junqueira Jr, L. (2002). Development of a Matlab software for analysis of heart rate variability. In Signal Processing, pages 1488–1491.
De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278.
de Oliveira, F. and Cortez, P. (2004). A QRS detection based on Hilbert transform and wavelet bases. In Machine Learning for Signal Processing, 2004, pages 481–489.
Dekker, J. M., Crow, R. S., Folsom, A. R., Hannan, P. J., Liao, D., Swenne, C. A., and Schouten, E. G. (2000). Low heart rate variability in a 2-minute rhythm strip predicts risk of coronary heart disease and mortality from several causes: the ARIC Study. Circulation, 102(11):1239–1244.
Dekker, J. M., Schouten, E. G., Klootwijk, P., Pool, J., Swenne, C. A., and Kromhout, D. (1997). Heart Rate Variability from Short Electrocardiographic Recordings Predicts Mortality from All Causes in Middle-aged and Elderly Men: The Zutphen Study. American Journal of Epidemiology, 145(10):899–908.
Deza, M. and Deza, E. (2009). Encyclopedia of distances. Springer.
Di Virgilio, V., Francaiancia, C., Lino, S., and Cerutti, S. (1995). ECG fiducial points detection through wavelet transform. In Engineering in Medicine and Biology Society, pages 1051–1052.
Dinh, H., Kumar, D., Pah, N., and Burton, P. (2001). Wavelets for QRS detection. In Engineering in Medicine and Biology Society, pages 1883–1887.
Dobbs, S., Schmitt, N., and Ozemek, H. (1984). QRS detection by template matching using real-time correlation on a microcomputer. Journal of Clinical Engineering, 9(3):197–212.
Dorland, W. A. N. (1901). The American illustrated medical dictionary: a new and completed dictionary of the terms used in medicine, surgery, dentistry, pharmacy, chemistry, and the kindred branches with their pronunciation, derivation, and definition. Saunders.
Doyle, O., Temko, A., Marnane, W., Lightbody, G., and Boylan, G. (2010). Heart rate based automatic seizure detection in the newborn. Medical Engineering and Physics, 32(8):829–839.
Dublin Institute of Technology (2013). Healthy, annotated ECG trace. [Online; accessed March-2013]. http://eleceng.dit.ie.
Duda, R., Hart, P., and Stork, D. (1995). Pattern Classification and Scene Analysis 2nd ed. Springer, New York.
DunnGalvin, A., Cullinane, C., Daly, D., Flokstra-de Blok, B., Dubois, A., and Hourihane, J. (2010). Longitudinal validity and responsiveness of the Food Allergy Quality of Life Questionnaire–Parent Form in children 0–12 years following positive and negative food challenges. Clinical and Experimental Allergy, 40(3):476–485.
Ebden, M. (2002). A Comparison of HRV Techniques: The Lomb Periodogram versus The Smoothed Pseudo Wigner-Ville Distribution. A report submitted to Prof. Lionel Tarassenko, 23(1):325–364.
Elliot, S. (2010). A Strategy When Times Are Tough. New York Times.
Falkner, B., Onesti, G., Angelakos, E., Fernandes, M., and Langman, C. (1979). Cardiovascular response to mental stress in normal adolescents with hypertensive parents. Hemodynamics and mental stress in adolescents. Hypertension, 1(1):23–30.
Faundez-Zanuy, M. and Monte-Moreno, E. (2005). State-of-the-art in speaker recognition. IEEE Aerospace and Electronics Systems Magazine, 20(5):7–12.
Ferrannini, E. (1988). The theoretical bases of indirect calorimetry: a review. Metabolism, 37(3):287–301.
Figueiredo, M. and Jain, A. (2000). Unsupervised selection and estimation of finite mixture models. In Pattern Recognition, pages 87–90.
Fine, S., Navratil, J., and Gopinath, R. A. (2001). A hybrid GMM/SVM approach to speaker identification. In Acoustics, Speech, and Signal Processing, pages 417–420.
Flach, P. (2012). Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press.
Flannery, B. P., Press, W. H., Teukolsky, S. A., and Vetterling, W. (1992). Numerical recipes in C. Press Syndicate of the University of Cambridge, New York.
Fraden, J. and Neuman, M. (1980). QRS wave detection. Medical and Biological Engineering and Computing, 18(2):125–132.
Freedson, P. S., Melanson, E., Sirard, J., et al. (1998). Calibration of the Computer Science and Applications, Inc. accelerometer. Medicine and Science in Sports and Exercise, 30(5):777–781.
Friesen, G., Jannett, T., Jadallah, M., Yates, S., Quint, S., and Nagle, H. (1990). A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Transactions on Biomedical Engineering, 37(1):85–98.
Frigo, M. and Johnson, S. (1998). FFTW: An adaptive software architecture for the FFT. In Acoustics, Speech and Signal Processing, pages 1381–1384.
Fuchs, R. M., Achuff, S., Grunwald, L., Yin, F., and Griffith, L. (1982). Electrocardiographic localization of coronary artery narrowings: studies during myocardial ischemia and infarction in patients with one-vessel disease. Circulation, 66(6):1168–1176.
Furlow, B. (2009). Contrast-enhanced ultrasound. Radiologic Technology, 80(6):547–561.
Galvin, G. J., Davis, T. J., and MacDonald, N. C. (2000). Micromechanical accelerometer for automotive applications. US Patent 6,149,190.
Ganong, W. F. and Ganong, W. (2005). Review of medical physiology. McGraw-Hill Medical, New York.
Gardner, A., Krieger, A., Vachtsevanos, G., and Litt, B. (2006). One-class novelty detection for seizure analysis from intracranial EEG. The Journal of Machine Learning Research, 7(1):1025–1044.
Gardner, A. B. (2004). A novelty detection approach to seizure analysis from intracranial EEG. PhD thesis, Georgia Institute of Technology.
Giddens, D. and Kitney, R. (1985). Neonatal heart rate variability and its relation to respiration. Journal of Theoretical Biology, 113(4):759–780.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P., Mark, R., Mietus, J., Moody, G., Peng, C., and Stanley, H. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):215–220.
Gombarska, D. and Horicka, M. (2012). Evaluation of heart rate variability in time and Frequency domain. In ELEKTRO, pages 415–418.
Google Inc. (2013). Google Scholar. [Online; accessed March-2013]. https://scholar.google.com.
Gutiérrez, R., Twomey, N., Marnane, W. P., and Campos-Garcia, J. (2013). Real-Time Allergy Detection. In IEEE Society of Intelligent Signal Processing.
Gyaw, T. and Ray, S. (1994). The wavelet transform as a tool for recognition of biosignals. Biomedical Sciences Instrumentation, 30:63–68.
Hamer, M. and Steptoe, A. (2007). Association between physical fitness, parasympathetic control, and proinflammatory responses to mental stress. Psychosomatic Medicine, 69(7):660–666.
Hamilton, P. (2002). Open source ECG analysis. Computers in Cardiology, 10:101–104.
Hamilton, P. and Tompkins, W. (1988). Adaptive matched filtering for QRS detection. In Engineering in Medicine and Biology Society, pages 147–148.
Hartigan, J. and Wong, M. (1979). Algorithm AS 136: A k-means clustering algorithm. Applied Statistics, 28(1):100–108.
Hayton, P., Scholkopf, B., Tarassenko, L., and Anuzis, P. (2001). Support vector novelty detection applied to jet engine vibration spectra. Advances in Neural Information Processing Systems, 13(1):946–952.
Healey, J. A. and Picard, R. W. (2005). Detecting stress during real-world driving tasks using physiological sensors. IEEE Transactions on Intelligent Transportation Systems, 6(2):156–166.
Hedman, A., Hartikainen, J., Tahvanainen, K., and Hakumäki, M. (2008). The high frequency component of heart rate variability reflects cardiac parasympathetic modulation rather than parasympathetic ‘tone’. Acta Physiologica Scandinavica, 155(3):267–273.
Hemokinetics, I. (1993). Tritrac-R3D Research Ergometer Operations. Journal of Applied Physiology, 7:149–159.
Hendelman, D., Miller, K., Baggett, C., Debold, E., and Freedson, P. (2000). Validity of accelerometry for the assessment of moderate intensity physical activity in the field. Medicine and Science in Sports and Exercise, 32(9):442–449.
Hernando, D., Bailón, R., Laguna, P., and Sornmo, L. (2011). Heart rate variability during hemodialysis and its relation to hypotension. In Computing in Cardiology, pages 189–192.
Hester, T., Hughes, R., Sherrill, D. M., Knorr, B., Akay, M., Stein, J., and Bonato, P. (2006). Using wearable sensors to measure motor abilities following stroke. In Wearable and Implantable Body Sensor Networks, pages 4–7.
Hilbert, D. (1912). Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. Berlin.
Horgan, D. and Murphy, C. C. (2010). Voting rule optimisation for double threshold energy detector-based cognitive radio networks. In Signal Processing and Communication Systems, pages 1–8.
Huikuri, H. V., Castellanos, A., and Myerburg, R. J. (2001).
Sudden death due to cardiac arrhythmias. New England Journal of Medicine, 345(20):1473–1482. 227 Niall Twomey Chapter B: Jan, K., Nagel, J., Hurwitz, B., and Schneiderman, N. (Oct-3 Nov). Decomposition of heart rate variability by adaptive filtering for estimation of cardiac vagal tone. In Engineering in Medicine and Biology Society, pages 660 – 661. Järvinen, K. M., Amalanayagam, S., Shreffler, W. G., Noone, S., Sicherer, S. H., Sampson, H. A., and Nowak-Wegrzyn, A. (2009). Epinephrine treatment is infrequent and biphasic reactions are rare in food-induced reactions during oral food challenges in children. Journal of Allergy and Clinical Immunology, 124(6):1267–1272. Jeppesen, J., Beniczky, S., Fuglsang-Frederiksen, A., Sidenius, P., and Jasemian, Y. (2010). Detection of epileptic-seizures by means of power spectrum analysis of heart rate variability: A pilot study. Technology and Health Care, 18(6):417–426. Jones, A. M. and Doust, J. H. (1996). A 1% treadmill grade most accurately reflects the energetic cost of outdoor running. Journal of Sports Sciences, 14(4):321–327. Jovanov, E., Frith, K., Anderson, F., Milosevic, M., and Shrove, M. T. (2011). Real-time monitoring of occupational stress of nurses. In Engineering in Medicine and Biology Society, pages 3640–3643. Kale, A., Cuntoor, N., Yegnanarayana, B., Rajagopalan, A., and Chellappa, R. (2003). Gait analysis for human identification. In Audio-and Video-Based Biometric Person Authentication, pages 706–714. Kamath, M. V., Fallen, E., et al. (1993). Power spectral analysis of heart rate variability: a noninvasive signature of cardiac autonomic function. Critical reviews in Biomedical Engineering, 21(3):245–311. Katznelson, Y. (2004). An introduction to harmonic analysis. Cambridge University Press. Kemp, A. H., Quintana, D. S., Gray, M. A., Felmingham, K. L., Brown, K., and Gatt, J. M. (2010). Impact of depression and antidepressant treatment on heart rate variability: a review and meta-analysis. 
Biological Psychiatry, 67(11):1067–1074. 228 Niall Twomey Section REFERENCES Kilpelinen, M., Terho, E., Helenius, H., and Koskenvuo, M. (2000). Farm environment in childhood prevents the development of allergies. Clinical and Experimental allergy, 30(2):201–208. Kleiger, R., Stein, P., and Bigger Jr, J. (2005). Heart rate variability: measurement and clinical utility. Annals of Noninvasive Electrocardiology, 10(1):88–101. Kliegman, R. et al. (2007). Nelson textbook of pediatrics. Saunders Elsevier Philadelphia. Kohavi, R. et al. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Artificial Intelligence, pages 1137–1145. Kohler, B., Hennig, C., and Orglmeister, R. (2002). The principles of software QRS detection. IEEE Engineering in Medicine and Biology Magazine, 21(1):42–57. Kononenko, I. (2001). Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine, 23(1):89–109. Kostis, J., Moreyra, A., Amendo, M., Di Pietro, J., Cosgrove, N., and Kuo, P. (1982). The effect of age on heart rate in subjects free of heart disease. Studies by ambulatory electrocardiography and maximal exercise stress test. Circulation, 65(1):141–145. Kwapisz, J. R., Weiss, G. M., and Moore, S. A. (2010). Cell phone-based biometric identification. In Biometrics: Theory Applications and Systems, pages 1–7. Kyrkos, A., Giakoumakis, E., and Carayannis, G. (1987). Time recursive prediction techniques on QRS detection problem. In Engineering in Medicine and Biology Society, pages 13–16. Laguna, P., Moody, G., and Mark, R. (1998). Power spectral density of unevenly sampled data by least-square analysis: performance and application to heart rate signals. IEEE Transactions on Biomedical Engineering, 45(6):698–715. Lanningham-Foster, L., Foster, R. C., McCrady, S. K., Jensen, T. B., Mitre, N., and Levine, J. A. (2009). Activity promoting games and increased energy expenditure. 
The Journal of Pediatrics, 154(6):819–823. 229 Niall Twomey Chapter B: Leon-Garcia, A. and Leon-Garcia, A. (2009). Probability, statistics, and random processes for electrical engineering. Pearson/Prentice Hall. Lewis, H. and Papadimitriou, C. (1997). Elements of the Theory of Computation. Prentice Hall PTR. Licht, C. M., de Geus, E. J., van Dyck, R., and Penninx, B. W. (2009). Association between anxiety disorders and heart rate variability in The Netherlands Study of Depression and Anxiety (NESDA). Psychosomatic Medicine, 65(12):508–518. Licht, C. M., de Geus, E. J., Zitman, F. G., Hoogendijk, W. J., van Dyck, R., and Penninx, B. W. (2008). Association between major depressive disorder and heart rate variability in the Netherlands Study of Depression and Anxiety (NESDA). Archives of General Psychiatry, 65(12):508–518. Lindh, W., Pooler, M., Tamparo, C., and Dahl, B. M. (2009). Delmar’s comprehensive medical assisting: administrative and clinical competencies. Cengage Learning. Lobstein, T., Baur, L., and Uauy, R. (2004). Obesity in children and young people: a crisis in public health. Obesity, 5(1):4–85. Lomb, N. (1976). Least-squares frequency analysis of unequally spaced data. Astrophysics and space science, 39(2):447–462. Lombardi, F., Mäkikallio, T. H., Myerburg, R. J., and Huikuri, H. V. (2001). Sudden cardiac death: role of heart rate variability to identify patients at risk. Cardiovascular Research, 50(2):210–217. Mannini, A. and Sabatini, A. M. (2010). Machine learning methods for classifying human physical activity from on-body accelerometers. Sensors, 10(2):1154–1175. Mark, R., Schluter, P., Moody, G., Devlin, P., and Chernoff, D. (1982). An annotated ECG database for evaluating arrhythmia detectors. IEEE Transactions on Biomedical Engineering, 42:205–210. Markov, K. and Nakamura, S. (2008). Improved novelty detection for online GMM based speaker diarization. In Interspeech, pages 363–366. 
Martegani, A., Meairs, S., Nolsøe, C., Piscaglia, F., Ricci, P., Seidel, G., Skjoldbye, B., Solbiati, L., Thorelius, L., Tranquart, F., et al. (2008). Guidelines and Good Clinical Practice Recommendations for Contrast Enhanced Ultrasound (CEUS)–Update 2008. Ultraschall in der Medizin, 39(2):187–210.
Matasar, M. and Neugut, A. (2003). Epidemiology of anaphylaxis in the United States. Current Allergy and Asthma Reports, 3(1):30–35.
McSharry, P. and Clifford, G. (2004). Open-source software for generating electrocardiogram signals. Medical Engineering and Physics, 1:2–10.
Mietus, J., Peng, C., Ivanov, P., and Goldberger, A. (2000). Detection of obstructive sleep apnea from cardiac interbeat interval time series. In Computers in Cardiology, pages 753–756.
Miles, S., Fordham, R., Mills, C., Valovirta, E., and Mugford, M. (2005). A framework for measuring costs to society of IgE-mediated food allergy. European Journal of Allergy and Clinical Immunology, 60(8):996–1003.
Minnen, D., Starner, T., Ward, J., Lukowicz, P., and Troster, G. (2005). Recognizing and discovering human actions from on-body sensor data. In Multimedia, pages 1545–1548.
Monda, M., Viggiano, A., Vicidomini, C., Viggiano, A., Iannaccone, T., Tafuri, D., and De Luca, B. (2009). Expresso coffee increases parasympathetic activity in young, healthy people. Nutritional Neuroscience, 12(1):43–48.
Moody, G. (1993). Spectral analysis of heart rate without resampling. In Computers in Cardiology, pages 715–718.
Moody, G. and Mark, R. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50.
Mourot, L., Bouhaddi, M., Perrey, S., Rouillon, J.-D., and Regnard, J. (2004). Quantitative Poincare plot analysis of heart rate variability: effect of endurance training. European Journal of Applied Physiology, 91(1):79–87.
Nolan, J., Batin, P. D., Andrews, R., Lindsay, S. J., Brooksby, P., Mullen, M., Baig, W., Flapan, A. D., Cowley, A., Prescott, R. J., et al. (1998). Prospective study of heart rate variability and mortality in chronic heart failure: results of the United Kingdom heart failure evaluation and assessment of risk trial (UK-heart). Circulation, 98(15):1510–1516.
Nunan, D., Sandercock, G., and Brodie, D. (2010). A Quantitative Systematic Review of Normal Values for Short-Term Heart Rate Variability in Healthy Adults. Pacing and Clinical Electrophysiology, 33(11):1407–1417.
Nygårds, M. and Sörnmo, L. (1983). Delineation of the QRS complex using the envelope of the ECG. Medical and Biological Engineering and Computing, 21(5):538–547.
O’Brien, I., O’Hare, P., and Corrall, R. (1986). Heart rate variability in healthy subjects: effect of age and the derivation of normal ranges for tests of autonomic function. British Heart Journal, 55(4):348–354.
Obrist, P. A., Gaebelein, C. J., Teller, E. S., Langer, A. W., Grignolo, A., Light, K. C., and McCubbin, J. A. (2007). The relationship among heart rate, carotid dP/dt, and blood pressure in humans as a function of the type of stress. Psychophysiology, 15(2):102–115.
Oh, C., Sohn, H., and Bae, I. (2009). Statistical novelty detection within the Yeongjong suspension bridge under environmental and operational variations. Smart Materials and Structures, 18(12):125–132.
Okada, M. (1979). A Digital Filter for the QRS Complex Detection. IEEE Transactions on Biomedical Engineering, 42(12):700–703.
Olmos, S., Millán, M., Garcia, J., and Laguna, P. (1996). ECG data compression with the Karhunen-Loeve transform. In Computers in Cardiology, pages 253–256.
Omenaas, E., Bakke, P., Elsayed, S., Hanoa, R., and Gulsvik, A. (1994). Total and specific serum IgE levels in adults: relationship to sex, age and environmental factors. Clinical and Experimental Allergy, 24(6):530–539.
Oude Elberink, J., de Monchy, J., van der Heide, S., Guyatt, G., and Dubois, A. (2002). Venom immunotherapy improves health-related quality of life in patients allergic to yellow jacket venom. Journal of Allergy and Clinical Immunology, 110(1):174–182.
Oude Luttikhuis, H., Baur, L., Jansen, H., Shrewsbury, V. A., O’Malley, C., Stolk, R. P., and Summerbell, C. D. (2009). Interventions for treating obesity in children. The Cochrane Database of Systematic Reviews, 1(1):1.
Parvin, B., Yang, Q., Fontenay, G., and Barcellos-Hoff, M. (2002). BioSig: an imaging bioinformatic system for studying phenomics. Computer, 35(7):65–71.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572.
Pereira, B., Venter, C., Grundy, J., Clayton, C., Arshad, S., and Dean, T. (2005). Prevalence of sensitization to food allergens, reported adverse reaction to foods, food avoidance, and food hypersensitivity among teenagers. Journal of Allergy and Clinical Immunology, 116(4):884–892.
Phua, C., Alahakoon, D., and Lee, V. (2004). Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explorations Newsletter, 6(1):50–59.
Piskorski, J. and Guzik, P. (2005). Filtering Poincaré plots. Computational Methods in Science and Technology, 91(2):201–208.
Pober, D. M., Staudenmayer, J., Raphael, C., Freedson, P. S., et al. (2006). Development of novel techniques to classify physical activity mode using accelerometers. Medicine and Science in Sports and Exercise, 38(9):16–26.
Primeau, M., Kagan, R., Joseph, L., Lim, H., Dufresne, C., Duffy, C., Prhcal, D., and Clarke, A. (2000). The psychological burden of peanut allergy as perceived by adults with peanut allergy and the parents of peanut-allergic children. Clinical and Experimental Allergy, 30(8):1135–1143.
Radon, K., Ehrenstein, V., Praml, G., and Nowak, D. (2004). Childhood visits to animal buildings and atopic diseases in adulthood: An age-dependent relationship. American Journal of Industrial Medicine, 46(4):349–356.
Rajendra Acharya, U., Paul Joseph, K., Kannathal, N., Lim, C., and Suri, J. (2006). Heart rate variability: a review. Medical and Biological Engineering and Computing, 44(12):1031–1051.
Rajendra Acharya, U., Subbanna Bhat, P., Iyengar, S., Rao, A., and Dua, S. (2003). Classification of heart rate data using artificial neural network and fuzzy equivalence relation. Pattern Recognition, 36(1):61–68.
Rautava, S., Kalliomaki, M., and Isolauri, E. (2002). Probiotics during pregnancy and breast-feeding might confer immunomodulatory protection against atopic disease in the infant. Journal of Allergy and Clinical Immunology, 109(1):119–121.
Ravi, N., Dandekar, N., Mysore, P., and Littman, M. L. (2005). Activity recognition from accelerometer data. In Proceedings of the National Conference on Artificial Intelligence, pages 1541–1546.
Rawenwaaij-Arts, C., Kallee, L., Hopman, J., et al. (1993). Heart rate variability. Standards of measurement, physiological interpretation, and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. European Heart Journal, 52(17):1353–1365.
Reynolds, D. and Rose, R. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1):72–83.
Riese, H., Van Doornen, L. J., Houtman, I. L., and De Geus, E. J. (2004). Job strain in relation to ambulatory blood pressure, heart rate, and heart rate variability among female nurses. Scandinavian Journal of Work, Environment and Health, 61(3):387–396.
Roberts, G., Patel, N., Levi-Schaffer, F., Habibi, P., and Lack, G. (2003). Food allergy as a risk factor for life-threatening asthma in childhood: a case-controlled study. Journal of Allergy and Clinical Immunology, 112(1):168–174.
Roberts, S. (2000). Extreme value statistics for novelty detection in biomedical data processing. IEE Proceedings Science, Measurement and Technology, 147(6):363–367.
Roberts, S. J. (1999). Novelty detection using extreme value statistics. IEE Proceedings Vision, Image and Signal Processing, 146(3):124–129.
Robinson, B. F., Epstein, S. E., Beiser, G. D., and Braunwald, E. (1966). Control of Heart Rate by the Autonomic Nervous System: Studies in Man on the Interrelation Between Baroreceptor Mechanisms and Exercise. Circulation, 19(2):400–411.
Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223.
Sabiston Jr, D. C. (1981). Heart disease: a textbook of cardiovascular medicine. Annals of Surgery, 194(1):116.
Sampson, H. (1999). Food allergy. Part 2: diagnosis and management. Journal of Allergy and Clinical Immunology, 103(6):981–989.
Sampson, H., Mendelson, L., and Rosen, J. (1992). Fatal and near-fatal anaphylactic reactions to food in children and adolescents. The New England Journal of Medicine, 327(6):380–384.
Sampson, H. A., Muñoz-Furlong, A., Campbell, R. L., Adkinson Jr, N. F., Allan Bock, S., Branum, A., Brown, S. G., Camargo Jr, C. A., Cydulka, R., Galli, S. J., et al. (2006). Second symposium on the definition and management of anaphylaxis: summary report. Second National Institute of Allergy and Infectious Disease/Food Allergy and Anaphylaxis Network symposium. Annals of Emergency Medicine, 117(2):391–397.
Sandercock, G., Bromley, P. D., Brodie, D. A., et al. (2005). Effects of exercise on heart rate variability: inferences from meta-analysis. Medicine and Science in Sports and Exercise, 37(3):433–439.
Saramäki, T. and Bregovic, R. (2002). Multirate systems and filter banks. Multirate Systems: Design and Applications, 2:27–85.
Schechtman, V., Raetz, S., Harper, R., Garfinkel, A., Wilson, A., Southall, D., and Harper, R. (1992). Dynamic analysis of cardiac RR intervals in normal infants and in infants who subsequently succumbed to the sudden infant death syndrome. Pediatric Research, 31(6):606–612.
Schlindwein, F. S., Yi, A., Edwards, T., and Bien, I. (2006). Optimal frequency and bandwidth for FIR bandpass filter for QRS detection. In Advances in Medical, Signal and Information Processing, pages 1–4.
Schoeller, D. A. et al. (1988). Measurement of energy expenditure in free-living humans by using doubly labeled water. The Journal of Nutrition, 118(11):1278–1289.
Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., and Williamson, R. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., and Platt, J. (2000). Support vector method for novelty detection. Advances in Neural Information Processing Systems, 42(1):582–588.
Seccareccia, F., Pannozzo, F., Dima, F., Minoprio, A., Menditto, A., Lo Noce, C., and Giampaoli, S. (2001). Heart rate as a predictor of mortality: the MATISS project. American Journal of Public Health, 91(8):1258–1263.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, 53(3):683–690.
SHIMMER research (2010). SHIMMER research. [Online; accessed March 2013] — http://www.shimmer-research.com.
Sicherer, S., Noone, S., and Muñoz-Furlong, A. (2001). The impact of childhood food allergy on quality of life. Annals of Allergy, Asthma and Immunology, 87(6):461–464.
Sicherer, S. and Sampson, H. (2006). 9. Food allergy. Journal of Allergy and Clinical Immunology, 117(2):470–475.
Sicherer, S., Sampson, H., et al. (2006). 9. Food allergy. The Journal of Allergy and Clinical Immunology, 117(2):470–475.
Soman, A., Vaidyanathan, P., and Nguyen, T. (1993). Linear phase paraunitary filter banks: Theory, factorizations and designs. IEEE Transactions on Signal Processing, 41(12):3480–3496.
Sörnmo, L. and Laguna, P. (2005). Bioelectrical signal processing in cardiac and neurological applications. Academic Press.
Sörnmo, L. and Laguna, P. (2006). Electrocardiogram (ECG) signal processing. Wiley Encyclopedia of Biomedical Engineering.
Srikanth, T., Napper, S., and Gu, H. (1998). Assessment of resampling methodologies of electrocardiogram signals for feature extraction, statistical and neural networks applications. In Computers in Cardiology, pages 537–540.
Staudenmayer, J., Pober, D., Crouter, S., Bassett, D., and Freedson, P. (2009). An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer. Journal of Applied Physiology, 107(4):1300–1307.
Stein, P., Kleiger, R., et al. (1999). Insights from the study of heart rate variability. Annual Review of Medicine, 50(1):249–261.
Strang, G. and Nguyen, T. (1996). Wavelets and filter banks. Cambridge University Press.
Swartz, A. M., Strath, S. J., Bassett Jr, D. R., O’Brien, W. L., King, G. A., Ainsworth, B. E., et al. (2000). Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Medicine and Science in Sports and Exercise, 32(9):450–456.
Tanaka, H., Monahan, K., and Seals, D. (2001). Age-predicted maximal heart rate revisited. Journal of the American College of Cardiology, 37(1):153–156.
Temko, A., Boylan, G., Marnane, W., and Lightbody, G. (2010). Speech recognition features for EEG signal description in detection of neonatal seizures. In Engineering in Medicine and Biology Society, pages 3281–3284.
Temko, A., Nadeu, C., Marnane, W., Boylan, G. B., and Lightbody, G. (2011a). EEG signal description with spectral-envelope-based speech recognition features for detection of neonatal seizures. IEEE Transactions on Information Technology in Biomedicine, 15(6):839–847.
Temko, A., Thomas, E., Boylan, G., Marnane, W., and Lightbody, G. (2009). An SVM-based system and its performance for detection of seizures in neonates. In Engineering in Medicine and Biology Society, pages 2643–2646.
Temko, A., Thomas, E., Marnane, W., Lightbody, G., and Boylan, G. (2011b). Performance assessment for EEG-based neonatal seizure detectors. Clinical Neurophysiology, 122(3):474–482.
SHIMMER-research (2010). CE Certification. [Online; accessed March 2013] — http://www.shimmer-research.com.
Thakor, N. and Zhu, Y. (1991). Applications of adaptive filtering to ECG analysis: noise cancellation and arrhythmia detection. IEEE Transactions on Biomedical Engineering, 38(8):785–794.
theonlineallergist.com (2013). Epinephrine autoinjector. [Online; accessed 13-March-2012] — http://www.theonlineallergist.com.
Thomas, E. (2010). A machine learning framework for neonatal seizure detection. PhD thesis, University College Cork.
Thomas, E., Temko, A., Lightbody, G., Marnane, W., and Boylan, G. (2009). A Gaussian mixture model based statistical classification system for neonatal seizure detection. In Machine Learning for Signal Processing, pages 1–6.
Thomas, E., Temko, A., Marnane, W., Boylan, G., and Lightbody, G. (2013). Discriminative and generative classification techniques applied to automated neonatal seizure detection. IEEE Journal of Biomedical and Health Informatics, 31(7):1047.
Tsuji, H., Larson, M. G., Venditti, F. J., Manders, E. S., Evans, J. C., Feldman, C. L., and Levy, D. (1996). Impact of reduced heart rate variability on risk for cardiac events: the Framingham Heart Study. Circulation, 94(11):2850–2855.
Tulppo, M. and Huikuri, H. (2004). Origin and significance of heart rate variability. Journal of the American College of Cardiology, 43(12):2278–2280.
Tulppo, M. P., Makikallio, T., Takala, T., Seppanen, T., and Huikuri, H. (1996). Quantitative beat-to-beat analysis of heart rate dynamics during exercise. American Journal of Physiology-Heart and Circulatory Physiology, 271(1):244–252.
Twomey, N., Faul, S., Daly, D., Hourihane, J., and Marnane, W. (2010a). Classification of biophysical changes during food allergy challenges. In Applied Sciences in Biomedical and Communication Technologies, pages 1–5.
Twomey, N., Faul, S., and Marnane, W. P. (2010b). Comparison of accelerometer-based energy expenditure estimation algorithms. In Pervasive Computing Technologies for Healthcare, pages 1–8.
Twomey, N., Temko, A., Cullinane, C., Daly, D., Marnane, W. P., and Hourihane, J. O. (2013a). Detection of heart rate variation could improve patient safety and diagnostic yield during oral food challenge. European Academy of Allergology and Clinical Immunology.
Twomey, N., Temko, A., Hourihane, J., and Marnane, W. (2011). Allergy detection with statistical modelling of HRV-based non-reaction baseline features. In Applied Sciences in Biomedical and Communication Technologies, pages 134–138.
Twomey, N., Temko, A., Hourihane, J. O., and Marnane, W. P. (2013b). Fully automated allergy detection from paediatric ECG. IEEE Transactions on Information Technology in Biomedicine.
Twomey, N., Walsh, N., Doyle, O., McGinley, B., Glavin, M., Jones, E., and Marnane, W. (2010c). The effect of lossy ECG compression on QRS and HRV feature extraction. In Engineering in Medicine and Biology Society, pages 634–637.
University of Nottingham (2013). Einthoven ECG configuration. [Online; accessed March-2013] — http://www.nottingham.ac.uk.
Uswatte, G., Foo, W. L., Olmstead, H., Lopez, K., Holand, A., Simms, L. B., et al. (2005). Ambulatory monitoring of arm movement using accelerometry: an objective measure of upper-extremity rehabilitation in persons with chronic stroke. Archives of Physical Medicine and Rehabilitation, 86(7):1498–1501.
Uswatte, G., Giuliani, C., Winstein, C., Zeringue, A., Hobbs, L., and Wolf, S. L. (2006). Validity of accelerometry for monitoring real-world arm activity in patients with subacute stroke: evidence from the extremity constraint-induced therapy evaluation trial. Archives of Physical Medicine and Rehabilitation, 87(10):1340–1345.
van Ravenswaaij-Arts, C., Kollee, L., Hopman, J., Stoelinga, G., and van Geijn, H. (1993). Heart rate variability. Annals of Internal Medicine, 81(6):1803–1810.
Vapnik, V. N. and Kotz, S. (1982). Estimation of dependences based on empirical data. Springer-Verlag New York, 89(12):5675–5679.
Viinanen, A., Munhbayarlah, S., Zevgee, T., Narantsetseg, L., Naidansuren, T., Koskenvuo, M., Helenius, H., and Terho, E. (2007). The protective effect of rural living against atopy in Mongolia. European Journal of Allergy and Clinical Immunology, 62(3):272–280.
Vijaya, G., Kumar, V., and Verma, H. (1998). ANN-based QRS-complex analysis of ECG. Journal of Medical Engineering and Technology, 22(4):160–167.
Wang, Y. and Lobstein, T. (2006). Worldwide trends in childhood overweight and obesity. International Journal of Pediatric Obesity, 1(1):11–25.
Webb, A., Copsey, K., and Cawley, G. (2011). Statistical pattern recognition. Wiley.
WHO (2000). Obesity: preventing and managing the global epidemic. World Health Organization Technical Report Series, 70(3):510.
Wilson, F. N., Johnston, F. D., et al. (1946). On Einthoven’s triangle, the theory of unipolar electrocardiographic leads, and the interpretation of the precordial electrocardiogram. American Heart Journal, 32(3):277–310.
Winter, E., Jones, A., Davidson, R., Bromley, P., and Mercer, T. (2006). Sport and Exercise Physiology Testing Guidelines: Volume I-Sport Testing. The British Association of Sport and Exercise Sciences Guide. Routledge, UK.
Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1):37–52.
Xue, Q., Hu, Y., and Tompkins, W. (1992). Neural-network-based adaptive matched filtering for QRS detection. IEEE Transactions on Biomedical Engineering, 39(4):317–329.
Yamamoto, Y., Hughson, R. L., and Peterson, J. C. (1991). Autonomic control of heart rate during exercise studied by heart rate variability spectral analysis. Journal of Applied Physiology, 71(3):1136–1142.
Yanishevsky, Y. and Hourihane, J. O. (2010). Differences in treatment of food challenge induced reactions reflect physicians’ protocols more than reaction severity. The Journal of Allergy and Clinical Immunology, 126(1):182.