Laboratory of Digital Signal and Image Processing
Permanent URI for this community: https://dspace.library.tuc.gr/handle/123456789/23
Browsing Laboratory of Digital Signal and Image Processing by Author "Balas Costas"
Now showing 1 - 2 of 2
Publication: Machine learning methods for genomic signature extraction (Technical University of Crete, 2015)
Chlis Nikolaos-Kosmas; Χλης Νικολαος-Κοσμας; Zervakis Michalis; Ζερβακης Μιχαλης; Balas Costas; Μπαλας Κωστας; Mania Aikaterini; Μανια Αικατερινη
The application of machine learning methodologies to the analysis of DNA microarray data has become common practice in bioinformatics. DNA microarrays can simultaneously measure the expression values of thousands of genes. Given these measurements, machine learning methods can be employed to identify candidate genes that are related to a biological state or phenotype of interest, such as cancer. These lists of candidate genes are often called “genomic signatures” in the literature. Machine learning methods are a necessity for the extraction of genomic signatures, since it is practically impossible for field experts to assess the importance of each gene individually by manual inspection, given the large size of the genome, which consists of approximately 25,000 genes. Feature subset selection and classification algorithms are popular choices for this task. Univariate feature selection methods filter genes according to differences in their expression profiles among samples belonging to different classes of interest, such as control and disease. Since they test each gene individually, univariate methods are computationally efficient and select genes with high discrimination ability. However, they ignore associations among genes. Multivariate methods, on the other hand, assess groups of genes simultaneously and select candidate genes based on their predictive performance when used in conjunction with a classifier.
As such, they are more effective at capturing the latent associations among genes and select genes with high predictive capability, at the cost of being computationally expensive. While the applied feature selection and classification methodologies have matured and several state-of-the-art algorithms have been established, the stability of the extracted genomic signatures is often overlooked. As a result, the genomic signatures extracted by many methodologies are unstable: they differ significantly under variations of the training data. Since result stability is related to generalization, this instability raises skepticism in the expert community and undermines the validity and clinical application of research findings from such gene expression studies. This thesis addresses three aspects of the selection and evaluation of gene signatures: stability, predictive capability, and statistical significance. First, a framework for the extraction of stable genomic signatures, called Stable Bootstrap Validation (SBV), is introduced. The proposed methodology enforces stability at the validation step. As a result, it can be combined with any classification method, as long as it supports feature selection. Three publicly available gene expression datasets are used to test the proposed methodology. First, the dimensionality of the datasets is reduced using a filtering method. Next, bootstrap resampling is used to generate a list of candidate signatures according to the selection frequency of genes across all bootstrap datasets. Finally, a stable signature with maximal predictive performance in terms of accuracy, sensitivity, and specificity is extracted, and the predictive performance of all candidate signatures is plotted in detail for further inspection.
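The filtering and bootstrap steps described above can be illustrated with a minimal sketch. This is not the thesis's SBV implementation: the Welch-style score, the function names, the 0/1 label encoding, and the parameters `k`, `n_bootstrap`, and `seed` are all assumptions introduced here for illustration.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def class_separation(a, b):
    """Welch-style score: |difference of class means| over its standard error."""
    std_err = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (std_err or 1e-12)  # guard against zero variance


def top_k_genes(X, y, k):
    """Univariate filter (assumed): score each gene (column of X), keep the k best.

    X is a list of samples (rows of expression values); y holds labels 0 or 1.
    """
    scored = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scored.append((class_separation(a, b), g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]


def selection_frequency(X, y, k, n_bootstrap, seed=0):
    """Fraction of bootstrap datasets in which each gene is selected."""
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    done = 0
    while done < n_bootstrap:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample needs both classes to be scored
        counts.update(top_k_genes([X[i] for i in idx], yb, k))
        done += 1
    return [counts[g] / n_bootstrap for g in range(len(X[0]))]


def candidate_signatures(freq):
    """Nested gene lists ordered by decreasing selection frequency."""
    ranked = sorted((g for g, f in enumerate(freq) if f > 0),
                    key=lambda g: (-freq[g], g))
    return [ranked[: i + 1] for i in range(len(ranked))]
```

In a fuller pipeline, each candidate signature would then be scored with a classifier for accuracy, sensitivity, and specificity; that step is omitted here.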
Additionally, the application of random sampling methods to counter the negative effects of imbalanced datasets on classification was investigated, since imbalanced datasets are frequently found in DNA microarray studies, where control samples are usually scarce. Moreover, a statistical framework comprising two separate statistical tests was implemented to assess the statistical significance of the extracted signature in terms of both classification accuracy and association with the response variable (phenotype/biological state). Finally, the robustness of the methodology is assessed by testing the degree of “agreement” among signatures extracted from independent executions of the methodology.

Publication: Identification and classification of circulating tumor cells using transcriptomic data (Technical University of Crete, 2015)
Kotronia Maria; Κοτρωνια Μαρια; Zervakis Michalis; Ζερβακης Μιχαλης; Garofalakis Minos; Γαροφαλακης Μινως; Balas Costas; Μπαλας Κωστας
Despite recent advances in microarray technology for gene expression analysis and the extraction of indices of biological significance, the successful use of this technology remains elusive for many researchers. This is mainly because there is not yet any standardization of the methods used to normalize the systematic noise that arises during experimental procedures, or of the methods used to classify biological data. The analysis of the large amounts of data produced by such experiments also remains a significant challenge. The objective of this diploma thesis was the development of a software platform in Matlab capable of managing and analyzing large amounts of biological data on a single personal computer. The software lets the user choose the desired method of data analysis from a variety of normalization and classification algorithms, as well as develop and integrate custom methods.
Data is managed and stored by a specialized database management system. The software platform was tested on a wide range of biological samples with great success. This thesis is intended to serve biological research as the basis of an innovative and complete software tool for the management and processing of large amounts of biological data, both to produce reliable and comparable results from different microarray experiments and to offer a choice among a complete range of statistical analysis methods.