Open Access

Construction of a 26‑feature gene support vector machine classifier for smoking and non‑smoking lung adenocarcinoma sample classification

  • Authors:
    • Lei Yang
    • Lu Sun
    • Wei Wang
    • Hao Xu
    • Yi Li
    • Jia‑Ying Zhao
    • Da‑Zhong Liu
    • Fei Wang
    • Lin‑You Zhang

  • Published online on: December 7, 2017     https://doi.org/10.3892/mmr.2017.8220
  • Pages: 3005-3013
  • Copyright: © Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.


Abstract

The present study aimed to identify feature genes associated with smoking in lung adenocarcinoma (LAC) samples and to explore the underlying mechanism. Three gene expression datasets of LAC samples, selected according to pre‑set criteria, were downloaded from the Gene Expression Omnibus database, and the expression data were processed using meta‑analysis. Differentially expressed genes (DEGs) between LAC samples of smokers and non‑smokers were identified using the limma package in R. The classification accuracy of the selected DEGs was visualized using hierarchical clustering analysis in R. A protein‑protein interaction (PPI) network was constructed for the DEGs using gene interaction data from the Human Protein Reference Database. Betweenness centrality (BC) was calculated for each node in the network, and the genes with the greatest BC values were used to construct a support vector machine (SVM) classifier. The dataset GSE43458 was used as the training dataset, and the other datasets (GSE12667 and GSE10072) were used as validation datasets. The performance of the classifier was assessed using sensitivity, specificity, positive predictive value, negative predictive value and area under the curve, calculated with the pROC package in R. The feature genes in the SVM classifier were subjected to pathway enrichment analysis using Fisher's exact test. A total of 347 genes were identified as differentially expressed between samples of smokers and non‑smokers. The PPI network of the DEGs comprised 202 nodes and 300 edges. An SVM classifier comprising 26 feature genes was constructed to distinguish between smoking and non‑smoking LAC samples, with prediction accuracies for the GSE43458, GSE12667 and GSE10072 datasets of 100%, 100% and 94.83%, respectively.
Furthermore, the 26 feature genes were significantly enriched in 9 overrepresented biological pathways, including extracellular matrix‑receptor interaction, proteoglycans in cancer, cell adhesion molecules, p53 signaling pathway, microRNAs in cancer and apoptosis, and were identified as smoking‑related genes in LAC. In conclusion, an SVM classifier with high prediction accuracy for smoking and non‑smoking samples was obtained. The genes in the classifier may be feature genes associated with the development of LAC in patients who smoke.
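
The evaluation measures named in the abstract (sensitivity, specificity, positive predictive value and negative predictive value) all follow from the confusion matrix of predicted versus true labels. The following is a minimal illustrative sketch in Python, not the authors' code (the study used the pROC package in R). The function name `classification_metrics`, the label coding (1 = smoker, 0 = non‑smoker) and the toy labels are assumptions made for illustration only and are not data from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix measures for a binary classifier.

    Assumed (hypothetical) coding: 1 = smoker (positive class),
    0 = non-smoker (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "accuracy": ratio(tp + tn, len(y_true)),
    }


# Toy labels invented for illustration; NOT taken from the GEO datasets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The area under the curve is not covered by this sketch, since it requires continuous decision scores from the classifier and not only predicted class labels.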

February-2018
Volume 17 Issue 2

Print ISSN: 1791-2997
Online ISSN: 1791-3004

Spandidos Publications style
Yang L, Sun L, Wang W, Xu H, Li Y, Zhao JY, Liu DZ, Wang F and Zhang LY: Construction of a 26‑feature gene support vector machine classifier for smoking and non‑smoking lung adenocarcinoma sample classification. Mol Med Rep 17: 3005-3013, 2018.