
GENERAL INFORMATION


Expression profile Submission
Input expression profile of genes:-
Our server provides two options for submitting the gene expression of query samples. Users can either paste the gene expression values of their query samples into the input box provided, or upload the query samples as a CSV file.
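
As an illustration of what a submission might look like once parsed, here is a minimal Python sketch. The layout it assumes (first column holds gene identifiers, each remaining column holds one query sample) and the names `parse_expression_csv`, `sample_1` and `sample_2` are hypothetical; the layout the server actually expects is not specified on this page.

```python
import csv
import io


def parse_expression_csv(text):
    """Parse CSV text into {sample_name: {gene: expression_value}}.

    Assumed (hypothetical) layout: a header row, gene identifiers in the
    first column, and one column per query sample after that.
    """
    reader = csv.reader(io.StringIO(text.strip()))
    header = next(reader)
    sample_names = header[1:]
    profiles = {name: {} for name in sample_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(sample_names):
            raise ValueError(
                f"row for {gene!r} has {len(values)} values, "
                f"expected {len(sample_names)}"
            )
        for name, value in zip(sample_names, values):
            profiles[name][gene] = float(value)
    return profiles


# Toy input standing in for pasted text or the contents of an uploaded file.
pasted = """gene,sample_1,sample_2
TP53,5.2,4.8
BRCA1,3.1,2.9
"""
profiles = parse_expression_csv(pasted)
```

The same function would serve both submission routes, since pasted text and an uploaded file's contents are both plain CSV text in this sketch.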


Dataset Information:-
The dataset used in this study consists of 610 samples of invasive ductal carcinoma (IDC) from TCGA (The Cancer Genome Atlas). TCGA is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through next-generation sequencing. A landmark cancer genomics program, it molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between the National Cancer Institute and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions. The IDC dataset obtained from TCGA consists of the expression profiles of 20,505 genes for 610 patients. This dataset was used to train and test our method.

Methodology:-
We have developed two-class machine-learning classification models to differentiate the early and late stages of invasive ductal carcinoma (IDC). The prediction models are trained on RNA-seq gene expression profiles, representing different IDC stages, of 610 patients obtained from TCGA. Different supervised learning algorithms were trained and evaluated, with model learning enriched by different feature selection methods. We also developed machine-learning classifiers trained on the same dataset, with the training data reduced to the genes corresponding to cancer driver genes.
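
The general pattern described above (feature selection followed by a two-class classifier) can be sketched in a few lines of Python. This page does not name the algorithms the server uses, so the sketch substitutes two simple stand-ins: ranking genes by the absolute difference of class means, and a nearest-centroid classifier. The gene names, the toy values and the label coding (0 = early stage, 1 = late stage) are all assumptions made for illustration.

```python
def select_top_genes(samples, labels, k):
    """Keep the k genes whose class means differ most (stand-in selector)."""
    genes = list(samples[0].keys())

    def class_mean(gene, cls):
        vals = [s[gene] for s, y in zip(samples, labels) if y == cls]
        return sum(vals) / len(vals)

    ranked = sorted(
        genes,
        key=lambda g: abs(class_mean(g, 1) - class_mean(g, 0)),
        reverse=True,
    )
    return ranked[:k]


def fit_centroids(samples, labels, genes):
    """Compute the mean expression vector of each class over the chosen genes."""
    centroids = {}
    for cls in (0, 1):
        members = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [sum(s[g] for s in members) / len(members) for g in genes]
    return centroids


def predict(sample, genes, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(cls):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[cls]))

    return min(centroids, key=sq_dist)


# Toy training data: hypothetical genes, 0 = early stage, 1 = late stage.
train = [
    {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 2.0},
    {"GENE_A": 1.2, "GENE_B": 5.1, "GENE_C": 2.1},
    {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 2.0},
    {"GENE_A": 4.2, "GENE_B": 4.9, "GENE_C": 1.9},
]
stages = [0, 0, 1, 1]

genes = select_top_genes(train, stages, k=1)
centroids = fit_centroids(train, stages, genes)
pred = predict({"GENE_A": 3.9, "GENE_B": 5.0, "GENE_C": 2.0}, genes, centroids)
```

Restricting `genes` to a fixed list of cancer driver genes, instead of calling `select_top_genes`, would correspond to the driver-gene variant mentioned above.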

Query file:-
The query file contains the gene expression profile, obtained from next-generation sequencing experiments, for the genes selected by one of our two classifiers (the Driver gene based model or the Expression gene based model), in CSV file format.

Evaluation of Performance:-
The accuracy of results is commonly measured in terms of the number of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). For the prediction system, the overall prediction accuracy, Matthews correlation coefficient (MCC), sensitivity and specificity were calculated using the following equations.

Sensitivity = TP / (TP+FN),

Specificity = TN / (TN+FP),

Accuracy = (TP+TN) / (TP+TN+FP+FN) and

MCC = [(TP*TN)-(FP*FN)] / sqrt [(TP+FN)*(TP+FP)*(TN+FP)*(TN+FN)]
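
As a worked example, the four threshold-dependent measures can be computed from the confusion-matrix counts as follows. This is a minimal sketch using the standard definitions, in which MCC is the difference TP*TN - FP*FN divided by the square root of the product of the four marginal sums. The function name `classification_metrics`, the toy counts and the convention of returning 0.0 when a denominator is zero are assumptions of the sketch.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC from raw counts."""

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy counts, not results from the server.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these toy counts, sensitivity is 40/50, specificity is 45/50 and accuracy is 85/100.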

auROC = The area under the receiver operating characteristic (ROC) curve, a threshold-independent measure of how well a parameter can distinguish between two diagnostic groups.
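
One way to make the auROC concrete: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way by comparing every positive/negative pair. The function name `au_roc`, the toy scores and the label coding (1 = positive class, 0 = negative class) are assumptions, and this is not necessarily how the server computes the value.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison of class scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, not results from the server.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = au_roc(labels, scores)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the value is 8/9. The pairwise loop is quadratic in the number of samples, which is fine for a few hundred samples; a rank-based formula would be preferable for larger sets.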


duct-BRCA-CSP Bioinformatics Lab, ICGEB, New Delhi

Designed by Shikha Roy