What is RNAi?
"RNA silencing: General term used to describe different events triggered by small RNAs to induce transcriptional or posttranscriptional gene downregulation in a sequence-homology manner" [1]
What are RNA Silencing Suppressors?
"RNA silencing suppressor (RSS): Factor (usually a protein) with the capacity to interfere with the onset of RNA silencing or with its maintenance. The expression of viral RSSs is the most common strategy that plant viruses use to escape from RNA silencing [1]."
What is a Support Vector Machine?
"First pioneered by Vapnik in 1995, SVM is a supervised machine learning method which delivers state-of-the-art performance in recognition and discrimination of cryptic patterns in complex datasets. SVM is used in conjunction with kernel functions which implicitly map input data to a high-dimensional non-linear feature space. SVM then constructs a hyperplane separating the positive examples from the negative ones in the new space representations. To avoid overfitting, SVM chooses the Optimal Separating Hyperplane (OSH) that maximizes the margin, i.e. the minimal distance between the hyperplane and the training examples. The training examples that lie closest to the hyperplane are called support vectors" [2].
Which SVM package has been used to implement SVM in this prediction tool?
We used the LIBSVM package [3].
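For readers unfamiliar with LIBSVM, its command-line tools read training data as plain text, one example per line, in the sparse form `<label> <index>:<value> ...` with feature indices starting at 1. The sketch below shows how a feature vector might be written in that format. The helper name `to_libsvm_line` and the +1/-1 labelling of the two classes are assumptions made for illustration; they are not taken from the tool's own code.

```python
def to_libsvm_line(label, features):
    """Format one example as a LIBSVM sparse text line.

    label    : +1 or -1 (assumed class encoding, for illustration only)
    features : sequence of floats; zero entries are omitted (sparse format)
    """
    pairs = " ".join(
        f"{index}:{value:g}"
        for index, value in enumerate(features, start=1)  # LIBSVM indices are 1-based
        if value != 0
    )
    return f"{label:+d} {pairs}".rstrip()


if __name__ == "__main__":
    print(to_libsvm_line(1, [0.5, 0.0, 0.25]))
```

Omitting zero-valued features keeps the file small for sparse vectors such as the 400-dimensional dipeptide composition.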
What are the training features used to train SVM in VSupPred's algorithm?
1. Amino Acid Composition (AAC):
"This is an input training vector of 20 dimensions that represents the fraction of each amino acid in the protein sequence.
Fi = Total number of amino acid i / Length of protein
Where i can be any of the 20 amino acids."
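As a minimal sketch of the AAC formula above, the function below computes the 20 fractions for a sequence given as a string of one-letter codes. The function name, the alphabetical ordering of the residues, and the decision to ignore non-standard letters in the numerator are assumptions for illustration, not the tool's actual implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)


def amino_acid_composition(sequence):
    """Return 20 values: count of residue i divided by the protein length."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    length = len(sequence)
    return [sequence.count(residue) / length for residue in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("AAC")
    print(len(vector), vector[:2])
```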
2. Dipeptide Composition (DPC):
"This is an input training vector of 400 dimensions that represents the occurrence of two consecutive amino acids as a dipeptide.
Fj = Total number of dipeptide j / (Length of the protein - 1)
Where j can be any of the 400 dipeptides."
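The DPC formula above can be sketched as follows. Overlapping pairs are counted with a sliding window of width 2, so a protein of length L contributes L - 1 dipeptides. The function name, the ordering of the 400 dipeptides, and the skipping of pairs that contain non-standard letters are illustrative assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return 400 values: count of dipeptide j divided by (protein length - 1)."""
    sequence = sequence.upper()
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for start in range(len(sequence) - 1):
        pair = sequence[start:start + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = len(sequence) - 1
    return [counts[dipeptide] / total for dipeptide in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("AAC")
    print(len(vector), vector[:2])
```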
3. Secondary Structure Composition (SSC):
"This is an input training vector of 60 dimensions. Secondary structure prediction is carried out using PSI-PRED (version 2.2.18) [4]. It predicts the secondary structure of each residue and provides a confidence score for each of the three types of secondary structure: helices, beta-sheets and coils. For each amino acid type, the scores for each secondary structure were summed over all residues of that type and divided by the residue frequency, generating a 20x3 matrix, which was used to calculate the features corresponding to secondary structure prediction.
Fss,j = ΣSS / Fi(j)
Where SS is the score for any of the three secondary structures (helix/sheet/coil), with the summation running over the protein length for each amino acid j. For each j, there are three Fss,j values, one for each secondary structure. Fi(j) is the frequency of amino acid j in the protein. This was normalized using the following logistic function:
g(x) = 1/(1+exp(-x))
where x is the raw value in the PSSM profile and g(x) is the normalized value of x."
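The SSC calculation described above might look like the sketch below. It rests on several assumptions that the text does not settle: per-residue scores are passed in as (helix, sheet, coil) tuples already parsed from the PSI-PRED output (no parsing is shown), the "frequency" Fi(j) is taken to be the number of occurrences of residue j, the logistic function is applied to each of the 60 averaged values, and a residue type absent from the protein gets a raw value of 0. All names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)


def logistic(x):
    """g(x) = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + math.exp(-x))


def secondary_structure_composition(sequence, scores):
    """Return 60 values (20 residues x helix/sheet/coil).

    scores[k] is a (helix, sheet, coil) tuple of confidence scores for
    residue k, assumed to be parsed from the predictor output beforehand.
    """
    if len(sequence) != len(scores):
        raise ValueError("one score tuple is required per residue")
    features = []
    for residue in AMINO_ACIDS:
        positions = [k for k, r in enumerate(sequence.upper()) if r == residue]
        for structure in range(3):  # 0 = helix, 1 = sheet, 2 = coil
            if positions:
                total = sum(scores[k][structure] for k in positions)
                raw = total / len(positions)  # assumption: Fi(j) is a count
            else:
                raw = 0.0  # assumption: absent residue types score 0
            features.append(logistic(raw))
    return features


if __name__ == "__main__":
    demo = secondary_structure_composition("AC", [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
    print(len(demo), demo[:3])
```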
4. Position Specific Scoring Matrix (PSSM) Profile:
"This is an input training vector of 400 dimensions (20x20)."
5. PSSM and SSC hybrid:
"This is a hybrid model, with input training features of both PSSM as well as SSC, i.e. 460 dimensions."
Which classifier performance metrics are used?
To evaluate the accuracy of the SVM classifiers developed in the cross-validation cycles, the following metrics were used:
Sensitivity: percentage of viral suppressors of RNA silencing (VSRs) that are correctly predicted as VSRs.
Specificity: percentage of non-VSR sequences that are correctly predicted as non-VSRs.
Accuracy: percentage of correct predictions out of the total number of predictions.
Matthews Correlation Coefficient (MCC): a measure of both sensitivity and specificity; MCC = 0 indicates completely random prediction, while MCC = 1 indicates perfect prediction.
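The four metrics above can be computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), using the standard confusion-matrix definitions. The following is a minimal sketch; the function name and the convention of returning MCC = 0 when its denominator is zero are assumptions for illustration.

```python
import math


def classifier_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and accuracy (in percent) and MCC."""
    sensitivity = 100.0 * tp / (tp + fn)  # VSRs correctly predicted as VSRs
    specificity = 100.0 * tn / (tn + fp)  # non-VSRs correctly predicted as non-VSRs
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    print(classifier_metrics(tp=40, tn=45, fp=5, fn=10))
```

For example, with 40 true positives, 45 true negatives, 5 false positives and 10 false negatives, sensitivity is 80%, specificity is 90%, accuracy is 85% and MCC is approximately 0.70.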
References:
1. Valli, Adrian, Lopez-Moya, Juan Jose, and Garcia, Juan Antonio (Sep 2009) RNA Silencing and its Suppressors in the Plant-virus Interplay. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0021261].
2. Ramana J, Gupta D (2010) FaaPred: A SVM-Based Prediction Method for Fungal Adhesins and Adhesin-Like Proteins. PLoS ONE 5(3): e9695. doi:10.1371/journal.pone.0009695.
3. Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm