Computational Methods for Predicting Protein-protein Interactions and Binding Sites

Description

Computational Methods for Predicting Protein-protein Interactions and Binding Sites is a well-researched topic. It is intended as a guide or framework for your research.

Abstract

Proteins are essential to organisms and participate in virtually every process within cells. Quite often, they keep cells functioning by interacting with other proteins, a process called protein-protein interaction (PPI). The amino acid residues that bind during a protein-protein interaction are called PPI binding sites. Identifying PPIs and identifying PPI binding sites are fundamental problems in systems biology. Experimental methods for solving these two problems are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.

We present DELPHI, a deep-learning-based program for PPI site prediction, and SPRINT, an algorithm-based program for PPI prediction. Both programs have been compared to state-of-the-art programs on several datasets. Both DELPHI and SPRINT are more accurate than the competing methods. SPRINT is also orders of magnitude faster while using very little memory.

The datasets and source code for both DELPHI and SPRINT are publicly available at github.com/lucian-ilie and www.csd.uwo.ca/~ilie/software.html

Table of Contents

Abstract
Lay Summary
Acknowledgements
List of Figures
List of Tables
1 Introduction
  1.1 DNA
  1.2 Protein
  1.3 Protein-protein Interaction Prediction
  1.4 Protein-protein Interaction Binding Sites Prediction
  1.5 Thesis Overview
2 DELPHI
  2.1 Background
    2.1.1 Deep Learning in Bioinformatics
    2.1.2 Basic Notions and Definitions
      Deep Neural Networks
      Training
      Inference
      Convolutional Neural Networks
      Recurrent Neural Networks
      Ensemble Networks
      Dropout Layers
      Training, Validation, and Testing Dataset
      Data Augmentations and Sampling
    2.1.3 Previous Methods
      PIPE-sites
      DLPred
      DeepPPISP
      SCRIBER
  2.2 Methods
    2.2.1 Training Data Preparation
      Raw Training Data
      Similarities Eliminations
      Data Split
    2.2.2 Features
    2.2.3 Model Architecture
      Architecture Overview
      Many-to-one Structure
      Architecture of the CNN Network
      Architecture of the RNN Network
      Architecture of the Ensemble Network
    2.2.4 Parameter/Hyper-parameter Tuning
    2.2.5 Implementation
      Environment Configuration
      Class Weights
      Data Shuffling
    2.2.6 The DELPHI Web Server
      The Architecture of the Web Server
      Front End Server Configuration
      Back End Server Configuration
      Communications between the Front and Back End Servers
      Pre-computing PSSMs
      Job Scheduling
  2.3 Results
    2.3.1 Testing Datasets
    2.3.2 Evaluation Scheme
    2.3.3 Performance Comparison on Dset 448 and Dset 355
    2.3.4 Performance Comparison on Dset 186, Dset 164, and Dset 72
    2.3.5 Ablation Study
      Feature Evaluation
      The Evaluation of the Model Architecture and the Novel Features
    2.3.6 Evolutionary Conservation
    2.3.7 Accuracy of PBR Prediction
    2.3.8 Human Proteome Prediction
    2.3.9 Availability
  2.4 Conclusion
3 SPRINT
  3.1 Background
    3.1.1 Similarity Search
      BLAST Seeds
      Spaced Seeds
      Multiple Spaced Seeds
      Substitution Matrix
      Interactome Prediction
    3.1.2 Previous Methods
      PIPE
      Martin’s Program
      Shen’s Program
      Guo’s Program
      Ding’s Program
  3.2 Methods
    3.2.1 Basic Idea
    3.2.2 Detecting Similarities
    3.2.3 Predicting Interactions
      Post-processing Similarities
      Scoring Function
    3.2.4 Implementations
      Tuneable Parameters
      Pseudocode
      System Configuration
  3.3 Results
    3.3.1 Datasets Classification
    3.3.2 Datasets
    3.3.3 Competing Methods
    3.3.4 Comparative Analysis on Park & Marcotte’s Datasets
    3.3.5 Comparative Analysis on Seven Human Datasets
    3.3.6 Comparative Analysis on Human Interactome Prediction
    3.3.7 Availability
  3.4 Conclusion
4 Conclusion and Future Research
  4.1 Common Deep Learning Practises in Bioinformatics
    4.1.1 Data Preparation
      Comparative Analysis
      Improving Results
  4.2 Future Research
Bibliography
Curriculum Vitae

Brand

YourPastQuestions Brand

Additional information

Author: Yiwei Li
No. of Chapters: 4
No. of Pages: 138
Reference: Yes
Format: PDF
