diff --git a/education/HADDOCK24/HADDOCK24-protein-protein-basic/index.md b/education/HADDOCK24/HADDOCK24-protein-protein-basic/index.md index 8e7c7d52c..615b7b21e 100644 --- a/education/HADDOCK24/HADDOCK24-protein-protein-basic/index.md +++ b/education/HADDOCK24/HADDOCK24-protein-protein-basic/index.md @@ -11,18 +11,48 @@ This tutorial consists of the following sections: * table of contents {:toc} +This tutorial was last updated on 12-03-2026
## Introduction -This tutorial will demonstrate the use of HADDOCK for predicting the structure of a protein-protein complex from NMR chemical shift perturbation (CSP) data. Namely, we will dock two E. coli proteins involved in glucose transport: the glucose-specific enzyme IIA (E2A) and the histidine-containing phosphocarrier protein (HPr). The structures in the free form have been determined using X-ray crystallography (E2A) (PDB ID [1F3G](https://www.ebi.ac.uk/pdbe/entry/pdb/1f3g){:target="_blank"}) and NMR spectroscopy (HPr) (PDB ID [1HDN](https://www.ebi.ac.uk/pdbe/entry/pdb/1hdn){:target="_blank"}). The structure of the native complex has also been determined with NMR (PDB ID [1GGR](https://www.ebi.ac.uk/pdbe/entry/pdb/1ggr){:target="_blank"}). These NMR experiments have also provided us with an array of data on the interaction itself (chemical shift perturbations, intermolecular NOEs, residual dipolar couplings, and simulated diffusion anisotropy data), which will be useful for the docking. For this tutorial, we will only make use of inteface residues identified from NMR chemical shift perturbation data as described in [Wang *et al*, EMBO J (2000)](https://onlinelibrary.wiley.com/doi/10.1093/emboj/19.21.5635/abstract){:target="_blank"}. +This tutorial will demonstrate the use of HADDOCK for predicting the structure of a protein-protein complex from NMR chemical shift perturbation (CSP) data. Namely, we will dock two E. coli proteins involved in glucose transport: the [glucose-specific enzyme IIA](https://www.uniprot.org/uniprotkb/P69783/){:target="_blank"} (E2A) and the [histidine-containing phosphocarrier protein](https://www.uniprot.org/uniprotkb/P0AA04/){:target="_blank"} (HPR). + +Bacteria use a specific mechanism to import glucose from outside the cell. +As glucose enters the cell, a phosphate group is attached to it, i.e. glucose becomes phosphorylated. 
+This phosphorylation prevents glucose from diffusing back out of the cell and at the same time marks it for further metabolism. +The phosphate group used for the glucose transport process originates from phosphoenolpyruvate (PEP) and is transferred through a cascade of proteins. +It first moves from PEP to enzyme I, then to HPR, next to E2A, and finally to enzyme IIB. Enzyme IIB is located on the cytoplasmic side of the membrane, where the phosphate group is ultimately transferred to glucose as it crosses the membrane. +More information can be found in [Jeckelmann *et al*, Eur J Physiol (2020)](https://doi.org/10.1007/s00424-020-02379-0){:target="_blank"}. +For a quick overview, this animation provides a simple visualisation of the entire process: + +
+ +
+ +
+ The phosphate group travels between these proteins by forming covalent bonds with side chains of amino acids - in bacteria, via histidine residues: +
+ +
+The goal of this tutorial is to model the complex between HPR and E2A at the stage when the phosphate group has been transferred from HPR to E2A. + +HADDOCK requires input structures of the molecules to be docked. These inputs can be either experimentally determined unbound structures or computational models. In this case, unbound structures are available for both proteins: E2A was determined by X-ray crystallography (PDB ID [1F3G](https://www.ebi.ac.uk/pdbe/entry/pdb/1f3g){:target="_blank"}), and HPR was solved by NMR spectroscopy (PDB ID [1HDN](https://www.ebi.ac.uk/pdbe/entry/pdb/1hdn){:target="_blank"}). + +The structure of the native complex has also been determined with NMR (PDB ID [1GGR](https://www.ebi.ac.uk/pdbe/entry/pdb/1ggr){:target="_blank"}). These NMR experiments have also provided us with an array of data on the interaction itself (chemical shift perturbations, intermolecular NOEs, residual dipolar couplings, and simulated diffusion anisotropy data), which will be useful to guide the docking. + +For the purpose of this tutorial, we will only use interface residues identified from NMR chemical shift perturbation data from [Wang *et al.*, EMBO J (2000)](https://doi.org/10.1093/emboj/19.21.5635){:target="_blank"}. The structure of the native complex will be used only for the final evaluation of the docking results, and not during the docking itself. For this tutorial we will make use of the [HADDOCK2.4 webserver](https://wenmr.science.uu.nl/haddock2.4){:target="_blank"}. {% include paper_citation.html key="haddock24" %} - -Throughout the tutorial, coloured text will be used to refer to questions or instructions, and/or PyMOL commands. +Throughout the tutorial, coloured text will be used to refer to questions, instructions, and/or PyMOL commands. This is a question prompt: try answering it! This an instruction prompt: follow it! @@ -58,7 +88,7 @@ In this initial stage, the interacting partners are treated as rigid bodies, mea
**2. Semi-flexible simulated annealing in torsion angle space (it1)** -The second stage of the docking protocol introduces flexibility to the interacting partners through a three-step molecular dynamics-based refinement in order to optimize interface packing. It is worth noting that flexibility in torsion angle space means that bond lengths and angles are still frozen. The interacting partners are first kept rigid and only their orientations are optimized. Flexibility is then introduced in the interface, which is automatically defined based on an analysis of intermolecular contacts within a 5Å cut-off. This allows different binding poses coming from it0 to have different flexible regions defined. Residues belonging to this interface region are then allowed to move their side-chains in a second refinement step. Finally, both backbone and side-chains of the flexible interface are granted freedom. The AIRs again play an important role at this stage since they might drive conformational changes. +The second stage of the docking protocol introduces flexibility to the interacting partners through a three-step molecular dynamics-based refinement in order to optimize interface packing. It is worth noting that flexibility in torsion angle space means that bond lengths and angles are still frozen. The interacting partners are first kept rigid and only their orientations are optimized. Flexibility is then introduced in the interface, which is automatically defined based on an analysis of intermolecular contacts within a 5Å cut-off. This allows different binding poses coming from it0 to have different flexible regions defined. Residues belonging to this interface region are then allowed to move their side chains in a second refinement step. Finally, both backbone and side chains of the flexible interface are granted freedom. The AIRs again play an important role at this stage since they might drive conformational changes.
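The interface definition used in it1 can be illustrated with a short sketch. The Python below (standard library only, Python 3.9+) flags every residue that has at least one atom within 5 Å of an atom of another chain. It is not HADDOCK's own code: the atom tuple format and the function name `interface_residues` are hypothetical, and it assumes that every atom counts toward a contact, which the text above does not specify.

```python
from itertools import product
from math import dist

# Hypothetical minimal atom record: (chain_id, residue_number, (x, y, z)).
Atom = tuple[str, int, tuple[float, float, float]]


def interface_residues(atoms: list[Atom], cutoff: float = 5.0) -> dict[str, set[int]]:
    """Residues (per chain) with at least one atom within `cutoff` Å of an atom of another chain."""
    by_chain: dict[str, list[Atom]] = {}
    for atom in atoms:
        by_chain.setdefault(atom[0], []).append(atom)

    interface: dict[str, set[int]] = {chain: set() for chain in by_chain}
    chains = sorted(by_chain)
    # Compare every pair of chains once; a brute-force double loop is enough for a sketch.
    for i, chain_a in enumerate(chains):
        for chain_b in chains[i + 1:]:
            for (_, res_a, xyz_a), (_, res_b, xyz_b) in product(by_chain[chain_a], by_chain[chain_b]):
                if dist(xyz_a, xyz_b) <= cutoff:
                    interface[chain_a].add(res_a)
                    interface[chain_b].add(res_b)
    return interface
```

Because the contacts are recomputed for each model, two different it0 poses generally return different residue sets, which is why different binding poses end up with different flexible regions.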
@@ -70,8 +100,11 @@ The second stage of the docking protocol introduces flexibility to the interacti

- **3. Refinement in Cartesian space with explicit solvent (water)** - **Note:** This stage was part of the standard HADDOCK protocol up to (and including) v2.2. As of v2.4 it is no longer performed by default but the user still has the option of enabling it. In its place, a short energy minimisation is performed instead. The final stage of the docking protocol immerses the complex in a solvent shell so as to improve the energetics of the interaction. HADDOCK currently supports water (TIP3P model) and DMSO environments. The latter can be used as a membrane mimic. In this short explicit solvent refinement the models are subjected to a short molecular dynamics simulation at 300K, with position restraints on the non-interface heavy atoms. These restraints are later relaxed to allow all side chains to be optimized. + **3. Refinement in Cartesian space with explicit solvent (itw)** + The final stage of the docking protocol immerses the complex in a solvent shell so as to improve the energetics of the interaction. HADDOCK currently supports water (TIP3P model) and DMSO environments. The latter can be used as a membrane mimic. In this short explicit solvent refinement the models are subjected to a short molecular dynamics simulation at 300K, with position restraints on the non-interface heavy atoms. These restraints are later relaxed to allow all side chains to be optimized.
+**_Note_** that this stage was part of the default protocol up to (and including) v2.2; as of v2.4, it is no longer performed by default. +Instead, a short energy minimisation in Cartesian space is performed. +Users can still opt for refinement in explicit solvent as an alternative to this energy minimisation.
@@ -83,8 +116,6 @@ The second stage of the docking protocol introduces flexibility to the interacti

- - The performance of this protocol of course depends on the number of models generated at each step. Few models are less probable to capture the correct binding pose, while an exaggerated number will become computationally unreasonable. The standard HADDOCK protocol generates 1000 models in the rigid body minimization stage, and then refines the best 200 – regarding the energy function - in both it1 and water. Note, however, that while 1000 models are generated by default in it0, they are the result of five minimization trials and for each of these the 180º symmetrical solution is also sampled. Effectively, the 1000 models written to disk are thus the results of the sampling of 10.000 docking solutions. The final models are automatically clustered based on a specific similarity measure - either the *positional interface ligand RMSD* (iL-RMSD) that captures conformational changes about the interface by fitting on the interface of the receptor (the first molecule) and calculating the RMSDs on the interface of the smaller partner, or the *fraction of common contacts* (current default) that measures the similarity of the intermolecular contacts. For RMSD clustering, the interface used in the calculation is automatically defined based on an analysis of all contacts made in all models. @@ -92,40 +123,40 @@ The final models are automatically clustered based on a specific similarity meas ## Inspecting and preparing E2A for docking We will now inspect the E2A structure. For this start PyMOL and in the command line window of PyMOL (indicated by PyMOL>) type: - fetch 1F3G
show cartoon
hide lines
-show sticks, resn HIS
-You should see a backbone representation of the protein with only the histidine side-chains visible. -Try to locate the histidines in this structure. - -Is there any phosphate group present in this structure? - -*Hint* : you can select phosphate atoms with the following PyMOL command: -select elem P - -Note that you can zoom on the histidines by typing in PyMOL: +You should see a cartoon representation of the protein. +It is known from the literature that a phosphate group interacts and can form a covalent bond with the side chain of a histidine residue. +Let us first check whether histidine residues are present in this structure: + +show sticks, resn HIS
+
+The histidine side chains are now displayed in stick representation. You can zoom in on the histidines using: zoom resn HIS -Revert to a full view with: - +To return to the full view of the structure, type: zoom vis -As a preparation step before docking, it is advised to remove any irrelevant water and other small molecules (e.g. small molecules from the crystallisation buffer), however do leave relevant co-factors if present. For E2A, the PDB file only contains water molecules. You can remove those in PyMOL by typing: +This structure contains two histidines. What about a phosphate group? + +Is there a phosphate group present in this structure? + +*Hint*: you can select phosphorus atoms with the following command and check how many atoms are in this selection: +select elem P +As a preparation step before docking, it is advised to remove any irrelevant water and other small molecules (e.g. small molecules from the crystallisation buffer), but do keep relevant co-factors if present. For E2A, the only irrelevant molecules in the PDB file are the water molecules. You can remove those by typing: remove resn HOH -Now let's vizualize the residues affected by binding as identified by NMR. From [Wang *et al*, EMBO J (2000)](https://onlinelibrary.wiley.com/doi/10.1093/emboj/19.21.5635/abstract){:target="_blank"} the following residues of E2A were identified has having significant chemical shift perturbations: +Now let's visualize the residues affected by binding as identified by NMR. From [Wang *et al*, EMBO J (2000)](https://doi.org/10.1093/emboj/19.21.5635){:target="_blank"} the following residues of E2A were identified as having significant chemical shift perturbations: 38,40,45,46,69,71,78,80,94,96,141 We will now switch to a surface representation of the molecule and highlight the NMR-defined interface. In PyMOL type the following commands: - color white, all
show surface
@@ -142,18 +173,21 @@ Inspect the surface.
Do the identified residues form a well defined patch on the surface? Do they form a contiguous surface? -The answer to the last question should be no: We can observe residue in the center of the patch that do not seem significantly affected while still being in the middle of the defined interface. This is the reason why in HADDOCK we also define "*passive*" residues that correspond to surface neighbors of active residues. These can be selected manually, or more conveniently you can let the HADDOCK server do it for you (see [Setting up the docking run](#setting-up-the-docking-run) below). +The answer to the last question should be **no**: we can observe residues in the center of the patch that do not seem significantly affected even though they lie in the middle of the defined interface. This is the reason why in HADDOCK we also define "*passive*" residues that correspond to surface neighbors of active residues. These can be selected manually, or more conveniently you can let the HADDOCK server do it for you (see [Setting up the docking run](#setting-up-the-docking-run) below). As final step save the molecule as a new PDB file which we will call: *e2a_1F3G.pdb*
For this in the PyMOL menu on top select: - File -> Export molecule... Click on the save button Select as ouptut format PDB (*.pdb *.pdb.gz) Name your file *e2a_1F3G.pdb* and note its location -After saving the molecule delete it from the Pymol window or close Pymol. You can remove the molecule by typing this into the command line window of PyMOL: +Another way to save the structure as a PDB file is via the command: +save e2a_1F3G.pdb, 1F3G +The file will be written to the current working directory: if PyMOL was launched from a terminal, it will be saved in the directory from which PyMOL was started; if PyMOL was opened manually (e.g., via the graphical interface), it is typically saved in your home directory. + +After saving the molecule, delete it from the PyMOL window or close PyMOL. To remove the molecule, type: delete 1F3G @@ -162,21 +196,21 @@ delete 1F3G ## Inspecting and preparing HPR for docking We will now inspect the HPR structure. For this start PyMOL and in the command line window of PyMOL type: - fetch 1HDN
show cartoon
hide lines
-Since this is an NMR structure it does not contain any water molecules and we don't need to remove them. +Are there any histidines present in this structure? +Is there a phosphate group present in this structure? +Are there any molecules irrelevant for docking present in this structure? -Let's vizualize the residues affected by binding as identified by NMR. From [Wang *et al*, EMBO J (2000)](https://onlinelibrary.wiley.com/doi/10.1093/emboj/19.21.5635/abstract){:target="_blank"} the following residues were identified has having significant chemical shift perturbations: +Let's visualize the residues affected by binding as identified by NMR. From [Wang *et al*, EMBO J (2000)](https://doi.org/10.1093/emboj/19.21.5635){:target="_blank"} the following residues were identified as having significant chemical shift perturbations: 15,16,17,20,48,49,51,52,54,56 We will now switch to a surface representation of the molecule and highlight the NMR-defined interface. In PyMOL type the following commands: - color white, all
show surface
@@ -189,18 +223,32 @@ Again, inspect the surface.
Do the identified residues form a well defined patch on the surface? Do they form a contiguous surface? -Now since this is an NMR structure, it actually consists of an ensemble of models. HADDOCK can handle such ensemble, using each conformer in turn as starting point for the docking. We however recommend to limit the number of conformers used for docking, since the number of conformer combinations of the input molecules might explode (e.g. 10 conformers each will give 100 starting combinations and if we generate 1000 ridig body models (see [HADDOCK general concepts](#haddock-general-concepts) above) each combination will only be sampled 10 times). +You may have noticed that the set of PyMOL commands above took slightly longer to execute than the equivalent commands for E2A. +This is because 1HDN is an NMR structure. Unlike X-ray structures, NMR entries contain an ensemble of models - in this case, we have 30 different conformers. -Now let's vizualise this NMR ensemble. In PyMOL type: +You can display all 30 conformers, looped in succession, using: + mplay +To stop the playback: + mstop +HADDOCK is able to handle such ensembles by using each conformer in turn as a starting point for docking. +We generally recommend limiting the number of conformers used. +Otherwise, the number of possible combinations between the input molecules can quickly become very large. +For example, if both partners contain 10 conformers, this results in 100 possible starting combinations. If 1000 rigid-body models are generated (see [HADDOCK general concepts](#haddock-general-concepts) above), each combination would then be sampled only 10 times! + +If limiting the number of input conformers is not an option, it is possible to increase the number of models generated in the rigid-body docking stage (it0). 
+However, this requires a higher access level on the HADDOCK 2.4 server, which you can request via the "[User Dashboard](https://wenmr.science.uu.nl/dashboard){:target="_blank"}". + +Now let's display all models of this NMR ensemble simultaneously in ribbon representation. +This representation is handy for visualizing backbone conformation: hide all
show ribbon
set all_states, on
-You should now be seing the 30 conformers present in this NMR structure. To illustrate the potential benefit of using an ensemble of conformations as starting point for docking let's look at the side-chains of the active residues: - +You should now be seeing the 30 conformers present in this NMR structure. +The backbone conformation may appear fairly conserved across all 30 models, but let us look at the side chains of the active residues: show lines, hpr_active
@@ -209,55 +257,64 @@ show lines, hpr_active
-You should be able to see the amount of conformational space sampled by those surface side-chains. You can clearly see that some residues do sample a large variety of conformations, one of which might lead to much better docking results. - -**Note:** Pre-sampling of possible conformational changes can thus be beneficial for the docking, but again do limit the number of conformers used for the docking (or increase the number of sampled models, which is possible for users with expert- or guru-level access. The default access level is however only easy - for a higher level access do request it after registration). - -As final step, save the molecule as a new PDB file which we will call: *hpr-ensemble.pdb* -For this in the PyMOL menu select: +You should now be able to observe the range of conformational space sampled by these surface side chains. +Some residues clearly adopt a wide variety of conformations, and one of these might lead to much better docking results. +This illustrates the potential benefit of using an ensemble of conformations as starting points rather than a single structure, especially when there is no clear indication of which of the 30 models would be best for the docking. +As a final step, save the molecule as a new PDB file which we will call *hpr-ensemble.pdb*. +For this, in the PyMOL menu select: File -> Export molecule... Select as State 0 (all states) Click on Save... -Select as ouptut format PDB (*.pdb *.pdb.gz) +Select as output format PDB (*.pdb *.pdb.gz) Name your file *hpr-ensemble.pdb* and note its location +**_Note_** that it is important to change "State" from the default "-1" to "0". Otherwise, only a single conformation will be saved instead of the full ensemble. +
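To double-check that the export with State 0 really wrote all 30 conformers, you can count the models in *hpr-ensemble.pdb*. The sketch below (standard library only; the function names are hypothetical) splits a multi-model PDB file on its MODEL/ENDMDL records, which is one simple way of extracting individual conformers from an ensemble; it is not the server's implementation. It also reproduces the sampling arithmetic discussed above.

```python
from typing import Optional


def split_models(pdb_text: str) -> list[str]:
    """Split multi-model PDB text into one string per MODEL ... ENDMDL block.

    Only ATOM/HETATM records are kept and each model is terminated with END.
    Files without MODEL records (single structures) return an empty list.
    """
    models: list[str] = []
    current: Optional[list[str]] = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL" and current is not None:
            models.append("\n".join(current) + "\nEND\n")
            current = None
        elif current is not None and record in ("ATOM", "HETATM"):
            current.append(line)
    return models


def samples_per_combination(n_rigid_body_models: int, n_conf_a: int, n_conf_b: int) -> float:
    """Average number of rigid-body models per starting conformer combination."""
    return n_rigid_body_models / (n_conf_a * n_conf_b)
```

For example, `len(split_models(open("hpr-ensemble.pdb").read()))` should return 30 if all states were exported, assuming PyMOL wrote MODEL/ENDMDL records. `samples_per_combination(1000, 10, 10)` returns 10.0, the 10 × 10 example above, whereas one E2A structure combined with 30 HPR conformers leaves roughly 33 rigid-body models per combination.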
## Adding a phosphate group -Since the biological function of this complex is to transfer a phosphate group from one protein to another, via histidines side-chains, it is relevant to make sure that a phosphate group be present for docking. As we have seen above none is currently present in the PDB files. HADDOCK does support a list of modified amino acids which you can find at the following link: [https://wenmr.science.uu.nl/haddock2.4/library](https://wenmr.science.uu.nl/haddock2.4/library){:target="_blank"}. +Since the biological function of this complex is to transfer a phosphate group from one protein to another via histidine side chains, it is important to include the phosphate group in the docking. +However, neither of the structures we prepared contains a phosphate group. + +As a reminder (see [Introduction](#introduction) above), in bacteria the phosphate group is transferred between histidine residues of the interacting proteins. +It is known from the literature that histidine 90 of E2A is involved in this transfer. +We can include the phosphate group in the docking by converting this canonical histidine into a phosphorylated histidine, i.e. a histidine with a covalently attached phosphate group. -Check the list of supported modified amino acids. -What is the proper residue name for a phospho-histidine in HADDOCK? +HADDOCK supports a number of modified amino acids, which can be found at: [https://wenmr.science.uu.nl/haddock2.4/library](https://wenmr.science.uu.nl/haddock2.4/library){:target="_blank"}. -In order to use a modified amino-acid in HADDOCK, the only thing you will need to do is to edit the PDB file and change the residue name of the amino-acid you want to modify. Don't bother deleting irrelevant atoms or adding missing ones, HADDOCK will take care of that. For E2A, the histidine that is phosphorylated has residue number 90. 
In order to change it to a phosphorylated histidine do the following: +Check the list of supported modified amino acids. What is the proper residue name for a phosphorylated histidine in HADDOCK? -Edit the PDB file (*e2a_1F3G.pdb*) in your favorite editor -Change the name of histidine 90 to NEP -Save the file (as simple text file) under a new name, e.g. *e2aP_1F3G.pdb* +To use a modified amino acid in HADDOCK, it is sufficient to edit the PDB file and change the residue name of the corresponding residue. There is no need to add or delete atoms - HADDOCK will take care of this automatically. -**Note:** The same procedure can be used to introduce a mutation in an input protein structure. +To introduce this modification: +Open the PDB file *e2a_1F3G.pdb* in your favorite text editor +Find the histidine with residue sequence number 90 +Remember that the residue sequence number is the second integer on each line starting with "ATOM" (columns 23-26 of the fixed-width PDB format). Check [this link](https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html){:target="_blank"} for more info. +Change the residue name from HIS to NEP on every line (i.e. every atom) belonging to this residue +Save the file as plain text under a new name, e.g. *e2aP_1F3G.pdb* + +**_Note_** that the same procedure can also be used to introduce mutations in an input protein structure.
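If you prefer scripting over manual editing, the renaming can also be done with a few lines of standard-library Python. This sketch relies on the fixed-width PDB column layout described in the link above; the function name `rename_residue` is hypothetical, and insertion codes (column 27) are ignored.

```python
from typing import Optional


def rename_residue(pdb_text: str, resnum: int, old_name: str, new_name: str,
                   chain: Optional[str] = None) -> str:
    """Rename residue `resnum` from `old_name` to `new_name` on every ATOM/HETATM line.

    Uses the fixed-width PDB layout (1-based columns):
    residue name 18-20, chain ID 22, residue sequence number 23-26.
    """
    out = []
    for line in pdb_text.splitlines(keepends=True):
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            same_res = int(line[22:26]) == resnum and line[17:20].strip() == old_name
            same_chain = chain is None or line[21] == chain
            if same_res and same_chain:
                # Residue names are right-justified in columns 18-20; all other columns are kept.
                line = line[:17] + f"{new_name:>3}" + line[20:]
        out.append(line)
    return "".join(out)
```

With `from pathlib import Path`, a call such as `Path("e2aP_1F3G.pdb").write_text(rename_residue(Path("e2a_1F3G.pdb").read_text(), 90, "HIS", "NEP"))` writes the modified copy while leaving *e2a_1F3G.pdb* untouched. Because every atom line of residue 90 is changed, the residue is renamed consistently rather than on a single line.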
## Setting up the docking run #### Registration / Login -In order to start the submission, either click on "*here*" next to the submission section, or click [here](https://wenmr.science.uu.nl/auth/register/){:target="_blank"}. To start the submission process, we are prompted for our login credentials. After successful validation of our credentials we can proceed to the structure upload. - -**Note:** The blue bars on the server can be folded/unfolded by clicking on the arrow on the left +To use the HADDOCK web server, navigate to the [WeNMR portal](https://wenmr.science.uu.nl/auth/){:target="_blank"} and log in. +Once your credentials have been validated, scroll down to "Services", locate **HADDOCK v2.4** and click on "Go to service", or use this link directly: [https://wenmr.science.uu.nl/haddock2.4/](https://wenmr.science.uu.nl/haddock2.4/){:target="_blank"}. -#### Submission and validation of structures +#### HADDOCK submission: Input data -For this we will make us of the [HADDOCK 2.4 interface](https://wenmr.science.uu.nl/haddock2.4/submit/1){:target="_blank"} of the HADDOCK web server. +Click on the "[Submit a new job](https://wenmr.science.uu.nl/haddock2.4/submit/1){:target="_blank"}" button. Note that you are now in the "Input data" tab. -In this stage of the submission process we can upload the structures we previously prepared with PyMOL. +In this stage of the submission process we will upload the structures we previously prepared with PyMOL. -* **Step1:** Define a name for your docking run in the field "Job name", e.g. *E2A-HPR*. +* **Step1:** In the field "Job name", define a name for your docking run, e.g. *E2A-HPR*. -* **Step2:** Select the number of molecules to dock, in this case the default *2*. +* **Step2:** In the field "Number of molecules", select the number of molecules to dock, in this case 2. -* **Step3:** Input the first protein PDB file. For this unfold the **Molecule 1 - input** if it isn't already unfolded. 
+* **Step3:** In the section "Molecule 1 - input", upload the PDB file for E2A. Which chain to be used? -> All (for this particular case) @@ -266,24 +323,36 @@ Which chain to be used? -> All (for this particular case) PDB structure to submit -> Browse and select *e2aP_1F3G.pdb* (the file you edited to modify the histidine) -**Note:** Leave all other options to their default values. +Leave all other options to their default values. +**_Note_** that you can fold and unfold the "Molecule 1 - input" section by clicking on the ▼ icon. This works for any section and subsection of the HADDOCK server. -* **Step4:** Input the second protein PDB file. For this unfold the **Molecule 2 - input** if it isn't already unfolded. +* **Step4:** In the section "Molecule 2 - input", upload the PDB file for HPR. Which chain to be used? -> All (for this particular case) -PDB structure to submit -> Browse and select *hpr-ensemble.pdb* (the file you saved) +PDB structure to submit -> Browse and select *hpr-ensemble.pdb* (the ensemble of NMR conformations you saved) -* **Step 5:** Click on the "Next" button at the bottom left of the interface. This will upload the structures to the HADDOCK webserver where they will be processed and validated (checked for formatting errors). The server makes use of [Molprobity](http://molprobity.biochem.duke.edu/){:target="_blank"} to check side-chain conformations, eventually swap them (e.g. for asparagines) and define the protonation state of histidine residues. +**_Note_** that the HADDOCK server will automatically adjust several docking parameters based on the field "What kind of molecule are you docking?". + +* **Step 5:** Click on the "Next" button at the bottom left of the interface. + +This will upload the structures to the HADDOCK webserver where they will be processed and validated (checked for formatting errors). 
The server makes use of [Molprobity](http://molprobity.biochem.duke.edu/){:target="_blank"} to check side chain conformations, flip them where needed (e.g. for asparagines) and define the protonation state of histidine residues. + + +#### HADDOCK submission: Input parameters + +If the input files are processed and validated without errors, you will be taken to the "Input parameters" tab. If any issues occur, you will remain in the "Input data" tab and an error message will be shown either at the top of the page or in the section of the affected molecule. + +##### Definition of restraints -#### Definition of restraints -If everything went well, the interface window should have updated itself and it should now show the list of residues for molecules 1 and 2. We will be making use of the text boxes below the residue sequence of every molecule to specify the list of active residues to be used for the docking run. +In this tab, we will define distance restraints by specifying the active residues for each molecule. -* **Step 6:** Specify the active residues for the first molecule. For this unfold the "Molecule 1 - parameters" if it isn't already unfolded. +* **Step 6:** In the section "Molecule 1 - parameters", in the subsection "Active/Passive residues - Selection #1", in the field "Active residues (directly involved in the interaction)", specify the active residues for E2A. +**_Note_** that "residue sequence number" and "residue ID" are equivalent terms. Active residues (directly involved in the interaction) -> 38,40,45,46,69,71,78,80,94,96,141 @@ -291,7 +360,7 @@ Active residues (directly involved in the interaction) -> 38,40,45,46,69,71,78,8 Automatically define passive residues around the active residues -> check (checked by default) -* **Step 7:** Specify the active residues for the second molecule. For this unfold the "Molecule 2 - parameters" if it isn't already unfolded. 
+* **Step 7:** In the section "Molecule 2 - parameters", in the subsection "Active/Passive residues - Selection #2", in the field "Active residues (directly involved in the interaction)", specify the active residues for HPR. Active residues (directly involved in the interaction) -> 15,16,17,20,48,49,51,52,54,56 @@ -299,27 +368,30 @@ Active residues (directly involved in the interaction) -> 15,16,17,20,48,49,51,5 Automatically define passive residues around the active residues -> check (checked by default) +##### Checking the histidine protonation state -#### Checking the histidines protonation state. +The HADDOCK server assigns the protonation states of histidines automatically using [MolProbity](http://molprobity.biochem.duke.edu/){:target="_blank"}. +However, the HPR histidine that is expected to interact with the phosphate group should be positively charged. +From the literature, this residue in HPR is most likely histidine 15. Let's make sure this histidine is positively charged. -One of the NMR-identified residue on HPR is a Histidine (residue 15). As this complex is a phospho-transfer complex, this histidine is most likely to interact with the phosphate group on E2A. As such its most likely protonation state should be a charged histidine (HIS+) for docking purposes. The server has assigned the protonation state of Histines based on [Molprobity](http://molprobity.biochem.duke.edu/){:target="_blank"}. - -* **Step 8:** Unfold the Histidine protonation state bar for molecule 2 and check the defined protonation state of His15. +* **Step 8:** In the section "Molecule 2 - parameters", unfold the subsection "Histidine protonation state" and check the defined protonation state of HIS 15. -If not HIS+ change it to HIS+ to use a positively charged Histidine for this residue +If it is not already set to "HIS+", change the state of HIS 15 to "HIS+" * **Step 9:** Click on the "Next" button at the bottom left of the interface.
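The server turns the active residues from Steps 6 and 7, plus the automatically defined passive residues, into ambiguous interaction restraints that end up in *ambig.tbl*. The sketch below only illustrates the general idea in CNS-style `assign` syntax: each active residue of one molecule is restrained to the union of active and passive residues of the partner. The segment IDs `A`/`B`, the 2.0 Å bound, the function name and the exact layout are assumptions for illustration, and the file produced by the server may differ in detail. In this syntax the three trailing numbers are a distance followed by its lower and upper margins, so `2.0 2.0 0.0` allows 0 to 2 Å.

```python
def ambig_restraints(active_a: list[int], passive_a: list[int],
                     active_b: list[int], passive_b: list[int],
                     segid_a: str = "A", segid_b: str = "B",
                     distance: float = 2.0) -> str:
    """Build CNS-style ambiguous interaction restraints between two molecules.

    Every active residue of one molecule gets one ambiguous 'assign' statement
    towards all active and passive residues of the partner molecule.
    """
    blocks = []
    pairs = (
        (active_a, segid_a, sorted(set(active_b) | set(passive_b)), segid_b),
        (active_b, segid_b, sorted(set(active_a) | set(passive_a)), segid_a),
    )
    for own_active, own_seg, partner_residues, partner_seg in pairs:
        for resid in own_active:
            # The partner residues are combined with 'or': any of them may satisfy the restraint.
            partners = "\n     or\n".join(
                f"       ( resid {r} and segid {partner_seg})" for r in partner_residues
            )
            blocks.append(
                f"assign ( resid {resid} and segid {own_seg})\n(\n{partners}\n)"
                f"  {distance} {distance} 0.0\n"
            )
    return "\n".join(blocks)
```

For example, `ambig_restraints([38, 40, 45], [], [15, 16, 17], [])` yields six `assign` statements; in the real run the passive lists are filled in by the server.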
-#### Job submission +#### HADDOCK submission: Docking parameters + +This interface allows us to modify many parameters that control the behaviour of HADDOCK, but in our case the default values are all appropriate. The best way to learn more about these parameters is by completing the other HADDOCK 2.4 tutorials. -This interface allows us to modify many parameters that control the behaviour of HADDOCK but in our case the default values are all appropriate. It also allows us to download the input structures of the docking run (in the form of a tgz archive) and a haddockparameter file which contains all the settings and input structures for our run (in json format). We stronly recommend to download this file as it will allow you to repeat the run after uploading into the [file upload inteface](https://wenmr.science.uu.nl/haddock2.4/submit_file){:target="_blank"} of the HADDOCK webserver. It can serve as input reference for the run. This file can also be edited to change a few parameters for example. An excerpt of this file is shown here: + Scroll to the bottom of the page. +Here you should see the buttons "Download parameter file" and "Download input files". The "parameter file" is a JSON file that contains all the settings of the run. We strongly recommend downloading and keeping this file, as it makes your run reproducible. With this file, you can use the [HADDOCK File Upload Interface](https://wenmr.science.uu.nl/haddock2.4/submit_file){:target="_blank"} to repeat the run with exactly the same parameters. This file can also be edited to change one or a few parameters, which is quicker than repeating all submission steps. An excerpt of this file is shown here:
-{
     "amb_cool1": 10.0,
     "amb_cool2": 50.0,
     "amb_cool3": 50.0,
@@ -330,83 +402,82 @@ This interface allows us to modify many parameters that control the behaviour of
 ...
 
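Because the parameter file is plain JSON, it can also be edited with a short script instead of by hand. The sketch below is a minimal, hypothetical helper (`update_params` is not part of HADDOCK): it assumes the settings are stored as top-level keys, as in the `amb_cool1` excerpt above, and refuses to add keys that are not already present.

```python
import json
from pathlib import Path


def update_params(path, changes):
    """Load a HADDOCK parameter file, apply `changes`, and write it back.

    Hypothetical helper: only keys that already exist at the top level are
    modified, so a typo in a key name raises an error instead of silently
    adding a setting that HADDOCK would ignore.
    """
    params = json.loads(Path(path).read_text())
    for key, value in changes.items():
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        params[key] = value
    Path(path).write_text(json.dumps(params, indent=4))
    return params
```

For example, `update_params("job_params.json", {"amb_cool1": 20.0})` would change a single value before the file is uploaded through the File Upload Interface. The script only checks that the key exists; whether the new value is sensible is up to you.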
+The "input files" download is a tar archive that contains all the files HADDOCK will use during the run. For example, in this archive you will see 30 separate PDB files named _protein2_1.pdb_, _protein2_2.pdb_ ... _protein2_30.pdb_; these are the individual conformations extracted from the _hpr_ensemble.pdb_ file we uploaded earlier. Another example is _ambig.tbl_: this file contains the actual list of distance restraints generated from the active residues we selected earlier. Lastly, *job_params.json* is the "parameter file" discussed above. + * **Step 10:** Click on the "Submit" button at the bottom left of the interface. -Upon submission you will be presented with a web page which also contains a link to the previously mentioned haddockparameter file as well as some information about the status of the run. +##### Your job has been successfully processed! -
- -
+Upon submission you will be presented with a web page displaying the message "Your job has been successfully processed!". This page allows you to track the execution of the run and to download the "parameter file". -Currently your run should be queued but eventually its status will change to "Running": +At first your job will have the status "Processed", then "Queued"; eventually it will change to "Running" and you will see a progress bar moving through each stage.
- +
-The page will automatically refresh and the results will appear upon completions (which can take between 1/2 hour to several hours depending on the size of your system and the load of the server). You will be notified by email once your job has successfully completed. +This run will take between 30 minutes and several hours, depending on the load of the server. You will be notified by email once your job has completed. The results will remain accessible for a week. + +You do not have to keep this page open: all recent jobs can be accessed via the "[Workspace](https://wenmr.science.uu.nl/haddock2.4/workspace){:target="_blank"}" button in the navigation bar.
## Analysing the results -Once your run has completed you will be presented with a result page showing the cluster statistics and some graphical representation of the data (and if registered, you will also be notified by email). Such an example output page can be found [here](https://wenmr.science.uu.nl/haddock2.4/run/4242424242/195967-E2A-HPR){:target="_blank"} in case you don't want to wait for the results of your docking run. +Once your run has completed (you will also be notified by email), you will be presented with a result page showing the cluster statistics and graphical representations of the run. +An example output page for the E2A-HPR docking can be found [here](https://wenmr.science.uu.nl/haddock2.4/run/4242424242/195967-E2A-HPR){:target="_blank"}, in case you don't want to wait for the results of your docking run.
-Inspect the result page -How many clusters are generated? +Inspect the result page. How many clusters have been generated? -**Note:** The bottom of the page gives you some graphical representations of the results, showing the distribution of the solutions for various measures (HADDOCK score, van der Waals energy, ...) as a function of the Fraction of Common Contact with- and RMSD from the best generated model (the best scoring model). The graphs are interactive and you can turn on and off specific clusters, but also zoom in on specific areas of the plot. +For this run, 80% of the 200 models have been clustered, indicating that the run has converged. If only a small percentage of models has been clustered, this might indicate, among other things, insufficient sampling with respect to the number of input conformers, or that the restraints are too diverse. -The bottom graphs show you the distribution of scores (Evdw, Eelec and Edesol) for the various clusters. - 
- -
-The ranking of the clusters is based on the average score of the top 4 members of each cluster. The score is calculated as: +HADDOCK clusters are numbered according to the number of models they contain: the largest cluster is always labeled "Cluster 1", the second-largest "Cluster 2", and so on. +Clusters are then ranked by their average HADDOCK score (lower is better). As a result, it is not unusual to see, for example, "Cluster 3" ranked above "Cluster 2". +For each cluster, the average and standard deviation of the HADDOCK score and other associated metrics are reported. These statistics are calculated using only the four lowest-scoring models within each cluster. +The score for each model is calculated as:
-      HADDOCKscore = 1.0 * Evdw + 0.2 * Eelec + 1.0 * Edesol + 0.1 * Eair
+      HADDOCK_score = 1.0 * E_vdw + 0.2 * E_elec + 1.0 * E_desol + 0.1 * E_air,
 
-where Evdw is the intermolecular van der Waals energy, Eelec the intermolecular electrostatic energy, Edesol represents an empirical desolvation energy term adapted from Fernandez-Recio *et al.* J. Mol. Biol. 2004, and Eair the AIR energy. The cluster numbering reflects the size of the cluster, with cluster 1 being the most populated cluster. The various components of the HADDOCK score are also reported for each cluster on the results web page. +where *E_vdw* is the intermolecular van der Waals energy, *E_elec* is the intermolecular electrostatic energy, *E_desol* represents an empirical desolvation energy term adapted from Fernandez-Recio *et al.* J. Mol. Biol. 2004, and *E_air* is a penalty for violation of the restraints. -Consider the cluster scores and their standard deviations. -Is the top ranked cluster significantly better than the second one? (This is also reflected in the z-score). +Consider the cluster scores and their standard deviations. Is the top ranked cluster significantly better than the second one? (This is also reflected in the z-score). -In case the scores of various clusters are within standard devatiation from each other, all should be considered as a valid solution for the docking. Ideally, some additional independent experimental information should be available to decide on the best solution. In this case we do have such a piece of information: the phosphate transfer mechanism (see [Biological insights](#biological-insights) below). +At the bottom of this page you can find graphical representations of the results, showing the distribution of the solutions for HADDOCK score and its components as a function of the Fraction of Common Contact with- and RMSD from the best generated model (i.e. model with lowest HADDOCK score). The graphs are interactive and you can show/hide clusters, zoom in on specific areas of the plot, etc. +
+ +
-**Note:** The type of calculations performed by HADDOCK does have some chaotic nature, meaning that you will only get exactly the same results if you are running on the same hardware, operating system and using the same executable. The HADDOCK server makes use of [EGI](https://www.egi.eu)/[EOSC](https://www.eosc-hub.eu){:target="_blank"} high throughput computing (HTC) resources to distribute the jobs over a wide grid of computers worldwide. As such, your results might look slightly different from what is presented in the [example output page](https://wenmr.science.uu.nl/haddock2.4/run/4242424242/E2A-HPR){:target="_blank"}. That run was run on our local cluster. Small differences in scores are to be expected, but the overall picture should be consistent. +Can you locate the lowest-scoring model on one of the graphs? What is the ID of this model? +If the scores of several clusters are within one standard deviation of each other, all of these clusters should be considered valid solutions for the docking. Ideally, some additional independent experimental information should be available to decide on the best solution. In this case we do have such a piece of information: the phosphate transfer mechanism (see [Biological insights](#biological-insights) below). +**_Note_** that the calculations performed by HADDOCK are somewhat chaotic in nature, meaning that you will only get exactly the same results if you run on the same hardware and operating system using the same executable. The HADDOCK server makes use of [EGI](https://www.egi.eu)/[EOSC](https://www.eosc-hub.eu){:target="_blank"} high throughput computing (HTC) resources to distribute the jobs over a wide grid of computers worldwide. As such, your results might look slightly different from what is presented in the [example output page](https://wenmr.science.uu.nl/haddock2.4/run/4242424242/E2A-HPR){:target="_blank"}, which was run on our local cluster. 
Small differences in scores are to be expected, but the overall picture should be consistent.
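The scoring and ranking rules described in this section can be summarised in a few lines of Python. This is a minimal sketch, not the server's code: the weights come from the HADDOCK_score formula above, clusters are passed in as plain lists of per-model scores (a hypothetical data layout), the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Weights of the HADDOCK_score formula given in the text.
WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desol": 1.0, "air": 0.1}


def haddock_score(energies):
    """Weighted sum of the energy terms of a single model."""
    return sum(w * energies[term] for term, w in WEIGHTS.items())


def rank_clusters(clusters, top=4):
    """Name clusters by size, then rank them by the mean of their best scores.

    `clusters` is a list of clusters, each a list of per-model HADDOCK scores.
    Returns (name, mean, std) tuples ordered from best (lowest) to worst.
    """
    # The largest cluster becomes "Cluster 1", the next one "Cluster 2", ...
    by_size = sorted(clusters, key=len, reverse=True)
    stats = []
    for i, scores in enumerate(by_size, start=1):
        best = sorted(scores)[:top]  # the lowest scores are the best
        # Sample standard deviation (assumption; the server may differ).
        sd = stdev(best) if len(best) > 1 else 0.0
        stats.append((f"Cluster {i}", mean(best), sd))
    # A lower (more negative) average score ranks higher.
    return sorted(stats, key=lambda s: s[1])
```

With a five-model cluster whose best four models average -65 and a three-model cluster averaging -110, the smaller cluster is named "Cluster 2" but ranked first, which is the situation described above.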
## Visualisation -The new HADDOCK2.4 server integrates the NGL viewer which allows you to quickly visualize a specific structure. For that click on the "eye" icon next to a structure. +The HADDOCK server integrates the NGL viewer, which allows you to quickly visualize any of the clustered structures. To do so, click on the "eye" icon next to a structure. In order to compare the various clusters we will however download the models and inspect them using PyMol. - - -Download and save to disk the first model of each cluster (use the PDB format) +Download and save to disk the first model of each cluster (use the PDB format). To do so, look for the "download all cluster files" link just above the top-ranked cluster. Then start PyMOL and load each cluster representative: - File menu -> Open -> select cluster1_1.pdb -Repeat this for each cluster. Once all files have been loaded, type in the PyMOL command window: - +Repeat this for each cluster. Once all files have been loaded, type in the PyMOL command window: show cartoon
util.cbc
hide lines
-Let's then superimpose all models on chain A of the first cluster: - +You can display and hide a cluster by clicking on its name in the right panel of the PyMOL window. +Let's superimpose all models on chain A of the first cluster: select cluster1_1 and chain A
alignto sele
@@ -418,10 +489,8 @@ This will align all clusters on chain A (E2A), maximizing the differences in the Examine the various clusters. How does the orientation of HPR differ between them?
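Loading the representatives one by one through the File menu gets tedious when there are many clusters. As an alternative, a small Python script can write the same PyMOL commands to a `.pml` file, which PyMOL runs with `@load_clusters.pml` from its command line. This is a sketch: the function name, the output file name and the number of clusters are assumptions, and it expects the files to be named `cluster1_1.pdb`, `cluster2_1.pdb`, ... in the directory where PyMOL was started.

```python
def make_pml(n_clusters, path="load_clusters.pml"):
    """Write a PyMOL script that loads and superimposes cluster representatives.

    Hypothetical helper: assumes the representatives were saved as
    cluster1_1.pdb, cluster2_1.pdb, ... next to where PyMOL is started.
    Returns the list of commands that was written.
    """
    # One "load" per cluster; PyMOL names each object after its file.
    lines = [f"load cluster{i}_1.pdb" for i in range(1, n_clusters + 1)]
    # The same display and superposition commands used in the tutorial.
    lines += [
        "show cartoon",
        "util.cbc",
        "hide lines",
        "select cluster1_1 and chain A",
        "alignto sele",
    ]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines
```

Set `n_clusters` to the number of clusters reported on your result page.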
-**Note:** You can turn on and off a cluster by clicking on its name in the right panel of the PyMOL window. - -Let's now check if the active residues which we defined are actually part of the interface. In the PyMOL command window type: - +Let's now check if the active residues which we defined are actually part of the interface. +For this, we need to create selections of active residues for each molecule and colour them differently: select e2a_active, (resi 38,40,45,46,69,71,78,80,94,96,141) and chain A
select hpr_active, (resi 15,16,17,20,48,49,51,52,54,56) and chain B
@@ -429,25 +498,31 @@ color red, e2a_active
color orange, hpr_active
+You can display side chains of the active residues as lines to get a better view of their orientation: + +show lines, e2a_active and sidechain
+show lines, hpr_active and sidechain +
+ -Are the active residues in the interface? +Are the active residues in the interface? Is this the case for all clusters?
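The visual check can be complemented by a rough numerical one. The sketch below parses a cluster representative saved in PDB format and reports which active residues have at least one atom within a distance cutoff of the partner chain. The 5 Å cutoff and the helper names are assumptions for illustration, not the interface definition used by HADDOCK; the residue lists are the active residues defined earlier.

```python
import math

# Active residues defined in the submission steps.
E2A_ACTIVE = {38, 40, 45, 46, 69, 71, 78, 80, 94, 96, 141}
HPR_ACTIVE = {15, 16, 17, 20, 48, 49, 51, 52, 54, 56}


def read_atoms(pdb_text):
    """Parse ATOM/HETATM records into (chain, resnum, resname, xyz) tuples."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            resnum = int(line[22:26])
            resname = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, resnum, resname, xyz))
    return atoms


def interface_residues(atoms, chain, partner, cutoff=5.0):
    """Residue numbers of `chain` with any atom within `cutoff` Å of `partner`."""
    other = [a[3] for a in atoms if a[0] == partner]
    hits = set()
    for ch, resnum, _, xyz in atoms:
        if ch == chain and any(math.dist(xyz, o) <= cutoff for o in other):
            hits.add(resnum)
    return hits


def active_in_interface(atoms, chain, partner, active, cutoff=5.0):
    """Sorted list of `active` residues of `chain` found at the interface."""
    return sorted(set(active) & interface_residues(atoms, chain, partner, cutoff))
```

For example, after `atoms = read_atoms(open("cluster1_1.pdb").read())`, the call `active_in_interface(atoms, "A", "B", E2A_ACTIVE)` lists the E2A active residues at the interface of the first cluster representative; swap the chains and use `HPR_ACTIVE` for HPR.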
## Biological insights -The E2A-HPR complex is involved in phosphate-transfer, in which a phosphate group attached to histidine 90 of E2A (which we named NEP) is transferred to a histidine of HPR. As such, the docking models should make sense according to this information, meaning that two histidines should be in close proximity at the interface. Using PyMOL, check the various cluster representatives (we are assuming here you have performed all PyMOL commands of the previous section): - +The E2A-HPR complex is involved in phosphate transfer, in which a phosphate group is transferred from histidine 15 of HPR to histidine 90 of E2A. The docking models should therefore be consistent with this information, meaning that these two histidines should be in close proximity at the interface. Using PyMOL, check the various cluster representatives (assuming you have performed all PyMOL commands of the previous section): +hide lines
+util.cbc
select histidines, resn HIS+NEP
-show spheres, histidines
+show sticks, histidines
util.cnc
First of all, has the phosphate group been properly generated? -**Note:** You can zoom on the phosphorylated histidine using the following PyMOL command: - +Zoom on the phosphorylated histidine (called NEP in HADDOCK) using the following PyMOL command: zoom resn NEP
@@ -456,26 +531,21 @@ zoom resn NEP
-Zoom back to all visible molecules with - +Zoom back to all visible molecules with: zoom vis
Now inspect each cluster in turn and check if histidine 90 of E2A is in close proximity to another histidine of HPR. -To facilitate this analysis, view each cluster in turn (use the right panel to activate/desactivate the various clusters by clicking on their name). - -Based on this analysis, which cluster does satisfy best the biolocal information? -Is this cluster also the best ranked one? +Based on this analysis, which cluster best satisfies the biological information? Is this cluster also the top-ranked one?
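The same question can be answered with a distance. The sketch below assumes the chain and residue numbering of the HADDOCK models used above (E2A as chain A, HPR as chain B) and reports the shortest atom-atom distance between residue 90 of chain A and any histidine of chain B, matching the residue names HIS and NEP as in the PyMOL selection above. The function name is hypothetical, and no distance threshold is implied; compare the values between clusters.

```python
import math


def min_distance_to_his(pdb_text, chain="A", resnum=90, partner="B"):
    """Shortest atom-atom distance (Å) between one residue and any partner histidine.

    Histidines are recognised by residue name (HIS, or NEP for the
    phosphorylated form). Returns (distance, partner_residue_number).
    """
    target, his = [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        ch, num, name = line[21], int(line[22:26]), line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if ch == chain and num == resnum:
            target.append(xyz)
        elif ch == partner and name in ("HIS", "NEP"):
            his.append((num, xyz))
    if not target or not his:
        raise ValueError("residue or partner histidines not found")
    # Tuples compare by distance first, so min() returns the closest pair.
    return min((math.dist(t, h), num) for t in target for num, h in his)
```

Running it on each `clusterN_1.pdb` file gives one number per cluster that can be compared with what you see in PyMOL.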
## Comparison with the reference structure As explained in the introduction, the structure of the native complex has been determined by NMR (PDB ID [1GGR](https://www.ebi.ac.uk/pdbe/entry/pdb/1ggr){:target="_blank"}) using a combination of intermolecular NOEs and dipolar coupling restraints. We will now compare the docking models with this structure. -If you still have all cluster representative open in PyMOL you can proceed with the sub-sequent analysis, otherwise load again each cluster representative as described above. Then, fetch the reference complex by typing in PyMOL: - +If you still have all cluster representatives open in PyMOL, you can proceed with the following analysis; otherwise, load each cluster representative again as described above. Then, fetch the reference complex and colour its chains: fetch 1GGR
show cartoon
@@ -483,63 +553,48 @@ color yellow, 1GGR and chain A
color orange, 1GGR and chain B
-The number of chain B in this structure is however different from the HPR numbering in the structure we used: It starts at 301 while in our models chain B starts at 1. We can change the residue numbering easily in PyMol with the following command: - +The numbering of chain B in this structure is different from the HPR numbering in the structure we used: it starts at 301 while in our models chain B starts at 1. We can shift the residue numbering by 300 using the following command: alter (chain B and 1GGR), resv -=300
+This shift is critical for the RMSD calculation described below! -Then superimpose all cluster representatives on the reference structure, using the entire chain A (E2A): - +Let's superimpose all cluster representatives on chain A of the reference structure:
-alignto sele
+alignto 1GGR and chain A
-Does any of the cluster representatives ressemble the reference NMR structure? - - -In case you found a reasonable prediction, what is its cluster rank? +Does any of the cluster representatives resemble the reference NMR structure? If so, what is the rank of its cluster? -In the blind protein-protein prediction experiment [CAPRI](https://capri.ebi.ac.uk/){:target="_blank"} (Critical PRediction of Interactions), a measure of the quality of a model is the so-called ligand-RMSD (l-RMSD). It is calculated by fitting on the receptor chain (E2A or chain A in our case) and calculating the RMSD on the backbone of the ligand (HPR or chain B in our case). This can be done in PyMOL with the following command: - +A common metric for evaluating the similarity between two complexes is the ligand RMSD (lRMSD). It is calculated by superimposing the complexes on the receptor chain (E2A, chain A in our case) and calculating the RMSD on the backbone of the ligand (HPR, chain B in our case). This can be done in PyMOL with: -rms_cur cluster1_1 and chain B, 1GGR
+align 1GGR and chain A, cluster1_1 and chain A
+rms_cur cluster1_1 and chain B, 1GGR
-**Note:** On some machines the pymol rms_cur command can fail due to a bug in the PyMOL software. In this case you can use the following command instead: - - -align cluster1_1, 1GGR, cycles=0
-
- -This will align the two structures based on the all-atom RMSD, different from the ligand-RMSD (l-RMSD) that you can calculate with rms_cur and the above commands. - -In CAPRI, the l-RMSD value defines the quality of a model: - -* acceptable model: l-RMSD<10Å -* medium quality model: l-RMSD<5Å -* high quality model: l-RMSD<1Å +In the community-wide blind protein-protein prediction experiment [CAPRI](https://capri.ebi.ac.uk/){:target="_blank"} (Critical PRediction of Interactions), the following cutoff are used to define the quality of the model with respect to the native structure: +* acceptable model: lRMSD<10Å +* medium quality model: lRMSD<5Å +* high quality model: lRMSD<1Å -What is based on this CAPRI criterion the quality of the best model? +What is based on this CAPRI criterion the quality of the best model? Is it the same model that did fit biological insights best?
-## Congratulations! +## Congratulations! 🎉 You have completed this tutorial. If you have any questions or suggestions, feel free to contact us via email or asking a question through our [support center](https://ask.bioexcel.eu){:target="_blank"}.
-## Additional docking runs - -If you are curious and want learn more about HADDOCK and the impact of the input data on the docking results, consider performing and analysing, as described above, the following runs: +## Additional docking runs -* Same run as above, but without defining the phosphorylated histidine -* Same run as above, but using only the first model of the HPR ensemble (edit the file to extract it) +If you are curious and want to learn more about the impact of the input data on the docking results in HADDOCK, consider performing and analysing the following runs: +* E2A-HPR docking without defining the phosphorylated histidine; +* E2A-HPR docking using only the first model of the HPR ensemble; you can either open the ensemble in PyMOL and save only the first state, manually copy "MODEL 1" from the file using a text editor, or use [PDBTOOLS](https://wenmr.science.uu.nl/pdbtools/submit){:target="_blank"}. -And check also our [education](/education) web page where you will find more tutorials! +Don't hesitate to browse our [education](/education) page, where you will find more tutorials! 
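For the second additional run, extracting the first model can also be scripted. The sketch below keeps the records between the first `MODEL` and `ENDMDL` lines of a multi-model PDB file such as _hpr_ensemble.pdb_; the function name is hypothetical, and any header records outside the models are dropped.

```python
def first_model(pdb_text):
    """Return the records of the first MODEL of a multi-model PDB file.

    Keeps the lines between the first MODEL and its ENDMDL (without those
    two records) and appends END. Text without MODEL records is returned
    unchanged.
    """
    lines = pdb_text.splitlines()
    if not any(line.startswith("MODEL") for line in lines):
        return pdb_text
    out, inside = [], False
    for line in lines:
        if line.startswith("MODEL"):
            if inside:
                break  # defensive: a second MODEL without a preceding ENDMDL
            inside = True
        elif line.startswith("ENDMDL"):
            break
        elif inside:
            out.append(line)
    return "\n".join(out + ["END"]) + "\n"
```

The returned text can be written to a new file (for example `hpr_model1.pdb`, a hypothetical name) and uploaded instead of the ensemble.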
[link-pymol]: https://www.pymol.org/ "PyMOL" diff --git a/education/HADDOCK24/HADDOCK24-protein-protein-basic/phosphate-binds-histidine.png b/education/HADDOCK24/HADDOCK24-protein-protein-basic/phosphate-binds-histidine.png new file mode 100644 index 000000000..e5c61d3ce Binary files /dev/null and b/education/HADDOCK24/HADDOCK24-protein-protein-basic/phosphate-binds-histidine.png differ diff --git a/education/HADDOCK24/HADDOCK24-protein-protein-basic/running.png b/education/HADDOCK24/HADDOCK24-protein-protein-basic/running.png index a68c532ef..368f4898b 100644 Binary files a/education/HADDOCK24/HADDOCK24-protein-protein-basic/running.png and b/education/HADDOCK24/HADDOCK24-protein-protein-basic/running.png differ diff --git a/education/HADDOCK24/HADDOCK24-protein-protein-basic/submission.png b/education/HADDOCK24/HADDOCK24-protein-protein-basic/submission.png deleted file mode 100644 index 8af454047..000000000 Binary files a/education/HADDOCK24/HADDOCK24-protein-protein-basic/submission.png and /dev/null differ