Connecting Two DNA Strands¶

This tutorial demonstrates how to identify and connect two separate DNA strands from a loaded structure, and integrate the connected DNA with protein components. This process is essential for creating comprehensive models for visualization and analysis.

Steps Covered:¶

Load and identify DNA strands in a structure containing proteins.
Calculate and visualize control points for connecting DNA strands.
Connect the DNA strands and integrate the result with the protein structure.
Visualize and save the new combined structure.

Loading the Structure and Identifying DNA Strands¶

Load a PDB file containing both DNA and protein components and identify DNA strands based on residue names.

In [1]:

Copied!





import mdtraj as md
import numpy as np
import matplotlib.pyplot as plt
import nglview as nv
import mdna 


# Load the structure
traj = md.load('./pdbs/8srp.pdb')
protein = traj.atom_slice(traj.top.select('protein'))
view = nv.show_mdtraj(traj)

view
import mdtraj as md
import numpy as np
import matplotlib.pyplot as plt
import nglview as nv
import mdna 


# Load the structure
traj = md.load('./pdbs/8srp.pdb')
protein = traj.atom_slice(traj.top.select('protein'))
view = nv.show_mdtraj(traj)

view

/Users/thor/surfdrive/Projects/mdna/mdna/PMCpy/pmcpy/Evals/PyLk/pylk/writhemap.py:16: UserWarning: Cython version of writhemap (PyLk) not compiled. Defaulting to numba implementation. Consider compiling the cython version.
  warnings.warn(
/Users/thor/surfdrive/Projects/mdna/mdna/PMCpy/pmcpy/Evals/PyLk/pylk/eval_link.py:10: UserWarning: Cython version of linkingnumber (PyLk) not compiled. Defaulting to numba implementation. Consider compiling the cython version.
  warnings.warn(

joblib is not installed. Falling back to sequential computation.

/Users/thor/miniforge3/envs/mdna/lib/python3.12/site-packages/mdtraj/formats/pdb/pdbfile.py:200: UserWarning: Unlikely unit cell vectors detected in PDB file likely resulting from a dummy CRYST1 record. Discarding unit cell vectors.
  warnings.warn('Unlikely unit cell vectors detected in PDB file likely '

NGLWidget()

Identifying DNA Chains¶

Extract the indices of DNA chains based on common residue names for nucleotides.

In [9]:

Copied!





DNA_residue_names = ['DG','DC','DT','DA']
DNA_chainids = []
for chain in traj.top.chains:
    for res in chain._residues:
         if str(res.name) in DNA_residue_names:
            DNA_chainids.append(res.chain.index)
DNA_chainids = np.unique(DNA_chainids)
DNA_chainids = np.array([DNA_chainids[i:i + 2] for i in range(0, len(DNA_chainids), 2)])
DNA_residue_names = ['DG','DC','DT','DA']
DNA_chainids = []
for chain in traj.top.chains:
    for res in chain._residues:
         if str(res.name) in DNA_residue_names:
            DNA_chainids.append(res.chain.index)
DNA_chainids = np.unique(DNA_chainids)
DNA_chainids = np.array([DNA_chainids[i:i + 2] for i in range(0, len(DNA_chainids), 2)])

Connecting DNA Strands¶

Load the DNA strands, calculate control points for connection, and visualize the initial and connected states.

In [10]:

Copied!





dna_a = mdna.load(traj=traj, chainids=DNA_chainids[0])
dna_b = mdna.load(traj=traj, chainids=DNA_chainids[1])

# Get frames to calculate control points
frames_a = dna_a.get_frames()
frames_b = dna_b.get_frames()
start = np.squeeze(frames_a[-1])[0]
end = np.squeeze(frames_b[0])[0]

# Calculate incremental positions for smoother curves
start_increment = start + np.array([0.5, 0, 3])
end_increment = end + np.array([-0.5, 0, 3])
center_of_mass = (start + end) / 2
center_of_mass += np.array([0, 0, 1]) * 20
control_points = np.array([start, start_increment, center_of_mass, end_increment, end])

# Draw initial strands and control points
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
dna_a.draw(ax=ax, triads=True, color_lead='blue')
dna_b.draw(ax=ax, triads=True, color_lead='red')
ax.plot(control_points[:,0], control_points[:,1], control_points[:,2], marker='o', color='g', linestyle='-', linewidth=2, markersize=12)
dna_a = mdna.load(traj=traj, chainids=DNA_chainids[0])
dna_b = mdna.load(traj=traj, chainids=DNA_chainids[1])

# Get frames to calculate control points
frames_a = dna_a.get_frames()
frames_b = dna_b.get_frames()
start = np.squeeze(frames_a[-1])[0]
end = np.squeeze(frames_b[0])[0]

# Calculate incremental positions for smoother curves
start_increment = start + np.array([0.5, 0, 3])
end_increment = end + np.array([-0.5, 0, 3])
center_of_mass = (start + end) / 2
center_of_mass += np.array([0, 0, 1]) * 20
control_points = np.array([start, start_increment, center_of_mass, end_increment, end])

# Draw initial strands and control points
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
dna_a.draw(ax=ax, triads=True, color_lead='blue')
dna_b.draw(ax=ax, triads=True, color_lead='red')
ax.plot(control_points[:,0], control_points[:,1], control_points[:,2], marker='o', color='g', linestyle='-', linewidth=2, markersize=12)

Out[10]:

[<mpl_toolkits.mplot3d.art3d.Line3D at 0x28efbc740>]

No description has been provided for this image

Now the control point shave been defined we can make the loop.

In [11]:

Copied!

# Connect the strands
dna_c = mdna.connect(dna_a, dna_b, control_points=control_points)
dna_c.draw(ax=ax)
# Connect the strands
dna_c = mdna.connect(dna_a, dna_b, control_points=control_points)
dna_c.draw(ax=ax)

Optimal BP: 26, Twist Difference per BP: 0.176 degrees
Optimal number of base pairs: 26
Random sequence: TGTTCTGAGCCACGTTCCCTCGACCGGACTTTAAGTGGGTCAGTCTTTTATAACCTGAGGTAAGAAGTAGAGCGTACTAGTGATGTAAGTGCAAACGAGCGTGGGGGAGACATATCGCACG 

Minimize the DNA structure:
simple equilibration = False 
equilibrate writhe = False 
excluded volume radius = 0.0 
temperature = 300
Circular: False

Visualization and Saving the Connected Structure¶

Combine the connected DNA with the protein structure and visualize the complete assembly.

In [12]:

Copied!





# Combine DNA and protein trajectories
dna_traj = dna_c.get_traj()
connected_traj = protein.stack(dna_traj)
view = nv.show_mdtraj(connected_traj)
view
# Combine DNA and protein trajectories
dna_traj = dna_c.get_traj()
connected_traj = protein.stack(dna_traj)
view = nv.show_mdtraj(connected_traj)
view

NGLWidget()

In [13]:

Copied!

# Save the combined structure
connected_traj.save('./pdbs/8srp_connected.pdb')
# Save the combined structure
connected_traj.save('./pdbs/8srp_connected.pdb')