Non-Canonical Nucleobases

MDNA supports non-canonical and synthetic nucleobases beyond the standard A, T, G, C alphabet: uracil, two fluorescent analogues, one hydrophobic unnatural base pair, and the four additional Hachimoji bases. This page describes each supported base and how to use it.

Complementary Base Pairing Map

complementary_map = {
    'A': 'T',  'T': 'A',  'G': 'C',  'C': 'G',   # Canonical
    'U': 'A',                                        # RNA
    'E': 'T',  'D': 'G',                            # Fluorescent
    'L': 'M',  'M': 'L',                            # Hydrophobic UBP
    'B': 'S',  'S': 'B',  'Z': 'P',  'P': 'Z'      # Hachimoji
}
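
As a minimal sketch of how such a map can be applied, the snippet below derives the complement and reverse complement of a strand in plain Python. The helper names `complement` and `reverse_complement` are illustrative and are not part of the MDNA API; the dictionary is copied from the map above. Because U, E and D have no reverse entry, complementing twice does not always return the original sequence.

```python
# Pairing map copied from the documentation above.
complementary_map = {
    'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',   # Canonical
    'U': 'A',                                 # RNA
    'E': 'T', 'D': 'G',                       # Fluorescent
    'L': 'M', 'M': 'L',                       # Hydrophobic UBP
    'B': 'S', 'S': 'B', 'Z': 'P', 'P': 'Z',   # Hachimoji
}


def complement(sequence: str) -> str:
    """Return the partner base for every position, in the same order."""
    try:
        return ''.join(complementary_map[base] for base in sequence)
    except KeyError as err:
        raise ValueError(f"Unsupported base code: {err.args[0]!r}") from None


def reverse_complement(sequence: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(sequence)[::-1]


print(complement('ATBSCGPZ'))          # TASBGCZP
print(reverse_complement('ATBSCGPZ'))  # PZCGBSAT
print(complement(complement('E')))     # A  (E -> T -> A, not back to E)
```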

Base Categories

Canonical Bases

Code	Name	Pairs With
A	Adenine	T
T	Thymine	A
G	Guanine	C
C	Cytosine	G

RNA Incorporation

Code	Name	Pairs With	Reference
U	Uracil	A	—

U represents a uracil (RNA) nucleotide incorporated into a DNA duplex. The mapping is one-directional: U pairs with A, while A still maps to T.

Fluorescent Bases

Code	Name	Pairs With	Reference
E	2-Aminopurine (2AP)	T	Ward et al., 1969
D	tC (tricyclic cytosine)	G	Wilhelmsson et al., 2003

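2-Aminopurine is widely used as a fluorescent probe because its emission is strongly quenched by stacking with neighbouring bases, which makes it sensitive to the local environment. The tricyclic cytosine analogue (tC) is known for a fluorescence quantum yield that stays high and depends only weakly on the surrounding sequence.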

Hydrophobic Unnatural Base Pairs

Code	Name	Pairs With	Reference
L	d5SICS	M	Malyshev et al., 2014
M	dNaM	L	Malyshev et al., 2014

d5SICS and dNaM pair through hydrophobic and packing interactions rather than hydrogen bonds, showing that a base pair can be retained in a duplex without Watson-Crick hydrogen bonding.

Hachimoji Bases

Code	Name	Pairs With	Reference
B	A-analogue (isoG)	S	Hoshika et al., 2019
S	T-analogue (isoC)	B	Hoshika et al., 2019
Z	C-analogue	P	Hoshika et al., 2019
P	G-analogue	Z	Hoshika et al., 2019

The Hachimoji ("eight-letter") DNA system extends the genetic alphabet from 4 to 8 bases, forming two new orthogonal pairs (B–S and P–Z).

Usage

Constructing with Non-Canonical Bases

You can include non-canonical bases directly in the sequence:

import mdna

# DNA with hachimoji bases
dna = mdna.make(sequence='ATBSCGPZ')
dna.describe()
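
Before building a structure it can help to check that every character in the sequence is one of the codes listed on this page. The following is a small sketch of such a check in plain Python. `SUPPORTED_BASES` and `unsupported_bases` are hypothetical names introduced here, not MDNA functions, and the check is case-sensitive by assumption.

```python
# Codes taken from the tables above, grouped by category.
SUPPORTED_BASES = (
    set('ATGC')    # canonical
    | set('U')     # RNA
    | set('ED')    # fluorescent
    | set('LM')    # hydrophobic UBP
    | set('BSZP')  # hachimoji
)


def unsupported_bases(sequence: str) -> dict:
    """Map each offending position to its character (empty dict if valid)."""
    return {
        index: base
        for index, base in enumerate(sequence)
        if base not in SUPPORTED_BASES
    }


print(unsupported_bases('ATBSCGPZ'))  # {}
print(unsupported_bases('ATXSCGN'))   # {2: 'X', 6: 'N'}
```

An empty result means the sequence only contains codes documented here.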

Mutating to Non-Canonical Bases

dna = mdna.make(sequence='AGCGATATAGA')

# Introduce fluorescent base at position 0 and hydrophobic pair at position 5
dna.mutate({0: 'E', 5: 'L'}, complementary=True)
dna.describe()
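
To see what this mutation means at the sequence level, the sketch below applies the same mutation dictionary to a plain string and looks up the partner base for each position. It assumes that `complementary=True` replaces the partner on the opposite strand with the mapped base. That reading is inferred from the parameter name and the pairing map, not from the MDNA implementation, and `mutate_sequence` is a hypothetical helper.

```python
# Pairing map copied from the top of this page.
complementary_map = {
    'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',
    'U': 'A', 'E': 'T', 'D': 'G',
    'L': 'M', 'M': 'L',
    'B': 'S', 'S': 'B', 'Z': 'P', 'P': 'Z',
}


def mutate_sequence(sequence: str, mutations: dict) -> str:
    """Return a copy of `sequence` with {index: new_base} substitutions applied."""
    bases = list(sequence)
    for index, new_base in mutations.items():
        if not 0 <= index < len(bases):
            raise IndexError(f"Position {index} is outside the sequence")
        bases[index] = new_base
    return ''.join(bases)


sense = mutate_sequence('AGCGATATAGA', {0: 'E', 5: 'L'})
# Partner bases listed position by position (not reversed).
partners = ''.join(complementary_map[base] for base in sense)

print(sense)     # EGCGALATAGA
print(partners)  # TCGCTMTATCT
```

Position 0 keeps its T partner because 2-aminopurine (E) pairs with T, while position 5 gains an M partner opposite the new L.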

Inspecting the Complementary Map

# The Nucleic object carries the pairing map
print(dna.base_pair_map)
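
One thing worth inspecting is which entries are not symmetric. The sketch below uses a literal copy of the map as a stand-in for `dna.base_pair_map`, assuming that attribute is an ordinary dictionary with the contents shown at the top of this page. `one_way_entries` is an illustrative helper, not an MDNA function.

```python
# Stand-in for dna.base_pair_map (assumed to be a plain dict).
base_pair_map = {
    'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',
    'U': 'A', 'E': 'T', 'D': 'G',
    'L': 'M', 'M': 'L',
    'B': 'S', 'S': 'B', 'Z': 'P', 'P': 'Z',
}


def one_way_entries(pair_map: dict) -> dict:
    """Return entries whose partner does not map back to the original base."""
    return {
        base: partner
        for base, partner in pair_map.items()
        if pair_map.get(partner) != base
    }


print(one_way_entries(base_pair_map))  # {'U': 'A', 'E': 'T', 'D': 'G'}
```

U, E and D are substitutes for a canonical base, so their canonical partners keep pointing at the original base.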

Structural Notes

All non-canonical bases use reference geometries stored as HDF5 files in mdna/atomic/bases/. Reference frames follow the Tsukuba convention; the glycosidic atom and the ring atom used to orient the frame vary by base type:

Purines (A, G, E, B, P): N9–C4 convention
Pyrimidines (C, T, U, D): N1–C2 convention
Hachimoji pyrimidines (S, Z): C1–C2 convention
Hydrophobic (L): N1–C5 convention
Hydrophobic (M): C1–C6 convention

These conventions ensure correct frame placement and base pair geometry for all supported nucleotides.
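
The list above can be written as a simple lookup table, which is convenient when checking which atom pair applies to a given code. This is only a sketch built from the list on this page: `FRAME_ATOMS` is a hypothetical name and does not reflect how MDNA stores the conventions internally.

```python
# (codes, (glycosidic atom, ring atom)) as listed in the Structural Notes.
CONVENTIONS = [
    ('AGEBP', ('N9', 'C4')),  # purines
    ('CTUD',  ('N1', 'C2')),  # pyrimidines
    ('SZ',    ('C1', 'C2')),  # hachimoji pyrimidines
    ('L',     ('N1', 'C5')),  # hydrophobic d5SICS
    ('M',     ('C1', 'C6')),  # hydrophobic dNaM
]

# Expand the grouped entries into one record per base code.
FRAME_ATOMS = {
    code: atoms
    for codes, atoms in CONVENTIONS
    for code in codes
}

print(FRAME_ATOMS['P'])  # ('N9', 'C4')
print(FRAME_ATOMS['S'])  # ('C1', 'C2')
print(len(FRAME_ATOMS))  # 13
```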