In 1987, while investigating the mechanism for isozyme conversion of alkaline phosphatase, Ishino noticed short, regular, invariant repeats in the genome of E. coli, which were later to be called Clustered Regularly Interspaced Short Palindromic Repeats, or CRISPR. Six years later, Mojica noticed the same repeats in the genome of an archaeon called Haloferax mediterranei. (Archaea are procaryotes, like bacteria, and do not have organelles or a nucleus. They often live in extreme environments such as hot springs.) Over the subsequent years, CRISPRs were described in more archaea and bacteria. As these were mostly thermophilic or hyperthermophilic, it was thought that the CRISPR locus encoded genes that allowed these organisms to survive in hot environments.
However, in 2005, three separate papers pointed to a startling fact: the repeats in the CRISPR loci had spacer sequences between them. The nucleotide sequences of these spacers were identical to sequences found in certain bacteriophages, prophages and plasmids. Moreover, while these viruses and plasmids freely invaded many species of archaea and bacteria, the species which contained the matching CRISPR spacers were immune to them.
Thus the CRISPR and the contained spacers were thought to be responsible for providing immunity against phages and plasmids.
This was finally proven in 2007, when the lactic acid bacterium Streptococcus thermophilus was shown to acquire resistance to a phage once a sequence from that phage had been incorporated as a spacer within its CRISPR locus. This resistance disappeared when the corresponding sequence (called the protospacer when it is part of the viral genome) was removed from the phage.
The elucidation of genomic sequences at the turn of the century allowed CRISPR-containing organisms to be compared with others. It then became clear that there were several conserved genes (four initially) next to the CRISPR region. These were named CRISPR-associated genes, or Cas for short. Makarova later showed that the Cas genes coded for nucleases, helicases and integrases acting on DNA. It was then a short step to conclude that CRISPR and Cas worked together to give the host organism immunity against invading phages and plasmids. Thus the term CRISPR-Cas was coined.
It is thought that ancestors of the archaea captured viral or plasmid DNA during previous infections and stored it as spacers. When the same phage tries to invade the archaeon again, the spacer DNA in the CRISPR locus is transcribed into a complementary strand of RNA. This guide strand hybridises with the DNA of the invading phage, where the endonuclease among the Cas proteins, the best characterised being Cas9, makes a double-stranded break in the phage DNA (Figure 1).
Figure 1. CRISPR-Cas system
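The recognition step described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the sequences are invented, the guide is treated as a plain copy of the spacer with U in place of T, and "hybridisation" is reduced to an exact string match on either strand. The function names are hypothetical and do not come from any library.

```python
# Toy model of spacer -> guide RNA -> protospacer recognition.
# All sequences and function names are illustrative assumptions.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def transcribe_spacer(spacer: str) -> str:
    """Simplified transcription: same sequence as the spacer, with U for T."""
    return spacer.replace("T", "U")


def find_protospacers(guide_rna: str, phage_dna: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start) for every exact match of the guide.

    "+" is the strand as given, "-" is its reverse complement.
    Real hybridisation tolerates some mismatches; this sketch does not.
    """
    target = guide_rna.replace("U", "T")
    hits = []
    for strand, seq in (("+", phage_dna), ("-", reverse_complement(phage_dna))):
        start = seq.find(target)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(target, start + 1)
    return hits


# Hypothetical spacer stored in the host CRISPR locus.
spacer = "GATTACAGGCTTAACGTAGC"
guide = transcribe_spacer(spacer)

# A phage carrying the matching protospacer, and an unrelated phage.
known_phage = "TTTT" + spacer + "AGGCCC"
unknown_phage = "ACGTACGTACGTACGTACGTACGT"

print(find_protospacers(guide, known_phage))    # match at position 4, "+" strand
print(find_protospacers(guide, unknown_phage))  # no match, so no cleavage
```

Only the phage whose genome contains the stored sequence is recognised, which is the sense in which the spacers act as a memory of past infections.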
In this respect it resembles the adaptive immune system developed by eucaryotes. There is also a remarkable similarity between the CRISPR-Cas system and the RNA interference seen in eucaryotes. The latter occurs in two ways through non-coding RNA. Firstly, a single-stranded antisense oligonucleotide (ASO) can hybridise with target mRNA. The central portion of the ASO has a DNA-like configuration and, upon hybridisation with its target, activates an enzyme called RNase H, which fragments the mRNA. A second type of non-coding RNA exists in a double-stranded form. This dsRNA is digested by an enzyme called Dicer (an RNase III) into fragments of about 21 nucleotides called small interfering RNA (siRNA). Inside the cytoplasm, the passenger strand of the siRNA is peeled away by an enzyme called Argonaute-2 (Ago2), while the guide strand is loaded onto an assembly called the RNA-induced silencing complex (RISC). Inside the RISC, Ago2, which is functionally similar to the Cas proteins, cleaves the target mRNA once the siRNA has hybridised with it.
Micro RNA (miRNA) is another non-coding RNA, which works very similarly to siRNA. It is cut by Dicer from a hairpin-shaped pre-miRNA, which itself has been cut from a larger strand called primary miRNA by an enzyme called Drosha. The principal difference between siRNA and miRNA is that while siRNA has to hybridise completely with the target mRNA strand for Ago2 to cleave it, miRNA only partially hybridises with the target mRNA, in its non-coding 3' untranslated region. Thus, while siRNA is strictly specific for its cognate mRNA, miRNA can neutralise many different mRNAs because it requires only partial hybridisation. While siRNA acts by cleaving its target mRNA, miRNA in most cases acts by repressing or inactivating it, and thus does not rely on the cleavage activity of Ago2.
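The difference in targeting rules can be made concrete with a small Python sketch. It assumes, beyond what the text states, that "partial hybridisation" means pairing of the miRNA seed region (nucleotides 2 to 8 of the guide), and it reduces pairing to exact complementarity. The guide sequence, the mRNAs and the function names are illustrative only.

```python
# Toy comparison of siRNA-style and miRNA-style target recognition.
# Assumption: miRNA needs only its seed (guide nt 2-8) to pair; siRNA needs
# the full guide to pair. Sequences and names are illustrative.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with the given RNA."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def sirna_targets(guide: str, mrna: str) -> bool:
    """siRNA-style: the whole guide must pair with the mRNA."""
    return rna_reverse_complement(guide) in mrna


def mirna_targets(guide: str, utr3: str) -> bool:
    """miRNA-style: only the seed (guide nt 2-8) must pair, in the 3' UTR."""
    seed = guide[1:8]
    return rna_reverse_complement(seed) in utr3


guide = "UGAGGUAGUAGGUUGUAUAGUU"          # 22-nt guide strand
full_site = rna_reverse_complement(guide)  # perfectly complementary site
seed_site = rna_reverse_complement(guide[1:8])

mrna_perfect = "GGGAAA" + full_site + "AAAGGG"
mrna_seed_only = "GGGAAA" + seed_site + "AAAGGG"
mrna_unrelated = "GGGAAACCCAAAGGG"

for name, mrna in [("perfect", mrna_perfect),
                   ("seed only", mrna_seed_only),
                   ("unrelated", mrna_unrelated)]:
    print(name, sirna_targets(guide, mrna), mirna_targets(guide, mrna))
```

The seed-only message is invisible to the siRNA rule but is still caught by the miRNA rule, which is why one miRNA can act on many different mRNAs.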
Back to the CRISPR-Cas system. As the CRISPR-Cas system is present in 90% of archaea and around half of bacteria, it is believed that the system evolved in archaea and moved horizontally into bacteria.
Why do the Cas enzymes not destroy the spacer sequence in the host procaryote? This is prevented by the requirement for a protospacer adjacent motif (PAM), a short sequence situated next to the protospacer in the viral genome but absent from the corresponding position beside the spacer in the host CRISPR locus. The PAM is recognised by the Cas protein itself. Without recognition of the PAM, the Cas complex will not cleave the matching sequence.
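A minimal sketch of this self versus non-self check follows. It assumes the NGG motif used by Cas9 from Streptococcus pyogenes, read on the strand shown and immediately 3' of the protospacer; the text above does not specify a motif. The spacer, repeat and phage sequences are invented for illustration, and the function name is hypothetical.

```python
import re

# Toy PAM check. Assumption: an "NGG" PAM immediately 3' of the protospacer
# (the motif used by S. pyogenes Cas9). Sequences are illustrative.


def cleavable_sites(spacer: str, dna: str, pam: str = r"[ACGT]GG") -> list[int]:
    """Return start positions where the spacer sequence is followed by a PAM."""
    sites = []
    start = dna.find(spacer)
    while start != -1:
        end = start + len(spacer)
        flank = dna[end:end + 3]          # the three bases next to the match
        if re.fullmatch(pam, flank):      # a short or empty flank never matches
            sites.append(start)
        start = dna.find(spacer, start + 1)
    return sites


spacer = "GATTACAGGCTTAACGTAGC"
repeat = "GTTTTAGAGCTATGCTGTTTTG"   # illustrative CRISPR repeat

# In the phage the protospacer is followed by AGG, which matches NGG.
phage_dna = "TTTT" + spacer + "AGG" + "CCC"
# In the host locus the spacer is followed by a repeat (GTT...), not a PAM.
host_locus = repeat + spacer + repeat

print(cleavable_sites(spacer, phage_dna))   # the phage copy is a valid target
print(cleavable_sites(spacer, host_locus))  # the host copy is spared
```

The same 20 bases are present in both molecules; only the flanking motif decides whether they are cut.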
Two classes of Cas enzymes are recognised: Class I and Class II. Class I Cas enzymes are multicomponent, with multiple Cas proteins, while Class II Cas enzymes have a single Cas protein. Class I is further subdivided into Groups 1, 3 and 4, while Class II comprises Groups 2, 5 and 6 (types I to VI in the standard nomenclature).
Class II Cas enzymes are particularly useful for genome editing. The best characterised Cas enzyme, Cas9, is a member of Group 2, while Cas12a and Cas12b belong to Group 5 and Cas13a and Cas13b are in Group 6.
The CRISPR-Cas9 complex contains the guide RNA. The spacer sequence is first transcribed into a precursor called pre-CRISPR RNA or pre-crRNA, which is then processed into the mature guide RNA, called crRNA. The CRISPR-Cas9 complex contains a second RNA in addition to crRNA, called the trans-activating CRISPR RNA or tracrRNA. The latter is required for Cas9 to be assembled into the complex, but is not required by all Cas enzymes.
For genome editing, the crRNA and tracrRNA are usually fused into a single molecule, called the single guide RNA or sgRNA.
While Cas9 cleaves only dsDNA, Cas12a cleaves both dsDNA and ssDNA, as well as RNA, and Cas13a cleaves only RNA.
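The grouping and substrate preferences given above can be collected into a small lookup table. The sketch below records only what the text states for Cas9, Cas12a and Cas13a; the class name, field names and helper function are hypothetical.

```python
from dataclasses import dataclass

# Lookup table built only from the statements in the text above.
# The names CasEnzyme, CAS_TABLE and enzymes_for are illustrative.


@dataclass(frozen=True)
class CasEnzyme:
    name: str
    cas_class: int          # 1 = multicomponent, 2 = single Cas protein
    group: int              # group number as used in the text
    substrates: frozenset   # nucleic acids the enzyme cleaves


CAS_TABLE = {
    enzyme.name: enzyme
    for enzyme in (
        CasEnzyme("Cas9", 2, 2, frozenset({"dsDNA"})),
        CasEnzyme("Cas12a", 2, 5, frozenset({"dsDNA", "ssDNA", "RNA"})),
        CasEnzyme("Cas13a", 2, 6, frozenset({"RNA"})),
    )
}


def enzymes_for(substrate: str) -> list[str]:
    """Return the names of enzymes in the table that cleave the substrate."""
    return sorted(name for name, enzyme in CAS_TABLE.items()
                  if substrate in enzyme.substrates)


print(enzymes_for("dsDNA"))
print(enzymes_for("RNA"))
```

A table like this makes it easy to see why Cas9 is the usual choice for editing genomic DNA, while an RNA target calls for Cas13a.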
The CRISPR-Cas9 system has become the predominant tool for genome editing due to its simplicity of use and specificity. Genome editing exploits the principle that the cell will repair a break in DNA through one of two mechanisms: either non-homologous end joining (NHEJ) or homology-directed repair (HDR). In NHEJ, the cell simply seals the break in DNA by joining the two broken ends. This results in a high likelihood of insertions and deletions (indels). In HDR, a homologous template strand is used to repair the break, resulting in a higher degree of fidelity. These two methods can therefore be exploited in different ways in gene editing.
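The contrast between the two repair routes can be illustrated with a toy simulation. Everything numeric here is an assumption made for the sketch: the indel probability, the maximum indel size of three bases and the six-base homology arms are arbitrary, and real repair is far more complex. The function names are hypothetical.

```python
import random

# Toy simulation of repair after a double-stranded break.
# indel_prob, max_indel and arm are arbitrary illustrative values.


def repair_nhej(left: str, right: str, rng: random.Random,
                indel_prob: float = 0.7, max_indel: int = 3) -> str:
    """Rejoin the two ends, sometimes losing or gaining bases at the junction."""
    if rng.random() >= indel_prob:
        return left + right                       # clean re-ligation
    size = rng.randint(1, max_indel)
    if rng.random() < 0.5:
        return left[:-size] + right               # deletion
    inserted = "".join(rng.choice("ACGT") for _ in range(size))
    return left + inserted + right                # insertion


def repair_hdr(left: str, right: str, template: str, arm: int = 6) -> str:
    """Copy whatever the template holds between the two homology arms."""
    left_arm, right_arm = left[-arm:], right[:arm]
    i = template.find(left_arm)
    if i == -1:
        raise ValueError("template lacks the left homology arm")
    j = template.find(right_arm, i + arm)
    if j == -1:
        raise ValueError("template lacks the right homology arm")
    return left + template[i + arm:j] + right


gene = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
cut_site = 18
left, right = gene[:cut_site], gene[cut_site:]

rng = random.Random(0)
knocked_out = repair_nhej(left, right, rng, indel_prob=1.0)

# Donor template: the two homology arms flanking a new three-base insert.
donor = left[-6:] + "TTT" + right[:6]
edited = repair_hdr(left, right, donor)

print(len(gene), len(knocked_out))  # lengths differ: the junction carries an indel
print(edited == left + "TTT" + right)
```

NHEJ gives an unpredictable change at the junction, which suits knocking a gene out, whereas HDR writes in exactly what the template supplies.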
When the aim of gene editing is to knock out a defective or disease-causing gene, NHEJ repair is exploited. This is applied, for example, in subjects with HIV, where the receptor CCR5 on T cells, which serves as an entry portal for HIV, is knocked out with the CRISPR-Cas9 tool.
On the other hand, when the purpose of gene editing is to replace a vital missing gene, say that encoding the common gamma chain in X-linked SCID, HDR is more suitable. Here, CRISPR-Cas9 is used to make the double-stranded break, but in addition, a replacement template containing the missing gene is introduced into the cell by electroporation. When the cell uses HDR to repair the break, the template containing the missing gene slots into place. This is more difficult to perform than knocking out genes through NHEJ, and as such has a lower success rate.
Of course, CRISPR-Cas9 is not without its challenges and drawbacks. For one, the enzyme Cas9 is derived from Streptococcus pyogenes, and most human beings have antibodies to it. However, as the enzyme is used in a single hit-and-run dose, this is not insurmountable. A second problem is that the innate immune system may be stimulated by the crRNA and tracrRNA in CRISPR to produce interferons through TLR stimulation. Synthetic RNA tends to be less stimulatory, however.
A third problem is off-target DNA destruction by Cas9, or sustained on-target action beyond that required. Some ingenious solutions have been suggested to get around this problem. For example, an anti-oligonucleotide oligonucleotide can be injected to neutralise the sgRNA after it has served its purpose. This only works if a further oligonucleotide targeting the PAM sequence is included in addition to the one targeting the stem-loop sequence contained within the guide RNA.
Gene editing can occur ex vivo or in vivo. In vivo administration requires carefully planned delivery to ensure that the RNA reaches its target site. For example, the liver takes up oligonucleotides tagged with N-acetylgalactosamine (GalNAc) through its asialoglycoprotein receptors. Thus CRISPR-Cas9 primed to knock out the PCSK9 gene can be targeted to the liver to lower cholesterol. CAR-T therapy, designed to treat aggressive leukaemias, is a successful application of ex vivo gene editing, where CD52 and TCR alpha are knocked out to make the infused T cells resistant to alemtuzumab, which is administered to ablate the host T cells.