Science Blog: April 2020

Wednesday, 29 April 2020

The Non-coding RNAs of Eucaryotes

Messenger RNA, tRNA and rRNA are not the only RNA species present in cells. There are several other non-coding RNAs (they do not code for protein) in the cell- small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), heterogeneous nuclear RNA (hnRNA), micro RNA (miRNA), small interfering RNA (siRNA), telomerase RNA and the RNA of the signal recognition particle (SRP). It is important to appreciate that noncoding RNAs are also transcribed from DNA- but unlike mRNA, they are not translated into protein. That is to say, not all genes lead to protein as the end product- for some, the end product is RNA.

Some of the noncoding RNA are produced from introns (as below), which might seem non-intuitive as the generally held view is that introns do not code for anything.

The eucaryotic pre-mRNA is capable of being alternatively spliced into different mRNAs and can therefore produce different proteins from the same DNA code. This happens in the "spliceosome". Unusually, the splicing reaction is not catalysed by proteins, but by non-coding RNA, called small nuclear RNA (snRNA). A similar situation is seen during protein synthesis, where rRNA, which forms 2/3 of the ribosome (the other third is protein), catalyses the peptide bond formation required to lengthen the polypeptide chain. Such catalytic RNAs, which act like enzymes, are called ribozymes. It's thought that they hark back to very early evolution, when RNA was the main catalyst for living cells rather than protein.

There are 5 types of spliceosomal snRNA- U1, U2, U4, U5 and U6. U3 is not among them, as it is a small nucleolar RNA. These associate with proteins, and together, the complex is called a small nuclear ribonucleoprotein (snRNP). Around 90% of multiexonic mRNA in humans is subject to alternative splicing. The most common form of alternative splicing in humans is exon-skipping, followed by intron retention.
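The two commonest forms of alternative splicing can be pictured with a toy example. This is only a sketch: the segment names and sequences are invented for illustration, and the function `splice` is hypothetical rather than a model of the real spliceosome.

```python
# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
# All sequences are invented purely for illustration.
PRE_MRNA = [
    ("exon1", "AUGGCU"),
    ("intron1", "guaaguuuucag"),
    ("exon2", "GAUCCA"),
    ("intron2", "guaugucuuuag"),
    ("exon3", "UGGUAA"),
]


def splice(segments, skip=(), retain=()):
    """Join segments into a mature mRNA.

    skip   -- names of exons to leave out (exon skipping)
    retain -- names of introns to keep (intron retention)
    """
    parts = []
    for name, seq in segments:
        is_exon = name.startswith("exon")
        if is_exon and name not in skip:
            parts.append(seq)
        elif not is_exon and name in retain:
            parts.append(seq)
    return "".join(parts)


constitutive = splice(PRE_MRNA)                         # all exons, no introns
exon_skipped = splice(PRE_MRNA, skip={"exon2"})         # exon 2 left out
intron_retained = splice(PRE_MRNA, retain={"intron1"})  # intron 1 kept
```

The same pre-mRNA thus yields three different transcripts, which is the sense in which one gene can give rise to more than one protein.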

Bacteria have a single RNA polymerase, and do not have alternative splicing. There are 3 RNA Polymerases in eucaryotes- I, II, and III. RNA Polymerase I transcribes most rRNA- 18S, 28S, and 5.8S. RNA Polymerase II transcribes all protein coding RNA- i.e. the messenger RNA. RNA Polymerase III transcribes tRNA, and also 5S rRNA. The S in rRNA stands for the Svedberg unit, a measure of the rate of sedimentation in an ultracentrifuge. The larger the S value, the larger the rRNA.

Messenger RNA forms only 3-5% of the total RNA in the cell. Fully 80% of cellular RNA is rRNA. Thus, human beings contain some 200 rRNA genes per haploid genome, which are together responsible for making 10 million copies of each type of rRNA (28S, 5.8S, 18S and 5S) to constitute 10 million ribosomes.

The 200 rRNA coding genes are present in just 5 chromosomes- 13, 14, 15, 21 and 22. In all 5 chromosomes, these genes are located right at the very tip (end) of the short arms. 28S, 18S and 5.8S rRNA are all made initially as part of a larger 45S pre-rRNA, which is then cleaved into the individual rRNAs. 5S rRNA is made separately. All but 18S rRNA contribute to the large 60S ribosomal subunit. The smaller 40S subunit contains 18S rRNA.

Many small nucleolar RNAs are encoded in the introns of other genes, mainly those for ribosomal proteins. They are synthesised by RNA Polymerase II and processed from excised intron sequences.

The nucleolus is the cellular site for manufacture and processing of all noncoding RNA, and additionally carries the genes for tRNA. The 5 pairs of chromosomes mentioned above each make a portion of the nucleolus, which then fuse together to form one large nucleolus.

Antibodies to RNA Polymerase III in Systemic Sclerosis are associated with a higher risk of cancer.

Monday, 20 April 2020

Insights From Human Genome Sequencing

It's two decades since the human genome was sequenced. What it revealed has changed our understanding of the human genome and allowed us to construct a phylogenetic tree of how we got here.

The human genome, like most other mammalian genomes, comprises around 3.2 billion base pairs. There are around 25000 known genes. Most mammals have a similar number of DNA base pairs, the chicken has around a billion, while the Japanese pufferfish, Takifugu rubripes, is an outlier with only 400 million base pairs.

Only 5% of the human DNA is transcribed, i.e. read into mRNA, and only 1.5% is translated into protein from exons. Vast tracts of DNA therefore have no apparent function. It is in this context that the pufferfish's remarkable genomic efficiency must be viewed, as it appears to have rid itself of most of its "junk" DNA through evolution.

The need to preserve and keep hold of essential DNA leads to the remarkable similarities seen between diverse organisms such as the yeast, a worm (Caenorhabditis elegans), a model plant (Arabidopsis thaliana) and the mammalian genome.

However, the "excess (noncoding) DNA" is not without its uses. It provides a window on millions of years of evolution. By comparing the genomic sequences of chicken and Homo sapiens, for example, one can tell that these two organisms diverged from their common ancestor 300 million years ago. A similar exercise tells you that we humans diverged from our nearest anthropoid relative, the Chimpanzee, a mere 7 million years ago. We are slightly more distantly related to the Gorilla, and even more distantly to the Orangutan.

So what accounts for the remarkable redundancy of the human (and mammalian) genome? Most such redundancy can be explained by "repeats", which account for over 50% of the >3 billion base pairs. There are several types of such repeats, namely transposons, simple sequence repeats and segmental duplications. We'll discuss each in turn.

By sheer volume alone, the most abundant of these repeats are the transposons, forming fully 44% of the human genome. Barbara McClintock was awarded the Nobel Prize in 1983 for her discovery of transposons, popularly known as "jumping genes". Transposons are parasitic DNA, which far predate the human species itself. There is evidence that their origins stretch back more than 300 million years.

There are 4 types of transposons- LINE, SINE, LTR retrotransposons, and DNA transposons. Of these, LINE is the most abundant and arguably the most successful, as it still accounts for roughly 1 in 250 mutations in the human genome. Very few SINEs remain active in the genome, and the LTR retrotransposons and DNA transposons have died out for all practical purposes, with the solitary exception of HERV-K for the former.

LINEs, or Long Interspersed Nuclear Elements, constitute 21% of the human genome. Such DNA is "autonomous", which means that it codes for the proteins needed for its own propagation. LINE contains 2 open reading frames (the equivalent of exons), and encodes a reverse transcriptase. Unlike retroviruses, and their related LTR retrotransposons, reverse transcription takes place in the nucleus, where an endonuclease makes a single stranded nick in the human DNA to insert the reverse transcribed LINE DNA. This process starts at the 3' end and proceeds towards the 5' end. However, it is often incomplete, i.e. in many cases, it doesn't reach the 5' end. LINE derived RNA has a poly-A tail at the 3' untranslated region, which in eucaryotes, protects the mRNA from degradation once transcribed. The tail has an exactly opposite function in procaryotes.

Misplaced insertion of LINE elements has been associated with diverse human diseases such as dementia and cancers. Perhaps the most interesting example, described by Kazazian, was the insertion of a LINE transposon from Chromosome 22 in a woman into her X chromosome in the middle of the Factor VIII gene, a fact uncovered after her son was born with Haemophilia A, despite an absence of family history.

SINEs, or Short Interspersed Nuclear Elements, are non-autonomous, unlike LINE. They do not code for protein, and in fact depend on LINE for the proteins needed for transposition. As such, they are more vulnerable to mutational loss. For example, when LINE2 died out roughly 50 million years ago, so did the associated SINE.

While most SINEs are now no longer functional and therefore cannot propagate, there are two- Alu and SVA which remain functional.

Interestingly, SINE & LINE are located in different parts of the genome. While SINE favours GC rich areas, LINE is located in more AT rich areas. GC rich areas have a higher gene density, while AT rich areas are "genetic deserts", ie they are dominated by non coding, apparently nonfunctional DNA. Some authorities think that SINE elements have a symbiotic relationship with the host DNA, where they work by reducing the likelihood of harmful mutations.

The LTR (Long Terminal Repeat) retrotransposons are thought to be the predecessors of ancient retroviruses. Like the latter, they have LTR at both ends and reproduce by reverse transcription from RNA in the cytoplasm (not nucleus, unlike LINEs). As such, they code for gag and pol proteins, just like retroviruses. LTR retrotransposons have all but died out in the human genome. The only remaining retrotransposon- HERV-K- has no known function.

Similarly, DNA transposons- which contain inverted repeats at either end, and whose description in maize led to the award of McClintock's Nobel Prize nearly 40 years later- are no longer functional in the human genome. They do remain functional in bacteria though, where they are responsible for horizontal transmission of antibiotic resistance. As they cannot spread horizontally between human beings, they became nonfunctional in the latter.

The human genome is remarkably repeat rich with interspersed transposons, in comparison with the yeast or invertebrates. Furthermore, the transposons in the human genome are ancient, compared with their counterparts in these other organisms. Again, a direct comparison of these repeats between human beings and mouse shows that human repeats are much older. It seems therefore that Homo sapiens has kept hold of ancient repeats in comparison with other organisms including fellow mammals despite the fact that most of these repeats serve no discernible function. Wish we were all as efficient as the puffer fish!

Not all chromosomes in the human cell are equally ancient. The Y chromosome, for example, is a relatively "young" chromosome, with rapid turnover of repeats, ie the repeats on Y chromosome are phylogenetically millions of years younger than on other chromosomes.

Simple sequence repeats are 2 or 3 base repeats such as AT and ATG which are polymorphic. This latter property- polymorphism- particularly in (CA)n has been useful in establishing identity, paternity tests, etc. When n is 1-13, these repeats are called microsatellites, while n=14 or more are called minisatellites. For some reason, (CA)n polymorphisms are infrequent on the X chromosome, i.e. most X chromosomes have roughly equal numbers of CA repeats.
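As a rough illustration of how a (CA)n polymorphism distinguishes two individuals, the repeat number n can be read off a sequence with a regular expression. This is a minimal sketch: the function name `longest_repeat` and the two alleles are made up for the example, and real genotyping works on fragment lengths or sequencing reads rather than tidy strings.

```python
import re


def longest_repeat(seq, unit="CA"):
    """Return the largest n for which (unit)n occurs uninterrupted in seq."""
    pattern = "(?:%s)+" % re.escape(unit)
    runs = re.findall(pattern, seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles at the same locus, differing only in n.
allele_a = "GGT" + "CA" * 12 + "TTG"
allele_b = "GGT" + "CA" * 17 + "TTG"

n_a = longest_repeat(allele_a)
n_b = longest_repeat(allele_b)
```

Two samples with different values of n at enough independent loci are very unlikely to come from the same person, which is the basis of identity and paternity testing.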

Segmental duplications involve duplications of 1-250 kb. For some reason, they tend to favour pericentromeric and telomeric regions. They can be intrachromosomal or interchromosomal. When they are intrachromosomal, they are called Low Copy Repeats (LCRs). Intrachromosomal segmental duplications lead to deletion or duplication during crossover, and thus contiguous gene syndromes such as CMT 1A (due to duplication of PMP22). Similarly, they can lead to microdeletion syndromes such as DiGeorge and velocardiofacial syndrome and Williams-Beuren syndrome.

LCRs are ubiquitous and can lead to problems with accurate genetic mapping with short reads, leading to gaps in the mapped genome.

Interchromosomal segmental duplication can lead to spread of a disease causing sequence to other chromosomes. The most notable example of this is the duplication of the adrenoleukodystrophy locus from Xq28 to the pericentromeric regions of chromosomes 2, 10, 16 and 22. Many inter and intrachromosomal duplications involve the X chromosome.

During meiosis, crossover occurs. There are 2 structural observations that are relevant here. Firstly, crossovers tend to affect the short arm of chromosomes far more than the long arm. Secondly, meiotic crossover is less common close to centromeres, and increases in the terminal 20-35 Mb section of the chromosome.

The unit for measuring "closeness" or linkage of loci is the centiMorgan or cM. The closer the loci are, the lower the likelihood of crossover at meiosis. When two loci are separated by 1 cM, that equates with a 1% chance that these two loci will be separated by crossover at meiosis. The rate of crossing over in a given region is expressed as cM/Mb. The most crossover prone part of the human genome resides in the short arms of chromosomes X & Y. Thus, two genes located in Xp or Yp have an almost 100% chance of being separated at crossover.
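The two quantities in this paragraph, the chance of separation and the rate in cM/Mb, are simple ratios. The sketch below uses hypothetical function names and made-up distances, and it assumes the straight-line rule of 1 cM = 1%, which is only a fair approximation for loci that are close together.

```python
def crossover_probability(genetic_distance_cm):
    """Approximate chance that two loci are separated at meiosis.

    Uses the rule of thumb 1 cM = 1%, which holds only for
    closely linked loci (an assumption of this sketch).
    """
    return genetic_distance_cm / 100


def recombination_rate(genetic_distance_cm, physical_distance_mb):
    """Recombination rate of a region in cM per Mb."""
    return genetic_distance_cm / physical_distance_mb


# Hypothetical pair of loci: 3 cM apart genetically, 2 Mb apart physically.
p = crossover_probability(3)
rate = recombination_rate(3, 2)
```

A region with a high cM/Mb value is a recombination hotspot: loci that are physically close there still have a comparatively high chance of being separated.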

Common though the repeats above are, there are certain portions of the genome which they leave alone as being almost sacred. These regions have very few repeats. In mammals, 4 such regions are Homeobox A, B, C, and D. They are known as HoxA, HoxB, HoxC, and HoxD. The homeoboxes are responsible for embryonic development in the antero-posterior axis, and it is thought that ontogenically, mammals will not tolerate any disruption of this function by the interposition of repeats. The same however does not apply to reptiles, which have many repeats in their Hox regions and display a remarkable variety of species, perhaps due to the variation caused by these repeats during embryonic development. The remarkable speciation found in Anolis lizards is a good example of this phenomenon.

Not all parts of the human genome are equally rich in GC or AT. In fact, GC pairs only constitute 41% of the human genome, and AT pairs make up the other 59%. It is thought that over time (millions of years), there is steady mutational erosion of GC, being gradually replaced by AT. This is of some importance, as GC pairings are remarkably over-represented in gene rich regions- ie they appear in areas of high gene density. This is not to be confused with the density seen on Giemsa staining- called G bands. GC rich areas correspond to lighter G bands, while AT rich areas have denser G bands, ie the exact opposite of gene density.

What causes GC rich areas to be more gene dense? This is almost completely attributed to much shorter intron lengths in GC rich areas. The length of exons and exon numbers are relatively invariant between GC rich and AT rich areas.

CpG refers to a cytosine followed by a guanine on the same strand, joined through a phosphodiester bond in the 5'-3' direction from C to G (that is to say CpG is not the same as GpC). If we go by the relative frequency of cytosine and guanine bases- 21% each, then the frequency of CpG dinucleotides in the human genome should be 0.21*0.21, or around 4%. In actuality, the frequency of CpG dinucleotides is only a fifth of this.
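The arithmetic above can be checked in a few lines, and the same idea gives a simple measure of CpG depletion for any stretch of sequence. This is a sketch only: the function name is hypothetical, and the observed/expected ratio shown is one commonly used formulation, not something taken from the genome paper itself.

```python
C_FREQ = 0.21  # fraction of bases that are cytosine (from the text)
G_FREQ = 0.21  # fraction of bases that are guanine (from the text)

# Expected frequency of the dinucleotide CpG if bases were independent.
expected_cpg = C_FREQ * G_FREQ

# The text states that the observed frequency is about a fifth of this.
observed_cpg = expected_cpg / 5


def cpg_observed_over_expected(seq):
    """Ratio of observed CpG count to the count expected from C and G content."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)
```

A ratio well below 1 indicates CpG depletion, as seen across most of the genome, while stretches where the ratio stays close to 1 are candidates for the CpG islands discussed further on.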

This remarkable finding is explained by the fact that a large proportion of cytosine bases in H. sapiens are methylated. Methylated cytosines spontaneously deaminate to thymine. Unmethylated cytosine bases also deaminate spontaneously, but to uracil, which, being foreign to DNA, is quickly recognised and repaired back to cytosine.

There may be an element of self preservation about the fact that humans have methylated CpG islands. The opposite applies to bacteria and viruses, who have hypomethylated DNA. When bacteria or viruses invade the human cell, TLR9 detects them through the fact that they are unmethylated and thus activates the innate immune system. For viruses, for example, this can lead to increased production of Type I interferons by plasmacytoid dendritic cells.

CpG islands in human beings are over-represented in promoter regions based at the transcription start (5') end, and it is thought that they play a vital part in the function of these promoter regions. As expected, CpG islands occur in gene dense areas, just like GC base pairs.

Again, the human chromosomes differ in their content of CpG islands. The average number of CpG islands across all human chromosomes is 5-15 per Mb. The Y chromosome is relatively bereft, with only 2.9, while Chromosome 19 is an extreme outlier with 43 CpG islands per Mb.

Since there are 4 bases in RNA, the number of triplet codons on mRNA that can be made from these 4 bases is 4^3 or 64. As there are only 20 amino acids (21 if you include Selenocysteine), there is redundancy here. However redundancy is also reflected in the number of anticodons on tRNA, numbering only 46, due to the fact that the 1st RNA base on an anticodon, which corresponds with the 3rd RNA base on the codon, often shows "wobble". For 2-codon boxes (where the 3rd base on codon could be one of 2 choices), this is seen when the 3rd base is either C or U. The corresponding 1st base on the anticodon in such cases could be either G or A. Asparagine, for example has codons AAU & AAC. The cognate tRNA anticodon for asparagine could thus be either GUU or AUU. In reality, there are 33 genes which code for GUU and only one for AUU, thus reflecting both redundancy and codon preference.

In practice, when A is present as the first base on an anticodon, it is almost always post-transcriptionally deaminated to inosine. Thus AUU, in reality, becomes IUU.
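The codon count and the asparagine example can be reproduced directly. The sketch below assumes plain Watson-Crick pairing and writes every sequence 5' to 3', so the anticodon is simply the reverse complement of the codon. It does not model wobble or inosine, and the names are invented for the example.

```python
from itertools import product

BASES = "ACGU"
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Every possible triplet of the 4 RNA bases: 4**3 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def anticodon(codon):
    """Watson-Crick anticodon, written 5'->3' (reverse complement of the codon).

    The 1st anticodon base pairs with the 3rd codon base, which is
    why wobble at codon position 3 shows up at anticodon position 1.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


asparagine_codons = ["AAU", "AAC"]
asparagine_anticodons = [anticodon(c) for c in asparagine_codons]
```

The two anticodons differ only in their first base, A or G, which is exactly the position where the text describes wobble and where A is converted to inosine.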

Friday, 10 April 2020

CRISPR-Cas9- From Archaea to Genome Editing

In 1987, while investigating the mechanism for isozyme conversion of alkaline phosphatase, Ishino noticed short, regular, invariant repeats in the genome of E. coli, which were later to be called Clustered Regularly Interspaced Short Palindromic Repeats, or CRISPR. Six years later, Mojica noticed the same repeats in the genome of an archaeon called Haloferax mediterranei. (Archaea are procaryotes, like bacteria, and do not have organelles or a nucleus. They often live in extreme environments such as hot springs.) Over the subsequent years, CRISPRs were described in more archaea and bacteria. As these were mostly thermophilic or hyperthermophilic, it was thought that the CRISPR locus encoded genes to allow these organisms to survive in hot environments.

However, in 2005, 3 separate papers pointed to a startling fact- the repeats in the CRISPR loci had spacer nucleotides between them. The nucleotide sequence in these spacers was identical to certain bacteriophages, prophages and plasmids. However, while these viruses and plasmids freely invaded many species of archaea and bacteria, the species which contained CRISPR were immune to them.

Thus the CRISPR and the contained spacers were thought to be responsible for providing immunity against phages and plasmids.

This was finally proven in 2007, when the lactic acid bacterium Streptococcus thermophilus acquired resistance to a phage, upon the phage sequence being introduced into its spacer region within the CRISPR locus. This resistance disappeared when the putative sequence (called protospacer when it is part of the viral genome) was removed from the phage.

The elucidation of genomic sequences at the turn of the century led to the comparison of CRISPR containing organisms with others. It then became clear that there were several conserved genes (4 initially) next to the CRISPR region. These were named CRISPR associated genes or Cas for short. Makarova later showed that the Cas genes coded for nucleases, helicases and integrases for DNA. It was fairly intuitive to figure out that CRISPR and Cas worked in harmony to impart the host organism immunity against invading phages and plasmids. Thus the term CRISPR-Cas was coined.

It is thought that phylogenetic ancestors of the archaea captured the viral or plasmid DNA during previous infections. When the concerned phage tries to invade the archaea, DNA from the spacers present in the CRISPR locus produces a complementary strand of RNA. This guide strand hybridises with the DNA of the invading phage, where the endonuclease in Cas, the best characterised being Cas9, makes a double stranded break in the phage DNA (Figure 1).

Figure 1. CRISPR-Cas system

This is very similar to the adaptive immune system developed by eucaryotes. There is remarkable similarity between the CRISPR-Cas system, and the RNA interference seen in eucaryotes. The latter occurs in 2 ways through non-coding RNA. Firstly a single stranded antisense oligonucleotide (ASO) can hybridise with target mRNA. The central portion of the ASO has a DNA like configuration, and upon hybridisation with its target, activates an enzyme called RNAse-H which fragments the mRNA. A second type of non coding RNA exists in a double stranded form. This dsRNA is digested by an enzyme called dicer (RNAse III) into a 21-nucleotide fragment called small interfering RNA (siRNA). Inside the cytoplasm, the passenger strand of the siRNA is peeled away by an enzyme called Argonaute-2 (Ago2), while the guide strand is loaded onto an assembly called RNA induced silencing complex (RISC). Inside the RISC, Ago-2, which is functionally similar to the Cas proteins, will cleave the target mRNA once the siRNA has hybridised with the latter.

Micro RNA (miRNA) is another non-coding RNA, which works very similarly to siRNA. It is split by dicer from hairpin shaped pre-miRNA, which itself has been split off from a larger strand called primary miRNA by an enzyme called Drosha. The principal difference between siRNA and miRNA is that while siRNA has to hybridise completely with the target mRNA strand to cleave the latter through Ago-2, miRNA only partially hybridises with the target mRNA, at its noncoding 3' end. Thus, while siRNA is strictly specific for its cognate mRNA in a 1:1 stoichiometry, miRNA can neutralise myriad mRNA due to its requirement for only partial hybridisation. While siRNA acts by cleaving its target mRNA, miRNA acts by neutralising or inactivating the latter in most cases, and thus does not require Ago2.

Back to the CRISPR-Cas system. As the CRISPR-Cas system is present in 90% of archaea and around half of bacteria, it is believed that the system evolved in archaea and moved horizontally into bacteria.

Why do the Cas enzymes not destroy the spacer sequence in the host procaryote? This is prevented by the requirement for a protospacer adjacent motif (PAM), a short sequence situated next to the protospacer in the invading virus or plasmid. The PAM is recognised by the Cas protein itself rather than by base pairing with the guide strand, and the spacers stored in the host's own CRISPR locus are not flanked by a PAM. Without recognition of the PAM, the Cas complex will not cleave the protospacer sequence.
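The PAM requirement is easy to picture as a search rule: a stretch of DNA only counts as a cleavable target if the motif sits immediately beside it. The sketch below assumes the NGG motif used by Cas9 from Streptococcus pyogenes and a 20-base protospacer, neither of which is stated in the text above. It scans one strand only, and the function name and test sequences are invented.

```python
import re


def find_protospacers(dna, spacer_len=20):
    """List (start, protospacer, pam) for each site followed by an NGG PAM.

    Assumes the S. pyogenes Cas9 PAM (5'-NGG-3') and scans the
    given strand only. A lookahead is used so that overlapping
    candidate sites are all reported.
    """
    dna = dna.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, dna)]


# Hypothetical phage sequence with a PAM next to the target...
phage = "A" * 20 + "TGG"
# ...and the same 20 bases stored in the host CRISPR locus without a PAM.
host_spacer = "A" * 20 + "TTT"

phage_sites = find_protospacers(phage)
host_sites = find_protospacers(host_spacer)
```

The identical 20 bases are reported as a target in the phage but not in the host, which is the self versus non-self distinction the PAM provides.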

Two classes of Cas enzymes are recognised- Class I & Class II. Class I Cas enzymes are multicomponent with multiple Cas, while Class II Cas enzymes have a single Cas. Class I is further subdivided into Types I, III & IV, while Class II comprises Types II, V & VI.

Class II Cas enzymes are particularly useful for genome editing. The most well characterised Cas enzyme- Cas9- is a member of Type II, while Cas12a and 12b belong to Type V and Cas13a and 13b are in Type VI.

The CRISPR-Cas9 complex contains the guide RNA. The spacer sequence is first transcribed into a precursor called pre-CRISPR RNA or pre-crRNA, which is then processed into the mature guide, called crRNA. The CRISPR-Cas9 complex contains a second RNA in addition to crRNA, called the trans-activating CRISPR RNA or tracrRNA. This latter is required for Cas9 packaging into the complex, but is not required for all Cas enzymes.

For genome editing, the crRNA and the trans-activating RNA are usually fused into one molecule, called the single guide RNA or sgRNA.

While Cas9 cleaves only dsDNA, Cas12a cleaves both dsDNA and ssDNA, as well as RNA, and Cas 13a only cleaves RNA.

The CRISPR-Cas9 system has become the predominant tool for genome editing due to its simplicity of use and specificity. Genome editing exploits the principle that the cell will repair a break in DNA through one of two mechanisms- either non-homologous end joining (NHEJ) or homology directed repair (HDR). In NHEJ, the cell simply seals the break in DNA, by joining the two broken ends. This results in a high likelihood of insertions and deletions (indels). In HDR, the template strand is used to repair the break, thus resulting in a higher degree of fidelity. Thus, these 2 methods can be exploited in different ways in gene editing.

When the aim of gene editing is to knock out a defective or disease causing gene, NHEJ repair is targeted. This is applied for example, in subjects with HIV, where the receptor CCR5 in T-cells, which serves as an entry portal for HIV, is knocked out through the CRISPR-Cas9 tool.

On the other hand, when the purpose of gene editing is to replace a vital missing gene, say that encoding the common gamma chain in X-linked SCID, HDR is more suitable. Here, CRISPR Cas9 is used to make the double stranded break, but in addition, a replacement template containing the putative missing gene is introduced into the cell by electroporation. When the cell uses HDR to repair the break, the template containing the missing gene slots into place. This is more difficult to perform than knocking out genes through NHEJ, and as such has a lower success rate.

Of course, CRISPR-Cas9 is not without its challenges and drawbacks. For one, the enzyme Cas9 is derived from Streptococcus pyogenes, and most human beings have antibodies to it. However, as the enzyme is used as single use hit and run, this is not insurmountable. A second problem is that the innate immune system may be stimulated by the crRNA and tracrRNA in CRISPR to produce interferons through TLR stimulation. Synthetic RNA tends to be less stimulatory, however.

A third problem is off target DNA destruction by Cas9, or sustained on target action beyond that required. Some ingenious solutions have been suggested to get around this problem. For example, an anti-oligonucleotide oligonucleotide can be injected to neutralise the sgRNA after it has served its purpose. This will only happen if a further oligonucleotide is included to target the PAM sequence in addition to the one targeting the stem loop sequence contained within the guide RNA.

Gene editing can occur ex vivo, or in vivo. In vivo administration requires carefully planned delivery to ensure that the RNA reaches its target site. For example, the liver takes up oligonucleotides tagged with N-acetylgalactosamine (GalNAc) through its asialoglycoprotein receptors. Thus CRISPR-Cas9, primed to knock out the PCSK9 gene, can be targeted to the liver to lower cholesterol. CAR-T therapy, designed to treat aggressive leukaemias, is a successful application of ex vivo gene editing, where CD52 and TCR alpha are knocked out to make the infused T-cells resistant to alemtuzumab, administered to ablate the host T cells.

Monday, 6 April 2020

Immune Serum and the promise of Monoclonal Antibodies Against COVID19

According to an article in JAMA, immune serum from convalescent subjects has been used successfully against COVID19. This shouldn't come as a surprise. The technique was used for SARS and Ebola with success, and is still used in the treatment of Tetanus, and prophylaxis against Hepatitis B, Rabies and Chicken Pox following exposure.

While the preliminary results are encouraging, I'd caution against regarding it as some kind of cure though. Apart from the difficulties of using it widely (one patient- one recipient), there are several caveats.

Firstly, the antibodies in immune serum may provide protection against the virus temporarily, but will stymie the development of intrinsic antibodies or a T-cell response and therefore leave the subject vulnerable against re-infection if the pandemic continues.

Secondly, under certain conditions, infused antibodies in immune serum can lead to a phenomenon called Antibody Dependent Enhancement (ADE), which leads to increased virulence of the same virus or a related virus.

The immunoglobulin molecule (antibody)- IgG for all practical purposes- has two principal components- the Fab or antigen binding fragment and the Fc, which is the bit that binds to receptors on macrophages, neutrophils, T-cells and antigen presenting cells. The mechanism for ADE goes like this- the infused IgG binds to, say, the spike protein in COVID through the Fab portion, and causes a conformational change in the receptor binding domain of the spike protein. It then binds to the host cell- say macrophages in the lungs- through the Fc portion, and actually facilitates the entry of the virus into the cell, thus increasing the cytolytic effect of the virus, before the host immunity has geared up to deal with this new invader.

Something similar happened while trying to treat MERS cases in the past with immune serum. This increased the vulnerability of some serum recipients to infection with SARS, thought to be due to ADE. Later, a research group showed that a similar thing could happen with SARS itself, i.e. immune SARS serum enhancing SARS severity.

Thirdly, serum carries the risk of transmitting other infections. While certain infective agents are screened for routinely- HIV, Hep B, Hep C and Syphilis universally, Malaria, Chagas disease and HTLV in endemic areas, and CMV for transplant recipients- it is impossible to screen for all known infective agents.

Fourth, foreign serum can lead to a Type III immune reaction called serum sickness. Yes, this was first demonstrated with horse serum, but can happen with human to human serum transfer at low levels.

So, what's the solution?

The really encouraging news to emerge from the successful use of immune serum lies in exploiting the promise of monoclonal antibodies. These are already in widespread use for autoimmune diseases- RA, Crohn's, Psoriasis, Lupus, etc, but are increasingly finding their way into infectious diseases.

They are easy to manufacture from scratch- take weeks to months from conception to readiness for use, rather than years. You simply extract the immune serum from a convalescent person, identify the memory B cell that makes the antibody that imparts immunity to your putative antigen by flow cytometry, clone the variable region of the immunoglobulin heavy and light chains and attach it to a Fc fragment. You can even use the Fab fragments themselves without a Fc portion, as with Certolizumab in RA.

This is not just theoretical. Palivizumab, a monoclonal antibody to the F protein of Respiratory Syncytial Virus, is used for prophylaxis in vulnerable infants against RSV induced bronchiolitis.

During the Zaire Ebolavirus outbreak in 2014-6, a combination of 3 mouse-human chimeric antibodies (like infliximab or rituximab) entered RCT. Unfortunately the latter was abandoned because the Ebola epidemic ended, and recruitment petered out (this is one of the biggest barriers to get drug companies to invest in such technology for infectious diseases).

A monoclonal antibody against Zika virus was waiting to enter clinical trial in pregnant women in Zika endemic areas. Again, the epidemic ended, and the enthusiasm waned.

The limiting factor is of course, cost. Monoclonal antibodies are expensive to mass produce and without the certainty of getting their money back, pharmaceutical companies have no incentive to change that.

But COVID is about to change everything. The cost to the world economy is likely to run into hundreds of billions of pounds, dwarfing the cost of production of such drugs. If COVID continues apace, the governments of the world, particularly the richer ones, will have no option but to pool their resources and incentivise the pharmaceutical sector to invest in this. After all, success is guaranteed, this is a mature and well tested technology, several companies have expertise in it, and there is an overwhelming moral imperative to save lives.

If not now, when?

Thursday, 2 April 2020

The JAK-STAT system

There are 4 JAKs- JAK1, JAK2, JAK3 and TYK2. There are 7 STATs- STAT1,2,3,4 and 6, and 5a & 5b. JAKs and STATs pair as homodimers or heterodimers.

As the accompanying diagram from New England Journal of Medicine shows, JAKs activate STATs by phosphorylating them. The latter then translocate to the nucleus, where they act as transcription factors and alter gene expression.

JAK3 and TYK2 are mainly involved in immunity. The others have pleiotropic functions.

Numerous cytokines act by activating the JAK receptors, and their cognate STATs as follows:

Interferon-gamma activates JAK1 & JAK2, leading to activation of STAT1.
Interferon alpha and interferon beta activate JAK1 and TYK2, which activate STAT1-STAT2.
IL-6 and IL-10 activate STAT3.
Erythropoietin is considered a cytokine as it activates JAK2. This turns on STAT5a.
Growth hormone activates JAK2, which activates STAT5b.
The IL-2 common gamma chain, which is shared by the receptors for IL-2, IL-4, IL-7, IL-9, IL-15 and IL-21, activates JAK1-JAK3, and thus STAT5b.
The IL-7 receptor alpha chain is shared by the receptors for IL-7 and Thymic Stromal Lymphopoietin. It activates JAK1-JAK3, and thus STAT5b.
STAT4 is activated by IL-12 and IL-23, via JAK2 & TYK2.
STAT6 is activated by IL-4 and IL-13. The intervening JAKs are not certain.
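
The pairings in the list above can be held in a small lookup table. What follows is only a minimal sketch in Python: the names PATHWAYS, ligands_for_stat and ligands_using_jak are made up for illustration, the entries are transcribed from the list and nothing more, and None marks the cases where the list does not name the JAKs involved.

```python
# Each entry: (ligand or receptor chain, JAKs, STATs), transcribed from the list above.
# JAKs is None where the list does not say which JAKs are involved.
# All names here are hypothetical, chosen for this illustration only.
PATHWAYS = [
    ("interferon-gamma", ("JAK1", "JAK2"), ("STAT1",)),
    ("interferon-alpha", ("JAK1", "TYK2"), ("STAT1", "STAT2")),
    ("interferon-beta", ("JAK1", "TYK2"), ("STAT1", "STAT2")),
    ("IL-6", None, ("STAT3",)),
    ("IL-10", None, ("STAT3",)),
    ("erythropoietin", ("JAK2",), ("STAT5a",)),
    ("growth hormone", ("JAK2",), ("STAT5b",)),
    ("common gamma chain", ("JAK1", "JAK3"), ("STAT5b",)),
    ("IL-7 receptor alpha chain", ("JAK1", "JAK3"), ("STAT5b",)),
    ("IL-12", ("JAK2", "TYK2"), ("STAT4",)),
    ("IL-23", ("JAK2", "TYK2"), ("STAT4",)),
    ("IL-4", None, ("STAT6",)),
    ("IL-13", None, ("STAT6",)),
]


def ligands_for_stat(stat):
    """Return the ligands or receptor chains listed as activating the given STAT."""
    return [name for name, _jaks, stats in PATHWAYS if stat in stats]


def ligands_using_jak(jak):
    """Return the ligands or receptor chains listed as signalling through the given JAK."""
    return [name for name, jaks, _stats in PATHWAYS if jaks and jak in jaks]


if __name__ == "__main__":
    print(ligands_for_stat("STAT1"))
    print(ligands_using_jak("JAK3"))
```

Asking the table which entries depend on JAK3 returns the common gamma chain and the IL-7 receptor alpha chain, which is a quick way to see why loss of a single JAK can affect several cytokines at once.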

Three different conditions are associated with STAT1 mutations. Inactivating autosomal dominant mutations are associated with Mendelian Susceptibility to Mycobacterial Diseases (MSMD), a relatively mild condition. Such mutations reduce signalling from Interferon-gamma, which therefore reduces intra-macrophagic killing of tuberculous and non-tuberculous mycobacteria.

However, autosomal recessive mutations of STAT1 are more severe, as they reduce signalling from both interferon-gamma, and alpha and beta-interferons. Thus, there is increased vulnerability to both mycobacterial and viral infections, particularly with Herpes zoster. Such subjects die quite early in life of fulminant infections.

Some autosomal recessive mutations cause only a partial loss of STAT1 function. These produce a less severe phenotype, which is compatible with longer survival.

Activating mutations of STAT1, on the other hand, lead to a completely different phenotype. These subjects suffer from chronic mucocutaneous candidiasis (CMCC), and can have cerebral aneurysms and scoliosis. These mutations act in a dominant fashion.
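
As a compact summary, the three STAT1 conditions described above can be keyed by the effect of the mutation and its mode of inheritance. This is a rough sketch only: the names Stat1Phenotype, STAT1_PHENOTYPES and susceptibilities are hypothetical, and the entries simply restate the text rather than any external database.

```python
from typing import NamedTuple, Tuple


class Stat1Phenotype(NamedTuple):
    """One STAT1-associated condition, as described in the text (hypothetical structure)."""
    condition: str
    susceptible_to: Tuple[str, ...]


# Keyed by (effect of mutation, inheritance). Entries restate the paragraphs above.
STAT1_PHENOTYPES = {
    ("inactivating", "autosomal dominant"): Stat1Phenotype(
        condition="Mendelian Susceptibility to Mycobacterial Diseases (MSMD)",
        susceptible_to=("mycobacteria",),
    ),
    ("inactivating", "autosomal recessive"): Stat1Phenotype(
        condition="severe susceptibility to mycobacterial and viral infections",
        susceptible_to=("mycobacteria", "viruses"),
    ),
    ("activating", "autosomal dominant"): Stat1Phenotype(
        condition="chronic mucocutaneous candidiasis (CMCC)",
        susceptible_to=("Candida",),
    ),
}


def susceptibilities(effect, inheritance):
    """Look up which organisms a given class of STAT1 mutation predisposes to."""
    return STAT1_PHENOTYPES[(effect, inheritance)].susceptible_to


if __name__ == "__main__":
    print(susceptibilities("inactivating", "autosomal recessive"))
```

Laid out this way, it is easier to see that the direction of the mutation, and not just the gene, determines which infections the subject is vulnerable to.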

Why does an activating mutation of STAT1 lead to CMCC? This is because a constitutively activated STAT1 inhibits the activation of STAT3 by the IL-6 family of cytokines. STAT3 is essential for the action of IL-17A, IL-17F and IL-22, three very similar cytokines that are necessary to defend against fungal infections.

A similar situation is seen with inactivating mutations of STAT3, which cause Hyper-IgE syndrome, also called Job's syndrome. It is characterised by very high levels of IgE, Staphylococcal lung infections with cyst formation, boils and cold abscesses of the skin, recurrent eczema, chronic mucocutaneous candidiasis (in common with activating mutations of STAT1), infections with dimorphic fungi such as Histoplasma, delay in eruption of permanent teeth, a peculiar facies and coronary aneurysms.

Hyper-IgE syndrome is caused by dominant negative mutations in STAT3. A similar picture can be inherited recessively due to mutations in DOCK8.

It is thought that the phenotype is due to the loss of Th17 T cells, which secrete IL-17A, IL-17F and IL-22. Loss of these cytokines predisposes to Staphylococcal and fungal infections.

While CMCC is seen in dominant activating mutations of STAT1, it can also be seen in a recessive condition- polyglandular autoimmune syndrome type 1, where there are circulating antibodies to IL-17A and IL-17F, thus emphasising the role of IL-17 in this disorder.

A similar split between inactivating and activating mutations is seen in Wiskott-Aldrich syndrome, which is X-linked recessive and due to inactivating mutations of the WAS gene. Activating mutations of the same gene lead to severe congenital neutropaenia.

The JAK2 V617F mutation is the best-known abnormality of the JAK-STAT pathway. It is observed in 95% of subjects with primary polycythaemia and around 50% of subjects with primary myelofibrosis and essential thrombocythaemia.

Inactivating mutations of JAK3 can lead to Severe Combined Immunodeficiency (SCID), as JAK3 is the common pathway for 6 cytokines. Such subjects have poorly functioning T cells, normal B cells and defective NK cells, and require bone marrow transplantation to survive. Inactivating mutations of JAK3 were suspected to contribute to this phenotype when it became clear that the common gamma chain acted through JAK3. Prior to this, SCID had been thought to be due to X-linked mutations of the common gamma chain only.

A less severe picture is seen with mutations of the IL-7 receptor alpha chain. Although this affects T cell function, NK cells are spared as the main cytokine driving NK cell function is IL-15, which acts through the common gamma chain.

As expected, polymorphisms of STAT6 are associated with an allergic diathesis, as STAT6 is activated by IL-4 and IL-13. Similarly, polymorphisms of STAT4 are associated with SLE & RA.

I'll discuss medications exploiting the JAK pathways in another post.
