
Knowledge | More than what you know about SNP

In scientific research, we often hear about SNP, but what exactly is SNP, why do we study SNP, and how do we carry out research work? Let's take a look below.

What exactly is SNP

A single nucleotide polymorphism (SNP) is a DNA sequence variation caused by a single-nucleotide transition, transversion, insertion or deletion that occurs at a frequency of more than 1% in a population. SNPs are widespread: the human genome contains about 3 × 10^6 SNP sites, on average one every 500-1000 base pairs. Of this huge number of SNP loci, only a few change an amino acid, depending on the position of the SNP and the type of mutation. As shown in the figure below, only SNPs that are non-synonymous mutations in a coding region change the encoded protein and can thereby alter the phenotype.

[Figure 1]
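To make the distinction concrete, here is a minimal Python sketch that classifies a single-base substitution as a transition or a transversion, and as synonymous or non-synonymous within a codon. The function names and example codons are illustrative assumptions and do not come from any particular tool; the codon table is the standard genetic code.

```python
# Standard genetic code for DNA codons on the coding strand, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PURINES = {"A", "G"}


def substitution_type(ref: str, alt: str) -> str:
    """Transition: purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def coding_effect(codon: str, position: int, alt: str) -> str:
    """Classify a substitution at 0-based `position` of `codon` (hypothetical helper)."""
    mutated = codon[:position] + alt + codon[position + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_amino_acid else "non-synonymous"


print(substitution_type("A", "G"))   # A and G are both purines
print(coding_effect("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(coding_effect("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
```

The third-position change leaves the amino acid unchanged, while the second-position change replaces glutamate with valine, which is the kind of SNP that can alter a protein.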

SNPs are related to individual phenotypic differences, drug sensitivity and disease susceptibility. They have important research value in precision nutrition, disease diagnosis and screening, and medication guidance, so they are closely related to our lives. For example, studies have reported SNP loci and related genes that are potentially associated with the symptoms of COVID-19 [1]. The virus itself, SARS-CoV-2, has also accumulated many SNP loci as it has evolved, some of which may further increase its infectiousness and virulence.

Research on SNPs can be divided into two categories:

1. Analysis of unknown SNPs, including discovering new SNP sites and determining the relationship between an unknown SNP and a genetic disease;

2. Analysis of known SNPs, including genetic diversity studies of SNPs in different groups and genetic diagnosis of genetic diseases.

Let's take a look at the commonly used methods for detecting SNPs.

SNPs are mostly detected by PCR and sequencing, which identify both the site of the mutation and the base involved. According to throughput, detection methods fall into two categories: low-throughput and high-throughput. Low-throughput methods detect several to dozens of SNPs in one experiment, while high-throughput methods detect thousands of SNPs at a time at a higher cost. First, let's look at the low-throughput methods: Sanger sequencing, Taqman probes, and mass spectrometry detection.

Sanger sequencing

Sanger sequencing relies on the dideoxynucleotide chain-termination reaction to generate fragments of different lengths, which are then read out in order of size to give the sequence. It is the "gold standard" for SNP detection: it can determine the type and location of a mutation and can also discover unknown SNP sites.

Sanger sequencing has low throughput and a relatively high cost, so it is best suited to projects with few sites and few samples.
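The chain-termination idea can be illustrated with a toy model. The sketch below is an idealized simulation under the assumption that every position yields exactly one terminated fragment per allele; it is not how base-calling software works on real traces. The function names and the two example alleles are hypothetical.

```python
def termination_fragments(synth_strand: str) -> list[tuple[int, str]]:
    """Each fragment ends where a labelled ddNTP was incorporated.
    Returns (fragment length, terminal base) pairs."""
    return [(i + 1, base) for i, base in enumerate(synth_strand)]


def read_trace(*strands: str) -> list[set[str]]:
    """Pool fragments from one or two alleles, order them by length (as
    electrophoresis does) and collect the terminal bases seen at each length."""
    calls: dict[int, set[str]] = {}
    for strand in strands:
        for length, base in termination_fragments(strand):
            calls.setdefault(length, set()).add(base)
    return [calls[length] for length in sorted(calls)]


def heterozygous_positions(trace: list[set[str]]) -> list[int]:
    """1-based positions where two different terminal bases overlap."""
    return [i + 1 for i, bases in enumerate(trace) if len(bases) > 1]


allele_1 = "ACGTTAGC"
allele_2 = "ACGTCAGC"  # differs from allele_1 at position 5 (T vs C)
trace = read_trace(allele_1, allele_2)
print(heterozygous_positions(trace))
```

In a heterozygous sample the two alleles produce fragments of the same length that end in different bases, which appears as a double peak at the SNP position.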

Taqman probes

The Taqman probe method is a SNP typing technology based on the qPCR platform. It is fast, specific, sensitive and accurate.

The probe carries a fluorescent reporter group at the 5' end and a quencher group at the 3' end. During PCR, the probe binds to the DNA template; as Taq polymerase extends the primer, its 5'→3' exonuclease activity hydrolyzes the bound probe, separating the reporter from the quencher, and the instrument monitors the fluorescence signal in real time. In a SNP typing experiment, one probe is designed for each allele, each containing the corresponding base and labeled with a different fluorophore. A minor groove binder (MGB) group on the probe sharpens the discrimination: a probe with a base mismatch binds the template poorly and is rarely cleaved, so its signal is very weak, while the correctly paired probe binds, is cleaved and gives a fluorescent signal. Comparing the signals of the two fluorophores gives the genotype.
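As a rough illustration of how two probe signals translate into a genotype, the sketch below calls a biallelic genotype from end-point fluorescence values. The channel names (`fam`, `vic`) and both cut-offs are assumptions chosen for the example; real instruments cluster many samples together rather than applying fixed thresholds.

```python
def call_genotype(fam: float, vic: float, ntc: float = 0.1, ratio: float = 3.0) -> str:
    """Call a biallelic genotype from the end-point signals of two probes.

    fam / vic: background-corrected signal of the allele 1 / allele 2 probe.
    ntc, ratio: hypothetical cut-offs, to be tuned per assay and instrument.
    """
    if fam < ntc and vic < ntc:
        return "no call"               # neither probe was cleaved
    if fam >= ratio * vic:
        return "allele 1 homozygote"   # only the allele 1 probe gave signal
    if vic >= ratio * fam:
        return "allele 2 homozygote"   # only the allele 2 probe gave signal
    return "heterozygote"              # both probes were cleaved


for signals in [(2.0, 0.05), (0.04, 1.8), (1.1, 1.0), (0.02, 0.03)]:
    print(signals, call_genotype(*signals))
```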

The Taqman probe method has a high up-front cost because designing the probes for each site is expensive. It is therefore best suited to analyzing a small number of SNP sites in a large number of samples: when the sample size is large enough, the probe cost is spread over many reactions and the average cost per sample falls.

Mass spectrometry detection

This detection method is based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS).

The principle and procedure are as follows:

(1) Amplify the target fragment containing the SNP by PCR (about 50 bp on either side of the SNP site);

(2) Treat the PCR product with shrimp alkaline phosphatase (SAP) to inactivate the remaining dNTP substrates in the system, then add a single-base extension primer (its 3' end is complementary to the base immediately before the SNP site) and ddNTPs;

(3) Extend the primer by a single ddNTP complementary to the allele at the SNP site, so that the product fragment is exactly one nucleotide longer than the single-base extension primer;

(4) Measure the molecular weight of the product fragment by time-of-flight mass spectrometry; comparing it with that of the single-base extension primer gives the base at the SNP site.
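Step (4) amounts to matching a mass difference against the four possible terminators. The sketch below shows that comparison under stated assumptions: the masses are approximate average values for unmodified ddNMP residues, and the tolerance, function names and example masses are hypothetical.

```python
# Approximate average mass (Da) added to the primer by one incorporated,
# unmodified dideoxynucleotide residue. Illustrative values only.
DDNMP_MASS = {"C": 273.2, "T": 288.2, "A": 297.2, "G": 313.2}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extended_base(primer_mass: float, product_mass: float, tol: float = 2.0):
    """Return the base added to the primer, or None if no ddNMP mass matches."""
    shift = product_mass - primer_mass
    base, mass = min(DDNMP_MASS.items(), key=lambda kv: abs(kv[1] - shift))
    return base if abs(mass - shift) <= tol else None


def template_alleles(primer_mass: float, peak_masses: list[float]) -> set[str]:
    """Template-strand alleles implied by the product peaks of one sample."""
    bases = (extended_base(primer_mass, m) for m in peak_masses)
    return {COMPLEMENT[b] for b in bases if b is not None}


primer = 5000.0  # hypothetical extension-primer mass
print(extended_base(primer, 5273.2))               # one peak: primer + ddC
print(template_alleles(primer, [5273.2, 5313.2]))  # two peaks: heterozygote
```

In this table A and T differ by only about 9 Da, so the tolerance has to stay well below that value for the two to be told apart.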

In this method, the PCR product must be purified before single-base extension, and the single-base extension product must be purified again before it is loaded on the mass spectrometer. The operation is complicated, and the method is mostly used in the medical field.

The SNP detection methods described above are all low-throughput. When information on a large number of SNP sites must be obtained in one experiment, high-throughput methods are required. These are mostly based on next-generation sequencing (NGS) technology and mainly include whole genome resequencing, whole exome capture sequencing and simplified genome sequencing, along with gene chip methods.

Whole genome resequencing

Whole genome resequencing (WGS) sequences the whole genomes of different individuals of a species whose genome sequence is already known; SNP site information is then obtained by comparing the sequences with the reference.

The experimental principle of WGS is as follows: extract DNA from the samples, randomly fragment the genomic DNA, recover fragments of 200-500 bp by electrophoresis, and add adapters to both ends to construct a library. Then, using next-generation sequencing (taking the Illumina platform as an example), generate clusters by bridge amplification and obtain the whole genome sequence by sequencing-by-synthesis.

WGS analyzes the structural differences between individual genomes by means of bioinformatics. In addition to SNPs, it can detect structural variations (SVs) and copy number variations (CNVs). However, the cost is relatively high, and because SNPs in intergenic regions, introns and exons are all detected, it is not an efficient way to study a specific target region.
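The "sequence comparison" step can be pictured as counting, at every reference position, which bases the aligned reads carry. The following sketch is a deliberately simplified pileup caller: it assumes gap-free reads that are already aligned, ignores base qualities, and uses hypothetical depth and allele-fraction thresholds. Production pipelines use dedicated aligners and variant callers instead.

```python
from collections import Counter


def call_snps(reference: str, aligned_reads: list[tuple[int, str]],
              min_depth: int = 4, min_alt_fraction: float = 0.25) -> dict:
    """aligned_reads: (0-based start on the reference, read sequence), gap-free.
    Returns {position: (ref_base, alt_base, alt_fraction)}."""
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[pos]
        alts = [(n, b) for b, n in counts.items() if b != ref_base]
        if not alts:
            continue
        n_alt, alt_base = max(alts)  # most frequent non-reference base
        if n_alt / depth >= min_alt_fraction:
            snps[pos] = (ref_base, alt_base, n_alt / depth)
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (1, "CGTTCGTA"), (2, "GTTCGTAC"),
         (0, "ACGTACGT"), (3, "TACGT")]
print(call_snps(reference, reads))
```

In this toy data set three of the five reads covering position 4 carry T instead of the reference A, so the position is reported as an A-to-T SNP with an alternate-allele fraction of 0.6.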

Whole exome capture sequencing

Whole exome sequencing (WES) builds on the WGS workflow but captures and sequences only the exon regions, which are then compared with the reference genome to screen for SNP loci. A WES experiment mainly consists of library construction, exon-targeted capture, and sequencing analysis, and some points in these steps need attention:

(1) Library construction: Fragment the genomic DNA to 200-500 bp, or cut it into fragments of about 300 bp with a transposase, then construct a DNA library by adding adapters and indexes to both ends of each fragment, selecting fragments by size, and amplifying and purifying them;

(2) Exon-targeted capture: Hybridize the library with biotin-labeled exon probes, then bind the probes to streptavidin-coated magnetic beads. Collect the magnetic beads that have captured exon sequences, elute and recover the DNA, and build the final library by PCR amplification;

(3) Sequencing analysis: As in WGS, determine the sequence of the exon library using next-generation sequencing technology.

During library construction, pay attention to factors such as sample quality, index choice and adapter concentration, which strongly influence library quality. At the start of the experiment, make sure that the DNA is not severely degraded, that its purity is good, and that the starting amount is sufficient (gDNA ≥ 50 ng, FFPE DNA ≥ 100 ng). To limit misassignment of sample tags (indexes) during sequencing, use adapters with unique dual indexes (UDI) or unique molecular identifiers (UMI): these reduce the probability of misassignment, and misassigned data can then be filtered out during subsequent analysis.
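The filtering that unique dual indexes make possible can be sketched as follows. With UDI adapters each sample owns one specific pair of indexes, so a read carrying an unexpected pair can be discarded instead of being assigned to the wrong sample. The index sequences, sample names and function names below are made up for the example.

```python
# Hypothetical sample sheet: each sample owns a unique (i7, i5) index pair.
SAMPLE_SHEET = {
    ("ATCACG", "TTAGGC"): "sample_A",
    ("CGATGT", "TGACCA"): "sample_B",
}


def assign_read(i7: str, i5: str):
    """Return the sample name, or None when the pair is not an expected UDI pair."""
    return SAMPLE_SHEET.get((i7, i5))


def demultiplex(reads):
    """reads: iterable of (read_id, i7, i5). Returns (assigned, discarded)."""
    assigned, discarded = {}, []
    for read_id, i7, i5 in reads:
        sample = assign_read(i7, i5)
        if sample is None:
            discarded.append(read_id)  # mixed or unknown index pair
        else:
            assigned.setdefault(sample, []).append(read_id)
    return assigned, discarded


reads = [
    ("r1", "ATCACG", "TTAGGC"),  # valid pair for sample_A
    ("r2", "ATCACG", "TGACCA"),  # i7 of sample_A with i5 of sample_B
    ("r3", "CGATGT", "TGACCA"),  # valid pair for sample_B
]
assigned, discarded = demultiplex(reads)
print(assigned, discarded)
```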

Compared with WGS, WES costs less and is more cost-effective. It is suitable for high-depth studies of large sample sets, and because the captured and sequenced regions are well defined, it greatly improves the efficiency of research on specific target regions.

Simplified genome sequencing

Simplified genome sequencing uses restriction endonucleases to digest genomic DNA and sequences only a subset of the resulting fragments. The digested fragments are distributed almost randomly across the genome, including introns, exons and other regions. Because the enzyme cleavage sites within a species are relatively stable, the fragments detected in different individuals are basically the same. The main steps of simplified genome sequencing are:

a. Digest the genome with a restriction enzyme;

b. Select digested DNA fragments within a certain length range, recover them, and build a library for sequencing.
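The two steps above can be mimicked in silico. The sketch below cuts a sequence at every occurrence of a recognition site and keeps only fragments within a length window. The defaults model EcoRI (G^AATTC) on the top strand purely as a familiar example; the enzyme, the toy sequence and the size window are assumptions and are not the enzymes used by 2b-RAD or Super GBS.

```python
def digest(genome: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `genome` at every occurrence of `site`, `cut_offset` bases into the site.
    Only the top strand is modelled; sticky ends are ignored."""
    fragments, last_cut = [], 0
    pos = genome.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(genome[last_cut:cut])
        last_cut = cut
        pos = genome.find(site, pos + 1)
    fragments.append(genome[last_cut:])
    return fragments


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


genome = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"  # toy sequence
fragments = digest(genome)
print(fragments)
print(size_select(fragments, 5, 8))
```

Because the same sites are cut in every individual of a species, the selected fragments cover the same loci across samples, which is what makes the reduced data sets comparable.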

At present, there are two technologies commonly used: 2b-RAD and Super GBS.

The 2b-RAD technology uses type IIB restriction endonucleases (such as BsaXI) for digestion. These enzymes cleave on both sides of the recognition site, and the cleavage products are 33 bp tag sequences. After enrichment, the tags are used for high-throughput sequencing, and genome-wide SNP screening and typing is achieved through bioinformatics analysis.

Super GBS uses a methylation-sensitive restriction endonuclease to digest genomic DNA into gene-region fragments, which are then purified with magnetic beads. Fragments of relatively uniform length are selected to build a library, and SNP information and genotypes are obtained through high-throughput sequencing. It is a fast, simple and low-cost genotyping method.

When the genome is large and contains many repetitive sequences, Super GBS is the better choice. When the sample is degraded, 2b-RAD is preferred.

Gene chip

The gene chip is also a SNP detection method based on the principle of single-base extension. The chip is about the size of a glass slide and has micron-scale wells on its surface. Each well holds a microbead whose surface carries a large number of densely packed oligonucleotide probes. There are two types of probes.

In the first type, the last base at the 3' end of the probe corresponds to the base immediately before the SNP site. The fragment to be tested binds to the probe on the bead, and the probe is extended by one ddNTP labeled with a fluorescent dye (A and T give green fluorescence, C and G give red fluorescence). The type of SNP is determined by scanning the fluorescent signal on the chip. Because the two alleles must give different colors, this design detects the A-C, A-G, T-C and T-G mutation types.

In the second type, the last base of the probe covers the SNP site itself. If it is complementary to the fragment, the probe can be extended, a dye-labeled ddNTP is added, and the scan shows fluorescence; if it is not complementary, the probe cannot be extended and the scan shows no fluorescence.
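The two-color scheme explains why the first probe type cannot resolve every SNP. The sketch below encodes the dye assignment described above and calls a genotype from green and red intensities. The normalised intensities, the cut-off and the function names are assumptions for illustration. Because a base and its complement share a dye in this scheme, the colour is the same whichever strand is read.

```python
# Dye assignment as described in the text: A/T green, C/G red.
DYE = {"A": "green", "T": "green", "C": "red", "G": "red"}


def distinguishable_by_colour(allele_1: str, allele_2: str) -> bool:
    """The single-probe design works only if the two alleles carry different dyes."""
    return DYE[allele_1] != DYE[allele_2]


def call_from_colours(allele_1: str, allele_2: str,
                      green: float, red: float, cutoff: float = 0.2) -> str:
    """green / red: normalised intensities in [0, 1]; cutoff is hypothetical."""
    if not distinguishable_by_colour(allele_1, allele_2):
        raise ValueError("A/T and C/G SNPs need the second, allele-specific probe type")
    by_dye = {DYE[allele_1]: allele_1, DYE[allele_2]: allele_2}
    present = [by_dye[dye] for dye, signal in (("green", green), ("red", red))
               if signal >= cutoff]
    if not present:
        return "no call"
    if len(present) == 1:
        present = present * 2  # one colour only: homozygote
    return "/".join(sorted(present))


print(call_from_colours("A", "G", green=0.9, red=0.05))
print(call_from_colours("A", "G", green=0.5, red=0.6))
print(distinguishable_by_colour("A", "T"))
```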

This article introduces the commonly used SNP detection methods. Each method has its applicable scenarios and characteristics. In actual scientific research, you can choose the appropriate method according to the specific needs of the experiment.

Vazyme's recommended products:

Category | Product Name | Size | Cat.No.
qPCR | ChamQ™ Geno-SNP Probe Master Mix | 100 rxn/500 rxn | Q811
NGS | VAHTS® Universal DNA Library Prep Kit for Illumina V3 | 6 rxn/24 rxn | ND607
NGS | VAHTS® Universal Plus DNA Library Prep Kit for Illumina V2 | 6 rxn/24 rxn | ND627

Reference

1. Wu, P., Chen, D., Ding, W., Wu, P., Hou, H., Bai, Y., ... & Chen, G. (2021). The trans-omics landscape of COVID-19. Nature Communications, 12(1), 1-16.