Introduction to RNA-Seq
Introduction to RNA-Seq analysis in R
Department of Biomedicine and Prevention
Marco Chiapello, Revelo Datalab
March 2026
Three Stages of Transcriptomic Analysis
Sample Preparation

RNA Extraction

Importance of High-Quality RNA

Enrich a specific type of RNA
mRNA
rRNAs and tRNAs (involved in mRNA translation)
Small nuclear RNAs (involved in splicing)
Small nucleolar RNAs (involved in the modification of rRNAs)
MicroRNAs (regulate gene expression at the post-transcriptional level)
Long noncoding RNAs (chromatin remodelling, transcriptional control and posttranscriptional processing)
mRNA enrichment – Selectively enriching for poly(A)-tailed transcripts
RNA depletion – Selectively depleting abundant/off-target transcripts
RNA Fragmentation
Important
A sequencing library is essentially a pool of cDNA fragments (reverse-transcribed from RNA) with adapters attached
Attachment of adapters
Adapters are short, chemically synthesised oligonucleotides that are attached to the ends of DNA molecules (hence cDNA synthesis: RNA to DNA)
Index sequences within the adapters act as barcodes that identify which sample each read came from
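One common use of adapter barcodes (index sequences) is assigning pooled reads back to their samples. The sketch below illustrates the idea under simplifying assumptions: the barcode is taken to be the first four bases of each read, and the barcode-to-sample map, read strings, and function name are hypothetical. Real instruments usually read the index separately and tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample assignment (not from any real kit).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by the barcode found at the start of each read.

    Assumes an inline barcode of fixed length; reads whose barcode is
    not in the map are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy reads: barcode (4 bases) followed by the fragment sequence.
reads = ["ACGTGGGCCC", "TGCAATATAT", "ACGTTTTAAA", "NNNNCCCCGG"]
assigned = demultiplex(reads, BARCODES)
for sample, inserts in sorted(assigned.items()):
    print(sample, inserts)
```

Exact matching keeps the example short; allowing one mismatch per barcode would be a natural extension.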

Library quantification

How many samples do I need?
Power analysis
Type I error: controlled by the α value. Often set to 0.01 (1%) or 0.001 (0.1%) in RNA-seq experiments.
Type II error: controlled by the β value; (1−β) is the power of the analysis. Power is usually set to 70% or 80%, meaning the design should detect 70% or 80% of the truly differentially expressed genes. The number of biological replicates this requires can be hard to reach in practice for RNA-seq experiments.
Effect size: a parameter you set. For instance, to detect genes whose mean expression differs between treatments by 2, set the effect size to 2.
Sample size: the quantity you want to calculate.
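The four quantities above are linked: fixing α, power, and effect size determines the sample size. As a rough illustration, the sketch below uses the textbook normal approximation for a two-sided comparison of two group means. It is not an RNA-seq-specific method (dedicated tools model count dispersion and sequencing depth), and the function name and the standard deviation `sd` are assumptions introduced for the example.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, sd, alpha=0.01, power=0.8):
    """Approximate replicates per group for a two-sample comparison.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) * sd / effect_size)^2
    effect_size: difference in means to detect (same units as sd)
    sd: assumed within-group standard deviation (hypothetical input)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error threshold
    z_power = z.inv_cdf(power)          # quantile matching power = 1 - beta
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return math.ceil(n)


# Classic check: alpha = 0.05, power = 0.8, difference of one sd.
print(samples_per_group(effect_size=1.0, sd=1.0, alpha=0.05, power=0.8))  # 16

# Stricter alpha, as often used in RNA-seq, with an effect size of 2.
print(samples_per_group(effect_size=2.0, sd=1.0, alpha=0.01, power=0.8))
```

Lowering α or raising the target power both increase the required number of replicates, while a larger effect size reduces it.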
Power analysis for RNA-seq
General Considerations
RNA-seq experiments often suffer from low statistical power
Low power can lead to a lack of reproducibility of the research findings
The number of replicates is one of the critical parameters determining the power of an analysis
Replicates
Klaus B., EMBO J (2015) 34: 2727-2730
Do we need technical replicates?
No
Important
With current RNA-Seq technologies, technical variation is much lower than biological variation, so technical replicates are generally unnecessary
Do we need biological replicates?
YES
Important
Biological replicates are absolutely essential for differential expression analysis
For differential expression analysis, the more biological replicates, the better the estimates of biological variation and the more precise our estimates of the mean expression levels
Biological replicates are of greater importance than sequencing depth
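The link between replicate number and the precision of the estimated mean can be illustrated with a small simulation. This is a sketch under stated assumptions: expression of one gene is drawn from a normal distribution with a hypothetical true mean and biological standard deviation, which is a simplification of real count data, and the function name is invented for the example.

```python
import random
import statistics


def spread_of_mean_estimates(n_replicates, true_mean=10.0, bio_sd=1.0,
                             n_experiments=2000, seed=1):
    """Simulate many experiments and return the standard deviation of the
    estimated means, i.e. how much the estimate varies between experiments.

    true_mean and bio_sd are hypothetical values for a single gene.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    means = []
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, bio_sd) for _ in range(n_replicates)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


spread_3 = spread_of_mean_estimates(3)
spread_12 = spread_of_mean_estimates(12)
print(f"3 replicates:  spread of the mean estimate = {spread_3:.3f}")
print(f"12 replicates: spread of the mean estimate = {spread_12:.3f}")
```

The spread shrinks roughly with the square root of the number of replicates, so quadrupling the replicates about halves the uncertainty of the mean.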

Introduction to RNA Sequencing
(Primary analysis)
The central dogma

Why Study RNA?

The Power of RNA-Seq
A Brief History of Sequencing
First-generation sequencing
Sanger Sequencing

Advantages
Gold standard method for accurate detection of single nucleotide variants and small insertions/deletions
Cost-effective when single samples need to be tested urgently
Less reliant on computational tools than NGS
Can sequence longer fragments (up to approximately 1000 bp) than short-read NGS
Limitations
Limited throughput
Not cost-effective for sequencing many genes in parallel
Can require a larger amount of input DNA than NGS
Can only sequence short pieces of DNA, about 300 to 1000 base pairs
Sequence quality is often poor in the first 15 to 40 bases because that is where the primer binds
Sequence quality degrades after 700 to 900 bases
Second-generation NGS
Important
Second-generation NGS machines immediately began to drive the ‘genomics revolution’ by massively increasing throughput through the parallelization of many reactions
Second-generation sequencing platforms:
SOLiD: Sequencing by Oligonucleotide Ligation and Detection
454 GS FLX+: Pyrosequencing
NextSeq 550Dx: Sequencing by synthesis
Sequencing by ligation


Considered to be one of the most accurate second-generation sequencing technologies
A single run can take up to seven days, and the read length is short (35 bp)
Thermo Fisher Scientific shut down all SOLiD sequencing platforms in 2016
Pyrosequencing

Long read lengths
High reagent cost
High error rate for homopolymers
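A homopolymer is a run of identical consecutive bases (for example TTTTT). The length of such runs is hard to read out precisely, and these are the regions where the error rate is high. The sketch below simply locates such runs in a sequence; the function name, the minimum length of 3, and the example sequence are hypothetical choices for illustration.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases
    that is at least min_len long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length  # advance to the start of the next run
    return runs


runs = homopolymer_runs("ACGTTTTTGCAAAC")
print(runs)  # [(3, 'T', 5), (10, 'A', 3)]
```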
Sequencing by synthesis
Sequencing by synthesis - History
1997: Evolution of a Novel Approach to Sequencing

Shankar Balasubramanian and David Klenerman
1998: Formation of Solexa

2004: Molecular Clustering Technology Integration

Cluster generation (also known as “bridge amplification”)
2005: phiX-174 Genome Sequencing
2005: Integration of Lynx Therapeutics
2007: Illumina Acquires Solexa


Sequencing by synthesis - Process
Third-generation NGS
Important
Third-generation methods allow direct sequencing of single DNA molecules
Third-generation NGS platforms:
Single-molecule real-time sequencing
Nanopore sequencing
Single-molecule real-time (SMRT) sequencing
Nanopore sequencing
Single-End vs. Paired-End
Choosing Your Sequencing Strategy

| Trade-off | Single-End Sequencing | Paired-End Sequencing |
|---|---|---|
| Cost | Lower | Higher |
| Information | Less | More |
| Simplicity | Simpler | More Complex |
| Accuracy | Lower | Higher |
| Sequencing Depth | Higher | Potentially Lower |
| Read Length | Shorter | Longer (effective) |
| Ideal Applications | Gene expression, small RNA-Seq | Genome assembly, variant calling, isoform identification |
Sequencing Depth
Sequencing Depth: Finding the Right Balance

Transcriptome Coverage
Transcriptome Coverage: Capturing the Full Picture

Emerging Trends and Future Directions
Single-Cell RNA Sequencing
Zooming In: Analyze gene expression in individual cells

scRNA-Seq: advantages in biological research
Important
scRNA-seq represents a powerful tool for dissecting the complexities of gene expression and cellular heterogeneity across various biological contexts
Spatial Transcriptomics
Spatial Transcriptomics: mapping gene expression in context

Spatial Transcriptomics: from technology to discovery

Integration of RNA-Seq with other omics data
Beyond RNA: integrating with other omics data
Unlocking biological insights through data integration

Questions?