SARS-CoV-2 Results

ScanFold results for SARS-CoV-2, the virus that causes COVID-19 (NC_045512.2), can now be found on the RNAStructuromeDB: browse the results in IGV below or on JBrowse, or download all ScanFold output files here.

The SARS-CoV-2 genome is a single-stranded RNA molecule approximately 30,000 nucleotides long. The ScanFold program has been used to characterize its RNA folding landscape, highlighting regions of likely structure and function that serve as ideal targets for further analysis. Read about the analysis and our findings here: "An in silico map of the SARS-CoV-2 RNA Structurome", bioRxiv 2020.

ScanFold results (using a 120 nt window and a 1 nt step size) are shown in the IGV genome browser below. Scanning-window metrics are reported beneath the ScanFold-predicted structures, which are depicted as arc diagrams. Each arc depicts a base pair and is colored according to how unusually stable the depicted structure is: yellow indicates the structure is slightly more stable, green indicates one standard deviation more stable, and blue indicates two standard deviations more stable than expected based on sequence composition alone. When a structure is significantly more stable than expected, it may indicate a potential for function. You can read more about this analysis and how to interpret the results in the papers linked below.
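
As an illustration of the scan and color scheme described above, here is a minimal Python sketch. It is not ScanFold's own code. The function names `iter_windows` and `arc_color` are hypothetical, the 1-based inclusive coordinates are an assumption, and so are the exact cutoffs: "slightly more stable" is read here as any negative z-score above -1, and the boundaries at -1 and -2 are treated as inclusive.

```python
def iter_windows(sequence, window=120, step=1):
    """Yield (start, end, subsequence) for each scanning window.

    Coordinates are 1-based and inclusive (an assumption made for this
    sketch). Sequences shorter than the window yield nothing.
    """
    for i in range(0, len(sequence) - window + 1, step):
        yield i + 1, i + window, sequence[i:i + window]


def arc_color(z_score):
    """Map a z-score to an arc color using assumed cutoffs.

    blue   : two or more standard deviations more stable (z <= -2)
    green  : one or more standard deviations more stable (z <= -1)
    yellow : slightly more stable (z < 0)
    None   : not more stable than expected
    """
    if z_score <= -2:
        return "blue"
    if z_score <= -1:
        return "green"
    if z_score < 0:
        return "yellow"
    return None


# A toy 128 nt sequence stands in for the genome.
toy_sequence = "ACGU" * 32
windows = list(iter_windows(toy_sequence))
first_start, first_end, _ = windows[0]
last_start, last_end, _ = windows[-1]
```

With a 1 nt step, adjacent windows overlap by 119 nt, so almost every nucleotide is evaluated in many windows.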

Read about the ScanFold method and how to interpret results in the following papers:

Andrews RJ, Baber L, Moss WN: Mapping the RNA structural landscape of viral genomes. Methods 2019.
Andrews RJ, Roche J, Moss WN: ScanFold: an approach for genome-wide discovery of local RNA structural elements—applications to Zika virus and HIV. PeerJ 2018.

The files listed below have been formatted for JBrowse (e.g., BigWig tracks and indexed bgzipped GFF3 files) and will be updated and made available for download as they are added to the structurome.

SARS-CoV-2 Files:

| Materials & Methods | Description and/or Program Settings | File |
|---|---|---|
| High Resolution Figures | Zipped folder containing all high-resolution figures (TIFF format) from | |
| Supplemental Tables (10.1101/2020.04.17.045161) | Supplemental tables described in Andrews et al., "An in silico map of the SARS-CoV-2 RNA Structurome", bioRxiv 2020. | |
| Scanning Window Results (SARS-CoV-2) | This GFF3 file comprises all scanning window results (MFE, z-score, p-value, ED, native sequence, dot-bracket MFE structure, and centroid structure). It has been bgzipped for viewing in JBrowse or other genome viewers. | nc_045512.2.strand1.gff3.gz |
| SARS-CoV-2 Results | The full results of the ScanFold analysis of SARS-CoV-2 have been zipped into this folder (including Supplementary Dataset 1 from | |
| SARS-CoV-2-ExtractedStructures | This file contains all structures that contain at least one base pair with a Zavg < -1. The sequences comprising these structures were then refolded individually, and the z-score and ensemble diversity were recalculated for each motif. | extractedstructures.txt |
| Thermodynamic z-score | BigWig track format for JBrowse or other genome browsers. The z-score is calculated for each window of the input sequence. For each window there are two sets of sequences: the native sequence and 100 randomized sequences with the same nucleotide content. MFE values are calculated for each. If the native MFE is much lower than the average MFE of the randomized versions, the z-score is negative; if the native MFE is more positive (i.e., less stable), the z-score is positive. The equation normalizes the difference by dividing by the standard deviation of all MFEs. The magnitude of the z-score therefore gives the number of standard deviations separating the native window MFE from the random MFEs. | |
| p-value | BigWig track format for JBrowse or other genome browsers. These values report the fraction of randomized-sequence MFE values that were more stable than the native MFE during calculation of the z-score (100 randomizations). | |
| MFE (kcal/mol) | BigWig track format for JBrowse or other genome browsers. MFE values calculated for 120 nt sequences using RNAfold (v2.4.14). | |
| Ensemble Diversity Value | BigWig track format for JBrowse or other genome browsers. High values indicate that diverse structures can form; low values indicate that a single dominant structure may be forming. | |
| SARS-CoV-2 Gene Features | The SARS-CoV-2 gene features reported at NCBI (accession NC_045512.2), in bgzipped GFF3 format. | sequence.sorted.gff3.gz |
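
The z-score and p-value descriptions above can be illustrated with a short Python sketch. It is not taken from ScanFold. The function names `window_z_score` and `window_p_value` are hypothetical, and the MFE values are made-up numbers rather than RNAfold output. Two details are assumptions: the mean and standard deviation are taken over the randomized MFEs only (the description's "all MFEs" may also include the native value), and "more stable" is read as strictly lower MFE.

```python
from statistics import mean, pstdev


def window_z_score(native_mfe, random_mfes):
    """Standard deviations separating the native MFE from the random MFEs.

    Negative values mean the native window is more stable (lower MFE)
    than randomized sequences of the same nucleotide content.
    Assumption: mean and SD are computed over the randomized MFEs only.
    """
    mu = mean(random_mfes)
    sigma = pstdev(random_mfes)
    if sigma == 0:
        # All randomized MFEs are identical; the z-score is undefined,
        # so this sketch returns 0.0 rather than dividing by zero.
        return 0.0
    return (native_mfe - mu) / sigma


def window_p_value(native_mfe, random_mfes):
    """Fraction of randomized MFEs that are more stable than the native MFE.

    Assumption: "more stable" means strictly lower MFE.
    """
    more_stable = sum(1 for mfe in random_mfes if mfe < native_mfe)
    return more_stable / len(random_mfes)


# Made-up MFE values in kcal/mol; the analysis described here uses
# 100 randomizations per window, four are shown to keep the example short.
random_mfes = [-20.0, -22.0, -18.0, -20.0]
stable_z = window_z_score(-24.0, random_mfes)
stable_p = window_p_value(-24.0, random_mfes)
weak_z = window_z_score(-19.0, random_mfes)
weak_p = window_p_value(-19.0, random_mfes)
```

In this toy case the native MFE of -24.0 kcal/mol lies below every randomized value, giving a strongly negative z-score and a p-value of 0, while the native MFE of -19.0 kcal/mol gives a positive z-score and a p-value of 0.75.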