DNA sequencing analysis

Understand the effects of genomic variation and mutations with DNA sequencing analysis

We routinely analyze whole-genome, whole-exome and targeted re-sequencing data from well-characterized organisms, such as human, to identify small genetic variants (single nucleotide polymorphisms and short insertions and deletions (indels)) using established best-practice methods. To help our clients interpret their variant data, we continuously develop our DNA sequencing analysis and variant annotation pipeline to include more information on each identified variant. For example, pathogenicity predictions and minor allele frequencies from databases such as 1000 Genomes and gnomAD provide an effective way to filter irrelevant variation from your results.
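As a minimal sketch of frequency-based filtering, the example below keeps variants that are rare or unobserved in a reference population. It is an illustration only, not the pipeline described above: the `Variant` fields, the `filter_rare` helper, the 1% threshold and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    # Allele frequency in a reference population database;
    # None means the variant was not found in that database.
    population_af: Optional[float]


def filter_rare(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    """Keep variants that are rare (AF <= max_af) or absent from the database."""
    return [
        v for v in variants
        if v.population_af is None or v.population_af <= max_af
    ]


# Made-up example records (hypothetical positions and frequencies).
variants = [
    Variant("chr1", 12345, "A", "G", 0.25),    # common, filtered out
    Variant("chr2", 67890, "C", "T", 0.0004),  # rare, kept
    Variant("chr7", 13579, "G", "GA", None),   # unobserved, kept
]

rare = filter_rare(variants)
for v in rare:
    print(v.chrom, v.pos, v.ref, v.alt, v.population_af)
```

The appropriate threshold depends on the study: a rare-disease analysis would typically use a much stricter cutoff than a population-genetics study.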

Our clients working in oncology are also interested in characterizing somatic mutations that are not limited to small-scale genomic events. We have worked on characterizing gene copy number variation in tumor and cancer cell line samples from both microarray and NGS data, and integrating them with expression data to quantify oncogenic gene dosage effects in different tumor types. We have also concentrated on developing DNA sequencing analysis pipelines to discover copy number neutral genomic rearrangements leading to novel oncogenic fusion genes.
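One simple way to look for a gene dosage effect is to correlate a gene's copy number with its expression across samples. The sketch below does this with a plain Pearson correlation; the gene names and values are made up, and a real analysis would also need significance testing and multiple-testing correction, which are omitted here.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sample copy numbers and expression values for two genes.
copy_number = {
    "GENE_A": [2, 2, 3, 4, 6],
    "GENE_B": [2, 1, 2, 3, 2],
}
expression = {
    "GENE_A": [10.0, 11.0, 15.0, 21.0, 30.0],
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0],
}

dosage_correlation = {
    gene: pearson(copy_number[gene], expression[gene]) for gene in copy_number
}
for gene, r in dosage_correlation.items():
    print(f"{gene}: r = {r:.2f}")
```

In this toy data, expression of the hypothetical GENE_A rises with its copy number while GENE_B shows no clear relationship.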

A third major application of DNA sequencing is genome assembly of non-model organisms. We produce genome assemblies from WGS data, which are then computationally post-processed for the best possible quality. The assembled genomes are annotated using gene prediction, automated homology searches against genome databases, and gene annotation transfer from closely related organisms. Thorough annotation of novel genomes ensures the best possible starting point for transcriptome studies on these organisms.
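Assembly quality is commonly summarized with contiguity statistics. As a minimal sketch, the widely used N50 statistic (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly) can be computed from contig lengths as shown below. Whether N50 is among the quality measures reported for the assemblies described here is an assumption, and the contig lengths are made up.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (in kilobases, for readability).
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(contigs))
```

N50 measures contiguity only; it says nothing about correctness, so it is usually read alongside completeness and error estimates.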

Annotations are key to understanding genome analysis – we reflect the findings to public databases that are relevant in your research

Antti Ylipää, CEO and co-founder, Genevia Technologies Oy

See examples of DNA sequencing analysis below:

  • Variant calling: A properly filtered list of variants helps you concentrate on relevant findings
    • Our statistical approaches to variant calling follow current best practices and yield a reliable set of variants. Natural variants and single nucleotide polymorphisms can be called against any reference genome in any organism, or even against a genomic ensemble compiled from individual genomes from sequencing projects for better representation. In addition to the high-confidence variants, we report regions of low coverage where the variant caller could not determine the sequence of the samples. Whole-genome, whole-exome and targeted DNA sequencing data are all suitable for variant calling. The lists of variants can be further combined, compared and filtered, for example to find disease-causing de novo germline variants in trio studies.

      Deliverables:

      • Full variant lists for all samples with evidence from data
      • Filtered variant lists based on any criteria (e.g. comparison against a germline control)
      • Low-coverage regions where variants could not be called
  • Variant annotation: Turn a list of variants into genomic information with relevant annotations
    • Genetic variants are annotated with their location in the genome, zygosity (homozygous/heterozygous), evidence from data (supporting reads), functional classification for exonic variants, amino acid changes in all isoforms, database identifiers for known variants, and observed minor allele frequencies in several genome databases or even in your own data. We also provide pathogenicity predictions for each exonic variant using several prediction tools. Flexible ranking and filtering of the variants based on these annotations makes complex genomic data easier for a geneticist or a physician to interpret.

      Deliverables:

      • Functional and location annotation for every variant
      • Minor allele frequencies in relevant databases
      • Database identifiers for known variants
      • Pathogenicity predictions
  • Copy number analysis: Explain regulatory and phenotypic differences with aberrant gene copy numbers
    • Gene copy numbers can be deduced from sequencing data using our statistical approaches, which analyze both coverage and allele frequency information. The analysis yields copy numbers for chromosome-scale segments, for each gene, and for each exon independently. Gene copy numbers can be further integrated with expression data, for example to find significant gene dosage effects.

      Deliverables:

      • Copy number for each chromosome
      • Gene copy number for each gene
      • Copy number for each exon
  • Genomic rearrangements: Whole genome sequencing enables you to see every aberration in your genomes
    • Whole genome sequencing data coupled with read-pair information from paired-end sequencing can be used to study copy number neutral genomic rearrangements such as inversions and translocations. These can result in fusion genes, which in some cases are critically linked to the formation of cancer. We deliver the altered genome structure with ranked fusion genes that can be validated with RNA-sequencing data.

      Deliverables:

      • List of potential fusion genes
      • List of all rearrangements
  • Genome assembly and refinement: Accurate genome builds act as the perfect starting point for any study
    • For simpler organisms, we offer de novo assembly of their genomes based on DNA-sequencing data. Our approach builds a consensus assembly from the outputs of several assembly tools and then runs computational post-assembly improvement software. If a draft genome exists, we can refine it computationally by joining contigs and resolving errors using the improvement tools or additional DNA-seq or RNA-seq data.

      Deliverables:

      • Assembled contigs in FASTA format
      • Computationally refined genome assembly
      • Quality estimation scores
  • Genome annotation: Computational annotation lets you pinpoint new genes in your genomes
    • Assembled genomes can be annotated using gene prediction and oriC prediction software and/or based on RNA-seq data. We predict gene identities for all putative genes by comparing their sequences to several genome databases, and for genes with less sequence similarity, functions can be predicted from identified functional domains. If annotated genomes for close relatives exist, we can improve the annotation by transferring gene information to the unannotated genome using sequence alignment-based approaches. The result is a comprehensive list of genes with their coordinates in the genome.

      Deliverables:

      • Loci of predicted genes
      • Fully annotated genes based on homolog searches
      • Validated genes based on RNA-seq data
  • Neoantigen discovery: Leverage sequencing data to support your immunotherapy research
    • Identification of patient-specific tumor neoantigens (novel protein sequences that are created by tumor-specific DNA alterations) is one of the cornerstones of cancer immunotherapy. We can interrogate exome sequencing data for non-synonymous somatic mutations in coding regions, and translate these in silico to peptides containing the mutation. Additional RNA-sequencing can be used to focus on highly expressed genes to ascertain high epitope abundance, as well as to look for neoantigens arising from alternative splicing, exon skipping and translocations. The lists of epitopes can be further filtered or ranked algorithmically by analyzing aspects such as the likelihood of proteasomal processing, transport into the endoplasmic reticulum and affinity for the relevant MHC class I alleles.

      Deliverables:

      • Full non-synonymous DNA variant lists
      • Expression levels for each mutated exon
      • Computationally ranked list of potential epitopes
  • Cell-free DNA biomarker discovery: Sequencing analysis of liquid biopsies for diagnostics development
    • Circulating cell-free DNA holds potential for non-invasive genomic biomarkers, in particular for prenatal diagnosis and oncology. The mere presence of certain DNA sequences in plasma can reveal a tumor undetected by other means. Furthermore, mutations detected in circulating DNA can be used as markers in personalizing treatment and prognosis. Our pipeline for cell-free DNA-based biomarker discovery starts with full quality control of the data, followed by statistical comparison of pathological and control groups to uncover biomarkers with the optimal combination of sensitivity and specificity. Considering biological factors along with clinical feasibility, we summarize the analysis by highlighting the most promising biomarker candidates.

      Deliverables:

      • List of biomarker candidates from cell-free DNA
      • Sensitivity and specificity estimations for each candidate
      • Database identifiers for known mutations and pathogenicity predictions
  • Metagenomic analysis: Study microbial species composition and its changes in your samples
    • Metagenomics offers an unbiased view into the microbial diversity of ecological niches, including samples from host organisms and soil. Using whole-genome or, alternatively, 16S rRNA gene sequencing data, we assemble the sequence reads into contigs and assign them to species or operational taxonomic units (OTUs). Then, we quantify the abundance of each taxon. In the case of multiple samples, we compare the abundances and associate them with host phenotype or environmental factors. For whole-genome studies, we identify and annotate genes using both sequence homology and computational gene prediction.

      Deliverables:

      • Quantitative characterization of microbial diversity
      • Association of species/OTU and host phenotype or other environmental factors
      • Identified and predicted genes with custom annotations
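
The abundance quantification mentioned under metagenomic analysis can be illustrated with a small sketch. Assuming reads have already been assigned to taxa, the example below converts per-taxon read counts into relative abundances and computes the Shannon diversity index as one common summary of microbial diversity. The taxon names and counts are made up, and the choice of the Shannon index is an assumption rather than a description of the analysis offered here.

```python
from math import log
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    proportions = relative_abundance(counts).values()
    return -sum(p * log(p) for p in proportions if p > 0)


# Hypothetical read counts per taxon for two samples.
samples = {
    "sample_1": {"taxon_A": 500, "taxon_B": 300, "taxon_C": 200},
    "sample_2": {"taxon_A": 980, "taxon_B": 15, "taxon_C": 5},
}

for name, counts in samples.items():
    abundances = relative_abundance(counts)
    print(name, abundances, f"H = {shannon_index(counts):.3f}")
```

A sample dominated by a single taxon, like the hypothetical sample_2, gets a lower index than a more even community. Real analyses usually also account for sequencing depth before comparing samples.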