On Bioinformatics Workflow Design

Posted on Fri 26 April 2024 in commentary • Tagged with workflows

I've been writing bioinformatics workflows since grad school, usually to process NGS data. The concept of a workflow is simple, and not limited to the domain of bioinformatics. However, a workflow (aka "pipeline") used to analyze data from next generation sequencing (again, will it ever be "current …


Continue reading

Making Volcano Plots With ggplot2

Posted on Sun 21 April 2024 in how-to • Tagged with bioinformatics, data-visualization, rnaseq

One of the, if not the, most common downstream analysis tasks I'm asked to perform on RNAseq data is to generate the venerable "Volcano Plot." These are kind of the bioinformatics equivalent of saying "Hey! Look how much data I have!" Regardless, they are a pretty good way to quickly …


Continue reading

Managing Software on a Multiuser Linux System - An Update

Posted on Sat 20 April 2024 in how-to • Tagged with sysadmin

Back in 2019, the halcyon days of yore, near the end of my time in graduate school, I wrote a well-intentioned article about software management for multi-user Linux systems (here). This original article was based on my experiences as the de facto sysadmin of our lab's bioinformatics server. I am …


Continue reading

Publications, Dissertations, Job Hunts, and a Pandemic

Posted on Sat 30 May 2020 in commentary • Tagged with grad school, jobs

I started this GitHub site as a place to expand my professional reach by posting my random musings on bioinformatics, Linux, data science, etc. I made a few reasonably cogent posts, but then life got in the way! It's been a really busy time, a very eventful year. I'm …


Continue reading

Just Write Your Own Python Parsers for .fastq Files

Posted on Thu 22 August 2019 in commentary • Tagged with bioinformatics, python, workflows

In contrast to the Zen of Python, there are actually many ways to handle sequence data in Python. There are several packages on PyPI that provide parsers for sequence formats like .fastq and .fasta. I've never bothered with these, including the oft-used Biopython. I vaguely remembered Biopython being slower than …


Continue reading

The Snakemake Tutorial I Wish I Had

Posted on Mon 19 August 2019 in how-to • Tagged with bioinformatics, python, workflows, snakemake

Over the past few years the use of workflow managers in genomics and bioinformatics has grown considerably. This is a great thing for the field and adds to our ability to perform reproducible analyses, especially for pipelines with many steps. These are common in bioinformatics, but prior to the use …


Continue reading

Suggestions for Reproducible Bioinformatic Analyses

Posted on Fri 09 August 2019 in commentary • Tagged with bioinformatics, thoughts, workflows

Bioinformatic analyses often require lengthy workflows or pipelines, where the output of program A feeds into program B, and so on. These programs may also not output their results in a format which is convenient to use in the subsequent steps, requiring writing a conversion script, or piping its output …


Continue reading

Efficiently Filtering While Reading Data Into R (With Python?!)

Posted on Wed 17 July 2019 in how-to • Tagged with bioinformatics, data-science, r, python, big-data

Working with large amounts of tabular data is a daily occurrence for both bioinformaticians and data scientists. There's a lot the two groups can learn from each other (great future post material). However, I recently ran into a situation that I was sure had to be relatively common. Apparently it …


Continue reading

Variations on RNAseq Workflows for DEG Analysis

Posted on Tue 09 July 2019 in commentary • Tagged with bioinformatics, thoughts, rnaseq, workflows

When analyzing RNAseq data you're faced with many possible analysis pipelines. The biggest decision you need to make is what the purpose of your experiment is. I will make the assumption that most of the time people want to determine which genes are differentially expressed between two samples, genotypes, conditions, etc …


Continue reading

Making Better Metaplots With ggplot, Part 2

Posted on Fri 28 June 2019 in how-to • Tagged with bioinformatics, data-visualization

Last time we prepared our data using deepTools.

Now we're going to do something kind of scandalous. R and Python, living together in peace. What is this madness? I like R's ecosystem for manipulating data and plotting with the tidyverse. It still requires some tweaking, but with a bit of …


Continue reading