Polyglot R and Python Bioinformatics and Data Science Projects Using Jupyter Notebooks

Posted on Sun 02 June 2024 in how-to • Tagged with bioinformatics, data-science, jupyter, notebooks, tutorial, r, python, sysadmin

TLDR - Check out this GitHub repo for a (still really wordy) example: polyglot_jupyter_example

If you're anything like me, and there are probably tens of you out there, you enjoy working in multiple programming languages for your bioinformatics/data science work. Perhaps you love the tidyverse R ecosystem for data manipulation …



Nextflow With First Class Metadata: A Minimal Example

Posted on Fri 24 May 2024 in how-to • Tagged with workflows, bioinformatics, RNAseq

TLDR - Check out this GitHub repo for the full example: https://github.com/groverj3/minimal_nextflow_samplesheet_example

I recently wrote an article regarding some of my opinions on bioinformatics workflow design. I've written workflows in several languages over the years, but at this point it seems that Nextflow has become something of …



Bridging the Gap With Wet Lab Using R Shiny

Posted on Sat 04 May 2024 in how-to • Tagged with R, shiny, app, RNAseq, bioinformatics, data-visualization

How do you communicate the results of an analysis? What tools do you use? Scientists who work in the wet lab are accustomed to firing up Excel or some instrument-specific software and working with their own data. For genomics or other types of experiments in biology that result in large datasets …



Making Volcano Plots With ggplot2

Posted on Sun 21 April 2024 in how-to • Tagged with bioinformatics, data-visualization, rnaseq

One of the, if not the, most common downstream analysis tasks I'm asked to perform on RNAseq data is to generate the venerable "Volcano Plot." These are kind of the bioinformatics equivalent of saying "Hey! Look how much data I have!" Regardless, they are a pretty good way to quickly …



Managing Software on a Multiuser Linux System - An Update

Posted on Sat 20 April 2024 in how-to • Tagged with sysadmin

Back in 2019, the halcyon days of yore, near the end of my time in graduate school, I wrote a well-intentioned article about software management for multi-user Linux systems (here). This original article was based on my experiences as the de facto sysadmin of our lab's bioinformatics server. I am …



The Snakemake Tutorial I Wish I Had

Posted on Mon 19 August 2019 in how-to • Tagged with bioinformatics, python, workflows, snakemake

Over the past few years the use of workflow managers in genomics and bioinformatics has grown considerably. This is a great thing for the field and adds to our ability to perform reproducible analyses, especially for pipelines with many steps. These are common in bioinformatics, but prior to the use …



Efficiently Filtering While Reading Data Into R (With Python?!)

Posted on Wed 17 July 2019 in how-to • Tagged with bioinformatics, data-science, r, python, big-data

Working with large amounts of tabular data is a daily occurrence for both bioinformaticians and data scientists. There's a lot the two groups can learn from each other (great future post material). However, I recently ran into a situation that I was sure had to be relatively common. Apparently it …



Making Better Metaplots With ggplot, Part 2

Posted on Fri 28 June 2019 in how-to • Tagged with bioinformatics, data-visualization

Last time we prepared our data using deepTools.

Now we're going to do something kind of scandalous. R and Python, living together in peace. What is this madness? I like R's ecosystem for manipulating data and plotting with the tidyverse. It still requires some tweaking, but with a bit of …



Making Better Metaplots With ggplot, Part 1

Posted on Thu 27 June 2019 in how-to • Tagged with bioinformatics, data-visualization

Commonly, in bioinformatics, we're in the business of determining whether something, be it gene expression, DNA methylation, splicing, etc., is different between multiple conditions. Typically this would be done by comparing those data and using some kind of statistical test. However, with the continued advances in sequencing technologies …



Managing Software on a Multiuser Linux System

Posted on Tue 25 June 2019 in how-to • Tagged with sysadmin

When I started my Ph.D. I had a good amount of experience working in a Linux environment on my own computers. Mostly as a hobby. My advisor had bought a small server several years earlier for a post-doc's project and I was offered this system to use for my …

