Polyglot R and Python Bioinformatics and Data Science Projects Using Jupyter Notebooks

Posted on Sun 02 June 2024 in how-to • Tagged with bioinformatics, data-science, jupyter, notebooks, tutorial, r, python, sysadmin

TL;DR: Check out this GitHub repo for a (still really wordy) example: polyglot_jupyter_example

If you're anything like me, and there are probably tens of you out there, you enjoy working in multiple programming languages for your bioinformatics/data science work. Perhaps you love the tidyverse R ecosystem for data manipulation …


Continue reading

Just Write Your Own Python Parsers for .fastq Files

Posted on Thu 22 August 2019 in commentary • Tagged with bioinformatics, python, workflows

In contrast to the Zen of Python, there are actually many ways to handle sequence data in Python. There are several packages on PyPI that provide parsers for sequence formats like .fastq and .fasta. I've never bothered with these, including the oft-used Biopython. I vaguely remembered Biopython being slower than …


Continue reading

The Snakemake Tutorial I Wish I Had

Posted on Mon 19 August 2019 in how-to • Tagged with bioinformatics, python, workflows, snakemake

Over the past few years, the use of workflow managers in genomics and bioinformatics has grown greatly. This is a great thing for the field and adds to our ability to perform reproducible analyses, especially for pipelines with many steps. Such pipelines are common in bioinformatics, but prior to the use …


Continue reading

Efficiently Filtering While Reading Data Into R (With Python?!)

Posted on Wed 17 July 2019 in how-to • Tagged with bioinformatics, data-science, r, python, big-data

Working with large amounts of tabular data is a daily occurrence for both bioinformaticians and data scientists. There's a lot the two groups can learn from each other (great future post material). However, I recently ran into a situation that I was sure had to be relatively common. Apparently it …


Continue reading