DataScience

Federal R&D Spending on climate change

An exploratory data analysis (EDA), using `R`, of federal government data on the R&D budget for climate change.

Docker container and image management within Emacs

Note taken on [2020-02-24 Mon 07:21] Rewritten to improve clarity and correct grammar. Using the docker package in Emacs has saved me several minutes on each docker-related command and, just as important, a tonne of effort otherwise spent hunting for docker container names, searching command history, copying container IDs and so on, all of which are typical steps when messing around with docker. Anybody learning docker will know that these commands are used so frequently that repeating them quickly becomes annoying.

Setting up Continuous Integration (CI) for docker containers

This blog post goes through the process of setting up Continuous Integration for building docker images via Docker Hub and GitHub, and via GitHub Actions. It also contains a condensed summary of important notes from the documentation that will be handy as a ready reference. Goal: Gain an overview of using Continuous Integration (CI) for automated builds of the docker images that are built for a data science toolbox based on R (for now).

Notes - What they forgot to teach you about R

The book ‘What they forgot to teach you about R’, co-authored by <https://twitter.com/JennyBryan @JennyBryan>, is not yet complete; however, I was still compelled to go through the existing material, as it was an engaging read. These are some notes captured from the book. Verbatim quotes from the book are encapsulated. My notes and observations are added in plain text. I recommend you cultivate a workflow in which you treat R processes (a.

A graphic overview of the 'binary' with respect to R packages

Recently there was a question as to what a binary is, building off a question posted on the RStudio Community forum. I’ve always found these aspects interesting, though it is a little hard to keep track of the connections and flow, so I’ve made a flowchart that will help me remember and hopefully explain what is happening to a noob. In the process, I was reminded of one of the first documents I really enjoyed reading when I started learning how to use Linux.

Some notes on research-compendium

These are my notes while studying the research-compendium concept, which is essentially a set of guidelines for producing research that is ‘easily’ reproducible. The notes are mostly based on marwick-2018-packag-r, which is one canonical reading on the concept. Other references are mentioned throughout the text, and also collected separately. These notes were prepared a few weeks ago during a foray into Docker. They are neither complete nor comprehensive, but will serve as a good refresher of the principal concepts.

Notes on Docker

Docker is a fascinating concept that could be useful in many ways, especially in data science and in making reproducible workflows / environments. There are several articles which have great introductions and examples of using Docker in data science. This is an evolving summary of my exploration with Docker. It should prove to be a handy refresher of commands and concepts. TODO What is Docker A brief summary of what Docker is all about.

Rapidly accessing cheatsheets to learn data science with Emacs

Matt Dancho’s course DSB-101-R is an awesome course to step into ROI-driven business analytics fueled by data science. In this course, among many other things, he teaches methods to understand and use cheatsheets to gain rapid level-ups, especially to find information connecting various packages, functions and workflows. I have been hooked on this approach and needed a way to quickly refer to the different cheatsheets as needed.

Jupyter notebooks to Org source + Tower of Babel

This post provides a simple example demonstrating how a shell script can be called with appropriate variables from any Org file in Emacs. The script essentially converts a Jupyter notebook to Org source, and Babel is leveraged to call the script with appropriate variables from any Org file. This Reddit thread and blog post elucidate the advantages of using Babel and Org mode over Jupyter notebooks. Directly editing code in a Jupyter notebook in a browser is not an attractive long-term option and is inconvenient even in the short term.

Nteract : An interactive computing environment

A slide deck from Netflix mentions using Nteract as their programming notebook, and prompted a mini exploration. This blog post by Safia Abdalla (a maintainer/developer of Nteract) introduces Nteract as an open source, desktop-based, interactive computing application that was designed to overcome a number of limitations in Jupyter Notebook’s design philosophy. One key difference (among many others) is the ability to execute code in a variety of languages within a single notebook, and it also appears that the Electron-based desktop app should make it easier for beginners to start coding.

Technical notes : Research paper on learning/teaching data science

Title: Navigating Diverse Data Science Learning: Critical Reflections Towards Future Practice. Author: Yehia Elkhatib. Download link. These are my notes on the above paper, which mainly details the methods explored and implemented to impart a high quality of education in data science. The paper also provides an interesting breakdown of the different roles in data science workflows. The importance of being able to work in a team is highlighted.