Data-Science

Docker driven Data Science with R

Cascading Docker images, adapted from Matrix DS, that create a reproducible, standard, consistent environment for running data science projects in both development and production modes. The images make it quick and easy to deploy dashboard frameworks like Shiny.

Federal R&D Spending on climate change

An EDA in `R` of federal government data on the R&D budget for climate change.

Using ESS for Data Science

RStudio is a formidable IDE that offers an environment for working seamlessly with multiple languages beyond R. It is especially convenient for tasks involving frequent visualisation of data frames and plots, and for Shiny app development. However, its text (i.e. code) editing capabilities still lag significantly behind the likes of Emacs and Vim. Besides this, it does not integrate task management, time management and multi-language programming environments to the extent available within Org-mode via Emacs.

Docker container and image management within Emacs

Note taken on [2020-02-24 Mon 07:21] Rewritten to improve clarity and make grammatical corrections. Using the docker package in Emacs has saved me several minutes on each docker command and, just as important, a tonne of effort spent hunting for docker container names, searching command history, copying container IDs and so on, all very typical steps when messing around with docker. Anybody learning docker will know that these commands are used so frequently that they quickly become rather annoying.

Setting up Continuous Integration (CI) for docker containers

This blog post goes through the process of setting up Continuous Integration for building docker images via Docker Hub and GitHub, and via GitHub Actions. It also contains a condensed summary of important notes from the documentation that will be handy as a ready reference. Goal: Gain an overview of using Continuous Integration (CI) for automated builds of the docker images that are built for a data science toolbox based on R (for now).

Notes - What they forgot to teach you about R

The book, ‘What they forgot to teach you about R’, being co-authored by @JennyBryan (https://twitter.com/JennyBryan), is not yet complete; however, I was still compelled to go through the existing material as it was an engaging read. These are some notes captured from the book. Verbatim quotes from the book are encapsulated. My notes and observations are added in plain text. I recommend you cultivate a workflow in which you treat R processes (a.

Some notes on research-compendium

These are my notes from studying the research-compendium concept, which is essentially a set of guidelines for producing research that is ‘easily’ reproducible. The notes are mostly based on marwick-2018-packag-r, which is one canonical reading on the concept. Other references are mentioned throughout the text, and also collected separately. These notes were prepared a few weeks ago during a foray into Docker. They are neither complete nor comprehensive, but will serve as a good refresher of the principal concepts.

Notes on Docker

Docker is a fascinating concept that could be useful in many ways, especially in data science and in building reproducible workflows and environments. There are several articles which have great introductions and examples of using docker in data science. This is an evolving summary of my exploration with Docker. It should prove to be a handy refresher of commands and concepts. TODO What is Docker: A brief summary of what Docker is all about.

R notes and snippets

Lubridate - introductory technical paper: This paper (Grolemund and Wickham) offers a good introduction and comparison between using lubridate and not using it, as well as several examples of using the library. It also offers some case studies which can serve as useful drill exercises. Importing multiple Excel sheets from multiple Excel files: This is one approach to importing multiple sheets from multiple Excel files into a list of tibbles. The goal is that each sheet is imported as a separate tibble.

MongoDB and NoSQL Databases

Introduction: These are my notes on NoSQL databases and the prime differences between them and SQL databases. The notes are mostly based on the Udemy course Introduction to MongoDB, and are therefore primarily focused on using MongoDB at the moment. Methodology and Tools: Installing MongoDB: The instructions are available in the MongoDB manual. This is for the Community edition, on a Mac as well as a Linux machine (Antergos). Mac: If never installed before, tap the resource first.

Rapidly accessing cheatsheets to learn data science with Emacs

Matt Dancho’s course DSB-101-R is an awesome course for stepping into ROI-driven business analytics fueled by Data Science. In this course, among many other things, he teaches methods to understand and use cheatsheets to gain rapid level-ups, especially to find information connecting various packages, functions and workflows. I have been hooked on this approach and needed a way to quickly refer to the different cheatsheets as needed.

Nteract: An interactive computing environment

A slide deck from Netflix mentions using Nteract as their programming notebook, and prompted a mini exploration. This blog post by Safia Abdalla (a maintainer/developer of Nteract) introduces Nteract as an open source, desktop-based, interactive computing application that was designed to overcome a bunch of limitations in Jupyter Notebook’s design philosophy. One key difference (among many others) is the ability to execute code in a variety of languages within a single notebook, and it also appears that the Electron-based desktop app should make it easier for beginners to start coding.