Recommending Songs Using Cosine Similarity in R

Recommendation engines have a huge impact on our online lives. The content we watch on Netflix, the products we purchase on Amazon, and even the homes we buy are all served up using these algorithms. In this post, I’ll run through one of the key metrics used in developing recommendation engines: cosine similarity. First, I’ll give a brief overview of some vocabulary we’ll need to understand recommendation systems. Then, I’ll look at the math behind cosine similarity. [Read More]

Using R to Create Custom Color Palettes for Tableau

Have you ever wanted to define custom color palettes in Tableau, but didn’t know how? In this post, I’m going to walk through how we can use R to programmatically generate custom palettes for Tableau. Creating custom color palettes for Tableau has never been easier! This is going to be a short post, with just a little bit of R code. At the end of the post, you’ll see how to use R to generate custom color palettes to add to Tableau. [Read More]

Iterating on a 2016 Election Analysis

Jake Low wrote a really interesting piece presenting a few data visualizations that go beyond the typical 2016 election maps we’ve all gotten used to seeing. I liked a lot of things about Jake’s post; here are three I was particularly fond of: (1) his color palette choices, since each palette had solid perceptual properties and made sense for the data being visualized (i.e., diverging versus sequential); (2) he made residuals from a model interesting by visualizing and interpreting them; and (3) he explained the usage of a log-scale transformation in an intuitive way, putting it in terms of the data set being used for the analysis. [Read More]

Everything I Know About Machine Learning I Learned from Making Soup

In this post, I’m going to make the claim that we can simplify some parts of the machine learning process by using the analogy of making soup. I think this analogy can improve how a data scientist explains machine learning to a broad audience, and it provides a helpful framework throughout the model building process. Relying on some insight from the CRISP-DM framework, my own experience as an amateur chef, and the well-known iris data set, I’m going to explain why I think the connection between soup making and machine learning is a pretty decent first approximation for understanding the machine learning process. [Read More]

Golf, Tidy Data, and Using Data Analysis to Guide Strategy

I’m going to use this post to discuss some of the aspects of data science that interest me most: tidy data and using data to guide strategy. I’ll discuss these topics through the lens of a data analysis of results from a few high school golf tournaments. I’m going to take a little bit of time to talk about tidy data. When I scraped the data used for this analysis, it wasn’t really stored in a tidy format, and there’s a good reason for that. [Read More]

An Introduction to the kmeans Algorithm

This post will provide an R code-heavy, math-light introduction to selecting \(k\) in kmeans. It presents the main idea of kmeans, demonstrates how to fit a kmeans model in R, describes some components of the kmeans fit, and displays some methods for selecting \(k\). In addition, the post provides some helper functions that may make fitting kmeans a bit easier. kmeans clustering is an example of unsupervised learning, where we do not have an output we’re explicitly trying to predict. [Read More]

My First Post

Welcome to my blog! I plan to use this website to present data explorations and analyses in a way that’s understandable to a broad audience. I hope to demonstrate the utility of applying ideas like machine learning, data visualization, and exploratory data analysis to day-to-day life to improve decision-making processes. I was inspired to create a blog after reading David Robinson’s post "Advice to aspiring data scientists: start a blog". [Read More]