Statistics And Data Science

The inverse-transform method for generating random variables in R

4 September, 2021

It's possible to generate random variables from a specific distribution by using the inverse of the cumulative distribution function. We'll see how to do this with a few specific examples in R.
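As a rough illustration of the idea, here is a minimal sketch in Python rather than R, using only the standard library. It assumes an exponential target distribution, whose inverse CDF has the closed form -ln(1 - u) / rate. The function name `sample_exponential` and the chosen rate are illustrative assumptions, not taken from the post.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n exponential variates by pushing uniforms through the inverse CDF."""
    rng = random.Random(seed)
    # If U ~ Uniform(0, 1), then -ln(1 - U) / rate follows Exponential(rate).
    # rng.random() lies in [0, 1), so 1 - u is never zero and the log is safe.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


samples = sample_exponential(rate=2.0, n=100_000)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should land close to the theoretical mean 1 / rate
```

The same recipe applies to any distribution whose CDF can be inverted in closed form.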

Training vs test mean squared error in R

24 June, 2021

R Statistics and data science

Mean squared error (MSE) is a key measure of how well a model predicts a continuous variable. When we use it to evaluate a model, however, we need to make sure we're looking at the test MSE rather than the training MSE. We'll go over the differences here, using some examples in R.
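To make the distinction concrete, here is a small sketch in Python rather than R, using only the standard library. It assumes simulated data and a 1-nearest-neighbour predictor, chosen because it memorises the training set; neither the data nor the model comes from the post, and the helper names `mse` and `predict_1nn` are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def predict_1nn(x_train, y_train, x_new):
    """Predict each new point with the response of its nearest training point."""
    predictions = []
    for x in x_new:
        nearest = min(range(len(x_train)), key=lambda j: abs(x_train[j] - x))
        predictions.append(y_train[nearest])
    return predictions


# Simulated data (an assumption for illustration): y = 2x plus Gaussian noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

# Hold out the last 50 observations as a test set.
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

train_mse = mse(y_train, predict_1nn(x_train, y_train, x_train))
test_mse = mse(y_test, predict_1nn(x_train, y_train, x_test))
print(train_mse, test_mse)
```

Because each training point is its own nearest neighbour, the training MSE here is zero, while the test MSE is not. A model can look perfect on the data it was fitted to and still predict new data poorly.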

Simple logistic regression with Python

18 February, 2021

Python Statistics and data science

Logistic regression is used to predict a binary response. Let's see how to use Python to construct a logistic regression model that'll tell us which factors affected the passengers' chance of survival.

Regression case study part two: New Zealand's greenhouse gas emissions

15 February, 2021

Python Statistics and data science

In part two, we dive deeper into some aspects of regression modelling, including multiple regression and evaluating model fit.

Regression case study part one: New Zealand's greenhouse gas emissions

31 January, 2021

Python Statistics and data science

This is the first of two posts describing the development and evaluation of a regression model of NZ's greenhouse gas emissions.

Statistical interactions between variables with Python

25 January, 2021

Python Statistics and data science

A statistical interaction occurs when the effect of one variable is moderated by another variable. Here, we'll show how the relationship between a car's fuel efficiency and engine displacement changes with the number of cylinders.

Simple Pearson correlation with Python

24 January, 2021

Python Statistics and data science

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. We'll show how to obtain a Pearson correlation coefficient in Python.
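For reference, the coefficient can be computed directly from its definition. The following is a minimal standard-library sketch, not necessarily the approach the post takes; the function name `pearson_r` and the toy data are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing line
print(r_up, r_down)
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.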

Simple chi-squared test with Python

21 January, 2021

Python Statistics and data science

The chi-squared test is used to determine the statistical significance of a categorical explanatory variable's effect on a categorical response variable. We'll show how to conduct a simple chi-squared test in Python.
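As a sketch of what the test computes, the statistic below is built by hand from a contingency table using only the standard library. The counts are made up for illustration, the function name `chi_squared_statistic` is hypothetical, and no continuity correction is applied; the post itself may rely on a library routine instead.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows are explanatory groups, columns are outcomes.
observed_counts = [[10, 20], [30, 40]]
stat = chi_squared_statistic(observed_counts)
print(stat)
```

The statistic is then compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom to obtain a p-value.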

Simple ANOVA with Python

20 January, 2021

Python Statistics and data science

In this quick tutorial, we describe how to conduct a simple ANOVA statistical test on the iris dataset with Python.

Behavioural psychology and Python: Part 1

24 February, 2020

Python Statistics and data science

In The Undoing Project, Michael Lewis's excellent book about two groundbreaking behavioural psychologists, we get a glimpse at some of the mechanisms that drive our judgements and decisions. By coming up with interesting questions to pose to ...