Posts with the Statistics And Data Science tag

The inverse-transform method for generating random variables in R

R Statistics and data science

4 September, 2021

– It's possible to generate random variables from a specific distribution by using the inverse of the cumulative distribution function. We'll see how to do this with a few specific examples in R.
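As a quick taste of the idea, here is a minimal sketch in Python rather than R. It assumes the exponential distribution as the target, which is our choice for illustration and not necessarily one of the post's examples; the function name `sample_exponential` is hypothetical.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n exponential variates with the inverse-transform method."""
    rng = random.Random(seed)
    # The exponential CDF is F(x) = 1 - exp(-rate * x), so its inverse is
    # F^{-1}(u) = -ln(1 - u) / rate. Feeding uniform draws through the
    # inverse CDF gives draws from the target distribution.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


samples = sample_exponential(rate=2.0, n=100_000)
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate
```

The sample mean should land near the theoretical mean of 1 / rate, which is a simple sanity check that the transform is doing its job.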



Training vs test mean squared error in R

R Statistics and data science

24 June, 2021

– Mean squared error (MSE) is an important measure of accuracy for models that predict continuous variables. When we use it to evaluate a model, however, we need to be careful to use the test MSE rather than the training MSE. We'll go over the differences here, using some examples in R.
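To see why the distinction matters, here is a rough sketch in Python rather than R. It assumes synthetic data and a deliberately overfitting model (1-nearest-neighbour regression), both chosen by us for illustration; the helper names `mse` and `predict_1nn` are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def predict_1nn(train_x, train_y, x):
    """Predict with the response of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Synthetic data (an assumption): a linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 observations as a test set.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

train_mse = mse(train_y, [predict_1nn(train_x, train_y, x) for x in train_x])
test_mse = mse(test_y, [predict_1nn(train_x, train_y, x) for x in test_x])
```

Because the model memorises the training points, the training MSE is zero, while the test MSE reflects the noise the model mistook for signal.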



Simple logistic regression with Python

Python Statistics and data science

18 February, 2021

– Logistic regression is used to predict a binary response. Let's see how to use Python to construct a logistic regression model that'll tell us which factors affected the passengers' chance of survival.



Regression case study part two: New Zealand's greenhouse gas emissions

Python Statistics and data science

15 February, 2021

– In part two, we dive deeper into some aspects of regression modelling, including multiple regression and evaluating model fit.



Regression case study part one: New Zealand's greenhouse gas emissions

Python Statistics and data science

31 January, 2021

– This is the first of two posts describing the development and evaluation of a regression model of NZ's greenhouse gas emissions.



Statistical interactions between variables with Python

Python Statistics and data science

25 January, 2021

– A statistical interaction occurs when the effect of one variable is moderated by another variable. Here, we'll show how the relationship between a car's fuel efficiency and engine displacement changes with the number of cylinders.
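One informal way to spot an interaction is to fit a separate slope within each level of the moderating variable and compare them. The sketch below assumes a tiny made-up dataset (not the post's data) and a hypothetical helper, `ols_slope`.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov / var_x


# Made-up illustrative values: (displacement, fuel efficiency) by cylinder count.
data = {
    4: ([1.5, 2.0, 2.5], [34.0, 30.0, 26.0]),
    8: ([5.0, 5.5, 6.0], [16.0, 15.0, 14.0]),
}

# If the slopes differ across groups, the effect of displacement
# depends on the number of cylinders, which is what an interaction means.
slopes = {cyl: ols_slope(x, y) for cyl, (x, y) in data.items()}
```

In a regression model, the same idea is usually expressed by adding a product term between the two variables.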



Simple Pearson correlation with Python

Python Statistics and data science

24 January, 2021

– The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. We'll show how to obtain a Pearson correlation coefficient in Python.
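For reference, the coefficient can be computed by hand with the standard library alone. This is a minimal sketch: the function name `pearson_r` and the small `hours` and `scores` lists are made up for illustration and are not from the post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values with a nearly linear relationship.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(hours, scores)
```

The result always lies between -1 and 1, with values near either extreme indicating a strong linear relationship.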



Simple chi-squared test with Python

Python Statistics and data science

21 January, 2021

– The chi-squared test is used to determine the statistical significance of a categorical explanatory variable's effect on a categorical response variable. We'll show how to conduct a simple chi-squared test in Python.
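The test statistic itself is easy to compute from a contingency table. The sketch below assumes a made-up 2x2 table and a hypothetical helper, `chi_squared_statistic`; it applies no continuity correction, so libraries that apply one by default may report a slightly different value for 2x2 tables.

```python
def chi_squared_statistic(table):
    """Return the Pearson chi-squared statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up counts: rows are explanatory groups, columns are outcomes.
table = [[30, 10], [20, 40]]
stat, dof = chi_squared_statistic(table)
```

The statistic is then compared against a chi-squared distribution with `dof` degrees of freedom to obtain a p-value.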



Simple ANOVA with Python

Python Statistics and data science

20 January, 2021

– In this quick tutorial, we describe how to conduct a simple ANOVA statistical test on the iris dataset with Python.



Behavioural psychology and Python: Part 1

Python Statistics and data science

24 February, 2020

– In The Undoing Project, Michael Lewis's excellent book about two groundbreaking behavioural psychologists, we get a glimpse at some of the mechanisms that drive our judgements and decisions. By coming up with interesting questions to pose to ...