1 Introduction to the project

According to the World Health Organization (WHO), about 800,000 people died by suicide in 2018. This means that one person dies by suicide every 40 seconds. Fortunately, this number is dropping. In this kernel I want to explore how the suicide rate has evolved, using the dataset provided here on Kaggle.

I’ll be using the powerful R language for this analysis. My main focus is to understand which factors drive the suicide rate down.

Let’s start by loading the packages we’ll be using throughout this study.

country  year  sex     age          suicides_no  population  suicides.100k.pop  country.year  HDI.for.year  gdp_for_year….  gdp_per_capita….  generation
Albania  1987  male    15-24 years  21           312900      6.71               Albania1987   NA            2,156,624,900   796               Generation X
Albania  1987  male    35-54 years  16           308000      5.19               Albania1987   NA            2,156,624,900   796               Silent
Albania  1987  female  15-24 years  14           289700      4.83               Albania1987   NA            2,156,624,900   796               Generation X
Albania  1987  male    75+ years    1            21800       4.59               Albania1987   NA            2,156,624,900   796               G.I. Generation
Albania  1987  male    25-34 years  9            274300      3.28               Albania1987   NA            2,156,624,900   796               Boomers
Albania  1987  female  75+ years    1            35600       2.81               Albania1987   NA            2,156,624,900   796               G.I. Generation

1.1 Variable definition

Before we go further in this analysis, it is important to know what each column or variable in the dataset stands for.

I think only the columns suicides.100k.pop, suicides_no and HDI.for.year need an explanation:

suicides.100k.pop is the number of deaths by suicide per 100,000 people in the population of that group (country, year, sex and age band).

suicides_no is the number of suicides in that group.

HDI.for.year is the Human Development Index of the country for that year.
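
As a quick sanity check on the rate column, the sketch below recomputes it from the six sample rows shown in the table above. It assumes the rate is the suicide count divided by the population, scaled to 100,000 and rounded to two decimals. The vector names are only illustrative and are not part of the dataset.

```r
# Values copied from the six Albania 1987 rows in the preview table
suicides_no <- c(21, 16, 14, 1, 9, 1)
population  <- c(312900, 308000, 289700, 21800, 274300, 35600)
reported    <- c(6.71, 5.19, 4.83, 4.59, 3.28, 2.81)  # suicides.100k.pop column

# Assumed definition: suicides per 100,000 people, rounded to 2 decimals
rate <- round(suicides_no / population * 1e5, 2)

# Side-by-side comparison of the recomputed and reported rates
print(data.frame(suicides_no, population, rate, reported))

# TRUE when every recomputed rate matches the reported one
print(isTRUE(all.equal(rate, reported)))
```

If the comparison holds for the sample rows, it supports reading the column as a per-population rate rather than a share of all deaths.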