2 Data exploration

In this section we explore the data to look for insights.

2.1 Checking and fixing missing values

Do we have any missing data?

variable missing
country 0
year 0
sex 0
age 0
suicides_no 0
population 0
suicides.100k.pop 0
country.year 0
HDI.for.year 19456
gdp_for_year…. 0
gdp_per_capita…. 0
generation 0

Only the HDI.for.year column contains missing values. What is the proportion of missing data in this column?

## [1] 69.9353

Nearly 70% of the values in this column are missing. We will see later how this variable can still be used.
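The check above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code that produced the report. The sample rows, the `missing_counts` helper, and the choice of treating empty strings and "NA" as missing are all assumptions made for the example. The last line recomputes the reported percentage from the 19456 missing values and the 27820 rows implied by the generation counts in section 2.2.1.

```python
import csv
import io

# Hypothetical toy sample (not the real data); an empty field means "missing".
SAMPLE = """country,year,HDI.for.year
A,1987,
A,1995,0.6
B,2010,0.7
B,1988,
"""

def missing_counts(rows, missing_tokens=("", "NA")):
    """Return ({column: number of missing values}, number of rows)."""
    rows = list(rows)
    counts = {}
    for row in rows:
        for col, value in row.items():
            is_missing = (value or "").strip() in missing_tokens
            counts[col] = counts.get(col, 0) + (1 if is_missing else 0)
    return counts, len(rows)

counts, n_rows = missing_counts(csv.DictReader(io.StringIO(SAMPLE)))

# Percentage of missing values per column in the toy sample.
pct = {col: 100 * n / n_rows for col, n in counts.items()}
print(counts)
print(pct["HDI.for.year"])

# Same arithmetic with the figures reported in this section.
reported_pct = 100 * 19456 / 27820
print(round(reported_pct, 4))
```

Counting per column first and dividing by the row count afterwards keeps the two questions (which columns have gaps, and how large the gaps are) separate, which mirrors the order used in this section.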

2.2 Qualitative variables frequencies

2.2.1 Generation

generation nb
Boomers 4990
G.I. Generation 2744
Generation X 6408
Generation Z 1470
Millenials 5844
Silent 6364

These are the numbers of occurrences of each generation in the dataset.

Generation X and Silent are the most frequent generations. Generation Z is the smallest group.
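A frequency table like the one above can be sketched in a few lines. The example below is an illustration in Python, not the report's own code: the `generation` list is a hypothetical toy column, while the `reported` dictionary copies the counts from the table so that the ranking and each group's share of the total can be checked.

```python
from collections import Counter

# Hypothetical toy column standing in for the real "generation" variable.
generation = [
    "Generation X", "Silent", "Generation X", "Boomers",
    "Generation Z", "Silent", "Generation X",
]

# Count occurrences of each category, most frequent first.
freq = Counter(generation)
for name, nb in freq.most_common():
    print(name, nb)

# Counts copied from the table in this section.
reported = {
    "Boomers": 4990,
    "G.I. Generation": 2744,
    "Generation X": 6408,
    "Generation Z": 1470,
    "Millenials": 5844,
    "Silent": 6364,
}

total = sum(reported.values())
ranked = sorted(reported, key=reported.get, reverse=True)
shares = {name: round(100 * nb / total, 1) for name, nb in reported.items()}

print(total)
print(ranked)
print(shares)
```

Expressing each count as a share of the total makes the imbalance between groups easier to compare than raw counts alone.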

2.2.2 Age groups

Let’s now visualize the age groups.

The age groups are all equally represented in the dataset.

2.2.3 By sex

How about sex? Both sexes are equally represented in the dataset.

2.3 Data by year

Do we have the same amount of data for each year?

The dataset does not contain the same number of records for every year; the count varies from year to year. For example, the last year, 2016, has the fewest records. We need to keep this in mind when interpreting the results of the analysis.
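The per-year check can be sketched in the same way. The snippet below is an illustrative Python example on hypothetical toy data, not the report's code. The `years` list and the "less than half of the best-covered year" rule for flagging sparse years are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical toy "year" column; the real dataset has many more rows.
years = [2014, 2014, 2014, 2015, 2015, 2015, 2015, 2016]

# Number of records per year, printed in chronological order.
per_year = Counter(years)
for year in sorted(per_year):
    print(year, per_year[year])

# Year with the fewest records.
fewest_year, fewest_n = min(per_year.items(), key=lambda kv: kv[1])

# Flag years with less than half the records of the best-covered year
# (an arbitrary threshold, used here only for illustration).
threshold = max(per_year.values()) / 2
sparse_years = sorted(y for y, n in per_year.items() if n < threshold)

print(fewest_year, fewest_n)
print(sparse_years)
```

Flagging sparse years up front makes it easier to exclude them, or to treat them with caution, when yearly totals are compared later in the analysis.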