My Data Science Learning Journal 📈

I am an Economics graduate student with a passion for data, and I use this blog to document my journey learning data science techniques. Still updating!

EU Greenhouse Gas Emissions

Please provide a PDF document of only one A4 page with the naming convention “3946 – LAST NAME – FIRST NAME.PDF”. The page should contain:
A. the name of the applicant;
B. a first chart showing, for the year 2019 and each EU country, “Total Greenhouse gas emissions (excluding LULUCF and memory items, including international aviation)” (y-axis) vs. “Population” (x-axis); possible sources of information:
* https://ec....

August 14, 2021 · Luca Baggi

Dealing With Implicit Missing Data

Today I want to test some ways to deal with implicit missing values: namely, creating grids with several commands and performing full joins on our data. Let’s again use the COVID-19 vaccination data for Italy, available from the official repo.

Load Data

    url_vaccinations <- 'https://raw.githubusercontent.com/italia/covid19-opendata-vaccini/master/dati/somministrazioni-vaccini-latest.csv'

    read_csv(url_vaccinations,
             col_types = cols(
               # parse as dates
               data_somministrazione = "D",
               # parse as factors
               fornitore = "f",
               area = "f",
               fascia_anagrafica = "f"
               # the rest, let it be guessed
             )) %>%
      # remove 'categoria' from several column names
      rename_with( ~ stringr::str_remove(....
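To make the idea of implicit missing values concrete, here is a minimal sketch on toy data. The tibble `doses` and its columns are made up for illustration and are not the columns of the official dataset; the sketch only assumes that `dplyr` and `tidyr` are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: area "B" has no row for 2021-01-02,
# so that observation is implicitly missing.
doses <- tibble(
  area  = c("A", "A", "B"),
  date  = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01")),
  doses = c(10, 12, 7)
)

# Build the full grid of every area/date combination...
grid <- expand_grid(
  area = unique(doses$area),
  date = seq(min(doses$date), max(doses$date), by = "day")
)

# ...and full-join it to the data: the missing combination shows up as NA.
doses_explicit <- full_join(grid, doses, by = c("area", "date"))

# tidyr::complete() expands the combinations in one step
# and can fill the new rows with a default value.
doses_filled <- complete(doses, area, date, fill = list(doses = 0))
```

Both results have four rows. The grid-plus-join route leaves the gap as `NA`, while `complete()` with `fill` replaces it with zero, which may or may not be the right choice depending on what a missing row means in the data.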

January 28, 2021 · Luca Baggi

COVID-19 Vaccinations Data: Some Visualisations for Italy

Now that we have wrangled the data a bit, we can proceed with some visualisations. We want to plot three things:

1. How many vaccinations are administered daily.
2. How many doses have been administered so far and their ratio.
3. How regions perform in terms of doses administered and doses received.

Data Wrangling

Let’s load the data:

    read_csv(
      'https://raw.githubusercontent.com/orizzontipolitici/covid19-vaccine-data/main/data_ita/doses_by_date_ita.csv'
    ) -> doses_by_date

    read_csv(
      'https://raw.githubusercontent.com/orizzontipolitici/covid19-vaccine-data/main/data_ita/vaccinations_by_area_ita.csv',
      col_types = cols(area = col_factor())
    ) -> vaccinations_by_area

These two datasets are incompatible: first, the data grouped by area needs to be aggregated to the national (Italian) level....
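Aggregating area-level rows to the national level could look roughly like the sketch below. The tibble `by_area` and the column names `doses_administered` and `doses_delivered` are hypothetical stand-ins, not the actual columns of the two CSV files above.

```r
library(dplyr)

# Hypothetical area-level data; names and values are illustrative only.
by_area <- tibble(
  area = c("LOM", "LAZ", "LOM", "LAZ"),
  date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  doses_administered = c(100, 50, 120, 80),
  doses_delivered    = c(200, 100, 200, 100)
)

# Collapse the areas into one row per date, then compute
# the share of delivered doses that were administered.
national <- by_area %>%
  group_by(date) %>%
  summarise(
    doses_administered = sum(doses_administered),
    doses_delivered    = sum(doses_delivered),
    .groups = "drop"
  ) %>%
  mutate(share_administered = doses_administered / doses_delivered)
```

After this step the national table has one row per date, so it can be joined to another date-level table on `date`.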

January 23, 2021 · Luca Baggi

Longer Versus Wider Table Format

Today’s goal is to compare the readability of data in long versus wide format. Plus, we will fill a date column with missing values in a panel dataset. Let’s load the packages we are going to use:

    library(tidyverse)

    # set a theme to stay for the rest of the plotting
    theme_set(theme_minimal())

Load the Data

We will be using COVID-19 vaccination data, which are available at the following repo. I also collected and wrangled the data here for Orizzonti Politici....
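As a small illustration of the two layouts, the sketch below reshapes a toy table from wide to long and back. The tibble `wide` and its columns are invented for the example and do not come from the vaccination data.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per dose type.
wide <- tibble(
  date        = as.Date(c("2021-01-01", "2021-01-02")),
  first_dose  = c(10, 12),
  second_dose = c(0, 3)
)

# Long format: one row per date and dose type.
long <- pivot_longer(
  wide,
  cols      = c(first_dose, second_dose),
  names_to  = "dose",
  values_to = "n"
)

# Back to wide format: one column per dose type again.
wide_again <- pivot_wider(long, names_from = dose, values_from = n)
```

The long table is usually easier to feed to `ggplot2` (map `dose` to colour), while the wide table tends to be easier to read by eye.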

January 23, 2021 · Luca Baggi