Global Energy Trends

Insights into Consumption and Sustainability

The goal of this project is to analyze the complex relationships between economic and population growth, sustainable energy practices, and energy consumption.
Author
Affiliation

DataDetectives - Ayesha, Abhishek, Shreemithra, Toluwanimi, Valerie, Alyssa

School of Information, University of Arizona

Introduction

In an era marked by escalating energy demand and environmental concerns, predicting a nation’s power consumption has become increasingly important. The Our World in Data energy dataset covers primary energy consumption, per capita energy use, growth rates, energy mix, electricity composition, and other relevant factors, with observations from 1900 to 2022. By examining these factors and presenting them in a dashboard, this project aims to unravel the complex interplay between economic growth, population growth, and sustainable energy practices.

Dataset

The Our World in Data Energy dataset.

import pandas as pd

# Load the dataset
data = pd.read_csv('data/owid-energy-data.csv')

# Display a few rows of the dataset
data.loc[7392:7397]
country year iso_code population gdp biofuel_cons_change_pct biofuel_cons_change_twh biofuel_cons_per_capita biofuel_consumption biofuel_elec_per_capita ... solar_share_elec solar_share_energy wind_cons_change_pct wind_cons_change_twh wind_consumption wind_elec_per_capita wind_electricity wind_energy_per_capita wind_share_elec wind_share_energy
7392 France 1996 FRA 57623180.0 1.774692e+12 49.895 0.867 45.209 2.605 33.493 ... 0.000 0.000 57.613 0.008 0.022 0.174 0.01 0.381 0.002 0.001
7393 France 1997 FRA 57839364.0 1.817073e+12 30.777 0.802 58.902 3.407 37.345 ... 0.000 0.000 49.773 0.011 0.033 0.173 0.01 0.569 0.002 0.001
7394 France 1998 FRA 58080344.0 1.882613e+12 -14.134 -0.482 50.367 2.925 36.673 ... 0.000 0.000 75.202 0.025 0.058 0.344 0.02 0.993 0.004 0.002
7395 France 1999 FRA 58352208.0 1.950262e+12 4.348 0.127 52.312 3.053 40.101 ... 0.000 0.000 89.486 0.052 0.109 0.685 0.04 1.872 0.008 0.004
7396 France 2000 FRA 58665456.0 2.031950e+12 19.585 0.598 62.223 3.650 42.274 ... 0.002 0.000 29.925 0.033 0.142 0.852 0.05 2.419 0.009 0.005
7397 France 2001 FRA 59014776.0 2.077915e+12 1.373 0.050 62.705 3.700 48.293 ... 0.002 0.001 172.917 0.243 0.385 2.203 0.13 6.521 0.024 0.012

6 rows × 129 columns

Dataset Description:

This dataset provides information on energy consumption, gross domestic product, and population for countries and regions around the world, covering the years 1900 to 2022.

Dimensions

(21590, 129)

This owid-energy-data.csv dataset contains 21,590 rows and 129 columns.

Provenance

This dataset was created by Our World in Data (OWID), an organization known for its meticulous, evidence-based research on global issues. It integrates data from several international sources, such as government agencies, research institutes, and industry publications, to offer a comprehensive perspective on global energy patterns. The data source link is attached here: LINK

Github Link

Code book Insights:

The dataset has 129 columns, but this analysis uses only a few of them, so only the descriptions of those columns are provided below.

Index Column Name Data Type Descriptions
country string Geographic location
year integer Year of observation.
population integer Population by country, available from 10,000 BCE to 2100, based on data and estimates from different sources. Although the underlying series spans 10,000 BCE to 2100, values here are matched to the year of observation (year).
gdp double Gross domestic product (GDP) measured in international-$ using 2011 prices to adjust for price changes over time (inflation) and price differences between countries. Calculated by multiplying GDP per capita with population.
renewables_electricity double Electricity generation from renewables (terawatt-hours)
electricity_generation double Total electricity generation (terawatt-hours)
fossil_electricity double Electricity generation from fossil fuels (terawatt-hours)
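
As a rough sketch of how the working subset could be pulled from the full table, the snippet below keeps only the columns listed in the code book. The helper name `select_columns` and the two-row toy frame (made-up values, with a stand-in extra column) are illustrative assumptions, not part of the project code.

```python
import pandas as pd

# Columns described in the code book above.
CODEBOOK_COLUMNS = [
    "country", "year", "population", "gdp",
    "renewables_electricity", "electricity_generation", "fossil_electricity",
]

def select_columns(df, columns=CODEBOOK_COLUMNS):
    """Return a copy of df restricted to the given columns (hypothetical helper)."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df[columns].copy()

# Toy frame with made-up values; 'extra_column' stands in for the unused columns.
toy = pd.DataFrame({
    "country": ["A", "B"], "year": [2000, 2000],
    "population": [1.0e6, 2.0e6], "gdp": [1.0e9, 3.0e9],
    "renewables_electricity": [1.0, 2.0], "electricity_generation": [10.0, 8.0],
    "fossil_electricity": [9.0, 6.0], "extra_column": [0, 0],
})
subset = select_columns(toy)
```

With the real file, the same helper could be applied to the frame returned by `pd.read_csv`.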

Why this dataset?

This dataset combines multiple datasets on energy consumption, economic growth, and population growth over time for regions around the world. It provides in-depth information because it covers most, if not all, topics related to energy and energy consumption, along with related areas. It builds on the Statistical Review of World Energy by the Energy Institute, International Energy Data by the U.S. Energy Information Administration, Energy from Fossil Fuels from The Shift Dataportal, Yearly Electricity Data and the European Electricity Review (Ember), as well as datasets on combined electricity, energy mix, fossil fuel production, primary energy consumption, electricity mix, energy, population, regions, income groups, and GDP. Together, these reliable and regularly updated sources make the dataset well suited to this analysis.

Question 01

Is it possible to predict a nation’s power consumption from its population size, gross domestic product (GDP), and the percentage of electricity generated from renewable sources, taking into account how these change across the years?

The Importance of this Question

Predicting a country’s power consumption is a difficult yet important undertaking, necessary for ensuring sustained progress and energy stability. This question explores the relationship between economic progress, population growth, and sustainable energy consumption by examining a nation’s population size, gross domestic product (GDP), and the proportion of electricity generated from renewable sources. The research can provide valuable guidance for policy decisions, energy planning, and infrastructure investment to meet future energy demands in a sustainable manner.

Dependency Columns

To investigate this research question, the following data columns are essential:

Population (population): This functions as a proxy for energy demand, as a larger population often requires more energy for residential, commercial, and industrial purposes.

Gross domestic product (gdp): A larger GDP frequently corresponds to higher energy consumption as a result of industrialization, commercial activity, and improved living conditions.

Renewable electricity share (renewables_electricity / electricity_generation): The proportion of electricity generated from renewable sources, expressed as a percentage of total electricity generation. Assessing this share in a nation’s electricity mix is vital for understanding its transition to sustainable energy sources and the effects on overall energy consumption patterns.

Energy consumption (energy_consumption): Although this column may cover all forms of energy use, not only electricity, it is essential for the study. A column that specifically records electricity consumption, if available, would be more directly applicable to the question.

Year (year): This is used for time series analysis.

Analysis plan

Data Cleaning:

  • We will check the dataset for anomalies and missing values and address them using techniques such as imputation or removal of incomplete records.
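
A minimal sketch of this cleaning step, assuming linear interpolation within each country for short gaps and row removal where a required column is still missing. Both choices are assumptions made for illustration, and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values: one gap in population, GDP missing for country B.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "population": [1.0e6, np.nan, 1.2e6, 5.0e6, 5.1e6, 5.2e6],
    "gdp": [2.0e9, 2.1e9, 2.2e9, np.nan, np.nan, np.nan],
})

# Share of missing values per column, to decide between imputation and removal.
missing_share = toy.isna().mean()

# Impute: interpolate interior gaps separately for each country.
filled = toy.sort_values(["country", "year"]).copy()
filled["population"] = filled.groupby("country")["population"].transform(
    lambda s: s.interpolate(limit_area="inside")
)

# Remove: drop records that still lack a required value.
cleaned = filled.dropna(subset=["population", "gdp"])
```

Here the gap in country A is filled from its neighbouring years, while country B is dropped because its GDP is missing throughout.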

Feature Engineering:

  • We will calculate the percentage of electricity generated from renewable sources by dividing renewable electricity generation by total electricity generation.
  • We will standardize or normalize the relevant columns if this benefits the analysis.
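
The two feature engineering steps could look roughly like the sketch below. The function names and the output column `renewable_share_pct` are hypothetical, z-score scaling is one assumed choice of standardization, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def add_renewable_share(df):
    """Add renewable electricity as a percentage of total generation."""
    out = df.copy()
    # Treat zero or negative totals as missing to avoid division by zero.
    total = out["electricity_generation"].where(out["electricity_generation"] > 0)
    out["renewable_share_pct"] = 100 * out["renewables_electricity"] / total
    return out

def standardize(df, columns):
    """Rescale the given columns to zero mean and unit variance (z-scores)."""
    out = df.copy()
    for col in columns:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out

toy = pd.DataFrame({
    "population": [1.0, 2.0, 3.0],
    "renewables_electricity": [10.0, 30.0, 0.0],
    "electricity_generation": [100.0, 120.0, 0.0],
})
features = add_renewable_share(toy)
scaled = standardize(features, ["population"])
```

Rows with no recorded generation get a missing share rather than an infinite one.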

Exploratory Data Analysis (EDA):

  • We will analyze and visualize the dataset to identify underlying patterns and relationships.
  • Explore the distribution of variables like population size, GDP, renewable energy percentage, and energy consumption across different countries and years.

Descriptive Statistics:

  • We will generate statistical summaries for each dependency column to understand the distribution and central tendencies of the data.
  • Calculate measures such as mean, median, standard deviation, and correlation coefficients.
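
A short sketch of such a summary with pandas, using made-up toy values:

```python
import pandas as pd

# Toy values; the population column has one large value to show a mean/median gap.
toy = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 10.0],
    "gdp": [2.0, 4.0, 6.0, 8.0],
})

# One row per statistic, one column per variable.
summary = toy.agg(["mean", "median", "std"])
```

A mean well above the median, as for `population` here, is a quick hint that a column is right-skewed.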

Correlation Analysis:

  • We will determine the strength and direction of the association between energy consumption and independent variables such as population size, GDP, and renewable energy percentage using correlation analysis.
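
As an illustration, Pearson correlations against the target can be read from the correlation matrix. The target column name `primary_energy_consumption` and the derived column `renewable_share_pct` are assumptions, and the values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 4.0],
    "gdp": [1.0, 3.0, 2.0, 5.0],
    "renewable_share_pct": [40.0, 30.0, 20.0, 10.0],
    "primary_energy_consumption": [2.0, 4.0, 6.0, 8.0],
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = toy.corr(method="pearson")

# Correlation of each predictor with the target (sign = direction, size = strength).
with_target = corr_matrix["primary_energy_consumption"].drop("primary_energy_consumption")
```

The same `corr_matrix` is what a heatmap of correlations would display.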

Time Series Analysis:

  • We will investigate trends, seasonality, and patterns over time for variables like energy consumption to understand past shifts and forecast future trends.
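
Because the observations are annual, a first look at the time dimension might focus on trend: year-over-year change and a rolling mean per country. This is a sketch under that assumption; the column name `primary_energy_consumption`, the three-year window, and the toy values are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A"] * 5,
    "year": [2000, 2001, 2002, 2003, 2004],
    "primary_energy_consumption": [100.0, 110.0, 121.0, 133.1, 146.41],
}).sort_values(["country", "year"])

grouped = toy.groupby("country")["primary_energy_consumption"]

# Percentage change from the previous year, computed within each country.
toy["yoy_change_pct"] = grouped.transform(lambda s: (s / s.shift(1) - 1) * 100)

# Three-year rolling mean to smooth short-term fluctuations.
toy["rolling_mean_3y"] = grouped.transform(lambda s: s.rolling(window=3).mean())
```

The first year of each country has no previous value, so its change is left missing.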

Regression Analysis:

  • We will use simple linear regression to examine the relationship between energy consumption and each independent variable individually.

Multiple Regression:

  • We will apply multiple regression analysis to consider several independent variables simultaneously, such as population size, GDP, and renewable energy percentage, to predict energy usage more accurately.
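
Both the simple and the multiple regression can be sketched with ordinary least squares in NumPy. The helpers `fit_ols` and `predict` are hypothetical names, and the data are synthetic, generated from known coefficients so the fit can be checked.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted values for a 2-D predictor array X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

# Synthetic predictors (stand-ins for, e.g., scaled population and GDP).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] + 0.5 * X[:, 1]  # known relationship, no noise

simple_coef = fit_ols(X[:, [0]], y)  # one predictor at a time
multi_coef = fit_ols(X, y)           # all predictors together
fitted = predict(multi_coef, X)
```

Fitting one predictor at a time gives the individual relationships, while the joint fit accounts for the predictors together.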

Data Visualization

Line Graphs:

For time series data, we will create line graphs that illustrate the patterns of energy consumption, population growth, GDP, and the proportion of renewable energy over time.

Scatterplots:

Scatterplots will be used to visually represent the relationship between energy use and each of the independent variables. Heatmaps are a useful tool for visualizing the correlation matrix, as they emphasize the strength of correlations between variables.

Bar and pie charts:

Bar and pie charts will be used to visually represent the breakdown of energy sources, with a specific emphasis on differentiating between renewable and non-renewable energy. We have not yet decided which of these techniques will be used here.

Model Evaluation:

Evaluate the models by employing suitable metrics such as R-squared and RMSE for regression analysis to assess their performance and accuracy in predicting energy consumption.
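
For reference, the two metrics are defined as

\[ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2} \]

and can be computed with a few lines of NumPy. The function names and the toy values below are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
error = rmse(y_true, y_pred)
score = r_squared(y_true, y_pred)
```

Computing both on held-out countries or years, rather than on the training rows, gives a fairer picture of predictive accuracy.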

Question 02

Which countries or regions are engaging in sustainable energy practices and relying more on renewable energy than on non-renewable energy? Which countries are on a trajectory toward relying more on renewable energy and producing fewer greenhouse gas emissions?

The Importance of this Question:

The goal of this question is to analyze the energy practices of various countries and determine which are practicing sustainable energy use and reducing their consumption of non-renewable energy sources. It seeks to determine which countries rely more on non-renewable energy sources than on renewable ones. Regression analysis will also be used to determine which countries, based on the current data, are on a trajectory toward using renewable energy sources. For this analysis, countries will be divided into groups with larger and smaller populations to limit any bias that differences in population size may introduce. A log transformation can also be applied to population to address this issue.
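
A minimal sketch of the two population adjustments mentioned above. Splitting at the median is an assumed threshold (the proposal does not fix one), and the column names `log10_population` and `size_group` as well as the values are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population": [1.0e5, 1.0e6, 1.0e7, 1.0e8],
})

# Log transformation compresses the wide range of population sizes.
toy["log10_population"] = np.log10(toy["population"])

# Assumed rule: countries at or above the median population count as "larger".
median_population = toy["population"].median()
toy["size_group"] = np.where(
    toy["population"] >= median_population, "larger", "smaller"
)
```

Analyses can then be run separately per `size_group`, or on the log scale across all countries.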

Dependency Columns:

To complete the analysis of this question, the following data columns are essential.

Population:

This functions as a proxy for energy demand, as a larger population often requires more energy for residential, commercial, and industrial purposes.

Energy consumed:

The columns recording the total amount of energy consumed by each country will be used to determine the share of sustainable electrical energy sources in total electricity consumption.

Greenhouse gas emissions:

This will determine which regions or countries produce the most greenhouse gases and cause the most pollution.

Renewable and non-renewable consumption:

The renewable and non-renewable consumption columns, along with all other related columns, will be used to identify the countries engaging in sustainable energy practices.

Analysis plan

Data Preparation and Analysis

  • The dataset contains missing values. We will employ machine learning imputation techniques such as iterative imputation to obtain a reliable dataset. We will then check for skewness in important columns like population, GDP, energy consumed, and renewable and non-renewable consumption.
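
To show the idea behind iterative imputation, the sketch below fills each incomplete column by regressing it on the other columns and repeating the pass a few times. It is a simplified stand-in written in NumPy, not the project's implementation; library versions such as scikit-learn's `IterativeImputer` are more complete. The function name, column names, and toy values are assumptions.

```python
import numpy as np
import pandas as pd

def iterative_impute(X, n_iter=10):
    """Fill NaNs by repeatedly regressing each column on the others."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means, then refine with linear regressions.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was observed, predict where it was not.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = design[rows] @ coef
    return filled

toy = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 4.0],
    "primary_energy_consumption": [2.0, 4.0, np.nan, 8.0],
})
imputed = pd.DataFrame(iterative_impute(toy.to_numpy()), columns=toy.columns)

# Skewness per column after imputation (values near 0 suggest symmetry).
skewness = imputed.skew()
```

Columns with large positive skewness are candidates for a log transformation before modelling.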

Feature Engineering

This step involves calculating the proportions of renewable and non-renewable energy sources relative to total energy consumption.

  • Calculate proportion of renewable energy: \[ \frac{\text{Solar consumption} + \text{Wind consumption}}{\text{Total energy consumption}} \]

  • Calculate proportion of non-renewable energy: \[ \frac{\text{Fossil fuel consumption}}{\text{Total energy consumption}} \]

These metrics reveal the usage of renewable and non-renewable energy relative to total consumption.
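
A sketch of the proportion calculation, shown for the renewable share (solar plus wind over total consumption). The helper `share_of_total` is a hypothetical name, `primary_energy_consumption` is assumed to be the total-consumption column, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def share_of_total(df, part_columns, total_column):
    """Sum the part columns and divide by the total column, row by row."""
    # A zero or negative total is treated as missing to avoid division by zero.
    total = df[total_column].where(df[total_column] > 0)
    # min_count makes the sum missing if any part is missing, instead of undercounting.
    parts = df[part_columns].sum(axis=1, min_count=len(part_columns))
    return parts / total

toy = pd.DataFrame({
    "solar_consumption": [1.0, 2.0, np.nan],
    "wind_consumption": [3.0, 6.0, 1.0],
    "primary_energy_consumption": [40.0, 80.0, 50.0],
})
toy["renewable_proportion"] = share_of_total(
    toy, ["solar_consumption", "wind_consumption"], "primary_energy_consumption"
)
```

The non-renewable proportion can be obtained with the same helper by passing the relevant consumption column or columns as `part_columns`.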

Exploratory Data Analysis (EDA):

  • We will analyze the distribution of renewable and non-renewable energy consumption across countries using histograms.
  • Visualize trends in renewable energy adoption over time using line plots.
  • Investigate correlations between GDP and renewable energy usage through scatter plots.
  • Examine the relationship between population size and greenhouse gas emissions using scatter plots.

This phase involves analyzing and visualizing the dataset to discern patterns and relationships, providing valuable insights into sustainable energy practices and the transition towards renewable energy sources.

Descriptive Statistics:

  • We will generate a statistical summary for each of the dependency columns, such as renewable energy consumption and non-renewable energy consumption, to ascertain their magnitude and association. This involves calculating measures such as mean, median, standard deviation, and correlation coefficients between these variables.

Regression Analysis:

  • We will utilize regression analysis with predictors such as renewable energy consumption, GDP, population size, and greenhouse gas emissions to identify countries maximizing sustainable energy practices. This analysis aims to determine the relationship between these variables and sustainable energy practices, providing insights for effective policy-making and intervention strategies.
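
One simple way to sketch the trajectory part of this analysis is to fit a straight line to each country's renewable share over time and look at the sign of the slope. This is an illustrative assumption about the approach, not the project's final model; the helper `trend_slope`, the column `renewable_share_pct`, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def trend_slope(years, values):
    """Slope of a least-squares line, in units of `values` per year."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centering the years keeps the fit numerically well conditioned.
    slope, _intercept = np.polyfit(years - years.mean(), values, deg=1)
    return float(slope)

toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "renewable_share_pct": [10.0, 11.0, 12.0, 13.0, 20.0, 19.5, 19.0, 18.5],
})

slopes = {
    country: trend_slope(group["year"], group["renewable_share_pct"])
    for country, group in toy.groupby("country")
}

# Countries whose renewable share is rising over the period.
increasing = sorted(c for c, s in slopes.items() if s > 0)
```

Ranking countries by slope gives a first indication of which are moving toward renewables fastest.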

Data Visualization

Maps: Maps will be used to visualize hotspots of countries that use renewable energy the most and those that rely on non-renewable energy sources. If external data is needed, we will incorporate it as well.

Line graphs: Line graphs will be used to determine the patterns in the use of certain renewable and non-renewable energy sources over the years for certain regions.

Density plots: Density plots will be used to visualize the distribution of data for renewable and non-renewable energy sources across continents and/or other regional boundaries.

Project Timeline

Project Tasks

This document outlines the tasks for our project, including their status, assignees, due dates, priorities, and summaries.

Task Name Status Assignee Due Priority Summary

Dataset Complete All (6) March 27th High Brainstorming and selecting appropriate dataset.
Questions Complete All (6) March 29th High Determining questions to answer in analysis.
Proposal Complete All (6) April 3rd High Brainstorming and working on data analysis and proposal.
Peer evaluation: proposal Complete All (6) April 3rd High Performing evaluations on different teams’ project proposals.
Proposal feedback Complete All (6) April 8th High Discuss feedback given on our proposal and make necessary adjustments.
Instructor review: proposal Complete All (6) April 10th High Instructor will review the proposal and provide feedback on a later date.
About Complete Valerie April 17th High Fill in individual information in the About.qmd
Q1 : code Complete Ayesha May 1st High Writing code to create visualizations and performing analysis for question 1.
Q1 : write-up Complete Shreemithra May 2nd High 500 to 1000 word long description of question one analysis.
Q1 : presentation Complete Alyssa May 2nd High Construct slides with all visualizations, descriptions, and talking points needed to answer analysis questions for Q1.
Peer evaluation: code Complete All (6) May 1st High Performing evaluations on different teams’ project code.
Code feedback Complete All (6) May 1st High Discuss feedback given on our code and make necessary adjustments.
Q2 : code Complete Valerie, Abhishek, Tolu May 1st High Writing code to create visualizations and performing analysis for question 2.
Q2 : write-up Complete Valerie, Abhishek, Tolu May 2nd High 500 to 1000 word long description of question two analysis.
Q2 : presentation Complete Valerie, Abhishek, Tolu, Alyssa May 2nd High Construct slides with all visualizations, descriptions, and talking points needed to answer analysis questions for Q2.
Dashboard Complete Abhishek, Tolu, Valerie May 2nd High Build an interactive Python Visualization Dashboard using Panel.
Final checks Complete Ayesha, Alyssa May 3rd Med Clean-up all clutter/mistakes found in write-up, presentation, and dashboard.
Presentation delegation Complete All (6) May 3rd High Determine who will speak to what slides.
Presentation practice Complete All (6) May 3rd High Do a mock presentation to ensure smooth transitions and correct interpretations of graphs and information.
Presentation Complete All (6) May 6th High Present findings in 5 minutes.
Team Member Evaluations Complete All (6) May 6th High Give feedback on each team member.
Peer evaluation: presentations Complete All (6) May 7th High Performing evaluations on different teams’ project presentations.

Repo Organization

The following folders and files comprise the project repository:

  • .github/: This directory is designated for files associated with GitHub, encompassing workflows, actions, and templates tailored for issues.

  • _extra/: Reserved for miscellaneous files that don’t neatly fit into other project categories, providing a catch-all space for various supplementary documents.

  • _freeze/: Within this directory lie frozen environment files containing comprehensive information regarding the project’s environment configuration and dependencies.

  • data/: Specifically allocated for storing data files crucial for the project’s functionality, encompassing input files, datasets, and other essential data resources.

  • images/: Serving as a repository for visual assets employed throughout the project, including diagrams, charts, and screenshots, this directory maintains visual elements integral to project documentation and presentation.

  • .gitignore: This file functions to specify exclusions from version control, ensuring that designated files and directories remain untracked by Git, thus streamlining the versioning process.

  • README.md: Serving as the primary hub of project information, this README document furnishes essential details encompassing project setup, usage instructions, and an overarching overview of project objectives and scope.

  • _quarto.yml: Acting as a pivotal configuration file for Quarto, this document encapsulates various settings and options governing the construction and rendering of Quarto documents, facilitating customization and control over document output.

  • about.qmd: This Quarto Markdown file supplements project documentation by providing additional contextual information, elucidating project purpose, contributor insights, and other pertinent project details.

  • index.qmd: This serves as the main documentation page for our project. This Quarto Markdown file provides detailed descriptions of our project, including all code and visualizations.

References

[1] Data source: https://ourworldindata.org/energy

[2] GitHub repository: https://github.com/owid/energy-data