The goal of this project is to analyze the complex relationships between economic and population growth, sustainable energy practices, and energy consumption.
In an era marked by escalating energy demands and environmental concerns, predicting a nation's power consumption is increasingly important. The Our World in Data energy dataset covers primary energy consumption, per capita energy, growth rates, energy mix, electricity composition, and other relevant factors, with data from 1900 to 2022. By examining these factors, this project aims to unravel the complex interplay between economic and population growth and sustainable energy practices, and presents its findings in a dashboard.
Dataset
The Our World in Data (OWID) energy dataset.
import pandas as pd

# Load the dataset
data = pd.read_csv('data/owid-energy-data.csv')

# Display a few rows of the dataset
data.loc[7392:7397]
| | country | year | iso_code | population | gdp | biofuel_cons_change_pct | biofuel_cons_change_twh | biofuel_cons_per_capita | biofuel_consumption | biofuel_elec_per_capita | ... | solar_share_elec | solar_share_energy | wind_cons_change_pct | wind_cons_change_twh | wind_consumption | wind_elec_per_capita | wind_electricity | wind_energy_per_capita | wind_share_elec | wind_share_energy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7392 | France | 1996 | FRA | 57623180.0 | 1.774692e+12 | 49.895 | 0.867 | 45.209 | 2.605 | 33.493 | ... | 0.000 | 0.000 | 57.613 | 0.008 | 0.022 | 0.174 | 0.01 | 0.381 | 0.002 | 0.001 |
| 7393 | France | 1997 | FRA | 57839364.0 | 1.817073e+12 | 30.777 | 0.802 | 58.902 | 3.407 | 37.345 | ... | 0.000 | 0.000 | 49.773 | 0.011 | 0.033 | 0.173 | 0.01 | 0.569 | 0.002 | 0.001 |
| 7394 | France | 1998 | FRA | 58080344.0 | 1.882613e+12 | -14.134 | -0.482 | 50.367 | 2.925 | 36.673 | ... | 0.000 | 0.000 | 75.202 | 0.025 | 0.058 | 0.344 | 0.02 | 0.993 | 0.004 | 0.002 |
| 7395 | France | 1999 | FRA | 58352208.0 | 1.950262e+12 | 4.348 | 0.127 | 52.312 | 3.053 | 40.101 | ... | 0.000 | 0.000 | 89.486 | 0.052 | 0.109 | 0.685 | 0.04 | 1.872 | 0.008 | 0.004 |
| 7396 | France | 2000 | FRA | 58665456.0 | 2.031950e+12 | 19.585 | 0.598 | 62.223 | 3.650 | 42.274 | ... | 0.002 | 0.000 | 29.925 | 0.033 | 0.142 | 0.852 | 0.05 | 2.419 | 0.009 | 0.005 |
| 7397 | France | 2001 | FRA | 59014776.0 | 2.077915e+12 | 1.373 | 0.050 | 62.705 | 3.700 | 48.293 | ... | 0.002 | 0.001 | 172.917 | 0.243 | 0.385 | 2.203 | 0.13 | 6.521 | 0.024 | 0.012 |

6 rows × 129 columns
Dataset Description:
This dataset provides energy consumption data for countries and regions around the world, together with gross domestic product (GDP) and population figures, for the years 1900 to 2022.
Dimensions
(21590, 129)
The owid-energy-data.csv file contains 21,590 rows and 129 columns.
Provenance
This dataset is published by Our World in Data (OWID), which is known for its evidence-based research on global issues. It integrates data from several international sources, such as government agencies, research institutes, and industry publications, to give a comprehensive view of global energy patterns. The data source is available at https://ourworldindata.org/energy, and the dataset repository is at https://github.com/owid/energy-data.
The dataset has 129 columns, but the analysis uses only a few of them, so the codebook below describes only those columns:
| Index | Column Name | Data Type | Description |
|---|---|---|---|
| 1 | country | string | Geographic location. |
| 2 | year | integer | Year of observation. |
| 3 | population | integer | Population by country, based on data and estimates from different sources. The source series spans 10,000 BCE to 2100, but each row holds the value for its year of observation. |
| 4 | gdp | double | Gross domestic product (GDP) measured in international-$ using 2011 prices to adjust for price changes over time (inflation) and price differences between countries. Calculated by multiplying GDP per capita by population. |
| 5 | renewables_electricity | double | Electricity generation from renewables (terawatt-hours). |
| 6 | electricity_generation | double | Total electricity generation (terawatt-hours). |
| 7 | fossil_electricity | double | Electricity generation from fossil fuels (terawatt-hours). |
Why this dataset?
This dataset combines multiple datasets on energy consumption, economic growth, and population growth over the years for regions around the world. It provides in-depth information on energy, covering a broad range of energy and energy-consumption topics as well as related areas. It builds on the Statistical Review of World Energy by the Energy Institute, international energy data from the U.S. Energy Information Administration, energy from fossil fuels from the Shift Dataportal, Yearly Electricity Data and the European Electricity Review from Ember, and combined datasets on electricity, energy mix, fossil fuel production, primary energy consumption, electricity mix, energy, population, regions, income groups, and GDP. Together, these sources make this dataset well suited to this analysis.
Question 01
Is it possible to predict a nation's power consumption from its population size, gross domestic product (GDP), and the percentage of electricity generated from renewable sources, and from how these factors change across the years?
The Importance of this Question
Predicting a country's power consumption is a difficult yet important undertaking, necessary for ensuring sustained progress and energy stability. This question explores the intricate relationship between economic progress, population growth, and sustainable energy consumption by examining a nation's population size, gross domestic product (GDP), and the proportion of electricity generated from renewable sources. The results can provide valuable guidance for policy decisions, energy planning, and infrastructure investment to meet future energy demands sustainably.
Dependency Columns
To investigate this research question, the following data columns are essential:
Population (population) serves as a proxy for energy demand, since a larger population generally requires more energy for residential, commercial, and industrial purposes.
Gross domestic product (gdp): a larger GDP frequently corresponds to higher energy consumption as a result of industrialization, commercial activity, and improved living conditions.
The proportion of electricity generated by renewable sources, expressed as a percentage of total electricity generation, is computed as 100 × renewables_electricity / electricity_generation. The share of renewables in a nation's electricity mix is vital for understanding its transition to sustainable energy sources and the effect of that transition on overall energy consumption patterns.
Energy consumption (energy_consumption) is the outcome to be predicted and is essential for the study. This column may cover all forms of energy use, not only electricity; a column that records electricity consumption specifically would match the question more directly.
The year variable is used for the time series analysis.
Analysis plan
Data Cleaning:
We will check the dataset for anomalies and missing values and handle them appropriately, using imputation or by removing incomplete records.
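A minimal pandas sketch of this cleaning step, assuming the data sit in a DataFrame with `country` and `year` columns as in the OWID file. The helper names `missing_report` and `clean_columns` are hypothetical, and interpolating inside each country before dropping rows is one possible policy, not the team's confirmed method.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, columns: list) -> pd.Series:
    """Return the fraction of missing values per column, highest first."""
    return df[columns].isna().mean().sort_values(ascending=False)


def clean_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Fill interior gaps within each country, then drop rows that are still incomplete."""
    out = df.sort_values(["country", "year"]).copy()
    # Interpolate only between observed values of the same country
    # (limit_area="inside"), so values never leak across countries.
    out[columns] = out.groupby("country")[columns].transform(
        lambda s: s.interpolate(limit_area="inside")
    )
    # Remove rows whose leading or trailing gaps could not be filled.
    return out.dropna(subset=columns).reset_index(drop=True)


# Small synthetic example (not real OWID values).
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001],
    "gdp": [1.0, np.nan, 3.0, np.nan, 5.0],
})
report = missing_report(df, ["gdp"])
cleaned = clean_columns(df, ["gdp"])
```

Country A's 2001 gap is filled with 2.0, while country B's 2000 row is dropped because there is no earlier value to interpolate from.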
Feature Engineering:
We will calculate the percentage of electricity generated from renewable sources by dividing renewable electricity generation (renewables_electricity) by total electricity generation (electricity_generation) and multiplying by 100.
We will standardize or normalize the relevant columns where this benefits the analysis.
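As a sketch of these two feature-engineering steps, the code below computes the renewable percentage from the codebook columns `renewables_electricity` and `electricity_generation` and adds z-score columns. The `renewable_pct` name, the `_z` suffix, and the choice of z-scores rather than min-max scaling are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_renewable_pct(df: pd.DataFrame) -> pd.DataFrame:
    """Add renewable_pct = 100 * renewables_electricity / electricity_generation."""
    out = df.copy()
    # Treat zero generation as missing to avoid division by zero.
    generation = out["electricity_generation"].replace(0, np.nan)
    out["renewable_pct"] = 100 * out["renewables_electricity"] / generation
    return out


def add_zscores(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Add a standardized copy (mean 0, population std 1) of each column."""
    out = df.copy()
    for col in columns:
        out[f"{col}_z"] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


# Synthetic values (TWh and people), not real data.
df = pd.DataFrame({
    "renewables_electricity": [10.0, 50.0, 0.0],
    "electricity_generation": [100.0, 200.0, 0.0],
    "population": [1.0e6, 2.0e6, 3.0e6],
})
features = add_zscores(add_renewable_pct(df), ["population"])
```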
Exploratory Data Analysis (EDA):
We will analyze and visualize the dataset to identify underlying patterns and relationships.
Explore the distribution of variables like population size, GDP, renewable energy percentage, and energy consumption across different countries and years.
Descriptive Statistics:
We will generate statistical summaries for each dependency column to understand the distribution and central tendency of the data.
We will calculate measures such as the mean, median, standard deviation, and correlation coefficients.
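A short pandas sketch of these summaries, assuming the relevant columns are numeric: `DataFrame.agg` produces the mean, median, and sample standard deviation, and `DataFrame.corr` gives the Pearson correlation matrix. The helper name `summarize` and the synthetic values are hypothetical.

```python
import pandas as pd


def summarize(df: pd.DataFrame, columns: list):
    """Return (per-column mean/median/std table, Pearson correlation matrix)."""
    stats = df[columns].agg(["mean", "median", "std"]).T  # one row per column
    corr = df[columns].corr()  # Pearson by default; std above uses ddof=1
    return stats, corr


# Synthetic example values.
df = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 4.0],
    "gdp": [2.0, 4.0, 6.0, 9.0],
})
stats, corr = summarize(df, ["population", "gdp"])
```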
Correlation Analysis:
We will use correlation analysis to determine the magnitude and direction of the association between energy consumption and the independent variables: population size, GDP, and renewable energy percentage.
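The sketch below reports each predictor's Pearson (linear) and Spearman (rank-based) correlation with the target. The column names `energy_consumption` and `renewable_pct` are assumptions, and adding Spearman is our suggestion, since it is less sensitive to the skewed distributions that population and GDP are likely to have.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, predictors: list, target: str) -> pd.DataFrame:
    """Pearson (linear) and Spearman (rank) correlation of each predictor with the target."""
    data = df[predictors + [target]].dropna()  # use only rows complete for all columns
    return pd.DataFrame({
        "pearson": data[predictors].corrwith(data[target]),
        "spearman": data[predictors].corrwith(data[target], method="spearman"),
    })


# Synthetic example: consumption rises with population and falls as renewable_pct rises.
df = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 4.0],
    "renewable_pct": [40.0, 30.0, 20.0, 10.0],
    "energy_consumption": [10.0, 20.0, 30.0, 40.0],
})
corr = correlations_with_target(df, ["population", "renewable_pct"], "energy_consumption")
```

The sign gives the direction of the association and the absolute value its magnitude.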
Time Series Analysis:
We will investigate trends and patterns over time in variables such as energy consumption to understand past shifts and forecast future trends. Because the data are annual, seasonality within a year cannot be observed, so the focus is on long-run trends.
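Because the observations are yearly, trend features are best built within each country. This sketch adds a year-over-year percentage change and a trailing rolling mean. The 3-year window, the helper name, and the column name `energy_consumption` are assumptions.

```python
import pandas as pd


def add_trend_features(df: pd.DataFrame, col: str, window: int = 3) -> pd.DataFrame:
    """Add per-country year-over-year % change and a trailing rolling mean of `col`."""
    out = df.sort_values(["country", "year"]).copy()
    grouped = out.groupby("country")[col]
    # Previous observed value in the same country (NaN for each country's first year).
    # If a country skips years, this compares against the last available year.
    previous = grouped.shift(1)
    out[f"{col}_yoy_pct"] = (out[col] / previous - 1) * 100
    # Trailing mean over up to `window` observations, computed separately per country.
    out[f"{col}_rolling"] = grouped.transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out


# Synthetic example values.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001],
    "energy_consumption": [100.0, 110.0, 121.0, 50.0, 40.0],
})
trends = add_trend_features(df, "energy_consumption")
```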
Regression Analysis:
We will use simple linear regression to examine the relationship between energy consumption and each independent variable individually.
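A NumPy sketch of one simple regression per predictor, using `np.polyfit` with degree 1 (ordinary least squares for a straight line) and reporting each fit's R-squared. The helper name and the column names are assumptions.

```python
import numpy as np
import pandas as pd


def simple_regressions(df: pd.DataFrame, predictors: list, target: str) -> pd.DataFrame:
    """Fit target = intercept + slope * x separately for each predictor."""
    rows = []
    for x_col in predictors:
        data = df[[x_col, target]].dropna()
        x = data[x_col].to_numpy(dtype=float)
        y = data[target].to_numpy(dtype=float)
        slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
        residuals = y - (intercept + slope * x)
        r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
        rows.append({"predictor": x_col, "slope": slope, "intercept": intercept, "r2": r2})
    return pd.DataFrame(rows)


# Synthetic example values.
df = pd.DataFrame({
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "population": [1.0, 3.0, 2.0, 4.0],
    "energy_consumption": [3.0, 5.0, 7.0, 9.0],
})
results = simple_regressions(df, ["gdp", "population"], "energy_consumption")
```

Comparing the per-predictor R-squared values shows how much variation each variable explains on its own.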
Multiple Regression:
We will apply multiple regression analysis to consider several independent variables simultaneously, such as population size, GDP, and renewable energy percentage, in order to predict energy usage more accurately.
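A NumPy least-squares sketch of the multiple regression. The predictor and target names are assumptions, and a plain ordinary least-squares fit without log transforms or regularization is only one reasonable starting point.

```python
import numpy as np
import pandas as pd


def fit_multiple_regression(df: pd.DataFrame, predictors: list, target: str) -> pd.Series:
    """Ordinary least squares: target ~ intercept + sum(coef_i * predictor_i)."""
    data = df[predictors + [target]].dropna()
    # Design matrix with a leading column of ones for the intercept.
    X = np.column_stack([np.ones(len(data)), data[predictors].to_numpy(dtype=float)])
    y = data[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + predictors)


def predict(coefs: pd.Series, df: pd.DataFrame) -> np.ndarray:
    """Apply fitted coefficients to new rows."""
    predictors = list(coefs.index[1:])
    return coefs["intercept"] + df[predictors].to_numpy(dtype=float) @ coefs[predictors].to_numpy()


# Synthetic data generated from energy = 1 + 2 * population + 3 * gdp.
df = pd.DataFrame({
    "population": [0.0, 1.0, 0.0, 1.0, 2.0],
    "gdp": [0.0, 0.0, 1.0, 1.0, 3.0],
    "energy_consumption": [1.0, 3.0, 4.0, 6.0, 14.0],
})
coefs = fit_multiple_regression(df, ["population", "gdp"], "energy_consumption")
```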
Data Visualization
Line Graphs:
For the time series data, we will create line graphs that illustrate the patterns of energy consumption, population growth, GDP, and the proportion of renewable energy over time.
Scatterplots:
Scatterplots will show the relationship between energy use and each of the independent variables. Heat maps will be used to visualize the correlation matrix, since they make the strength of the correlations between variables easy to see.
Bar and pie charts:
Bar and pie charts will show the breakdown of energy sources, with a specific emphasis on distinguishing renewable from non-renewable energy. We have not yet decided which of these techniques to use.
Model Evaluation:
We will evaluate the regression models with suitable metrics, such as R-squared and root mean squared error (RMSE), to assess their accuracy in predicting energy consumption.
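A sketch of the two metrics, plus a year-based hold-out split. The split is our assumption rather than part of the plan: it tests the model on later years it has not seen, which suits a forecasting question better than a random split.

```python
import numpy as np
import pandas as pd


def r_squared(y_true, y_pred) -> float:
    """1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the units of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def split_by_year(df: pd.DataFrame, cutoff_year: int):
    """Train on years before the cutoff, test on the cutoff year and later."""
    return df[df["year"] < cutoff_year], df[df["year"] >= cutoff_year]


r2_example = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
rmse_example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
train_set, test_set = split_by_year(pd.DataFrame({"year": range(2000, 2005)}), 2003)
```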
Question 02
Which countries or regions are engaging in sustainable energy practices and relying more on renewable than on non-renewable energy? Which countries are on a trajectory toward relying more on renewable energy and producing fewer greenhouse gas emissions?
The Importance of this Question:
The goal of this question is to analyze the energy practices of various countries and determine which of them practice sustainable energy and are reducing their consumption of non-renewable energy sources. It also identifies which countries rely more on non-renewable than on renewable energy sources. Regression analysis will then be used to determine which countries, based on the current data, are moving toward greater use of renewable energy. To reduce bias arising from differences in population size, the countries will be divided into those with larger and those with smaller populations; a log transformation of population can also be applied to address this issue.
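A sketch of the population adjustment, assuming a split at the median of each country's most recent population. The text does not fix a threshold, so the median rule, the base-10 log, and the names `log_population` and `pop_group` are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_population_groups(df: pd.DataFrame):
    """Add log10(population) and a 'larger'/'smaller' label from each country's latest population."""
    out = df.copy()
    out["log_population"] = np.log10(out["population"])
    # Latest non-missing population per country (GroupBy.last skips NaN).
    latest = out.sort_values("year").groupby("country")["population"].last()
    threshold = latest.median()
    larger = latest.index[latest >= threshold]
    out["pop_group"] = np.where(out["country"].isin(larger), "larger", "smaller")
    return out, threshold


# Synthetic example values.
df = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [2010, 2010, 2010, 2020, 2020, 2020],
    "population": [0.9e6, 0.8e8, 0.9e7, 1e6, 1e8, 1e7],
})
labeled, threshold = add_population_groups(df)
```

Labeling each country once, from its latest value, keeps a country in the same group across all years.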
Dependency Columns:
To complete the analysis of this question, the following data columns are essential.
Population:
This serves as a proxy for energy demand, since a larger population generally requires more energy for residential, commercial, and industrial purposes.
Energy consumed:
The columns on the total amount of energy consumed by each country will be used to determine what proportion of total electricity consumption comes from sustainable energy sources.
Greenhouse gas emissions:
These columns identify which countries or regions produce the most greenhouse gas emissions.
Renewable and non-renewable consumption:
The renewable and non-renewable consumption columns, together with related columns, will be used to determine which countries engage in sustainable energy practices.
Analysis plan
Data Preparation and Analysis
The dataset contains missing values, which we will fill using model-based imputation techniques such as iterative imputation to obtain a more complete dataset. We will then check important columns, such as population, GDP, energy consumed, and renewable and non-renewable consumption, for skewness.
Feature Engineering
This step calculates the proportions of renewable and non-renewable energy sources relative to total energy consumption, where \(C_{x}\) denotes the consumption of source \(x\) and \(C_{\text{total}}\) is total energy consumption.
Calculate the proportion of renewable energy:
\[
\text{Renewable share} = \frac{C_{\text{solar}} + C_{\text{wind}} + C_{\text{biofuel}}}{C_{\text{total}}}
\]
Calculate the proportion of non-renewable energy:
\[
\text{Non-renewable share} = \frac{C_{\text{fossil}}}{C_{\text{total}}}
\]
Biofuels are a renewable source, so they are counted with solar and wind; the non-renewable share uses fossil fuel consumption. These metrics show how much of total consumption comes from renewable and non-renewable energy.
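A minimal pandas sketch of these proportions. The column names `solar_consumption`, `fossil_fuel_consumption`, and `primary_energy_consumption` are assumptions about the OWID file (only `wind_consumption` and `biofuel_consumption` appear in the preview above). The column lists are parameters so the groupings can be adjusted. By default, biofuels are counted as renewable.

```python
import numpy as np
import pandas as pd

# Assumed OWID column names; adjust the lists if the file uses different ones.
RENEWABLE_COLS = ["solar_consumption", "wind_consumption", "biofuel_consumption"]
NONRENEWABLE_COLS = ["fossil_fuel_consumption"]
TOTAL_COL = "primary_energy_consumption"


def add_energy_shares(df: pd.DataFrame, renewable_cols=RENEWABLE_COLS,
                      nonrenewable_cols=NONRENEWABLE_COLS, total_col=TOTAL_COL) -> pd.DataFrame:
    """Add renewable_share and nonrenewable_share as fractions of total consumption."""
    out = df.copy()
    total = out[total_col].replace(0, np.nan)  # no share when total is zero
    # min_count makes the sum NaN if any component is missing, instead of silently 0.
    renewable = out[renewable_cols].sum(axis=1, min_count=len(renewable_cols))
    nonrenewable = out[nonrenewable_cols].sum(axis=1, min_count=len(nonrenewable_cols))
    out["renewable_share"] = renewable / total
    out["nonrenewable_share"] = nonrenewable / total
    return out


# Synthetic example values (TWh), not real data.
df = pd.DataFrame({
    "solar_consumption": [5.0, np.nan],
    "wind_consumption": [10.0, 4.0],
    "biofuel_consumption": [5.0, 1.0],
    "fossil_fuel_consumption": [70.0, 40.0],
    "primary_energy_consumption": [100.0, 50.0],
})
shares = add_energy_shares(df)
```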
Exploratory Data Analysis (EDA):
We will analyze the distribution of renewable and non-renewable energy consumption across countries using histograms.
Visualize trends in renewable energy adoption over time using line plots.
Investigate correlations between GDP and renewable energy usage through scatter plots.
Examine the relationship between population size and greenhouse gas emissions using scatter plots.
This phase involves analyzing and visualizing the dataset to discern patterns and relationships, providing valuable insights into sustainable energy practices and the transition towards renewable energy sources.
Descriptive Statistics:
We will generate a statistical summary for each of the dependency columns, such as renewable and non-renewable energy consumption, to assess their magnitude and association. This involves calculating measures such as the mean, median, standard deviation, and the correlation coefficients between these variables.
Regression Analysis:
We will utilize regression analysis with predictors such as renewable energy consumption, GDP, population size, and greenhouse gas emissions to identify countries maximizing sustainable energy practices. This analysis aims to determine the relationship between these variables and sustainable energy practices, providing insights for effective policy-making and intervention strategies.
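A sketch of one way to identify countries moving toward renewables: fit a straight line to each country's renewable share over time and rank countries by slope. The `renewable_share` column, the helper name `renewable_trends`, and the five-year minimum are assumptions. A positive slope only indicates a rising share, not its cause.

```python
import numpy as np
import pandas as pd


def renewable_trends(df: pd.DataFrame, share_col: str = "renewable_share",
                     min_years: int = 5) -> pd.DataFrame:
    """Per-country least-squares slope of share_col against year, fastest-rising first."""
    rows = []
    for country, group in df.dropna(subset=[share_col]).groupby("country"):
        if len(group) < min_years:
            continue  # too few years for a meaningful trend
        slope, _intercept = np.polyfit(group["year"], group[share_col], deg=1)
        rows.append({"country": country, "slope_per_year": slope})
    trends = pd.DataFrame(rows, columns=["country", "slope_per_year"])
    return trends.sort_values("slope_per_year", ascending=False).reset_index(drop=True)


# Synthetic example: A rises 1 point per year, B falls 2 points, C has too few years.
years = np.arange(2010, 2016)
df = pd.concat([
    pd.DataFrame({"country": "A", "year": years, "renewable_share": 0.10 + 0.01 * (years - 2010)}),
    pd.DataFrame({"country": "B", "year": years, "renewable_share": 0.50 - 0.02 * (years - 2010)}),
    pd.DataFrame({"country": "C", "year": years[:3], "renewable_share": 0.30}),
], ignore_index=True)
trends = renewable_trends(df)
```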
Data Visualization
Maps: Maps will be used to visualize the hotspots of countries that use renewable energy the most and those that rely mainly on non-renewable energy sources. If external data is needed, we will incorporate it as well.
Line graphs: Line graphs will show the patterns in the use of specific renewable and non-renewable energy sources over the years for selected regions.
Density plots: Density plots will show the distribution of renewable and non-renewable energy sources across continents or other regional groupings.
Project Timeline
Project Tasks
This document outlines the tasks for our project, including their status, assignees, due dates, priorities, and summaries.
| Task Name | Status | Assignee | Due | Priority | Summary |
|---|---|---|---|---|---|
| Dataset | Complete | All (6) | March 27th | High | Brainstorming and selecting an appropriate dataset. |
| Questions | Complete | All (6) | March 29th | High | Determining questions to answer in the analysis. |
| Proposal | Complete | All (6) | April 3rd | High | Brainstorming and working on the data analysis and proposal. |
| Peer evaluation: proposal | Complete | All (6) | April 3rd | High | Performing evaluations of other teams' project proposals. |
| Proposal feedback | Complete | All (6) | April 8th | High | Discuss feedback given on our proposal and make necessary adjustments. |
| Instructor review: proposal | Complete | All (6) | April 10th | High | Instructor will review the proposal and provide feedback at a later date. |
| About | Complete | Valerie | April 17th | High | Fill in individual information in about.qmd. |
| Q1: code | Complete | Ayesha | May 1st | High | Writing code to create visualizations and performing the analysis for question 1. |
| Q1: write-up | Complete | Shreemithra | May 2nd | High | 500- to 1,000-word description of the question 1 analysis. |
| Q1: presentation | Complete | Alyssa | May 2nd | High | Construct slides with all visualizations, descriptions, and talking points needed to answer the analysis questions for Q1. |
| Peer evaluation: code | Complete | All (6) | May 1st | High | Performing evaluations of other teams' project code. |
| Code feedback | Complete | All (6) | May 1st | High | Discuss feedback given on our code and make necessary adjustments. |
| Q2: code | Complete | Valerie, Abhishek, Tolu | May 1st | High | Writing code to create visualizations and performing the analysis for question 2. |
| Q2: write-up | Complete | Valerie, Abhishek, Tolu | May 2nd | High | 500- to 1,000-word description of the question 2 analysis. |
| Q2: presentation | Complete | Valerie, Abhishek, Tolu, Alyssa | May 2nd | High | Construct slides with all visualizations, descriptions, and talking points needed to answer the analysis questions for Q2. |
| Dashboard | Complete | Abhishek, Tolu, Valerie | May 2nd | High | Build an interactive Python visualization dashboard using Panel. |
| Final checks | Complete | Ayesha, Alyssa | May 3rd | Med | Clean up all clutter and mistakes found in the write-up, presentation, and dashboard. |
| Presentation delegation | Complete | All (6) | May 3rd | High | Determine who will speak to which slides. |
| Presentation practice | Complete | All (6) | May 3rd | High | Do a mock presentation to ensure smooth transitions and correct interpretations of graphs and information. |
| Presentation | Complete | All (6) | May 6th | High | Present findings in 5 minutes. |
| Team Member Evaluations | Complete | All (6) | May 6th | High | Give feedback on each team member. |
| Peer evaluation: presentations | Complete | All (6) | May 7th | High | Performing evaluations of other teams' project presentations. |
Repo Organization
The project repository contains the following folders and files:
.github/: Files associated with GitHub, including workflows, actions, and issue templates.
_extra/: Miscellaneous files that do not fit into other project categories, providing a catch-all space for supplementary documents.
_freeze/: Quarto's frozen computational output (the stored results of executed code), which lets documents be re-rendered without re-running their code.
data/: The data files the project needs, including input files, datasets, and other data resources.
images/: Visual assets used throughout the project, such as diagrams, charts, and screenshots for the documentation and presentation.
.gitignore: Specifies the files and directories that Git should leave untracked.
README.md: The primary source of project information, covering project setup, usage instructions, and an overview of the project's objectives and scope.
_quarto.yml: The Quarto configuration file, which holds the settings and options that control how the Quarto documents are built and rendered.
about.qmd: A Quarto Markdown file with additional context about the project, including its purpose and information about the contributors.
index.qmd: The main documentation page for the project. This Quarto Markdown file describes the project in detail and includes all code and visualizations.
---
title: "Global Energy Trends"
subtitle: "Insights into Consumption and Sustainability"
author:
  - name: "DataDetectives - Ayesha, Abhishek, Shreemithra, Toluwanimi, Valerie, Alyssa"
    affiliations:
      - name: "School of Information, University of Arizona"
description: "The goal of this project is to analyze the complex relationships between economic and population growth, sustainable energy practices, and energy consumption"
format:
  html:
    code-fold: true
    code-tools: true
    code-overflow: wrap
    code-line-numbers: true
    embed-resources: true
editor: visual
code-annotations: hover
execute:
  warning: false
jupyter: python3
---
This project aims to unravel the complex interplay between economic and population growth as well as sustainable energy practices by creating a dashboard.## DatasetThe Our World Our Data Energy dataset.```{python}#| label: load-datasetimport pandas as pd# Load the datasetdata = pd.read_csv('data/owid-energy-data.csv')# Display a few rows of the datasetdata.loc[7392:7397]```### Dataset Description:This dataset provides information on energy consumption for various regions around the world as well as information on energy consumption, gross domestic product, and population information across the years from 1900s to 2022.### Dimensions```{python}#| message: false#| warning: false#| echo: falsedata.shape```This **owid-energy-data.csv** dataset contains 21,590 rows and 129 columns.### ProvenanceThis information is created by Our World in Data (OWID), a renowned institution acknowledged for its meticulous, evidence-based study on global matters. The dataset might integrate data from several worldwide sources, such as government agencies, research institutes, and industry publications, to offer a comprehensive perspective on global energy patterns. The data source link is attached here: [LINK](https://ourworldindata.org/energy)[Github Link](https://github.com/owid/energy-data)### Code book Insights:This data-set has about 129 columns but we will work on few columns, that is why we only provided the important data descriptions. 
Code book Insights is shown below:+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| **Index** | **Column Name** | **Data Type** | **Descriptions** |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 1. | country | string | Geographic location |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 2. | year | integer | Year of observation. |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 3. | population | interger | Population by country, available from 10,000 BCE to 2100, based on data and estimates from different sources. Data range does go from 10,000 BCE to 2100 but is pulled and presented based on the year (year of observation) variable. |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 4. 
| gdp | double | Gross domestic product (GDP) measured in international-\$ using 2011 prices to adjust for price changes over time (inflation) and price differences between countries. Calculated by multiplying GDP per capita with population. |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 5. | renewables_electricity | double | Electricity generation from renewables (terawatt-hours) |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 6. | electricity_generation | double | Total electricity generation (terawatt-hours) |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| 7. | fossil_electricity | double | Electricity generation from fossil fuels (terawatt-hours) |+-----------+------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+## Why this dataset?This data set was formed from the combination of multiple data sets related to topics on energy consumption, economic growth, and population growth over the years for different regions around the world. 
This data set provides in-depth information on energy since it covers most if not all the topics related to energy and energy consumption as well as other related areas. The data set was built upon a statistical review of world energy by the Energy Institute, International energy data by the United States Information Administration, Energy from fossil fuels from the Shift Dataportal, Yearly Electricity Data, European Electricity Review (Ember), Combined Electricity, Energy Mix, Fossil fuel production, Primary Energy consumption, Electricity mix, Energy data set, Population, Regions, Income groups and GDP. All these reliable and updated information sources come together to make this data set the best one for this analysis.## Question 01Is it possible to predict a nation's power consumption by considering its population size, gross domestic product (GDP), and the percentage of electricity generated from renewable sources and changes across the years?### The Importance of this QuestionPredicting a country's power consumption is a difficult yet important taking on, necessary for guaranteeing sustained progress and energy stability. This subject explores the intricate relationship between economic progress, population growth, and sustainable energy consumption by examining a nation's population size, Gross Domestic Product (GDP), and the proportion of electricity generated from renewable sources. 
The research can provide valuable guidance for policy decisions, energy planning, and infrastructure investment to fulfill future energy demands in a sustainable manner.### Dependency ColumnsIn order to investigate this research issue, the following data columns are essential:Population **(population)** which functions as a surrogate for energy demand, as a larger population often necessitates greater energy consumption for residential, commercial, and industrial purposes.A larger Gross Domestic Product **(gdp)** frequently corresponds to an augmented consumption of energy as a result of industrialization, commercial endeavors, and improved living conditions.The proportion of electricity generated by renewable sources, expressed as a percentage of total electricity generation, is represented by the variable **renewable_energy / electricity_generation**. Assessing the proportion of renewable energy in a nation's entire electricity mix is vital for understanding its transition to sustainable energy sources and the effects on overall energy consumption patterns.Energy Consumption (**energy_consumption)**: Although this column may cover all forms of energy use, including electricity and more, it is essential for the study. 
If there is a particular column that contains data on electricity consumption, such as ***Energy Consumption*** it would be more directly applicable to the subject at hand.The variable ***year*** is used to analysis time series.### Analysis plan#### **Data Cleaning**:- We will Verify the dataset for any anomalies or missing values and address them appropriately using techniques like imputation or removal of incomplete records.#### **Feature Engineering**:- We will Calculate the percentage of electricity generated from renewable sources by dividing the renewable electricity consumption by the total electricity generation.- Standardize or normalize the relevant columns if deemed beneficial for analysis.#### **Exploratory Data Analysis (EDA)**:- We will Analyze and visualize the dataset to identify underlying patterns and relationships.- Explore the distribution of variables like <font color="black"><i>population size</i></font>, <font color="black"><i>GDP</i></font>, <font color="black"><i>renewable energy percentage</i></font>, and <font color="black"><i>energy consumption</i></font> across different countries and years.#### **Descriptive Statistics**:- We will Generate statistical summaries for each dependence column to understand the distribution and central tendencies of the data.- Calculate measures such as mean, median, standard deviation, and correlation coefficients.**Correlation Analysis**:- We will Determine the magnitude and direction of the association between <font color="black"><i>energy consumption</i></font> and independent variables like <font color="black"><i>population size</i></font>, <font color="black"><i>GDP</i></font>, and <font color="black"><i>renewable energy percentage</i></font> using correlation analysis.#### **Time Series Analysis**:- We will Investigate trends, seasonality, and patterns over time for variables like <font color="black"><i>energy consumption</i></font> to understand past shifts and forecast future trends.#### 
**Regression Analysis**:- We will Utilize basic linear regression to comprehend the correlation between <font color="black"><i>energy consumption</i></font> and independent variables individually.#### **Multiple Regression**:- we will Apply multiple regression analysis to simultaneously consider multiple independent variables, such as <font color="black"><i>population size</i></font>, <font color="black"><i>GDP</i></font>, and <font color="black"><i>renewable energy percentage</i></font>, to predict <font color="black"><i>energy usage</i></font> more accurately.### **Data Visualization**#### **Line Graphs:**For time series data, create graphs that illustrate the patterns of energy consumption, population growth, GDP, and the proportion of renewable energy over a period of time.#### **Scatterplots**:Are used to visually represent the correlation between energy use and each of the independent variables. Heat-maps are a useful tool for visualizing the correlation matrix, as they effectively emphasize the strength of correlations between variables.#### **Bar and pie charts**:Are used to visually represent the breakdown of energy sources, with a specific emphasis on differentiating between renewable and non-renewable energy. We are not decided yet that which techniques will be used here.#### **Model Evaluation:**Evaluate the models by employing suitable metrics such as R-squared and RMSE for regression analysis to assess their performance and accuracy in predicting energy consumption.## Question 02What countries or regions are engaging in sustainable energy practices and relying more on renewable energy compared to nonrenewable energy? 
Which countries are on a trajectory toward relying more on renewable energy and producing fewer greenhouse gas emissions?

### The Importance of this Question:

The goal of this question is to analyze the energy practices of various countries and determine which of them are practicing sustainable energy use and reducing their consumption of non-renewable energy sources. It also seeks to determine which countries rely more on non-renewable than on renewable energy sources. In addition, we will use regression analysis to determine which countries, based on the current data, are moving toward greater use of renewable energy sources. The countries will be divided into those with larger and those with smaller populations to reduce any bias arising from differences in population size. A log transformation of population can also be applied to address this issue.

### Dependency Columns:

To complete the analysis of this question, the following data columns are essential.

#### **Population:**

This serves as a proxy for energy demand, as a larger population often requires greater energy consumption for residential, commercial, and industrial purposes.

#### **Energy consumed:**

The columns recording the total amount of energy consumed by each country will be used to determine what proportion of total electricity consumption comes from sustainable sources.

#### **Greenhouse gas emissions:**

This will show which countries or regions produce the most greenhouse gases and cause the most pollution.

#### **Renewable and non-renewable consumption:**

The renewable and non-renewable consumption columns, along with all related columns, will be used to identify the countries engaging in sustainable energy practices.

### Analysis plan

#### **Data Preparation and Analysis**

- Our dataset contains missing values.
We will address them with machine-learning-based imputation techniques such as iterative imputation to obtain a reliable dataset. We will then check important columns such as <font color="black"><i><b>population</b></i></font>, <font color="black"><i><b>GDP</b></i></font>, <font color="black"><i><b>energy consumed</b></i></font>, <font color="black"><i><b>renewable</b></i></font>, and <font color="black"><i><b>non-renewable consumption</b></i></font> for skewness.

#### **Feature Engineering**

This step involves calculating the proportions of renewable and non-renewable energy sources relative to total energy consumption.

- Proportion of renewable energy:
  $$ \frac{\text{Solar consumption} + \text{Wind consumption} + \text{Hydro consumption} + \text{Biofuel consumption}}{\text{Total energy consumption}} $$
- Proportion of non-renewable energy:
  $$ \frac{\text{Coal consumption} + \text{Oil consumption} + \text{Gas consumption}}{\text{Total energy consumption}} $$

These metrics show how much renewable and non-renewable energy is used relative to total consumption.

#### **Exploratory Data Analysis (EDA)**:

- We will analyze the distribution of <font color="black"><i><b>renewable</b></i></font> and <font color="black"><i><b>non-renewable</b></i></font> energy consumption across countries using histograms.
- We will visualize trends in <font color="black"><i><b>renewable energy</b></i></font> adoption over time using line plots.
- We will investigate correlations between <font color="black"><i><b>GDP</b></i></font> and <font color="black"><i><b>renewable energy</b></i></font> usage through scatter plots.
- We will examine the relationship between <font color="black"><i><b>population size</b></i></font> and <font color="black"><i><b>greenhouse gas emissions</b></i></font> using scatter plots.

This phase involves analyzing and visualizing the dataset to discern patterns and relationships, providing insights into sustainable energy practices and the transition toward renewable energy sources.

#### **Descriptive Statistics**:

- We will generate a statistical summary for each of the dependency columns, such as <font color="black"><i><b>renewable energy consumption</b></i></font> and <font color="black"><i><b>non-renewable energy consumption</b></i></font>, to ascertain their magnitude and association. This involves calculating measures such as the mean, median, standard deviation, and correlation coefficients between these variables.

#### **Regression Analysis**:

- We will use regression analysis with predictors such as <font color="black"><i><b>renewable energy consumption</b></i></font>, <font color="black"><i><b>GDP</b></i></font>, <font color="black"><i><b>population size</b></i></font>, and <font color="black"><i><b>greenhouse gas emissions</b></i></font> to identify the countries that make the greatest use of sustainable energy practices. This analysis aims to determine the relationship between these variables and sustainable energy practices, providing insights for effective policy-making and intervention strategies.

#### **Data Visualization**

**Maps:** Maps will be used to visualize hotspots: the countries that use renewable energy the most and those that rely mainly on non-renewable energy sources.
If external data is needed, we will incorporate it as well.

**Line graphs:** Line graphs will be used to show patterns in the use of specific renewable and non-renewable energy sources over the years for selected regions.

**Density plots:** Density plots will be used to visualize the distribution of renewable and non-renewable energy data across continents and/or other regional groupings.

## Project Timeline

### Project Tasks

This section outlines the tasks for our project, including their status, assignees, due dates, priorities, and summaries.

| Task Name | Status | Assignee | Due | Priority | Summary |
|---|---|---|---|---|---|
| Dataset | Complete | All (6) | March 27th | High | Brainstorming and selecting an appropriate dataset. |
| Questions | Complete | All (6) | March 29th | High | Determining the questions to answer in the analysis. |
| Proposal | Complete | All (6) | April 3rd | High | Brainstorming and working on the data analysis and proposal. |
| Peer evaluation: *proposal* | Complete | All (6) | April 3rd | High | Performing evaluations of other teams' project proposals. |
| Proposal feedback | Complete | All (6) | April 8th | High | Discuss the feedback given on our proposal and make necessary adjustments. |
| Instructor review: *proposal* | Complete | All (6) | April 10th | High | The instructor will review the proposal and provide feedback at a later date. |
| About | Complete | Valerie | April 17th | High | Fill in individual information in **about.qmd**. |
| Q1: *code* | Complete | Ayesha | May 1st | High | Writing code to create visualizations and performing the analysis for question 1. |
| Q1: *write-up* | Complete | Sheemithra | May 2nd | High | 500- to 1000-word description of the question 1 analysis. |
| Q1: *presentation* | Complete | Alyssa | May 2nd | High | Construct slides with all visualizations, descriptions, and talking points needed to answer the analysis questions for Q1. |
| Peer evaluation: *code* | Complete | All (6) | May 1st | High | Performing evaluations of other teams' project code. |
| Code feedback | Complete | All (6) | May 1st | High | Discuss the feedback given on our code and make necessary adjustments. |
| Q2: *code* | Complete | Valerie, Abhishek, Tolu | May 1st | High | Writing code to create visualizations and performing the analysis for question 2. |
| Q2: *write-up* | Complete | Valerie, Abhishek, Tolu | May 2nd | High | 500- to 1000-word description of the question 2 analysis. |
| Q2: *presentation* | Complete | Valerie, Abhishek, Tolu, Alyssa | May 2nd | High | Construct slides with all visualizations, descriptions, and talking points needed to answer the analysis questions for Q2. |
| Dashboard | Complete | Abhishek, Tolu, Valerie | May 2nd | High | Build an interactive Python visualization dashboard using Panel. |
| Final checks | Complete | Ayesha, Alyssa | May 3rd | Med | Clean up all clutter and mistakes in the write-up, presentation, and dashboard. |
| Presentation delegation | Complete | All (6) | May 3rd | High | Determine who will present which slides. |
| Presentation practice | Complete | All (6) | May 3rd | High | Do a mock presentation to ensure smooth transitions and correct interpretations of graphs and information. |
| Presentation | Complete | All (6) | May 6th | High | Present findings in 5 minutes. |
| Team Member Evaluations | Complete | All (6) | May 6th | High | Give feedback on each team member. |
| Peer evaluation: *presentations* | Complete | All (6) | May 7th | High | Performing evaluations of other teams' project presentations. |

# Repo Organization

The following folders and files make up the project repository:

- **.github/:** This directory holds GitHub-related files, including workflows, actions, and issue templates.
- **\_extra/:** Reserved for miscellaneous files that don't fit neatly into other project categories, serving as a catch-all for supplementary documents.
- **\_freeze/:** This directory stores Quarto's frozen computational outputs, which allow documents to be re-rendered without re-executing their code.
- **data/:** Stores the data files the project depends on, including input files, datasets, and other data resources.
- **images/:** Holds the visual assets used throughout the project, such as diagrams, charts, and screenshots, for use in the project documentation and presentation.
- **.gitignore:** Specifies the files and directories that Git should leave untracked.
- **README.md:** The primary source of project information, covering project setup, usage instructions, and an overview of the project's objectives and scope.
- **\_quarto.yml:** The Quarto configuration file, containing the settings and options that control how the project's Quarto documents are built and rendered.
- **about.qmd:** This Quarto Markdown file supplements the project documentation with additional context, such as the project's purpose, contributor information, and other relevant details.
- **index.qmd:** The main documentation page for our project. This Quarto Markdown file provides detailed descriptions of the project, including all code and visualizations.

# References

\[1\] Data source: <https://ourworldindata.org/energy>

\[2\] GitHub repository: <https://github.com/owid/energy-data>