import numpy as np
import pandas as pd
import requests
Algo Aces Project Proposal
Proposal
Dataset
df = pd.read_csv("data/fastfood_calories.csv")
df.head()
| | Unnamed: 0 | restaurant | item | calories | cal_fat | total_fat | sat_fat | trans_fat | cholesterol | sodium | total_carb | fiber | sugar | protein | vit_a | vit_c | calcium | salad |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Mcdonalds | Artisan Grilled Chicken Sandwich | 380 | 60 | 7 | 2.0 | 0.0 | 95 | 1110 | 44 | 3.0 | 11 | 37.0 | 4.0 | 20.0 | 20.0 | Other |
1 | 2 | Mcdonalds | Single Bacon Smokehouse Burger | 840 | 410 | 45 | 17.0 | 1.5 | 130 | 1580 | 62 | 2.0 | 18 | 46.0 | 6.0 | 20.0 | 20.0 | Other |
2 | 3 | Mcdonalds | Double Bacon Smokehouse Burger | 1130 | 600 | 67 | 27.0 | 3.0 | 220 | 1920 | 63 | 3.0 | 18 | 70.0 | 10.0 | 20.0 | 50.0 | Other |
3 | 4 | Mcdonalds | Grilled Bacon Smokehouse Chicken Sandwich | 750 | 280 | 31 | 10.0 | 0.5 | 155 | 1940 | 62 | 2.0 | 18 | 55.0 | 6.0 | 25.0 | 20.0 | Other |
4 | 5 | Mcdonalds | Crispy Bacon Smokehouse Chicken Sandwich | 920 | 410 | 45 | 12.0 | 0.5 | 120 | 1980 | 81 | 4.0 | 18 | 46.0 | 6.0 | 20.0 | 20.0 | Other |
Dataset Description:
The “Fast Food Calories” dataset comprises nutritional information for various food items offered by eight distinct fast-food outlets. Each outlet’s menu is detailed, encompassing items such as burgers, fries, beverages, and salads. The dataset includes essential nutritional metrics like calories, fat content, protein, and carbohydrates, allowing for in-depth analysis and comparison of the health implications of different menu choices.
Provenance: The dataset is drawn from research institutions, open data repositories (TidyTuesday), and data sets compiled from various sources.
Dimensions: The preview above shows 18 columns (including the unnamed index column carried over from the CSV), covering attributes such as the restaurant, item name, calorie count, fat content, carbohydrate content, protein content, and several vitamins and minerals. Each row corresponds to one menu item at one restaurant, so the number of rows equals the total number of items across all restaurants in the dataset.
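The dimensions can be checked directly rather than estimated. The following is a minimal sketch that assumes a small stand-in DataFrame (three rows copied from the preview above) in place of the full CSV; the names `sample`, `n_rows`, `n_cols`, and `items_per_restaurant` are illustrative only.

```python
import pandas as pd

# Stand-in for the real data: three rows copied from the preview.
# With the actual file, use pd.read_csv("data/fastfood_calories.csv") instead.
sample = pd.DataFrame(
    {
        "restaurant": ["Mcdonalds", "Mcdonalds", "Mcdonalds"],
        "item": [
            "Artisan Grilled Chicken Sandwich",
            "Single Bacon Smokehouse Burger",
            "Double Bacon Smokehouse Burger",
        ],
        "calories": [380, 840, 1130],
    }
)

# .shape returns (number of rows, number of columns)
n_rows, n_cols = sample.shape

# Number of menu items recorded for each restaurant
items_per_restaurant = sample.groupby("restaurant")["item"].count()

print(f"{n_rows} rows x {n_cols} columns")
print(items_per_restaurant)
```

Running the same two statements on the full DataFrame would give the exact row and column counts for the proposal.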
Relevance: Fast food consumption is a significant aspect of modern dietary habits, making this dataset relevant for understanding nutritional patterns and trends.
Why this Dataset:
The dataset lets us visualize how calorie counts are distributed within different fast food item categories, revealing each category's central tendency and variability. Comparing half violin plots across categories makes differences in calorie distributions quick to spot, which supports informed dietary decisions. The same kind of visualization simplifies comparisons between fast-food establishments or menu items and helps consumers make healthier choices. Researchers and policymakers can also use fast food calorie data in visualization initiatives to promote healthy eating habits and raise public awareness of nutritional value.
Data Mining Opportunities: The dataset provides ample opportunities for data mining tasks such as exploratory data analysis.
Questions
Q1) How many calories are consumed on average per visit to each restaurant or outlet?
Q2) How do different food item categories vary in protein-fat ratio, and do they meet the standards of the health metric?
Analysis plan
Question 1
Q1: We will use the function groupby.describe( ), which reports the mean, median, min, and max, to construct a table of summary statistics for the calories of the items at each restaurant. The mode is not part of the describe( ) output, so we will compute it separately. Summing the calories of all items at a restaurant and dividing by the number of items gives the average calories per item, which we treat as the average calories a person consumes per visit. We will use a half violin plot to showcase this result.
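The summary table described above could be built along the following lines. This is a sketch, not the final analysis: the McDonald's calorie values come from the preview, while the second restaurant ("Example Grill") and its values are made up purely to show grouping across more than one outlet.

```python
import pandas as pd

# Toy data: Mcdonalds values are from the preview; "Example Grill" is hypothetical.
toy = pd.DataFrame(
    {
        "restaurant": ["Mcdonalds"] * 5 + ["Example Grill"] * 3,
        "calories": [380, 840, 1130, 750, 920, 300, 300, 600],
    }
)

calories_by_restaurant = toy.groupby("restaurant")["calories"]

# Mean, median, min, and max per restaurant
summary = calories_by_restaurant.agg(["mean", "median", "min", "max"])

# describe() does not report the mode, so compute it separately.
# A group can have several modes; Series.mode() returns them sorted,
# and this sketch keeps the smallest one.
summary["mode"] = calories_by_restaurant.agg(lambda s: s.mode().iloc[0])

print(summary)
```

When every value in a group is distinct, as with the five McDonald's rows here, every value is a mode, so the reported mode is simply the smallest value. That is worth keeping in mind when interpreting the column.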
Variables used:
All the variables except restaurant and item are numerical.
restaurant
: Name of fast food joint or outlet (categorical)
item
: Food item being examined
total_fat
: The total amount of fat in the food item
calories
: The number of calories in the food item
cholesterol
: The amount of cholesterol in the food item
Output of the analysis:
Summary table
:
Consists of the following:
Mean
Median
Max
Min
Mode
Half violin plot
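One possible way to draw a half violin plot with matplotlib is to draw full violins and then clip each body to one side of its center line. The sketch below assumes matplotlib is available and uses toy data (McDonald's values from the preview plus a hypothetical "Example Grill"); it illustrates the technique and is not the project's final plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy data: Mcdonalds values are from the preview; "Example Grill" is hypothetical.
groups = {
    "Mcdonalds": [380, 840, 1130, 750, 920],
    "Example Grill": [300, 450, 600, 520],
}
positions = list(range(1, len(groups) + 1))

fig, ax = plt.subplots()
parts = ax.violinplot(list(groups.values()), positions=positions, showmedians=True)

# Keep only the right half of each violin by clipping the x coordinates
# of its outline to the violin's center position.
for body, center in zip(parts["bodies"], positions):
    vertices = body.get_paths()[0].vertices
    vertices[:, 0] = np.clip(vertices[:, 0], center, np.inf)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("calories")
```

Calling `fig.savefig(...)` afterwards writes the figure to a file. The freed left half of each violin is often used for a box plot or the raw points.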
Question 2
Q2: For question 2, the relevant columns will be protein, total_fat, and item. To ensure that proteins and fats are compared on the same scale, we shall convert them from grams to calories. Using the protein-fat ratio, we can create a health metric that allows us to determine whether a food item is healthy or harmful. We would like to visualize this using a half violin plot for a better understanding of the metric. A new table consisting only of the different food items from the dataset 'fastfood_calories.csv' is created. The table will be named ‘categorized_items’.
We will then perform a left-join operation on this table with the original DataFrame using the pd.merge() function, specifying the join type with the argument how = “left”.
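The left join could look like the sketch below. The item names come from the preview, but the category labels in `categorized_items` are hypothetical placeholders, and one item is deliberately left uncategorized to show what a left join does with unmatched rows.

```python
import pandas as pd

# A few rows standing in for the original DataFrame
df_sample = pd.DataFrame(
    {
        "restaurant": ["Mcdonalds", "Mcdonalds", "Mcdonalds"],
        "item": [
            "Artisan Grilled Chicken Sandwich",
            "Single Bacon Smokehouse Burger",
            "Double Bacon Smokehouse Burger",
        ],
        "calories": [380, 840, 1130],
    }
)

# Hypothetical category labels; the third item is intentionally missing
categorized_items = pd.DataFrame(
    {
        "item": [
            "Artisan Grilled Chicken Sandwich",
            "Single Bacon Smokehouse Burger",
        ],
        "Item_cat": ["Sandwich", "Burger"],
    }
)

# how="left" keeps every row of df_sample; items without a match get NaN in Item_cat
merged = pd.merge(df_sample, categorized_items, on="item", how="left")

print(merged)
```

Counting the NaN values in `Item_cat` after the merge is a quick way to find items that still need a category.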
Variables used:
All the variables except restaurant and item are numerical.
restaurant
: Name of fast food joint or outlet
item
: Food item being examined
total_fat
: The total amount of fat in the food item
calories
: The number of calories in the food item
cholesterol
: The amount of cholesterol in the food item
protein
: The amount of protein in the food item
Output of the analysis:
New columns:
Item_cat
: Defining the category of the particular food item
Protein_fat_ratio
: Ratio of the amount of protein to the amount of fat present in the food item.
Health_metric
: Standards set to check whether the food item considered has a well-balanced nutritional value.
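As a rough illustration of how the new columns might be derived, the sketch below converts protein and fat to calories using the commonly cited factors of 4 kcal per gram of protein and 9 kcal per gram of fat, then takes their ratio. It assumes the protein and total_fat columns are in grams. The cutoff `RATIO_CUTOFF` and the labels "balanced" and "fat-heavy" are hypothetical placeholders, since the proposal does not yet fix the health standard.

```python
import numpy as np
import pandas as pd

PROTEIN_KCAL_PER_G = 4  # commonly cited conversion factor
FAT_KCAL_PER_G = 9      # commonly cited conversion factor
RATIO_CUTOFF = 1.0      # hypothetical threshold, not defined in the proposal

# Protein and total_fat values copied from the first three preview rows
items = pd.DataFrame(
    {
        "item": [
            "Artisan Grilled Chicken Sandwich",
            "Single Bacon Smokehouse Burger",
            "Double Bacon Smokehouse Burger",
        ],
        "protein": [37.0, 46.0, 70.0],
        "total_fat": [7, 45, 67],
    }
)

protein_cal = items["protein"] * PROTEIN_KCAL_PER_G
fat_cal = items["total_fat"] * FAT_KCAL_PER_G

# Ratio of protein calories to fat calories
items["Protein_fat_ratio"] = protein_cal / fat_cal

# Placeholder health label based on the hypothetical cutoff
items["Health_metric"] = np.where(
    items["Protein_fat_ratio"] >= RATIO_CUTOFF, "balanced", "fat-heavy"
)

print(items[["item", "Protein_fat_ratio", "Health_metric"]])
```

Items with zero fat would need separate handling, because the division then yields an infinite or undefined ratio.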
Timeline for Project Proposal
Week of February 5:
Prepare the proposal for peer review. Participate in peer reviews for other groups.
Week of February 12:
Revise the proposal based on feedback from both peers and the instructor.
Make necessary adjustments to improve the proposal’s quality and clarity.
Week of February 19:
Assign duties to team members.
Explore data on your own and start working on preliminary visualization ideas for comparison.
Start formatting the slide deck for the presentation.
Week of February 26:
Complete the plot visualizations.
Complete the write-ups.
Finalize the interpretations.
Begin the presentation.
Week of March 6th:
Work on the presentation draft.
Tidy up the website.
Complete the presentation draft.
Conduct an internal review of the project.
March 10th:
The project is completed and submitted.