Algo Aces Project Proposal

Proposal

import numpy as np
import pandas as pd
import requests

Dataset

df = pd.read_csv("data/fastfood_calories.csv")
df.head()

	Unnamed: 0	restaurant	item	calories	cal_fat	total_fat	sat_fat	trans_fat	cholesterol	sodium	total_carb	fiber	sugar	protein	vit_a	vit_c	calcium	salad
0	1	Mcdonalds	Artisan Grilled Chicken Sandwich	380	60	7	2.0	0.0	95	1110	44	3.0	11	37.0	4.0	20.0	20.0	Other
1	2	Mcdonalds	Single Bacon Smokehouse Burger	840	410	45	17.0	1.5	130	1580	62	2.0	18	46.0	6.0	20.0	20.0	Other
2	3	Mcdonalds	Double Bacon Smokehouse Burger	1130	600	67	27.0	3.0	220	1920	63	3.0	18	70.0	10.0	20.0	50.0	Other
3	4	Mcdonalds	Grilled Bacon Smokehouse Chicken Sandwich	750	280	31	10.0	0.5	155	1940	62	2.0	18	55.0	6.0	25.0	20.0	Other
4	5	Mcdonalds	Crispy Bacon Smokehouse Chicken Sandwich	920	410	45	12.0	0.5	120	1980	81	4.0	18	46.0	6.0	20.0	20.0	Other

Dataset Description:

The “Fast Food Calories” dataset comprises nutritional information for various food items offered by eight distinct fast-food outlets. Each outlet’s menu is detailed, encompassing items such as burgers, fries, beverages, and salads. The dataset includes essential nutritional metrics like calories, fat content, protein, and carbohydrates, allowing for in-depth analysis and comparison of the health implications of different menu choices.

Provenance: The dataset’s sources include: research institutions, open data repositories (tidytuesday) and compiled data sets from various sources.

Dimensions: The dataset likely contains multiple columns representing different attributes of fast food items, such as name, restaurant, category, calorie count, fat content, protein content, carbohydrate content, etc. The number of rows corresponds to the number of individual fast food items for each restaurant in the dataset.

Relevance: Fast food consumption is a significant aspect of modern dietary habits, making this dataset relevant for understanding nutritional patterns and trends.

Why this Dataset:

It makes it possible to see how the allocation of calorie counts within various fast food item categories is visualized, providing information about the items’ primary tendencies and variability. Making educated dietary decisions is further facilitated by the ability to quickly identify variations in calorie distributions by comparing half violin plots across various fast-food categories. Comparing different fast-food establishments or menu items can be rendered simpler with this sort of visualization portrayal of calorie data, which helps consumers make healthier decisions. Researchers and policymakers can contribute to efforts to promote healthy eating habits in society and increase public knowledge of nutritional value by utilizing fast food calorie data in data visualization initiatives.

Data Mining Opportunities: The dataset provides ample opportunities for data mining tasks such as exploratory data analysis.

Questions

Q1) How many calories areconsumed on average per visit to each restaurant or outlet?

Q2) How do different food item categories vary in protien-fat ratio and do they meet up to the standards of the health metric?

Analysis plan

Question 1

Q1: We will use the function groupby.describe( ) which includes mean, mode, median, min, and max, to construct a table with summary statistics for each restaurant item in terms of calories.We are going to sum up all the calories for all the items in a restaurant and divide it with no of items gives avg calories gained by a person so we are going to use a half violin plot to showcase this result.

Variables used:

All the variables with the exception of the restaurant variable are numerical.

restaurant: Name of fast food joint or outlet (categorical)

item: Food item being examined

total_fat: The total amount of fat in the food item

calories: The number of calories in the food item

cholesterol: The amount of cholesterol in the food item

Output of the analysis:

Summary table:

Consists of the following:

Mean

Median

Max

Min

Mode

Half violin plot

Question 2

Q2: For question 2, the relevant columns will be protein, and item. To ensure when proteins and fats, we shall convert them from to calories. Using the protein-fat ratio, we can create a health metric that allows us to if a food item is healthy or harmful. We would like to visualize this using a half violin plot for better understanding of the metrics. A new table consisting only of the different food items from the dataset 'fastfood_calories.csv' is created. The table will be named ‘categorized_items’. We will then perform a left-join operation on this table with the original DataFrame using the pd.merge() function by specifying it as a left-join with the function how = “left” .

Variables used:

All the variables with the exception of the restaurant variable are numerical.

restaurant: Name of fast food joint or outlet

item: Food item being examined

total_fat: The total amount of fat in the food item

calories: The number of calories in the food item

cholesterol: The amount of cholesterol in the food item

protein:The amount of protein

Output of the analysis:

New columns:

Item_cat: Defining the category of the particular food item

Protein_fat_ratio: ratio of the amount of proteins to fats present in the food item .

Health_metric: Standards set to check whether the food item considered has a well-balanced nutritional value.

Timeline for Project Proposal

Week of 05 Feb:

Prepare the proposal for peer review. Participate in peer reviews for other groups.

Week of 12 Feb:

Revise the proposal based on feedback from both peers and the instructor.

Make necessary adjustments to improve the proposal’s quality and clarity.:

Week of February 19:

Assign duties to team members.

Explore data on your own and start working on preliminary visualization ideas for comparison.

Start formatting the slide deck for the presentation.

Week of February 26 -

Complete the plot visualizations.

Completing the write-ups

Finalizing the interpretations

Beginning the presentation

Week of March 6th:

The presentation draft

Tidying up the website

⁠Completing the presentation draft ⁠

Reviewing the internal project

March 10th:

Project is completed and the project is submitted.