Algo Aces Project Proposal

Proposal

import numpy as np
import pandas as pd
import requests

Dataset

df = pd.read_csv("data/fastfood_calories.csv")
df.head()
|   | Unnamed: 0 | restaurant | item | calories | cal_fat | total_fat | sat_fat | trans_fat | cholesterol | sodium | total_carb | fiber | sugar | protein | vit_a | vit_c | calcium | salad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Mcdonalds | Artisan Grilled Chicken Sandwich | 380 | 60 | 7 | 2.0 | 0.0 | 95 | 1110 | 44 | 3.0 | 11 | 37.0 | 4.0 | 20.0 | 20.0 | Other |
| 1 | 2 | Mcdonalds | Single Bacon Smokehouse Burger | 840 | 410 | 45 | 17.0 | 1.5 | 130 | 1580 | 62 | 2.0 | 18 | 46.0 | 6.0 | 20.0 | 20.0 | Other |
| 2 | 3 | Mcdonalds | Double Bacon Smokehouse Burger | 1130 | 600 | 67 | 27.0 | 3.0 | 220 | 1920 | 63 | 3.0 | 18 | 70.0 | 10.0 | 20.0 | 50.0 | Other |
| 3 | 4 | Mcdonalds | Grilled Bacon Smokehouse Chicken Sandwich | 750 | 280 | 31 | 10.0 | 0.5 | 155 | 1940 | 62 | 2.0 | 18 | 55.0 | 6.0 | 25.0 | 20.0 | Other |
| 4 | 5 | Mcdonalds | Crispy Bacon Smokehouse Chicken Sandwich | 920 | 410 | 45 | 12.0 | 0.5 | 120 | 1980 | 81 | 4.0 | 18 | 46.0 | 6.0 | 20.0 | 20.0 | Other |

Dataset Description:

The “Fast Food Calories” dataset comprises nutritional information for various food items offered by eight distinct fast-food outlets. Each outlet’s menu is detailed, encompassing items such as burgers, fries, beverages, and salads. The dataset includes essential nutritional metrics like calories, fat content, protein, and carbohydrates, allowing for in-depth analysis and comparison of the health implications of different menu choices.

Provenance: The dataset's sources include research institutions, open data repositories (TidyTuesday), and data sets compiled from various other sources.

Dimensions: Each row corresponds to one menu item at one restaurant, so the number of rows equals the total number of items across all restaurants. As the df.head() output above shows, the columns are an unnamed row-number column (Unnamed: 0), the restaurant, the item name, calories, calories from fat (cal_fat), total, saturated, and trans fat, cholesterol, sodium, total carbohydrates, fiber, sugar, protein, vitamin A, vitamin C, calcium, and a salad column.

Relevance: Fast food consumption is a significant aspect of modern dietary habits, making this dataset relevant for understanding nutritional patterns and trends.

Why this Dataset:

The dataset lets us visualize how calorie counts are distributed within different categories of fast-food items, revealing both their central tendencies and their variability. Comparing half violin plots across categories makes differences in calorie distributions easy to spot, which supports informed dietary decisions. Visualizing calorie data in this way also simplifies comparisons between fast-food establishments or menu items, helping consumers make healthier choices. More broadly, researchers and policymakers can use fast-food calorie visualizations to raise public awareness of nutritional value and to promote healthier eating habits.

Data Mining Opportunities: The dataset provides ample opportunities for data mining tasks such as exploratory data analysis.
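A natural first exploratory step is to confirm the dataset's dimensions. The sketch below is a minimal illustration, assuming pandas; it uses a few columns of the five rows printed by df.head() as a stand-in for the full CSV, and the helper name dataset_dimensions is hypothetical. It drops the Unnamed: 0 row-number column (passing index_col=0 to pd.read_csv would be an alternative) before reporting the shape and the number of menu items per restaurant.

```python
import pandas as pd

# Stand-in for pd.read_csv("data/fastfood_calories.csv"): a few columns of the
# five rows shown by df.head(); the real file has more rows and columns.
df = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 4, 5],
    "restaurant": ["Mcdonalds"] * 5,
    "item": [
        "Artisan Grilled Chicken Sandwich",
        "Single Bacon Smokehouse Burger",
        "Double Bacon Smokehouse Burger",
        "Grilled Bacon Smokehouse Chicken Sandwich",
        "Crispy Bacon Smokehouse Chicken Sandwich",
    ],
    "calories": [380, 840, 1130, 750, 920],
})


def dataset_dimensions(data: pd.DataFrame):
    """Hypothetical helper: return (rows, columns) and menu items per restaurant."""
    # "Unnamed: 0" is only a row number, so drop it before counting columns.
    clean = data.drop(columns=["Unnamed: 0"], errors="ignore")
    items_per_restaurant = clean["restaurant"].value_counts()
    return clean.shape, items_per_restaurant


shape, counts = dataset_dimensions(df)
print(shape)  # (5, 3) for this stand-in
print(counts)
```

With the full file loaded, the same call would report the real number of items and how they are spread across the restaurants.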

Questions

Q1) How many calories are consumed, on average, per visit to each restaurant or outlet?

Q2) How do different food item categories vary in protein-to-fat ratio, and do they meet the standards of our health metric?

Analysis plan

Question 1

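Q1: We will use groupby() followed by describe() to build a table of summary statistics for the calories of each restaurant's items. describe() reports the count, mean, standard deviation, minimum, quartiles (the 50% quartile is the median), and maximum, but not the mode, so we will compute the mode separately with Series.mode(). The mean for each restaurant, that is, the sum of the calories of all its items divided by the number of items, serves as our estimate of the average calories a person consumes per visit (assuming one item per visit). We will then use a half violin plot to show the distribution of item calories for each restaurant.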
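A minimal sketch of the Q1 summary table, assuming pandas and using the calories of the five df.head() rows as a stand-in for the full file. Because describe() has no mode, the mode is added as a separate column; calorie_summary is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for pd.read_csv("data/fastfood_calories.csv"): the restaurant and
# calories of the five rows shown by df.head().
df = pd.DataFrame({
    "restaurant": ["Mcdonalds"] * 5,
    "calories": [380, 840, 1130, 750, 920],
})


def calorie_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Per-restaurant calorie statistics: count, mean, median, min, max and mode(s)."""
    grouped = data.groupby("restaurant")["calories"]
    # describe() returns count, mean, std, min, 25%, 50%, 75% and max (no mode).
    # The mean is the sum of item calories divided by the number of items.
    summary = grouped.describe()[["count", "mean", "50%", "min", "max"]]
    summary = summary.rename(columns={"50%": "median"})
    # Series.mode() can return several values (all of them if none repeats),
    # so each restaurant's modes are stored as a list.
    summary["mode"] = pd.Series({name: s.mode().tolist() for name, s in grouped})
    return summary


print(calorie_summary(df))
```

For this stand-in the mean is 804 calories and the median is 840; since no calorie value repeats, every value is reported as a mode, which is worth keeping in mind when interpreting the mode on the full data.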

Variables used:

All variables except restaurant and item are numerical.

restaurant: Name of fast food joint or outlet (categorical)

item: Food item being examined

total_fat: The total amount of fat in the food item

calories: The number of calories in the food item

cholesterol: The amount of cholesterol in the food item

Output of the analysis:

Summary table:

Consists of the following:

Mean

Median

Max

Min

Mode

Half violin plot
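A half violin plot is not a dedicated matplotlib chart type, so the sketch below draws ordinary violins with Axes.violinplot and clips each body at its centre line so that only the right half remains. It assumes matplotlib and pandas, uses the calories of the df.head() rows as stand-in data, and the helper half_violin is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: calories of the five rows shown by df.head().
df = pd.DataFrame({
    "restaurant": ["Mcdonalds"] * 5,
    "calories": [380, 840, 1130, 750, 920],
})


def half_violin(data, group_col="restaurant", value_col="calories"):
    """Hypothetical helper: one right-half violin per group; returns (fig, ax, bodies)."""
    groups = sorted(data[group_col].unique())
    # Each group needs at least two distinct values for the density estimate.
    samples = [data.loc[data[group_col] == g, value_col].to_numpy() for g in groups]
    positions = np.arange(1, len(groups) + 1)

    fig, ax = plt.subplots(figsize=(max(4, 1.5 * len(groups)), 4))
    parts = ax.violinplot(samples, positions=positions, showextrema=False)
    for body, pos in zip(parts["bodies"], positions):
        # Clip the polygon's x-coordinates at the centre line: keep the right half.
        verts = body.get_paths()[0].vertices
        verts[:, 0] = np.clip(verts[:, 0], pos, np.inf)

    ax.set_xticks(positions)
    ax.set_xticklabels(groups)
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    return fig, ax, parts["bodies"]


fig, ax, bodies = half_violin(df)
```

With the full dataset, fig.savefig("half_violin.png") would write the figure to disk; clipping the existing violin bodies keeps the plot consistent with matplotlib's own density estimate rather than re-implementing it.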

Question 2

Q2: For question 2, the relevant columns are item, protein, and total_fat. So that proteins and fats can be compared on the same scale, we will convert both from grams to calories. Using the resulting protein-to-fat ratio, we will define a health metric that indicates whether a food item is healthy or harmful. We would like to visualize this metric with a half violin plot to make it easier to interpret. We will also create a new table, named ‘categorized_items’, that lists each distinct food item in 'fastfood_calories.csv' together with its category. We will then left-join this table onto the original DataFrame using pd.merge() with the argument how="left".
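The categorization and left join could look roughly like the sketch below. It assumes a simple keyword rule for assigning categories; CATEGORY_KEYWORDS, categorize, and the category names are hypothetical placeholders, since the real category scheme is still to be decided. The five rows from df.head() stand in for the full dataset.

```python
import pandas as pd

# Stand-in for the original DataFrame: the five rows shown by df.head().
df = pd.DataFrame({
    "restaurant": ["Mcdonalds"] * 5,
    "item": [
        "Artisan Grilled Chicken Sandwich",
        "Single Bacon Smokehouse Burger",
        "Double Bacon Smokehouse Burger",
        "Grilled Bacon Smokehouse Chicken Sandwich",
        "Crispy Bacon Smokehouse Chicken Sandwich",
    ],
    "total_fat": [7, 45, 67, 31, 45],
    "protein": [37.0, 46.0, 70.0, 55.0, 46.0],
})

# Hypothetical keyword rules, checked in order; unmatched items become "Other".
CATEGORY_KEYWORDS = {"burger": "Burger", "chicken": "Chicken"}


def categorize(item: str) -> str:
    """Hypothetical rule: return the category of the first keyword found in the name."""
    name = item.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in name:
            return category
    return "Other"


# One row per distinct item name, with its category.
categorized_items = df[["item"]].drop_duplicates().reset_index(drop=True)
categorized_items["Item_cat"] = categorized_items["item"].map(categorize)

# Left join: every row of the original DataFrame is kept, in its original order.
merged = pd.merge(df, categorized_items, on="item", how="left")
print(merged[["item", "Item_cat"]])
```

Because categorized_items has exactly one row per item name, the left join adds the Item_cat column without duplicating any rows of the original DataFrame.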

Variables used:

All variables except restaurant and item are numerical.

restaurant: Name of fast food joint or outlet

item: Food item being examined

total_fat: The total amount of fat in the food item

calories: The number of calories in the food item

cholesterol: The amount of cholesterol in the food item

protein: The amount of protein in the food item

Output of the analysis:

New columns:

Item_cat: The category of the food item

Protein_fat_ratio: Ratio of calories from protein to calories from fat in the food item

Health_metric: A label indicating whether the food item meets the standard we set for a well-balanced nutritional value
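One possible way to compute the two new columns is sketched below, assuming the grams-to-calories conversion uses the general Atwater factors (about 4 kcal per gram of protein and 9 kcal per gram of fat). The threshold of 1.0 (protein supplies at least as many calories as fat) and the labels "Balanced", "Fat-heavy", and "Undefined" are hypothetical placeholders for the standard the team still has to set; the dataset's cal_fat column could replace total_fat × 9 if preferred.

```python
import numpy as np
import pandas as pd

# Stand-in data: item, total_fat (g) and protein (g) from the df.head() rows.
df = pd.DataFrame({
    "item": [
        "Artisan Grilled Chicken Sandwich",
        "Single Bacon Smokehouse Burger",
        "Double Bacon Smokehouse Burger",
        "Grilled Bacon Smokehouse Chicken Sandwich",
        "Crispy Bacon Smokehouse Chicken Sandwich",
    ],
    "total_fat": [7, 45, 67, 31, 45],
    "protein": [37.0, 46.0, 70.0, 55.0, 46.0],
})

KCAL_PER_G_PROTEIN = 4  # general Atwater factor for protein
KCAL_PER_G_FAT = 9      # general Atwater factor for fat
RATIO_THRESHOLD = 1.0   # hypothetical cut-off, to be replaced by the team's standard


def add_health_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Add Protein_fat_ratio (protein kcal / fat kcal) and a hypothetical Health_metric."""
    out = data.copy()
    protein_kcal = out["protein"] * KCAL_PER_G_PROTEIN
    fat_kcal = out["total_fat"] * KCAL_PER_G_FAT
    # Items with no fat get NaN instead of an infinite ratio.
    ratio = protein_kcal / fat_kcal.where(fat_kcal > 0)
    out["Protein_fat_ratio"] = ratio
    out["Health_metric"] = np.select(
        [ratio.isna(), ratio >= RATIO_THRESHOLD],
        ["Undefined", "Balanced"],
        default="Fat-heavy",
    )
    return out


result = add_health_columns(df)
print(result[["item", "Protein_fat_ratio", "Health_metric"]])
```

For the grilled chicken sandwich, for example, 37 g of protein gives 148 kcal and 7 g of fat gives 63 kcal, a ratio of about 2.35; the three bacon items in the stand-in fall below the placeholder threshold.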

Timeline for Project Proposal

Week of February 5:

Prepare the proposal for peer review. Participate in peer reviews for other groups.

Week of February 12:

Revise the proposal based on feedback from both peers and the instructor.

Make necessary adjustments to improve the proposal's quality and clarity.

Week of February 19:

Assign duties to team members.

Explore the data individually and start working on preliminary visualization ideas for comparison.

Start formatting the slide deck for the presentation.

Week of February 26:

Complete the plot visualizations.

Complete the write-ups.

Finalize the interpretations.

Begin the presentation.

Week of March 6:

Complete the presentation draft.

Tidy up the website.

Review the project internally.

March 10:

Complete and submit the project.