This article Data Science Blogthon.
prologue
After working long hours in the office, I suddenly felt a storm raging in my stomach. Hey!i need foodThen hit the road and start looking for nearby restaurants. It could be part of a chain restaurant or an independent restaurant. Searching for nearby restaurants can lead restaurant owners to consider opening multiple locations to grow their business. concept is here. chain restaurant came from Some restaurants are owned by the people who run them.those restaurants are called privately owned restaurant.
This article provides an analysis of American chain stores and independent restaurants. AmericaWe know many things like –
-
Which restaurant has the best chain in America?
-
Which state has the most popular restaurants?
-
What kind of food do those restaurants have?
etc.
So let’s get started.
data
The dataset is collected from this GitHub repository. The complete dataset should merge part1, part2 and part3. Column descriptions are shown in the image below.
data dictionary
Importing Required Libraries
Before doing any data preprocessing and analysis, let’s import the necessary libraries so that we don’t have to face import issues while performing our main tasks.
Now let’s dive into the data preprocessing part.
Data preprocessing
Let’s read three pieces of data from GitHub and concatenate them.
After running the above code, you will see the output below.
So this data contains 705622 rows and 14 columns. An image of the first five rows of data is shown below.
You can see in the image above that the US states are abbreviated. state digit. In general, not everyone knows all the abbreviations of US states. Therefore, we need to convert these abbreviations to their full form. For this, I scrape a table from a website containing all the full forms of these abbreviations.Then merge that table with the original data and remove the abbreviations state digit.
Scrape tables from websites.
After running the above code, you will see the output below.
This table has 51 rows and 4 columns. I won’t show the entire table here.Click here to see the entire table here. Let’s merge this table into the original table.
One more thing, we don’t need all the columns of the original data, so we only get the columns we need from the original data. The final table looks like this.
please see UA_NAME digit. This column contains the names of cities and the states in which they are located, separated by columns. Here, we already have a column containing US states, so we remove the state part. To do this, just run the code below.
chain_data['UA_NAME_MOD'] = chain_data['UA_NAME'].str.split(',', expand=True)[0]
Here is the output:
see the difference between UA_NAME When UA_NAME_MOD column.The state part is removed from UA_NAME_MOD digit.now you can drop UA_NAME This is because we don’t use columns.
Now we are ready for the main task – data analysis.
data analysis
Statistics summary
First, review the statistical summary of the data as usual.
print(chain_data_sort.describe()) print(chain_data_sort.describe(include="O"))
The output of writing the first code is shown below.
From the above results,
-
It is easy to see that the maximum hotel frequency is 24333. Looking at the first five rows of data, we can see that Subway has the highest number of chains. However, this result does not mean that only Subway restaurants have the best chains. For that, we need to look further.
-
Hotel has the lowest frequency of 1, which indicates an independent restaurant.
Now let’s look at the second output shown below.
Well, I got a lot of information. Let’s check them one by one.
- there is 356,847 (356847) A unique restaurant in this dataset.which subway restaurant I have top frequency (24333). in short subway restaurant I have Maximum number of chainsNo need to check any more.
- from cooking column, we can see it restaurant have more popular than any other dish. Restaurant here means all kinds of food.
- most of the restaurants Los Angeles, California.
- newark is the village of Wayne County, New York, United States, 35 miles (56 km) southeast of Rochester, 48 miles (77 km) west of SyracuseThe table above shows that most of the restaurants are in this area.
Where are the most subway stores?
Now you know that the subway chain is the most expensive. But what about the culinary state and county they live in? But before that, we’ll write some functions.
Now let’s introduce these features.
-
multi_count_df Useful for transforming multiple outputs obtained using .value_counts() with different columns in a pandas dataframe.
-
count_df The function is responsible for transforming only one output obtained using .value_counts() A column in a pandas dataframe.
-
multi_donut_chart The function helps plot multiple donut charts in subplots.
- modded_bar_plot The function helps draw a modified version of the bar chart. More on this later.
Now let’s plot the desired chart.
After running the above code, you will see the output below.
Good vibes. Always plot your graphs so that they not only look attractive but are comfortable for our eyes. A few lines of code create these beautiful charts, but there is a lot of customization going on behind the scenes. If you want to know more about these customizations, this paper.
So let’s see what we got on these charts.
-
The first chart shows that most subway restaurants are located in: California, America.
-
The second graph shows that Subway dominates. Los Angelesthe most populous California county (Population: 9,829,544)If you don’t know what a county is, here’s the definition from Wikipedia.
-
Most subway restaurants are located in Los Angeles, but for urban areas, the third chart shows that: New York metropolitan area, Newark, the number of subway stores is the largest. again, long sandy beach When Anaheim, the city of Los Angeles, When New York metropolitan area, Newark.
- The restaurant is the most popular food on the subway, shown in the last donut chart. Here restaurant food means a regular restaurant serving all kinds of food.
This result relies on the entire subway restaurant data. What if you’re only interested in California Subway restaurants?
The output looks like this:
The first and second graphs are California When Los Angeles As seen before, subway outlets are the most numerous. Here he was 3rd and his 4th chart changed significantly. If you select the metropolitan area of Los Angeles, Anaheim When long sandy beach Came here first. And most of the Subway restaurants in this area are regular restaurants.
Which restaurant chain has the most stores after Subway?
That’s the end of the subway story. Which restaurants have the most chains after Subway? For this, we need to get the various restaurants in the dataset and get the restaurants with a frequency greater than 5 (because we are only considering chain restaurants Frequencies less than 5 are not considered chained). Then plot a bar chart.
After running the above code,
Here’s what I’m talking about. This bar chart has no boxes, x-labels, y-labels, only the necessary part of the bar chart is displayed. From this plot, we can see that McDonald’s is the second largest chain after the subway. Now let’s find out about McDonald’s whereabouts.
cols = ['State', 'CNTY_NAME', 'UA_NAME_MOD', 'Cuisine'] dfs = multi_count_df(mcdonalds_chains, cols) titles = ["McDonald's in US States", "McDonald's in Counties", "McDonald's in Urbans", "McDonald's Cuisines"] fig2 = multi_donut_charts(dfs, 2, 2, cols, colors, titles) fig2.show()
The output of the above code is shown below.
The result is the same as what I’ve seen in Subway restaurants.It seems that the most popular restaurants are gathered Los Angeles, California.
What about independent restaurants?
Good enough for a chain restaurant. Now you will see statistics about independent restaurants. An independent restaurant is a restaurant with a degree of 1. First filter these restaurants from the data, then check if they are mostly in Los Angeles. let’s do it.
independent_rest = chain_data[chain_data['Frequency']==1] cols = ['State', 'CNTY_NAME', 'UA_NAME_MOD', 'Cuisine'] dfs = multi_count_df(independent_rest, cols) titles = ["Independent Restaurants in US States", "Independent Restaurants in Counties", "Independent Restaurants in Urbans", "Independent Restaurants Cuisines"] fig = multi_donut_charts(dfs, 2, 2, cols, colors, titles) fig.show()
There are restaurants of all types Los Angeles, CaliforniaLos Angeles is heaven for foodies. Looking at Urbans newark win. All restaurants are regular restaurants serving different types of food.
What restaurants serve America’s most popular cuisine?
So far, nothing is known about the most popular dish among Americans. After surfing a bit on the internet, I found the following: Most Americans prefer Italian food first, Mexican food second, and Chinese food third ( here)Now let’s take a look at the restaurants that serve the popular dishes. First, chain stores stand out.
Phew! It’s been a while and I’m getting different results. I thought I would see the name Subway in the Italian restaurant category. But we got another restaurant. olive garden It has the most Italian restaurants.for Mexican When Chinese cooking, taco bell When panda express Each has the most outlets. So are these restaurant locations primarily in California?
Our guess is not bad.that’s all olive gardenmost outlets in texas Excluding that CaliforniaOtherwise, Taco Bell and Panda Express outlets are mostly CaliforniaNow what about counties and cities? let’s do this
the outlet of olive garden Mainly located in Harris, TexasIt has stores from both Taco Bell and Panda Express. Los Angeles, CaliforniaNow let’s look at Urban.just replace CNTY_NAME When UA_NAME_MOD The above code will display the following output:
Most Olive Garden stores are located in the following areas: Dallas, the third largest city in Texas.fort Worth, the fifth largest city in Texas; When Arlington. Los Angeles, Long Beach, Anaheim It is the country with the most Taco Bell and Panda Express outlets. For those who love Italian food Dallas, Fort Worth, Arlington – three cities in Texas, should be your first choice.
Well, this is all about chain restaurants.how is it independent restaurantOf course, it is also popular. But here’s the problem. I found so many restaurants with the same frequency that I can’t tell which restaurant outlets serve mostly Italian food by that frequency. But we do know which states have the most independent restaurants serving Italian food.
The chart above pretty much tells us. Italian restaurant (private management) is located in New York. On the other side, Mexican restaurant (individually owned) When Chinese restaurant (private management) Mainly located in California.
-
Plotting customized graphs using seaborn and matplotlib – how to create attractive graphs that are pleasing to the eye. When,
-
As a bonus, here are some facts about American chains and independent restaurants.
But the analysis doesn’t end here. Let me know in the comments if you have any. If there is any problem on my part, I am always here to listen to you. You can also do a lot more with your plots – there are so many options for customization. Experiment with the parameters to see which one does what. This is not difficult for you.
If you want more articles like this, visit the AnalyticsVidhya profile.
Media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.