1. Introduction
Everyone loves Lego (unless you have ever stepped on one). Did you know, by the way, that "Lego" is derived from the Danish phrase leg godt, which means "play well"? Unless you speak Danish, probably not.
In this project, we will analyze a fascinating dataset on every single Lego block that has ever been built!
# Nothing to do here
2. Reading Data
A comprehensive database of Lego blocks is provided by Rebrickable. The data is available as CSV files, and the schema is shown below.
Let us start by reading in the colors data to get a sense of the diversity of Lego sets!
# Import modules
import pandas as pd
# Read colors data
colors = pd.read_csv('datasets/colors.csv')
# Print the first few rows
colors.head()
| | id | name | rgb | is_trans |
|---|---|---|---|---|
| 0 | -1 | Unknown | 0033B2 | f |
| 1 | 0 | Black | 05131D | f |
| 2 | 1 | Blue | 0055BF | f |
| 3 | 2 | Green | 237841 | f |
| 4 | 3 | Dark Turquoise | 008F9B | f |
3. Exploring Colors
Now that we have read the colors data, we can start exploring it! Let us start by understanding the number of colors available.
# How many distinct colors are available?
num_colors = colors.shape[0]
num_colors
135
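The row count above includes every row in the table, including the "Unknown" placeholder entry. As a minimal sketch, the snippet below contrasts a plain row count with a count of distinct named colors. It uses only the five rows shown by colors.head() rather than the full file, and it assumes that the placeholder is the row with id -1; the names sample_colors, num_rows, and num_named are illustrative.

```python
import pandas as pd

# Illustrative sample: the five rows shown by colors.head() above,
# not the full colors.csv file.
sample_colors = pd.DataFrame({
    "id": [-1, 0, 1, 2, 3],
    "name": ["Unknown", "Black", "Blue", "Green", "Dark Turquoise"],
    "rgb": ["0033B2", "05131D", "0055BF", "237841", "008F9B"],
    "is_trans": ["f", "f", "f", "f", "f"],
})

# Row count: every row is counted, including the "Unknown" placeholder.
num_rows = sample_colors.shape[0]

# Distinct named colors, skipping the placeholder row (assumed to be id == -1).
num_named = sample_colors.loc[sample_colors["id"] != -1, "name"].nunique()

print(num_rows, num_named)  # prints: 5 4
```

Whether the placeholder should count as a color depends on the question being asked, so both numbers can be worth reporting.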
4. Transparent Colors in Lego Sets
The colors data has a column named is_trans that indicates whether a color is transparent or not. It would be interesting to explore the distribution of transparent vs. non-transparent colors.
# colors_summary: Distribution of colors based on transparency
colors_summary = colors.groupby('is_trans').count()
colors_summary
| is_trans | id | name | rgb |
|---|---|---|---|
| f | 107 | 107 | 107 |
| t | 28 | 28 | 28 |
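The same distribution can also be expressed as shares of the total. The sketch below rebuilds an is_trans column from the counts reported above (107 non-transparent, 28 transparent) instead of reading the CSV file, and uses value_counts to get both the counts and the proportions. The names is_trans, counts, and shares are illustrative.

```python
import pandas as pd

# Rebuild the is_trans column from the counts reported in the summary table:
# 107 non-transparent ('f') and 28 transparent ('t') colors.
is_trans = pd.Series(["f"] * 107 + ["t"] * 28, name="is_trans")

# Absolute counts per category.
counts = is_trans.value_counts()

# Share of each category (the fractions sum to 1).
shares = is_trans.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

With these counts, roughly one color in five is transparent (28 out of 135).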
5. Explore Lego Sets
Another interesting dataset available in this database is the sets data. It contains a comprehensive list of sets over the years and the number of parts that each of these sets contained.
Let us use this data to explore how the average number of parts in Lego sets has varied over the years.
%matplotlib inline
# Read sets data as `sets`
sets = pd.read_csv('datasets/sets.csv')
print(sets.head())
# Create a summary of average number of parts by year: `parts_by_year`
# .mean() averages num_parts per year; .count() would only count the sets released each year
parts_by_year = sets[['year', 'num_parts']].groupby('year', as_index=False).mean()
# Plot trends in average number of parts by year
parts_by_year.plot(x='year', y='num_parts')
parts_by_year.head()
| | set_num | name | year | theme_id | num_parts |
|---|---|---|---|---|---|
| 0 | 00-1 | Weetabix Castle | 1970 | 414 | 471 |
| 1 | 0011-2 | Town Mini-Figures | 1978 | 84 | 12 |
| 2 | 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | 199 | 2 |
| 3 | 0012-1 | Space Mini-Figures | 1979 | 143 | 12 |
| 4 | 0013-1 | Space Mini-Figures | 1979 | 143 | 12 |
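The num_parts column in this table holds the number of sets recorded per year (the result of .count()), not the average number of parts.

| | year | num_parts |
|---|---|---|
| 0 | 1950 | 7 |
| 1 | 1953 | 4 |
| 2 | 1954 | 14 |
| 3 | 1955 | 28 |
| 4 | 1956 | 12 |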
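A per-year count and a per-year average are easy to mix up because both come out of the same groupby call. As a hedged sketch, the snippet below computes both side by side with named aggregation, using only the five rows printed by sets.head() rather than the full file. The names sample_sets, yearly, num_sets, and avg_parts are illustrative.

```python
import pandas as pd

# Illustrative sample: the five rows printed by sets.head(), not the full file.
sample_sets = pd.DataFrame({
    "set_num": ["00-1", "0011-2", "0011-3", "0012-1", "0013-1"],
    "year": [1970, 1978, 1987, 1979, 1979],
    "theme_id": [414, 84, 199, 143, 143],
    "num_parts": [471, 12, 2, 12, 12],
})

# Named aggregation: one clearly labelled column per statistic,
# so the count and the average cannot be confused.
yearly = (
    sample_sets.groupby("year")
    .agg(num_sets=("num_parts", "count"), avg_parts=("num_parts", "mean"))
    .reset_index()
)

print(yearly)
```

In this sample, 1979 has two sets with 12 parts each, so num_sets is 2 while avg_parts is 12.0.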
# themes_by_year: Number of themes shipped by year
# nunique() counts distinct themes per year; count() would count sets instead
themes_by_year = sets.groupby('year')['theme_id'].nunique().reset_index()
themes_by_year.head()
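The theme_id column in this table holds the number of sets recorded per year (the result of .count()), which is why it matches the previous table row for row.

| | year | theme_id |
|---|---|---|
| 0 | 1950 | 7 |
| 1 | 1953 | 4 |
| 2 | 1954 | 14 |
| 3 | 1955 | 28 |
| 4 | 1956 | 12 |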
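Counting theme_id entries per year gives the number of sets, since every set carries a theme. To count themes, each distinct theme_id should be counted once per year. The sketch below shows the difference on the year and theme_id values of the five rows printed by sets.head(); the names sample_sets, sets_per_year, and themes_per_year are illustrative.

```python
import pandas as pd

# Illustrative sample: year and theme_id from the five rows of sets.head().
sample_sets = pd.DataFrame({
    "year": [1970, 1978, 1987, 1979, 1979],
    "theme_id": [414, 84, 199, 143, 143],
})

# count(): number of non-null theme_id entries, i.e. the number of sets.
sets_per_year = sample_sets.groupby("year")["theme_id"].count()

# nunique(): number of distinct theme_id values, i.e. the number of themes.
themes_per_year = sample_sets.groupby("year")["theme_id"].nunique()

print(sets_per_year.loc[1979], themes_per_year.loc[1979])  # prints: 2 1
```

Both 1979 sets in the sample belong to theme 143, so that year has two sets but only one theme.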
7. Wrapping It All Up!
Lego blocks offer unlimited fun for all ages. We explored some interesting trends around colors, parts, and themes.
# Nothing to do here