Mapping Systems
Toggle menu

Tutorials

Vector Data Apis

01. Describing and visualizing data

Getting Started

  • Download MapPLUTO data from NYC DCP.
  • Unzip the file and place contents in the Data directory at the root of this repo
  • Make sure you have installed all requisite libraries by running pip install -r requirements.txt with your virtual environment activated. For guidance on setting up or activating your virtual environment, refer to the notes in 00_Getting_started.

Goals

  • Download data from NYC DCP’s open data portal
  • Load data from file
  • Explore spatial and non-spatial elements of the dataset
  • Visualize spatial and non-spatial elements of the dataset

The following libraries will allow us to import and explore our data. Note that in some cases we import the entire library and even use a short-hand reference (e.g. import geopandas as gpd), while in other cases we only import submodules (e.g. from lonboard._layer import PolygonLayer). This is largely a matter of preference, but importing submodules can help reduce memory usage and improve performance, especially when working with large datasets or in an app you’ve developed.

# the bare minimum
import matplotlib.pyplot as plt  # for plotting
import geopandas as gpd  # for geospatial data handling
from matplotlib.lines import Line2D

# more advanced
from lonboard._map import Map
from lonboard.layer import PolygonLayer  # for mapping in 3D
from lonboard.colormap import (
    apply_categorical_cmap,
    apply_continuous_cmap,
)  # for assigning colors
from palettable.colorbrewer.sequential import PuRd_9  # for color palettes
from matplotlib.colors import LogNorm  # for logarithmic normalization
import pygwalker as pyg  # for creating interactive data visualizations

Load PLUTO data

MapPLUTO is New York City’s tax lot database, which contains detailed information about the city’s land parcels, including their size, zoning, and ownership. The dataset is updated quarterly and is a foundational resource for understanding the city’s built environment.

Here, we create a variable (pluto) and use geopandas to read the file into memory. We use a “relative” path to reference the file.

pluto = gpd.read_file("../Data/nyc_mappluto_24v1_1_shp/MapPLUTO.shp")

Basic exploration

A great way to familiarize yourself with a dataset is to look at the first few rows. Scroll to the right- you’ll see an ellipsis (...) indicating that there are more columns (there are 95!). The rightmost column is the geometry, which contains the spatial information for each row. That’s what we’ll use to map later, but all of the other information contains data that can help us gain a deeper understanding of the dataset.

pluto.head()
vertical-align: top; } .dataframe thead th { text-align: right; }
Borough Block Lot CD BCT2020 BCTCB2020 CT2010 CB2010 SchoolDist Council ... FIRM07_FLA PFIRM15_FL Version DCPEdited Latitude Longitude Notes Shape_Leng Shape_Area geometry
0 MN 1 10 101 1000500 10005000003 5 1000 02 1 ... 1 1 24v1.1 NaN 40.688766 -74.018682 None 0.0 7.478663e+06 POLYGON ((980898.728 191409.779, 980881.798 19...
1 MN 97 33 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707789 -74.002009 None 0.0 2.839154e+03 POLYGON ((983690.664 197185.709, 983700.362 19...
2 MN 97 35 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707728 -74.002117 None 0.0 2.531493e+03 POLYGON ((983660.178 197162.227, 983697.276 19...
3 MN 97 36 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707687 -74.002207 None 0.0 1.825158e+03 POLYGON ((983608.867 197131.146, 983629.531 19...
4 MN 97 43 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707374 -74.002705 None 0.0 1.057095e+03 POLYGON ((983498.787 196968.26, 983479.066 197...

5 rows × 95 columns

First, let’s get a sense of what columns are available in the dataset.

pluto.columns
Index(['Borough', 'Block', 'Lot', 'CD', 'BCT2020', 'BCTCB2020', 'CT2010',
       'CB2010', 'SchoolDist', 'Council', 'ZipCode', 'FireComp', 'PolicePrct',
       'HealthCent', 'HealthArea', 'Sanitboro', 'SanitDistr', 'SanitSub',
       'Address', 'ZoneDist1', 'ZoneDist2', 'ZoneDist3', 'ZoneDist4',
       'Overlay1', 'Overlay2', 'SPDist1', 'SPDist2', 'SPDist3', 'LtdHeight',
       'SplitZone', 'BldgClass', 'LandUse', 'Easements', 'OwnerType',
       'OwnerName', 'LotArea', 'BldgArea', 'ComArea', 'ResArea', 'OfficeArea',
       'RetailArea', 'GarageArea', 'StrgeArea', 'FactryArea', 'OtherArea',
       'AreaSource', 'NumBldgs', 'NumFloors', 'UnitsRes', 'UnitsTotal',
       'LotFront', 'LotDepth', 'BldgFront', 'BldgDepth', 'Ext', 'ProxCode',
       'IrrLotCode', 'LotType', 'BsmtCode', 'AssessLand', 'AssessTot',
       'ExemptTot', 'YearBuilt', 'YearAlter1', 'YearAlter2', 'HistDist',
       'Landmark', 'BuiltFAR', 'ResidFAR', 'CommFAR', 'FacilFAR', 'BoroCode',
       'BBL', 'CondoNo', 'Tract2010', 'XCoord', 'YCoord', 'ZoneMap', 'ZMCode',
       'Sanborn', 'TaxMap', 'EDesigNum', 'APPBBL', 'APPDate', 'PLUTOMapID',
       'FIRM07_FLA', 'PFIRM15_FL', 'Version', 'DCPEdited', 'Latitude',
       'Longitude', 'Notes', 'Shape_Leng', 'Shape_Area', 'geometry'],
      dtype='str')

We can also check the data types of each column- we can see there are a mix of numeric (int32, int64, float64) and string (O = object) data types, as well as some geometry data (geopandas.array.GeometryDtype).

list(pluto.dtypes)
[<StringDtype(na_value=nan)>,
 dtype('int64'),
 dtype('int32'),
 dtype('int32'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('int32'),
 dtype('int64'),
 <StringDtype(na_value=nan)>,
 dtype('int32'),
 dtype('int32'),
 dtype('int32'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('O'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('int32'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 dtype('int64'),
 <StringDtype(na_value=nan)>,
 dtype('int64'),
 dtype('float64'),
 dtype('int64'),
 dtype('int64'),
 dtype('float64'),
 dtype('float64'),
 dtype('float64'),
 dtype('float64'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('float64'),
 dtype('float64'),
 dtype('float64'),
 dtype('int32'),
 dtype('int32'),
 dtype('int32'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('float64'),
 dtype('float64'),
 dtype('float64'),
 dtype('float64'),
 dtype('int64'),
 dtype('float64'),
 dtype('int64'),
 <StringDtype(na_value=nan)>,
 dtype('int64'),
 dtype('int64'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('float64'),
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 <StringDtype(na_value=nan)>,
 dtype('float64'),
 dtype('float64'),
 dtype('O'),
 dtype('float64'),
 dtype('float64'),
 <geopandas.array.GeometryDtype at 0x11fe01e80>]

Exploring a categorical column

A great way to get a sense of a field is to look at the frequency of values in a categorical field. here, we look at the LandUse field, which describes the type of land use for each property in the dataset.

pluto.LandUse.value_counts()
LandUse
01    565933
02    131623
04     55966
11     24904
05     21260
03     12916
08     12052
06      9431
10      9344
07      6033
09      4708
Name: count, dtype: int64

🧐 What do those numbers mean? Let’s look at the data dictionary

# now we can remap the numbers into something more meaningful

land_use_codes = {
    "01": "One & Two Family Buildings",
    "02": "Multi-Family Walk-Up Buildings",
    "03": "Multi-Family Elevator Buildings",
    "04": "Mixed Residential & Commercial Buildings",
    "05": "Commercial & Office Buildings",
    "06": "Industrial & Manufacturing",
    "07": "Transportation & Utility",
    "08": "Public Facilities & Institutions",
    "09": "Open Space & Outdoor Recreation",
    "10": "Parking Facilities",
    "11": "Vacant Land",
}

We can “map” the Land Use codes to more descriptive names. the map() function will replace instances of the “key” with the corresponding “value” in the dictionary.

pluto["LandUse"] = pluto.LandUse.map(land_use_codes)
# now when we perform operations on the LandUse field,
# we can use the more meaningful names.
# let's look at the frequency of values in the LandUse field again
pluto.LandUse.value_counts()
LandUse
One & Two Family Buildings                  565933
Multi-Family Walk-Up Buildings              131623
Mixed Residential & Commercial Buildings     55966
Vacant Land                                  24904
Commercial & Office Buildings                21260
Multi-Family Elevator Buildings              12916
Public Facilities & Institutions             12052
Industrial & Manufacturing                    9431
Parking Facilities                            9344
Transportation & Utility                      6033
Open Space & Outdoor Recreation               4708
Name: count, dtype: int64

⚠️ Caution! So far we have been counting the number of rows, but not saying anything about the area of each land use type. To do so, we can sum across rows of the same type. We can use the groupby() function to group the data by a categorical column type and then sum the values in another column (in this case, the lot area) the area for each type.

Grouping by a categorical column

Here, we are grouping the data by the landuse column and then summing the values in the lotarea column for each group. The result is a new DataFrame that contains the total lot area for each land use type. We can sum or manipulate multiple columns at a time, and can chain together multiple operations as you can see below.

pluto.groupby("LandUse").LotArea.sum().sort_values(ascending=False)
LandUse
One & Two Family Buildings                  1827228783
Open Space & Outdoor Recreation             1628442460
Transportation & Utility                     642314001
Vacant Land                                  553561312
Public Facilities & Institutions             521971866
Multi-Family Walk-Up Buildings               482673749
Multi-Family Elevator Buildings              353406241
Commercial & Office Buildings                274905042
Mixed Residential & Commercial Buildings     265149020
Industrial & Manufacturing                   222481816
Parking Facilities                            83803716
Name: LotArea, dtype: int64
# more complex grouping
# We can also use the `agg()` function to apply multiple aggregation functions to different columns at once.
# For example, we can calculate the total lot area and the average building area for each land use type:
landuse_summary = (
    pluto.groupby("LandUse")
    .agg({"LotArea": "sum", "BldgArea": "mean"})
    .reset_index()
    .rename(
        columns={"LotArea": "Total Lot Area", "BldgArea": "Average Building Area"},
    )
)
landuse_summary
vertical-align: top; } .dataframe thead th { text-align: right; }
LandUse Total Lot Area Average Building Area
0 Commercial & Office Buildings 274905042 38267.802070
1 Industrial & Manufacturing 222481816 20607.787721
2 Mixed Residential & Commercial Buildings 265149020 18029.961834
3 Multi-Family Elevator Buildings 353406241 87934.512155
4 Multi-Family Walk-Up Buildings 482673749 5597.010842
5 One & Two Family Buildings 1827228783 1887.522283
6 Open Space & Outdoor Recreation 1628442460 7637.659728
7 Parking Facilities 83803716 3040.385916
8 Public Facilities & Institutions 521971866 46048.863508
9 Transportation & Utility 642314001 13238.668821
10 Vacant Land 553561312 27.817098

Now let’s plot the total lot area for each land use type.

# there are many ways to visualize this data, and levels to refining the graphic style.
# This is the simplest possible example, using matplotlib to create a bar chart
pluto.groupby("LandUse").LotArea.sum().sort_values(ascending=False).plot.bar()
plt.title("Total lot area by land use type")
Text(0.5, 1.0, 'Total lot area by land use type')

Output

# now the same but for building area - note this is total building area, not average
pluto.groupby("LandUse").BldgArea.sum().sort_values(ascending=False).plot.bar()
plt.title("Total building area by land use type")
Text(0.5, 1.0, 'Total building area by land use type')

Output

Below is a more complicated example where we can plot two variables against each other to compare.

Note how we are manipulating a “figure” (fig) and “axes” (ax) object to create a more complex plot. This is a common pattern in data visualization libraries like Matplotlib and Seaborn, where you can create a figure and then add multiple axes to it, each with its own plot. You can see several invocations of ax where we set various properties.

We also use two copies of our pluto dataset as inputs - note how each copy is aggregated over a different variable.

# plot both lot and building area on the same plot with a secondary y-axis
fig, ax = plt.subplots()
by_lot_area = pluto.groupby("LandUse").LotArea.sum().sort_values(ascending=False)

by_lot_area.plot.bar(ax=ax, color="orange")

# get order to apply below
order = {v: i for i, v in enumerate(by_lot_area.index)}

ax.set_ylabel("Lot Area")
ax.set_xlabel("Land Use Type")

ax2 = ax.twinx()
pluto.groupby("LandUse").BldgArea.sum().reindex(by_lot_area.index).plot.bar(
    ax=ax2, edgecolor="black", color="none"
)
ax2.set_ylabel("Building Area")

plt.title("Total lot and building area by land use type")

# add legends
ax.legend(["Lot Area"], loc="upper left")
ax2.legend(["Building Area"], loc="upper right")

Output

numeric column

We can use the describe() function to get a summary of the numeric columns in the dataset.

pluto["NumFloors"].describe()
count    856819.000000
mean          2.357554
std           2.012707
min           0.000000
25%           2.000000
50%           2.000000
75%           2.500000
max         104.000000
Name: NumFloors, dtype: float64
# We can see that the average number of floors is 2.35, but the maximum is 104! and that the majority of buildings have 1-3 floors.
# Let's visualize the distribution of the number of floors using a histogram
plt.figure(figsize=(10, 6))
pluto["NumFloors"].hist(bins=30, edgecolor="black")
plt.title("Distribution of Number of Floors")
plt.xlabel("Number of Floors")
plt.ylabel("Frequency")
plt.xlim(0, 20)  # limit x-axis to 20 for better visibility
plt.xticks(range(0, 21))  # set x-ticks to integers from 0 to 20
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()

Output

Interactive plotting

We can use the pygwalker library to create an interactive visualization of the data. Especially as we are becoming familiar with the dataset, this can be a useful way to explore the data and see how different variables relate to each other.

# pygwalker doesn't suppert geospatial data directly, so we need to drop the geometry column.
# Be sure to keep a copy of the original data, we'll need it later!
pluto_non_spatial = pluto.drop(columns=["geometry"])

# Invoke pygwalker, begin exploring the data interactively
pyg.walk(pluto_non_spatial)
Box(children=(HTML(value='\n<div id="ifr-pyg-000651c56678c794erSEcFydGiQOb75j" style="height: auto">\n    <hea…

Your turn:

  • Take a few minutes to explore the dataset on your own.
  • Try creating graphs that show the relationship between different variables.
  • Let’s discuss what you find or what you’re exploring.

Mapping

Creating a static map

In this example, we are using matplotlib to map our pluto data. Under the hood, matplotlib is drawing each individual polygon (>800,000), which is resource intensive and hard to discern on a map! This is a good place to start, but we’ll soon move on to more advanced mapping techniques.

pluto.plot(figsize=(10, 10)).set_axis_off()
plt.title("NYC MapPLUTO")
Text(0.5, 1.0, 'NYC MapPLUTO')

Output

cmap = {
    "One & Two Family Buildings": "#ffff00",
    "Multi-Family Walk-Up Buildings": "#fffb00",
    "Multi-Family Elevator Buildings": "#ffc800",
    "Mixed Residential & Commercial Buildings": "#ff4000",
    "Commercial & Office Buildings": "#ff0000",
    "Industrial & Manufacturing": "#7700ff",
    "Transportation & Utility": "#808080",
    "Public Facilities & Institutions": "#001580",
    "Open Space & Outdoor Recreation": "#219F21",
    "Parking Facilities": "#A6A6AB",
    "Vacant Land": "#222222",
    "Unknown": "#000000",
}
pluto.fillna({"LandUse": "Unknown"}, inplace=True)
vertical-align: top; } .dataframe thead th { text-align: right; }
Borough Block Lot CD BCT2020 BCTCB2020 CT2010 CB2010 SchoolDist Council ... FIRM07_FLA PFIRM15_FL Version DCPEdited Latitude Longitude Notes Shape_Leng Shape_Area geometry
0 MN 1 10 101 1000500 10005000003 5 1000 02 1 ... 1 1 24v1.1 NaN 40.688766 -74.018682 None 0.0 7.478663e+06 POLYGON ((980898.728 191409.779, 980881.798 19...
1 MN 97 33 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707789 -74.002009 None 0.0 2.839154e+03 POLYGON ((983690.664 197185.709, 983700.362 19...
2 MN 97 35 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707728 -74.002117 None 0.0 2.531493e+03 POLYGON ((983660.178 197162.227, 983697.276 19...
3 MN 97 36 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707687 -74.002207 None 0.0 1.825158e+03 POLYGON ((983608.867 197131.146, 983629.531 19...
4 MN 97 43 101 1001501 10015013007 15.01 3014 02 1 ... 1 1 24v1.1 t 40.707374 -74.002705 None 0.0 1.057095e+03 POLYGON ((983498.787 196968.26, 983479.066 197...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
856814 SI 8050 76 503 5024800 50248001014 248 1016 31 51 ... NaN NaN 24v1.1 NaN 40.509174 -74.250491 None 0.0 3.533919e+03 POLYGON ((914549.948 124915.949, 914577.436 12...
856815 SI 8050 78 503 5024800 50248001014 248 1016 31 51 ... NaN NaN 24v1.1 NaN 40.509303 -74.250326 None 0.0 8.015543e+03 POLYGON ((914705.62 124922.971, 914689.091 124...
856816 SI 8050 83 503 5024800 50248001014 248 1016 31 51 ... NaN NaN 24v1.1 NaN 40.509152 -74.250189 None 0.0 5.078841e+03 POLYGON ((914740.567 124877.857, 914721.621 12...
856817 SI 8050 86 503 5024800 50248001014 248 1016 31 51 ... NaN NaN 24v1.1 NaN 40.508963 -74.250274 None 0.0 1.318642e+04 POLYGON ((914777.738 124829.873, 914758.578 12...
856818 SI 8050 89 503 5024800 50248001014 248 1016 31 51 ... NaN NaN 24v1.1 NaN 40.508834 -74.250187 None 0.0 1.247199e+04 POLYGON ((914810.954 124786.995, 914791.371 12...

856819 rows × 95 columns

pluto.LandUse.unique()
<ArrowStringArray>
[        'Public Facilities & Institutions',
           'Multi-Family Walk-Up Buildings',
 'Mixed Residential & Commercial Buildings',
               'One & Two Family Buildings',
          'Multi-Family Elevator Buildings',
                              'Vacant Land',
            'Commercial & Office Buildings',
          'Open Space & Outdoor Recreation',
                 'Transportation & Utility',
                       'Parking Facilities',
                                  'Unknown',
               'Industrial & Manufacturing']
Length: 12, dtype: str
# here we make a new column called "color" that maps the LandUse values to colors
pluto["color"] = pluto["LandUse"].map(cmap)
pluto["color"].unique()
<ArrowStringArray>
['#001580', '#fffb00', '#ff4000', '#ffff00', '#ffc800', '#222222', '#ff0000',
 '#219F21', '#808080', '#A6A6AB', '#000000', '#7700ff']
Length: 12, dtype: str
ax = pluto.plot(
    color=pluto["color"],
    figsize=(10, 10),
    legend=True,
).set_axis_off()
plt.title("NYC MapPLUTO")


# populate legend items based on dict from above
legend_colors = [
    Line2D([0], [0], marker="o", color="w", markerfacecolor=c, markersize=10)
    for c in cmap.values()
]
labels = cmap.keys()
plt.legend(legend_colors, labels, loc="upper left")

Output

Your turn:

pluto.LotArea.plot.hist(bins=100, log=True)
<Axes: ylabel='Frequency'>

Output

Prep for interactive mapping

pluto_wgs = pluto.to_crs("epsg:4326")

Visualize a categorical variable

cmap_rgb = {k: list(int(v[i : i + 2], 16) for i in (1, 3, 5)) for k, v in cmap.items()}
cmap_rgb
{'One & Two Family Buildings': [255, 255, 0],
 'Multi-Family Walk-Up Buildings': [255, 251, 0],
 'Multi-Family Elevator Buildings': [255, 200, 0],
 'Mixed Residential & Commercial Buildings': [255, 64, 0],
 'Commercial & Office Buildings': [255, 0, 0],
 'Industrial & Manufacturing': [119, 0, 255],
 'Transportation & Utility': [128, 128, 128],
 'Public Facilities & Institutions': [0, 21, 128],
 'Open Space & Outdoor Recreation': [33, 159, 33],
 'Parking Facilities': [166, 166, 171],
 'Vacant Land': [34, 34, 34],
 'Unknown': [0, 0, 0]}
len(pluto_wgs[pluto_wgs["LandUse"].isna()])
0
pluto.fillna({"LandUse": "Unknown"}, inplace=True)
vertical-align: top; } .dataframe thead th { text-align: right; }
Borough Block Lot CD BCT2020 BCTCB2020 CT2010 CB2010 SchoolDist Council ... PFIRM15_FL Version DCPEdited Latitude Longitude Notes Shape_Leng Shape_Area geometry color
0 MN 1 10 101 1000500 10005000003 5 1000 02 1 ... 1 24v1.1 NaN 40.688766 -74.018682 None 0.0 7.478663e+06 POLYGON ((980898.728 191409.779, 980881.798 19... #001580
1 MN 97 33 101 1001501 10015013007 15.01 3014 02 1 ... 1 24v1.1 t 40.707789 -74.002009 None 0.0 2.839154e+03 POLYGON ((983690.664 197185.709, 983700.362 19... #fffb00
2 MN 97 35 101 1001501 10015013007 15.01 3014 02 1 ... 1 24v1.1 t 40.707728 -74.002117 None 0.0 2.531493e+03 POLYGON ((983660.178 197162.227, 983697.276 19... #ff4000
3 MN 97 36 101 1001501 10015013007 15.01 3014 02 1 ... 1 24v1.1 t 40.707687 -74.002207 None 0.0 1.825158e+03 POLYGON ((983608.867 197131.146, 983629.531 19... #ff4000
4 MN 97 43 101 1001501 10015013007 15.01 3014 02 1 ... 1 24v1.1 t 40.707374 -74.002705 None 0.0 1.057095e+03 POLYGON ((983498.787 196968.26, 983479.066 197... #ff4000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
856814 SI 8050 76 503 5024800 50248001014 248 1016 31 51 ... NaN 24v1.1 NaN 40.509174 -74.250491 None 0.0 3.533919e+03 POLYGON ((914549.948 124915.949, 914577.436 12... #ffff00
856815 SI 8050 78 503 5024800 50248001014 248 1016 31 51 ... NaN 24v1.1 NaN 40.509303 -74.250326 None 0.0 8.015543e+03 POLYGON ((914705.62 124922.971, 914689.091 124... #ffff00
856816 SI 8050 83 503 5024800 50248001014 248 1016 31 51 ... NaN 24v1.1 NaN 40.509152 -74.250189 None 0.0 5.078841e+03 POLYGON ((914740.567 124877.857, 914721.621 12... #ffff00
856817 SI 8050 86 503 5024800 50248001014 248 1016 31 51 ... NaN 24v1.1 NaN 40.508963 -74.250274 None 0.0 1.318642e+04 POLYGON ((914777.738 124829.873, 914758.578 12... #ffff00
856818 SI 8050 89 503 5024800 50248001014 248 1016 31 51 ... NaN 24v1.1 NaN 40.508834 -74.250187 None 0.0 1.247199e+04 POLYGON ((914810.954 124786.995, 914791.371 12... #ffff00

856819 rows × 96 columns

now, we can plot the data using lonboard to create an interactive map

df = pluto_wgs[["LandUse", "geometry"]].copy()
df["LandUse"] = df["LandUse"].astype("category")

layer = PolygonLayer.from_geopandas(
    df[["LandUse", "geometry"]],
    get_fill_color=apply_categorical_cmap(
        df["LandUse"],
        cmap=cmap_rgb,
    ),
)
m = Map(layer)
m
VBox(children=(<lonboard._map.Map object at 0x4a52a4440>, VBox(children=(ErrorOutput(), ErrorOutput(), ErrorOu…

visualize a continuous variable

df = pluto_wgs[["NumFloors", "geometry"]]

normalizer = LogNorm(1, df.NumFloors.max(), clip=True)
normalized_floors = normalizer(df.NumFloors)

layer = PolygonLayer.from_geopandas(
    df[["NumFloors", "geometry"]],
    get_fill_color=apply_continuous_cmap(normalized_floors, cmap=PuRd_9),
)
m = Map(layer)
m
VBox(children=(<lonboard._map.Map object at 0x4158c6490>, VBox(children=(ErrorOutput(), ErrorOutput(), ErrorOu…
df = pluto_wgs[["NumFloors", "geometry"]]

normalizer = LogNorm(1, df.NumFloors.max(), clip=True)
normalized_floors = normalizer(df.NumFloors)


layer = PolygonLayer.from_geopandas(
    df[["NumFloors", "geometry"]],
    get_fill_color=apply_continuous_cmap(normalized_floors, cmap=PuRd_9),
    extruded=True,
    get_elevation=pluto_wgs["NumFloors"] * 14,
)
m = Map(
    layer, view_state={"longitude": -73.97, "latitude": 40.73, "zoom": 10, "pitch": 45}
)
m
VBox(children=(<lonboard._map.Map object at 0x4158c6210>, VBox(children=(ErrorOutput(), ErrorOutput(), ErrorOu…
df = pluto_wgs[pluto_wgs.YearBuilt > 2010][["NumFloors", "geometry"]].copy()

normalizer = LogNorm(1, df.NumFloors.max(), clip=True)
normalized_floors = normalizer(df.NumFloors)

layer = PolygonLayer.from_geopandas(
    df[["NumFloors", "geometry"]],
    get_fill_color=apply_continuous_cmap(normalized_floors, cmap=PuRd_9),
    extruded=True,
    get_elevation=df["NumFloors"] * 14,
)
m = Map(layer)
m
VBox(children=(<lonboard._map.Map object at 0x4a52f8050>, VBox(children=(ErrorOutput(), ErrorOutput(), ErrorOu…
pluto_wgs.YearBuilt.nunique()
252
def categorize_buildings(r):
    if r.YearBuilt < 1900:
        return "Pre-1900"
    elif r.YearBuilt < 1950:
        return "1900-1950"
    elif r.YearBuilt < 2000:
        return "1950-2000"
    else:
        return "Post-2000"
pluto_wgs["year_category"] = pluto_wgs.apply(categorize_buildings, axis=1)
pluto_wgs.year_category.value_counts()
year_category
1900-1950    482376
1950-2000    231164
Pre-1900      79173
Post-2000     64106
Name: count, dtype: int64
year_built_ma = {
    "Pre-1900": "[255,0,0]",
    "1900-1950": "#00ff00",
    "1950-2000": "#0000ff",
    "Post-2000": "#ff00ff",
}
df = pluto_wgs[["year_category", "geometry"]]

layer = PolygonLayer.from_geopandas(
    df[["year_category", "geometry"]],
    get_fill_color=apply_categorical_cmap(df["year_category"], cmap=cmap_rgb),
)
m = Map(layer)
m
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[40], line 5
      1 df = pluto_wgs[["year_category", "geometry"]]
      2 
      3 layer = PolygonLayer.from_geopandas(
      4     df[["year_category", "geometry"]],
----> 5     get_fill_color=apply_categorical_cmap(df["year_category"], cmap=cmap_rgb),
      6 )
      7 m = Map(layer)
      8 m

File ~/miniforge3/envs/cdp26/lib/python3.14/site-packages/lonboard/colormap.py:223, in apply_categorical_cmap(values, cmap, alpha)
    220     lut[:, 3] = 255
    222 for i, key in enumerate(dictionary):
--> 223     color = cmap[key.as_py()]
    225     if isinstance(color, str):
    226         color = _to_rgba_no_colorcycle(color, alpha=alpha)

KeyError: '1900-1950'

To read more about lonboard and mapping in 3d, see here for some tips: https://developmentseed.org/lonboard/latest/examples/overture-maps/#imports