Datasets API

Open In Colab

This guide provides detailed instructions on how to use the gfw-api-python-client to access various datasets available through the Global Fishing Watch API. Currently, it focuses on retrieving SAR (Synthetic Aperture Radar) fixed infrastructure data. The Datasets API allows you to retrieve this information, either by specifying tile coordinates or a geographic geometry. Here is a Jupyter Notebook version of this guide with more usage examples.

Note: See the Datasets, SAR (Synthetic-Aperture Radar) Data Caveats, and Terms of Use pages in the GFW API documentation for details on GFW data, API licenses, and rate limits.

Prerequisites

Getting Started

To interact with the Datasets endpoints, you first need to instantiate the gfw.Client and then access the datasets resource:

import os

import gfwapiclient as gfw


access_token = os.environ.get(
    "GFW_API_ACCESS_TOKEN",
    "<OR_PASTE_YOUR_GFW_API_ACCESS_TOKEN_HERE>",
)

gfw_client = gfw.Client(
    access_token=access_token,
)

The gfw_client.datasets object provides methods to retrieve data from various datasets. The get_sar_fixed_infrastructure method allows you to access SAR fixed infrastructure data. This method returns a result object, which offers convenient ways to access the data as Pydantic models using .data() or as pandas DataFrames using .df().

Tip: Use IPython or Python 3.11+ with python -m asyncio to run gfw-api-python-client code interactively, as these environments support executing async / await expressions directly in the console.

Retrieving SAR Fixed Infrastructure Data by Tile Coordinates (get_sar_fixed_infrastructure with z, x, y)

You can retrieve SAR fixed infrastructure data for a specific tile using its zoom level (z), x-coordinate (x), and y-coordinate (y).

z = 1
x = 0
y = 1

sar_infrastructure_result = await gfw_client.datasets.get_sar_fixed_infrastructure(
    z=z, x=x, y=y
)

Access the list of SAR fixed infrastructure items as Pydantic models

sar_infrastructure_data = sar_infrastructure_result.data()
sar_infrastructure = sar_infrastructure_data[-1]
print(
    (
        sar_infrastructure.structure_id,
        sar_infrastructure.label,
        sar_infrastructure.lat,
        sar_infrastructure.lon,
    )
)
print(sar_infrastructure.model_dump())

Output:

(646348, 'oil', -39.15082587013905, 177.96658840984458)

Access the SAR fixed infrastructure items as a DataFrame

sar_infrastructure_df = sar_infrastructure_result.df()
print(sar_infrastructure_df.info())
print(sar_infrastructure_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156 entries, 0 to 1155
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   structure_id          1156 non-null   int64
 1   lat                   1156 non-null   float64
 2   lon                   1156 non-null   float64
 3   label                 1156 non-null   object
 4   structure_start_date  1156 non-null   datetime64[ns, UTC]
 5   structure_end_date    857 non-null    datetime64[ns, UTC]
 6   label_confidence      1156 non-null   object
dtypes: datetime64[ns, UTC](2), float64(2), int64(1), object(2)
memory usage: 63.3+ KB

Retrieving SAR Fixed Infrastructure Data by Geometry (get_sar_fixed_infrastructure with geometry)

You can also retrieve SAR fixed infrastructure data for a specific geographic area defined by a GeoJSON geometry.

geometry = {
    "type": "Polygon",
    "coordinates": [
        [
            [-180.0, -85.0511287798066],
            [-180.0, 0.0],
            [0.0, 0.0],
            [0.0, -85.0511287798066],
            [-180.0, -85.0511287798066],
        ]
    ],
}

sar_infrastructure_result = await gfw_client.datasets.get_sar_fixed_infrastructure(
    geometry=geometry
)

Access the list of SAR fixed infrastructure items as Pydantic models

sar_infrastructure_data = sar_infrastructure_result.data()
sar_infrastructure = sar_infrastructure_data[-1]
print(
    (
        sar_infrastructure.structure_id,
        sar_infrastructure.label,
        sar_infrastructure.lat,
        sar_infrastructure.lon,
    )
)
print(sar_infrastructure.model_dump())

Output:

(646348, 'oil', -39.15082587013905, 177.96658840984458)

Access the SAR fixed infrastructure items as a DataFrame

sar_infrastructure_df = sar_infrastructure_result.df()
print(sar_infrastructure_df.info())
print(sar_infrastructure_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156 entries, 0 to 1155
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   structure_id          1156 non-null   int64
 1   lat                   1156 non-null   float64
 2   lon                   1156 non-null   float64
 3   label                 1156 non-null   object
 4   structure_start_date  1156 non-null   datetime64[ns, UTC]
 5   structure_end_date    857 non-null    datetime64[ns, UTC]
 6   label_confidence      1156 non-null   object
dtypes: datetime64[ns, UTC](2), float64(2), int64(1), object(2)
memory usage: 63.3+ KB

Next Steps

Explore the Usage Guides and Workflow Guides for other API resources. Check out the following resources: