Bulk Download API

This guide explains how to use the gfw-api-python-client to access the Bulk Download API, which supports workflows that need bulk access to data, including integration with platforms and tools used by data engineers and researchers. A Jupyter Notebook version of this guide with more usage examples is also available.

Note: See the Datasets, Data Caveats, SAR (Synthetic-Aperture Radar) Data Caveats, and Terms of Use pages in the GFW API documentation for details on GFW data, API licenses, and rate limits.

Prerequisites

Getting Started

To interact with the Bulk Download endpoints, you first need to instantiate the gfw.Client and then access the bulk_downloads resource:

import time
import os

import gfwapiclient as gfw


access_token = os.environ.get(
    "GFW_API_ACCESS_TOKEN",
    "<OR_PASTE_YOUR_GFW_API_ACCESS_TOKEN_HERE>",
)

gfw_client = gfw.Client(
    access_token=access_token,
)

The gfw_client.bulk_downloads object provides methods to:

  • Create bulk reports based on specific filters and spatial parameters.

  • Monitor the generation status of previously created bulk reports.

  • Get a signed URL to download the data, metadata, and region geometry (in GeoJSON format) files of a previously created bulk report.

  • Query previously created bulk report data records in JSON format.

These methods return a result object, which offers convenient ways to access the data as Pydantic models using .data() or as pandas DataFrames using .df().

Tip: Use IPython or Python 3.11+ with python -m asyncio to run gfw-api-python-client code interactively, as these environments support executing async / await expressions directly in the console.
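
In a plain script run with python script.py, a top-level await is a syntax error, so the snippets in this guide need a small wrapper there. A minimal sketch of the usual pattern follows: put the awaited calls inside a coroutine and run it with asyncio.run(). The fetch_status coroutine is a hypothetical stand-in for any awaited client call shown in this guide; it is not part of gfw-api-python-client.

```python
import asyncio


async def fetch_status() -> str:
    """Hypothetical stand-in for an awaited client call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return "pending"


async def main() -> str:
    # Every `await ...` snippet from this guide would live inside a
    # coroutine like this one when run as a script.
    status = await fetch_status()
    return status


if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```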

Create a Bulk Report (create_bulk_report)

The create_bulk_report() method allows you to create a bulk report based on specified filters and spatial parameters. The name parameter is mandatory. Learn more about creating a bulk report here and check its data caveats here and here.

timestamp = int(time.time() * 1000)
dataset = "public-fixed-infrastructure-data:latest"
region_dataset = "public-eez-areas"
region_id = "8466"  # Argentinian Exclusive Economic Zone
name = f"{dataset.split(':')[0]}_{region_dataset}__{region_id}_{timestamp}"

create_bulk_report_result = await gfw_client.bulk_downloads.create_bulk_report(
    name=name,
    dataset=dataset,
    region={
        "dataset": region_dataset,
        "id": region_id,
    },
    filters=["label = 'oil'", "label_confidence = 'high'"],
)

Access Create a Bulk Report Result as Pydantic models

create_bulk_report_data = create_bulk_report_result.data()
print((
    create_bulk_report_data.id,
    create_bulk_report_data.name,
    create_bulk_report_data.status,
    create_bulk_report_data.created_at,
))

Output:

('c5e32895-4374-41d2-8b2e-ac414ed6757f',
 'public-fixed-infrastructure-data_public-eez-areas__8466_1768085547174',
 'pending',
 datetime.datetime(2026, 1, 10, 22, 52, 30, 9000, tzinfo=TzInfo(0)))

Access Create a Bulk Report Result as a DataFrame

create_bulk_report_df = create_bulk_report_result.df()

print(create_bulk_report_df.info())
print(create_bulk_report_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   id          1 non-null      object
 1   name        1 non-null      object
 2   file_path   1 non-null      object
 3   format      1 non-null      object
 4   filters     1 non-null      object
 5   geom        1 non-null      object
 6   status      1 non-null      object
 7   owner_id    1 non-null      int64
 8   owner_type  1 non-null      object
 9   created_at  1 non-null      datetime64[ns, UTC]
 10  updated_at  1 non-null      datetime64[ns, UTC]
 11  file_size   0 non-null      object
dtypes: datetime64[ns, UTC](2), int64(1), object(9)
memory usage: 228.0+ bytes

Get Bulk Report by ID (get_bulk_report_by_id)

The get_bulk_report_by_id() method retrieves the metadata and status of a previously created bulk report based on the provided bulk report ID. The id parameter is mandatory. Learn more about getting a bulk report by ID here and check its data caveats here and here.

Important: Generating a bulk report can take several minutes or hours. We recommend using this method to poll the status of a previously created bulk report until its status is done or failed.

bulk_report_result = await gfw_client.bulk_downloads.get_bulk_report_by_id(
    id=create_bulk_report_data.id
)
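
The polling recommendation above can be sketched as a small helper. This is an illustrative sketch, not part of gfw-api-python-client: wait_for_report, its parameters, and the default interval are assumptions, and the helper expects the status to be a plain string such as "pending", "done", or "failed", as in the outputs shown in this guide. The status lookup is passed in as a coroutine function so the loop can be tried without network access.

```python
import asyncio
from typing import Awaitable, Callable

# Statuses after which polling should stop, per the recommendation above.
TERMINAL_STATUSES = {"done", "failed"}


async def wait_for_report(
    fetch_status: Callable[[], Awaitable[str]],
    interval_seconds: float = 30.0,
    max_attempts: int = 120,
) -> str:
    """Poll fetch_status until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen in max_attempts polls.
    """
    for attempt in range(max_attempts):
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            await asyncio.sleep(interval_seconds)
    raise TimeoutError(f"report not finished after {max_attempts} polls")


# Possible wiring with the client from this guide (not executed here):
#
# async def fetch_status() -> str:
#     result = await gfw_client.bulk_downloads.get_bulk_report_by_id(
#         id=create_bulk_report_data.id
#     )
#     return result.data().status
#
# final_status = await wait_for_report(fetch_status)
```

A fixed interval keeps the sketch short; for reports that take hours, a longer interval or a backoff would reduce the number of requests counted against rate limits.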

Access Get Bulk Report by ID Result as Pydantic models

bulk_report_data = bulk_report_result.data()

print((
    bulk_report_data.id,
    bulk_report_data.name,
    bulk_report_data.status,
    bulk_report_data.created_at,
))

Output:

('c5e32895-4374-41d2-8b2e-ac414ed6757f',
 'public-fixed-infrastructure-data_public-eez-areas__8466_1768085547174',
 'pending',
 datetime.datetime(2026, 1, 10, 22, 52, 30, 9000, tzinfo=TzInfo(0)))

Access Get Bulk Report by ID Result as a DataFrame

bulk_report_df = bulk_report_result.df()

print(bulk_report_df.info())
print(bulk_report_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   id          1 non-null      object
 1   name        1 non-null      object
 2   file_path   1 non-null      object
 3   format      1 non-null      object
 4   filters     1 non-null      object
 5   geom        1 non-null      object
 6   status      1 non-null      object
 7   owner_id    1 non-null      int64
 8   owner_type  1 non-null      object
 9   created_at  1 non-null      datetime64[ns, UTC]
 10  updated_at  1 non-null      datetime64[ns, UTC]
 11  file_size   0 non-null      object
dtypes: datetime64[ns, UTC](2), int64(1), object(9)
memory usage: 228.0+ bytes

Get All Bulk Reports Created by User or Application (get_all_bulk_reports)

The get_all_bulk_reports() method retrieves a list of the metadata and status of previously created bulk reports based on specified pagination, sorting, and filtering criteria. Learn more about getting all bulk reports created by a user or application here and check its data caveats here and here.

bulk_reports_result = await gfw_client.bulk_downloads.get_all_bulk_reports(
    status="done",
)

Access All Created Bulk Reports Result as Pydantic models

bulk_reports_data = bulk_reports_result.data()
bulk_report_item = bulk_reports_data[-1]

print((
    bulk_report_item.id,
    bulk_report_item.name,
    bulk_report_item.status,
    bulk_report_item.created_at,
))

Output:

('0c0cada1-72dd-4fb0-bdf6-7fe8c7fdb1e3',
 'sar-fixed-infrastructure-data-20241207-region-1',
 'done',
 datetime.datetime(2025, 12, 7, 10, 3, 12, 371000, tzinfo=TzInfo(0)))

Access All Created Bulk Reports Result as a DataFrame

bulk_reports_df = bulk_reports_result.df()

print(bulk_reports_df.info())
print(bulk_reports_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   id          7 non-null      object
 1   name        7 non-null      object
 2   file_path   7 non-null      object
 3   format      7 non-null      object
 4   filters     7 non-null      object
 5   geom        4 non-null      object
 6   status      7 non-null      object
 7   owner_id    7 non-null      int64
 8   owner_type  7 non-null      object
 9   created_at  7 non-null      datetime64[ns, UTC]
 10  updated_at  7 non-null      datetime64[ns, UTC]
 11  file_size   7 non-null      float64
dtypes: datetime64[ns, UTC](2), float64(1), int64(1), object(8)
memory usage: 804.0+ bytes
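
Once the reports are in a DataFrame, ordinary pandas operations apply. The sketch below picks the most recently created report with a given status, using only the id, status, and created_at columns listed above. The latest_report_id helper and the sample rows are made up for illustration; in practice the input would be bulk_reports_df.

```python
from typing import Optional

import pandas as pd


def latest_report_id(reports_df: pd.DataFrame, status: str = "done") -> Optional[str]:
    """Return the id of the newest report with the given status, or None."""
    matching = reports_df[reports_df["status"] == status]
    if matching.empty:
        return None
    newest = matching.sort_values("created_at", ascending=False).iloc[0]
    return newest["id"]


# Made-up sample rows that mimic three of the columns shown above.
sample_reports_df = pd.DataFrame(
    {
        "id": ["report-a", "report-b", "report-c"],
        "status": ["done", "pending", "done"],
        "created_at": pd.to_datetime(
            [
                "2025-12-07T10:03:12Z",
                "2026-01-10T22:52:30Z",
                "2025-12-20T08:00:00Z",
            ],
            utc=True,
        ),
    }
)

print(latest_report_id(sample_reports_df))  # report-c
```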

Get Bulk Report File Download URL (get_bulk_report_file_download_url)

The get_bulk_report_file_download_url() method retrieves a signed URL that points to a downloadable file hosted on Global Fishing Watch’s cloud infrastructure. Use it to download one of the files (i.e., "DATA", "README", or "GEOM") of a previously created bulk report. The id parameter is mandatory. Learn more about getting a bulk report file download URL here and check its data caveats here and here.

bulk_report_file_download_url_result = (
    await gfw_client.bulk_downloads.get_bulk_report_file_download_url(
        id=bulk_reports_data[0].id, file="DATA"
    )
)

Access Get Bulk Report File Download URL Result as Pydantic models

bulk_report_file_download_url_data = bulk_report_file_download_url_result.data()

print(bulk_report_file_download_url_data.url)

Output:

'https://storage.googleapis.com/gfw-api-bulk-pro-us-central1/705f2f9a-f695-43f1-a4bf-7746f3deb091/data.json.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=api-bulk-pro%40gfw-production.iam.gserviceaccount.com%2F20260110%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20260110T225232Z&X-Goog-Expires=60&X-Goog-SignedHeaders=host&X-Goog-Signature=481a4ff7244b7286f303b37bb7941c291a26d1e3502debdb7611b8cb2d5edf37bc7aa0287b15a11c2f69f72e88791da3f76873a2fd7d08f911691c35ee8e095b825615510de8256f8cd275211997141e026837e118d86e01c026c457dc1f47d43ff2cb07131c3d21e7908c847bf1e3d87cd4773f02e8e4512a7c15e93799de186b9ea004be50cd3e53292f01e9393595a81c42cc3686f65d280f4f16076759da4722c17c2a6a698393c919cdd083402421a1bbf425b618244b3a9b30e48b770a9dc7f9eed8e63af04f8e31f0b6723fdf76fa7262ded89e7a375fbaea3b031bf29db22b1961878facd79c92d633ab6aa2309c0ce3982104d9835058ecd829bee8'

Access Get Bulk Report File Download URL Result as a DataFrame

bulk_report_file_download_url_df = bulk_report_file_download_url_result.df()

print(bulk_report_file_download_url_df.info())
print(bulk_report_file_download_url_df.iloc[0]["url"])

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   url     1 non-null      object
dtypes: object(1)
memory usage: 140.0+ bytes
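
The signed URL can then be fetched with any HTTP client. The sketch below uses only the Python standard library and is not part of gfw-api-python-client: download_file and preview_gzip_text are hypothetical helpers. It assumes the DATA file is gzip-compressed text, as the data.json.gz name in the sample URL suggests, and makes no assumption about how the JSON inside is laid out. The sample URL carries X-Goog-Expires=60, which for Google Cloud Storage signed URLs is a lifetime in seconds, so it is safest to start the download right after requesting the URL.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: Path) -> Path:
    """Stream the content behind url to destination and return the path."""
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        # copyfileobj streams in chunks, so large files are not held in memory.
        shutil.copyfileobj(response, out)
    return destination


def preview_gzip_text(path: Path, max_chars: int = 500) -> str:
    """Return the first max_chars characters of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read(max_chars)


# Possible usage with the result from this section (not executed here):
#
# path = download_file(bulk_report_file_download_url_data.url, Path("data.json.gz"))
# print(preview_gzip_text(path))
```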

Query Bulk Fixed Infrastructure Data Report (query_bulk_fixed_infrastructure_data_report)

The query_bulk_fixed_infrastructure_data_report() method retrieves the data records of a previously created fixed infrastructure data bulk report (i.e., one created from the public-fixed-infrastructure-data:latest dataset) in JSON format based on specified pagination, sorting, and including criteria. The id parameter is mandatory. Learn more about querying a bulk fixed infrastructure data report in JSON format here and check its data caveats here and here.

bulk_fixed_infrastructure_data_report_result = (
    await gfw_client.bulk_downloads.query_bulk_fixed_infrastructure_data_report(
        id=bulk_reports_data[0].id
    )
)

Access Query Bulk Fixed Infrastructure Data Report Result as Pydantic models

bulk_fixed_infrastructure_data_report_data = (
    bulk_fixed_infrastructure_data_report_result.data()
)

bulk_fixed_infrastructure_data_report_item = bulk_fixed_infrastructure_data_report_data[
    -1
]

print((
    bulk_fixed_infrastructure_data_report_item.structure_id,
    bulk_fixed_infrastructure_data_report_item.lat,
    bulk_fixed_infrastructure_data_report_item.lon,
    bulk_fixed_infrastructure_data_report_item.label,
    bulk_fixed_infrastructure_data_report_item.label_confidence,
))

Output:

('1051638', -53.0895574340617, -67.32289149541135, 'oil', 'high')

Access Query Bulk Fixed Infrastructure Data Report Result as a DataFrame

bulk_fixed_infrastructure_data_report_result_df = (
    bulk_fixed_infrastructure_data_report_result.df()
)

print(bulk_fixed_infrastructure_data_report_result_df.info())
print(bulk_fixed_infrastructure_data_report_result_df.head())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1238 entries, 0 to 1237
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   detection_id          1237 non-null   object
 1   detection_date        1238 non-null   datetime64[ns]
 2   structure_id          1238 non-null   object
 3   lat                   1238 non-null   float64
 4   lon                   1238 non-null   float64
 5   structure_start_date  1238 non-null   datetime64[ns]
 6   structure_end_date    7 non-null      datetime64[ns]
 7   label                 1238 non-null   object
 8   label_confidence      1238 non-null   object
dtypes: datetime64[ns](3), float64(2), object(4)
memory usage: 87.2+ KB
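
With the records in a DataFrame, a quick aggregate can show what a report contains. The sketch below counts rows and distinct structures per label, using the structure_id and label columns listed above. The summarize_by_label helper and the sample rows are made up for illustration, and reading each row as one detection of a structure is an assumption based on the detection_id and detection_date columns. In practice the input would be bulk_fixed_infrastructure_data_report_result_df.

```python
import pandas as pd


def summarize_by_label(records_df: pd.DataFrame) -> pd.DataFrame:
    """Count rows and distinct structures for each label."""
    return (
        records_df.groupby("label")
        .agg(
            rows=("structure_id", "size"),  # number of records per label
            structures=("structure_id", "nunique"),  # distinct structure ids
        )
        .reset_index()
    )


# Made-up sample rows that mimic two of the columns shown above.
sample_records_df = pd.DataFrame(
    {
        "structure_id": ["s1", "s1", "s2", "s3"],
        "label": ["oil", "oil", "oil", "wind"],
    }
)

summary_df = summarize_by_label(sample_records_df)
print(summary_df)
```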

Next Steps

Explore the Usage Guides and Workflow Guides for other API resources to understand how you can combine bulk report data with vessel information, event data, and more. Check out the following resources: