Bulk Download API¶
This guide provides detailed instructions on how to use the gfw-api-python-client to access the Bulk Download API, which is designed to support workflows that require bulk access to data, including integration with platforms and tools used by data engineers and researchers. Here is a Jupyter Notebook version of this guide with more usage examples.
Note: See the Datasets, Data Caveats, SAR (Synthetic-Aperture Radar) Data Caveats, and Terms of Use pages in the GFW API documentation for details on GFW data, API licenses, and rate limits.
Prerequisites¶
Before using the
gfw-api-python-client, ensure it is installed (see the Getting Started guide) and that you have obtained an API access token from the Global Fishing Watch API portal.
Getting Started¶
To interact with the Bulk Download endpoints, you first need to instantiate the gfw.Client and then access the bulk_downloads resource:
import time
import os
import gfwapiclient as gfw
access_token = os.environ.get(
"GFW_API_ACCESS_TOKEN",
"<OR_PASTE_YOUR_GFW_API_ACCESS_TOKEN_HERE>",
)
gfw_client = gfw.Client(
access_token=access_token,
)
The gfw_client.bulk_downloads object provides methods to:
Create bulk reports based on specific filters and spatial parameters.
Monitor previously created bulk report generation status.
Get signed URL to download previously created bulk report data, metadata and region geometry (in GeoJSON format) files.
Query previously created bulk report data records in JSON format.
These methods return a result object, which offers convenient ways to access the data as Pydantic models using .data() or as pandas DataFrames using .df().
Tip: Use IPython or Python 3.11+ with
python -m asyncioto rungfw-api-python-clientcode interactively, as these environments support executingasync/awaitexpressions directly in the console.
Create a Bulk Report (create_bulk_report)¶
The create_bulk_report() method allows you create a bulk report based on specified filters and spatial parameters. The name parameter is mandatory. Please learn more about create a bulk report here and check its data caveats here and here.
timestamp = int(time.time() * 1000)
dataset = "public-fixed-infrastructure-data:latest"
region_dataset = "public-eez-areas"
region_id = "8466" # Argentinian Exclusive Economic Zone
name = f"{dataset.split(':')[0]}_{region_dataset}__{region_id}_{timestamp}"
create_bulk_report_result = await gfw_client.bulk_downloads.create_bulk_report(
name=name,
dataset=dataset,
region={
"dataset": region_dataset,
"id": region_id,
},
filters=["label = 'oil'", "label_confidence = 'high'"],
)
Access Create a Bulk Report Result as Pydantic models¶
create_bulk_report_data = create_bulk_report_result.data()
print((
create_bulk_report_data.id,
create_bulk_report_data.name,
create_bulk_report_data.status,
create_bulk_report_data.created_at,
))
Output:
('c5e32895-4374-41d2-8b2e-ac414ed6757f',
'public-fixed-infrastructure-data_public-eez-areas__8466_1768085547174',
'pending',
datetime.datetime(2026, 1, 10, 22, 52, 30, 9000, tzinfo=TzInfo(0)))
Access Create a Bulk Report Result as a DataFrame¶
create_bulk_report_df = create_bulk_report_result.df()
print(create_bulk_report_df.info())
print(create_bulk_report_df.head())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 1 non-null object
1 name 1 non-null object
2 file_path 1 non-null object
3 format 1 non-null object
4 filters 1 non-null object
5 geom 1 non-null object
6 status 1 non-null object
7 owner_id 1 non-null int64
8 owner_type 1 non-null object
9 created_at 1 non-null datetime64[ns, UTC]
10 updated_at 1 non-null datetime64[ns, UTC]
11 file_size 0 non-null object
dtypes: datetime64[ns, UTC](2), int64(1), object(9)
memory usage: 228.0+ bytes
Get Bulk Report by ID (get_bulk_report_by_id)¶
The get_bulk_report_by_id() method allows you retrieves metadata and status of the previously created bulk report based on the provided bulk report ID. The id parameter is mandatory. Please learn more about get bulk report by id report here and check its data caveats here and here.
Important: We recommend to use this method to poll the status of previously created bulk report, if it takes several minutes or hours to generate until it status is
doneorfailed.
bulk_report_result = await gfw_client.bulk_downloads.get_bulk_report_by_id(
id=create_bulk_report_data.id
)
Access Get Bulk Report by ID Result as Pydantic models¶
bulk_report_data = bulk_report_result.data()
print((
create_bulk_report_data.id,
create_bulk_report_data.name,
create_bulk_report_data.status,
create_bulk_report_data.created_at,
))
Output:
('c5e32895-4374-41d2-8b2e-ac414ed6757f',
'public-fixed-infrastructure-data_public-eez-areas__8466_1768085547174',
'pending',
datetime.datetime(2026, 1, 10, 22, 52, 30, 9000, tzinfo=TzInfo(0)))
Access Get Bulk Report by ID Result as a DataFrame¶
bulk_report_df = bulk_report_result.df()
print(bulk_report_df.info())
print(bulk_report_df.head())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 1 non-null object
1 name 1 non-null object
2 file_path 1 non-null object
3 format 1 non-null object
4 filters 1 non-null object
5 geom 1 non-null object
6 status 1 non-null object
7 owner_id 1 non-null int64
8 owner_type 1 non-null object
9 created_at 1 non-null datetime64[ns, UTC]
10 updated_at 1 non-null datetime64[ns, UTC]
11 file_size 0 non-null object
dtypes: datetime64[ns, UTC](2), int64(1), object(9)
memory usage: 228.0+ bytes
Get All Bulk Reports Created by User or Application (get_all_bulk_reports)¶
The get_all_bulk_reports() method allows you retrieves a list of metadata and status of the previously created bulk reports based on specified pagination, sorting, and filtering criteria. Please learn more about get all bulk reports created by user or application here and check its data caveats here and here.
bulk_reports_result = await gfw_client.bulk_downloads.get_all_bulk_reports(
status="done",
)
Access All Created Bulk Reports Result as Pydantic models¶
bulk_reports_data = bulk_reports_result.data()
bulk_report_item = bulk_reports_data[-1]
print((
bulk_report_item.id,
bulk_report_item.name,
bulk_report_item.status,
bulk_report_item.created_at,
))
Output:
('0c0cada1-72dd-4fb0-bdf6-7fe8c7fdb1e3',
'sar-fixed-infrastructure-data-20241207-region-1',
'done',
datetime.datetime(2025, 12, 7, 10, 3, 12, 371000, tzinfo=TzInfo(0)))
Access All Created Bulk Reports Result as a DataFrame¶
bulk_reports_df = bulk_reports_result.df()
print(bulk_reports_df.info())
print(bulk_reports_df.head())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 7 non-null object
1 name 7 non-null object
2 file_path 7 non-null object
3 format 7 non-null object
4 filters 7 non-null object
5 geom 4 non-null object
6 status 7 non-null object
7 owner_id 7 non-null int64
8 owner_type 7 non-null object
9 created_at 7 non-null datetime64[ns, UTC]
10 updated_at 7 non-null datetime64[ns, UTC]
11 file_size 7 non-null float64
dtypes: datetime64[ns, UTC](2), float64(1), int64(1), object(8)
memory usage: 804.0+ bytes
Get Bulk Report File Download URL (get_bulk_report_file_download_url)¶
The get_bulk_report_file_download_url() method allows you retrieves signed URL that points to a downloadable file hosted on Global Fishing Watch’s cloud infrastructure to download file(s) (i.e., "DATA", "README", or "GEOM") of the previously created bulk report. The id parameter is mandatory. Please learn more about get bulk report file download url here and check its data caveats here and here.
bulk_report_file_download_url_result = (
await gfw_client.bulk_downloads.get_bulk_report_file_download_url(
id=bulk_reports_data[0].id, file="DATA"
)
)
Access Get Bulk Report File Download URL Result as Pydantic models¶
bulk_report_file_download_url_data = bulk_report_file_download_url_result.data()
print(bulk_report_file_download_url_data.url)
Output:
'https://storage.googleapis.com/gfw-api-bulk-pro-us-central1/705f2f9a-f695-43f1-a4bf-7746f3deb091/data.json.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=api-bulk-pro%40gfw-production.iam.gserviceaccount.com%2F20260110%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20260110T225232Z&X-Goog-Expires=60&X-Goog-SignedHeaders=host&X-Goog-Signature=481a4ff7244b7286f303b37bb7941c291a26d1e3502debdb7611b8cb2d5edf37bc7aa0287b15a11c2f69f72e88791da3f76873a2fd7d08f911691c35ee8e095b825615510de8256f8cd275211997141e026837e118d86e01c026c457dc1f47d43ff2cb07131c3d21e7908c847bf1e3d87cd4773f02e8e4512a7c15e93799de186b9ea004be50cd3e53292f01e9393595a81c42cc3686f65d280f4f16076759da4722c17c2a6a698393c919cdd083402421a1bbf425b618244b3a9b30e48b770a9dc7f9eed8e63af04f8e31f0b6723fdf76fa7262ded89e7a375fbaea3b031bf29db22b1961878facd79c92d633ab6aa2309c0ce3982104d9835058ecd829bee8'
Access Get Bulk Report File Download URL Result as a DataFrame¶
bulk_report_file_download_url_df = bulk_report_file_download_url_result.df()
print(bulk_report_file_download_url_df.info())
print(bulk_report_file_download_url_df.iloc[0]["url"])
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 url 1 non-null object
dtypes: object(1)
memory usage: 140.0+ bytes
Query Bulk Fixed Infrastructure Data Report (query_bulk_fixed_infrastructure_data_report)¶
The query_bulk_fixed_infrastructure_data_report() method allows you retrieves data records of a previously created fixed infrastructure data (i.e., public-fixed-infrastructure-data:latest dataset) bulk report data in JSON format based on specified pagination, sorting, and including criteria. The id parameter is mandatory. Please learn more about query bulk fixed infrastructure data report in JSON format here and check its data caveats here and here.
bulk_fixed_infrastructure_data_report_result = (
await gfw_client.bulk_downloads.query_bulk_fixed_infrastructure_data_report(
id=bulk_reports_data[0].id
)
)
Access Query Bulk Fixed Infrastructure Data Report Result as Pydantic models¶
bulk_fixed_infrastructure_data_report_data = (
bulk_fixed_infrastructure_data_report_result.data()
)
bulk_fixed_infrastructure_data_report_item = bulk_fixed_infrastructure_data_report_data[
-1
]
print((
bulk_fixed_infrastructure_data_report_item.structure_id,
bulk_fixed_infrastructure_data_report_item.lat,
bulk_fixed_infrastructure_data_report_item.lon,
bulk_fixed_infrastructure_data_report_item.label,
bulk_fixed_infrastructure_data_report_item.label_confidence,
))
Output:
('1051638', -53.0895574340617, -67.32289149541135, 'oil', 'high')
Access Query Bulk Fixed Infrastructure Data Report Result as a DataFrame¶
bulk_fixed_infrastructure_data_report_result_df = (
bulk_fixed_infrastructure_data_report_result.df()
)
print(bulk_fixed_infrastructure_data_report_result_df.info())
print(bulk_fixed_infrastructure_data_report_result_df.head())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1238 entries, 0 to 1237
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 detection_id 1237 non-null object
1 detection_date 1238 non-null datetime64[ns]
2 structure_id 1238 non-null object
3 lat 1238 non-null float64
4 lon 1238 non-null float64
5 structure_start_date 1238 non-null datetime64[ns]
6 structure_end_date 7 non-null datetime64[ns]
7 label 1238 non-null object
8 label_confidence 1238 non-null object
dtypes: datetime64[ns](3), float64(2), object(4)
memory usage: 87.2+ KB
Next Steps¶
Explore the Usage Guides and Workflow Guides for other API resources to understand how you can combine the reporting and statistical capabilities of the 4Wings API with vessel information, event data, and more. Check out the following resources: