FHIR Bulk Data Export Operation
Exporting large FHIR datasets from a FHIR server through the regular RESTful API can be tricky: each resource type requires a separate API call and, once pagination is taken into account, an export can take hundreds of requests. FHIR Bulk Data Export is an operation designed to solve this problem.
The FHIR Bulk Export Service is intended to fulfill the 21st Century Cures bulk data export requirements. The bulk export operation lets a client configure and start a data export in a single API call, whether for all patients, for a defined group of patients, or for all FHIR data on the server. The export runs asynchronously to limit its impact on system performance, and the results remain available for download from media storage for several days.
The FHIR Bulk Export service enables API consumers to export USCDI (United States Core Data for Interoperability) clinical data for all patients in a particular context.
This implementation is based on the Bulk Data Access IG.
Process overview
- The client starts the Bulk Data Export process by sending a Kick-Off request.
- The server validates the Kick-Off request and responds with a link to the job in the Content-Location header.
- The client checks the job status by polling the URL from the Content-Location header. When the export is complete, the response contains links to the generated files.
- The client downloads the generated files using the links provided in the job response.
A sequence diagram of the bulk export workflow provided by HL7 can be found here
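The four steps above can be sketched as a small polling client. This is a minimal illustration, not Kodjin client code: `run_export`, the `send` callable, and the poll interval are assumptions introduced here. `send` stands in for whatever HTTP layer you use and returns a `(status, headers, body)` tuple. The 202 status expected from the Kick-Off request comes from the Bulk Data Access IG.

```python
import json
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical transport signature: (method, url, headers) -> (status, headers, body).
Send = Callable[[str, str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def run_export(send: Send, kickoff_url: str, token: str,
               poll_seconds: float = 5.0, max_polls: int = 100) -> List[dict]:
    """Kick off an export, poll the job, and return the manifest's output entries."""
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",
        "Authorization": f"Bearer {token}",
    }
    # Steps 1-2: kick-off; the server answers with the job URL in Content-Location.
    status, resp_headers, _ = send("GET", kickoff_url, headers)
    if status != 202:
        raise RuntimeError(f"kick-off rejected with HTTP {status}")
    # Assumes the transport preserves this header name's casing.
    job_url = resp_headers["Content-Location"]

    # Step 3: poll the job URL until the manifest is ready.
    auth_only = {"Authorization": headers["Authorization"]}
    for _ in range(max_polls):
        status, _, body = send("GET", job_url, auth_only)
        if status == 200:
            # Step 4: each output entry carries a download link.
            return json.loads(body)["output"]
        if status != 202:
            raise RuntimeError(f"export failed with HTTP {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("export did not complete within the polling budget")
```

Passing the transport in as a parameter keeps the control flow separate from the HTTP library, so the same loop can be exercised against a stub before it is pointed at a real server.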
Implementation notes
- Exported files are hosted in a protected AWS S3-compatible bucket in .ndjson format.
- The URLs returned in the job completion response manifest are AWS S3 pre-signed URLs. These URLs are valid for 7 days after manifest retrieval.
- When polling for job status or canceling a job, the FHIR client must present a valid auth token with the same client ID as the one used to initiate the export job.
- Group-level export targets the Patient Compartment for resources required by USCDI v2. This means the server exports resources that are referenced by a resource within the patient compartment and excludes resources with no data available on the patient record. Additionally, the server provides Encounter, Location, Organization, and Practitioner resources because they are referenced as must-support elements in required resources.
- FHIR Bulk Data Export requires JWT assertion-based client authentication, as described in the SMART Backend Services protocol details. For the authorization methods supported by the Kodjin FHIR Server API, refer to the link.
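The JWT assertion flow mentioned in the last note can be outlined as follows. This sketch only assembles the claims and the token request body defined by SMART Backend Services. Signing the JWT (the SMART specification calls for RS384 or ES384) is left to a JWT library of your choice. The function names, the token endpoint URL, and the scope value are placeholders, not Kodjin-specific values.

```python
import time
import uuid
from urllib.parse import urlencode


def build_assertion_claims(client_id: str, token_url: str, lifetime_s: int = 300) -> dict:
    """Claims for the client assertion JWT; signing is left to a JWT library."""
    now = int(time.time())
    return {
        "iss": client_id,          # issuer: the client's ID
        "sub": client_id,          # subject: the same client ID
        "aud": token_url,          # audience: the token endpoint URL
        "exp": now + lifetime_s,   # SMART allows at most five minutes ahead
        "jti": str(uuid.uuid4()),  # unique ID so the assertion cannot be replayed
    }


def build_token_request_body(signed_jwt: str, scope: str) -> str:
    """Form-encoded body for the client_credentials token request."""
    return urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": signed_jwt,
    })
```

The body is posted to the token endpoint with a `Content-Type` of `application/x-www-form-urlencoded`; the access token in the response is then used as the bearer token for the export requests.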
Kick-Off request
The Bulk Data Export process can be invoked with a GET request when the set of query parameters is small, or with a POST request that supplies the parameters in a FHIR Parameters resource when it is larger.
Levels of export
Three endpoints are available to tailor the export to a particular use case:
Name | Description | URL Syntax |
---|---|---|
System Level Export | Export data from a FHIR server, whether or not it is associated with a patient. | [base url]/$export |
All Patients | FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all patients. | [base url]/Patient/$export |
Group of Patients | FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all members of a specified Group. | [base url]/Group/[id]/$export |
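As an illustration of the URL syntax in the table, the helper below builds the endpoint for each export level. The function name and the `level` keywords ("system", "patient", "group") are made up for this sketch.

```python
from typing import Optional
from urllib.parse import quote


def export_url(base_url: str, level: str, group_id: Optional[str] = None) -> str:
    """Return the $export endpoint for a system, patient, or group level export."""
    base = base_url.rstrip("/")
    if level == "system":
        return f"{base}/$export"
    if level == "patient":
        return f"{base}/Patient/$export"
    if level == "group":
        if not group_id:
            raise ValueError("group-level export needs a Group id")
        # Percent-encode the id so it stays a single path segment.
        return f"{base}/Group/{quote(group_id, safe='')}/$export"
    raise ValueError(f"unknown export level: {level}")
```

For example, `export_url("https://demo.kodjin.com/fhir", "group", "3a457da3-b10e-48f9-b78e-467c396f8092")` yields the Group endpoint that appears in the manifest example later on this page.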
Parameters
Query parameter | Type | Status | Description |
---|---|---|---|
_since | FHIR instant | supported | Resources are included in the response if their state has changed after the supplied time (e.g. if Resource.meta.lastUpdated is later than the supplied _since time). |
_type | string of comma-delimited FHIR resource types | supported | The response is filtered to include only resources of the specified resource type(s). |
_elements | string of comma-delimited FHIR Elements | supported | Unlisted, non-mandatory elements are omitted from the resources returned. Elements should be of the form [resource type].[element name] (e.g. Patient.id) or [element name] (e.g. id), and only root elements in a resource are permitted. If the resource type is omitted, the element is returned for all resources in the response where it is applicable. Mandatory elements are always returned whether they are requested or not. |
patient | FHIR Reference | supported | Applies only to POST requests. Returns resources in the patient compartments belonging to the listed patients. |
_typeFilter | string of comma-delimited values | supported | Comma-separated FHIR REST search queries. When provided, the server filters the response to include only resources that meet the specified criteria. |
_outputFormat | string | supported | Can be set to application/fhir+ndjson, application/ndjson, or ndjson. |
Note that for Patient-level export and Group-level export, patients are always exported regardless of search parameters if they are referenced by resources present in the response.
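A small helper can assemble the simpler query parameters from the table into a query string. This is a sketch under stated assumptions: `export_query` and its argument names are invented here, and leaving commas unescaped is a readability choice rather than a documented server requirement.

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlencode


def export_query(since: Optional[str] = None,
                 types: Iterable[str] = (),
                 elements: Iterable[str] = ()) -> str:
    """Build the query string for _since, _type, and _elements."""
    params = {}
    if since:
        params["_since"] = since
    types = list(types)
    if types:
        params["_type"] = ",".join(types)
    elements = list(elements)
    if elements:
        params["_elements"] = ",".join(elements)
    # Keep the list-separating commas literal; percent-encode everything else.
    return urlencode(params, safe=",", quote_via=quote)


print(export_query(types=["Condition", "Observation"],
                   elements=["Condition.code", "Observation.code"]))
# prints _type=Condition,Observation&_elements=Condition.code,Observation.code
```

Appending the result to an export endpoint after a `?` gives a complete GET Kick-Off URL.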
Headers
There are two required header parameters defined by the current $export specification:
- Accept: application/fhir+json
- Prefer: respond-async
Examples
The example below demonstrates how to configure a Kick-Off request that exports only those patients from a group who had a reaction to an immunization. In this case, _typeFilter contains an Immunization search query with the :missing=false modifier.
Example - _typeFilter parameter use
Try _typeFilter export with our Kodjin FHIR Server Postman Collection
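The sketch below shows one way a _typeFilter value could be encoded for a GET request. The helper name is invented, and the `reaction` search parameter in the usage line is an assumption about which Immunization parameter carries the :missing=false modifier; substitute the parameter your use case needs.

```python
from typing import Iterable
from urllib.parse import quote


def type_filter_param(filters: Iterable[str]) -> str:
    """Encode FHIR search queries into one comma-separated _typeFilter value."""
    # Each query is percent-encoded on its own, so the joining commas stay
    # literal while commas inside a query become %2C.
    return "_typeFilter=" + ",".join(quote(f, safe="") for f in filters)


print(type_filter_param(["Immunization?reaction:missing=false"]))
# prints _typeFilter=Immunization%3Freaction%3Amissing%3Dfalse
```

Encoding each query separately keeps the boundary between filters unambiguous even when a single query contains its own comma-separated values.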
Some requests may contain a lot of filter parameters. In this case, it is convenient to use a POST request and supply filter parameters in the body. The example below demonstrates how to export resources filtered by practitioner ID. The request contains the _typeFilter parameter for each resource type.
Example - use of POST request
curl --location --request POST 'https://demo.kodjin.com/fhir/$export' \
--header 'accept: application/fhir+json' \
--header 'content-type: application/json' \
--header 'prefer: respond-async' \
--data-raw '{"resourceType" : "Parameters",
"parameter" : [
{"name":"_since",
"valueInstant": "2022-01-01T00:00:00Z"},
{"name":"_type",
"valueString": "Observation, Condition, Procedure, Immunization"},
{"name":"_typeFilter",
"valueString": "Observation?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
{"name":"_typeFilter",
"valueString": "Condition?asserter=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
{"name":"_typeFilter",
"valueString": "Procedure?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
{"name":"_typeFilter",
"valueString": "Immunization?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"}
]
}'
Try POST export with our Kodjin FHIR Server Postman Collection
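When the POST body is generated programmatically, the Parameters resource can be built from plain Python values. The helper below is a sketch; its name and arguments are assumptions. It follows the curl example above by emitting one _typeFilter entry per search query.

```python
from typing import Iterable, Optional


def export_parameters(since: Optional[str] = None,
                      types: Iterable[str] = (),
                      type_filters: Iterable[str] = ()) -> dict:
    """Build a FHIR Parameters resource for a POST kick-off request."""
    parameter = []
    if since:
        parameter.append({"name": "_since", "valueInstant": since})
    types = list(types)
    if types:
        parameter.append({"name": "_type", "valueString": ",".join(types)})
    # One _typeFilter entry per search query.
    for query in type_filters:
        parameter.append({"name": "_typeFilter", "valueString": query})
    return {"resourceType": "Parameters", "parameter": parameter}
```

Serializing the result with `json.dumps` gives a request body equivalent in shape to the one passed to `--data-raw`.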
In some cases, you will need only a short set of fields for analysis instead of the entire resource. The example below demonstrates how to export only condition and observation codes using an _elements parameter.
Example - Use of _elements parameter
Try exporting specific elements with our Kodjin FHIR Server Postman Collection
Status Request
When the Data Export process is invoked, it can take time for the server to generate all the files. A client can check the status of the export job by polling the URL from the Content-Location header returned in the Kick-Off response.
The response can be one of:
Status | Description | Example |
---|---|---|
In-progress | Returned by the server while it is processing the $export request. | Status: 202 Accepted |
Error | Returned by the server if the export operation fails. | Status: 500 Internal Server Error |
Complete | Returned by the server when the export operation has completed. | Status: 200 OK |
Example - Status request
Try status request with our Kodjin FHIR Server Postman Collection
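A client might translate the status codes above into job states and decide how long to wait before the next poll. In this sketch, the function names and the five-second default are assumptions. The Bulk Data Access IG lets a server send an optional Retry-After header; this page does not say whether Kodjin sends it, so the code falls back to the default when the header is absent.

```python
from typing import Mapping


def interpret_status(status_code: int) -> str:
    """Map a status-request HTTP code to one of the job states listed above."""
    if status_code == 202:
        return "in-progress"
    if status_code == 200:
        return "complete"
    # Any other code is treated as a failed export in this sketch.
    return "error"


def poll_delay(headers: Mapping[str, str], default_seconds: float = 5.0) -> float:
    """Seconds to wait before the next poll, using Retry-After when it is numeric."""
    value = headers.get("Retry-After", "")
    # Retry-After may also be an HTTP date; this sketch only handles delay-seconds.
    return float(value) if value.strip().isdigit() else default_seconds
```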
Retrieving Data
When the server finishes generating the files, the response contains links to the generated files, which the client can now download. The links are signed URLs. The files contain data in NDJSON format, with each resource type in a separate file.
Exported files are stored in the AWS S3-compatible storage connected to the Kodjin FHIR Server. The period during which exported files remain available for download is defined in the fhir-server-search-export service configuration. The default value is 10 days. See the Kodjin Configuration page for time-to-live (TTL) configuration.
Example of a response
{
"output": [
{
"type": "Observation",
"url": "https://demo.kodjin.com/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Observation.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=fe04d160c9405d322cf511bb2fb47f5c584ae8bdb2fba962db2b6a1c7c19125a"
},
{
"type": "Condition",
"url": "https://demo.kodjin.com/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Condition.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=b0ed0b70defe60154574ca9fb7ea927efeaf8b6cefaf33132c936cce0977eca3"
},
{
"type": "Patient",
"url": "https://demo.kodjin.com/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Patient.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=b3655b6c577f4f7c5860587d3379a6f2a42f2129636091005eb43c62c75acf87"
}
],
"request": "curl -X GET 'https://demo.kodjin.com/fhir/Group/3a457da3-b10e-48f9-b78e-467c396f8092/$export?_type=Observation,Condition&_since=2022-07-13T00:00:00Z&_typeFilter=Observation%3Fcode%3Dhttp://loinc.org|718-7,Condition%3Fcategory%3Dhttp://terminology.hl7.org/CodeSystem/condition-category|encounter-diagnosis' -H 'prefer:respond-async'",
"transactionTime": "2022-08-05T08:52:45.692Z"
}
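A client will typically group the manifest's download links by resource type before fetching them. The sketch below also reads the `X-Amz-Expires` query parameter that appears in the signed URLs above to show how long a link stays valid. Both function names are illustrative.

```python
import json
from collections import defaultdict
from typing import Dict, List
from urllib.parse import parse_qs, urlparse


def urls_by_type(manifest_json: str) -> Dict[str, List[str]]:
    """Group the manifest's output URLs by FHIR resource type."""
    manifest = json.loads(manifest_json)
    grouped = defaultdict(list)
    for entry in manifest.get("output", []):
        grouped[entry["type"]].append(entry["url"])
    return dict(grouped)


def signed_url_lifetime_days(url: str) -> float:
    """Validity window from the X-Amz-Expires query parameter, in days."""
    seconds = int(parse_qs(urlparse(url).query)["X-Amz-Expires"][0])
    return seconds / 86400
```

For the URLs in the response above, `X-Amz-Expires=604800` corresponds to 7 days.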
Example of NDJSON file
{"id":"5c41cecf-cf81-434f-9da7-e24e5a99dbc2","name":[{"given":["Brenda"],"family":["Jackson"]}],"gender":"female","birthDate":"1956-10-14T00:00:00.000Z","resourceType":"Patient"}
{"id":"3fabcb98-0995-447d-a03f-314d202b32f4","name":[{"given":["Bram"],"family":["Sandeep"]}],"gender":"male","birthDate":"1994-11-01T00:00:00.000Z","resourceType":"Patient"}
{"id":"945e5c7f-504b-43bd-9562-a2ef82c244b2","name":[{"given":["Sandy"],"family":["Hamlin"]}],"gender":"female","birthDate":"1988-01-24T00:00:00.000Z","resourceType":"Patient"}
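Because each line of an NDJSON file is a standalone JSON document, a file like the one above can be read one resource at a time. A minimal sketch, with `iter_ndjson` as an illustrative name:

```python
import json
from typing import Iterator


def iter_ndjson(text: str) -> Iterator[dict]:
    """Yield one parsed resource per non-empty line of NDJSON text."""
    # Split on "\n" only: str.splitlines() would also break on characters
    # that may legally appear inside JSON strings.
    for line in text.split("\n"):
        if line.strip():
            yield json.loads(line)
```

For large exports, the same loop can run over a streamed response line by line instead of loading the whole file into memory.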
Delete Request
The Bulk Data Export process can be stopped by sending a DELETE request to the URL returned in the Content-Location header of the Kick-Off response.
Example - Delete Request
DELETE https://demo.kodjin.com/fhir/export/da4438a9-e1d7-4400-9d80-4ed23fbbccc3
Try deleting an export with our Kodjin FHIR Server Postman Collection
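The same request can be prepared with Python's standard library. The function name is illustrative, and the token is assumed to have been issued to the same client ID that started the job, as the implementation notes require.

```python
import urllib.request


def cancel_request(job_url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a DELETE request for an export job."""
    return urllib.request.Request(
        job_url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request.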
Security and multi-tenancy in the export
All data exported during an export operation aligns with the scopes granted and provided in the access token for read interactions.
For example, if the client's scope is limited to patient/Observation.read and the Kodjin FHIR Server contains both Observations and Conditions, only Observations will be exported.
If the multi-tenancy option is enabled in the Kodjin FHIR Server, Kodjin will export data that belongs to the tenant provided in the token claim.
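The scope rule can be illustrated with a small filter that keeps only the resource types a token may read. This is a client-side approximation for reasoning about results, not the server's implementation. It assumes SMART v1 style scopes of the form `context/ResourceType.access` and ignores anything else.

```python
from typing import Iterable, List


def readable_types(scopes: str, resource_types: Iterable[str]) -> List[str]:
    """Resource types that a space-separated SMART v1 scope string allows reading."""
    allowed = set()
    for scope in scopes.split():
        # Expected shape: <context>/<ResourceType or *>.<read | write | *>
        _, _, rest = scope.partition("/")
        resource, _, access = rest.partition(".")
        if resource and access in ("read", "*"):
            allowed.add(resource)
    return [t for t in resource_types if "*" in allowed or t in allowed]


print(readable_types("patient/Observation.read", ["Observation", "Condition"]))
# prints ['Observation']
```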