
Bulk Data Export

Exporting large FHIR datasets from a FHIR server through the regular RESTful API can be tricky: each resource type requires a separate API call and, once pagination is taken into account, an export can take hundreds of requests. Bulk Data Export is an operation designed to solve this problem.

The FHIR Bulk Export Service is intended to fulfill the 21st Century Cures bulk data export requirements. The bulk export operation lets a client configure and invoke a data export in a single API call, whether for all patients, for a subset (defined group) of patients, or for all FHIR data on the server. The export runs asynchronously to reduce the load on the system, and the results remain available for download from media storage for several days.

The FHIR Bulk Export service enables API consumers to export USCDI (United States Core Data for Interoperability) clinical data for all patients in a particular context.

This implementation is based on the Bulk Data Access IG.

170.315 (b)(10) Electronic Health Information (EHI) Export

170.315 (b)(10) specifically addresses the Electronic Health Information (EHI) Export requirement. To comply with this requirement, Kodjin FHIR Server offers full support through its Bulk Data Export feature. Before using Bulk Data Export (BDE) to facilitate EHI Export for B.10, we recommend reviewing the technical documentation provided below for setting up BDE. For comprehensive information on meeting the B.10 regulation, please visit our dedicated 170.315 (b)(10) page.

Process overview

  1. The client invokes the Bulk Data Export process by sending a Kick-Off request.
  2. The server validates the Kick-Off request and responds with a link to a job in the Content-Location header.
  3. The client checks the job status by polling the URL from the Content-Location header. When the export is complete, the response contains links to the generated files.
  4. The client downloads the generated files using the links provided in the job.

A sequence diagram of the bulk export workflow provided by HL7 can be found here

Implementation notes

  • Exported files are hosted in a protected AWS S3 compatible bucket in .ndjson format.
  • The URLs returned in the job completion response manifest are AWS S3 pre-signed URLs. These URLs are valid for 7 days after manifest retrieval.
  • When polling for job status or canceling a job, the FHIR client must have a valid auth token with the same client ID as the one used to initiate the export job.
  • Group-level export targets the Patient Compartment for resources required by USCDI v2. This means the server exports resources that are referenced by a resource within the patient compartment and excludes resources with no data available on the patient record. Additionally, the server provides Encounter, Location, Organization, and Practitioner resources because they are referenced as must-support elements in required resources.
  • Bulk Export requires the client to use JWT assertion-based authentication, as described in the SMART Backend Services protocol details. For the authorization options supported by the Kodjin FHIR Server API, refer to the link.
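
As a rough illustration of the 7-day validity window mentioned in the notes above, the sketch below reads the expiry time out of a download URL. It assumes the URLs carry the standard AWS Signature Version 4 query parameters X-Amz-Date and X-Amz-Expires, as the sample manifest later on this page does. The function name and the shortened URL are hypothetical and not part of the Kodjin API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time in ISO 8601 basic format, e.g. 20220805T085248Z.
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity window in seconds (604800 s = 7 days).
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Hypothetical, shortened URL that keeps only the two parameters used here.
url = (
    "https://kodjin-example.edenlab.dev/io/kodjin-export/example/Patient.ndjson"
    "?X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800"
)
expiry = presigned_url_expiry(url)
print(expiry.isoformat())  # → 2022-08-12T08:52:48+00:00
```

A client could compare this value with the current time before starting a long download and request the manifest again if the links are close to expiring.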

Kick-Off request

The Bulk Data Export process can be invoked via GET request for smaller sets of query params, or via POST request, supplying parameters in the FHIR Parameters Resource, for larger ones.

Levels of export

There are three endpoints available to tailor the export to a particular use case:

| Name | Description | URL Syntax |
| --- | --- | --- |
| System Level Export | Export data from a FHIR server, whether or not it is associated with a patient. | [base url]/$export |
| All Patients | FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all patients. | [base url]/Patient/$export |
| Group of Patients | FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all members of a specified Group. | [base url]/Group/[id]/$export |
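
The three URL patterns above can be captured in a small helper. This is a minimal sketch only: the function name and the level keywords ("system", "patient", "group") are invented for illustration and are not part of the Kodjin API.

```python
from typing import Optional


def export_url(base_url: str, level: str, group_id: Optional[str] = None) -> str:
    """Build the $export endpoint URL for one of the three export levels."""
    base = base_url.rstrip("/")  # avoid a double slash before the path
    if level == "system":
        return f"{base}/$export"
    if level == "patient":
        return f"{base}/Patient/$export"
    if level == "group":
        if not group_id:
            raise ValueError("group-level export needs a Group id")
        return f"{base}/Group/{group_id}/$export"
    raise ValueError(f"unknown export level: {level}")


print(export_url("https://kodjin-example.edenlab.dev/fhir", "system"))
print(export_url("https://kodjin-example.edenlab.dev/fhir", "patient"))
print(export_url(
    "https://kodjin-example.edenlab.dev/fhir",
    "group",
    "0a60d2a2-38ce-49f6-ac45-42347193af50",
))
```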

Parameters

| Query parameter | Type | Status | Description |
| --- | --- | --- | --- |
| _since | FHIR instant | supported | Resources are included in the response if their state has changed after the supplied time (e.g. if Resource.meta.lastUpdated is later than the supplied _since time). |
| _type | string of comma-delimited FHIR resource types | supported | The response is filtered to include only resources of the specified resource type(s). |
| _elements | string of comma-delimited FHIR elements | supported | Unlisted, non-mandatory elements are omitted from the resources returned. Elements should be of the form [resource type].[element name] (e.g. Patient.id) or [element name] (e.g. id), and only root elements in a resource are permitted. If the resource type is omitted, the element is returned for all resources in the response where it is applicable. Mandatory elements are always returned whether they are requested or not. |
| patient | FHIR Reference | supported | Applies only to POST requests. Returns resources in the patient compartments belonging to the listed patients. |
| _typeFilter | string of comma-delimited values | supported | String of comma-separated FHIR REST search queries. When provided, the server filters the data in the response to include only resources that meet the specified criteria. |
| _outputFormat | string | supported | Can be set to application/fhir+ndjson, application/ndjson, or ndjson. |

Note that for Patient-level and Group-level export, patients are always exported regardless of search parameters if they are referenced by resources present in the response.

Headers

There are two required header parameters defined by the current $export specification:

  • Accept - application/fhir+json
  • Prefer - respond-async
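
For clients written in code rather than curl, the two required headers can be attached as in the sketch below, which builds a request object with Python's standard library without sending it. The helper name is hypothetical, and passing the access token as a Bearer Authorization header is an assumption; check the authorization documentation for the scheme your deployment expects.

```python
from urllib.request import Request

# The two headers required by the $export specification.
KICKOFF_HEADERS = {
    "Accept": "application/fhir+json",
    "Prefer": "respond-async",
}


def build_kickoff_request(url: str, access_token: str) -> Request:
    """Prepare (but do not send) a GET kick-off request with the required headers."""
    request = Request(url, method="GET", headers=KICKOFF_HEADERS)
    # Assumption: the access token is sent as a Bearer token.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


request = build_kickoff_request(
    "https://kodjin-example.edenlab.dev/fhir/$export", "example-token"
)
print(request.get_method(), request.full_url)
print(request.header_items())
```

Sending the request, for example with urllib.request.urlopen, is left out so the sketch stays free of network access.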

Examples

The example below demonstrates how to configure a kick-off request that exports, for the patients in a group, only the immunizations that have a recorded reaction. In this case, _typeFilter contains an Immunization search query with the :missing=false modifier.

Example - _typeFilter parameter use

curl --location --request GET 'https://kodjin-example.edenlab.dev/fhir/Group/0a60d2a2-38ce-49f6-ac45-42347193af50/$export?_type=Immunization&_typeFilter=Immunization%3Freaction-date:missing%3Dfalse' \
--header 'accept: application/fhir+json' \
--header 'prefer: respond-async'
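
The _typeFilter value in the URL above is a FHIR search query that has been percent-encoded so that its own "?" and "=" are not confused with the delimiters of the outer URL. A minimal sketch of that encoding step, using Python's standard library (the helper name is hypothetical):

```python
from urllib.parse import quote


def encode_type_filter(search_query: str) -> str:
    """Percent-encode one FHIR search query for use as a _typeFilter value."""
    # "?" and "=" are escaped; ":" is kept readable, as in the example above.
    return quote(search_query, safe=":")


encoded = encode_type_filter("Immunization?reaction-date:missing=false")
print(f"_type=Immunization&_typeFilter={encoded}")
# → _type=Immunization&_typeFilter=Immunization%3Freaction-date:missing%3Dfalse
```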

Some requests may contain a lot of filter parameters. In this case, it is convenient to use a POST request and supply filter parameters in the body. The example below demonstrates how to export resources filtered by practitioner ID. The request contains the _typeFilter parameter for each resource type.

Example - use of POST request

curl --location --request POST 'https://kodjin-example.edenlab.dev/fhir/$export' \
--header 'accept: application/fhir+json' \
--header 'content-type: application/json' \
--header 'prefer: respond-async' \
--data-raw '{"resourceType" : "Parameters",
"parameter" : [
    {"name":"_since",
    "valueInstant": "2022-01-01T00:00:00Z"},
    {"name":"_type",
    "valueString": "Observation,Condition,Procedure,Immunization"},
    {"name":"_typeFilter",
    "valueString": "Observation?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
    {"name":"_typeFilter",
    "valueString": "Condition?asserter=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
    {"name":"_typeFilter",
    "valueString": "Procedure?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"},
    {"name":"_typeFilter",
    "valueString": "Immunization?performer=Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"}
]
}'
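
When the parameter list is assembled programmatically, the same body can be generated instead of written by hand. The sketch below mirrors the structure of the request above: one _since entry, one comma-delimited _type entry, and one _typeFilter entry per search query. The function and argument names are hypothetical.

```python
import json
from typing import Iterable, Optional


def build_export_parameters(
    since: Optional[str] = None,
    types: Iterable[str] = (),
    type_filters: Iterable[str] = (),
) -> dict:
    """Build a FHIR Parameters resource for a POST kick-off request."""
    parameter = []
    if since:
        parameter.append({"name": "_since", "valueInstant": since})
    types = list(types)
    if types:
        # Resource types are sent as a single comma-delimited string.
        parameter.append({"name": "_type", "valueString": ",".join(types)})
    # Each search query becomes its own _typeFilter entry.
    for query in type_filters:
        parameter.append({"name": "_typeFilter", "valueString": query})
    return {"resourceType": "Parameters", "parameter": parameter}


practitioner = "Practitioner/9bac339d-ac3b-4715-bf9a-1dab1dec7fa2"
body = build_export_parameters(
    since="2022-01-01T00:00:00Z",
    types=["Observation", "Condition"],
    type_filters=[
        f"Observation?performer={practitioner}",
        f"Condition?asserter={practitioner}",
    ],
)
print(json.dumps(body, indent=2))
```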

In some cases, you will need only a short set of fields for analysis instead of the entire resource. The example below demonstrates how to export only condition and observation codes using an _elements parameter.

Example - Use of _elements parameter

curl --location --request GET 'https://kodjin-example.edenlab.dev/fhir/$export?_type=Observation,Condition&_since=2022-07-13T00:00:00Z&_elements=code' \
--header 'accept: application/fhir+json' \
--header 'prefer: respond-async'

Status Request

Once the Data Export process is invoked, the server may take some time to generate all the files. A client can check the status of the export job by polling the URL returned in the Content-Location header of the Kick-Off response.

Response can be one of:

In-progress: returned by the server while it is processing the $export request.

Status: 202 Accepted

Error: returned by the server if the export operation fails.

Status: 500 Internal Server Error
Content-Type: application/json

{
    "resourceType": "OperationOutcome",
    "id": "1",
    "issue": [
        {
            "severity": "error",
            "code": "processing",
            "details": {
                "text": "An internal timeout has occurred"
            }
        }
    ]
}

Complete: returned by the server when the export operation has completed.

Status: 200 OK
Expires: Mon, 22 Jul 2022 23:59:59 GMT
Content-Type: application/json

{
    "transactionTime": "2022-08-21T00:00:00Z",
    "request": "https://example.com/fhir/Patient/$export?_type=Patient,Observation",
    "output": [
        {
            "type": "Patient",
            "url": "https://example.com/output/patient_file_1.ndjson"
        },
        {
            "type": "Patient",
            "url": "https://example.com/output/patient_file_2.ndjson"
        },
        {
            "type": "Observation",
            "url": "https://example.com/output/observation_file_1.ndjson"
        }
    ],
    "deleted": [
        {
            "type": "Bundle",
            "url": "https://example.com/output/del_file_1.ndjson"
        }
    ],
    "error": [
        {
            "type": "OperationOutcome",
            "url": "https://example.com/output/err_file_1.ndjson"
        }
    ]
}

Example - Status request

GET https://kodjin-example.edenlab.dev/fhir/export/5df2f390-2285-4f0d-8917-f1064fb6479a
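
The polling logic implied by the three statuses can be sketched as a small loop: keep waiting on 202, return the manifest on 200, and treat anything else as a failure. To keep the sketch runnable without a server, the HTTP call is passed in as a function. The names, the fixed polling interval, and the decision to treat every other status code as an error are assumptions, not documented Kodjin behavior.

```python
import time
from typing import Callable, Optional, Tuple

# A fetcher takes the status URL and returns (HTTP status code, parsed JSON body or None).
StatusFetcher = Callable[[str], Tuple[int, Optional[dict]]]


class ExportFailed(Exception):
    """Raised when the server reports that the export job failed."""


def wait_for_manifest(
    status_url: str,
    fetch: StatusFetcher,
    interval_seconds: float = 5.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll the status URL until the export completes and return the manifest."""
    for _ in range(max_polls):
        status, body = fetch(status_url)
        if status == 202:  # in progress: wait and ask again
            sleep(interval_seconds)
            continue
        if status == 200:  # complete: the body is the manifest
            return body
        # Assumption: any other status (such as 500) means the job failed.
        raise ExportFailed(f"export failed with HTTP {status}: {body}")
    raise TimeoutError(f"export still in progress after {max_polls} polls")


# Stand-in fetcher for demonstration: two "in progress" replies, then a manifest.
replies = iter([(202, None), (202, None), (200, {"output": []})])
manifest = wait_for_manifest(
    "https://kodjin-example.edenlab.dev/fhir/export/5df2f390-2285-4f0d-8917-f1064fb6479a",
    fetch=lambda url: next(replies),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(manifest)  # → {'output': []}
```

In a real client, fetch would issue an authenticated GET to the status URL with the same client credentials that started the job.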

Retrieving Data

When the server finishes generating the files, the response contains links to the generated files, which the client can now download. The links are signed URLs. The files contain data in NDJSON format, with each resource type in a separate file.

Example of a response

{
    "output": [
        {
            "type": "Observation",
            "url": "https://kodjin-example.edenlab.dev/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Observation.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=fe04d160c9405d322cf511bb2fb47f5c584ae8bdb2fba962db2b6a1c7c19125a"
        },
        {
            "type": "Condition",
            "url": "https://kodjin-example.edenlab.dev/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Condition.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=b0ed0b70defe60154574ca9fb7ea927efeaf8b6cefaf33132c936cce0977eca3"
        },
        {
            "type": "Patient",
            "url": "https://kodjin-example.edenlab.dev/io/kodjin-export/5df2f390-2285-4f0d-8917-f1064fb6479a/Patient.ndjson?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20220805%2F%2Fs3%2Faws4_request&X-Amz-Date=20220805T085248Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=b3655b6c577f4f7c5860587d3379a6f2a42f2129636091005eb43c62c75acf87"
        }
    ],
    "request": "curl -X GET 'https://kodjin-example.edenlab.dev/fhir/Group/3a457da3-b10e-48f9-b78e-467c396f8092/$export?_type=Observation,Condition&_since=2022-07-13T00:00:00Z&_typeFilter=Observation%3Fcode%3Dhttp://loinc.org|718-7,Condition%3Fcategory%3Dhttp://terminology.hl7.org/CodeSystem/condition-category|encounter-diagnosis' -H 'prefer:respond-async'",
    "transactionTime": "2022-08-05T08:52:45.692Z"
}
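
Before downloading, a client will typically collect the file links from the "output" array of such a response. The sketch below groups the URLs by resource type, since one type may be split across several files. The helper name is hypothetical, and the sample manifest reuses the example.com URLs from the status table rather than real signed links.

```python
from collections import defaultdict
from typing import Dict, List


def files_by_type(manifest: dict) -> Dict[str, List[str]]:
    """Group the download URLs in a completed-export manifest by resource type."""
    grouped = defaultdict(list)
    for entry in manifest.get("output", []):
        grouped[entry["type"]].append(entry["url"])
    return dict(grouped)


sample_manifest = {
    "transactionTime": "2022-08-21T00:00:00Z",
    "output": [
        {"type": "Patient", "url": "https://example.com/output/patient_file_1.ndjson"},
        {"type": "Patient", "url": "https://example.com/output/patient_file_2.ndjson"},
        {"type": "Observation", "url": "https://example.com/output/observation_file_1.ndjson"},
    ],
}

for resource_type, urls in files_by_type(sample_manifest).items():
    print(resource_type, len(urls))
```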

Example of NDJSON file

{"id":"5c41cecf-cf81-434f-9da7-e24e5a99dbc2","name":[{"given":["Brenda"],"family":["Jackson"]}],"gender":"female","birthDate":"1956-10-14T00:00:00.000Z","resourceType":"Patient"}
{"id":"3fabcb98-0995-447d-a03f-314d202b32f4","name":[{"given":["Bram"],"family":["Sandeep"]}],"gender":"male","birthDate":"1994-11-01T00:00:00.000Z","resourceType":"Patient"}
{"id":"945e5c7f-504b-43bd-9562-a2ef82c244b2","name":[{"given":["Sandy"],"family":["Hamlin"]}],"gender":"female","birthDate":"1988-01-24T00:00:00.000Z","resourceType":"Patient"}
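
Because NDJSON holds one complete JSON document per line, a downloaded file can be processed line by line without loading it into memory at once. A minimal sketch of such a reader follows; the function name is hypothetical and the two sample lines are shortened versions of the records above.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed resource per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


sample = """\
{"id":"5c41cecf-cf81-434f-9da7-e24e5a99dbc2","gender":"female","resourceType":"Patient"}
{"id":"3fabcb98-0995-447d-a03f-314d202b32f4","gender":"male","resourceType":"Patient"}
"""

resources = list(iter_ndjson(sample.splitlines()))
for resource in resources:
    print(resource["resourceType"], resource["id"])
```

The same function accepts an open file object, because iterating over a text file yields its lines.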

Delete Request

The Bulk Data Export process can be stopped by sending a DELETE request to the URL returned in the Content-Location header of the Kick-Off response.

Example - Delete Request

DELETE https://kodjin-example.edenlab.dev/fhir/export/da4438a9-e1d7-4400-9d80-4ed23fbbccc3