Working with genetic files

In order to create Genotype object we need to store genetic file in Platform Blob Storage (PBS). Every file stored in PBS is represented as an object with unique name and object_id.

Import genetic file from 23andMe

Genetic files downloaded from 23andMe are text based genetic files which contain lists of SNP (Single Nucleotide Polymorphism) variations organized into columns (rsid, chromosome, position and genotype).

Other providers (like AncestryDNA, MyHeritage or FamilyTreeDNA) generates similar text based, tabular data files and can be used in the same way as described in the rest of this guide.

There are two options how you can create Genotype object for a given genetic file:

upload given genetic file using Files API
upload given genetic file to GCS bucket (only available for tenants configured to use private GCS bucket as PBS)

For the rest of this guide, let’s assume that genotype data is locally stored in a text file called genotype_user1040.txt. And let’s assume that we want to upload local file to genotypes/genotype_user1040.txt PBS object:

Env vars:

LOCAL_PATH=genotype_user1040.txt
DEST_PATH=genotypes/genotype_user1040.txt

Upload using Files API

File upload is a two-step process:

Fetch temporarily signed upload URL for an object in PBS
Upload local genotype file to temporarily signed URL

Fetch temporarily signed upload URL


curl --request GET \
--url https://myportal.lifenome.com/api/core-api/files/upload/${DEST_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}" \

Successful response will return generated upload URL (UPLOAD_URL) in the upload_url field, which can then be used in the second step, where genotype file is actually uploaded:


curl --request PUT \
  --url ${UPLOAD_URL} \
  --data-binary @${LOCAL_PATH}

Upload URL is a temporarily signed URL, which gives time-limited access to a specific cloud storage resource.

At this stage our genotype file is stored as PBS object and information about the object can be fetched by invoking Get File Details operation.


curl --request GET \
  --url https://myportal.lifenome.com/api/core-api/files/file/${DEST_PATH}/ \
  --header "Authorization: Bearer ${ACCESS_TOKEN}"

Response:


{
  "name": "string",
  "object_uri": "string",
  "content_type": "string",
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "size": 0,
  "md5_hash": "string",
  "crc32c": "string",
  "etag": "string",
  "metadata": null
}

The object_uri property is globaly unique resource identifier that is used to link PBS objects to other platform resources (e.g. to specify source_uri property of Genotype object)

If we want to create Genotype object for SAMPLE_ID and PBS object we need to invoke Create Genotype operation.


curl --request PUT \
  --url https://myportal.lifenome.com/api/core-api/samples/${SAMPLE_ID}/genotype/ \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --data "{\"source_uri\": \"${OBJECT_URI}\"}"

Direct upload to GCS bucket

If your tenant is configured to use private bucket as PBS then you have complete control over underlying GCS bucket and can upload files as you like (GCS web console, gsutils CLI…).

Once the genotype files are uploaded to PBS they can be used in the same way as files uploaded via Files API. You need to read object_uri of genotype file and create genotype object by invoking Create Genotype operation.

Import VCF genetic files

In order to process VCF files in the most efficient way we require that every uploaded VCF file is accompanied with corresponding tabix (TBI) file. TBI file must be named same as VCF file with additional suffix .tbi.

For the rest of this guide, let’s assume that genotype data is locally stored in a VCF file called genotype_user1041.vcf.gz and that . And let’s assume that we want to upload local file to genotypes/genotype_user1041.vcf.gz PBS object:

Env vars:

LOCAL_VCF_PATH=genotype_user1041.vcf.gz
DEST_VCF_PATH=genotypes/genotype_user1041.vcf.gz
LOCAL_TBI_PATH=genotype_user1041.vcf.gz.tbi
DEST_TBI_PATH=genotypes/genotype_user1041.vcf.gz.tbi

Note: DEST_TBI_PATH is equal to DEST_VCF_PATH with additional .tbi suffix!

Upload using Files API

The process of importing VCF file is the same as the process of importing file from 23andMe except that you need to upload two files (vcf and tbi).

Upload VCF file.


curl --request GET \
  --url https://myportal.lifenome.com/api/core-api/files/upload/${DEST_VCF_PATH}/ \
  --header "Authorization: Bearer ${ACCESS_TOKEN}"


curl --request PUT \
  --url ${UPLOAD_URL} \
  --data-binary @${LOCAL_VCF_PATH}

Upload TBI file.


curl --request GET \
  --url https://myportal.lifenome.com/api/core-api/files/upload/${DEST_TBI_PATH}/ \
  --header "Authorization: Bearer ${ACCESS_TOKEN}"


curl --request PUT \
  --url ${UPLOAD_URL} \
  --data-binary @${LOCAL_TBI_PATH}

Get object_uri for uploaded VCF file:


curl --request GET \
  --url https://myportal.lifenome.com/api/core-api/files/${DEST_VCF_PATH}/ \
  --header "Authorization: Bearer ${ACCESS_TOKEN}"

The response will contain object_uri property which is used to specify source_uri for newly created Genotype:


curl --request PUT \
  --url https://myportal.lifenome.com/api/core-api/samples/${SAMPLE_ID}/genotype/ \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --data "{\"source_uri\": \"${OBJECT_URI}\"}"

Direct upload to GCS bucket

If your tenant is configured to use private bucket as PBS then you have complete control over underlying GCS bucket and can upload files as you like (GCS web console, gsutils CLI…).

Once the VCF file and TABIX files are uploaded to PBS they can be used in the same way as files uploaded via Files API. You need to read object_uri of genotype file and create genotype object by invoking Create Genotype operation.