To create a Genotype object we first need to store the genetic file in Platform Blob Storage (PBS). Every file stored in PBS is represented as an object with a unique name and object_id.
Import genetic file from 23andMe
Genetic files downloaded from 23andMe are text-based files which contain lists of SNP (Single Nucleotide Polymorphism) variations organized into columns (rsid, chromosome, position and genotype).
Other providers (such as AncestryDNA, MyHeritage or FamilyTreeDNA) generate similar text-based, tabular data files, which can be used in the same way as described in the rest of this guide.
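To make the four-column layout concrete, here is a minimal parsing sketch. The sample lines and the assumption that columns are tab-separated with `#` comment lines are illustrative of typical 23andMe exports; real provider files vary slightly, so treat this as a sketch rather than a validated parser.

```python
# Sketch: parse a 23andMe-style raw genotype file into records.
# Assumes tab-separated columns (rsid, chromosome, position, genotype)
# and '#'-prefixed comment lines, as seen in typical 23andMe exports.

def parse_raw_genotype(text):
    """Yield (rsid, chromosome, position, genotype) tuples."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chromosome, position, genotype = line.split("\t")
        yield rsid, chromosome, int(position), genotype

# Hypothetical sample data for illustration only.
sample = (
    "# This data file generated by 23andMe\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
)
records = list(parse_raw_genotype(sample))
print(records)
```

Comment lines at the top of the file carry no SNP data, so they are skipped before splitting into columns.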
There are two ways to create a Genotype object for a given genetic file:
- upload the genetic file using the Files API
- upload the genetic file directly to the GCS bucket (only available for tenants configured to use a private GCS bucket as PBS)
For the rest of this guide, let's assume that the genotype data is stored locally in a text file called genotype_user1040.txt and that we want to upload it to the genotypes/genotype_user1040.txt PBS object:
Env vars:
LOCAL_PATH=genotype_user1040.txt
DEST_PATH=genotypes/genotype_user1040.txt
Upload using Files API
File upload is a two-step process:
- Fetch a temporarily signed upload URL for an object in PBS
- Upload the local genotype file to the signed URL
Fetch a temporarily signed upload URL:
curl --request GET \
--url https://se1.lifenome.com/platform-api/api/core-api/files/upload/${DEST_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}"
A successful response returns the generated upload URL (UPLOAD_URL) in the upload_url field, which is then used in the second step, where the genotype file is actually uploaded:
curl --request PUT \
--url ${UPLOAD_URL} \
--data-binary @${LOCAL_PATH}
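The two curl calls above can be sketched as a single Python function (stdlib only). The host and endpoint path are the ones used in this guide; the function names are mine, and the functions are only defined here, not executed, since they need a live tenant and token:

```python
# Sketch of the two-step upload: fetch a signed URL, then PUT the file body.
# Host and endpoint follow this guide; adjust API_BASE for your tenant.
import json
import urllib.request

API_BASE = "https://se1.lifenome.com/platform-api/api/core-api"

def upload_endpoint(dest_path):
    """Build the signed-URL endpoint for a PBS destination path."""
    return f"{API_BASE}/files/upload/{dest_path}/"

def upload_file(local_path, dest_path, access_token):
    # Step 1: fetch a temporarily signed upload URL.
    req = urllib.request.Request(
        upload_endpoint(dest_path),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        upload_url = json.load(resp)["upload_url"]
    # Step 2: PUT the raw file body to the signed URL.
    with open(local_path, "rb") as f:
        put = urllib.request.Request(upload_url, data=f.read(), method="PUT")
        urllib.request.urlopen(put)
```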
Upload URL is a temporarily signed URL, which gives time-limited access to a specific cloud storage resource.
At this stage our genotype file is stored as a PBS object, and information about the object can be fetched by invoking the Get File Details operation.
curl --request GET \
--url https://se1.lifenome.com/platform-api/api/core-api/files/${DEST_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}"
Response:
{
"name": "string",
"object_uri": "string",
"content_type": "string",
"created_at": "2019-08-24T14:15:22Z",
"updated_at": "2019-08-24T14:15:22Z",
"size": 0,
"md5_hash": "string",
"crc32c": "string",
"etag": "string",
"metadata": null
}
The object_uri property is a globally unique resource identifier that is used to link PBS objects to other platform resources (e.g. to specify the source_uri property of a Genotype object).
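The response also lets you sanity-check the upload by comparing the reported md5_hash against a locally computed one. One assumption here: GCS-backed object metadata reports md5_hash as the base64-encoded raw MD5 digest, not the hex form, so the local hash has to be encoded the same way:

```python
# Sketch: compute an MD5 hash in the base64 encoding that GCS-backed
# object metadata uses, for comparison with the md5_hash response field.
import base64
import hashlib

def md5_b64(data: bytes) -> str:
    """Base64-encoded raw MD5 digest (GCS-style), not the hex digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode()

print(md5_b64(b"hello"))  # → XUFAKrxLKna5cZ2REBfFkg==
```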
To create a Genotype object for SAMPLE_ID and the PBS object, invoke the Create Genotype operation.
curl --request PUT \
--url https://se1.lifenome.com/platform-api/api/core-api/samples/${SAMPLE_ID}/genotype/ \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${ACCESS_TOKEN}" \
--data "{\"source_uri\": \"${OBJECT_URI}\"}"
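The same call can be sketched in Python (stdlib only). The endpoint and the source_uri field follow this guide; the helper names are mine, and create_genotype is only defined, not run, since it needs a live tenant:

```python
# Sketch: link an uploaded PBS object to a sample by creating a Genotype.
# Endpoint path and payload shape follow this guide; adjust API_BASE.
import json
import urllib.request

API_BASE = "https://se1.lifenome.com/platform-api/api/core-api"

def genotype_payload(object_uri):
    """JSON body for Create Genotype: the PBS object becomes source_uri."""
    return json.dumps({"source_uri": object_uri})

def create_genotype(sample_id, object_uri, access_token):
    req = urllib.request.Request(
        f"{API_BASE}/samples/{sample_id}/genotype/",
        data=genotype_payload(object_uri).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```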
Direct upload to GCS bucket
If your tenant is configured to use a private bucket as PBS, you have complete control over the underlying GCS bucket and can upload files however you like (GCS web console, gsutil CLI, etc.).
Once the genotype files are uploaded to PBS they can be used in the same way as files uploaded via the Files API: read the object_uri of the genotype file and create the Genotype object by invoking the Create Genotype operation.
Import VCF genetic files
To process VCF files in the most efficient way, we require that every uploaded VCF file is accompanied by a corresponding tabix (TBI) file. The TBI file must have the same name as the VCF file, with an additional .tbi suffix.
For the rest of this guide, let's assume that the genotype data is stored locally in a VCF file called genotype_user1041.vcf.gz and that we want to upload it to the genotypes/genotype_user1041.vcf.gz PBS object:
Env vars:
LOCAL_VCF_PATH=genotype_user1041.vcf.gz
DEST_VCF_PATH=genotypes/genotype_user1041.vcf.gz
LOCAL_TBI_PATH=genotype_user1041.vcf.gz.tbi
DEST_TBI_PATH=genotypes/genotype_user1041.vcf.gz.tbi
Note: DEST_TBI_PATH is equal to DEST_VCF_PATH with an additional .tbi suffix!
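This naming rule is simple enough to encode as a helper, which avoids typos when deriving the index path (a sketch; the function name is mine):

```python
# Sketch: derive the PBS destination path of the TBI index from the VCF path.
# The platform requires the TBI path to be the VCF path plus a ".tbi" suffix.

def tbi_dest_path(vcf_dest_path: str) -> str:
    return vcf_dest_path + ".tbi"

print(tbi_dest_path("genotypes/genotype_user1041.vcf.gz"))
# → genotypes/genotype_user1041.vcf.gz.tbi
```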
Upload using Files API
The process of importing a VCF file is the same as importing a file from 23andMe, except that you need to upload two files (the VCF and the TBI).
Upload the VCF file:
curl --request GET \
--url https://se1.lifenome.com/platform-api/api/core-api/files/upload/${DEST_VCF_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}"
curl --request PUT \
--url ${UPLOAD_URL} \
--data-binary @${LOCAL_VCF_PATH}
Upload the TBI file:
curl --request GET \
--url https://se1.lifenome.com/platform-api/api/core-api/files/upload/${DEST_TBI_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}"
curl --request PUT \
--url ${UPLOAD_URL} \
--data-binary @${LOCAL_TBI_PATH}
Get the object_uri of the uploaded VCF file:
curl --request GET \
--url https://se1.lifenome.com/platform-api/api/core-api/files/${DEST_VCF_PATH}/ \
--header "Authorization: Bearer ${ACCESS_TOKEN}"
The response will contain the object_uri property, which is used to specify the source_uri for the newly created Genotype:
curl --request PUT \
--url https://se1.lifenome.com/platform-api/api/core-api/samples/${SAMPLE_ID}/genotype/ \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${ACCESS_TOKEN}" \
--data "{\"source_uri\": \"${OBJECT_URI}\"}"
Direct upload to GCS bucket
If your tenant is configured to use a private bucket as PBS, you have complete control over the underlying GCS bucket and can upload files however you like (GCS web console, gsutil CLI, etc.).
Once the VCF and TBI files are uploaded to PBS they can be used in the same way as files uploaded via the Files API: read the object_uri of the VCF file and create the Genotype object by invoking the Create Genotype operation.