Introducing Dynamic Schemas

JSON Schema

Song uses JSON Schema to describe the structure of the metadata stored for each analysis. Data is submitted to Song as JSON and validated against the data model schema. The schema checks that required fields are present and that each field's contents match the expected data type and allowed values, which preserves the integrity and quality of the metadata in Song.
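To make the idea concrete, here is a minimal sketch of this kind of check in Python, using only the standard library. It is not Song's implementation, which validates against full JSON Schema. The `check_fields` helper and the small `RULES` table are hypothetical and cover only required fields, data types, and allowed values.

```python
import json

# Hypothetical, simplified rules: field name -> (expected type, allowed values or None).
RULES = {
    "studyId": (str, None),
    "sampleType": (str, {"Total DNA", "Amplified DNA", "Total RNA"}),
}


def check_fields(record: dict, rules: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field, (expected_type, allowed) in rules.items():
        if field not in record:
            errors.append(f"{field}: required field is missing")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif allowed is not None and value not in allowed:
            errors.append(f"{field}: value {value!r} is not allowed")
    return errors


# Data arrives as JSON text, so parse it before checking.
submitted = json.loads('{"studyId": "EXAMPLE", "sampleType": "Amplified RNA"}')
result = check_fields(submitted, RULES)
print(result)  # one error: "Amplified RNA" is not among the allowed values
```

A real JSON Schema validator applies the same three kinds of checks, plus patterns, nesting, and conditional rules, directly from the schema document.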

Analysis Schemas

In Song, metadata is captured and submitted as analyses. An analysis represents a collection of one or more files and includes a complete metadata record describing those files.

When you submit an analysis to Song, you choose an analysis type, which determines the data model used for validation. You set it in your analysis file using the analysisType field.

The schema associated with the analysis type consists of two parts:

  1. A minimal base schema containing the essential fields for all analyses, including basic patient data, submitter IDs, and file details.

  2. A flexible dynamic schema that the Song administrator can configure and upload to define specific analysis types.

Together, these two parts determine how Song validates the metadata in each submitted analysis.
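As a rough illustration of how the two parts fit together, the sketch below checks a payload against a set of base fields plus the fields of whichever dynamic schema its analysis type names. Everything here is hypothetical: the `BASE_REQUIRED` list, the `DYNAMIC_SCHEMAS` registry, and the `experiment` field are stand-ins, and this simple presence check is far less thorough than JSON Schema validation.

```python
# Assumed base fields shared by every analysis (illustrative, not Song's exact list).
BASE_REQUIRED = ["studyId", "analysisType", "samples", "files"]

# Hypothetical registry of administrator-defined dynamic schemas, keyed by
# analysis type name. Each entry lists the extra fields that type requires.
DYNAMIC_SCHEMAS = {
    "sequencing_experiment": ["experiment"],
}


def missing_fields(payload: dict) -> list:
    """Return the required fields absent from the payload, base fields first."""
    type_name = payload.get("analysisType", {}).get("name")
    if type_name not in DYNAMIC_SCHEMAS:
        raise ValueError(f"unknown analysis type: {type_name!r}")
    required = BASE_REQUIRED + DYNAMIC_SCHEMAS[type_name]
    return [field for field in required if field not in payload]


payload = {
    "studyId": "EXAMPLE",
    "analysisType": {"name": "sequencing_experiment"},
    "samples": [{"submitterSampleId": "example-sample-id"}],
    "files": [{"fileName": "example.bam"}],
}
print(missing_fields(payload))  # only the dynamic part's field is absent
```

The point of the split is that the base fields stay fixed for every analysis, while the per-type entries can change as the administrator registers new analysis types.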

The Song Base Schema

The base schema defines the minimal set of fields that every analysis must include. It covers basic, non-identifiable primary keys for patient data, including:

  • Donor ID, Specimen ID, and Sample ID
  • Basic cancer sample descriptors

The base schema fields are shown in the example below:

{
  "studyId": "EXAMPLE",
  "analysisType": {
    "name": "sequencing_experiment"
  },
  "samples": [
    {
      "submitterSampleId": "example-sample-id",
      "matchedNormalSubmitterSampleId": null,
      "sampleType": "Amplified DNA",
      "specimen": {
        "submitterSpecimenId": "example-specimen-id",
        "specimenType": "Normal",
        "tumourNormalDesignation": "Normal",
        "specimenTissueSource": "Blood derived"
      },
      "donor": {
        "submitterDonorId": "example-donor-id",
        "gender": "Male"
      }
    }
  ]
}

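In the example above, the specimen's tumourNormalDesignation is "Normal" and matchedNormalSubmitterSampleId is null. The base schema links these two fields with an if/then/else rule: a "Tumour" specimen needs a matched normal submitter ID, and any other designation needs a null. A minimal Python sketch of that rule follows. The function name is hypothetical, a missing matchedNormalSubmitterSampleId key is treated like a null (a simplifying assumption), and the specimen and its designation are assumed to be present.

```python
import re

# Submitter ID pattern taken from the base schema's definitions.
SUBMITTER_ID = re.compile(r"^[A-Za-z0-9\-\._]{1,64}$")


def matched_normal_is_valid(sample: dict) -> bool:
    """Apply the tumour/normal rule to one entry of the "samples" array."""
    # Assumes "specimen" and "tumourNormalDesignation" exist (KeyError otherwise).
    designation = sample["specimen"]["tumourNormalDesignation"]
    matched = sample.get("matchedNormalSubmitterSampleId")  # missing -> None (assumption)
    if designation == "Tumour":
        # "then" branch: the value must be a submitter ID string.
        return isinstance(matched, str) and SUBMITTER_ID.search(matched) is not None
    # "else" branch: the value must be null.
    return matched is None


normal_sample = {
    "submitterSampleId": "example-sample-id",
    "matchedNormalSubmitterSampleId": None,
    "specimen": {"tumourNormalDesignation": "Normal"},
}
tumour_sample = {
    "submitterSampleId": "example-tumour-sample-id",
    "matchedNormalSubmitterSampleId": "example-sample-id",
    "specimen": {"tumourNormalDesignation": "Tumour"},
}
print(matched_normal_is_valid(normal_sample), matched_normal_is_valid(tumour_sample))
```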
The base schema and the allowed values for all fields are defined by the Song base meta-schema, which is reproduced below.
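Several of those constraints are regular expressions. The sketch below copies three pattern values from the schema on this page, with JSON's doubled backslashes unescaped, and applies them with Python's `re` module. The helper name `matches` and the sample inputs are made up for illustration.

```python
import re

# Patterns copied from the schema's "definitions" (JSON "\\" becomes a single "\").
MD5 = r"^[a-fA-F0-9]{32}$"
SUBMITTER_ID = r"^[A-Za-z0-9\-\._]{1,64}$"
FILE_NAME = r"^[A-Za-z0-9_\.\-\[\]\(\)]+$"


def matches(pattern: str, value: str) -> bool:
    """JSON Schema patterns are not implicitly anchored, so use re.search."""
    return re.search(pattern, value) is not None


print(matches(MD5, "d41d8cd98f00b204e9800998ecf8427e"))  # 32 hex characters
print(matches(MD5, "d41d8cd9"))                          # too short
print(matches(SUBMITTER_ID, "example-donor-id"))         # letters and hyphens
print(matches(SUBMITTER_ID, "donor id with spaces"))     # spaces are not allowed
print(matches(FILE_NAME, "sample(1).bam"))               # parentheses are allowed
```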

Song base schema as JSON Schema

{
  "name": "variant_calling_test",
  "version": 1,
  "createdAt": "2021-03-04T23:22:42.025146",
  "schema": {
    "$schema": "",
    "id": "analysisPayload",
    "type": "object",
    "definitions": {
      "common": {
        "md5": {
          "type": "string",
          "pattern": "^[a-fA-F0-9]{32}$"
        },
        "submitterId": {
          "type": "string",
          "pattern": "^[A-Za-z0-9\\-\\._]{1,64}$"
        },
        "info": { "type": "object" }
      },
      "file": {
        "fileType": {
          "type": "string",
          "enum": [...]
        },
        "fileData": {
          "type": "object",
          "required": [...],
          "properties": {
            "dataType": { "type": "string" },
            "fileName": {
              "type": "string",
              "pattern": "^[A-Za-z0-9_\\.\\-\\[\\]\\(\\)]+$"
            },
            "fileSize": {
              "type": "integer",
              "min": 0
            },
            "fileAccess": {
              "type": "string",
              "enum": [...]
            },
            "fileType": { "$ref": "#/definitions/file/fileType" },
            "fileMd5sum": { "$ref": "#/definitions/common/md5" },
            "info": { "$ref": "#/definitions/common/info" }
          }
        }
      },
      "donor": {
        "gender": {
          "type": "string",
          "enum": [...]
        },
        "donorData": {
          "type": "object",
          "required": [...],
          "properties": {
            "submitterDonorId": { "$ref": "#/definitions/common/submitterId" },
            "gender": { "$ref": "#/definitions/donor/gender" },
            "info": { "$ref": "#/definitions/common/info" }
          }
        }
      },
      "specimen": {
        "specimenTissueSource": {
          "type": "string",
          "enum": [
            "Blood derived",
            "Blood derived - bone marrow",
            "Blood derived - peripheral blood",
            "Bone marrow",
            "Buccal cell",
            "Lymph node",
            "Solid tissue",
            "Cerebrospinal fluid",
            "Pleural effusion",
            "Mononuclear cells from bone marrow",
            "Buffy coat",
            ...
          ]
        },
        "specimenType": {
          "type": "string",
          "enum": [
            "Normal - tissue adjacent to primary tumour",
            "Primary tumour",
            "Primary tumour - adjacent to normal",
            "Primary tumour - additional new primary",
            "Recurrent tumour",
            "Metastatic tumour",
            "Metastatic tumour - metastasis local to lymph node",
            "Metastatic tumour - metastasis to distant location",
            "Metastatic tumour - additional metastatic",
            "Xenograft - derived from primary tumour",
            "Xenograft - derived from tumour cell line",
            "Cell line - derived from xenograft tumour",
            "Cell line - derived from tumour",
            "Cell line - derived from normal"
          ]
        },
        "tumourNormalDesignation": {
          "type": "string",
          "enum": [...]
        },
        "specimenData": {
          "type": "object",
          "required": [...],
          "properties": {
            "submitterSpecimenId": { "$ref": "#/definitions/common/submitterId" },
            "specimenTissueSource": { "$ref": "#/definitions/specimen/specimenTissueSource" },
            "tumourNormalDesignation": { "$ref": "#/definitions/specimen/tumourNormalDesignation" },
            "specimenType": { "$ref": "#/definitions/specimen/specimenType" },
            "specimenClass": { "not": {} },
            "info": { "$ref": "#/definitions/common/info" }
          }
        }
      },
      "analysisType": {
        "type": "object",
        "required": [...],
        "properties": {
          "name": { "type": "string" },
          "version": { "type": [...] }
        }
      },
      "sample": {
        "sampleTypes": {
          "type": "string",
          "enum": [
            "Total DNA",
            "Amplified DNA",
            "Other DNA enrichments",
            "Total RNA",
            "Ribo-Zero RNA",
            "polyA+ RNA",
            "Other RNA fractions"
          ]
        },
        "sampleData": {
          "type": "object",
          "required": [...],
          "properties": {
            "submitterSampleId": { "$ref": "#/definitions/common/submitterId" },
            "sampleType": { "$ref": "#/definitions/sample/sampleTypes" },
            "info": { "$ref": "#/definitions/common/info" }
          }
        }
      }
    },
    "required": [...],
    "properties": {
      "analysisId": { "not": {} },
      "studyId": {
        "type": "string",
        "minLength": 1
      },
      "analysisType": {
        "allOf": [
          { "$ref": "#/definitions/analysisType" }
        ]
      },
      "samples": {
        "type": "array",
        "minItems": 1,
        "items": {
          "type": "object",
          "allOf": [
            { "$ref": "#/definitions/sample/sampleData" }
          ],
          "required": [...],
          "properties": {
            "specimen": { "$ref": "#/definitions/specimen/specimenData" },
            "donor": { "$ref": "#/definitions/donor/donorData" }
          },
          "if": {
            "properties": {
              "specimen": {
                "properties": {
                  "tumourNormalDesignation": { "const": "Tumour" }
                }
              }
            }
          },
          "then": {
            "properties": {
              "matchedNormalSubmitterSampleId": { "$ref": "#/definitions/common/submitterId" }
            },
            "required": [...]
          },
          "else": {
            "properties": {
              "matchedNormalSubmitterSampleId": { "const": null }
            },
            "required": [...]
          }
        }
      },
      "files": {
        "type": "array",
        "minItems": 1,
        "items": { "$ref": "#/definitions/file/fileData" }
      }
    }
  }
}