Overture is a collection of open-source, extensible solutions for big-data genomic science that research institutions and individual researchers can use to support their work.
The rapid advancement and clinical potential of next-generation genomic sequencing have resulted in a proliferation of genomic data, as well as intense research efforts focused on analyzing genomes to diagnose patients and offer a personalized medicine experience. With this data boom, there are growing needs for:
The Overture software suite addresses these challenges, promoting FAIR (Findable, Accessible, Interoperable, Reusable) data sharing by overcoming the major bottlenecks in storing and distributing genome-scale datasets and providing an intuitive data portal for data browsing and querying. Overture is composed of highly customizable components that help genome projects stand up an extensible data sharing portal.
The development of Overture has been driven primarily by biomedical disciplines in genomics, bioinformatics and computational biology that generate large amounts of genomic data. However, recognizing the value and applicability of the platform to other big-data disciplines, the Overture team has now bundled the software suite into an easily deployable, easily configurable Data Management System (DMS) bundle. The DMS bundle provides turnkey installation, configuration, and deployment of the Overture software suite. By lowering the technical barrier to deploying Overture, we aim to increase adoption by users in other scientific domains and disciplines.
Illustrated above are the five core Overture components:
Component | Purpose |
---|---|
Score | Manages cloud-based data object storage and transfer. |
Song | Manages the metadata associated with the data objects. |
Maestro | Indexes the metadata in Song into Elasticsearch. |
Arranger | Generates an easily-configurable web portal interface with faceted search against the Elasticsearch index. |
Ego | Manages user identity for authentication, authorization and data security. |
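The division of responsibilities in the table can be pictured with a small in-memory model. This is only an illustrative sketch: `ToyPlatform` and its methods are hypothetical names, none of them are part of the real Score, Song, Maestro, Arranger, or Ego APIs, and the order of operations is simplified.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyPlatform:
    """In-memory stand-in for the five roles; not the real services."""
    permissions: dict = field(default_factory=dict)  # Ego's role: user -> scopes
    objects: dict = field(default_factory=dict)      # Score's role: id -> bytes
    metadata: dict = field(default_factory=dict)     # Song's role: id -> metadata
    index: list = field(default_factory=list)        # search index documents

    def submit(self, user, object_id, payload, meta):
        # Ego's role: refuse the submission unless the user may write.
        if "write" not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not submit data")
        self.objects[object_id] = payload       # Score's role: store the object
        self.metadata[object_id] = dict(meta)   # Song's role: store its metadata

    def reindex(self):
        # Maestro's role: copy the stored metadata into the search index.
        self.index = [
            {"object_id": oid, **meta} for oid, meta in self.metadata.items()
        ]

    def facet_counts(self, field_name):
        # Arranger's role: count indexed documents per value of one field.
        return Counter(
            doc[field_name] for doc in self.index if field_name in doc
        )


platform = ToyPlatform(permissions={"alice": {"write"}})
platform.submit("alice", "obj-1", b"data-1", {"file_type": "BAM"})
platform.submit("alice", "obj-2", b"data-2", {"file_type": "VCF"})
platform.submit("alice", "obj-3", b"data-3", {"file_type": "BAM"})
platform.reindex()
bam_and_vcf = platform.facet_counts("file_type")
```

In the real suite each role is a separate service and the index lives in Elasticsearch; the model only shows which component owns which step.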
The core components described above are currently available as individual containers, each packaging the software code together with its dependencies. The DMS platform simplifies their setup by bundling the core components together, making it easy for both large and small projects to install, configure, and deploy these services and, at the end of the process, stand up a data sharing portal.
All the core components work together to support an end-to-end data management lifecycle, as described below:
The first general availability release of the DMS bundle includes the following features.
To facilitate rapid, turnkey deployment of the Overture software suite, the DMS provides administrators with an easy-to-use, interactive configuration questionnaire. This script is run from the command line and walks the administrator step by step through all configuration inputs needed to stand up the platform. Recommended defaults and links to detailed documentation are provided along the way to guide administrators through the process.
After all configuration inputs are supplied, they are saved to a file. The configuration file is then used to deploy the Overture services to a single node or cluster, using similarly easy-to-use commands.
All configurations are saved to a file and can be retrieved and viewed at any time via the command-line interface. This allows administrators to verify that the configuration values were captured correctly before attempting to deploy to their cluster.
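The questionnaire, save, and review flow described above can be sketched in a few lines. This is a minimal illustration rather than the DMS implementation: the question keys, the JSON format, and the function names are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical questions: (key, prompt text, recommended default).
QUESTIONS = [
    ("deployment_mode", "Deployment mode (local/server)", "local"),
    ("domain_name", "Domain name (server mode only)", ""),
]


def run_questionnaire(questions, ask=input):
    """Ask each question in turn; an empty reply accepts the default."""
    answers = {}
    for key, prompt, default in questions:
        reply = ask(f"{prompt} [{default}]: ").strip()
        answers[key] = reply or default
    return answers


def save_config(answers, path):
    """Persist the collected answers so a later deploy step can read them."""
    Path(path).write_text(json.dumps(answers, indent=2))


def load_config(path):
    """Read the saved answers back, e.g. to review them before deploying."""
    return json.loads(Path(path).read_text())
```

Passing `ask` as a parameter keeps the prompt loop testable without a terminal; interactive use would simply call `run_questionnaire(QUESTIONS)`.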
The current DMS release supports deployment to a single-cluster environment, using one of two available deployment modes:
Mode | Use Case | Access | Application Layer Security |
---|---|---|---|
Local | The purpose of local mode is to deploy and host the DMS only on a local machine's resources. For example, deploying to an individual user's laptop, or a private VM in the cloud. Local mode is typically used for solo users or small teams with shared access to a laptop or private VM. | Local host only | HTTP only |
Server | The purpose of server mode is to deploy and host the DMS using resources on infrastructure separate from your local machine. For example, deploying to a VM on a cloud infrastructure, or to your organization's internal IT infrastructure. The intention of server mode is to make the DMS available to external users by exposing its services through a configured domain name, secured over HTTPS. | Externally via custom domain name | HTTPS over TLS/SSL |
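The access and security columns of the table above amount to a simple rule, sketched here. The `Mode` enum and `base_url` function are hypothetical names used only for illustration, and `dms.example.org` is a placeholder domain.

```python
from enum import Enum


class Mode(Enum):
    LOCAL = "local"
    SERVER = "server"


def base_url(mode, domain=None):
    """Return the URL prefix implied by the deployment mode."""
    if mode is Mode.LOCAL:
        # Local mode: reachable on the local host only, over plain HTTP.
        return "http://localhost"
    if not domain:
        # Server mode is exposed externally, so it needs a domain name.
        raise ValueError("server mode needs a configured domain name")
    # Server mode: reachable externally over HTTPS.
    return f"https://{domain}"


local_url = base_url(Mode.LOCAL)
server_url = base_url(Mode.SERVER, "dms.example.org")
```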
A detailed documentation site is available to help administrators with the configuration and installation process, and to help end users understand the data sharing web portal.
While running the interactive configuration questionnaire, documentation links are also provided at each step to further guide administrators through the process.
The DMS platform bundles, or in other cases integrates with, robust, high-quality third-party software to support critical parts of the data management lifecycle or specific use cases.
Currently, these third-party software providers work in conjunction with the DMS:
Provider | Purpose |
---|---|
OAuth 2.0 Providers | Secure authentication & authorization (e.g. Google, GitHub, LinkedIn, ORCID) |
Object Storage Services | Secure object storage for data transfer (e.g. Amazon S3, Microsoft Azure, OpenStack with Ceph, MinIO) |
Elasticsearch | Efficient, optimized data search & indexing capabilities |
Certbot | Security certificate generation for exposing services over HTTPS |