Song, Score and the services they depend on (Postgres, MinIO, and Kafka) will be set up next. These Overture services will manage, track and store all our data on the backend.
The file transfer service Score is compatible with any S3 storage provider; for simplicity, we will use the open-source object storage provider MinIO for this setup. Run MinIO with the following command:

    docker run -d --name minio \
      -p 9000:9000 \
      -e MINIO_ACCESS_KEY=admin \
      -e MINIO_SECRET_KEY=admin123 \
      -v ./persistentStorage/data-minio:/data \
      minio/minio:RELEASE.2018-05-11T00-29-24Z \
      server /data

Create buckets in MinIO: Run the following command to create two buckets in the running MinIO server using the MinIO CLI. This creates an object bucket to store uploaded files and a state bucket for metadata files managed by Score.

    docker run --name minio-client \
      --entrypoint /bin/sh \
      minio/mc -c \
      '/usr/bin/mc alias set myminio http://host.docker.internal:9000 admin admin123 && \
      /usr/bin/mc mb myminio/state && \
      /usr/bin/mc mb myminio/object && \
      /usr/bin/mc mb myminio/object/data && \
      echo "" > /tmp/heliograph && \
      /usr/bin/mc put /tmp/heliograph myminio/object/data/heliograph && \
      rm /tmp/heliograph && \
      exit 0;'

You should now be able to access the MinIO console from the browser at localhost:9000
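Containers started with `-d` return immediately, so the service may not be listening yet when the next command runs. As a minimal sketch (the helper name `wait_for_port` is hypothetical and not part of Overture, and it assumes `bash` is installed for its `/dev/tcp` feature), a script could poll the published port before moving on:

```shell
# wait_for_port HOST PORT [TRIES]
# Polls a TCP port once per second until it accepts a connection.
# Returns 0 as soon as the port is open, 1 if it never opens.
wait_for_port() (
  host=$1 port=$2 tries=${3:-30} i=1
  while [ "$i" -le "$tries" ]; do
    # /dev/tcp is a bash feature, so bash is invoked explicitly here.
    if bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      exit 0
    fi
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$tries" ] && sleep 1
    i=$((i + 1))
  done
  exit 1
)

# Example: wait up to 60 seconds for MinIO on its published port.
# wait_for_port localhost 9000 60 && echo "MinIO is accepting connections"
```

An open port only shows that something is listening; it does not confirm that the buckets exist or that the credentials are valid.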
- `-v ./persistentStorage/data-minio:/data` configures MinIO to store data in our local file system instead of in the docker container. Files you upload to MinIO will be stored at the path `./persistentStorage/data-minio`.
- `alias set myminio http://host.docker.internal:9000 admin admin123` creates an alias for the MinIO server, with the user `admin` and the password `admin123`.
- `mb myminio/state` creates a bucket named "state". The "state" bucket is designated for storing application state data. This could include metadata about the objects stored in the "object" bucket.
- `mb myminio/object` creates another bucket named "object". The "object" bucket is intended for storing the actual content objects, such as VCFs, BAMs, etc.
- The `put` command seeds an empty `heliograph` file within the object storage data folder. Score uses this dummy file to test that the server can successfully communicate with the storage provider and that your client can successfully retrieve files from it, too.

Run PostgreSQL: Use the following command to pull and run the Postgres docker container:

    docker run --name song-db \
      -e POSTGRES_USER=admin \
      -e POSTGRES_PASSWORD=admin123 \
      -e POSTGRES_DB=songDb \
      -v ./persistentStorage/data-song-db:/var/lib/postgresql/data \
      -d postgres:11.1

- This command runs a Postgres container named `song-db` with the username `admin`, password `admin123`, and a database within it called `songDb`.
- `-v ./persistentStorage/data-song-db:/var/lib/postgresql/data` stores Song's Postgres data in our local filesystem instead of the docker container; in other words, the data contained in the Song database will be stored at the path `./persistentStorage/data-song-db`.

By setting up MinIO alongside PostgreSQL, we have created an environment capable of handling both relational and object data storage, simulating a backend infrastructure similar to what you'd find in a cloud-based application.
Run Kafka: Use the following command to pull and run the Kafka docker container:

    docker run -d --name kafka \
      --platform linux/amd64 \
      -p 9092:9092 -p 29092:29092 \
      -e KAFKA_PROCESS_ROLES="broker,controller" \
      -e KAFKA_NODE_ID=1 \
      -e KAFKA_LISTENERS="PLAINTEXT://kafka:9092,CONTROLLER://kafka:9093" \
      -e KAFKA_ADVERTISED_LISTENERS="PLAINTEXT://kafka:9092" \
      -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP="PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT" \
      -e KAFKA_INTER_BROKER_LISTENER_NAME="PLAINTEXT" \
      -e KAFKA_CONTROLLER_QUORUM_VOTERS="1@kafka:9093" \
      -e KAFKA_CONTROLLER_LISTENER_NAMES="CONTROLLER" \
      -e KAFKA_LOG_DIRS="/var/lib/kafka/data" \
      -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
      -e KAFKA_AUTO_CREATE_TOPICS_ENABLE=false \
      -e KAFKA_NUM_PARTITIONS=1 \
      -e CLUSTER_ID="q1Sh-9_ISia_zwGINzRvyQ" \
      confluentinc/cp-kafka:7.6.1

Kafka is a distributed streaming platform that provides high-throughput, fault-tolerant, and scalable messaging. Here it acts as the messaging backbone between Song and Maestro, letting them communicate asynchronously so that jobs are queued and processed reliably.
For more detailed information on Kafka configurations, please refer to the official Confluent Kafka documentation.
Song is our data cataloging system. It will provide submission validations and track and manage all our metadata and file data.
Create an env file named `.env.song` with the following content:

    # ==============================
    # Song Environment Variables
    # ==============================

    # Spring Run Profiles
    SPRING_PROFILES_ACTIVE=prod,secure,kafka

    # Flyway variables
    SPRING_FLYWAY_ENABLED=true

    # Song Variables
    ID_USELOCAL=true
    SCHEMAS_ENFORCELATEST=true

    # Score Variables
    SCORE_URL=http://score:8087
    SCORE_ACCESSTOKEN=

    # Keycloak Variables
    AUTH_SERVER_PROVIDER=keycloak
    AUTH_SERVER_CLIENTID=dms
    AUTH_SERVER_CLIENTSECRET=t016kqXfI648ORoIP5gepqCzqtsRjlcc
    AUTH_SERVER_TOKENNAME=apiKey
    AUTH_SERVER_KEYCLOAK_HOST=http://keycloak:8080
    AUTH_SERVER_KEYCLOAK_REALM=myrealm
    AUTH_SERVER_SCOPE_STUDY_PREFIX=STUDY.
    AUTH_SERVER_SCOPE_STUDY_SUFFIX=.WRITE
    AUTH_SERVER_SCOPE_SYSTEM=song.WRITE
    SPRING_SECURITY_OAUTH2_RESOURCESERVER_JWT_JWK_SET_URI=http://keycloak:8080/realms/myrealm/protocol/openid-connect/certs
    AUTH_SERVER_INTROSPECTIONURI=http://keycloak:8080/realms/myrealm/apikey/check_api_key/

    # Postgres Variables
    SPRING_DATASOURCE_URL=jdbc:postgresql://song-db:5432/songDb?stringtype=unspecified
    SPRING_DATASOURCE_USERNAME=admin
    SPRING_DATASOURCE_PASSWORD=admin123

    # Kafka Variables
    SPRING_KAFKA_BOOTSTRAPSERVERS=http://kafka:9092
    SPRING_KAFKA_TEMPLATE_DEFAULTTOPIC=song-analysis

    # Swagger Variable
    SWAGGER_ALTERNATEURL=/swagger-api

We will update our `SCORE_ACCESSTOKEN` value after portal deployment. Once Stage, our portal UI, is deployed, we can more easily generate an admin API key with the appropriate credentials.
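When the API key becomes available, the empty `SCORE_ACCESSTOKEN=` line has to be filled in before Song is restarted. The following is one possible way to script that edit; the helper `set_env_value` is a hypothetical name, and the sketch assumes the value contains no backslashes (awk would interpret them as escapes).

```shell
# set_env_value FILE KEY VALUE
# Replaces the value of KEY in a KEY=VALUE env file, or appends the
# pair if KEY is not present. Comments and other lines are kept as-is.
set_env_value() (
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || exit 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; found = 0 }
    $1 == k { print k "=" v; found = 1; next }   # rewrite the matching line
    { print }                                    # pass everything else through
    END { if (!found) print k "=" v }            # append when the key was absent
  ' "$file" > "$tmp" || exit 1
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
)

# Example (the token value is a placeholder, not a real key):
# set_env_value .env.song SCORE_ACCESSTOKEN "<your-api-key>"
```

A running container does not re-read its env file, so the Song container would need to be recreated for the new value to take effect.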
Profile | Description |
---|---|
prod | Optimized for production use with minimal initialization and direct database connections. |
secure | Focuses on security with OAuth2 JWTs for API protection. This profile calls for a public key location for JWT verification and an introspection URI for authenticating clients. |
kafka | Targets Kafka integration, specifying Kafka bootstrap servers and a default topic for message exchange. Does not include specific configurations for other services or security settings. |
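`SPRING_PROFILES_ACTIVE` is a comma-separated list, so a missing or misspelled entry silently disables the behaviour described in the table. As an illustrative sketch (the function name `has_profile` is hypothetical), a pre-flight script could confirm that a profile is enabled before the container is started:

```shell
# has_profile PROFILE_LIST PROFILE
# Succeeds if PROFILE appears as a whole entry in the comma-separated list.
has_profile() {
  # Wrapping both sides in commas prevents partial matches such as "sec".
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: read the list from the env file and check for the kafka profile.
# profiles=$(grep '^SPRING_PROFILES_ACTIVE=' .env.song | cut -d= -f2-)
# has_profile "$profiles" kafka || echo "kafka profile is not enabled"
```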
- The `SPRING_FLYWAY_ENABLED` variable enables the initialization of the Song database with a Flyway database migration, setting up the necessary tables for API interactions. This migration uses SQL scripts located within Song. Without this setting, the database would remain uninitialized, leading to generic SQL errors (`SQL Error: 0`) with a SQLState of `42P01`, corresponding to `undefined_table`.
- `ID_USELOCAL` set to `true` indicates that Song will handle ID management internally, storing identifiers within its own system. Song can also be configured to use external ID management; for more information, see our documentation for ID management in Song.
- When `SCHEMAS_ENFORCELATEST` is set to `true`, the Song server will enforce that data conforms to the latest schema versions. Conversely, if set to `false`, data can be submitted to any schema version specified with the metadata submission. For more information, see our documentation on Song Schema Management.
- `SCORE_URL` specifies the future URL of the Score service (`http://score:8087`).
- `SCORE_ACCESSTOKEN` is used by Song for authorized communication with Score. For example, during data publication Song calls Score to check that the objects exist before publishing. This access token, generated by Keycloak, encodes the permissions necessary to communicate securely.
- `AUTH_SERVER_PROVIDER` specifies the authentication server provider, in this case, Keycloak.
- `AUTH_SERVER_CLIENTID` is the client ID assigned to the application by Keycloak. This identifier is used by the application to authenticate itself to the Keycloak server.
- `AUTH_SERVER_CLIENTSECRET` is the client secret associated with the client ID. This secret is used by the application to prove its identity to the Keycloak server.
- `AUTH_SERVER_TOKENNAME` is the name of the token issued by Keycloak. This token is used by the application to authenticate subsequent requests to protected resources.
- `AUTH_SERVER_KEYCLOAK_HOST` is the URL where the Keycloak server is hosted.
- `AUTH_SERVER_KEYCLOAK_REALM` is the realm in Keycloak that contains the users and roles. The realm encapsulates the grouping of applications and users configured in Keycloak for this application.
- `AUTH_SERVER_SCOPE_STUDY_PREFIX` is the prefix added to the scope claim in the token. Scopes define the level of access granted to the token holder. In this case, it indicates a specific type of access related to studies.
- `AUTH_SERVER_SCOPE_STUDY_SUFFIX` is the suffix added to the scope claim in the token, further defining the level of access. Here, it specifies write access to study-related resources.
- `AUTH_SERVER_SCOPE_SYSTEM` is the scope for system-level permissions, indicating write access to system resources managed by the application.
- `SPRING_SECURITY_OAUTH2_RESOURCESERVER_JWT_JWK_SET_URI` is the URL where the JSON Web Key Set (JWKS) for the JWT tokens is located. This key set is used by the application to validate the signature of the JWT tokens issued by Keycloak.
- `AUTH_SERVER_INTROSPECTIONURI` is the URL used by the application to check the validity of a token against the Keycloak server. Introspection allows the application to verify that a token has not been revoked or expired.
- `SPRING_DATASOURCE_URL`, `SPRING_DATASOURCE_USERNAME`, and `SPRING_DATASOURCE_PASSWORD` are the connection details for the PostgreSQL database. The value of `SPRING_DATASOURCE_URL` must end with `?stringtype=unspecified` (Song requires the string type to be unspecified to interact with JSONB columns).
- `SPRING_KAFKA_BOOTSTRAPSERVERS` and `SPRING_KAFKA_TEMPLATE_DEFAULTTOPIC` specify the bootstrap servers and default topic for message publishing.
- `SWAGGER_ALTERNATEURL` specifies a custom URL for accessing the Swagger UI (`/swagger-api`).

Run Song: Use the following command to run Song with the `.env.song` file:

    docker run -d \
      --name song \
      --platform linux/amd64 \
      -p 8080:8080 \
      --env-file .env.song \
      ghcr.io/overture-stack/song-server:5.2.0

Once running, you should be able to access the Song Swagger UI from http://localhost:8080/swagger-ui.
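To make the study scope settings in `.env.song` more concrete, here is a small sketch of how the prefix and suffix might combine into a scope string. It assumes the study-level scope is simply the prefix, the study ID, and the suffix concatenated, and that the system scope grants write access to every study; both are assumptions drawn from the variable names rather than from Song's source. The study ID `ABC123` and the function names are hypothetical.

```shell
# study_scope PREFIX STUDY_ID SUFFIX
# Builds a study-level scope such as STUDY.ABC123.WRITE.
study_scope() {
  printf '%s%s%s\n' "$1" "$2" "$3"
}

# can_write_study "SCOPE SCOPE ..." STUDY_ID
# Succeeds if the space-separated granted scopes include either the
# study-level write scope or the system-level scope (assumed semantics).
can_write_study() (
  wanted=$(study_scope "STUDY." "$2" ".WRITE")
  for scope in $1; do
    if [ "$scope" = "$wanted" ] || [ "$scope" = "song.WRITE" ]; then
      exit 0
    fi
  done
  exit 1
)

# Example:
# can_write_study "STUDY.ABC123.WRITE" ABC123 && echo "write allowed"
```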
Score is a fault-tolerant, multi-part, parallel transfer service made to facilitate transfers of file data to and from object storage.
Create an env file named `.env.score` with the following content:

    # ==============================
    # Score Environment Variables
    # ==============================

    # Spring Variables
    SPRING_PROFILES_ACTIVE=default,collaboratory,prod,secure,jwt
    SERVER_PORT=8087

    # Song Variable
    METADATA_URL=http://song:8080

    # Score Variables
    SERVER_SSL_ENABLED=false

    # Object Storage Variables
    S3_ENDPOINT=http://host.docker.internal:9000
    S3_ACCESSKEY=admin
    S3_SECRETKEY=admin123
    S3_SIGV4ENABLED=true
    S3_SECURED=false
    OBJECT_SENTINEL=heliograph
    BUCKET_NAME_OBJECT=object
    BUCKET_NAME_STATE=state
    UPLOAD_PARTSIZE=1073741824
    UPLOAD_CONNECTION_TIMEOUT=1200000

    # Keycloak Variables
    AUTH_SERVER_PROVIDER=keycloak
    AUTH_SERVER_CLIENTID=dms
    AUTH_SERVER_CLIENTSECRET=t016kqXfI648ORoIP5gepqCzqtsRjlcc
    AUTH_SERVER_TOKENNAME=apiKey
    AUTH_SERVER_KEYCLOAK_HOST=http://keycloak:8080
    AUTH_SERVER_KEYCLOAK_REALM=myrealm
    AUTH_SERVER_SCOPE_STUDY_PREFIX=STUDY.
    AUTH_SERVER_SCOPE_DOWNLOAD_SUFFIX=.READ
    AUTH_SERVER_SCOPE_DOWNLOAD_SYSTEM=score.READ
    AUTH_SERVER_SCOPE_UPLOAD_SYSTEM=score.WRITE
    AUTH_SERVER_SCOPE_UPLOAD_SUFFIX=.WRITE
    AUTH_SERVER_URL=http://keycloak:8080/realms/myrealm/apikey/check_api_key/
    AUTH_JWT_PUBLICKEYURL=http://keycloak:8080/oauth/token/public_key
    SPRING_SECURITY_OAUTH2_RESOURCESERVER_JWT_JWK_SET_URI=http://keycloak:8080/realms/myrealm/protocol/openid-connect/certs

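An env file with a missing or empty value usually surfaces only as a startup error inside the container. As a rough sketch, a helper like the hypothetical `missing_env_keys` below could be run before `docker run` to confirm that the keys a service depends on are present; which keys to treat as required is left to the caller.

```shell
# missing_env_keys FILE KEY...
# Prints one line per KEY that is absent or has an empty value in FILE.
# Exits 0 when every key is set, 1 otherwise.
missing_env_keys() (
  file=$1
  shift
  status=0
  for key in "$@"; do
    # "^KEY=." requires at least one character after the equals sign.
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty: ${key}"
      status=1
    fi
  done
  exit "$status"
)

# Example: check the object storage settings used by Score.
# missing_env_keys .env.score S3_ENDPOINT S3_ACCESSKEY S3_SECRETKEY \
#   BUCKET_NAME_OBJECT BUCKET_NAME_STATE
```

The same check applied to `.env.song` would flag `SCORE_ACCESSTOKEN` until the API key has been filled in.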
Profile | Description |
---|---|
collaboratory | Configures the service for use with an S3 backend. |
prod | Optimizes the service for production use, enabling S3 security features and specifying the metadata server URL. |
secure | Implements OAuth authentication, specifying the authentication server URL, token name, client ID, client secret, and scopes for download and upload operations. |
jwt | Configures authentication based on JWT (JSON Web Token). This profile includes settings to obtain the public key for token validation from an OAuth server. |
- `SERVER_PORT` and `SERVER_SSL_ENABLED` specify the port for the Score service (`8087`) and disable SSL (`false`), indicating HTTP communication. SSL is disabled to simplify deployment by avoiding the need to configure SSL certificates for HTTPS. This configuration should only be used in development environments and not in production.
- `METADATA_URL` points to the URL of our previously deployed song-server at `http://song:8080`.
- `S3_ENDPOINT`, `S3_ACCESSKEY`, `S3_SECRETKEY`, `BUCKET_NAME_OBJECT`, and `BUCKET_NAME_STATE` define access to object storage, including the endpoint (`http://host.docker.internal:9000`), access key (`admin`), secret key (`admin123`), and bucket names for objects (`object`) and state (`state`).
- `UPLOAD_PARTSIZE` specifies the maximum size of individual parts when uploading large files to an object storage service. Large files are typically split into smaller parts to facilitate parallel uploads and to manage network bandwidth efficiently. If network bandwidth is limited, smaller part sizes might be beneficial to keep the upload process moving quickly. On the other hand, if the application requires high throughput and can afford to wait longer for uploads to complete, larger part sizes might be preferable.
- `UPLOAD_CONNECTION_TIMEOUT` sets the timeout duration, in milliseconds (ms), for establishing a connection to the object storage service during the upload process. Adjusting the connection timeout allows for fine-tuning the application's tolerance for network latency and variability.
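To get a feel for the two upload settings, the arithmetic can be sketched in shell. This assumes `UPLOAD_PARTSIZE` is expressed in bytes (the configured value 1073741824 equals 1 GiB, but the unit is an assumption here) and that a file is split into parts of at most that size; the function names are hypothetical.

```shell
# part_count FILE_SIZE_BYTES PART_SIZE_BYTES
# Number of parts needed when a file is split into chunks of at most
# PART_SIZE_BYTES (integer ceiling division).
part_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# ms_to_minutes MILLISECONDS
# Converts a millisecond timeout into whole minutes.
ms_to_minutes() {
  echo $(( $1 / 60000 ))
}

# Examples with the values from .env.score:
# part_count 5368709121 1073741824   # a file one byte over 5 GiB
# ms_to_minutes 1200000              # UPLOAD_CONNECTION_TIMEOUT
```

With the configured part size, a file of exactly 1 GiB would fit in a single part, while a file one byte over 5 GiB would need six; the configured timeout works out to 20 minutes.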
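The Keycloak variables configure authentication and authorization:

- `AUTH_SERVER_PROVIDER`, `AUTH_SERVER_KEYCLOAK_HOST`, and `AUTH_SERVER_KEYCLOAK_REALM` specify the authentication server provider (`keycloak`), the Keycloak server's host (`http://keycloak:8080`), and the realm (`myrealm`) that contains the users and roles. This setup is crucial for securing applications by directing them to the correct Keycloak instance and realm for authentication and authorization processes.
- `AUTH_SERVER_TOKENNAME`, `AUTH_SERVER_CLIENTID`, and `AUTH_SERVER_CLIENTSECRET` define the token name (`apiKey`), client ID (`dms`), and client secret (`t016kqXfI648ORoIP5gepqCzqtsRjlcc`) used for authentication. These elements are essential for establishing a secure connection between the application and the Keycloak server, ensuring that only authorized applications can access protected resources.
- `AUTH_SERVER_URL` and `SPRING_SECURITY_OAUTH2_RESOURCESERVER_JWT_JWK_SET_URI` specify the URL used to check the validity of a token (`http://keycloak:8080/realms/myrealm/apikey/check_api_key/`) and the location of the JSON Web Key Set (JWKS) for validating JWT tokens (`http://keycloak:8080/realms/myrealm/protocol/openid-connect/certs`). These mechanisms ensure that tokens are valid and have not been tampered with, maintaining the security of the authentication process.

Run Score: Use the docker run command with the `--env-file` option:

    docker run -d \
      --name score \
      --platform linux/amd64 \
      -p 8087:8087 \
      --env-file .env.score \
      ghcr.io/overture-stack/score-server:5.11.0
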
Once running, you should be able to access the Score Swagger UI from http://localhost:8087/swagger-ui.