DeepBlue API - examples

These examples are ready to use: copy and paste them into your favorite Python environment. (Python version 2.6 or higher is required.)

Searching for experiments

We use the search command to find experiments whose metadata contains the terms H3k27AC, blood, and peaks.

Each term is enclosed in single quotes to indicate that it must appear in the metadata.
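A minimal sketch of how such a quoted query string can be assembled. The helper name build_search_query is hypothetical, and the call shown in the trailing comment (the "experiments" type argument, the url and user_key variables) is an assumption about the API rather than something stated above. The sketch is written for Python 3, where the XML-RPC client lives in xmlrpc.client (the module is named xmlrpclib on Python 2).

```python
def build_search_query(terms):
    """Wrap each term in single quotes and join the terms with spaces."""
    return " ".join("'{0}'".format(term) for term in terms)


query = build_search_query(["H3k27AC", "blood", "peaks"])
print(query)

# Assumed call against a live server (not executed in this sketch):
#   import xmlrpc.client
#   server = xmlrpc.client.ServerProxy(url, allow_none=True)
#   status, found = server.search(query, "experiments", user_key)
```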

Listing experiments

We use the list_experiments command to list all experiments whose metadata matches the given values.

Accessing the extra-metadata

We use the info command to access an experiment's metadata and its extra-metadata fields.

Select epigenomic data

We use the select_experiments command to select all genomic regions from the two given experiments.

We use the count_regions command with the query_id value returned by the select_experiments.

The count_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.
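The polling pattern described above can be sketched as follows. This is an illustration under stated assumptions: the helper name wait_for_request is hypothetical, the request status is assumed to sit under a "state" key in the first entry returned by info, and every command is assumed to return a (status, value) pair. FakeServer is an offline stand-in so the sketch runs without a network connection; with a real connection, an XML-RPC server proxy would take its place.

```python
import time


def wait_for_request(server, request_id, user_key, poll_seconds=1.0):
    """Poll info until the request is done or failed, then return its data."""
    while True:
        status, info = server.info(request_id, user_key)
        state = info[0]["state"]  # assumed key holding the request status
        if state in ("done", "failed"):
            break
        time.sleep(poll_seconds)
    if state == "failed":
        raise RuntimeError("request {0} failed".format(request_id))
    status, data = server.get_request_data(request_id, user_key)
    return data


class FakeServer(object):
    """Offline stand-in that reports 'running' twice before 'done'."""

    def __init__(self):
        self.calls = 0

    def info(self, request_id, user_key):
        self.calls += 1
        state = "done" if self.calls >= 3 else "running"
        return "okay", [{"state": state}]

    def get_request_data(self, request_id, user_key):
        return "okay", {"count": 42}  # made-up payload for the demo


result = wait_for_request(FakeServer(), "r1", "anonymous_key", poll_seconds=0)
print(result)
```

Raising an exception on failed is a design choice of this sketch; a caller could just as well return the state and let the calling code decide.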

Output with desired columns

We use the select_experiments command to select the genomic regions of the experiments that lie on chromosome 1, between positions 0 and 50,000,000.

We use the get_regions command with the query_id value returned by the select_experiments and the desired file columns. The columns @NAME and @BIOSOURCE include the experiment name and the experiment biosource in the row output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.
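Once the request data has been retrieved, it usually needs to be split back into the requested columns. The sketch below assumes the output is plain text with one region per line and tab-separated fields in the order of the column string; that layout, the helper name parse_regions, the column names CHROMOSOME, START, and END, and the sample rows are all assumptions made for illustration.

```python
def parse_regions(text, columns):
    """Split tab-separated region lines into one dict per region."""
    names = columns.split(",")
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        rows.append(dict(zip(names, line.split("\t"))))
    return rows


columns = "CHROMOSOME,START,END,@NAME,@BIOSOURCE"
# Made-up output, standing in for the data returned by get_request_data.
sample = "chr1\t1000\t2000\texp_a\tblood\nchr1\t3000\t3500\texp_b\tblood\n"

for row in parse_regions(sample, columns):
    print(row["@NAME"], row["CHROMOSOME"], row["START"], row["END"])
```

All values stay strings in this sketch; numeric columns would need an explicit conversion.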

Filter epigenomic data by metadata

We use the list_samples command to obtain all samples of the biosource myeloid cell from the BLUEPRINT project. The list_samples command returns a list of samples with their IDs and content.

We extract the IDs from this list and use them in the select_regions command.
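Extracting the IDs can be sketched as below, assuming each entry of the returned list is an [id, content] pair. That pair layout, the helper name sample_ids, and the sample entries are assumptions for illustration only.

```python
def sample_ids(samples):
    """Return the ID (first element) of every [id, content] pair."""
    return [sample[0] for sample in samples]


# Made-up entries standing in for the list returned by list_samples.
samples = [
    ["s1", {"biosource_name": "myeloid cell"}],
    ["s2", {"biosource_name": "myeloid cell"}],
]

print(sample_ids(samples))
```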

The select_regions command selects the genomic regions on chromosome 1, between positions 0 and 50,000, of all experiments that have the given sample IDs.

Then, we use the get_regions command with the parameters: query_id returned by the select_regions and the desired file columns. The columns @NAME, @SAMPLE_ID, and @BIOSOURCE include the experiment name, the sample ID, and the experiment biosource in the row output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Filter epigenomic data by region attributes

We use the select_experiments command to select the genomic regions of the experiments that lie on chromosome 1, between positions 0 and 50,000,000.

We keep only the genomic regions whose SIGNAL_VALUE column is greater than 10.

We then keep only the genomic regions whose PEAK column is greater than 1000.
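Because each filter produces a new query, the two filters are chained: the second one is applied to the result of the first. The sketch below illustrates that chaining. The text above does not name the command, so the command name filter_regions, its argument order (query, column, operator, value, value type, user key), and the (status, query_id) return pair are assumptions; apply_filters and RecordingServer are hypothetical names, and RecordingServer only records calls so the sketch runs offline.

```python
def apply_filters(server, query_id, filters, user_key):
    """Apply each (column, operator, value) filter to the previous result."""
    for column, operator, value in filters:
        status, query_id = server.filter_regions(
            query_id, column, operator, value, "number", user_key)
    return query_id


class RecordingServer(object):
    """Offline stand-in that records calls and returns numbered query IDs."""

    def __init__(self):
        self.calls = []

    def filter_regions(self, query_id, column, operator, value, kind, user_key):
        self.calls.append((query_id, column, operator, value))
        return "okay", "q{0}".format(len(self.calls) + 1)


server = RecordingServer()
final_id = apply_filters(
    server, "q1",
    [("SIGNAL_VALUE", ">", "10"), ("PEAK", ">", "1000")],
    "anonymous_key")
print(final_id)
print(server.calls)
```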

Then, we use the get_regions command with the parameters: query_id returned by the select_experiments and the desired file columns. The columns @NAME and @BIOSOURCE include the experiment name and the experiment biosource in the row output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Find overlapping regions

We use the select_experiments command to select the genomic regions of the experiments that lie on chromosome 1, between positions 0 and 50,000,000.

We use the select_annotations command to select the genomic regions on chromosome 1 of the annotation promoters of the genome assembly GRCh38.

The intersection command keeps all regions of the query_id that overlap with at least one promoters_id region.
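The overlap rule can be illustrated locally with a few lines of Python. This is only a sketch of the semantics described above, not of how the server computes it; it assumes half-open (start, end) coordinates, and the function names and sample regions are made up for illustration.

```python
def overlaps(a, b):
    """True when two (chromosome, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def keep_overlapping(regions, others):
    """Keep every region that overlaps at least one region in others."""
    return [r for r in regions if any(overlaps(r, o) for o in others)]


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 300), ("chr1", 700, 800)]

print(keep_overlapping(peaks, promoters))
```

Note that the result contains regions of the first set only; the promoter regions act purely as a filter.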

We use the get_regions command with the parameters: query_id returned by the select_experiments and the desired file columns. The columns @NAME and @BIOSOURCE include the experiment name and the experiment biosource in the row output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Retrieve DNA sequences

We use the select_experiments command to select the genomic regions of the experiments that lie on chromosome 1, between positions 0 and 50,000,000.

We keep only the genomic regions whose SIGNAL_VALUE column is greater than 10.

We then keep only the genomic regions whose PEAK column is greater than 1000.

The meta-column @LENGTH contains the length of the genomic region, and we keep only the regions where this value is smaller than 2,000.

We use the get_regions command with the query_id returned by the select_experiments and the desired file columns. In this case, we use the meta-column @SEQUENCE, which includes the DNA sequence in the genomic region output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

DNA motif matching operations

We use the find_motif command to find the locations where the motif TATAA occurs on chromosome 1 of the genome assembly GRCh38.
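To make the kind of result concrete, the sketch below locates a motif in a short, made-up sequence and returns one (start, end) pair per occurrence. It is a local illustration only: the function name motif_positions is hypothetical, and whether the server reports overlapping occurrences, ignores case, or searches both strands is not stated here, so this sketch simply reports every forward-strand, case-sensitive match, including overlapping ones.

```python
import re


def motif_positions(sequence, motif):
    """Return (start, end) for every occurrence, including overlapping ones."""
    # A zero-width lookahead lets matches begin inside a previous match.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [(m.start(), m.start() + len(motif))
            for m in pattern.finditer(sequence)]


sequence = "GGTATAAACCTATAATATAA"  # made-up example sequence
print(motif_positions(sequence, "TATAA"))
```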

We use the select_experiments command to select the genomic regions of the experiments that lie on chromosome 1, between positions 0 and 50,000,000.

The intersection command keeps all regions of the query_id that overlap with at least one tataa_regions region.

We use the get_regions command with the query_id returned by the select_experiments and the desired file columns. In this case, we use the meta-column @SEQUENCE, which includes the DNA sequence in the genomic region output.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Counting motifs in the regions

We use the select_experiments command to select an experiment of interest.

We use the @COUNT.MOTIF() meta-column to count how many times the given motif appears in the region.

We use the get_regions command with the query_id returned by the select_experiments and the columns and meta-columns defined in the previous line.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Genes

We use the select_genes command to select the gene RP11-34P13 from the gene set gencode v23.

The selected genes behave like regular genomic regions; for example, they can be filtered by their content.

We use the @GENE_ATTRIBUTE meta-column to keep only the genomic regions that are lincRNA.

We use the get_regions command with the resulting query_id and the desired file columns.

The get_regions command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.

Aggregate and summarize regions

We use the get_regions command with the query_id value returned by the select_experiments and the desired file columns.

We use the select_annotations command to select the genomic regions, from position 0 to 50,000,000 on chromosome 1, of the annotation CpG Islands of the genome assembly GRCh38.

The aggregate command summarizes the query_id regions using the cpg_islands regions as boundaries.

The aggregate values can be accessed through the @AGG meta-columns.
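What aggregation by boundaries means can be sketched locally. The example below assigns to each boundary the regions that overlap it and computes a count and a mean, stored under keys named after the @AGG.COUNT and @AGG.MEAN meta-columns. It is an illustration under assumptions: the server may use a different membership rule than simple overlap, the coordinates are taken as half-open, and the function name and sample data are made up.

```python
def aggregate_by_boundaries(regions, boundaries):
    """For each boundary, summarize the values of the regions overlapping it."""
    rows = []
    for chrom, start, end in boundaries:
        values = [value for (c, s, e, value) in regions
                  if c == chrom and s < end and start < e]
        count = len(values)
        mean = sum(values) / float(count) if count else None
        rows.append({"CHROMOSOME": chrom, "START": start, "END": end,
                     "@AGG.COUNT": count, "@AGG.MEAN": mean})
    return rows


# Made-up regions: (chromosome, start, end, value).
regions = [("chr1", 100, 200, 4.0), ("chr1", 150, 250, 8.0),
           ("chr1", 900, 950, 1.0)]
cpg_islands = [("chr1", 0, 300), ("chr1", 400, 800)]

for row in aggregate_by_boundaries(regions, cpg_islands):
    print(row)
```

The output has one row per boundary, not per input region, which is why a boundary without any overlapping region still appears with a count of 0.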

Tiling regions

We use the get_regions command with the query_id value returned by the select_experiments and the desired file columns.

We use the tiling_regions command to generate a set of consecutive genomic regions of size 100,000 on chromosome 1 of the genome assembly GRCh38.
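The shape of such a tiling can be reproduced locally. The sketch below generates consecutive, non-overlapping tiles of a fixed size; the function name tile_chromosome and the chromosome length used in the example are made up, and clipping the last tile at the chromosome end is an assumption about how the final partial tile is handled.

```python
def tile_chromosome(size, chromosome, chromosome_length):
    """Return consecutive (chromosome, start, end) tiles of the given size."""
    tiles = []
    for start in range(0, chromosome_length, size):
        end = min(start + size, chromosome_length)  # clip the last tile
        tiles.append((chromosome, start, end))
    return tiles


# A deliberately short, made-up chromosome length keeps the output readable.
for tile in tile_chromosome(100000, "chr1", 250000):
    print(tile)
```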

The aggregate command summarizes the query_id regions by their column named VALUE, using the cpg_islands regions as boundaries.

The aggregation values can be accessed through the @AGG meta-columns.

Flanking regions

We use the select_genes command to generate a set of genes from the gene set gencode v19.

The flank command derives flanking regions from the existing regions. First, we generate regions that start 2,500 base pairs before each region and have a length of 2,000 base pairs. Then, we generate regions that start 1,500 base pairs after the end of each region and have a length of 500 base pairs. In both cases, the strand of the region is taken into account.

The merge_queries command merges the region sets defined by the query IDs. We merge the two flanking region sets with the genes' region set.

We use the get_regions command with the query_id returned by merge_queries.
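The coordinate arithmetic behind these two flanks can be worked through locally. The sketch below is one reading of the paragraph above, not the server's implementation: a negative offset is measured backwards from the region start, a positive offset forwards from the region end, and on the minus strand both directions are mirrored. The function name flank_region, the mirroring rule, and the sample gene coordinates are assumptions.

```python
def flank_region(region, offset, length, use_strand=True):
    """Return the flanking region for a (chromosome, start, end, strand) region."""
    chrom, start, end, strand = region
    if use_strand and strand == "-":
        # Mirrored: "before" lies past the end, "after" lies before the start.
        anchor = end if offset < 0 else start
        new_end = anchor - offset
        return (chrom, new_end - length, new_end, strand)
    anchor = start if offset < 0 else end
    new_start = anchor + offset
    return (chrom, new_start, new_start + length, strand)


gene = ("chr1", 10000, 12000, "+")  # made-up gene coordinates
print(flank_region(gene, -2500, 2000))  # starts 2,500 bp before the gene
print(flank_region(gene, 1500, 500))    # starts 1,500 bp after the gene end
```

For the plus-strand gene above, the first flank covers 7500 to 9500 and the second covers 13500 to 14000.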

Calculated columns

We use the get_regions command with the query_id value returned by the select_experiments and the desired file columns.

We use the select_annotations command to select the genomic regions, from position 0 to 50,000,000 on chromosome 1, of the annotation CpG Islands of the genome assembly GRCh38.

The aggregate command summarizes the query_id regions by their column named VALUE, using the cpg_islands regions as boundaries.

We keep only the aggregated regions that contain at least one region from the selected experiments (@AGG.COUNT > 0).

The aggregation values can be accessed through the @AGG meta-columns.

We use the get_regions command with the query_id value returned by the select_experiments and the desired file columns. We use the @CALCULATED meta-column to transform the @AGG.MEAN value of each aggregated region to its logarithm.

Score matrix

The experiments list contains the names of the experiments for which we want to build a score matrix.

We will build the score matrix using the column named VALUE.

We select the CpG islands, which will be used as the boundaries of the aggregated regions.

The score_matrix command receives the dictionary with the experiment names and the columns that will be used for aggregation, the region boundaries, and the operation that will be performed (min, max, mean, var, sd, median, count).
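Preparing the inputs can be sketched as follows. The helper builds the dictionary that maps each experiment name to the column to aggregate and checks the operation against the list given above. The helper name score_matrix_arguments and the experiment names are hypothetical, and the sketch deliberately stops before the call itself, since the exact argument order of score_matrix is not stated here.

```python
SUPPORTED_OPERATIONS = ("min", "max", "mean", "var", "sd", "median", "count")


def score_matrix_arguments(experiments, column, operation):
    """Build the experiment-to-column dictionary and validate the operation."""
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError("unsupported operation: {0}".format(operation))
    experiments_columns = dict((name, column) for name in experiments)
    return experiments_columns, operation


# Made-up experiment names, used only to show the resulting structure.
experiments = ["exp_a", "exp_b"]
experiments_columns, operation = score_matrix_arguments(
    experiments, "VALUE", "mean")
print(experiments_columns, operation)
```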

The score_matrix command is asynchronous. It means that the user receives a request_id and should use the info command to check the status of this request.

The processing is over when the request_status value is done or failed.

The request data is retrieved using the get_request_data command.