Utility scripts for Deblur #119
base: master
              help="Output OTU summary (.tsv)")

def make_otu_summary(input_biom_fp, output_summary_fp):
    """Summarize distribution information about each OTU (sequnece) in a Deblur
sequnece -> sequence
Thanks! Fixed.
On Dec 6, 2016, at 1:25 PM, Jamie Morton wrote:
@mortonjt commented on this pull request.
In scripts/summarize_otu_distributions.py:
> +import pandas as pd
+import numpy as np
+import biom
+
+@click.command()
+@click.option('--input_biom_fp', '-i', required=True,
+ type=click.Path(resolve_path=True, readable=True, exists=True,
+ file_okay=True),
+ help="Input rarefied OTU table (.biom)")
+@click.option('--output_summary_fp', '-o', required=True,
+ type=click.Path(resolve_path=True, readable=True, exists=False,
+ file_okay=True),
+ help="Output OTU summary (.tsv)")
+
+def make_otu_summary(input_biom_fp, output_summary_fp):
+ """Summarize distribution information about each OTU (sequnece) in a Deblur
sequnece -> sequence
                              file_okay=True),
              help="Output OTU summary (.tsv)")

def make_otu_summary(input_biom_fp, output_summary_fp):
wouldn't it make sense for this to be part of biom?
    samples = table.ids(axis='sample')
    otus = table.ids(axis='observation')
    for idx, cdat in enumerate(table.iter_data(axis='observation')):
        otu_total_obs[otus[idx]] = np.sum(cdat)
This for loop can be replaced with calls to Table.sum
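A minimal sketch of the suggested change. To keep it runnable without biom installed, a plain NumPy array (rows are OTUs, columns are samples) stands in for the biom table; the OTU IDs and counts are made up. With an actual biom `Table`, `table.sum(axis='observation')` returns the per-OTU totals in the same order as `table.ids(axis='observation')`, which is what replaces the loop here.

```python
import numpy as np

# Hypothetical dense OTU table: rows are OTUs (observations), columns are samples.
otus = np.array(['seqA', 'seqB', 'seqC'])
data = np.array([[5, 0, 3],
                 [0, 0, 7],
                 [1, 2, 0]])

# Loop form, mirroring the code under review.
otu_total_obs_loop = {}
for idx, cdat in enumerate(data):
    otu_total_obs_loop[otus[idx]] = np.sum(cdat)

# Vectorized form: a single reduction across the sample axis.
otu_total_obs = dict(zip(otus, data.sum(axis=1)))

# Number of samples in which each OTU is observed (count of nonzero entries).
otu_num_samples = dict(zip(otus, (data > 0).sum(axis=1)))

assert otu_total_obs == otu_total_obs_loop
```

The same pattern covers the per-OTU sample counts: a presence/absence comparison followed by one sum, instead of another loop.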
Yeah, I was thinking that also. I think the only part that is specific to Deblur is calling the OTU identifier a "sequence" in the column header and maybe a few other places. That can easily be changed. Should I issue a PR to biom?
On Dec 6, 2016, at 5:53 PM, Daniel McDonald wrote:
@wasade commented on this pull request.
In scripts/summarize_otu_distributions.py:
> +
+ Input biom table must be rarefied for results to be meaningful."""
+
+ # Read OTU table (must be rarefied)
+ table = biom.load_table(input_biom_fp)
+ num_samples = len(table.ids(axis='sample'))
+
+ # Get arrays of sample IDs and OTUs (sequences), dicts per OTU of total
+ # observations, number of samples, list of samples, and taxonomy
+ otu_total_obs = {}
+ otu_num_samples = {}
+ otu_list_samples = {}
+ samples = table.ids(axis='sample')
+ otus = table.ids(axis='observation')
+ for idx, cdat in enumerate(table.iter_data(axis='observation')):
+ otu_total_obs[otus[idx]] = np.sum(cdat)
this for loop can be replaced with calls to Table.sum
Sure. Deblur could later import the object and revise the column name afterward.
I have deleted For
I think 18S is working now, so we should be good. After positive filtering with Silva 18S, the top three 5' tetramers are GCTA, GCTC, and ACAC, which are found in slightly more than 50% of OTUs.
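The tetramer tally described above can be sketched in a few lines. This is a hypothetical helper (`top_five_prime_tetramers` is not part of the PR) that assumes OTU sequences are plain strings written 5' to 3'. It reads "found in X% of OTUs" as the fraction of sequences starting with any of the top k tetramers, which is one possible interpretation. The example sequences are invented.

```python
from collections import Counter


def top_five_prime_tetramers(seqs, k=3):
    """Return the k most common 5' tetramers and the fraction of sequences
    that start with any of them."""
    # Only sequences at least 4 nt long have a 5' tetramer.
    prefixes = [s[:4].upper() for s in seqs if len(s) >= 4]
    counts = Counter(prefixes)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    fraction = covered / len(prefixes) if prefixes else 0.0
    return top, fraction


# Invented example sequences, for illustration only.
seqs = ["GCTAAGT", "GCTACCA", "GCTAGGG", "GCTCTTA",
        "GCTCAAA", "ACACGGA", "ACACTTT", "TTTTAAA"]
top, fraction = top_five_prime_tetramers(seqs)
```

Here `top` holds GCTA (3), GCTC (2), and ACAC (2), and `fraction` is 7/8.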
Going through old PRs. I think it would be fantastic to get this merged, but in my opinion the following few items would be great if possible:
These all sound reasonable. Could someone help me with the unit tests? I'm a noob. Daniel, maybe I can bribe you with beer?
On Oct 16, 2017, at 6:14 PM, Daniel McDonald wrote:
Going through old PRs. I think it would be fantastic to get this merged, but in my opinion the following few items would be great if possible:
- shift the primary logic to library code, and wrap with unit tests
- parameterize the amplicon to detect. It would be great if a few more tetramers were available
- for the provided tetramers, would it be possible to have citable material (either publications or analysis) which provides support for reporting the amplicon type?
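The first two suggestions could look roughly like the sketch below: a library function that takes the tetramer-to-amplicon mapping as a parameter, plus `unittest` tests around it. Everything here is hypothetical, including the function name `guess_amplicon_type`, the `min_fraction=0.5` default, and the "pick the label with the best coverage" rule. Only the 18S tetramers come from this thread. Tetramers for 16S, ITS, or other amplicons would be supplied by the caller once they have citable support.

```python
import unittest
from collections import Counter


def guess_amplicon_type(seqs, tetramers_by_amplicon, min_fraction=0.5):
    """Guess the amplicon type of a set of sequences from their 5' tetramers.

    tetramers_by_amplicon maps an amplicon label (e.g. '18S') to the set of
    5' tetramers taken as evidence for it. Returns the label whose tetramers
    cover the largest fraction of sequences, or 'unknown' if no label
    reaches min_fraction (a hypothetical default threshold).
    """
    prefixes = [s[:4].upper() for s in seqs if len(s) >= 4]
    if not prefixes:
        return 'unknown'
    counts = Counter(prefixes)
    best_label, best_fraction = 'unknown', 0.0
    for label, tetramers in tetramers_by_amplicon.items():
        fraction = sum(counts[t] for t in tetramers) / len(prefixes)
        if fraction > best_fraction:
            best_label, best_fraction = label, fraction
    return best_label if best_fraction >= min_fraction else 'unknown'


# Illustrative mapping only: the 18S tetramers are the ones quoted in this
# thread; other amplicons would be added by the caller.
EXAMPLE_TETRAMERS = {'18S': {'GCTA', 'GCTC', 'ACAC'}}


class GuessAmpliconTypeTests(unittest.TestCase):
    def test_detects_18s(self):
        seqs = ['GCTAAGT', 'GCTCTTA', 'ACACGGA', 'TTTTAAA']
        self.assertEqual(guess_amplicon_type(seqs, EXAMPLE_TETRAMERS), '18S')

    def test_unknown_below_threshold(self):
        seqs = ['TTTTAAA', 'CCCCGGG', 'GCTAAGT']
        self.assertEqual(guess_amplicon_type(seqs, EXAMPLE_TETRAMERS),
                         'unknown')

    def test_empty_input(self):
        self.assertEqual(guess_amplicon_type([], EXAMPLE_TETRAMERS), 'unknown')
```

Saved in a module, the tests would run with `python -m unittest`. Keeping the mapping as an argument means adding an amplicon only requires new data, not a code change.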
Two scripts are included here:
verify_amplicon_type.py
-- uses the first 4 nt of each OTU sequence (its 5' tetramer) to guess which kind of amplicon the sequences come from. Currently supports 16S, 18S, and ITS. While all sequences in a given study, or from the same primer set, should start at the same place, it is helpful to be able to check that all the studies in a meta-analysis start at the same 5' position. Additionally, one might have some sequences of unknown origin; this script helps identify what they are without resorting to BLAST or similar tools.
summarize_otu_distributions.py
-- gives, for each OTU in a biom table, the number, fraction, and rank of samples in which the OTU is found, and the abundance, fraction, and rank of observations represented by that OTU. It also reports the taxonomy and a list of all the samples the OTU is found in. Importantly, the script expects a rarefied biom table as input. This code was developed for the 'OTU sequence lookup' effort and is mostly a wrapper around some biom commands, but I think it should have general utility for Deblur users.
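A minimal pandas sketch of the per-OTU summary described for summarize_otu_distributions.py. It assumes the rarefied table is already a DataFrame with OTUs (sequences) as rows and samples as columns. The function name, the column names, and the choice of "min" ranking for ties are all hypothetical, and taxonomy is omitted. With a biom Table, recent biom-format releases offer `Table.to_dataframe()` to get such a frame; check your installed version.

```python
import pandas as pd


def summarize_otus(counts):
    """Per-OTU distribution summary for a rarefied table.

    counts: DataFrame with OTUs (sequences) as rows and samples as columns.
    """
    present = counts > 0
    num_samples = present.sum(axis=1)
    total_obs = counts.sum(axis=1)
    summary = pd.DataFrame({
        'num_samples': num_samples,
        'frac_samples': num_samples / counts.shape[1],
        # Rank 1 = found in the most samples; ties share the lowest rank.
        'rank_samples': num_samples.rank(ascending=False, method='min'),
        'total_obs': total_obs,
        'frac_obs': total_obs / total_obs.sum(),
        'rank_obs': total_obs.rank(ascending=False, method='min'),
        # Comma-separated list of the samples each OTU is present in.
        'samples': present.apply(lambda row: ','.join(row[row].index), axis=1),
    })
    return summary.sort_values('rank_obs', kind='mergesort')


# Invented toy table: three OTUs across three samples.
counts = pd.DataFrame({'S1': [5, 0, 1], 'S2': [0, 0, 2], 'S3': [3, 7, 0]},
                      index=['seqA', 'seqB', 'seqC'])
summary = summarize_otus(counts)
```

Writing the result with `summary.to_csv(output_summary_fp, sep='\t')` would give a .tsv like the one the script's `--output_summary_fp` option describes.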