Releases: monarch-initiative/ontogpt
v0.3.8
Highlights
Try out ontogpt list-templates to see all the available extraction templates, including the new data_sheets_schema and updated clinical dietitian notes schema.
What's Changed
- Updates for dietitian notes extraction by @caufieldjh in #313
- Add data_sheets_schema template by @caufieldjh in #310
- Add further context to dietitian notes schema by @caufieldjh in #319
- Add list-templates command by @caufieldjh in #326
- Fix for errors on OWL write by @caufieldjh in #327
- Update prefixes defined in schemas by @caufieldjh in #328
- Version changes for v0.3.8 by @caufieldjh in #329
Full Changelog: v0.3.7...v0.3.8
v0.3.7
What's Changed
- Documentation updates by @caufieldjh in #302
- First draft of dietician notes extraction schema by @caufieldjh in #288
- Dentin dysplasia by @sierra-moxon in #187
- Testing figures template. by @balhoff in #171
- Start on kidney template by @justaddcoffee in #150
- Try extraction to nmdc-schema by @caufieldjh in #148
- Rebuild all templates by @caufieldjh in #303
- Enable running extractions on all files in a directory by @caufieldjh in #304
- Remove TALISMAN functions by @caufieldjh in #309
- Restructure SPIRESEngine to accomodate abstract model by @caufieldjh in #306
Full Changelog: v0.3.6...v0.3.7
v0.3.6
What's Changed
- Make introspection pydantic v2 compatible, fixes #294 by @cmungall in #295
- Add three simple extraction templates by @caufieldjh in #296
- Create draft template for extracting metadata to Datasheets by @caufieldjh in #231
- Use correct pydantic-version parameter with gen-pydantic by @caufieldjh in #299
- Dependency updates; bump version to 0.3.6 by @caufieldjh in #300
Full Changelog: v0.3.5...v0.3.6
v0.3.5
Minor updates plus new templates for Gene Ontology term extraction (go_terms and go_terms_relational).
What's Changed
- Add section to docs on starting with OWL by @caufieldjh in #274
- Expand docs re: OWL templating by @caufieldjh in #286
- Added links to a tutorial and a presentation about OntoGPT and CurateGPT by @nlharris in #285
- Proposing main OntoGPT logo by @caufieldjh in #275
- Add GO terms extraction templates by @caufieldjh in #292
- Bump version and update lockfile by @caufieldjh in #293
Full Changelog: v0.3.4...v0.3.5
v0.3.4
This release includes a variety of bugfixes and new options for running evaluations.
What's Changed
- Bump project version to 0.3.3 by @caufieldjh in #218
- Name params in call to complete by @caufieldjh in #225
- Fixes for the BC5CDR evaluation by @caufieldjh in #226
- Catch requests errors when call to NodeNormalizer endpoint fails by @caufieldjh in #234
- Reduce redundancy in named entity list in evals by @caufieldjh in #236
- Update documentation by @caufieldjh in #220
- Repair clinical note generation and basic completion in CLI by @caufieldjh in #239
- Parameter mis-usage fixes for CLI, OpenAI client, SPIRES engine, and gene set utils by @caufieldjh in #241
- Add medical action extraction template by @caufieldjh in #208
- Bugfixes for Wikipedia extract and search commands by @caufieldjh in #242
- Add a template specifically for extracting human phenotypes by @caufieldjh in #214
- Add parameter to disable chunking in SPIRES evals by @caufieldjh in #245
- fix typo by @andrewsu in #247
- Add model parameter to evaluation engine by @caufieldjh in #249
- Fix #250 by @caufieldjh in #251
- Add NER eval for BC5CDR by @caufieldjh in #252
- Fix #253 by @caufieldjh in #254
- Add gpt-4-1106-preview by @caufieldjh in #258
- Add two more items to troubleshooting doc by @caufieldjh in #263
- Add MAXO annotation evaluation by @caufieldjh in #259
- Update dependencies for OpenAI API client and Pydantic compatibility and GPT4All by @caufieldjh in #271
Full Changelog: v0.3.3...v0.3.4
v0.3.3
This version includes a bugfix, an updated OpenAI model list, and more clearly delineated documents in YAML output. See more details below!
What's Changed
- Replace deprecated OpenAI models by @caufieldjh in #210
- Tidy up the CLI's write_extraction function by @caufieldjh in #213
- Fix for #215 by @caufieldjh in #216
Full Changelog: v0.3.2...v0.3.3
v0.3.2
This release primarily concerns bugfixes. Thanks to all users who have provided feedback!
What's Changed
- Add option to show prompt by @caufieldjh in #183
- Quick fix for type mismatch in embed command by @caufieldjh in #185
- Updates for pydantic 2 compatibility by @caufieldjh in #189
- Repairs for incorrect model specification by @caufieldjh in #192
- Add SPIRES logo by @caufieldjh in #196
- Improve intro based on more up-to-date main/docs/index.md by @nlharris in #194
- Fixed bug where named arguments did not match up in recurse function. by @cmungall in #198
- adding more prompts and example text to improve results by @diatomsRcool in #116
- Address errors in using gene requests cache by @caufieldjh in #203
- Make kgx tsv by @hrshdhgd in #149
- Fix #204 and the remainder of the HPOA evaluation by @caufieldjh in #207
- Run mypy through tox and address type check errors by @caufieldjh in #202
Full Changelog: v0.3.1...v0.3.2
v0.3.1
Highlights
Access to open models through the llm package
llm provides easy access to LLMs from OpenAI and beyond, including the GPT4All set of open models.
You may now specify one of these models by using the -m or --model option with most commands.
When calling a model for the first time, llm will download a local copy.
Example:
ontogpt extract -t mendelian_disease.MendelianDisease -i tests/input/cases/mendelian-disease-cmt2e.txt -m nous-hermes-13b
Or extract from PubMed abstracts:
ontogpt pubmed-annotate -t drug "propranolol mode of action" --model nous-hermes-13b --limit 5
Or generate clinical case report text:
ontogpt clinical-notes -d "patient with chronic muscle pain and hypoplastic toenails" --sections "Past Medical History" -m nous-hermes-13b
See the full list of model options with
ontogpt list-models
Updated dependency requirements
OntoGPT should now be compatible with Pydantic versions both below 2 and at or above 2. Many of these changes happened upstream within the broader LinkML ecosystem.
What's Changed
- Implement llm api by @caufieldjh in #167
- Update documentation by @caufieldjh in #170
- Fix #174 - add missing CLI parameters by @caufieldjh in #175
- A fix for encountering missing PubmedData fields by @caufieldjh in #178
- Dependency updates for llm (and pydantic compatibility, in particular) by @caufieldjh in #182
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Highlights
Generate-and-Extract Command
This release adds a new command, generate-extract, that composes two operations:
- generate a natural language description
- parse the NL description using SPIRES
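The two-step composition above can be sketched in a few lines of Python. This is only an illustration of the idea, not OntoGPT's implementation: `generate` and `extract` are hypothetical callables standing in for the LLM call and the SPIRES parse, and the toy versions below do no real language modelling.

```python
from typing import Any, Callable, Dict


def generate_extract(
    term: str,
    generate: Callable[[str], str],
    extract: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Compose the two steps: describe the term, then parse the description."""
    description = generate(term)   # step 1: natural language description
    return extract(description)    # step 2: structured extraction


# Hypothetical stand-ins so the sketch runs without any model or API key.
def toy_generate(term: str) -> str:
    return f"{term} is a kind of epithelial cell."


def toy_extract(text: str) -> Dict[str, Any]:
    subject, _, parent = text.rstrip(".").partition(" is a kind of ")
    return {"cell_type": subject, "parents": [parent]}


result = generate_extract("Acinar Cell Of Salivary Gland", toy_generate, toy_extract)
print(result)
```

Keeping the two steps as separate callables means either one can be swapped (a different model, a different template) without touching the other.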
Cell Type Use Case
(This use case is based on a conversation with @dosumis.)
For example, given a cell type such as Acinar Cell Of Salivary Gland, generate a description using GPT describing many aspects of the cell type, from its marker genes through to its function and the diseases it is implicated in.
After that, use the cell-type schema (https://w3id.org/ontogpt/cell_type) to extract this into structured form. As an optional next step, use linkml-owl to generate OWL TBox axioms.
Iterative generate-extract
The command can be executed in iterative mode: this traverses the extracted subtypes with each iteration, gradually building up an ontology that is entirely generated from the "latent knowledge" in the LLM.
Here is a screenshot of an ontology generated entirely using OntoGPT by traversing from "Interneuron" downwards:
There are many oddities about it. Currently each iteration is independent, so it has no way of knowing whether it has already made a concept, but it is an interesting proof of principle. The ugly pct-encoded labels indicate cases where it couldn't match to an existing concept in CL or another ontology, and may represent KB gaps to be filled.
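The iterative traversal can be pictured as a breadth-first walk over extracted subtypes. The sketch below is a rough illustration under stated assumptions, not the command's actual code: `generate_extract_fn` is a hypothetical callable returning a dict with a `subtypes` list, the hierarchy is hand-written toy data rather than LLM output, and the `seen` set is an addition that the release described here does not have (each of its iterations is independent).

```python
from collections import deque
from typing import Any, Callable, Dict, List

# Toy hierarchy used in place of real LLM output (assumption for illustration).
TOY_SUBTYPES: Dict[str, List[str]] = {
    "Interneuron": ["Basket cell", "Chandelier cell"],
    "Basket cell": ["Large basket cell"],
}


def toy_generate_extract(term: str) -> Dict[str, Any]:
    """Hypothetical stand-in for one generate-extract call."""
    return {"cell_type": term, "subtypes": TOY_SUBTYPES.get(term, [])}


def iterate_generate_extract(
    root: str,
    generate_extract_fn: Callable[[str], Dict[str, Any]],
    max_iterations: int = 10,
) -> Dict[str, Dict[str, Any]]:
    """Walk downwards from root, running one extraction per term."""
    ontology: Dict[str, Dict[str, Any]] = {}
    queue = deque([root])
    seen = {root}  # not in the released behaviour; avoids revisiting a term
    while queue and len(ontology) < max_iterations:
        term = queue.popleft()
        result = generate_extract_fn(term)
        ontology[term] = result
        for subtype in result.get("subtypes", []):
            if subtype not in seen:
                seen.add(subtype)
                queue.append(subtype)
    return ontology


ontology = iterate_generate_extract("Interneuron", toy_generate_extract)
print(list(ontology))
```

The `max_iterations` cap matters in practice because every visited term costs one model call.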
More thoughts here: cell type summaries
What's Changed
- Playing around: adding a phenotype extractor by @matentzn in #14
- add unit test to makefile by @cmungall in #16
- Linted and minor flake8 edits by @hrshdhgd in #15
- Add linter to workflow by @hrshdhgd in #17
- Improve dependencies, add a web optional by @vemonet in #21
- add recipe for test by @sierra-moxon in #23
- added pad krapow recipe by @justaddcoffee in #25
- Add recipe URL by @pkalita-lbl in #24
- Add Walforf Salad URL by @caufieldjh in #26
- Add rajma pulao to recipe-urls.csv by @turbomam in #27
- Adding gene set enrichment by @cmungall in #30
- enrichment by @cmungall in #31
- README updates; add project.Makefile by @caufieldjh in #32
- allow use of different models, entailing different API endpoints. extending enrichment comparison. by @cmungall in #34
- Add CITATION and version updater by @caufieldjh in #35
- Ingest and extract things from literature about inflammatory bowel disease by @justaddcoffee in #36
- eval enrich by @cmungall in #37
- Create dental-restoration-material-composite-polymer-1.txt by @wdduncan in #53
- Create dental-restoration-material-composite-resin-1.txt by @wdduncan in #52
- Create dental-restoration-material-ceramic-composite-1.txt by @wdduncan in #51
- Create dental-restoration-material-ceramic-composite-resin-1.txt by @wdduncan in #50
- Create dental-restoration-material-ceramic-composite-polymer-2.txt by @wdduncan in #49
- Create dental-restoration-material-ceramic-composite-polymer-1.txt by @wdduncan in #48
- Create dental-restoration-material-ceramic-composite-polymer-resin-2.txt by @wdduncan in #47
- Create dental-restoration-material-ceramic-composite-polymer-resin-1.txt by @wdduncan in #46
- Create dental-restoration-material-composite-2.txt by @wdduncan in #45
- Create dental-restoration-material-composite-1.txt by @wdduncan in #44
- Create dental-restoration-material-polymer-1.txt by @wdduncan in #42
- Create dental-restoration-material-resin-2.txt by @wdduncan in #41
- Create dental-restoration-material-ceramic-2.txt by @wdduncan in #39
- Create dental-restoration-material-ceramic-1.txt by @wdduncan in #38
- Create dental-restoration-material-resin-1.txt by @wdduncan in #40
- Create dental-restoration-material-polymer-2.txt by @wdduncan in #43
- similarity by @cmungall in #57
- Add option to provide path to input file by @caufieldjh in #56
- Bicluster enrichment by @realmarcin in #62
- Added command and code for computing euclidian distances between embeddings by @justaddcoffee in #58
- Flake8 fixes + lint by @hrshdhgd in #63
- enrichment changes by @cmungall in #65
- Missed parenthesis for random.SystemRamdom() by @hrshdhgd in #67
- Change citation updater in Makefile to get_version by @caufieldjh in #68
- Raise FileNotFoundError if filepath for extract is missing by @caufieldjh in #72
- Makefile uses all templates by @caufieldjh in #69
- interactive-mode by @cmungall in #71
- Added command to generate mock clinical notes by @justaddcoffee in #74
- Bump version of oaklib by @cmungall in #73
- msigdb hallmark gene sets by @realmarcin in #78
- use prompts for enrichment by @cmungall in #80
- fixing gene sets and updating analysis by @cmungall in #81
- Adding schema for ontology issues in github. refactor enrichment by @cmungall in #83
- p-value templates with edited end markers to run multiple independent… by @realmarcin in #82
- Add diagnostic_procedure template by @caufieldjh in #29
- Fix for web-ontogpt not working on new install by @caufieldjh in #85
- geneweaver format by @cmungall in #89
- re-ran notebook by @cmungall in #90
- re-ran notebooks for enrichGPT by @cmungall in #95
- Update documentation by @caufieldjh in #92
- Autogenerate docs by @caufieldjh in #98
- Fix for doc generation by @caufieldjh in #100
- Added streamlit app for spindoctor by @cmungall in #101
- Study class as tree root in environment_sample template by @sujaypatil96 in #104
- update sections in README by @sujaypatil96 in #105
- Fixed a bug where the 'skip_annotators' option was being ignored by @daikiad in #108
- first trait commits by @cmungall in #109
- first draft of biotic interaction template by @diatomsRcool in #107
- One more fix for biotic interaction template by @caufieldjh in #113
- Adding a GPT-based reasoner, for evaluation purposes. by @cmungall in #112
- more prompt language and adding ENVTHES ontology by @realmarcin in #118
- Add general framework for specifying models by name and source by @caufieldjh in #99
- Adding a MappingEngine by @cmungall in #121
- removing importlib dependency by @cmungall in #122
- very small typo fix by @PR0CK0 in https://github.c...
v0.2.11
Highlights
Generate-Extract
ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"
This does two things
- asks GPT to generate a summary of the cell type
- parses/extracts knowledge from that cell type
This resuscitates the original HALO idea. We could in principle directly generate an entire knowledgebase in structured form from the latent GPT KB.
Example output:
extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
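In the output above, identifiers with the AUTO prefix are percent-encoded strings rather than ontology terms. A small post-processing sketch, using only the Python standard library, can separate them from the matched identifiers and recover readable labels. The helper name `split_ids` is made up for this example and is not part of OntoGPT.

```python
from typing import List, Tuple
from urllib.parse import unquote


def split_ids(ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (matched CURIEs, decoded labels of unmatched AUTO: entries)."""
    grounded: List[str] = []
    ungrounded: List[str] = []
    for curie in ids:
        prefix, _, local = curie.partition(":")
        if prefix == "AUTO":
            # AUTO: ids carry the percent-encoded text of the unmatched term
            ungrounded.append(unquote(local))
        else:
            grounded.append(curie)
    return grounded, ungrounded


diseases = ["AUTO:Sj%C3%B6gren%27s%20syndrome", "MONDO:0021357"]
grounded, ungrounded = split_ids(diseases)
print(grounded)    # ['MONDO:0021357']
print(ungrounded)  # ["Sjögren's syndrome"]
```

Listing the decoded AUTO entries is one way to review candidate terms that had no match in the target ontologies.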
Cell Type Templates
This PR also demonstrates using subclasses for more refined subtypes
Compare the two:
ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields.
Example output:
extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary
    Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
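The relationship between the generic class and the interneuron subclass can be pictured with plain Python dataclasses. This is an analogy only: the real templates are LinkML schemas, the base class name `CellTypeDocument` is an assumption, and the field list is abbreviated from the example output above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CellTypeDocument:
    """Generic base class (name assumed for illustration)."""
    cell_type: str
    parents: List[str] = field(default_factory=list)
    subtypes: List[str] = field(default_factory=list)
    localizations: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)


@dataclass
class InterneuronDocument(CellTypeDocument):
    """Subclass adding one extra slot for projection fields."""
    projects_to_or_from: List[str] = field(default_factory=list)


doc = InterneuronDocument(
    cell_type="L2/3 Intratelencephalic Projecting Glutamatergic Neuron",
    projects_to_or_from=["UBERON:0001893"],
)
print(isinstance(doc, CellTypeDocument), doc.projects_to_or_from)
```

The subclass inherits every slot of the base class, so the only difference in what gets extracted is the additional `projects_to_or_from` slot.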
Full Changelog: v0.2.10...v0.2.11