
Releases: monarch-initiative/ontogpt

v0.3.8

08 Feb 21:24
fc9a243

Highlights

Try out ontogpt list-templates to see all the available extraction templates, including the new data_sheets_schema and the updated clinical dietitian notes schema.

What's Changed

Full Changelog: v0.3.7...v0.3.8

v0.3.7

19 Jan 16:00

What's Changed

New Contributors

Full Changelog: v0.3.6...v0.3.7

v0.3.6

20 Dec 21:47
200ced3

What's Changed

Full Changelog: v0.3.5...v0.3.6

v0.3.5

14 Dec 19:31
160cc37

Minor updates plus new templates for Gene Ontology term extraction (go_terms and go_terms_relational).

What's Changed

Full Changelog: v0.3.4...v0.3.5

v0.3.4

21 Nov 19:49

This release includes a variety of bugfixes and new options for running evaluations.

What's Changed

New Contributors

Full Changelog: v0.3.3...v0.3.4

v0.3.3

25 Sep 17:09
1125d83

This version includes a bugfix, an updated OpenAI model list, and more clearly delineated documents in YAML output. See more details below!

What's Changed

Full Changelog: v0.3.2...v0.3.3

v0.3.2

19 Sep 19:40
ba40fee

This release primarily concerns bugfixes. Thanks to all users who have provided feedback!

What's Changed

New Contributors

Full Changelog: v0.3.1...v0.3.2

v0.3.1

24 Aug 02:51
ac4c060

Highlights

Access to open models through the llm package

llm provides easy access to LLMs from OpenAI and beyond, including the GPT4All set of open models.
You may now specify one of these models by using the -m or --model option with most commands.
When calling a model for the first time, llm will download a local copy.

Example:

ontogpt extract -t mendelian_disease.MendelianDisease -i tests/input/cases/mendelian-disease-cmt2e.txt -m nous-hermes-13b

Or extract from PubMed abstracts:

ontogpt pubmed-annotate -t drug "propranolol mode of action" --model nous-hermes-13b --limit 5

Or generate clinical case report text:

ontogpt clinical-notes -d "patient with chronic muscle pain and hypoplastic toenails" --sections "Past Medical History" -m nous-hermes-13b

See the full list of model options with

ontogpt list-models
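
To call these commands from a Python script instead of the shell, a minimal sketch could assemble the ontogpt extract invocation from the example above and hand it to subprocess. It uses only the -t, -i and -m options that appear in these notes. The helper build_extract_command is hypothetical, and the sketch assumes ontogpt is installed and on your PATH (otherwise it just prints the command it would run).

```python
import shutil
import subprocess


def build_extract_command(template: str, input_path: str, model: str) -> list[str]:
    """Assemble argv for `ontogpt extract` using only the -t, -i and -m options shown above."""
    return ["ontogpt", "extract", "-t", template, "-i", input_path, "-m", model]


if __name__ == "__main__":
    cmd = build_extract_command(
        "mendelian_disease.MendelianDisease",
        "tests/input/cases/mendelian-disease-cmt2e.txt",
        "nous-hermes-13b",
    )
    # Only run the command if the ontogpt CLI is actually available on PATH.
    if shutil.which("ontogpt"):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout)
    else:
        print("ontogpt not found; would run:", " ".join(cmd))
```

Passing a list rather than a single string avoids shell quoting issues with template names and paths.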

Updated dependency requirements

OntoGPT should now be compatible with both Pydantic 1.x and Pydantic 2.x. Many of these changes happened upstream within the broader LinkML ecosystem.

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0

03 Aug 00:06
c53ed42

Highlights

Generate-and-Extract Command

This release adds a new command, generate-extract, that composes two operations:

  • generate a natural language description
  • parse the NL description using SPIRES
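
The composition can be pictured as a two-step pipeline in which the generated text becomes the input to extraction. The sketch below models only that data flow: generate_extract is a hypothetical helper, and the two stand-in functions return canned values instead of calling an LLM or SPIRES, so none of this is OntoGPT's actual API.

```python
from typing import Callable


def generate_extract(
    concept: str,
    generate: Callable[[str], str],
    extract: Callable[[str], dict],
) -> dict:
    """Compose the two steps: generate NL text for a concept, then parse it into a structured object."""
    description = generate(concept)   # step 1: natural language description
    extracted = extract(description)  # step 2: structured extraction (SPIRES in OntoGPT)
    return {"generated_text": description, "extracted_object": extracted}


# Hypothetical stand-ins; a real run would call an LLM for both steps.
def generate_description(concept: str) -> str:
    return f"{concept} is an epithelial cell found in the salivary gland."


def spires_extract(text: str) -> dict:
    return {"cell_type": text.split(" is ")[0]}


result = generate_extract("Acinar cell of a salivary gland", generate_description, spires_extract)
print(result["extracted_object"])  # {'cell_type': 'Acinar cell of a salivary gland'}
```

Keeping the two steps as separate callables means either one can be swapped (a different model, a different template) without touching the other.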

Cell Type Use Case

(This use case is based on a conversation with @dosumis.)

For example, given a cell type such as Acinar Cell Of Salivary Gland, generate a description using GPT covering many aspects of the cell type, from its marker genes through to its function and the diseases it is implicated in.

After that, use the cell-type schema (https://w3id.org/ontogpt/cell_type) to extract this description into structured form. As an optional next step, use linkml-owl to generate OWL TBox axioms.
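
linkml-owl drives its OWL mapping from schema annotations that are not shown here. Purely to illustrate what a TBox axiom derived from an extraction looks like, the hand-written sketch below turns the parents slot of an extracted cell type into OWL functional-syntax SubClassOf axioms. This is not linkml-owl's API; the to_subclass_axioms helper, the ex: prefix and the label-to-identifier naming scheme are all hypothetical.

```python
import re


def to_subclass_axioms(extracted: dict, prefix: str = "ex:") -> list[str]:
    """Emit one OWL functional-syntax SubClassOf axiom per parent CURIE."""
    # Hypothetical naming scheme: derive a local identifier from the cell type label.
    local_name = re.sub(r"\W+", "_", extracted["cell_type"]).strip("_")
    child = f"{prefix}{local_name}"
    return [f"SubClassOf({child} {parent})" for parent in extracted.get("parents", [])]


extracted_object = {"cell_type": "Acinar cell of a salivary gland", "parents": ["CL:0000066"]}
for axiom in to_subclass_axioms(extracted_object):
    print(axiom)  # SubClassOf(ex:Acinar_cell_of_a_salivary_gland CL:0000066)
```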

Iterative generate-extract

The command can be executed in iterative mode, which traverses the extracted subtypes with each iteration, gradually building up an ontology that is entirely generated from the "latent knowledge" in the LLM.
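
Iterative mode amounts to a traversal of the subtype hierarchy: each extracted subtype seeds the next generate-extract round. A minimal breadth-first sketch is below, assuming a hypothetical generate_extract(concept) callable that returns a dict with a subtypes list, plus a hypothetical max_iterations cap. Unlike the command described here, the sketch keeps a seen set so it never expands the same concept twice.

```python
from collections import deque
from typing import Callable


def iterate_generate_extract(
    root: str,
    generate_extract: Callable[[str], dict],
    max_iterations: int = 10,
) -> dict[str, list[str]]:
    """Breadth-first expansion: each extracted subtype seeds the next round."""
    hierarchy: dict[str, list[str]] = {}
    queue = deque([root])
    seen = {root}  # not part of the described command; prevents re-expanding a concept
    while queue and len(hierarchy) < max_iterations:
        concept = queue.popleft()
        subtypes = generate_extract(concept).get("subtypes", [])
        hierarchy[concept] = subtypes
        for sub in subtypes:
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return hierarchy


# Canned responses standing in for LLM calls (hypothetical data).
canned = {
    "Interneuron": {"subtypes": ["GABAergic interneuron", "cholinergic interneuron"]},
    "GABAergic interneuron": {"subtypes": ["basket cell"]},
}
tree = iterate_generate_extract("Interneuron", lambda c: canned.get(c, {}))
print(tree)
```

The iteration cap matters because an LLM can keep proposing new subtypes indefinitely.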

Here is a screenshot of an ontology generated entirely using OntoGPT by traversing from "Interneuron" downwards:


There are many oddities about it: currently each iteration is independent, so it has no way of knowing whether it has already made a concept. Still, it is an interesting proof of principle. The ugly percent-encoded labels indicate cases where it couldn't match to an existing concept in CL or another ontology, and these may represent KB gaps to be filled.
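
In the example outputs under v0.2.11, these ungrounded terms appear as AUTO: identifiers whose local part is the URL-encoded text (for instance AUTO:Sj%C3%B6gren%27s%20syndrome carries the label Sjögren's syndrome). A minimal sketch that recovers the readable text with the standard library's urllib.parse.unquote is below; it assumes the prefix is literally AUTO, and the helper name auto_label is hypothetical.

```python
from typing import Optional
from urllib.parse import unquote


def auto_label(curie: str) -> Optional[str]:
    """Return the decoded text of an AUTO: CURIE, or None if the term was grounded."""
    prefix, _, local_id = curie.partition(":")
    if prefix != "AUTO":
        return None  # grounded to an existing ontology term (e.g. CL, UBERON, MONDO)
    return unquote(local_id)  # percent-decoding, UTF-8 by default


print(auto_label("AUTO:Sj%C3%B6gren%27s%20syndrome"))  # Sjögren's syndrome
print(auto_label("MONDO:0021357"))  # None
```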

More thoughts here: cell type summaries

What's Changed


v0.2.11

31 Jul 21:59
90d3eaa

Highlights

  • Add a combined generate-extract command, fixes #158
  • Adds cell type templates, fixes #159

Generate-Extract

ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"

This does two things:

  1. asks GPT to generate a summary of the cell type
  2. parses/extracts knowledge from that summary

This resuscitates the original HALO idea: in principle, we could directly generate an entire knowledgebase in structured form from the latent GPT KB.

Example output:

extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
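
The named_entities list works as a lookup table for the IDs in extracted_object. A small sketch that resolves each list slot's IDs to labels is below. It embeds part of the output above as a Python dict (in practice you would load the YAML with a YAML parser, which is an assumption about your setup), and the helper name resolve_labels is hypothetical.

```python
def resolve_labels(output: dict) -> dict[str, list[str]]:
    """Map every ID in extracted_object list slots to its label from named_entities."""
    labels = {entity["id"]: entity["label"] for entity in output.get("named_entities", [])}
    resolved = {}
    for slot, value in output["extracted_object"].items():
        if isinstance(value, list):
            # Fall back to the raw ID if no named entity carries a label for it.
            resolved[slot] = [labels.get(item, item) for item in value]
    return resolved


output = {
    "extracted_object": {
        "cell_type": "Acinar cell of a salivary gland",
        "parents": ["CL:0000066"],
        "localizations": ["UBERON:0001044", "UBERON:0009842"],
        "diseases": ["AUTO:Sj%C3%B6gren%27s%20syndrome", "MONDO:0021357"],
    },
    "named_entities": [
        {"id": "CL:0000066", "label": "Epithelial cell"},
        {"id": "UBERON:0001044", "label": "Salivary gland"},
        {"id": "UBERON:0009842", "label": "Acinus"},
        {"id": "AUTO:Sj%C3%B6gren%27s%20syndrome", "label": "Sjögren's syndrome"},
        {"id": "MONDO:0021357", "label": "Salivary gland tumors"},
    ],
}
print(resolve_labels(output)["diseases"])  # ["Sjögren's syndrome", 'Salivary gland tumors']
```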

Cell Type Templates

This PR also demonstrates using subclasses for more refined subtypes.

Compare the two:

  1. ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
  2. ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"

The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields.
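
OntoGPT templates are LinkML schemas, so the real classes are defined there. As a rough Python analogy only, the relationship can be sketched with dataclasses: the subclass inherits every slot of the base class and adds one for projections. InterneuronDocument comes from the command above; the base class name CellTypeDocument, the slot types and the defaults are assumptions, with slot names mirroring the keys in the example outputs.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CellTypeDocument:
    """Hypothetical stand-in for the generic base class of the cell_type template."""
    cell_type: str
    parents: list[str] = field(default_factory=list)
    subtypes: list[str] = field(default_factory=list)
    localizations: list[str] = field(default_factory=list)
    genes: list[str] = field(default_factory=list)
    diseases: list[str] = field(default_factory=list)


@dataclass
class InterneuronDocument(CellTypeDocument):
    """Subclass: inherits every base slot and adds one for projection targets."""
    projects_to_or_from: list[str] = field(default_factory=list)


base_slots = {f.name for f in fields(CellTypeDocument)}
sub_slots = {f.name for f in fields(InterneuronDocument)}
print(sorted(sub_slots - base_slots))  # ['projects_to_or_from']
```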

Example output:

extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary
    Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
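
To see at a glance which values in an output like this were not grounded (the AUTO: terms that, per the v0.3.0 notes, may represent KB gaps), a short sketch can collect and decode them slot by slot. The function name ungrounded_by_slot is hypothetical, and the data is copied from the extracted_object above.

```python
from urllib.parse import unquote


def ungrounded_by_slot(extracted: dict) -> dict[str, list[str]]:
    """For each list slot, collect the decoded text of values that only got an AUTO: identifier."""
    report = {}
    for slot, values in extracted.items():
        if not isinstance(values, list):
            continue  # skip scalar slots such as cell_type
        auto = [unquote(v.split(":", 1)[1]) for v in values if v.startswith("AUTO:")]
        if auto:
            report[slot] = auto
    return report


extracted_object = {
    "cell_type": "L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary Motor Cortex",
    "range": "Not mentioned",
    "parents": ["AUTO:excitatory%20neuron"],
    "subtypes": ["AUTO:Not%20mentioned"],
    "localizations": ["UBERON:0000956", "UBERON:0001384"],
    "genes": ["AUTO:Not%20mentioned"],
    "diseases": ["MONDO:0005180", "MONDO:0020128"],
    "projects_to_or_from": ["UBERON:0001893"],
}
print(ungrounded_by_slot(extracted_object))
# {'parents': ['excitatory neuron'], 'subtypes': ['Not mentioned'], 'genes': ['Not mentioned']}
```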

What's Changed

  • Adding generate-extract command, 158. Add cell type templates #159 by @cmungall in #162

Full Changelog: v0.2.10...v0.2.11