Adding more docs on how to use LLMs with OAK (#819)
* Adding more docs on how to use LLMs with OAK * format
Showing 7 changed files with 789 additions and 470 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": [ | ||
"# LLM Tutorial\n", | ||
"\n", | ||
"This tutorial walks through using OAK via an LLM wrapper.\n", | ||
"\n", | ||
"See also [How-to guide](https://incatools.github.io/ontology-access-kit/howtos/use-llms.html).\n", | ||
"\n", | ||
"Note that for this to work, you must either install OAK with the `llm` extras or install `llm` separately,\n", | ||
"e.g. with `pipx install llm`.\n", | ||
"\n", | ||
"You will also need the API keys for an LLM service, or a proxy to a local model.\n", | ||
"\n", | ||
" " | ||
], | ||
"id": "cf2572dda785deed" | ||
}, | ||
{ | ||
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Annotate Command\n", | ||
"\n", | ||
"Note that the first time you run this it may be slow, because it needs to perform an initial embedding.\n", | ||
"\n", | ||
"Here we use the standard OAK `annotate` command, but instead of the usual adapter (e.g. `sqlite:obo:cl`), we pass in a wrapped adapter that uses the `gpt-4o` model.\n", | ||
"\n", | ||
"We strongly recommend passing in categories, as this helps the model ground the kinds of terms you are interested in." | ||
], | ||
"id": "95ff062ec749f629" | ||
}, | ||
{ | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-10-22T02:00:12.384637Z", | ||
"start_time": "2024-10-22T01:59:41.305531Z" | ||
} | ||
}, | ||
"cell_type": "code", | ||
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate \"sequencing was performed on splenic and thymic macrophages\" --category CellType \n", | ||
"id": "8044c89577c9625", | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"object_id: CL:0000871\r\n", | ||
"object_label: splenic macrophage\r\n", | ||
"object_categories:\r\n", | ||
"- CellType\r\n", | ||
"subject_label: splenic macrophages\r\n", | ||
"\r\n", | ||
"---\r\n", | ||
"object_id: CL:0000866\r\n", | ||
"object_label: thymic macrophage\r\n", | ||
"object_categories:\r\n", | ||
"- CellType\r\n", | ||
"subject_label: thymic macrophages\r\n", | ||
"start: 40\r\n", | ||
"end: 58\r\n" | ||
] | ||
} | ||
], | ||
"execution_count": 3 | ||
}, | ||
{ | ||
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Currently the specific span coordinates are only returned for concepts that can be clearly mapped back to the text.\n", | ||
"\n", | ||
"You can also use the standard `--whole-text` (`-W`) option to match the entire text span, rather than to annotate segments:" | ||
], | ||
"id": "a014e2a04badc986" | ||
}, | ||
{ | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-10-22T02:04:59.758809Z", | ||
"start_time": "2024-10-22T02:04:43.271571Z" | ||
} | ||
}, | ||
"cell_type": "code", | ||
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate -W \"macrophage found in the thymus\" --category CellType ", | ||
"id": "4cdcabaad7e6268e", | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"object_id: CL:0000866\r\n", | ||
"object_label: thymic macrophage\r\n", | ||
"subject_label: macrophage found in the thymus\r\n" | ||
] | ||
} | ||
], | ||
"execution_count": 6 | ||
}, | ||
{ | ||
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Suggesting Definitions\n", | ||
"\n" | ||
], | ||
"id": "7f06875f274fbd06" | ||
}, | ||
{ | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-10-22T02:02:45.880910Z", | ||
"start_time": "2024-10-22T02:02:27.963766Z" | ||
} | ||
}, | ||
"cell_type": "code", | ||
"source": [ | ||
"!runoak -i llm:sqlite:obo:uberon generate-definitions \\\n", | ||
" finger toe \\\n", | ||
" --style-hints \"write definitions in formal genus-differentia form\"" | ||
], | ||
"id": "3a9f92f9e258b301", | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"add definition 'A manual digit is a type of anatomical structure characterized as one of the distal appendages found on the human hand, distinct from those structures on other limbs, and is primarily comprised of phalanges, a metacarpal bone, and associated soft tissue.' to UBERON:0002389\r\n", | ||
"add definition 'A pedal digit is a type of anatomical structure that is a subdivision of the limb and is specifically located at the distal end of the pes, commonly known as the foot, in vertebrates.' to UBERON:0001466\r\n" | ||
] | ||
} | ||
], | ||
"execution_count": 5 | ||
}, | ||
{ | ||
"metadata": {}, | ||
"cell_type": "code", | ||
"outputs": [], | ||
"execution_count": null, | ||
"source": "", | ||
"id": "3f64b6dc3ae0a288" | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 2 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython2", | ||
"version": "2.7.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
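Note on the span coordinates in the `annotate` output above: the sketch below checks the reported `start`/`end` values against the input sentence. The annotation values are copied from the notebook output. The dict layout and the helper name `span_text` are hypothetical, chosen for illustration only. In this one example the values behave like 0-based, end-exclusive offsets (a plain Python slice); whether that convention holds in general is an assumption, not something the notebook states.

```python
from typing import Optional

# Input sentence passed to `annotate` in the notebook.
TEXT = "sequencing was performed on splenic and thymic macrophages"

# Values copied from the notebook output; the dict layout is an assumption.
ANNOTATIONS = [
    {"object_id": "CL:0000871", "subject_label": "splenic macrophages"},
    {"object_id": "CL:0000866", "subject_label": "thymic macrophages",
     "start": 40, "end": 58},
]


def span_text(text: str, annotation: dict) -> Optional[str]:
    """Return the slice of `text` for an annotation, or None if it has no span.

    Assumes 0-based, end-exclusive offsets (hypothetical convention).
    """
    start, end = annotation.get("start"), annotation.get("end")
    if start is None or end is None:
        return None
    return text[start:end]


for ann in ANNOTATIONS:
    print(ann["object_id"], repr(span_text(TEXT, ann)))

# The mention "splenic macrophages" never occurs verbatim in TEXT
# (the words are separated by "and thymic"), so str.find returns -1.
print(TEXT.find("splenic macrophages"))
```

The second annotation slices back to exactly its `subject_label`. The first has no span, which is consistent with the notebook's remark that coordinates are only returned for concepts that can be clearly mapped back to the text.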
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,3 +7,4 @@ Adapter Examples | |
:maxdepth: 2 | ||
|
||
Ubergraph/Ubergraph-Tutorial | ||
LLM/LLM-Tutorial |