Fix KeyError when accessing lineage dictionary in OBO parser #233

Open
manvikri22 wants to merge 1 commit into base: master

Conversation

manvikri22

Description

This pull request addresses an issue where accessing the lineage dictionary resulted in a KeyError if the superentity was not found.

Changes Made

  • Implemented a conditional check to ensure that the superentity exists in the lineage dictionary before accessing it.
  • Added a warning message to inform users when a superentity is not found, improving the robustness of the parser.

Motivation

This change enhances error handling in the OBO parser, making it more resilient to inconsistencies in the ontology files and providing clearer feedback to users during debugging.

Owner

althonos commented Oct 2, 2024

Hi @manvikri22, do you have an example where such an error happens? My impression is that the KeyError would happen here if a superentity is not declared for a subentity, so it may be better to resolve this more globally to support dangling entities (#225).

Author

manvikri22 commented Oct 3, 2024

Hi @althonos,

Thank you for the feedback!

You're right that the KeyError could occur when a superentity is referenced but not declared (a dangling entity). In the case I encountered, I was working with an OBO file where certain is_a relationships pointed to terms that were either missing or improperly defined. Here's a minimal example that triggers the error:

    [Term]
    id: CL:0000540
    name: hematopoietic stem cell
    is_a: BFO:0000040 ! some undefined biological entity

In this case, the term BFO:0000040 is referenced as a superclass but is not declared in the file, leading to the KeyError when the symmetrize_lineage() function tries to access it. The fix I proposed ensures that the missing entity is handled gracefully by logging a warning, but you're right that this might need a broader resolution to support dangling entities in a more global way.
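To make the failure mode and the proposed guard concrete, here is a minimal, self-contained sketch. It does not reproduce the parser's actual code: the `Lineage` dataclass and the plain-dictionary `lineage` argument are hypothetical stand-ins, and only the name `symmetrize_lineage` is borrowed from the discussion above. Without the membership check, the lookup `lineage[superentity]` would raise a KeyError for an undeclared term such as BFO:0000040.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Lineage:
    """Hypothetical stand-in for a per-entity lineage record."""
    sub: Set[str] = field(default_factory=set)  # direct subentities
    sup: Set[str] = field(default_factory=set)  # direct superentities


def symmetrize_lineage(lineage: Dict[str, Lineage]) -> None:
    """Derive `sub` from `sup`, warning on undeclared superentities.

    Assumption: `lineage` maps each declared entity id to its record.
    """
    for subentity, record in lineage.items():
        for superentity in record.sup:
            if superentity not in lineage:
                # Dangling reference: warn and skip instead of raising KeyError.
                warnings.warn(
                    f"superentity {superentity!r} of {subentity!r} is not declared"
                )
                continue
            lineage[superentity].sub.add(subentity)
```

With the stanza above as the only declared term, this version emits one warning and leaves the lineage of CL:0000540 untouched instead of aborting the parse.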

If you'd like, I can update the pull request or work on a more comprehensive fix based on your guidance.
Do you have specific suggestions or references for the dangling entity resolution? I could also look into updating the parser to log missing entities while preserving the overall structure, or perhaps adding an option to skip undefined superentities gracefully.
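As one possible shape for the "preserve the overall structure" idea, the sketch below records undeclared superentities as placeholders rather than skipping them, and reports them separately. Everything here is an assumption made for illustration: the function name `symmetrize_with_placeholders` and the dictionary-of-sets representation are hypothetical and are not taken from the parser or from #225.

```python
from typing import Dict, Set, Tuple


def symmetrize_with_placeholders(
    sup: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Build the subentity map from a superentity map.

    `sup` maps each declared entity id to the ids of its direct superentities.
    Returns the subentity map and the set of dangling (undeclared) ids.
    """
    # Every declared entity gets an entry, even if it has no subentities.
    sub: Dict[str, Set[str]] = {entity: set() for entity in sup}
    dangling: Set[str] = set()
    for entity, parents in sup.items():
        for parent in parents:
            if parent not in sup:
                # Undeclared superentity: remember it for reporting.
                dangling.add(parent)
            # setdefault creates a placeholder entry for dangling ids.
            sub.setdefault(parent, set()).add(entity)
    return sub, dangling
```

The caller can then decide whether to warn about the returned dangling ids, raise, or ignore them, which keeps that policy out of the symmetrization step itself.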

Let me know what you think!
