Indexing Atlassian Confluence #154
Comments
Hi @pudo, ex-Confluence developer here. Great to hear that Connie is getting used by some editors. I might be able to provide a couple of random thoughts that may be useful in getting that data into Aleph...

The first thing to consider is whether the editor is using Confluence Cloud or Confluence Server. Although the products have the same name, the codebases are (now) pretty divergent, and the way you achieve things can be significantly different depending on which product you want to interact with. Fun times.

One aspect of Confluence that is common to both Cloud and Server is the export function. If the space is relatively static, meaning the editor has finished working on their notes and simply wants to import them into Aleph, then it might be easier to have the editor export the space using the Confluence export feature (there are numerous export options, HTML and XML for example). This export could then be ingested and transformed into something that Aleph/FtM can handle.

If that's not viable, then you'll either want to get the rendered content for each page using the Confluence API or find a way of scraping the pages with Memorious, which would leave you with the SSO/2FA challenge.

To work around the challenges with SSO and 2FA, you might be able to create a plugin that is installed on the Confluence instance. This plugin would have access to page content, comments, and attachments, and could call back to an API to record that same information in Aleph. Cloud plugins are effectively microservices and can be written in a bunch of different languages; Server plugins are built in Java. So that might be something else to consider.

Another entirely random thought here would be to switch things around and, rather than export data from Confluence into Aleph, build an integration from Aleph into Confluence.
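As a rough illustration of the "rendered content via the Confluence API" option, here is a minimal Python sketch. It assumes the content listing endpoint `/rest/api/content` with `expand=body.view`, which returns rendered HTML under `body.view.value`; the base URL, space key, and the helper names `space_content_url` and `extract_pages` are hypothetical, and authentication (the SSO/2FA problem mentioned above) is deliberately left out. On Cloud the path is typically prefixed with `/wiki`.

```python
from urllib.parse import urlencode


def space_content_url(base_url, space_key, start=0, limit=25):
    """Build the URL for one page of results listing the pages in a space.

    `expand=body.view` asks Confluence to include the rendered HTML body.
    """
    query = urlencode({
        "spaceKey": space_key,
        "type": "page",
        "expand": "body.view",
        "start": start,
        "limit": limit,
    })
    return f"{base_url.rstrip('/')}/rest/api/content?{query}"


def extract_pages(payload):
    """Pull id, title and rendered HTML out of a decoded JSON response."""
    pages = []
    for item in payload.get("results", []):
        html = item.get("body", {}).get("view", {}).get("value", "")
        pages.append({"id": item.get("id"), "title": item.get("title"), "html": html})
    return pages
```

A caller would fetch `space_content_url(...)` with whatever HTTP client and credentials are available, decode the JSON, pass it to `extract_pages`, and increase `start` by `limit` until no results come back.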
Confluence-space-export-155300.html.zip

The attached file is a basic Confluence space export in HTML format. It contains content and attachments but, unfortunately, no comments. Importing this directly into Aleph produces output similar to the following:

The export also includes a page that holds the structure of the space (sub-pages and so on). What is somewhat annoying is that the links between pages don't work, so you can't navigate the space easily once it has been uploaded into Aleph. With that said, it might be possible to extend the HTML ingestor to handle this?
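One possible first step towards handling the broken links would be to collect the relative page-to-page links from each exported HTML file, so that they could later be mapped to the corresponding documents in Aleph. The sketch below only does the collection part using the Python standard library; it does not touch Aleph's actual ingestor code, and the names `LinkCollector`, `is_internal` and `internal_links` are hypothetical. It also assumes that exported pages link to each other with relative `.html` paths.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_internal(href):
    """True for relative links to other .html files (no scheme, no host)."""
    parsed = urlparse(href)
    return not parsed.scheme and not parsed.netloc and parsed.path.endswith(".html")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at another exported page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and is_internal(href):
            self.links.append(href)


def internal_links(html):
    """Return the internal page links found in one exported HTML document."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

With the links for every page in hand, an ingestor extension could build a lookup from exported file name to the document created for it and rewrite each link accordingly.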
We have a recurring request from some editors to index project Confluence wikis into Aleph. The idea is to index all the reporters' notes from a given wiki space into an investigation casefile. What we'd need to figure out: