OpenCTI: Import Document Connector Runs Unexpectedly
In OpenCTI, the Import Document connector runs in the background even when only the Import Document AI connector is selected for an import. The results of both connectors end up in the same draft, which can produce mixed or incorrect content. This article describes the issue, gives steps to reproduce it, and compares the expected and actual output.
Description
When ingesting a file into the platform, we observed that the Import Document connector runs in the background even though only the Import Document AI connector was selected. The results from both connectors are then merged into the same draft. This is a problem when the two connectors interpret the same text differently, as the steps below show.
Environment
- OpenCTI (OCTI) 6.8.15
Reproducible Steps
To reproduce this issue, follow these steps:
Set up:
- Ensure that the “Italy” country with “IT” as an alias exists in your platform.
- Set the Import Document connector to “automatic” mode.
- Set the Import Document AI connector to “manual” mode.
Action:
- Navigate to the “Data > Import” menu.
- Import the provided file (test-import.txt) using the “step-by-step” approach. The file contains the sentence: We identified the IT sector as the main target.
- Select only the Import Document AI connector.
- Choose “Draft” as the validation mode.
- Import the file and navigate to the created draft.
Results:
Examine the results in the draft. You'll likely find that the country “Italy” is added to the draft due to the presence of “IT” in the sentence “We identified the IT sector as the main target.” However, the Import Document AI connector correctly identifies that “IT” in this context does not refer to the country Italy and does not add it. This incorrect addition originates from the Import Document connector.
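The false positive is easy to picture with a small sketch of alias matching. This is only an illustration of how matching text against known names and aliases can map “IT” to “Italy”; it is not the Import Document connector's actual code. The `KNOWN_ENTITIES` table and the `match_aliases` function are hypothetical names introduced for this example.

```python
import re

# Hypothetical knowledge base mirroring the set-up above:
# entity name -> names and aliases that should resolve to it.
KNOWN_ENTITIES = {"Italy": ["Italy", "IT"]}


def match_aliases(text, entities):
    """Return entities whose name or alias occurs as a whole word in text.

    Assumption: matching is purely lexical and ignores context, which is
    what makes "IT sector" resolve to the country.
    """
    found = []
    for name, aliases in entities.items():
        for alias in aliases:
            if re.search(rf"\b{re.escape(alias)}\b", text, re.IGNORECASE):
                found.append(name)
                break  # one hit per entity is enough
    return found


sentence = "We identified the IT sector as the main target."
print(match_aliases(sentence, KNOWN_ENTITIES))  # ['Italy']
```

A context-aware extractor, which is what the Import Document AI connector appears to behave like in this test, would read “IT sector” as information technology and skip the match.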
To confirm that both connectors are running, you can re-import the file with the Import Document connector disabled. You'll observe that “Italy” no longer appears in the draft. Further evidence of both connectors running simultaneously can be found in the work logs:
- Work log for Import Document AI (screenshot not reproduced here).
- Work log for Import Document, showing an additional “operation” related to “Italy” (screenshot not reproduced here).
This step-by-step reproduction clearly demonstrates the unexpected behavior of the Import Document connector running in the background, even when only the Import Document AI connector is selected.
Expected Output
The expected behavior is that only the selected connector (Import Document AI) should execute, without any interference from other connectors. This ensures that the draft content is solely based on the interpretation of the chosen connector.
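The difference between the expected and the reported behavior can be sketched as two selection rules. This is a minimal model under stated assumptions, not the platform's implementation: the `Connector` class and both functions are hypothetical, and the “observed” rule simply encodes the hypothesis that connectors in “automatic” mode run regardless of the selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    name: str
    mode: str  # "automatic" or "manual"


def expected_connectors(available, selected_names):
    """Expected rule: run only the connectors selected during import."""
    return [c.name for c in available if c.name in selected_names]


def observed_connectors(available, selected_names):
    """Hypothesised rule matching the report: automatic connectors run too."""
    return [
        c.name
        for c in available
        if c.name in selected_names or c.mode == "automatic"
    ]


connectors = [
    Connector("Import Document", "automatic"),
    Connector("Import Document AI", "manual"),
]
selected = {"Import Document AI"}

print(expected_connectors(connectors, selected))  # ['Import Document AI']
print(observed_connectors(connectors, selected))  # ['Import Document', 'Import Document AI']
```

Under the expected rule, the connector mode would only matter for imports where the user makes no explicit selection.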
Actual Output
The actual output is that the Import Document connector runs in the background, leading to a mix of results in the draft content. This can cause confusion and potentially inaccurate data processing, requiring additional cleanup work to rectify the discrepancies.
Additional Information
The issue may stem from the Import Document connector being set to “automatic” mode. While switching it to “manual” might mitigate the problem, the core issue remains: even when set to “automatic,” the Import Document connector should not run when it's not explicitly selected during import. The current behavior is confusing and counterintuitive.
When both Import Document and Import Document AI are selected during import, both run as expected and each creates its own container in the draft. When only Import Document AI is selected, both connectors still run, but only the container from Import Document AI is visible. This inconsistency makes the draft harder to interpret and shows the need for predictable connector execution.
Implications of the Issue
The unexpected behavior of the Import Document connector has several significant implications for users of the OpenCTI platform:
- Data Inaccuracy: The inclusion of results from an unintended connector can lead to inaccurate data being added to the draft, requiring manual correction and validation.
- Increased Workload: Users may need to spend additional time reviewing and cleaning up draft content to ensure its accuracy, which increases the overall workload.
- Confusion and Frustration: The inconsistent behavior of the connectors can cause confusion and frustration among users, especially when trying to understand why certain data is present in the draft.
- Potential for Errors: The mixed results from different connectors can increase the potential for errors in data processing and analysis, which can have serious consequences in critical applications.
Potential Solutions and Workarounds
While a permanent fix for this issue is being developed, there are several potential solutions and workarounds that users can employ to mitigate its impact:
- Disable Import Document Connector: If you primarily use the Import Document AI connector, you can temporarily disable the Import Document connector to prevent it from running in the background. However, this may not be feasible if you occasionally need to use the Import Document connector for other tasks.
- Set Import Document to Manual: As mentioned earlier, setting the Import Document connector to “manual” mode might prevent it from running automatically. However, this has not been definitively confirmed and may not completely resolve the issue.
- Careful Review of Draft Content: Regardless of the selected connectors, it's essential to carefully review the draft content to identify and correct any inaccuracies caused by the unexpected behavior of the Import Document connector.
- Use Separate Instances: For critical applications, consider using separate OpenCTI instances for different types of data processing, one with the Import Document connector enabled and one with it disabled. This can help to isolate the issue and prevent it from affecting other processes.
Conclusion
The unexpected behavior of the Import Document connector running in the background when only the Import Document AI connector is selected poses a significant challenge for OpenCTI users. By understanding the issue, following the reproducible steps, and implementing the suggested solutions and workarounds, users can mitigate its impact and ensure the accuracy of their data. A permanent fix for this issue is crucial to ensure the reliability and usability of the OpenCTI platform.