Karakeep: Browserless Shows Disconnected But Crawling Works
Introduction
This article covers an issue in Karakeep 0.29.1 in which the Service Connections section reports the Browserless connection as disconnected with a 'fetch failed' error, even though the crawler works correctly. The problem was observed with Browserless as the crawler, connected over a websocket. The user had recently upgraded from 0.27.1 and noticed the discrepancy in the newly introduced Service Connections section: crawls complete successfully, yet the health check reports a disconnection. The sections below cover the bug report, the steps to reproduce it, the expected behavior, possible causes, and possible fixes.
Describe the Bug
A user reported that after upgrading to Karakeep 0.29.1, the Service Connections section shows the Browser connection as disconnected with a 'fetch failed' error, even though the crawler works without issues. The deployment uses Browserless as the crawler, connected over a websocket. The user provided logs showing that crawls complete successfully, which contradicts the reported status and raises the question of whether the health status reflects a genuine problem with the service.
The user's configuration involves connecting to Browserless over a websocket, which might be a contributing factor to the issue. The configuration is set up as follows:
- name: BROWSER_WEB_URL
  value: "ws://browserless:3000"
The logs also reveal that the connection to the Playwright browser is being closed when not actively in use, which could be triggering the health check to report the service as unhealthy. This behavior is logged periodically:
2025-12-04T23:52:01.589Z info: [Crawler] The Playwright browser got disconnected. Will attempt to launch it again.
2025-12-04T23:52:01.589Z info: [Crawler] Connecting to existing browser instance: ws://browserless:3000
2025-12-04T23:52:01.590Z info: [Crawler] Successfully resolved IP address, new address: ws://10.43.203.45:3000/
The user suggests that maintaining a persistent connection when there is nothing to crawl might be unnecessary, and the service should ideally be considered healthy as long as it can establish a connection when needed.
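To see how often the disconnect message recurs, the log text can be scanned for it and the gaps between occurrences measured. The following is a minimal sketch that assumes the timestamp format and message text shown in the log excerpt above; the function names `disconnect_times` and `gaps_in_seconds` are hypothetical helpers, not part of Karakeep.

```python
import re
from datetime import datetime

# Message text and timestamp layout are taken from the log excerpt in the report.
DISCONNECT_MARKER = "The Playwright browser got disconnected"
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def disconnect_times(log_text: str) -> list[datetime]:
    """Return the timestamp of every disconnect message found in log_text."""
    times = []
    for line in log_text.splitlines():
        if DISCONNECT_MARKER not in line:
            continue
        match = TIMESTAMP_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    return times


def gaps_in_seconds(times: list[datetime]) -> list[float]:
    """Return the elapsed seconds between consecutive disconnect messages."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Passing the worker's log output to `disconnect_times` and the result to `gaps_in_seconds` shows whether the disconnects follow a regular idle timeout or occur irregularly, which helps separate routine idle drops from a real outage.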
Steps to Reproduce
To reproduce this bug, follow these steps:
- Use a Browserless image as the crawler for Karakeep.
- Set up the connection over a websocket.
- Observe that the health of the browser is shown as disconnected in the Service Connections section.
- Verify that crawls are working correctly despite the disconnected status.
This issue seems to stem from how Karakeep monitors the health of the Browserless service when connected via websockets. The intermittent disconnections, which are normal when the crawler is not in active use, are being misinterpreted as a persistent failure, leading to the incorrect status report.
Expected Behaviour
The expected behavior is that the service should be shown as healthy, as long as it is able to establish a connection and perform crawls successfully. The health check should not interpret intermittent disconnections as a permanent failure, especially when the service reconnects and functions correctly when needed. A more robust health check mechanism could consider the service healthy if it can successfully initiate crawls, even if the connection is not continuously active. This would provide a more accurate representation of the service's actual status and avoid unnecessary alerts or concerns.
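The idea of "healthy as long as a connection can be established when needed" can be sketched as an on-demand reachability probe. This is an illustration only, not Karakeep's implementation: the function name `can_reach` and the default timeout are assumptions, and a TCP connection proves only that something is listening at the address, not that Browserless can actually run a crawl.

```python
import socket
from urllib.parse import urlsplit


def can_reach(browser_url: str, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port can be opened."""
    try:
        parts = urlsplit(browser_url)
        host = parts.hostname
        # Fall back to the scheme's conventional port when none is given.
        port = parts.port or (443 if parts.scheme in ("wss", "https") else 80)
    except ValueError:
        # Malformed URL or non-numeric port.
        return False
    if host is None:
        return False
    try:
        # Open a short-lived connection and close it immediately.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

A check of this kind holds no connection open between probes, so an idle disconnect on the crawler side would not affect the reported status.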
Screenshots and Additional Context
The screenshot attached to the original report shows the Service Connections section in Karakeep, where the Browser connection is marked as disconnected even though the crawler is functioning correctly.
Device Details
No device details were provided in the original report.
Exact Karakeep Version
The Karakeep version being used is 0.29.1.
Troubleshooting
The user has confirmed that they have checked the troubleshooting guide and have not found a solution to their problem.
Root Cause Analysis
The root cause has not been confirmed, but the evidence points to how Karakeep's health check evaluates the connection to Browserless. The logs show the crawler's browser connection being dropped while idle and re-established on demand, so a check that expects a persistent, always-on connection would flag Browserless as disconnected whenever no connection happens to be open. The 'fetch failed' wording is also consistent with an HTTP-style request being made against the ws:// address, although the report does not show how the check is implemented. Neither case means the service is down or malfunctioning: Browserless accepts a new connection as soon as a crawl is requested, as the user's logs of successful crawls show. The reported status is therefore misleading and causes unnecessary alarm.
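The 'fetch failed' error in the report suggests the status check may rely on an HTTP-style request, and HTTP clients typically accept only http:// and https:// URLs. Under that assumption, one way to probe a websocket address is to derive its HTTP equivalent first. The sketch below does only that conversion; `http_probe_url` is a hypothetical helper, and whether Browserless answers a plain HTTP request at the resulting address, and on which path, should be verified against its documentation.

```python
from urllib.parse import urlsplit

# Websocket schemes and the HTTP schemes that share their default ports.
_HTTP_EQUIVALENT = {"ws": "http", "wss": "https"}


def http_probe_url(browser_url: str) -> str:
    """Swap a ws/wss scheme for http/https, leaving the rest of the URL unchanged."""
    parts = urlsplit(browser_url)
    scheme = _HTTP_EQUIVALENT.get(parts.scheme, parts.scheme)
    return parts._replace(scheme=scheme).geturl()
```

For the configuration shown earlier, `http_probe_url("ws://browserless:3000")` yields `http://browserless:3000`; URLs that already use http or https pass through unchanged.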
Potential Solutions
To address this issue, several solutions can be considered:
- Adjust Health Check Sensitivity: Modify Karakeep's health check to be less sensitive to brief disconnections. Instead of immediately flagging Browserless as disconnected, the system could implement a retry mechanism or a grace period. If the connection is re-established within a certain timeframe, the service should be considered healthy.
- Implement a Heartbeat Mechanism: Introduce a heartbeat mechanism where Karakeep periodically pings Browserless to check its availability. This would provide a more accurate assessment of the service's status than simply monitoring the websocket connection.
- Differentiate Between Connection States: Distinguish between different connection states. A disconnected state could be further categorized into idle and unavailable. If the connection is merely idle due to inactivity, the service should still be considered healthy. Only when the service is truly unavailable (e.g., due to a server outage) should it be flagged as unhealthy.
- Provide Configuration Options: Allow users to configure the health check parameters, such as the retry interval, grace period, and heartbeat frequency. This would enable users to fine-tune the health check to match their specific environment and usage patterns.
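The grace period, the idle/unavailable distinction, and the configurable parameters can be combined in a small status tracker. The sketch below is one possible shape, not Karakeep's code: every class, field, and default value (`HealthCheckConfig`, `BrowserHealth`, a 60-second grace period, three failures) is an assumption chosen for illustration. Probe results are fed in by the caller, for example from a periodic heartbeat.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectionState(Enum):
    CONNECTED = "connected"      # the most recent probe succeeded
    IDLE = "idle"                # recent success, currently not connected
    UNAVAILABLE = "unavailable"  # no recent success, or too many failures


@dataclass(frozen=True)
class HealthCheckConfig:
    # Hypothetical user-tunable parameters.
    grace_period_seconds: float = 60.0
    failures_before_unavailable: int = 3


class BrowserHealth:
    def __init__(self, config: HealthCheckConfig) -> None:
        self.config = config
        self._last_success: Optional[float] = None
        self._consecutive_failures = 0

    def record_probe(self, succeeded: bool, now: float) -> None:
        """Store the outcome of one heartbeat probe taken at time `now` (seconds)."""
        if succeeded:
            self._last_success = now
            self._consecutive_failures = 0
        else:
            self._consecutive_failures += 1

    def state(self, now: float) -> ConnectionState:
        # Before the first successful probe there is nothing to go on.
        if self._last_success is None:
            return ConnectionState.UNAVAILABLE
        if self._consecutive_failures == 0:
            return ConnectionState.CONNECTED
        within_grace = now - self._last_success <= self.config.grace_period_seconds
        below_limit = self._consecutive_failures < self.config.failures_before_unavailable
        if within_grace and below_limit:
            return ConnectionState.IDLE
        return ConnectionState.UNAVAILABLE

    def is_healthy(self, now: float) -> bool:
        """Treat both connected and idle as healthy."""
        return self.state(now) is not ConnectionState.UNAVAILABLE
```

With this shape, a single failed probe shortly after a successful one reports idle rather than disconnected, and only repeated failures or a lapsed grace period mark the service unavailable.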
Impact and Recommendations
The impact of this bug is primarily on the user experience. The misleading health status can cause unnecessary concern and potentially lead to users investigating a non-existent problem. While the crawler continues to function correctly, the inaccurate status reporting undermines confidence in the system.
It is recommended that the Karakeep development team investigate this issue and implement one of the proposed solutions. A more robust and accurate health check mechanism would greatly improve the user experience and provide a more reliable indication of the service's actual status.
Conclusion
Browserless being reported as disconnected in Karakeep while it functions correctly appears to stem from how the health check interprets the state of the websocket connection: disconnections during idle periods are flagged as a persistent failure. Making the health check tolerant of brief disconnections, whether through a retry mechanism, a heartbeat, or a finer categorization of connection states, would improve the user experience and give a more accurate picture of the service's health. For background on how websockets behave, see the WebSocket article on Wikipedia.