Track Report Changes: New Tools For Evolution History

by Alex Johnson

Ever found yourself staring at a report, wondering when exactly a certain insight popped up, or what changed between the last two versions? If you're nodding along, you're not alone! Living Reports are fantastic, keeping track of when sections and insights are created or updated, but a crucial piece of the puzzle has been missing: a clear way to see the entire evolution timeline of a report. This means we couldn't easily spot what differed between outline versions, pinpoint the exact discovery moment of specific insights, track the overall progression of our analysis, or neatly distinguish between early exploratory content and finalized, validated findings. That's where our exciting new proposal comes in, aiming to provide robust tools for tracking report evolution history.

The Need for Report Evolution Tracking

Imagine you're deep into analyzing a complex dataset, and your report is evolving rapidly. You might add a section here, refine an insight there, and perhaps even dabble in some speculative analysis that you later decide to prune. Without a clear history, it's like trying to navigate a maze blindfolded. You might know when a section was last touched, but what actually changed? Was it a minor tweak, or a complete overhaul? Did a new, game-changing insight emerge yesterday, or was it lurking in the report for weeks? Understanding report evolution history is key for several reasons. Firstly, it aids in intelligent report maintenance. When you can see the progression, you can more easily identify outdated or redundant information and clean up the report efficiently. Secondly, it's invaluable for auditing and debugging. If an unexpected change occurs, tracing back through the timeline, possibly with detailed diffs, can help pinpoint the exact moment and cause of the alteration. Thirdly, for collaborative projects, it provides transparency and accountability, allowing team members to see who contributed what and when. Finally, it supports a more structured analytical process. By differentiating between exploratory phases and confirmed findings, teams can ensure the final report is built on a solid, validated foundation. This capability is not just a 'nice-to-have'; it's becoming essential for managing complex, dynamic reports effectively. The current limitations mean that valuable historical context is lost, making troubleshooting and comprehensive analysis more challenging than it needs to be. We need a way to unlock this historical data, making reports not just living documents, but documented living documents.

Two Paths to Unlocking Report History

To address the need for tracking report evolution, we've devised two solutions, each offering a different way to access a report's historical data. Option A adds an include_audit=True parameter to get_report, integrating the historical view directly into the primary report retrieval function. Think of it as a 'history' tab within your report's main view. When include_audit=True is set, the response gains an audit_trail object that summarizes the report's evolution: the total number of versions, the initial creation timestamp, the last updated time, and a list of recent_changes. Each entry in recent_changes gives the version number, the timestamp of that version, the actor (who made the change, be it a user or an agent), and a concise summary of what was modified, such as 'Added 2 insights, modified 1 section'. This suits users who mainly need to know what's new without digging into every alteration, and it keeps the workflow to a single call. Option B introduces a dedicated get_report_timeline tool for a more granular, customizable view of the full history. Both are described in turn below.

Option A: get_report with include_audit=True

This approach embeds the audit information directly within the get_report call. It's designed for users who want a quick snapshot of recent activity without needing a separate tool. When you make a get_report request and set the include_audit parameter to True, the returned JSON will now contain an additional field: audit_trail. This audit_trail object provides a curated summary of the report's historical modifications. It includes key metadata such as total_versions, indicating the complete number of historical states the report has gone through, and the created_at and last_updated_at timestamps, giving you the bookends of the report's lifecycle. The most valuable part, however, is the recent_changes array. Each element in this array represents a significant update event. It details the specific version number associated with the change, the precise timestamp when it occurred, the actor responsible (which could be an automated agent or a specific user), and a human-readable summary of the modifications. For instance, you might see entries like 'Added 2 insights, modified 1 section' or 'Added section "Key Findings"'. This format is incredibly useful for quickly understanding the recent trajectory of the report. If you're just looking to catch up on what happened in the last day or week, this provides that information efficiently. It streamlines the process of staying updated on a constantly evolving document, allowing you to grasp the narrative of changes at a glance. The benefit here is simplicity and directness; you get your report data and a concise history in a single response, minimizing the need for extra API calls or complex data processing for common historical queries. This makes it ideal for dashboards, quick reviews, or integration into workflows where immediate context is paramount.
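As a rough illustration of the audit_trail shape described above, here is a minimal Python sketch. The helper names (summarize_changes, build_audit_trail), the limit parameter, and the assumption that each stored entry carries version, timestamp, actor, and a changes dict with one entry per version are illustrative, not part of the proposal.

```python
from typing import Any


def summarize_changes(changes: dict[str, list[str]]) -> str:
    """Render a changes dict as text like 'Added 2 insights, modified 1 section'."""
    parts = []
    for verb in ("added", "modified", "removed"):
        for kind in ("section", "insight"):
            count = len(changes.get(f"{kind}s_{verb}", []))
            if count:
                plural = "" if count == 1 else "s"
                parts.append(f"{verb} {count} {kind}{plural}")
    if not parts:
        return "No changes"
    text = ", ".join(parts)
    return text[0].upper() + text[1:]


def build_audit_trail(entries: list[dict[str, Any]], limit: int = 5) -> dict[str, Any]:
    """Build the audit_trail summary from audit entries (hypothetical helper)."""
    if not entries:
        return {
            "total_versions": 0,
            "created_at": None,
            "last_updated_at": None,
            "recent_changes": [],
        }
    ordered = sorted(entries, key=lambda e: e["version"])
    # Newest first; assumes one audit entry per report version.
    recent = ordered[-limit:][::-1] if limit > 0 else []
    return {
        "total_versions": len(ordered),
        "created_at": ordered[0]["timestamp"],
        "last_updated_at": ordered[-1]["timestamp"],
        "recent_changes": [
            {
                "version": e["version"],
                "timestamp": e["timestamp"],
                "actor": e["actor"],
                "summary": summarize_changes(e.get("changes", {})),
            }
            for e in recent
        ],
    }
```

Deriving created_at from the first audit entry is a simplification; a real implementation might read it from the report's existing metadata instead.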

Option B: New get_report_timeline Tool

For those who need a more granular and customizable view of a report's history, we propose Option B: A New get_report_timeline Tool. This dedicated tool is built for deep dives into the report's past. Unlike Option A, which provides a summary, get_report_timeline allows you to fetch a detailed chronological log of all changes. You can specify a from_version and to_version to retrieve a specific range of historical states, making it perfect for investigating particular periods. Furthermore, an optional include_diffs=True parameter enables the retrieval of detailed content changes for each event. The response structure is rich: it includes the report_id and title, followed by a timeline array. Each entry in the timeline represents a distinct version and contains the version number, timestamp, actor, and request_id (useful for tracing specific operations). It also details the changes made in that version, categorizing additions, modifications, and removals of sections and insights. The true power lies in the optional diff object, which, if requested, provides the actual content differences. For example, you could see the precise text added or modified for a specific insight. Beyond the detailed timeline, this tool also offers a statistics object, providing aggregated insights like total_versions, total_sections_ever_added, total_insights_ever_added, the most_modified_section, and even the busiest_day for changes. This comprehensive approach makes get_report_timeline the go-to solution for in-depth analysis, historical auditing, and understanding the complete lifecycle and evolution patterns of your reports.
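The version filtering and the statistics object described above could be sketched roughly as follows. The function names are hypothetical, the version range is assumed to be inclusive on both ends, and timestamps are assumed to be ISO 8601 strings so the first ten characters give the calendar day.

```python
from collections import Counter
from typing import Any, Optional


def filter_timeline(
    entries: list[dict[str, Any]],
    from_version: Optional[int] = None,
    to_version: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Keep entries whose version lies in [from_version, to_version] (inclusive, assumed)."""
    return [
        e
        for e in entries
        if (from_version is None or e["version"] >= from_version)
        and (to_version is None or e["version"] <= to_version)
    ]


def compute_statistics(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate the statistics fields named in the proposal from audit entries."""
    modified_sections: Counter = Counter()
    changes_per_day: Counter = Counter()
    sections_added = 0
    insights_added = 0
    for entry in entries:
        changes = entry.get("changes", {})
        sections_added += len(changes.get("sections_added", []))
        insights_added += len(changes.get("insights_added", []))
        modified_sections.update(changes.get("sections_modified", []))
        # Assumes ISO 8601 timestamps such as '2024-01-02T09:00:00+00:00'.
        changes_per_day[entry["timestamp"][:10]] += 1
    return {
        "total_versions": len(entries),
        "total_sections_ever_added": sections_added,
        "total_insights_ever_added": insights_added,
        "most_modified_section": (
            modified_sections.most_common(1)[0][0] if modified_sections else None
        ),
        "busiest_day": changes_per_day.most_common(1)[0][0] if changes_per_day else None,
    }
```

Whether statistics cover the whole history or only the filtered range is a design choice the proposal leaves open; this sketch computes them over whatever list it is given.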

Making History: Implementation Details

Implementing these powerful historical tracking features requires a few key steps, primarily revolving around how we capture and store the changes that occur within a report. The foundation for both proposed solutions lies in enhancing our existing evolve_report functionality to meticulously record every significant alteration. Currently, when a report outline is updated, evolve_report already generates a summary of the changes made. Our first implementation step is to ensure this summary is consistently captured and stored in an audit log. This means modifying the update_report_outline function to append a new audit_entry to a designated log whenever changes are saved. This audit_entry will be a structured record containing crucial details: the version number of the report after the changes, the exact timestamp of the update, the actor who performed the action (whether it’s a user or an automated agent), the request_id associated with the operation (essential for traceability), and a detailed breakdown of the changes themselves. These changes will be categorized into sections_added, sections_modified, sections_removed, insights_added, insights_modified, and insights_removed, providing a comprehensive picture of what was altered in that specific version.
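To make the audit_entry structure concrete, here is a minimal sketch of how such a record might be assembled. The field names come from the description above; the make_audit_entry helper, its now parameter, and the choice of UTC ISO 8601 timestamps are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# The six change categories named in the proposal.
CHANGE_KEYS = (
    "sections_added",
    "sections_modified",
    "sections_removed",
    "insights_added",
    "insights_modified",
    "insights_removed",
)


def make_audit_entry(
    version: int,
    actor: str,
    request_id: str,
    changes: dict[str, list[str]],
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build one audit_entry dict (hypothetical helper, not the actual implementation)."""
    unknown = set(changes) - set(CHANGE_KEYS)
    if unknown:
        raise ValueError(f"Unknown change categories: {sorted(unknown)}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "version": version,
        "timestamp": timestamp,
        "actor": actor,
        "request_id": request_id,
        # Always emit all six categories so readers can rely on the keys existing.
        "changes": {key: list(changes.get(key, [])) for key in CHANGE_KEYS},
    }
```

Filling in empty lists for untouched categories keeps every line of the log uniform, which simplifies the retrieval code at the cost of slightly larger entries.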

The second crucial implementation step is defining where this audit log will reside. We propose storing the audit log as a new file, audit.jsonl, within the report's directory structure, at ~/.igloo_mcp/reports/by_id/<report-id>/audit.jsonl. The .jsonl extension signifies the JSON Lines format: each line in the file is a self-contained JSON object representing a single audit entry. This append-only format is efficient for logging and querying historical data. When a new change occurs, the JSON object for its audit_entry is simply appended to the end of the file. Because existing lines are never rewritten, earlier entries are preserved and the file keeps a complete record of the report's evolution, provided nothing else edits it.

Finally, we need to implement the retrieval mechanisms. For Option A, this involves modifying the get_report function to read the audit.jsonl file when include_audit=True is specified and to place the relevant recent entries in the audit_trail field of the response. For Option B, the new get_report_timeline function will read the audit.jsonl file, filter entries by the provided version range (from_version, to_version), optionally include detailed diffs (which would require more sophisticated diff generation logic, possibly comparing the states of adjacent versions), and calculate aggregate statistics from the collected timeline data. This structured approach ensures that historical data is not only captured accurately but also made accessible through well-defined interfaces.
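The append step for audit.jsonl might look like the following minimal sketch. The append_audit_entry name and its report_dir argument are hypothetical; in the proposal this logic would live inside update_report_outline, and report_dir would be the report's directory under ~/.igloo_mcp/reports/by_id/.

```python
import json
from pathlib import Path
from typing import Any


def append_audit_entry(report_dir: Path, entry: dict[str, Any]) -> Path:
    """Append one audit entry as a single JSON line to <report_dir>/audit.jsonl."""
    report_dir.mkdir(parents=True, exist_ok=True)
    audit_path = report_dir / "audit.jsonl"
    # json.dumps escapes newlines inside strings, so each entry stays on one line.
    line = json.dumps(entry, ensure_ascii=False)
    # Mode "a" only ever adds to the end of the file; existing lines are untouched.
    with audit_path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return audit_path
```

This sketch does no locking, so it assumes a single writer per report; concurrent writers would need coordination that the proposal does not specify.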

Storing the Audit Log

As detailed above, the report evolution history is stored in a new file named audit.jsonl within each report's dedicated directory. This file will reside alongside existing files like outline.json and metadata.json at the path ~/.igloo_mcp/reports/by_id/<report-id>/. The use of the .jsonl (JSON Lines) format is deliberate: each line in audit.jsonl is a complete, independent JSON object. This structure is beneficial for two reasons. First, it allows efficient, incremental appending of new audit entries. Whenever a change is made to a report, the corresponding audit_entry JSON object is simply written as a new line at the end of the file. Because writes only append, previous entries are not overwritten and the log keeps the full sequence of modifications. Second, the JSON Lines format simplifies parsing. When retrieving the timeline, we can read the file line by line, parse each line as a JSON object, and build the timeline array. This avoids loading the entire file into memory at once, which helps for reports with extensive histories and makes retrieval more scalable. The audit_entry itself, as described in the implementation, will contain all necessary information: version, timestamp, actor, request_id, and a detailed changes object. This granular recording ensures that every significant action taken on a report is logged, providing the data needed for both the summary view in get_report(include_audit=True) and the detailed timeline analysis offered by get_report_timeline. This file-based, line-delimited JSON approach strikes a balance between ease of implementation, performance, and the ability to store detailed historical data.
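The line-by-line reading described above can be sketched with a small generator. The iter_audit_entries name is an assumption; skipping blank lines is a defensive choice rather than something the proposal requires.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_audit_entries(audit_path: Path) -> Iterator[dict[str, Any]]:
    """Yield audit entries one at a time without loading the whole file."""
    # A missing file raises FileNotFoundError on first iteration;
    # the caller decides how to report "no history yet".
    with audit_path.open("r", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # tolerate stray blank lines
            yield json.loads(line)
```

Because this is a generator, a caller that only needs the most recent few entries or a narrow version range can stop early or filter as it goes, instead of materializing the full history.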

Adding Timeline Retrieval Logic

The heart of our solution lies in the ability to retrieve and present this historical data. The get_report_timeline function is the dedicated API endpoint for this purpose. Its primary responsibility is to read the audit.jsonl file, process its contents, and return the historical information in a structured format. The function begins by constructing the path to the audit.jsonl file for the given report_id. A crucial check is performed: if the audit.jsonl file does not exist, it means no audit history has been recorded for this report yet. In such cases, the function returns a clear message, such as `{