GitHub Workflows: Indexing Published And Unpublished Content

Dec 9, 2025 by Alex Johnson

In the ever-evolving landscape of software development, staying current with infrastructure and tools is paramount. As the Adobe Experience Manager (AEM) Sites team embarks on a significant migration from GitHub to Azure DevOps, a crucial challenge has emerged concerning the indexing of published and unpublished content. This article delves into the details of a recent spike, focusing on the backend (BE) engineering efforts required to remove our reliance on specific GitHub workflows. Our goal is to ensure seamless content indexing and management within the new Azure DevOps environment, ultimately enhancing the efficiency and reliability of our AEM Sites platform. The need for this spike stems from an identified gap in functionality: our GitHub setup relies on a specific GitHub App that triggers events precisely when a resource is published. Azure DevOps, unfortunately, does not offer a direct, built-in equivalent for this particular event. This means our existing system, which depends on these GitHub action events to index published content and then remove those indexes when content is unpublished, needs a robust redesign. This spike is dedicated to exploring and defining that redesign, ensuring our content remains accurately represented and discoverable without the previous GitHub-specific triggers. We aim to provide a clear, actionable path forward for the development team.

Understanding the Need for Change: Beyond GitHub Actions

Let's dive deeper into why this migration and the subsequent backend engineering spike are so critical for our AEM Sites platform. The core issue revolves around content indexing and the mechanisms we use to keep our search functionalities up-to-date. In our current GitHub-based setup, a specialized GitHub App plays a vital role. It acts as a vigilant monitor, detecting and signaling whenever a resource is published. This signal is then used by our backend systems to trigger the indexing process, making that content searchable. Conversely, when a resource is unpublished, another event is captured, prompting the removal of its index to ensure users only see currently available content. This finely tuned process is essential for maintaining the integrity and relevance of our search results. However, as we transition to Azure DevOps, we face a significant hurdle: Azure DevOps, in its standard configuration, does not possess a native equivalent to this GitHub App's real-time publishing event notification. This absence creates a functional gap that we must bridge with our own backend infrastructure solutions. The spike we're discussing is precisely about designing and implementing this bridge. It's not just about replicating functionality; it's about creating a more resilient and adaptable system that is independent of specific third-party application triggers. By focusing on the backend effort, we aim to build a solution that can effectively manage content publishing and unpublishing events within Azure DevOps, ensuring that our content remains accurately indexed and searchable without the dependency on the GitHub workflows that have served us thus far. This proactive approach is key to a smooth and successful migration, minimizing disruption and maximizing the benefits of our new development environment.

Analyzing Existing GitHub Workflows: A Deep Dive

To effectively design a new solution, our first and most crucial step in this spike was to meticulously analyze the existing GitHub workflows that handle resource publishing and unpublishing. This wasn't just a cursory glance; it involved a deep dive into the mechanics of how these workflows currently operate, the specific triggers they respond to, and the data they process. We needed to understand the intricate logic that dictates when content gets indexed and when it gets removed from our search indexes. This analysis primarily focused on identifying the precise GitHub App events that are being leveraged. These events act as the signals that our backend systems are currently listening for. By understanding the exact nature of these signals – what information they contain, how frequently they occur, and their reliability – we can better assess what needs to be replicated or replaced in Azure DevOps.

Furthermore, we examined the payload of these events. What data is transmitted when a resource is published or unpublished? This data is crucial for our indexing process. Is it just a notification, or does it include metadata about the resource itself? Understanding this will help us determine what information our new backend solution will need to fetch or process. We also looked at the dependencies within these workflows. Are there other services or systems that these GitHub workflows rely on? Identifying these dependencies is critical to ensure that when we move away from GitHub, we don't inadvertently break other parts of our ecosystem. This detailed examination allows us to map out the current state comprehensively. It provides a clear picture of what works, what might be brittle, and what needs to be fundamentally re-architected. Without this thorough understanding of the existing GitHub workflows, any attempt to build a new solution would be akin to constructing a building without blueprints – prone to errors, inefficiencies, and ultimately, failure. This investigative phase is the bedrock upon which our recommended approach will be built, ensuring that our solution is not only functional but also robust and future-proof.

Evaluating Options and Identifying Dependencies: Charting the Course Forward

Following our in-depth analysis of the existing GitHub workflows, the next logical step in this spike was to evaluate potential options and meticulously identify any new dependencies that our proposed backend solution might introduce. This phase is about exploring the technological landscape and making informed decisions about the best path forward. Since Azure DevOps doesn't offer a direct replacement for the GitHub App's event-driven publishing notifications, we had to consider alternative strategies. Our evaluation centered on identifying backend mechanisms that could effectively detect changes in content status – specifically, when a resource is published or unpublished.

Several avenues were explored. One option involved leveraging Azure's own eventing services, such as Azure Event Grid or Azure Service Bus, to create custom event triggers. Another was to implement a polling mechanism that periodically checks for changes in our content repository or a dedicated content management system API. We also considered integrating with Azure DevOps' own eventing capabilities, even though they don't directly map to the GitHub App's specific triggers. Each of these options came with its own set of pros and cons, including implementation complexity, scalability, cost, and performance implications.
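To make the polling option more concrete, here is a minimal sketch of how a poller could turn two successive snapshots of content status into publish and unpublish events. The snapshot shape (a mapping from resource ID to status) and the status strings are assumptions for illustration; the spike does not define them.

```python
from typing import Dict, List, Tuple

# Hypothetical status values; the real content source may name them differently.
PUBLISHED = "published"
UNPUBLISHED = "unpublished"


def diff_publication_status(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Compare two {resource_id: status} snapshots.

    Returns (resource_id, event) pairs, where event is PUBLISHED or UNPUBLISHED.
    """
    events: List[Tuple[str, str]] = []
    for resource_id, status in sorted(current.items()):
        # A resource that was not in the previous snapshot counts as unpublished.
        before = previous.get(resource_id, UNPUBLISHED)
        if status == PUBLISHED and before != PUBLISHED:
            events.append((resource_id, PUBLISHED))
        elif status != PUBLISHED and before == PUBLISHED:
            events.append((resource_id, UNPUBLISHED))
    # A resource that disappeared while published is treated as unpublished.
    for resource_id in sorted(previous.keys() - current.keys()):
        if previous[resource_id] == PUBLISHED:
            events.append((resource_id, UNPUBLISHED))
    return events
```

Comparing full snapshots is the simplest form of polling, but it grows with the size of the content set; a source that can report only recent changes would let the poller fetch less on each run.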

Crucially, during this evaluation, we paid close attention to identifying any new dependencies. Moving away from GitHub workflows means we need to ensure our new solution can reliably interact with our content sources and our indexing services. This might involve dependencies on specific Azure services, database connections, or APIs of other internal systems. We needed to assess the stability and availability of these potential dependencies. For instance, if we decide to use a custom polling mechanism, we need to ensure that the content source API we query is robust and doesn't have rate limits that would impede our indexing process. Similarly, if we opt for an event-driven approach using Azure Event Grid, we need to understand the configuration and management overhead associated with it. This thorough evaluation and dependency identification process is vital for selecting a solution that is not only effective in achieving our goals but also sustainable and manageable in the long run. It lays the groundwork for a confident and well-informed recommendation.
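As an illustration of the rate-limit concern, the sketch below wraps a content source call in a retry with exponential backoff. The `RateLimited` exception and the `fetch` callable are hypothetical stand-ins for whatever the real client raises and exposes; the retry count and delays are placeholder values, not recommendations from the spike.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a content source client might raise when throttled."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on RateLimited with delays of 1x, 2x, 4x... base_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            # Give up after the final attempt and let the caller decide what to do.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. If the API reports how long to wait before retrying, honoring that value would be preferable to a fixed schedule.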

Providing a Recommended Approach: The Path to Independence

After a comprehensive analysis of the existing GitHub workflows and a thorough evaluation of various technical options and their associated dependencies, we are now prepared to provide a recommended approach for our backend efforts. The primary objective is to decouple our content indexing process from the specific GitHub action events, thereby eliminating our reliance on the GitHub App for publishing and unpublishing notifications. Our recommendation centers on implementing a robust backend solution within Azure DevOps that can independently manage these crucial content lifecycle events.

We propose leveraging a combination of Azure services to create a custom event-driven architecture. This architecture will involve a mechanism to detect changes in content status directly within our content repository or through a dedicated content management system (CMS) API. Instead of relying on external GitHub events, our backend services will proactively monitor for these changes. For instance, we could implement a scheduled task that queries the CMS API for recently published or unpublished content, or a webhook listener that receives change notifications from the CMS. Upon detecting a change, this mechanism will then publish custom events to an Azure Service Bus or Azure Event Grid.
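One way to picture the publishing side is a small event envelope plus a publisher interface that hides the messaging service. This is a sketch under stated assumptions: the `ContentEvent` fields, the `EventPublisher` protocol, and the in-memory publisher are all hypothetical names, and the in-memory class stands in for a real Azure Service Bus or Event Grid client.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Protocol


@dataclass(frozen=True)
class ContentEvent:
    """Hypothetical envelope for a content lifecycle event."""
    resource_id: str
    content_type: str
    status: str  # "published" or "unpublished"
    # Assumed extras: a unique ID and a UTC timestamp for tracing and de-duplication.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class EventPublisher(Protocol):
    """Anything that can send a serialized event to a messaging service."""
    def publish(self, body: str) -> None: ...


class InMemoryPublisher:
    """Test double; a real implementation would wrap the chosen Azure client."""
    def __init__(self) -> None:
        self.sent: List[str] = []

    def publish(self, body: str) -> None:
        self.sent.append(body)


def emit(
    publisher: EventPublisher, resource_id: str, content_type: str, status: str
) -> ContentEvent:
    """Build an event for a detected change and hand it to the publisher."""
    if status not in ("published", "unpublished"):
        raise ValueError(f"unsupported status: {status!r}")
    event = ContentEvent(resource_id, content_type, status)
    publisher.publish(event.to_json())
    return event
```

Keeping the detector dependent only on `EventPublisher` means the choice between Service Bus and Event Grid can be made, or changed, without touching the detection logic.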

These custom events will carry the necessary metadata about the content (e.g., resource ID, content type, publication status). Our indexing service, which will also reside within Azure DevOps, will subscribe to these events. When it receives a 'published' event, it will initiate the indexing process for the relevant content. Conversely, upon receiving an 'unpublished' event, it will trigger the removal of the content's index. This approach offers several key advantages: it eliminates the direct dependency on GitHub workflows, making our system more resilient to changes in external tooling; it provides a more granular and controlled method for managing indexing, as we define the events and logic ourselves; and it aligns perfectly with the event-driven nature of modern cloud architectures, offering better scalability and maintainability. The dependencies identified during the evaluation phase, such as the specific CMS APIs or Azure messaging services, have been deemed manageable and within our team's expertise to implement and maintain. This recommended approach ensures that our AEM Sites content remains accurately indexed and discoverable as we transition to Azure DevOps, setting a solid foundation for future development.
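The consuming side described above can be sketched as a small handler that dispatches on publication status. It assumes each event arrives as a JSON object with `resource_id`, `content_type`, and `status` fields, plus an `event_id` that is an assumption added here so that a redelivered message can be recognized and skipped. `InMemoryIndex` is a hypothetical stand-in for the real search index.

```python
import json
from typing import Dict, Set


class InMemoryIndex:
    """Hypothetical stand-in for the real search index."""
    def __init__(self) -> None:
        self.documents: Dict[str, dict] = {}

    def upsert(self, resource_id: str, document: dict) -> None:
        self.documents[resource_id] = document

    def delete(self, resource_id: str) -> None:
        # Deleting a missing entry is a no-op, so repeated removals are safe.
        self.documents.pop(resource_id, None)


class IndexingConsumer:
    """Applies 'published' and 'unpublished' events to the index."""
    def __init__(self, index: InMemoryIndex) -> None:
        self.index = index
        self._seen: Set[str] = set()  # event IDs already applied

    def handle(self, body: str) -> bool:
        """Process one serialized event; return False if it was a duplicate."""
        event = json.loads(body)
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        status = event["status"]
        if status == "published":
            self.index.upsert(
                event["resource_id"], {"content_type": event["content_type"]}
            )
        elif status == "unpublished":
            self.index.delete(event["resource_id"])
        else:
            raise ValueError(f"unsupported status: {status!r}")
        self._seen.add(event_id)
        return True
```

Messaging services commonly deliver a message more than once, so making the handler tolerant of duplicates is a reasonable default. In a real deployment the set of seen IDs would need to live somewhere durable rather than in process memory.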

Deliverables and Next Steps: Building the Future

As a result of this spike, we have a clear set of deliverables and a defined path forward for implementation. The final recommendation is to adopt a custom event-driven architecture within Azure DevOps, leveraging Azure Service Bus or Event Grid to manage content publishing and unpublishing events independently of GitHub workflows. This approach ensures that our content indexing remains accurate and efficient, removing the critical dependency that necessitated this spike.

Accompanying this recommendation are the next steps, which have been outlined in a follow-up ticket designed for seamless implementation. This ticket will detail the technical specifications, required resources, and a phased rollout plan. It will serve as the actionable roadmap for our engineering team to build out this new infrastructure. The implementation will involve developing the content change detection mechanism, configuring the Azure messaging service, and ensuring our indexing service can effectively consume these new custom events. We will also include provisions for comprehensive testing and monitoring to guarantee the reliability and performance of the new system. By taking these deliberate steps, we are not only addressing the immediate challenge posed by the migration but also building a more robust, scalable, and future-ready content indexing solution for our AEM Sites platform. The transition to Azure DevOps, coupled with this independently managed indexing process, will empower us to deliver a superior experience for our users.

For further insights into cloud migration strategies and best practices for managing development pipelines, you can refer to resources from Microsoft Azure and DevOps.com. These sites offer extensive documentation, case studies, and expert advice on navigating complex transitions and optimizing your development workflows.