greytHR

Reduction in Incident MTTR and Observability Support Tickets

Leveraging the Aiden AI agent through natural-language queries

Results

50%: Reduction in MTTD and MTTR

90%: Reduction in observability (O11y) support tickets

65%: Reduction in manual effort for incident remediation

"Aiden transformed how our engineers interact with observability. Natural language insights replaced complex queries and reduced dependency on SREs."

Abhishek Gaurav, Head of Engineering & DevOps

Background

greytHR is a full-suite HRMS platform designed to automate and simplify complex, recurring, and critical HR and payroll functions, ensuring compliance and security. With over 50 tools, greytHR offers ‘Hire-to-Retire’ solutions for People Operations, including advanced modules for recruiting, onboarding, engaging, paying, appraising, retaining, and retiring employees.

The platform also leverages AI-driven analytics and recommendations to enhance employee engagement throughout the entire employee lifecycle. Trusted by CFOs and loved by CHROs, greytHR serves businesses of various sizes and is adaptable across industries like manufacturing, SaaS, healthcare, hospitality, education, and retail.

As India’s leading HRMS and payroll provider, greytHR is rapidly expanding in the Middle East and Africa (MEA) and Southeast Asia (SEA) regions, offering world-class Made-in-India software solutions to emerging markets. The company serves over 30,000 clients, managing more than 3 million employees across 25+ countries.

Challenges

Fragmented Dashboards Across Clusters:

Each Kubernetes cluster had its own set of dashboards. Engineers had to switch between multiple dashboards to understand system health, making it difficult to get a unified view of the platform. This fragmentation significantly increased Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).

No Correlation Between Metrics, Logs, and Traces:

Metrics, logs, and traces existed in silos. During incidents, teams manually stitched together data from different tools to identify root causes. The lack of correlation led to longer troubleshooting cycles and delayed incident resolution.

Rising Support Tickets With Service Growth:

As the number of microservices increased, so did customer-reported issues and internal alerts. The operations team spent more time reacting to incidents instead of proactively improving system reliability.

Scalability and Maintenance Challenges With Open-Source Tools:

While open-source observability tools worked initially, maintaining and scaling them across multiple environments became operationally expensive. Upgrades, tuning, and managing integrations added ongoing overhead to the platform team.

Manual and Time-Consuming Reporting:

Weekly incident and anomaly reports were created manually by aggregating data from multiple sources. This process was error-prone, time-consuming, and diverted engineering effort away from higher-value work.

StackGen Solution

Unified Observability Across Clusters:

StackGen provided a centralized observability layer that consolidated metrics, logs, and traces across AWS and GCP environments. Engineers could now view platform health from a single pane of glass instead of managing cluster-specific dashboards.

Correlated Metrics, Logs, and Traces:

By automatically correlating telemetry data, StackGen enabled faster root-cause analysis. Engineers could move seamlessly from a high-level metric anomaly to the exact logs and traces responsible for the issue, dramatically reducing investigation time.

Aiden: Natural-Language Observability for Self-Service Insights:

One of the biggest differentiators for greytHR was Aiden, StackGen’s AI-powered observability chatbot.

Before adopting Aiden, engineers had to rely on SREs to write complex LogQL, PromQL, and TraceQL queries to extract insights from observability data. This resulted in a large volume of support tickets raised just to answer questions such as:

With Aiden, engineering teams can now ask these questions in natural language and instantly receive insights powered by correlated metrics, logs, and traces without needing to understand query languages.

This shift enabled true self-service observability, dramatically reducing dependency on the SRE team while empowering engineers to diagnose issues independently.

Scalable and Low-Maintenance Observability:

By moving away from heavily customized open-source setups, greytHR reduced the operational burden of maintaining observability tooling. StackGen scaled effortlessly as new services and clusters were added, without increasing maintenance complexity.

Automated Incident and Anomaly Reporting:

StackGen eliminated the need for manual weekly reports. Incident summaries, trends, and anomaly insights were generated automatically by Aiden, providing leadership and operations teams with consistent and reliable visibility into platform stability.

Results

Reduced MTTD and MTTR:

With unified dashboards and correlated telemetry, greytHR reduced detection times by 45-55% and resolution times by 55-65%. Engineers could identify issues faster and resolve them with greater confidence.

Improved Operational Efficiency:

Automation and AI-assisted troubleshooting reduced manual incident response effort by 60-70%. The platform team reclaimed 15-20 engineering hours per week, enabling them to focus on reliability improvements and feature delivery.

Better Visibility for Engineering and Leadership:

Automated reports and AI-generated summaries reduced manual reporting effort by 85-95%, providing clear insights into system health, incident trends, and recurring problem areas without manual data collection.

Future-Ready Observability at Scale:

With StackGen and Aiden, greytHR now has a scalable, intelligent observability foundation that grows with the platform, supporting increasing service complexity without added operational burden.
