GreytHR

Reduction in Observability Support Tickets

Leveraging Aiden AI Agent via natural language queries

results

50%

Reduction in MTTD & MTTR

90%

Reduction in O11Y support tickets

65%

Reduction in Manual Effort for Incident Remediation


Background

greytHR is a full-suite HRMS platform designed to automate and simplify complex, recurring, and critical HR and payroll functions, ensuring compliance and security. With over 50 tools, greytHR offers ‘Hire-to-Retire’ solutions for People Operations, including advanced modules for recruiting, onboarding, engaging, paying, appraising, retaining, and retiring employees.

The platform also leverages AI-driven analytics and recommendations to enhance employee engagement throughout the entire employee lifecycle. Trusted by CFOs and loved by CHROs, greytHR serves businesses of various sizes and is adaptable across industries like manufacturing, SaaS, healthcare, hospitality, education, and retail.

As India’s leading HRMS and payroll provider, greytHR is rapidly expanding in the MEA and SEA regions, offering world-class Made-in-India software solutions to emerging markets. The company proudly serves over 30,000 clients, managing 3 million+ employees across 25+ countries.

Tech Stack

AWS
Google Cloud
Kubernetes
CloudStack
GitHub
Argo

Challenges

As greytHR scaled its services and customer base, observability became increasingly difficult to manage.

Fragmented Dashboards Across Clusters:


Each Kubernetes cluster had its own set of dashboards. Engineers had to switch between multiple dashboards to understand system health, making it difficult to get a unified view of the platform. This fragmentation significantly increased Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).

No Correlation Between Metrics, Logs, and Traces:


Metrics, logs, and traces existed in silos. During incidents, teams manually stitched together data from different tools to identify root causes. The lack of correlation led to longer troubleshooting cycles and delayed incident resolution.

Rising Support Tickets With Service Growth:


As the number of microservices increased, so did customer-reported issues and internal alerts. The operations team spent more time reacting to incidents than proactively improving system reliability.

Scalability and Maintenance Challenges With Open-Source Tools:


While open-source observability tools worked initially, maintaining and scaling them across multiple environments became operationally expensive. Upgrades, tuning, and managing integrations added ongoing overhead to the platform team.

Manual and Time-Consuming Reporting:


Weekly incident and anomaly reports were created manually by aggregating data from multiple sources. This process was error-prone, time-consuming, and diverted engineering effort away from higher-value work.

StackGen Solution

To overcome these challenges, greytHR adopted StackGen Observability with Aiden, StackGen’s AI-powered observability assistant, to unify monitoring, accelerate incident response, and reduce operational overhead.


Unified Observability Across Clusters:


StackGen provided a centralized observability layer that consolidated metrics, logs, and traces across AWS and GCP environments. Engineers could now view platform health from a single pane of glass instead of managing cluster-specific dashboards.

Correlated Metrics, Logs, and Traces:


By automatically correlating telemetry data, StackGen enabled faster root-cause analysis. Engineers could move seamlessly from a high-level metric anomaly to the exact logs and traces responsible for the issue, dramatically reducing investigation time.
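
To make the idea of correlation concrete, here is a minimal sketch of moving from a metric anomaly to the related logs and traces. It is not StackGen's implementation: the `LogLine` record, the service names, the trace IDs, and the five-minute window are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogLine:
    # Hypothetical log record; real telemetry schemas will differ.
    timestamp: datetime
    service: str
    trace_id: str
    message: str


def logs_near_anomaly(logs, service, anomaly_time, window=timedelta(minutes=5)):
    """Return log lines for `service` within `window` of the anomaly, oldest first."""
    matches = [
        line for line in logs
        if line.service == service and abs(line.timestamp - anomaly_time) <= window
    ]
    return sorted(matches, key=lambda line: line.timestamp)


def trace_ids_for(logs):
    """Collect distinct trace IDs, preserving first-seen order."""
    return list(dict.fromkeys(line.trace_id for line in logs))


# Illustrative data only (assumed, not taken from greytHR).
anomaly = datetime(2024, 1, 1, 12, 0)
logs = [
    LogLine(anomaly + timedelta(minutes=2), "payroll", "t-2", "timeout calling db"),
    LogLine(anomaly - timedelta(minutes=1), "payroll", "t-1", "slow query"),
    LogLine(anomaly + timedelta(minutes=30), "payroll", "t-3", "outside the window"),
    LogLine(anomaly, "attendance", "t-4", "different service"),
]

nearby = logs_near_anomaly(logs, "payroll", anomaly)
related_traces = trace_ids_for(nearby)
```

The trace IDs collected this way are what an engineer would then look up in a tracing backend. A correlated platform performs this join automatically instead of leaving it to manual cross-referencing.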

Aiden: Natural Language Observability for Self-Service Insights:


One of the biggest differentiators for greytHR was Aiden, StackGen’s AI-powered observability chatbot.


Before adopting Aiden, engineers had to rely on SREs to write complex LogQL, PromQL, and TraceQL queries to extract insights from observability data. This resulted in a large volume of support tickets raised just to answer questions such as:


"Why did latency spike for this service yesterday?"
"Which services were impacted during the last payroll run?"
"Show errors correlated with this deployment."
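
For a sense of what answering the first question by hand might involve, the sketch below builds a PromQL query for 95th-percentile latency and a LogQL filter for error lines. The metric name `http_request_duration_seconds_bucket`, the `service` label, and the `payroll` value are assumptions for illustration; greytHR's actual metric and label names are not given in this case study.

```python
def p95_latency_promql(service: str, window: str = "5m") -> str:
    """Build a PromQL query for p95 latency from a histogram metric.

    Assumes a conventional histogram named http_request_duration_seconds
    with a `service` label (hypothetical names).
    """
    return (
        "histogram_quantile(0.95, "
        "sum by (le) (rate("
        f'http_request_duration_seconds_bucket{{service="{service}"}}[{window}]'
        ")))"
    )


def error_logs_logql(service: str) -> str:
    """Build a LogQL query selecting log lines that contain "error"."""
    return f'{{service="{service}"}} |= "error"'


latency_query = p95_latency_promql("payroll")
error_query = error_logs_logql("payroll")
```

Even this simple case requires knowing the metric type, the label names, and two query languages, which is the kind of knowledge engineers previously had to request from SREs.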



With Aiden, engineering teams can now ask these questions in natural language and instantly receive insights powered by correlated metrics, logs, and traces without needing to understand query languages.

This shift enabled true self-service observability, dramatically reducing dependency on the SRE team while empowering engineers to diagnose issues independently.

Scalable and Low-Maintenance Observability:


By moving away from heavily customized open-source setups, greytHR reduced the operational burden of maintaining observability tooling. StackGen scaled effortlessly as new services and clusters were added, without increasing maintenance complexity.

Automated Incident and Anomaly Reporting:


StackGen eliminated the need for manual weekly reports. Incident summaries, trends, and anomaly insights were generated automatically by Aiden, providing leadership and operations teams with consistent and reliable visibility into platform stability.

Results

Reduced MTTD and MTTR:


With unified dashboards and correlated telemetry, greytHR reduced detection times by 45-55% and resolution times by 55-65%. Engineers could identify issues faster and resolve them with greater confidence.
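
As a rough illustration of how such figures can be derived, the sketch below computes MTTD and MTTR from incident timestamps and expresses a change as a percentage reduction. The incident data is invented for the example, and measuring MTTR from incident start is an assumption; some teams measure it from the moment of detection instead.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(pairs):
    """Mean duration in minutes across (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Illustrative incidents (started, detected, resolved); not greytHR data.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 30), datetime(2024, 1, 2, 15, 40)),
]

mttd = mean_minutes((started, detected) for started, detected, _ in incidents)
mttr = mean_minutes((started, resolved) for started, _, resolved in incidents)
```

With these sample incidents, `mttd` is 20 minutes and `mttr` is 80 minutes. Bringing MTTD from 20 to 10 minutes would be a `percent_reduction(20, 10)` of 50%.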

Improved Operational Efficiency:


Automation and AI-assisted troubleshooting reduced manual incident response effort by 60-70%. The platform team reclaimed 15-20 engineering hours per week, enabling them to focus on reliability improvements and feature delivery.

Better Visibility for Engineering and Leadership:


Automated reports and AI-generated summaries reduced manual reporting effort by 85-95%, providing clear insights into system health, incident trends, and recurring problem areas without manual data collection.

Future-Ready Observability at Scale:


With StackGen and Aiden, greytHR now has a scalable, intelligent observability foundation that grows with the platform, supporting increasing service complexity without added operational burden.