How We Solved VM Uptime Monitoring for Azure and On-Prem Machines

At Cloudforce, we pride ourselves on going beyond the obvious. While our AI capabilities often take center stage, many of the solutions we’re most proud of are the ones built behind the scenes—crafted quietly and strategically to solve real-world problems for our clients.

Recently, one of our customers asked for a dashboard that could monitor VM uptime across both Azure and on-prem environments. It seemed like a simple ask at first, until we realized that Azure Monitor Logs (formerly Log Analytics) had no native uptime metric that covered both scenarios. Even Azure’s own “VM Availability” metric, which was still in preview, fell short: it offered no support for non-Azure machines.

So, we rolled up our sleeves and created our own.

Thinking Differently: From Problem to Possibility

Instead of stopping at “not supported,” we explored what was available. Every machine that reports to Azure Monitor through a monitoring agent, in Azure or on-prem, sends a heartbeat: a signal sent every minute that confirms the machine is up and running. That heartbeat, we realized, could become the foundation for our own uptime metric.

By querying heartbeat data with Kusto Query Language (KQL), we counted how many heartbeats each machine sent during a given time range and compared that with how many were expected (one per minute). The ratio of received to expected heartbeats gave us an uptime percentage for each machine, whether it lived in the cloud or on-prem.
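To make the arithmetic concrete, here is a minimal Python sketch of the received-versus-expected calculation. It is not the KQL we ran for the client. The function name `uptime_percent`, the one-hour window, and the simulated timestamps are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# One heartbeat per minute, as described above.
HEARTBEAT_INTERVAL = timedelta(minutes=1)


def uptime_percent(heartbeats, start, end, interval=HEARTBEAT_INTERVAL):
    """Return uptime as a percentage: heartbeats received / heartbeats expected."""
    expected = int((end - start) / interval)
    if expected <= 0:
        raise ValueError("time range must span at least one heartbeat interval")
    # Count at most one heartbeat per interval slot so duplicate records
    # cannot push the result above 100%.
    slots = {int((ts - start) / interval) for ts in heartbeats if start <= ts < end}
    return round(100.0 * len(slots) / expected, 2)


# Simulated data (hypothetical): one heartbeat per minute for an hour,
# with minutes 10 through 15 missing.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=1)
beats = [start + timedelta(minutes=m) for m in range(60) if not 10 <= m <= 15]

print(uptime_percent(beats, start, end))  # 54 of 60 expected heartbeats -> 90.0
```

Bucketing heartbeats into one-minute slots is a design choice in this sketch, not something the original query necessarily did. It keeps a machine that sends duplicate records from appearing more than 100% available.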

This approach was simple, elegant, and—most importantly—effective.

What We Learned (and What to Watch Out For)

Like any custom solution, this one came with trade-offs. We interpreted every missing heartbeat as downtime, so even a short network hiccup or a temporary agent issue counts against a machine that never actually went down, which can understate its true uptime.
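One way to see how much of the reported downtime comes from brief hiccups is to list the gaps between consecutive heartbeats and look at their lengths. The Python sketch below is an illustration only, not part of the delivered solution. The `find_gaps` helper, the 30-second tolerance, and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(heartbeats, interval=timedelta(minutes=1), tolerance=timedelta(seconds=30)):
    """Return (gap_start, gap_end, missed) for every pause longer than interval + tolerance."""
    ordered = sorted(heartbeats)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        delta = current - previous
        if delta > interval + tolerance:
            # Heartbeats that should have arrived between the two we did receive.
            missed = round(delta / interval) - 1
            gaps.append((previous, current, missed))
    return gaps


# Simulated data (hypothetical): heartbeats at minutes 0, 1, 2, 5, 6 and 20.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
beats = [start + timedelta(minutes=m) for m in (0, 1, 2, 5, 6, 20)]

for gap_start, gap_end, missed in find_gaps(beats):
    print(f"{gap_start:%H:%M} -> {gap_end:%H:%M}: {missed} missed heartbeat(s)")
```

In this sample, the two-heartbeat gap looks like a hiccup and the thirteen-heartbeat gap looks like a real outage. Where to draw that line depends on the environment, and the uptime metric described above still counts both as downtime.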

We also had to consider data retention policies. If the Log Analytics workspace purges data after a short window (say, 30 days), uptime queries that reach further back than that window have no heartbeat data to work with. To address that, we set up an additional pipeline that sent the query results to a dedicated database, where we could store and build on the data over time. That enabled historical reporting, trending, and visual dashboards.
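The idea behind that pipeline can be sketched in a few lines of Python. This post does not go into the database or schema used for the client, so everything here is a stand-in: SQLite from the standard library plays the dedicated database, and the `uptime_history` table, its columns, and the machine names are hypothetical.

```python
import sqlite3


def save_daily_uptime(conn, rows):
    """Insert or update (day, computer, uptime_pct) rows in a history table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS uptime_history (
               day TEXT NOT NULL,
               computer TEXT NOT NULL,
               uptime_pct REAL NOT NULL,
               PRIMARY KEY (day, computer)
           )"""
    )
    # Upsert so re-running the export for the same day does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO uptime_history (day, computer, uptime_pct) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def average_uptime(conn, computer):
    """Average the stored daily uptime for one machine across all saved days."""
    row = conn.execute(
        "SELECT AVG(uptime_pct) FROM uptime_history WHERE computer = ?", (computer,)
    ).fetchone()
    return row[0]


# Hypothetical daily results, as if exported from the uptime query.
conn = sqlite3.connect(":memory:")
save_daily_uptime(conn, [
    ("2024-01-01", "vm-azure-01", 100.0),
    ("2024-01-02", "vm-azure-01", 99.0),
    ("2024-01-01", "srv-onprem-01", 98.5),
])
print(average_uptime(conn, "vm-azure-01"))  # average of 100.0 and 99.0 -> 99.5
```

Because only the small daily summaries are stored, the history can outlive the workspace retention window without keeping every raw heartbeat.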

📊 In the end, the client got what they asked for—plus a scalable, repeatable method we can now apply to other environments.

Why This Work Matters

Solutions like this are a reminder that innovation isn’t always flashy—it’s about looking at existing tools in new ways, asking better questions, and refusing to accept limitations as final answers. What began as a simple request became an opportunity to rethink how we define uptime visibility across hybrid environments.

Huge shoutout to my teammate Atem Njinju, whose collaboration, insights, and dedication were instrumental in bringing this solution to life for our client teams. It’s a reminder that the best technical solutions are often built through teamwork and shared curiosity.

At Cloudforce, we don’t just work in the cloud—we find new ways to make it work for our clients.

And this kind of solution is just the beginning. From custom metrics to enterprise-ready AI, our team is constantly pushing boundaries. Learn how our nebulaONE® platform is helping organizations unlock deeper insights, smarter automation, and a more intelligent cloud ecosystem.

👉 Explore nebulaONE and see how Cloudforce is redefining what’s possible with Azure + AI.

Matthew Day
Author

Matthew grew up in College Park, MD, only a mile from the University of Maryland. He always loved sports and computers growing up, and ended up falling in love with baseball, which he played from the time he was five up through college. Matthew is a graduate of Coppin State University, where he spent four years as a starter and team captain on their Division I baseball team. Matthew is very thankful for his time there, as it showed him how to remain focused, work hard, lead and take responsibility for a group. Matthew speaks French, a little Russian, and he's learning Spanish. As a Cloud Services Associate at Cloudforce, Matthew also enjoys learning new technology and helping others learn as well.
