
Principal Software Engineer

Microsoft
United States, Georgia, Atlanta
Oct 22, 2025
Overview

Are you a customer-obsessed, AI-curious problem-solver who thrives in an inclusive, collaborative global team? The Azure CXP team's mission is to transform Microsoft Cloud customers into fans. Through our deep engineering engagements with customers and teams across Microsoft, we analyze and amplify customer needs and drive the vision to improve Cloud quality, security, and reliability. Our culture of growth mindset and empowerment is central to who we are and how we work.

We are customer-obsessed problem-solvers. We orchestrate deep engagements in areas like incident management, support, and enablement. We analyze and amplify those customer voices, both within our own team and across the Cloud + AI team, bringing the customer connection to the Quality vision for Azure. We innovate ways to scale what we learn across our customer base.

Diversity and inclusion are central to who we are, how we work, and what we enable our customers to achieve. We know that empowering our customers starts with empowering our team to show up authentically, work in ways that are best for them, and achieve their career goals.

The Azure Reliability team is a multidisciplinary engineering organization within CXP, committed to making Azure the world's safest and most reliable cloud. For Azure's most critical services and products, our software engineers work closely with product teams to enhance availability, reliability, observability, and operability across our planet-scale systems. We prioritize long-term platform improvements through engineering over repetitive manual tasks. Increasingly, we leverage AI to detect anomalies, predict incidents, and automate operational workflows, amplifying our ability to scale reliability across Azure.
Our teams contribute to product architecture, share knowledge and code, and focus on building reusable solutions that benefit multiple teams and services.

Every day, our customers stake their business and reputation on our cloud. You can help #AzCXP provide our customers with the world-class cloud services they need to succeed.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Responsibilities

Design and implement world-class distributed systems to support billions of users worldwide.
Improve the reliability and resilience of key Azure products.
Define and maintain system reliability goals through Service Level Objectives (SLOs).
Enhance production systems with improvements in observability, telemetry, alerting, incident and change management, and deployment safety.
Build reusable automation and scalable processes to support multiple engineering teams in achieving their reliability goals.
Influence product architecture and roadmap to embed reliability as a core design principle.
Contribute directly to product code to drive reliability improvements.
Leverage AI technologies to detect anomalies, predict incidents, and automate operational workflows at scale.
Design and develop secure, modular, reliable, testable, and observable distributed services and solutions.
Collaborate with internal and external stakeholders to align efforts and deliver cohesive outcomes.
Drive continuous improvement in engineering processes and codebases.
Develop automation solutions to prevent or resolve service issues before they impact users.
Apply AI tools and techniques to reduce operational toil and scale practices across complex environments.
Gain domain knowledge of Microsoft's business ecosystem and contribute to seamless, end-to-end user experiences.