SRE & Observability Jobs In Hyderabad

Ready to make a difference in your career, your life, the world?

At Virtusa, your passion will drive the kind of innovation that transforms industries, economies, and lives throughout the world. Join Virtusa, a global leader in digital business strategy, digital engineering, and IT services and solutions. Explore all the ways you can make a difference. Learn more about working at Virtusa.

Who we are

Businesses today require transformational change at a scale and speed that defies traditional ways of working. We spark change through our Digital Transformation Studio that delivers deep digital engineering and industry expertise through client-specific and integrated agile scrum teams.

Virtusa helps businesses move forward, faster, by combining deep industry expertise and frictionless technology delivery.

 

Join Our Team

We’re always on the lookout for new talent. Check out our current openings and apply today!

Results (1-11 of 11)

JOB TITLE
CATEGORY
LOCATION
JOB ID
Hyderabad

SRE

Role: SRE
Experience: 6 to 10 years
Work Mode: Hybrid
Work timings: 2pm to 11pm
Location: Chennai & Hyderabad
Primary Skills: SRE

You are passionate about driving an SRE / DevSecOps mindset and culture in a fast-paced, challenging environment where you get the opportunity to work with a spectrum of the latest tools and technologies to drive forward automation, observability and CI/CD automation.
You are actively looking to improve implemented solutions, understand the efficacy of collaboration, and work with cross-functional teams to build and improve the CI/CD pipeline and improve automation (reduce toil).
As a member of this team, you possess the ability to inspire and leverage your experience to inject new knowledge and skills into an already high-performing team.
Help identify areas of improvement, especially when it comes to Observability, Proactiveness, Automation & Toil Management.
Strategic approach with clear objectives to improve System Availability, Performance Optimization, and improve Incident MT
Build and maintain Reliable Engineering Systems using SRE and DevSecOps models with special focus on Event Management (monitoring/alerts), Self Healing and Reliability testing.
Strong programming skills with experience in API and Webhook development using Dynatrace, GitHub workflows, Ansible, CDK, TypeScript/JavaScript, Python, Node.js, Ruby, PowerShell, and Shell scripting languages.
Strong understanding of Cloud computing (AWS).
Strong understanding of SDLC and DevSecOps.
Experience in CI/CD pipeline tools such as JIRA, GitHub, Bitbucket, Artifactory, Ansible, or equivalent.
Working knowledge of Lambda, Glue and CDK.
Knowledge of cloud services: Application integration, functions, Cloud Databases, data warehouse and analytics, Machine Learning, Developer Tools, Security and identity management.
Knowledge of software development practices, concepts, and technology obtained through formal training and/or work experience.
Knowledge of required programming languages and can code with minimum guidance.
Understand functional aspects and technical behavior of the underlying operating system, development environment, and deployment practices.
CREQ216451
SRE & Observability
India, Hyderabad
Hyderabad

UNIX Admin

At least three years of experience working as a UNIX Administrator.
Extensive knowledge of UNIX and Linux operating systems, storage environments, network protocols and file systems.
Familiarity with multiple information technologies that include operating systems, server virtualization, cloud-based infrastructure, automation, and middleware, with expertise in several of the technologies, including RedHat/CentOS Linux, Solaris, VMware, x86 hardware and AWS.
Creating and maintaining UNIX user accounts and access management systems.
Creating and setting standardized backup and recovery policies, as well as security policies.
Applying patches and upgrades when necessary.
Assisting in resolution of hardware/software platform problems in complex multilayered environments.
Analyzing impact of software changes across other functional units.
Maintaining space utilization statistics and forecasting future space requirements.
Establishing, implementing, and recording UNIX Administrator best practices.
Implementing and managing systems to proactively monitor infrastructure.
Maintaining responsibility for racking/deracking servers and network connections.
Making recommendations for upgrades to hardware and software based on existing operations, pending needs and available budget.

Mandatory Skills
Hands-on experience as a UNIX Administrator.
Extensive knowledge of UNIX and Linux operating systems, storage environments, network protocols and file systems.
Familiarity with multiple information technologies that include operating systems, server virtualization, cloud-based infrastructure, automation, and middleware, with expertise in several of the technologies, including RedHat/CentOS Linux, Solaris, VMware, x86 hardware and AWS.
CREQ215969
SRE & Observability
India, Hyderabad
Hyderabad

Site Reliability Engineer

Experience
Minimum 6 years of relevant work experience with AppDynamics set up in critical production environments
Experience working with AWS and on-prem hosted applications in hybrid cloud
Experience in implementing APM and RUM for end-to-end tracing and custom alerts with AppDynamics

Core Capabilities
Expert-level knowledge of AppDynamics integration with agents as well as APM and RUM
AWS proficiency with containers and CloudWatch is key
Ability to configure custom alerts and monitors with AppDynamics
Ability to build end-to-end observability using AppDynamics from user interaction all the way into infrastructure
Good understanding of AppDynamics integration capabilities with other systems
Ability to build custom AppDynamics dashboards
Ansible or PowerShell knowledge is helpful
Ability to write SQL queries and use AppDynamics to observe database transactions

Qualification
AppDynamics official certification or alternative certification from Udemy, Coursera or other platforms

Role & Responsibilities
Implement the entire observability solution using AppDynamics for .NET monolithic and Java-based microservices applications and their infrastructure
Implement AppDynamics RUM, APM setup and log consolidation
Build integrations for observability into on-prem and cloud-hosted applications using AppDynamics and ensure the deployment as well as continuous running of agents
Instrument and expose traces from monolithic .NET applications and Java microservices using AppDynamics libraries
Set up monitoring of database queries and performance of application transactions with AppDynamics
Consult and guide a team of observability engineers to implement the AppDynamics solution
Train a new team and hand over in-life maintenance of the AppDynamics solution built
CREQ207831
SRE & Observability
India, Hyderabad
Hyderabad

SRE - Core GCP

Experience
Minimum 7 years of work experience as an SRE (not Traditional Production Support) covering integration platforms on cloud-based deployments
Coding / automation scripting experience in any programming language, particularly for integration tier and middleware
Working as a DevOps Engineer or SRE in mission-critical applications and infrastructure
Working experience with GCP (Google Cloud), particularly with GKE, is important
Working with AppDynamics and Splunk for monitoring and setting up observability is key
CI/CD tool chains, setting up and running deployment pipelines and propagating changes on different environments

Core Capabilities
Maintaining middleware such as Kafka (open source) and MQ as well as application servers (Tomcat)
Maintain Hazelcast data storage platform clusters and Control-M job schedulers
GCP and private-cloud operational support / administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
Kubernetes cluster management, monitoring and remediation
Knowledge of Docker is important
Automating deployments and scripting self-healing workflows based on telemetry
Define SLIs and configure SLOs, respond to threshold alerts and optimize monitoring capability
Work with code as well as configuration artifacts to debug and fix issues that may arise
Knowledge of applying SRE practices to daily operations is key
Must be inclined to work on proof-of-concept solutions to optimize reliability, such as those incorporating AI models for event correlation and assisted triaging
Ability to work in shifts in office is mandatory; this is a 24/7 on-desk operation

Qualification
Computer Science and/or Engineering degrees are preferred
SRE Foundation certification by DevOps Institute or any other equivalent certification on SRE by a recognized body is mandatory
CKA certification
GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus
Hazelcast Platform Operations certification badge

Role & Responsibilities
Work as part of a 24/7 on-desk team in shifts that will manage middleware and associated applications that are being consumed globally: incident, change, event, problem management
Debugging integrations and consumers at the code level
Work with CI/CD pipelines and automate new change rollouts. Change deployment and sanity testing is part of the scope
Set up and configure an observability product, preferably AppDynamics or Splunk, for end-to-end traceability and log analytics
Be the guardian to ensure high reliability of the applications, middleware, storage platforms, scheduler (and its jobs) and underlying cloud infrastructure
Define and set up SLIs as well as SLOs while continuously refining thresholds
Set up anomaly detection and auto-remediation workflows
Ensure all alerts and incidents within scope are actioned upon before breaching SLOs
CREQ207816
SRE & Observability
India, Hyderabad
Hyderabad

Site Reliability Engineer

Experience
Minimum 7 years of work experience as an SRE (not Traditional Production Support) covering integration platforms on cloud-based deployments
Coding / automation scripting experience in any programming language, particularly for integration tier and middleware
Working as a DevOps Engineer or SRE in mission-critical applications and infrastructure
Working experience with GCP (Google Cloud), particularly with GKE, is important
Working with AppDynamics and Splunk for monitoring and setting up observability is key
CI/CD tool chains, setting up and running deployment pipelines and propagating changes on different environments

Core Capabilities
Maintaining middleware such as Kafka (open source) and MQ as well as application servers (Tomcat)
Maintain Hazelcast data storage platform clusters and Control-M job schedulers
GCP and private-cloud operational support / administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
Kubernetes cluster management, monitoring and remediation
Knowledge of Docker is important
Automating deployments and scripting self-healing workflows based on telemetry
Define SLIs and configure SLOs, respond to threshold alerts and optimize monitoring capability
Work with code as well as configuration artifacts to debug and fix issues that may arise
Knowledge of applying SRE practices to daily operations is key
Must be inclined to work on proof-of-concept solutions to optimize reliability, such as those incorporating AI models for event correlation and assisted triaging
Ability to work in shifts in office is mandatory; this is a 24/7 on-desk operation

Qualification
Computer Science and/or Engineering degrees are preferred
SRE Foundation certification by DevOps Institute or any other equivalent certification on SRE by a recognized body is mandatory
CKA certification
GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus
Hazelcast Platform Operations certification badge

Role & Responsibilities
Work as part of a 24/7 on-desk team in shifts that will manage middleware and associated applications that are being consumed globally: incident, change, event, problem management
Debugging integrations and consumers at the code level
Work with CI/CD pipelines and automate new change rollouts. Change deployment and sanity testing is part of the scope
Set up and configure an observability product, preferably AppDynamics or Splunk, for end-to-end traceability and log analytics
Be the guardian to ensure high reliability of the applications, middleware, storage platforms, scheduler (and its jobs) and underlying cloud infrastructure
Define and set up SLIs as well as SLOs while continuously refining thresholds
Set up anomaly detection and auto-remediation workflows
Ensure all alerts and incidents within scope are actioned upon before breaching SLOs
CREQ207815
SRE & Observability
India, Hyderabad
Hyderabad

SRE - Core GCP

Experience
Minimum 7 years of work experience as an SRE (not Traditional Production Support) covering integration platforms on cloud-based deployments
Coding / automation scripting experience in any programming language, particularly for integration tier and middleware
Working as a DevOps Engineer or SRE in mission-critical applications and infrastructure
Working experience with GCP (Google Cloud), particularly with GKE, is important
Working with AppDynamics and Splunk for monitoring and setting up observability is key
CI/CD tool chains, setting up and running deployment pipelines and propagating changes on different environments

Core Capabilities
Maintaining middleware such as Kafka (open source) and MQ as well as application servers (Tomcat)
Maintain Hazelcast data storage platform clusters and Control-M job schedulers
GCP and private-cloud operational support / administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
Kubernetes cluster management, monitoring and remediation
Knowledge of Docker is important
Automating deployments and scripting self-healing workflows based on telemetry
Define SLIs and configure SLOs, respond to threshold alerts and optimize monitoring capability
Work with code as well as configuration artifacts to debug and fix issues that may arise
Knowledge of applying SRE practices to daily operations is key
Must be inclined to work on proof-of-concept solutions to optimize reliability, such as those incorporating AI models for event correlation and assisted triaging
Ability to work in shifts in office is mandatory; this is a 24/7 on-desk operation

Qualification
Computer Science and/or Engineering degrees are preferred
SRE Foundation certification by DevOps Institute or any other equivalent certification on SRE by a recognized body is mandatory
CKA certification
GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus
Hazelcast Platform Operations certification badge

Role & Responsibilities
Work as part of a 24/7 on-desk team in shifts that will manage middleware and associated applications that are being consumed globally: incident, change, event, problem management
Debugging integrations and consumers at the code level
Work with CI/CD pipelines and automate new change rollouts. Change deployment and sanity testing is part of the scope
Set up and configure an observability product, preferably AppDynamics or Splunk, for end-to-end traceability and log analytics
Be the guardian to ensure high reliability of the applications, middleware, storage platforms, scheduler (and its jobs) and underlying cloud infrastructure
Define and set up SLIs as well as SLOs while continuously refining thresholds
Set up anomaly detection and auto-remediation workflows
Ensure all alerts and incidents within scope are actioned upon before breaching SLOs
CREQ207788
SRE & Observability
India, Hyderabad
Enjoy the freedom to innovate

At Virtusa, we're thinkers and doers; we thrive on collaboration, competition, and endless curiosity. We get stuff done, together. And we're looking for people with bold, fresh ideas, and that certain spark, who embody what it takes to be a #Virtusan and can move at the speed of change.
