Job Listing Description

Site Reliability Engineer

Description:
Site Reliability Engineering (SRE) applies software engineering techniques and discipline to production operations to attack major problems and fix them for good. Our customers count on us to provide extraordinary availability, scalability, and security for our services. The SRE should be comfortable taking on new engineering challenges, defining potential solutions, and implementing designs in a team environment. This position will play an important role in our organization’s evolution toward contemporary application and infrastructure management practices and will be expected to both guide and support the team’s growth and learning. The SRE team is new, and its members will have the chance to influence the direction of a critical, global SRE organization. The SRE will also focus on addressing the hot tactical and engineering issues that are impacting the ongoing integration activities within Technology Services (TS).
Responsibilities:
• Build holistic visibility into SLIs, SLOs, and SLAs, dependency graphs, and the past performance of software, network, and systems so that we can continue to scale without increasing operational burden or toil.
• Assess the current state of the environment and drive “SWAT” initiatives in collaboration with the rest of the organization to ensure transparency, resiliency, stability, and reliability across both the application and infrastructure stacks. Future-state SWAT initiatives can range from incident analysis leveraging ML and AI, to assisting with the datacenter stability and consolidation effort, to application transformation (monolith to microservices, PaaS, etc.).
• Enable the adoption and implementation of cloud-based application reliability, resiliency, observability, and deployment best practices for production and non-production environments, including public cloud migration of our mission-critical applications from the on-premises data centers.
• Build infrastructure and drive projects that break things with the aim of improving the robustness of production systems.
• Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform.
• Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices.
• Monitor and report on service level objectives for a given application’s services. Work with business and product owners to establish key performance indicators.
• Partner with security engineers to develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.
• Partner with the broader Fiserv organization to build a culture of rigorously learning from incidents.
• Share your knowledge by giving brown bags, tech talks, and evangelizing appropriate tech and engineering best practices.
• Unblock, support, and effectively communicate across teams to achieve results.
• Define roadmap and architecture based on technology and business needs.

Job Number: 2010120102
Job Location: Alpharetta, GA
Duration: 6 months
Input Date: 11/14/2020
Last Updated: 11/25/2020
Firm Name: PDS TECHNICAL SERVICES
Attention: Karen Reno
Address: 300 E JOHN CARPENTER FWY STE 700
City, State: IRVING, TX 75062
Phone: 214/647-9600
800 Phone: 800/270-4737
Fax Phone: 214/647-9630
Website: https://pdsjobs.force.com/candidates/job_detail?id=a1i1T000003PAIUQA4&URLSource=cjhunter


ContractJobHunter is a service of:
C.E. Publications, Inc.
P.O. Box 3006, Bothell, WA 98041-3006, USA
The content of this website is Copyright 2020 C.E. Publications, Inc.