Site Reliability Engineer
Information Technology and Services
Any Graduation Degree
08 Sep 2020
As a Site Reliability Engineer for Marketplace Operations at Walmart, you’ll have the opportunity to:
- Enjoy working on challenges that no one has solved yet
- Influence engineering teams to design cloud-ready applications
- Be the first line of response for issues on one of the largest private cloud infrastructures
- Define the monitoring needed to ensure the best customer experience
- Partner with other engineering teams to build the right tool set for delivering the best customer experience on the Walmart eCommerce site
Our Ideal Candidate
- A technically strong, high-performing individual with excellent communication skills, a strong customer focus, and an appetite to learn and deliver
- Capability to program in at least one language, ideally Python or Perl, but Ruby, C/C++, Java, or others are okay
- Experience with Unix/Linux systems with scripting experience in Shell, Perl or Python
- Strong knowledge of core protocols and tech such as: TCP/IP, HTTP, DNS, load balancers, distributed file systems, key-value and relational databases
- Extensive experience with configuration management tools such as Puppet, Chef, Salt, or Ansible
- Experience with specific software such as Hadoop, Kafka, Spark, Couchbase, and similar technologies is desirable, but the ability to learn new technology quickly is most important
- Capable of technical deep-dives into code, networking, systems, and storage with very bright, experienced engineers
- Expertise in problem solving and in analyzing global-scale distributed systems
- Logging and monitoring experience, including designing, deploying, and running systems such as Splunk, ELK, New Relic, or other APM solutions
- Work with product delivery teams to identify architectural issues and ensure timely and smooth delivery of features into operations.
- Identify gaps in processes, skills, tooling, technology choices and work with upper management to drive improvements within the organization.
- Excellent written and verbal communication skills in order to influence architectural and process level change in the organization.
- Build and maintain Walmart’s next-generation infrastructure platform
- Administration of production infrastructure
- Drive improvements in all aspects of service delivery, including change management, continuous delivery, security, monitoring, and reliability
- Database administration in a mission-critical, 24/7 environment that includes e-commerce, accounting, warehouse management, and decision support systems
- Own end-to-end availability and performance of mission-critical services, build automation to prevent problem recurrence, and automate the response to all non-exceptional service conditions
- Own the day-to-day health, uptime, monitoring, and reliability of services and server infrastructure
- Design, implement, and support high-performance, highly available services and infrastructure
- Improve the efficiency and flexibility of our datacenters
- Build and maintain models for growth and capacity planning
- Deployment, support and monitoring of new platforms and application stacks