Senior Data Engineer
Information Technology and Services
Any Graduation Degree
07 Sep 2020
- Apache Spark
- Google Cloud Platform
- Apache Kafka
We are a passionate bunch of technologists and product- and operations-savvy folks who believe in solving the last-mile problem for India. Our primary focus is mobility: changing all facets of mobility across India, not just in the metros.
What you will find:
- You get to work with the best minds and learn from the leaders who have built Rapido
- No boundaries and no rigid setup: plenty of freedom to explore and expand the horizons of what you want to do in your career.
- High-energy teams with a passion for value-oriented tech.
- Witness hypergrowth at a scale you otherwise only read or hear about in the news.
- Tech built to support scale and growth: currently at 5k RPS, with exponential growth expected.
- A data platform collecting data across the entire transactional backend estate, with a ~TB query window for daily analysis, and real-time analytical and predictive apps supporting demand, supply, and other levers to improve customer experience and operational efficiency.
Ideally, you should have:
- Technical expertise: Experience building data pipelines and data-centric applications in a production setting, using distributed storage platforms (HDFS, S3, and NoSQL databases such as HBase and Cassandra) and distributed processing platforms (Hadoop, Spark, Hive, Oozie, Airflow, etc.).
- Hands-on experience with any of the MapR, Cloudera, or Hortonworks Hadoop distributions and/or cloud-based distributions (the GCP stack is a bonus).
- Focus on excellence: Practical experience with data-driven approaches; familiarity with applying a data security strategy; familiarity with well-known data engineering tools and platforms (e.g. Kafka, Spark, Hadoop); experience creating and building big data architectures.
- Technical depth and breadth: Able to build and operate data pipelines and data storage; familiar with infrastructure definition and automation in this context; aware of technologies adjacent to those they have worked with; good understanding of data modelling.
- Experience creating complex data processing pipelines as part of diverse, high-energy teams.
- Experience designing scalable implementations of the models developed by our Data Scientists.
- Hands-on programming based on TDD, usually in a pair-programming environment.
- Experience deploying data pipelines to production based on Continuous Delivery practices.