- Design data pipeline solutions based on the requirements, incorporating optimization techniques appropriate to the data sources involved and the data volume.
- Understand storage architectures such as Data Warehouses, Data Lakes, and Lakehouses; decide the tech stack and development standards, propose technical solutions and architectural patterns, and recommend best practices for the big data solution.
- Provide thought leadership and mentoring to the data engineering team on how data should be stored and processed more efficiently at scale.
- Ensure adherence to Security and Compliance policies for the products.
- Stay up to date with evolving cloud technologies and development best practices including open-source software.
- Work in an Agile environment, using JIRA for project management, and provide optimized solutions to customers.
- Proven problem-solving skills with the ability to anticipate roadblocks, diagnose problems and generate effective solutions
- Analyze market segments and customer base to develop market solutions
- Experience working with batch processing / real-time systems using various data engineering technologies.
- Enhance and support solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, Lambda, AWS Glue, and other data engineering technologies (a minimal pipeline sketch appears after this list).
- Proficiency in SQL writing, SQL concepts, data modelling techniques, data validation, data quality checks, and data engineering concepts.
- Design, create, deploy, and review existing and new products, and obtain final sign-off from the client by following SDLC best practices.
- Experience in technologies like Databricks, HDFS, Redshift, Hadoop, S3, Athena, RDS, Elastic MapReduce on AWS or similar services in GCP/Azure
- Schedule and monitor Spark jobs using tools like Airflow and Oozie (see the Airflow sketch after this list).
- Familiarity with version control and CI/CD tools like Git, CodeCommit, Jenkins, and CodePipeline.
- Work in a cross-functional team along with other Data Engineers, QA Engineers, and DevOps Engineers.
- Develop, test, and implement data solutions based on finalized design documents.
- Familiarity with Unix/Linux and shell scripting.
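As a rough illustration of the PySpark/AWS work referenced above, here is a minimal sketch of a batch job that reads raw data from S3, cleans it, and writes partitioned Parquet back to S3 so it can be queried through Athena. The bucket names, paths, and column names are hypothetical placeholders, not part of any specific project.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical S3 locations; replace with the project's actual buckets and prefixes.
RAW_PATH = "s3://example-raw-bucket/sales/"
CURATED_PATH = "s3://example-curated-bucket/sales/"

spark = (
    SparkSession.builder
    .appName("sales-daily-batch")
    .getOrCreate()
)

# Read raw CSV files landed in S3 (EMR/Glue clusters resolve s3:// paths natively).
raw = spark.read.option("header", "true").csv(RAW_PATH)

# Basic cleansing and enrichment: cast amounts and derive a partition column.
curated = (
    raw
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("amount").isNotNull())
)

# Write partitioned Parquet so Athena / Redshift Spectrum can prune partitions at query time.
(
    curated.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet(CURATED_PATH)
)

spark.stop()
```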
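For the Airflow scheduling bullet, a minimal Airflow 2.x style DAG sketch that submits the job above on a daily schedule; the connection ID, script location, and schedule are assumptions for illustration only.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Retries handle transient cluster issues without manual intervention.
default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="sales_daily_batch",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # run at 02:00 UTC every day
    catchup=False,
) as dag:
    run_spark_job = SparkSubmitOperator(
        task_id="run_sales_batch",
        # Hypothetical artifact location for the PySpark script shown earlier.
        application="s3://example-artifacts-bucket/jobs/sales_daily_batch.py",
        conn_id="spark_default",
        name="sales-daily-batch",
    )
```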
Experience: 4-7 years
- Excellent communication and problem-solving skills.
- Highly proficient in Project Management principles, methods, techniques, and tools
- Minimum of 2 to 4 years of working experience in PySpark, SQL, and AWS development
- Experience working as a mentor to junior team members
- Hands-on experience with ETL processes and performance optimization techniques is a must (see the optimization sketch after this list)
- Candidates should have taken part in architecture design and discussions
- Minimum of 4 years of experience working with batch processing / real-time systems using various technologies like Databricks, HDFS, Redshift, Hadoop, Elastic MapReduce on AWS, Apache Spark, Hive/Impala, and NoSQL databases, or similar services in Azure or GCP
- Minimum of 4 years of experience working on Data Warehouse or Data Lake projects in a role beyond just data consumption
- Minimum of 4 years of extensive working knowledge of AWS, building scalable solutions; an equivalent level of experience in Azure or Google Cloud is also acceptable
- Minimum of 3 years of experience in programming languages (preferably Python)
- Experience in the pharma domain will be a big plus
- Familiarity with tools like Git, CodeCommit, Jenkins, and CodePipeline
- Familiarity with Unix/Linux and shell scripting
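To make the ETL performance optimization requirement above concrete, here is a minimal sketch of two common PySpark techniques: partition pruning on a date-partitioned fact table and a broadcast join against a small dimension. The table locations and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-optimization-example").getOrCreate()

# Hypothetical tables: a large fact table partitioned by order_date and a small dimension.
orders = spark.read.parquet("s3://example-curated-bucket/orders/")
countries = spark.read.parquet("s3://example-curated-bucket/countries/")

optimized = (
    orders
    # Partition pruning: filtering on the partition column lets Spark skip
    # reading irrelevant S3 prefixes entirely.
    .filter(F.col("order_date") >= "2024-01-01")
    # Broadcast join: ship the small dimension to every executor instead of
    # shuffling the large fact table across the cluster.
    .join(F.broadcast(countries), on="country_code", how="left")
)

# Repartition by the partition column before writing to avoid many small files in S3.
(
    optimized.repartition("order_date")
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders_enriched/")
)
```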
Additional Skills:
- Exposure to pharma and life sciences would be an added advantage.
- Certification in any cloud technology such as AWS, GCP, or Azure.