- Design data pipeline solutions based on requirements, incorporating optimization techniques suited to the data sources involved and the data volume.
- Understanding of storage architectures such as Data Warehouses, Data Lakes, and Lakehouses
- Deciding the tech stack and development standards, proposing technical solutions and architectural patterns, and recommending best practices for big data solutions
- Providing thought leadership and mentoring to the data engineering team on storing and processing data more efficiently at scale
- Ensure adherence to security and compliance policies for the products
- Stay up to date with evolving cloud technologies and development best practices, including open-source software.
- Work in an Agile environment, delivering optimized solutions to customers and using JIRA for project management
- Proven problem-solving skills with the ability to anticipate roadblocks, diagnose problems and generate effective solutions
- Analyze market segments and customer base to develop market solutions
- Experience working with batch processing / real-time systems using various technologies
- Enhance and support solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, Lambda, AWS Glue, and other data engineering technologies.
- Proficiency in SQL writing, SQL concepts, data modelling techniques, data validation, data quality checks, and data engineering concepts
- Proficiency in the design, creation, deployment, and review of existing and new products, obtaining final client sign-off while following SDLC best practices.
- Experience in technologies like Databricks, HDFS, Redshift, Hadoop, S3, Athena, RDS, Elastic MapReduce on AWS or similar services in GCP/Azure
- Schedule and monitor Spark jobs using tools like Airflow and Oozie (a minimal Airflow sketch follows this list)
- Familiar with version control and CI/CD tools like Git, AWS CodeCommit, Jenkins, and AWS CodePipeline
- Work in a cross-functional team with other Data Engineers, QA Engineers, and DevOps Engineers.
- Develop, test, and implement data solutions based on finalized design documents.
- Familiar with Unix/Linux and Shell Scripting
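
As a concrete illustration of the PySpark and Airflow responsibilities above, the following is a minimal sketch of an Airflow DAG that submits a PySpark job on a daily schedule. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, application path, connection id, and schedule are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal sketch: schedule a PySpark job with Airflow.
# dag_id, application path, and conn_id are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                            # retry transient failures
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_sales_etl",                # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",                    # daily at 02:00 (schedule_interval on Airflow < 2.4)
    catchup=False,
    default_args=default_args,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_sales_etl",
        application="/opt/jobs/sales_etl.py",           # PySpark script to submit
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "200"},   # example tuning knob
    )
```

On EMR the same pattern is typically expressed with the Amazon provider's EMR operators instead of SparkSubmitOperator; the sketch above is kept provider-light to stay self-contained.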
SKILL SET
- BE, BS, or MS in Computer Science or a related field, with 4+ years of overall working experience
- Excellent communication and problem-solving skills.
- Highly proficient in Project Management principles, methods, techniques, and tools
- Minimum of 4 years of working experience in PySpark, SQL, and AWS development
- Experience mentoring junior team members
- Hands-on experience with ETL processes and performance optimization techniques is a must (see the data-quality sketch after this list)
- Prior participation in architecture design and discussions
- Minimum of 4 years of experience working with batch processing / real-time systems using various technologies like Databricks, HDFS, Redshift, Hadoop, Elastic MapReduce on AWS, Apache Spark, Hive/Impala, and NoSQL databases, or similar services in Azure or GCP
- Minimum of 4 years of experience working on Data Warehouse or Data Lake projects in a role beyond just data consumption.
- Minimum of 4 years of extensive working knowledge of AWS, building scalable solutions. An equivalent level of experience in Azure or Google Cloud is also acceptable
- Minimum of 4 years of experience in programming languages (preferably Python)
- Experience in the pharma domain is a big plus.
- Familiar with tools like Git, AWS CodeCommit, Jenkins, and AWS CodePipeline
- Familiar with Unix/Linux and Shell Scripting
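
To make the hands-on ETL and data-quality expectations above concrete, here is a minimal PySpark validation sketch (referenced from the ETL bullet). The input path, the key column order_id, and the required columns are hypothetical; production pipelines would usually express such checks through a dedicated framework rather than bare assertions.

```python
# Minimal sketch of data-quality checks in PySpark: required columns,
# null/duplicate keys, and a row-count floor. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("s3://example-bucket/curated/orders/")  # hypothetical path

# 1. Required columns must be present.
required = {"order_id", "order_date", "amount"}
missing = required - set(df.columns)
assert not missing, f"missing columns: {missing}"

# 2. The key column must be non-null and unique.
null_keys = df.filter(F.col("order_id").isNull()).count()
dup_keys = df.groupBy("order_id").count().filter(F.col("count") > 1).count()
assert null_keys == 0, f"{null_keys} null order_id values"
assert dup_keys == 0, f"{dup_keys} duplicated order_id values"

# 3. A row-count floor guards against silently empty loads.
row_count = df.count()
assert row_count >= 1_000, f"row count {row_count} below expected floor"

print(f"all checks passed on {row_count} rows")
```

Checks like these typically run as a gate step between load and publish, failing the pipeline run before bad data reaches downstream consumers.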
Additional Skills:
- Exposure to Pharma and life sciences would be an added advantage.
- Certification in any cloud technology such as AWS, GCP, or Azure.