Hands-on experience developing routines using Hadoop MapReduce, PySpark, Hive, Sqoop, and Linux shell scripting.
Expertise in Hadoop, Hive, Spark (PySpark), Spark Streaming with Kafka, and Sqoop (see the Kafka streaming sketch after this list).
Demonstrated knowledge of cloud computing fundamentals on either AWS (EC2, EMR, data lakes and analytics on AWS) or Azure (HDInsight, Data Factory).
Experience creating PySpark jobs for data transformation and aggregation, including query tuning and performance optimization (see the PySpark sketch after this list).
Demonstrated knowledge and use of Python, Scala, shell scripting, and SQL, as well as JSON for data interchange.
Demonstrated performance across all phases of the SDLC, particularly for ETL solutions.
Experience designing and implementing production data pipelines and creating repeatable ingestion patterns.
Experience with various databases and platforms, including but not limited to DB2, Oracle, and Teradata.
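
The following is a minimal sketch of the kind of PySpark transformation and aggregation job referenced above. The input path, column names, and output path are hypothetical placeholders chosen for illustration, not details from any actual project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Minimal sketch: aggregate a hypothetical orders dataset per customer per day.
    spark = SparkSession.builder.appName("orders_aggregation").getOrCreate()

    # The path and columns below are placeholders for illustration.
    orders = spark.read.parquet("/data/raw/orders")

    daily_totals = (
        orders
        .filter(F.col("status") == "COMPLETE")            # keep only completed orders
        .withColumn("order_date", F.to_date("order_ts"))  # derive a date column for grouping
        .groupBy("customer_id", "order_date")
        .agg(
            F.sum("amount").alias("total_amount"),        # total spend per customer per day
            F.count("*").alias("order_count"),
        )
    )

    # Partitioning the output by date is a common tuning step for downstream reads.
    (daily_totals.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("/data/curated/daily_totals"))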
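
Likewise, a minimal sketch of Spark streaming from Kafka, here using the Structured Streaming API. The broker address, topic name, and checkpoint path are assumptions made for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_events_stream").getOrCreate()

    # Broker, topic, and paths are placeholders, not real endpoints.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers key/value as binary; cast the value to string before use.
    parsed = events.select(F.col("value").cast("string").alias("raw_event"))

    # Write the stream to Parquet; the checkpoint enables exactly-once file output.
    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "/data/streaming/events")
        .option("checkpointLocation", "/checkpoints/events")
        .start()
    )
    query.awaitTermination()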