Job Summary
To build reliable data integration solutions by cleaning, transforming, and analyzing large volumes of big data from various systems using Spark and other ETL tools, providing ready-to-use datasets to data scientists and data analysts while ensuring data quality and integrity. To collaborate with stakeholders to design scalable, efficient data solutions that enable informed decision-making and comply with data governance requirements.
Key Responsibilities
Data Ingestion and Extraction
- Develop and implement efficient data ingestion pipelines to acquire and extract large volumes of structured and unstructured data. Ensure data integrity and quality during the ingestion process.
- Integrate various data sources and formats into a unified data ecosystem.
Data Processing and Transformation
- Design and execute data processing workflows to clean, transform, and enrich raw data. Develop scalable data processing algorithms and techniques to handle big data volumes efficiently.
- Optimize data processing pipelines for performance and reliability.
Data Storage and Management
- Create and maintain data storage architectures that cater to the specific needs of big data applications. Implement robust data management strategies, including data partitioning, indexing, and compression techniques.
- Ensure data security, privacy, and compliance with relevant regulations.
Data Analysis and Modeling
- Collaborate with data scientists and analysts to understand their requirements and translate them into scalable data models. Apply data visualization techniques to communicate insights effectively.
Performance Optimization
- Identify and implement strategies to enhance the performance and efficiency of big data applications and systems. Conduct performance tuning, load testing, and capacity planning to meet scalability and throughput requirements.
- Monitor system performance and troubleshoot issues related to data processing, storage, and retrieval.
Data Governance and Compliance
- Establish and enforce data governance policies, standards, and best practices. Ensure compliance with data regulations, such as GDPR or HIPAA, by implementing appropriate data protection measures.
- Conduct data audits and implement data quality controls to maintain data accuracy and consistency.
Collaboration and Communication
- Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand their data requirements and provide technical support. Communicate complex technical concepts and findings to non-technical stakeholders in a clear and concise manner.
- Participate in knowledge sharing activities and contribute to the continuous improvement of data engineering practices.
Documentation and Knowledge Transfer
- Document data engineering processes, workflows, and system architectures for future reference and knowledge transfer.
- Prepare technical documentation, including data dictionaries, data lineage, and system specifications.
- Create and maintain documentation related to data governance, compliance, and security protocols.
EDUCATION
General Education
- BSc in Computer Science, Engineering, or a related field.
Evidence of strong industry/sector participation and relevant professional certifications, such as:
- Azure Data Engineer Associate
- Databricks Certified Data Engineer Associate
- Databricks Certified Data Engineer Professional
- Amazon Web Services (AWS) Certified Data Analytics – Specialty
- Cloudera Data Platform Generalist Certification
- Data Science Council of America (DASCA) Associate Big Data Engineer
- Data Science Council of America (DASCA) Senior Big Data Engineer
- Google Professional Data Engineer
- IBM Certified Solution Architect – Cloud Pak for Data v4.x
- IBM Certified Solution Architect – Data Warehouse V1
EXPERIENCE
General Experience
- At least 3 years of experience developing, deploying, and managing robust ETL/ELT data solutions, preferably in a reputable financial institution or FinTech company.