We are seeking a versatile Data Science Engineer to join our team. You will be responsible for designing, building, and operationalizing advanced data-driven solutions that support business decision-making and product innovation. This role spans the entire data lifecycle: from architecting robust ingestion pipelines in Azure Data Factory to developing and deploying machine learning models within Microsoft Fabric.
Collaborating closely with the Senior Manager of Enterprise Applications & Data, you will lead discussions with cross-functional teams to design appropriate data architectures and build robust pipelines using Azure cloud services. This position requires a unique blend of technical depth in data engineering and the analytical skills necessary to extract meaningful insights from complex datasets. You will ensure that the underlying data infrastructure is performant, governed, and scalable.
This role combines strong statistical and machine learning expertise with robust software engineering and data engineering practices. The ideal candidate can develop models end-to-end—from data ingestion and feature engineering through deployment, monitoring, and optimization in production environments.
MAJOR RESPONSIBILITIES:
- Designs, builds, and maintains enterprise-scale ETL/ELT pipelines using Azure Data Factory and Fabric Data Factory.
- Leverages Microsoft Fabric (OneLake, Lakehouse, and Warehouse) to unify disparate data sources and to build and optimize data workflows for downstream data science workloads.
- Develops, trains, and tunes machine learning models using Synapse Data Science notebooks and MLflow, while monitoring model performance, data drift, and system reliability.
- Transforms raw data into curated, model-ready datasets using PySpark and SQL, engineering features that optimize model performance.
- Implements CI/CD patterns for machine learning, ensuring models are versioned, monitored, and easily redeployed.
- Implements data quality checks, monitoring, and validation processes to ensure data integrity.
- Performs exploratory data analysis to uncover trends, patterns, and actionable insights.
- Translates business questions into quantitative frameworks and measurable outcomes.
- Communicates findings and model performance to both technical and non-technical stakeholders.
- Optimizes query performance and data processing workflows for efficiency and cost-effectiveness.
- Creates and maintains technical documentation for data pipelines, models, and processes.
ESSENTIAL FUNCTIONS:
- Fabric Management: Creates, updates, and secures Fabric items, specifically Lakehouses, Warehouses, Notebooks, and Dataflows Gen2 within the OneLake platform.
- Advanced Orchestration: Manages data orchestration using advanced knowledge of Azure Data Factory pipelines, activities, triggers, and Self-hosted Integration Runtimes.
- Data Manipulation: Utilizes a strong command of Python (Pandas, Scikit-learn, PySpark) and SQL to create dataflows and notebooks for ingestion and analytics.
- Architecture Best Practices: Implements Star Schema and Medallion Architecture (Bronze/Silver/Gold) principles to ensure data scalability.
- Quality Assurance: Participates in code reviews and contributes to the evolution of best practices for the team.
QUALIFICATIONS:
- Bachelor’s degree or equivalent in Computer Science or a related field preferred.
- Minimum of three (3) years of work experience as a data science engineer with a proven track record.
- Proven hands-on experience with the Azure Data Stack (ADLS Gen2, Azure SQL, Key Vault).
- Experience with Power BI for visualizing model outputs and creating user dashboards.
- Proficiency in SQL, Python, and PySpark.
- Experience with data pipeline development and AI/ML applications.
- Knowledge of data warehousing, data lake architecture, and Microsoft Fabric platform.
- Familiarity with machine learning, AI, and Generative AI technologies.
- Strong analytical, problem-solving, and communication skills.
- Ability to work collaboratively in a fast-paced, cross-functional environment.
- Familiarity with Azure DevOps or GitHub Actions for automated deployments.
- Microsoft Certifications (e.g., DP-600: Fabric Analytics Engineer or DP-700: Fabric Data Engineer).
#toponehire