Instructor-led, 4 days
Course Description
This course focuses on automating end-to-end data science workflows using Python. You build pipelines that handle data ingestion, cleaning, modeling, evaluation, and deployment with minimal manual work. The emphasis is on orchestration, reproducibility, monitoring, and scaling, using tools such as Airflow, Prefect, scikit-learn pipelines, and MLflow.
Course Objectives
- Automate full data science workflows
- Build reusable and modular pipelines
- Schedule and orchestrate data tasks
- Track experiments and models
- Deploy and monitor automated systems
- Reduce manual intervention in data processes
Key Takeaways
- You design automated pipelines from raw data to deployment
- You reduce repetitive tasks using code and orchestration tools
- You manage model lifecycle with tracking and versioning
- You build systems that run on schedules and triggers
- You monitor performance and handle failures
Module 1: Python Automation Foundations
- Scripting and task automation
- File and process handling
Exercises
- Automate file ingestion
- Schedule script execution
- Log task outputs
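As an illustration of the Module 1 exercises, here is a minimal sketch of a script that ingests files and logs what it did. The function name `ingest_csv_files` and the idea of an "inbox" folder are assumptions made for this example, not part of the course material.

```python
import csv
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("ingest")


def ingest_csv_files(inbox: Path) -> dict[str, list[dict[str, str]]]:
    """Read every *.csv file in `inbox` and return its rows keyed by file name."""
    loaded = {}
    for path in sorted(inbox.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
        loaded[path.name] = rows
        # One log line per file, so each run leaves a trace of what it picked up.
        logger.info("Loaded %d rows from %s", len(rows), path.name)
    return loaded
```

A script like this can then be run on a schedule by an external scheduler such as cron or Windows Task Scheduler.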
Module 2: Data Pipeline Design
- Pipeline architecture
- Modular workflow design
Exercises
- Design pipeline structure
- Break workflow into reusable steps
- Build modular pipeline
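One possible way to break a workflow into reusable steps is to write each step as a small function and compose them. The sketch below assumes rows are plain dictionaries; the step names and `build_pipeline` are hypothetical.

```python
from functools import reduce


def drop_empty(rows):
    """Step 1: discard rows without a name."""
    return [row for row in rows if row.get("name")]


def normalize_names(rows):
    """Step 2: trim whitespace and title-case the name field."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def build_pipeline(*steps):
    """Return a function that applies the given steps in order."""
    def run(rows):
        return reduce(lambda data, step: step(data), steps, rows)
    return run


pipeline = build_pipeline(drop_empty, normalize_names)
```

Because each step takes rows and returns rows, steps can be reordered, tested in isolation, or reused in another pipeline.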
Module 3: Automated Data Ingestion
- APIs, databases, streaming
Exercises
- Pull data from API
- Ingest database records
- Automate data refresh
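A rough sketch of an incremental database refresh, using the standard-library `sqlite3` module as a stand-in for whatever database the course uses. The `orders` table, its columns, and the "last seen id" high-water mark are assumptions made for illustration.

```python
import sqlite3


def fetch_new_records(conn, last_seen_id):
    """Return rows whose id is greater than the last id already ingested."""
    query = "SELECT id, amount FROM orders WHERE id > ? ORDER BY id"
    return conn.execute(query, (last_seen_id,)).fetchall()


def refresh(conn, last_seen_id):
    """Fetch new rows and return them with the updated high-water mark."""
    rows = fetch_new_records(conn, last_seen_id)
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark


# Hypothetical in-memory table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (3, 4.25)],
)
```

Storing the returned mark between runs lets each scheduled refresh pull only the rows added since the previous one.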
Module 4: Automated Data Cleaning
- Cleaning pipelines
- Rule-based transformations
Exercises
- Build cleaning pipeline
- Handle missing data automatically
- Standardize formats
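The cleaning exercises could look something like the sketch below, which fills missing values with the median and converts dates to ISO format. The column names `signup_date` and `age` and the two accepted date formats are assumptions chosen for the example.

```python
import statistics
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Fill blank ages with the median of known ages and standardize dates."""
    known_ages = [float(row["age"]) for row in rows if row["age"].strip()]
    fallback_age = statistics.median(known_ages) if known_ages else None
    cleaned = []
    for row in rows:
        age = float(row["age"]) if row["age"].strip() else fallback_age
        cleaned.append({"signup_date": standardize_date(row["signup_date"]), "age": age})
    return cleaned
```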
Module 5: Data Validation and Quality Checks
- Validation frameworks
- Data integrity rules
Exercises
- Create validation checks
- Detect anomalies
- Trigger alerts on failure
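As a sketch of rule-based validation with an alert hook, the example below runs named checks over each row and calls a callback when anything fails. The rules, the field names, and the `on_failure` callback are hypothetical; a dedicated validation framework would replace this in practice.

```python
RULES = [
    ("id is present", lambda row: row.get("id") is not None),
    (
        "amount is non-negative",
        lambda row: isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0,
    ),
]


def validate(rows, rules, on_failure):
    """Apply every rule to every row; report failures through `on_failure`."""
    failures = []
    for index, row in enumerate(rows):
        for name, check in rules:
            if not check(row):
                failures.append((index, name))
    if failures:
        # The callback is where an email, chat message, or pager alert would go.
        on_failure(failures)
    return failures
```

For example, `validate(rows, RULES, print)` prints the list of `(row_index, rule_name)` pairs that failed.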
Module 6: Automated EDA and Reporting
- Auto-generated reports
- Profiling tools
Exercises
- Generate EDA report
- Summarize dataset automatically
- Export insights
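To give a feel for automatic dataset summaries, here is a small standard-library sketch that profiles each column. It assumes rows are dictionaries sharing the same keys, with `None` marking missing values; profiling tools covered in the module produce far richer reports.

```python
import statistics


def summarize(rows):
    """Return per-column counts, missing values, and basic statistics."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [row[column] for row in rows if row[column] is not None]
        numeric = [
            v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        entry = {"count": len(values), "missing": len(rows) - len(values)}
        if numeric and len(numeric) == len(values):
            # Fully numeric column: report range and mean.
            entry.update(min=min(numeric), max=max(numeric), mean=statistics.fmean(numeric))
        else:
            # Anything else is treated as categorical.
            entry["unique"] = len(set(values))
        summary[column] = entry
    return summary
```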
Module 7: Feature Engineering Pipelines
- Automated transformations
- Feature stores basics
Exercises
- Build feature pipeline
- Automate feature scaling
- Store reusable features
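The fit-then-transform pattern behind automated feature scaling can be sketched in a few lines. `SimpleStandardScaler` is a hypothetical, simplified stand-in written for this example; it is not scikit-learn's `StandardScaler`, although it follows the same idea of learning parameters once and reusing them.

```python
import statistics


class SimpleStandardScaler:
    """Standardize values to zero mean and unit variance."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 so constant columns do not cause division by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(value - self.mean_) / self.std_ for value in values]
```

Fitting on training data and reusing the same fitted object on new data keeps the scaling consistent between training and prediction.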
Module 8: Model Training Automation
- Training pipelines
- Batch training
Exercises
- Train model pipeline
- Automate retraining
- Save trained models
Module 9: Model Evaluation Automation
- Metrics tracking
- Validation workflows
Exercises
- Evaluate model automatically
- Compare multiple models
- Generate evaluation reports
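Comparing several models automatically comes down to scoring each one with the same metric on the same data. The sketch below assumes each model is just a callable that maps one feature value to a prediction; `compare_models` and the use of mean absolute error are choices made for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(models, features, targets):
    """Score every model and return the best name plus all scores."""
    scores = {
        name: mean_absolute_error(targets, [predict(x) for x in features])
        for name, predict in models.items()
    }
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores
```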
Module 10: Hyperparameter Tuning Automation
- Grid search, random search
- Optimization frameworks
Exercises
- Automate tuning process
- Compare tuning strategies
- Save best model
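Grid search and random search differ only in how candidate settings are generated, which the following sketch tries to show. The `objective` function is a made-up stand-in for a validation loss, and the parameter names are hypothetical.

```python
import itertools
import random


def objective(params):
    """Toy validation loss, lowest at learning_rate=0.1 and depth=4."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
    return min(candidates, key=objective)


def random_search(grid, n_trials, seed=0):
    """Evaluate `n_trials` randomly sampled combinations and return the best."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(options) for name, options in grid.items()}
        for _ in range(n_trials)
    ]
    return min(candidates, key=objective)


GRID = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
```

Grid search cost grows with the product of the option counts, while random search cost is fixed by `n_trials`, at the price of possibly missing the best combination.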
Module 11: Workflow Orchestration
- Airflow, Prefect
- Task scheduling
Exercises
- Build DAG pipeline
- Schedule workflow
- Handle task dependencies
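The core idea of a DAG pipeline, running each task only after the tasks it depends on, can be sketched with the standard-library `graphlib` module (Python 3.9+). The task names and `run_dag` are hypothetical; Airflow and Prefect add scheduling, retries, and monitoring on top of this idea.

```python
from graphlib import TopologicalSorter

# Each key depends on the tasks in its set.
DEPENDENCIES = {
    "clean": {"ingest"},
    "train": {"clean"},
    "report": {"clean"},
}

TASKS = {
    "ingest": lambda: "raw rows",
    "clean": lambda: "clean rows",
    "train": lambda: "model",
    "report": lambda: "summary",
}


def run_dag(tasks, dependencies):
    """Run tasks in an order that respects their dependencies."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name]()
    return results
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is a useful early check on a pipeline definition.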
Module 12: Experiment Tracking and Versioning
- MLflow, logging
- Model registry
Exercises
- Track experiments
- Version models
- Log metrics and artifacts
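To show what experiment tracking records, here is a deliberately small stand-in that appends each run to a JSON Lines file. It is not MLflow and does not use its API; `log_run`, `best_run`, and the file layout are assumptions made for this sketch.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict) -> str:
    """Append one run (parameters and metrics) to the log and return its id."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record["run_id"]


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for `metric`."""
    with log_path.open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

A tracking tool such as MLflow covers the same ground and adds artifact storage, a UI, and a model registry.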
Module 13: Automated Deployment
- CI/CD basics
- API deployment
Exercises
- Deploy model as API
- Automate deployment pipeline
- Update models without downtime
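One common building block for updating a model without downtime is to write the new model to a temporary file and then swap it into place in a single step. The sketch below assumes the serving process reloads the model from a file path; `publish_model` and `load_model` are hypothetical names.

```python
import os
import pickle
import tempfile
from pathlib import Path


def publish_model(model, target: Path) -> None:
    """Write the model next to `target`, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(model, handle)
    # os.replace overwrites the target in one step, so readers see either
    # the old file or the new one, never a partially written file.
    os.replace(tmp_name, target)


def load_model(target: Path):
    """Load the currently published model. Only unpickle trusted files."""
    with target.open("rb") as handle:
        return pickle.load(handle)
```

The temporary file is created in the same directory as the target because the replacement is only atomic within a single filesystem.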
Module 14: Monitoring and Alerting
- Performance monitoring
- Drift detection
Exercises
- Monitor model performance
- Detect data drift
- Trigger alerts
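A very simple form of data drift detection compares the mean of a new batch with the mean of a reference sample. The sketch below uses a z-score on the batch mean; the function name and the threshold of 3.0 are assumptions, and production systems typically use richer tests over full distributions.

```python
import math
import statistics


def detect_mean_drift(reference, current, threshold=3.0):
    """Return True if the current batch mean is far from the reference mean."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    # Standard error of the mean for a batch of this size.
    standard_error = ref_std / math.sqrt(len(current))
    current_mean = statistics.fmean(current)
    if standard_error == 0:
        return current_mean != ref_mean
    z_score = abs(current_mean - ref_mean) / standard_error
    return z_score > threshold
```

The returned flag can feed an alerting step, for example sending a notification or pausing automated retraining when drift is detected.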
Module 15: Scaling and Distributed Automation
- Parallel processing
- Cloud integration
Exercises
- Run parallel tasks
- Scale pipeline on cloud
- Optimize execution time
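Running independent tasks in parallel can be sketched with the standard-library `concurrent.futures` module. `process_partition` is a hypothetical placeholder that simulates I/O-bound work with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_partition(partition_id):
    """Pretend to fetch and process one data partition."""
    time.sleep(0.05)  # simulated I/O wait
    return partition_id * partition_id


def run_parallel(partition_ids, max_workers=4):
    """Process partitions concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_partition, partition_ids))
```

Threads suit I/O-bound tasks such as API calls or file reads; for CPU-bound work, `ProcessPoolExecutor` from the same module is usually the better fit.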
Module 16: End-to-End Automation Project
- Full pipeline integration
- Production readiness
Exercises
- Build end-to-end pipeline
- Automate full workflow
- Deploy system
- Monitor pipeline
- Handle failures
- Final project presentation