Scalable ETL pipeline and Machine Learning model to predict mortgage defaults using Freddie Mac’s Single-Family Loan-Level Dataset. Migrated from a Pandas-based legacy system to a distributed PySpark ...