Tredence Analytics Solutions

Unified Data Model (UDM) Mars Inc.

The Unified Data Model (UDM) Project for Mars Inc. was designed to create a scalable and standardized data architecture, enabling seamless data integration and analytics across the organization. As part of the project, I contributed by designing and revising Source-to-Target mappings for key business entities, ensuring accurate data flow and consistency. I developed and maintained data pipelines using Azure Data Factory to automate raw data ingestion and built optimized SQL tables in Databricks utilizing Simpel, Mars Inc.'s proprietary framework powered by Python and PySpark. Additionally, I conducted rigorous unit testing to validate data integrity, resolved bugs to enhance performance, and participated in peer code reviews to ensure adherence to coding standards. This project played a critical role in optimizing Mars Inc.'s data infrastructure, supporting better decision-making and analytics.

Detailed Description of the Project

1. Source-to-Target Mapping Creation and Revision

  • Collaborated with cross-functional teams to analyze and understand data requirements for critical business entities.
  • Designed and revised comprehensive Source-to-Target mappings, ensuring alignment with organizational standards and data model objectives.
  • Enhanced data lineage and traceability to support data governance and compliance requirements.
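As an illustration of the idea, and not the project's actual code, a Source-to-Target mapping can be held as data and applied to each row. This is a minimal sketch in plain Python; the entity, the column names, and the helpers `FieldMapping`, `apply_mapping`, and `lineage` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldMapping:
    """One row of a Source-to-Target mapping sheet (hypothetical structure)."""
    source: str                       # column name in the source system
    target: str                       # column name in the target model
    transform: Callable[[Any], Any]   # conversion applied to the source value


# Hypothetical mapping for a customer entity; all names are illustrative.
CUSTOMER_MAPPING: List[FieldMapping] = [
    FieldMapping("cust_id", "customer_id", int),
    FieldMapping("cust_nm", "customer_name", lambda v: v.strip().title()),
    FieldMapping("ctry_cd", "country_code", str.upper),
]


def apply_mapping(row: Dict[str, Any], mapping: List[FieldMapping]) -> Dict[str, Any]:
    """Build a target row by renaming and transforming each mapped source column."""
    return {m.target: m.transform(row[m.source]) for m in mapping}


def lineage(mapping: List[FieldMapping]) -> Dict[str, str]:
    """Return target -> source column lineage, useful for traceability reports."""
    return {m.target: m.source for m in mapping}


source_row = {"cust_id": "42", "cust_nm": "  acme foods ", "ctry_cd": "us"}
target_row = apply_mapping(source_row, CUSTOMER_MAPPING)
```

Keeping the mapping as data means the same definition drives both the transformation and the lineage report, so the two cannot drift apart.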

2. Data Pipeline Development using Azure Data Factory (ADF)

  • Developed and maintained data ingestion pipelines using Azure Data Factory to automate the movement of raw data from source systems to the target platform.
  • Optimized data pipelines to reduce latency and improve data processing efficiency, enabling near-real-time analytics.
  • Ensured data integrity and accuracy through rigorous testing and monitoring of pipeline operations.
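One common way to monitor an ingestion pipeline is to reconcile what landed in the target against what was read from the source. The sketch below shows that idea in plain Python on in-memory rows; it is an assumption about how such a check could look, not the monitoring actually used on the project, and `row_fingerprint`, `reconcile`, and the sample rows are hypothetical.

```python
import hashlib
from typing import Any, Dict, List


def row_fingerprint(row: Dict[str, Any], key_columns: List[str]) -> str:
    """Hash the chosen columns of a row so rows can be compared across systems."""
    joined = "|".join(str(row[c]) for c in key_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows: List[Dict[str, Any]],
              target_rows: List[Dict[str, Any]],
              key_columns: List[str]) -> Dict[str, int]:
    """Compare row counts and row fingerprints between source and target."""
    src = {row_fingerprint(r, key_columns) for r in source_rows}
    tgt = {row_fingerprint(r, key_columns) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": len(src - tgt),      # rows lost during ingestion
        "unexpected_in_target": len(tgt - src),   # rows with no source match
    }


# Illustrative data: one source row never reached the target.
source = [{"order_id": 1, "sku": "A"}, {"order_id": 2, "sku": "B"}, {"order_id": 3, "sku": "C"}]
target = [{"order_id": 1, "sku": "A"}, {"order_id": 2, "sku": "B"}]
report = reconcile(source, target, ["order_id", "sku"])
```

A non-zero `missing_in_target` or `unexpected_in_target` value would flag the pipeline run for investigation.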

3. SQL Table Development in Databricks with Simpel

  • Leveraged Databricks as the development environment to create and optimize SQL tables that formed the backbone of the Unified Data Model.
  • Utilized Simpel, Mars Inc.'s proprietary framework powered by Python and PySpark, to streamline table creation and enforce consistency across the data model.
  • Applied advanced data transformation techniques to structure data for analytics and reporting.

5. Unit Testing for Data Validation

  • Conducted thorough unit testing for all SQL tables to validate data accuracy and model functionality.
  • Implemented rigorous quality assurance processes to ensure the reliability and scalability of the Unified Data Model.
  • Documented test cases and results to support ongoing maintenance and troubleshooting.
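The kind of table-level unit test described above can be sketched with Python's built-in `unittest` and an in-memory SQLite database standing in for a Databricks table. The table `dim_product`, its columns, and its sample rows are hypothetical, and the real tests ran against the project's own tables and tooling.

```python
import sqlite3
import unittest


def build_table(conn: sqlite3.Connection) -> None:
    """Create and load a small hypothetical dimension table for the tests."""
    conn.execute("CREATE TABLE dim_product (product_id INTEGER, product_name TEXT)")
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "Product A"), (2, "Product B"), (3, "Product C")],
    )


class DimProductTests(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh in-memory database per test keeps the tests independent.
        self.conn = sqlite3.connect(":memory:")
        build_table(self.conn)

    def tearDown(self) -> None:
        self.conn.close()

    def test_key_is_never_null(self) -> None:
        (nulls,) = self.conn.execute(
            "SELECT COUNT(*) FROM dim_product WHERE product_id IS NULL"
        ).fetchone()
        self.assertEqual(nulls, 0)

    def test_key_is_unique(self) -> None:
        duplicates = self.conn.execute(
            "SELECT product_id FROM dim_product "
            "GROUP BY product_id HAVING COUNT(*) > 1"
        ).fetchall()
        self.assertEqual(duplicates, [])


def run_checks() -> unittest.TestResult:
    """Run the table checks and return the result object for reporting."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DimProductTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_checks()` executes both tests; the returned result object can be logged alongside the documented test cases.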

6. Technology Stack Used

  • Tools: Azure Data Factory, Databricks
  • Programming Languages: Python, SQL
  • Frameworks: Simpel (Mars Inc.'s proprietary framework), PySpark
  • Testing: Unit testing frameworks for SQL and Python

Internship with CELEBAL TECHNOLOGIES

This was my first internship, where I was introduced to the field of Data Science. I acquired skills in Exploratory Data Analysis (EDA), hyperparameter tuning, and Natural Language Processing, and learned to deploy ML models using a Flask app.

Car Price Prediction

Conducted EDA and implemented several Machine Learning models: multiple Random Forest models with varying hyperparameters and an XGBoost Regressor. Also developed a Flask app to deploy the models.

Air Quality Index Prediction

Scraped the web to collect the data, applied EDA, and implemented several Machine Learning and Deep Learning models. Deployed the best model using a Flask app.