YASHWANTH MANGLARAPU
Data Scientist and BI Analyst | Machine Learning Engineer | Analytics Engineer
Houston, TX, USA | +1 (346) 297-1555 | [email protected]  | LinkedIn

PROFESSIONAL SUMMARY
Data Scientist and BI Analyst with 3 years of combined industry, internship, and academic experience in marketing analytics, customer intelligence, and predictive modeling. Proven track record at Merkle (Dentsu), Sasken Technologies, and Tech Mahindra delivering interactive Tableau/Power BI dashboards, customer segmentation models, and automated ETL pipelines that raised ad-spend ROI by 18% for a major retail client and cut client reporting turnaround by 40%. Skilled in translating business KPIs (CTR, ROAS, conversion rates, SLA adherence) into analytical solutions using Python, SQL, and cloud platforms. M.S. in Data Science (GPA 3.8) and co-author of two published papers on KNN optimization and sampling methods for imbalanced data. Strong communicator with teaching assistant experience supporting 120+ students.


TECHNICAL SKILLS

  • Languages: Python (Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch), SQL, R, Java
  • BI & Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly, Streamlit
  • Data Engineering: ETL/ELT Pipelines, Data Warehousing, Hadoop, Spark (PySpark), Databricks, Airflow (basic)
  • Databases: MySQL, PostgreSQL, MongoDB, Snowflake, BigQuery
  • ML & Analytics: Regression, Classification, Clustering & Segmentation (K-Means, RFM), SMOTE, PCA, Time Series (ARIMA, Prophet), A/B Testing, Hypothesis Testing, Feature Store concepts
  • Cloud & Tools: AWS (S3, Redshift, Lambda basics), Google Colab, Git, Jupyter, Docker (basic), Excel (advanced), Power Query
  • Soft & Ops: Stakeholder management, technical documentation, Agile/Scrum, JIRA, Confluence

 

EXPERIENCE

Data Scientist
Merkle (Dentsu) – Houston, TX
June 2025 – Present

  • Designed and deployed interactive Tableau dashboards for 5+ enterprise marketing accounts, reducing client reporting turnaround by 40% and enabling real-time KPI tracking (CTR, ROAS, conversion rates, CPA). Implemented drill-down layers and parameterized controls allowing non-technical stakeholders to filter by campaign, region, and audience segment without writing SQL.
  • Engineered customer segmentation models in Python (RFM analysis with Scikit-learn K-Means clustering) on 500K+ transaction records, identifying high-value audience clusters that increased ad-spend ROI by 18% for a major retail client. Translated cluster outputs into segmented Tableau visualizations for each client’s media buying team.
  • Automated ETL pipelines in Python and SQL (BigQuery, MySQL) with basic Airflow orchestration, reducing manual data prep by 15+ hours/month and improving data freshness from weekly to daily. Built data validation checks and exception handling to ensure dashboard accuracy before client-facing refreshes.
  • Wrote complex SQL (window functions, CTEs, optimized joins) to extract campaign performance data across 10+ source tables, supporting BI reporting for leadership and client-facing teams. Reduced query execution time by 35% through index optimization and query restructuring.
  • Performed A/B testing and lift analysis on marketing campaign variants (t-tests, chi-square tests, power analysis), presenting recommendations backed by statistically significant results to non-technical stakeholders via slide decks and executive dashboards. Built a reusable Python function for automated power analysis that reduced test design time by 50%.
  • Collaborated with media strategists to translate business questions into SQL queries and analytical models, directly influencing budget allocation decisions worth $2M+ annually. Created a Tableau storyboard summarizing test-versus-control results for monthly client business reviews.

 

 

 

Graduate Teaching Assistant – Data Science
University of Houston – Clear Lake
Aug 2024 – May 2025

  • Facilitated lab sessions and office hours for 120+ graduate/undergraduate students in Python, R, and SQL, covering data wrangling, EDA, regression, classification, and model validation.
  • Designed auto-graded Jupyter notebooks and supplementary materials on feature engineering, cross-validation, and the bias-variance tradeoff, which were adopted as official course resources.
  • Provided structured, rubric-based feedback on 200+ assignments and projects, maintaining a consistent 48-hour turnaround and improving the class average by 12% across semesters.
  • Resolved technical and conceptual student queries via email and discussion boards with a sub-24-hour response time, earning recognition from the course instructor for responsiveness.

 

Data Science Intern
Tech Mahindra – Hyderabad, India
Jan 2023 – June 2023

  • Developed a predictive model that classified IT service tickets into resolution-time ranges using Python (Scikit-learn Random Forest, XGBoost) and SQL, achieving 76% accuracy in a pilot deployment across 3 internal teams. Engineered features from ticket description text (TF-IDF, character length, keyword flags) and historical resolution patterns.
  • Created Tableau dashboards visualizing service desk KPIs (response time, escalation rate, SLA adherence), enabling team leads to reduce average resolution time by 15% within 2 months. Added automated email alerts triggered when any SLA category fell below 90% adherence for two consecutive days.
  • Performed data cleaning and feature engineering on 50,000+ unstructured ticket logs, improving model input quality by 25% and reducing data drift issues. Used regular expressions to extract priority codes, error messages, and team routing patterns from free-text fields.
  • Documented ETL logic and data dictionaries adopted by 3 downstream teams for operational reporting. Created a shared GitHub repository with version-controlled SQL scripts and Python transformation functions.
  • Presented weekly findings to operations managers, translating model outputs into actionable recommendations that reduced manual ticket triage effort by 8 hours/week. Built a simple Streamlit app to let managers input new ticket attributes and receive predicted resolution time ranges.

 

Data Science Intern
Sasken Technologies – Hyderabad, India
June 2022 – Dec 2022

  • Built an end-to-end customer churn prediction pipeline in Python (Scikit-learn Random Forest and Logistic Regression, XGBoost) on 200K+ telecom subscriber records, achieving 84% recall and 79% precision; recommended retention campaigns that reduced churn by 12% in a pilot segment.
  • Developed automated ETL scripts in PySpark and SQL (Hive on Hadoop, MySQL) to ingest, clean, and aggregate customer usage data, cutting processing time by about 30% (from 8 to 5.5 hours per batch). Implemented incremental load logic to process only new records after the initial full load.
  • Created interactive Power BI dashboards tracking cohort-level churn risk, ARPU, and engagement metrics; enabled product managers to identify high-risk segments 2 weeks earlier than previous manual methods. Added row-level security (RLS) so regional managers saw only their subscriber base.
  • Designed and implemented a daily active user (DAU) forecast, comparing five forecasting models (including ARIMA and Prophet) and selecting Prophet for its interpretability and handling of holiday effects. The model reached 91% forecast accuracy (100% minus MAPE) and was used by operations to right-size cloud resources, reducing infrastructure cost by ~8%.
  • Maintained technical documentation (data dictionaries, ETL logic, model cards) for knowledge transfer to 3 cross-functional teams, ensuring reproducibility and audit compliance. Created a Confluence space with runbooks for each pipeline and dashboard refresh.
  • Presented weekly model performance and business insights to senior managers, translating feature importance (SHAP values) into actionable retention strategies. Built a PowerPoint template that standardized how churn drivers were communicated across product and marketing teams.

 

 

BI & VISUALIZATION PROJECTS

  • Interactive Sales Performance Dashboard (Tableau, SQL, Excel) – Designed a regional sales dashboard for a retail client with 50+ stores, featuring YoY growth, product category filters, and dynamic forecast bands. Reduced monthly reporting effort from 3 days to 4 hours and was adopted by 12 sales managers for weekly reviews.
  • Marketing ROI & Funnel Dashboard (Power BI, DAX, BigQuery) – Built a multi-channel marketing ROI dashboard tracking impressions, clicks, conversions, and CPA across Google, Facebook, and email. Implemented custom DAX measures for incremental lift and ROAS. Enabled media buyers to reallocate a $150K quarterly budget based on real-time performance.
  • HR Attrition & Workforce Analytics (Tableau, Python, PostgreSQL) – Created an executive HR dashboard showing attrition trends by department, tenure, and performance rating. Used Python to calculate survival probabilities and integrated the results into Tableau via extensions. Helped HR reduce voluntary attrition by 8% through targeted retention programs identified via dashboard filters.
  • Supply Chain Inventory Optimization (Power BI, SQL, Excel) – Developed a supply chain dashboard tracking inventory turnover, stockouts, and lead time variance across 3 warehouses. Added what-if parameters for reorder points. Reduced stockout events by 22% in pilot warehouses within 3 months of deployment.

EDUCATION

M.S. in Data Science – University of Houston – Clear Lake | Aug 2023 – May 2025

B.Tech in CS & IT – CVR College of Engineering, India | July 2019 – June 2023

Relevant Coursework: Data Mining · Big Data Analytics · Generative AI · Regression Analysis · Applied Statistics · Business Analytics · Data Visualization · MLOps Fundamentals

PUBLICATIONS

  • Manglarapu Y., Jallepalli K. “KNN Optimization Using Random Projection.” GIS Science Journal, Vol. 9, Issue 11.
  • Manglarapu Y., Jallepalli K. “Survey on Sampling Methods for Imbalanced Data.” GIS Science Journal, Vol. 9, Issue 11.

CERTIFICATIONS

  • Oracle Academy – Database Programming with SQL (May 2022)
  • freeCodeCamp – Data Visualization Certification (Dec 2022)
  • NPTEL / IIT Madras – Python for Data Science (Mar 2022)

 
