About me

My name is Luan Leone de Jesus.

I am a Data Analyst with a degree in Computer Engineering and over six years of experience managing, developing, and improving transactional communication journeys as a CCM Systems Analyst. Additionally, I develop data products using machine learning techniques to solve business issues.

My primary objective is to work as a Data Scientist, building data products with machine learning and statistical analysis that help companies make better-informed strategic decisions.

At SulAmerica, my responsibilities included managing and customizing each stage of communication with the company's customers. I developed and analyzed routines for physical documents (such as letters sent by post) and digital communications, including email, SMS, WhatsApp, and API queries. I also helped client areas make strategic decisions through data extraction and analysis.

At Sinqia and Senior Solution, I worked as a Systems Analyst supporting CRM systems and institutional portals for partner companies. My main activities were keeping the systems running correctly and generating health metrics for each one by tracking open incidents in tools such as Jira.

As a professional, I am committed to continuous improvement in the areas I consider essential to my career. One of them is deepening my data science knowledge and skills through training with Community DS (Brazilian School of High-Level Data Scientist Training) and IA Expert Academy.

Skills

Programming Languages and Databases

  • Python focused on data analysis.
  • Web scraping with Python.
  • SQL for data extraction.
  • SQL Server, PL/SQL, MongoDB, and SQLite databases.

Statistics and Machine Learning

  • Descriptive statistics (location, dispersion, skewness, kurtosis, density).
  • Regression, classification, clustering, and learning-to-rank algorithms.
  • Data balancing techniques, feature selection, and dimensionality reduction.
  • Performance metrics of algorithms (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, ROC Curve, Lift Curve, AUC, Silhouette Score, DB-Index).
  • Machine Learning packages: scikit-learn and SciPy.

Data Visualization

  • Matplotlib, Seaborn and Plotly.
  • Power BI and SAS Analytics.

Software Engineering

  • Git, GitHub, and GitLab.
  • Python APIs.
  • Google Cloud Platform (GCP).
  • Jupyter Notebook, JupyterLab, and Google Colab.

Professional Experience

6+ Years as Systems Analyst (Insurance Industry)

I create transactional communication journeys across multiple channels (letters sent through the Post Office, digital dispatches such as email, SMS, and WhatsApp, on-demand documents generated via API, among others), generating value for the company and improving the customer experience. I also manage and monitor communications to resolve problems in the journey, and I analyze communication data to generate insights for stakeholders, supporting the company's strategic decision-making.

3+ Completed Data Science Projects

Developed data solutions for business problems modeled on real challenges faced by companies, using public data from Data Science competitions. I worked on each problem end to end, from framing the business challenge to deploying the trained model to production with Cloud Computing tools.

1+ Year as Systems Support Intern

During the internship, in addition to technical skills such as database manipulation and web page development, I improved soft skills such as communication, which made it easier to gather information from clients. A better understanding of their problems let me plan and organize the strategies needed to deliver or resolve what each client requested.

Data Science Projects

Star Jeans - A Web Scraping and Exploratory Data Analysis Project

This project involves a fictional Brazilian company that intends to enter the American fashion market by establishing an e-commerce store for selling men's jeans. The objective is to keep operating costs low and scale up as they acquire more customers.

In addition to defining the target audience and product, the company wants insights into the American market for this segment so that it can set competitive prices. It considers its main competitor to be H&M, a large-scale fashion retailer operating in the American market. The Brazilian company therefore hires a data science consulting firm to answer a set of business questions.

Tools:

  • Python
  • SQLite
  • Requests API
  • Git and GitHub
  • BeautifulSoup 4
  • Jupyter Notebook
  • Visual Studio Code
  • Pandas and NumPy
  • Matplotlib and Seaborn

Rossmann Drug Stores - Sales Prediction

This is a forecasting data science project inspired by the Rossmann Store Sales competition on Kaggle. The scenario involves predicting each store's sales for the next six weeks to support decisions such as whether to build a new store in the coming weeks.

Tools:

  • Git
  • Python
  • scikit-learn
  • Telegram Bot
  • Heroku Cloud
  • Jupyter Notebook
  • Pandas and NumPy
  • Descriptive statistics
  • Matplotlib and Seaborn
  • Feature selection with Boruta
  • Algorithm performance metrics
  • Linear Regression, XGBoost, Random Forest

HealthGuard Insurance: Cross-Sell Prediction (In progress)

This project uses data from a health insurance company that surveyed its customers to assess their interest in a potential car insurance policy. The objective is to analyze the data in order to classify and rank customers by their interest, so that those most likely to purchase the policy appear at the top of the ranking. This optimizes the sales team's contact process and supports more accurate decisions. According to the business model described in the problem definition, the resulting ranking is almost three times as effective as selecting customers at random. The model's results can be accessed through a Google Sheets spreadsheet.
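As an illustration of how a ranking can be compared against random selection, the sketch below computes cumulative gain and lift for the top fraction of a scored customer list. It is a minimal example, not the project's code: the function names, labels, and scores are hypothetical toy values, and the lift definition (share of interested customers captured divided by share of customers contacted) is one common convention.

```python
def top_k(n_customers, fraction):
    """Number of customers contacted when calling the top `fraction` of the list."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return max(1, round(fraction * n_customers))


def cumulative_gain(y_true, scores, fraction):
    """Share of all interested customers found in the top `fraction` of the ranking."""
    total_interested = sum(y_true)
    if total_interested == 0:
        raise ValueError("y_true contains no interested customers")
    # Rank customers from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = top_k(len(ranked), fraction)
    captured = sum(label for _, label in ranked[:k])
    return captured / total_interested


def lift(y_true, scores, fraction):
    """Gain of the ranking divided by the gain expected from random selection."""
    k = top_k(len(y_true), fraction)
    random_gain = k / len(y_true)  # random contact captures interest proportionally
    return cumulative_gain(y_true, scores, fraction) / random_gain


# Hypothetical toy data: 1 = interested in car insurance, 0 = not interested.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.90, 0.20, 0.40, 0.80, 0.10, 0.30, 0.35, 0.50, 0.60, 0.05]

for fraction in (0.2, 0.4):
    gain = cumulative_gain(toy_labels, toy_scores, fraction)
    print(f"top {fraction:.0%}: gain={gain:.2f}, lift={lift(toy_labels, toy_scores, fraction):.2f}")
```

A lift above 1 at a given fraction means that contacting the top of the ranking reaches more interested customers than contacting the same number of customers chosen at random.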

Tools:

  • Git
  • Python
  • scikit-learn
  • Google Sheets
  • Render Cloud
  • Jupyter Notebook
  • Pandas and NumPy
  • Descriptive statistics
  • Matplotlib and Seaborn
  • Feature selection with Random Forest feature importance
  • Hyperparameter tuning with Hyperopt
  • Algorithm performance metrics
  • Linear Regression, XGBoost, Random Forest, LGBM, KNN
  • Cumulative Gain Curve and Lift Curve Metrics

Contact

Feel free to contact me.