Hello, welcome to my Data Science project portfolio.

On this page, you will find some of my projects developed to demonstrate my knowledge and skills in solving business problems using Data Science concepts and tools.

You will also find a summary of my experiences as a developer and systems analyst for over six years, as well as technical and interpersonal skills acquired while participating in and developing Data Science-related projects.

Feel free to contact me. My LinkedIn, email, and GitHub information is available at the beginning and end of the page.

About me

My name is Luan Leone de Jesus.

I am a Data Analyst with a degree in Computer Engineering and over six years of experience managing, developing, and improving transactional communication journeys as a CCM Systems Analyst. Additionally, I develop data products using machine learning techniques to solve business issues.

My primary objective is to work as a Data Scientist, developing data products with machine learning techniques and statistical analysis to help companies make strategic decisions in the best possible way.

At SulAmerica, my role included managing and customizing each stage of the company's communication with its customers. I developed and analyzed routines for physical documents (such as letters sent through the post office) and digital communications, including email, SMS, WhatsApp, and queries via APIs. Additionally, I helped client areas make strategic decisions through data extraction and analysis.

At Sinqia and Senior Solution, I worked as a Systems Analyst providing support services for CRM systems and institutional portals used by partner companies. My primary activities were ensuring that the systems were working correctly and generating health metrics for each one by tracking open incidents in monitoring systems such as Jira.

As a professional, I am committed to developing the areas that I believe are essential for the future of my career. One of them is my knowledge and skills in data science, which I am strengthening through training with Community DS (Brazilian School of High-Level Data Scientist Training) and IA Expert Academy.

Skills

Programming Languages and Databases

Python focused on data analysis.
Web scraping with Python.
SQL for data extraction.
SQL Server, PL/SQL, MongoDB, and SQLite databases.

Statistics and Machine Learning

Descriptive statistics (location, dispersion, skewness, kurtosis, density).
Regression, classification, clustering, and learning-to-rank algorithms.
Data balancing techniques, feature selection, and dimensionality reduction.
Performance metrics of algorithms (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, ROC Curve, Lift Curve, AUC, Silhouette Score, DB-Index).
Machine Learning packages: Sklearn and Scipy.

Data Visualization

Matplotlib, Seaborn and Plotly.
Power BI and SAS Analytics.

Software Engineering

Git, GitHub, and GitLab.
Python APIs.
Google Cloud Platform (GCP).
Jupyter Notebook, Jupyter Lab, and Google Colab.

Professional Experience

6+ Years as Systems Analyst (Insurance Industry)

Responsible for creating transactional communication journeys across multiple channels (sending letters through the Post Office, digital dispatches like email, SMS, WhatsApp, generating on-demand documents via API, among others), generating value for the company and improving the customer experience. Additionally, I manage and monitor communications to solve possible problems in the journey and analyze the communicated data to generate insights for our stakeholders, facilitating strategic decision-making for the company.

3+ Completed Data Science Projects

Developed data solutions for business problems modeled on real challenges faced by companies, using public data from Data Science competitions. I handled each project end to end, from framing the business challenge to publishing the trained algorithm in production using Cloud Computing tools.

1+ Year as Systems Support Intern

During the internship, in addition to technical skills such as database manipulation and web page development, I improved my soft skills, such as communication, which made it easier to gather information from clients. This gave me a better understanding of their problems, so that I could plan and organize the strategies needed to deliver or resolve what each client requested.

Data Science Projects

Star Jeans - A Web Scraping and Exploratory Data Analysis Project

This project involves a fictional Brazilian company that intends to enter the American fashion market by establishing an e-commerce store for selling men's jeans. The objective is to keep operating costs low and scale up as they acquire more customers.

In addition to defining the target audience and product, the company aims to gain insights into the American market for this segment in order to set competitive prices for its products. The company considers its main competitor to be H&M, a large-scale fashion retailer with a strong presence in the American market. The Brazilian company therefore hires a data science consulting firm to answer a set of business questions.

Tools:

Python
SQLite
Requests API
Git and GitHub
BeautifulSoup 4
Jupyter Notebook
Visual Studio Code
Pandas and NumPy
Matplotlib and Seaborn

Learn More

Rossmann Drug Stores - Sales Prediction

This is a forecasting data science project inspired by the Rossmann Store Sales competition on Kaggle. The scenario involves predicting the sales for each of the company's stores for the next six weeks in order to facilitate decision-making, such as whether or not to build a new store in a few weeks.
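As a rough illustration of how a sales forecast like this can be evaluated, the sketch below computes three common regression error metrics (MAE, RMSE, and MAPE) in plain Python. The function name `forecast_errors` and the sample numbers are hypothetical and chosen only for demonstration; they are not taken from the project or from the Rossmann data.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE for two equally sized lists of sales values.

    Hypothetical helper for illustration only; not the project's own code.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean squared error
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n  # as a fraction

    return {"mae": mae, "rmse": rmse, "mape": mape}


if __name__ == "__main__":
    # Made-up weekly sales for three stores, used only to exercise the function.
    actual_sales = [100.0, 200.0, 400.0]
    predicted_sales = [110.0, 180.0, 400.0]
    print(forecast_errors(actual_sales, predicted_sales))
```

MAE and RMSE are expressed in the same unit as the sales figures, while MAPE is a relative error, which tends to be easier to communicate to a business audience.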

Tools:

Git
Python
Sklearn
Telegram Bot
Heroku Cloud
Jupyter Notebook
Pandas and NumPy
Descriptive statistics
Matplotlib and Seaborn
Feature selection with Boruta
Algorithm performance metrics
Linear Regression, XGBoost, Random Forest

Learn More

HealthGuard Insurance: Cross-Sell Prediction (In progress)

This project uses data from a health insurance company that surveyed its customers to assess their interest in a potential car insurance policy. The objective is to rank customers by their likelihood of purchasing the policy, so that the sales team can contact the most interested customers first and make more accurate decisions. Under the business model described in the problem definition, the model performs almost three times better than selecting customers at random. The model's results can be accessed through a Google Sheets spreadsheet.
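The comparison with random selection is usually expressed as lift: the share of interested customers among the top-ranked group divided by the share in the whole customer base. The sketch below is a minimal, plain-Python illustration of lift and cumulative gain at a chosen cutoff. The function names and the toy labels and scores are hypothetical and are not taken from the project.

```python
def _top_k_hits(y_true, scores, k_fraction):
    """Count interested customers among the top k_fraction ranked by score."""
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("y_true and scores must be non-empty and the same length")
    if not 0 < k_fraction <= 1:
        raise ValueError("k_fraction must be in the interval (0, 1]")
    # Rank customers from highest to lowest predicted propensity.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(round(len(ranked) * k_fraction)))
    hits = sum(label for _, label in ranked[:k])
    return hits, k


def lift_at_k(y_true, scores, k_fraction):
    """Response rate in the top k_fraction divided by the overall response rate."""
    hits, k = _top_k_hits(y_true, scores, k_fraction)
    total = sum(y_true)
    if total == 0:
        raise ValueError("lift is undefined when no customer is interested")
    return (hits / k) / (total / len(y_true))


def cumulative_gain_at_k(y_true, scores, k_fraction):
    """Share of all interested customers captured in the top k_fraction."""
    hits, _ = _top_k_hits(y_true, scores, k_fraction)
    total = sum(y_true)
    if total == 0:
        raise ValueError("gain is undefined when no customer is interested")
    return hits / total


if __name__ == "__main__":
    # Toy example: 10 customers, 2 of them interested (label 1). Values are made up.
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    propensity = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.5, 0.05, 0.15, 0.25]
    print(lift_at_k(labels, propensity, 0.2))
    print(cumulative_gain_at_k(labels, propensity, 0.2))
```

A lift of 1 means the ranking is no better than random selection, so a lift close to 3 at a given cutoff corresponds to finding roughly three times as many interested customers with the same number of contacts.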

Tools:

Git
Python
Sklearn
Google Sheets
Render Cloud
Jupyter Notebook
Pandas and NumPy
Descriptive statistics
Matplotlib and Seaborn
Feature selection with Random Forest feature importance
Hyperparameter fine-tuning with Hyperopt
Algorithm performance metrics
Linear Regression, XGBoost, Random Forest, LGBM, KNN
Cumulative Gain Curve and Lift Curve Metrics

Learn More

Contact

Feel free to contact me.