
Machine learning

Foresee turns a CSV into an ML report in sixty seconds.

AutoML that ranks likely targets with Gemini and returns a trained model plus a readable PDF in about a minute.

By Daniel Jeun


Upload a CSV. The system runs a full exploratory analysis on Snowflake while Gemini 2.5 reads the schema and ranks the five columns most worth predicting. You pick one. Three models train, one after the other: logistic regression, a decision tree, and XGBoost. You get back a PDF with metrics, confusion matrices, feature importances, and a short business rationale.

The pitch is a minute. The hard part was making the minute feel honest.

How it actually works

The frontend is a React app that streams progress from a Flask API. The first thing the API does on upload is push the file into Snowflake and start a parallel pair of jobs: an EDA pass that walks every column and computes the usual statistics, and a Gemini call that summarizes the schema and proposes ranked targets with importance scores from one to one hundred.
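As a rough illustration of that parallel pair, here is a minimal sketch using only the Python standard library. The functions `run_eda`, `rank_targets`, `top_targets`, and `analyze` are hypothetical stand-ins, not the project's code: the real EDA runs on Snowflake and the real ranking comes from Gemini, and the scores below are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_eda(rows):
    """Hypothetical stand-in for the Snowflake EDA pass: per-column counts."""
    columns = rows[0].keys()
    return {
        col: {
            "non_null": sum(1 for r in rows if r[col] is not None),
            "distinct": len({r[col] for r in rows if r[col] is not None}),
        }
        for col in columns
    }


def rank_targets(schema):
    """Hypothetical stand-in for the Gemini call: returns (column, score) pairs."""
    # A real call would send the schema to the model; these scores are invented.
    return [(col, 100 - 10 * i) for i, col in enumerate(schema)]


def top_targets(ranked, limit=5):
    """Clamp scores into 1..100 and keep the best `limit` candidates."""
    clamped = [(col, max(1, min(100, int(score)))) for col, score in ranked]
    return sorted(clamped, key=lambda pair: pair[1], reverse=True)[:limit]


def analyze(rows):
    """Start both jobs at once and wait for both results."""
    schema = list(rows[0].keys())
    with ThreadPoolExecutor(max_workers=2) as pool:
        eda_future = pool.submit(run_eda, rows)
        rank_future = pool.submit(rank_targets, schema)
        return eda_future.result(), top_targets(rank_future.result())


rows = [
    {"age": 34, "plan": "pro", "churned": 0},
    {"age": 51, "plan": "free", "churned": 1},
    {"age": None, "plan": "free", "churned": 0},
]
eda, targets = analyze(rows)
```

Threads suit this shape because both jobs spend their time waiting on a remote service, so the total wait is close to the slower of the two, not their sum.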

The two jobs finish at roughly the same time, which is the whole point. Running EDA and ranking back to back would be fine for a demo, but the user would wait for the sum of both instead of the slower of the two. Model training runs in sequence once a target is picked, since the three models share preprocessing and the cost of training them serially is small compared to the analysis phase.
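The sequential training step might be sketched like this: preprocess once, then fit each model in turn on the same data. Everything here is an assumption for illustration. `MajorityClass` is a toy estimator standing in for the three real models, and `preprocess`, `accuracy`, and `train_all` are hypothetical names. The only thing it assumes about the real models is a scikit-learn-style `fit`/`predict` interface.

```python
def preprocess(rows, target):
    """Shared step: drop rows with missing values, split features from the label."""
    clean = [r for r in rows if all(v is not None for v in r.values())]
    X = [[v for k, v in r.items() if k != target] for r in clean]
    y = [r[target] for r in clean]
    return X, y


class MajorityClass:
    """Toy estimator (not a real model): always predicts the most common label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def train_all(rows, target, models):
    """Preprocess once, then fit each model in order on the same split."""
    X, y = preprocess(rows, target)
    split = int(len(X) * 0.75)  # simple holdout; a real run would shuffle or stratify
    report = {}
    for name, model in models.items():
        model.fit(X[:split], y[:split])
        report[name] = accuracy(y[split:], model.predict(X[split:]))
    return report


rows = [
    {"age": 34, "churned": 0},
    {"age": 29, "churned": 0},
    {"age": 51, "churned": 1},
    {"age": 42, "churned": 0},
    {"age": 38, "churned": 0},
    {"age": 60, "churned": 1},
    {"age": 33, "churned": 0},
    {"age": 57, "churned": 1},
]
models = {
    "logistic_regression": MajorityClass(),
    "decision_tree": MajorityClass(),
    "xgboost": MajorityClass(),
}
report = train_all(rows, "churned", models)
```

Because the preprocessing output is computed once and reused, each additional model adds only its own fit time to the loop.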

The model menu

Three is a deliberate choice. One model feels like a guess. Five would push the report past one screen. Three covers the three things people usually want: a baseline they understand, a tree they can interpret, and a boosted ensemble for the score.

Each one trains, validates, and produces SHAP values for the top features. The PDF is rendered on the server with a templated layout so every report looks like it came from the same publication.


Python React Flask Snowflake Gemini 2.5 XGBoost
