Predictive Modeling

Marketing is expensive when you target the wrong people.

Improve Targeting With Predictive Analytics

Predictive modeling identifies the consumers who are most likely to be interested in your brand, product, or service, based on the attributes of your current customers. Think of it as cloning your current customers.


The predictive modeling process creates a mathematical algorithm that generates a custom audience score. Then we apply this algorithm to a much larger marketing audience so we can identify the subset of ideal candidates for your business. For example, we may score everyone in the U.S. if you’re a B2C marketer or just a file of businesses in a certain industry if you’re a B2B marketer. Finally we select just those prospects with the highest scores to build your custom audience.
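In code terms, the scoring-and-selection step described above might be sketched as follows. This is an illustrative simplification, not our production system; the function names and the 10% cutoff are assumptions for the example.

```python
def build_custom_audience(prospects, score, top_frac=0.1):
    """Score every prospect with the model, then keep the
    highest-scoring fraction as the custom audience."""
    ranked = sorted(prospects, key=score, reverse=True)
    return ranked[:max(1, int(len(ranked) * top_frac))]
```

Whether the pool is every consumer in the U.S. or a single industry file, the principle is the same: rank by model score, then keep only the top of the list.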


This custom audience can be marketed via direct mail, email, or your social media channels. Targeting the prospects with the highest probability of becoming your customers reduces waste and improves marketing effectiveness. We provide campaign management services to execute in-market tests, or we can work with your existing providers. Whatever works for you.

Why Work With MineTrove?

1. We’re Specialists

Many machine-learning algorithms offer speed but can sacrifice accuracy, underperforming because of how they are trained. A regression-based approach alone, by contrast, can produce high-quality predictions but may take weeks to complete on new, unfamiliar data. Poor predictions and time-consuming efforts are both wasteful.


We distilled decades of thought leadership and in-market experience into our modeling platform: an automated modeling environment that combines machine learning with regression-based prediction to produce unparalleled results.


2. We’re Fast

We build all of our predictive models in-house, using our proprietary predictive modeling platform. What does this mean to you? Better customer targeting based on better science and shorter model development timelines.


3. Flexible Pricing

Many of our clients opt to purchase modeled audiences through us rather than have a predictive model commissioned by one provider and then purchase a marketing list through another. When clients purchase their marketing audiences through us, we waive up-front modeling fees in favor of a scoring fee based on the size of the audience. This approach makes testing much more affordable.


We also provide modeling services to other marketing agencies and work with their data providers to produce scored lists. We’re always flexible and are open to new ways of working with clients.


4. Your Data Is Secure

Producing predictive models requires obtaining a sample of your current customers to train the model. All of our data is stored in the U.S. in a secure data center, accessible only to us. Our models are all developed in-house, not outsourced to a third party where your data would be exposed to yet another unknown party. Once the modeling process is complete, we delete your data for security purposes.

Smarter. Faster. Better.

What does this mean to you? Better customer targeting based on better science, with shorter model development timelines, flexible pricing, and security for your customer data.

Here’s How We Do It

Programmatic Hygiene

We have programmatic routines that scan for unusable features (attributes) and remove them from the modeling process. This includes features with high cardinality, single values, or an unusually high percentage of nulls or zeros.
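A minimal sketch of this kind of hygiene screen is shown below. The function name and thresholds are illustrative assumptions; real screens tune these per dataset.

```python
def screen_features(rows, max_cardinality=0.95, max_null_rate=0.9):
    """Return the feature names that should be dropped before modeling:
    single-valued features, near-unique (ID-like) features, and features
    that are mostly null or zero."""
    n = len(rows)
    drops = set()
    for f in rows[0].keys():
        values = [r[f] for r in rows]
        non_null = [v for v in values if v not in (None, 0)]
        distinct = set(v for v in values if v is not None)
        null_rate = 1 - len(non_null) / n
        if len(distinct) <= 1:                      # single value
            drops.add(f)
        elif len(distinct) / n > max_cardinality:   # near-unique, e.g. an ID
            drops.add(f)
        elif null_rate > max_null_rate:             # mostly null/zero
            drops.add(f)
    return drops
```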

Null Imputation

It’s common for third-party data appends to include features with a high degree of null or zero values. Yet these features are often still useful. Our algorithms impute values for nulls based on the dependent variable, and also determine whether a zero value should be imputed or not.
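One common way to impute based on the dependent variable, sketched here as an assumption about the general technique rather than our exact algorithm, is to treat the null records as their own group and fill them with the non-null value whose response rate is closest to theirs:

```python
def impute_nulls(values, y):
    """Replace None with the non-null value whose response rate is
    closest to the response rate observed among the null records."""
    null_y = [yy for v, yy in zip(values, y) if v is None]
    null_rate = sum(null_y) / len(null_y)
    # response rate per distinct non-null value
    by_value = {}
    for v, yy in zip(values, y):
        if v is not None:
            by_value.setdefault(v, []).append(yy)
    rates = {v: sum(ys) / len(ys) for v, ys in by_value.items()}
    fill = min(rates, key=lambda v: abs(rates[v] - null_rate))
    return [fill if v is None else v for v in values]
```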

IV Screening

Training predictive models on datasets with hundreds or thousands of features is the new norm. We use Information Value screening to identify features with sufficient variability, streamlining the modeling process.
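Information Value is computed per feature from binned event/non-event counts; the sketch below shows the standard formula. The bin layout passed in is an assumption for illustration.

```python
import math

def information_value(bins):
    """Compute IV for one feature. `bins` is a list of
    (n_events, n_nonevents) tuples, one per feature bin."""
    total_e = sum(e for e, _ in bins)
    total_ne = sum(ne for _, ne in bins)
    iv = 0.0
    for e, ne in bins:
        if e == 0 or ne == 0:
            continue  # skip empty cells (adding a smoothing constant is also common)
        pe, pne = e / total_e, ne / total_ne
        iv += (pe - pne) * math.log(pe / pne)
    return iv
```

Features whose IV falls below a chosen floor carry too little signal and can be set aside before modeling.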

Dimension Reduction

Dimension reduction is an important step in feature selection. The larger the number of explanatory variables introduced into regression modeling, the greater the likelihood of overfitting, leading models to fail to generalize to other datasets. We use machine learning algorithms to group features into principal-component-like clusters. By then selecting the top candidate feature from each cluster, we reduce collinearity and guard against overfitting.
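The idea can be illustrated with a simple greedy version of the technique: group features whose pairwise correlation is high, then keep the one in each group most correlated with the target. This is a stand-in sketch, not our clustering algorithm.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def cluster_and_select(features, target, threshold=0.8):
    """Greedily group highly correlated features, then keep the feature
    in each group that is most correlated with the target."""
    clusters = []
    for name in features:
        for cl in clusters:
            if abs(pearson(features[name], features[cl[0]])) >= threshold:
                cl.append(name)
                break
        else:
            clusters.append([name])
    return [max(cl, key=lambda f: abs(pearson(features[f], target)))
            for cl in clusters]
```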

Non-linear Transformation

Most continuous numeric features like age, housing values, wealth, etc. have a non-linear effect on your dependent variable (like response rates). For example, response rates may grow with age up to a point, then flatten out. We conduct automated transformations of continuous numeric features to find the best fit. This helps ensure that the transformed feature has a linear effect on the dependent variable, maximizing model lift and stability and reducing overfitting.

Feature Recoding

Many class-level features, which are coded attributes such as segmentation codes or state codes, struggle to perform in predictive models because most coded variables have high cardinality. Sorting codes alphabetically is typically meaningless, and even if you sort by response rate, the data may be too thin within specific codes. We cluster the values within class features into like-performing groups prior to introducing those attributes into the final modeling stage. The process results in more relevant class features surviving the pre-screen process, improving final model fit.
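One simple way to form like-performing groups, shown here as an assumed illustration of the general approach, is to order the levels by response rate and merge adjacent levels until each group has enough support:

```python
def recode_classes(codes, y, min_group_size=100):
    """Collapse class levels into like-performing groups: sort levels by
    response rate, then merge adjacent levels until each group is large
    enough. Returns a level -> group-number mapping."""
    stats = {}
    for c, yy in zip(codes, y):
        n, e = stats.get(c, (0, 0))
        stats[c] = (n + 1, e + yy)
    ordered = sorted(stats, key=lambda c: stats[c][1] / stats[c][0])
    groups, current, size = [], [], 0
    for c in ordered:
        current.append(c)
        size += stats[c][0]
        if size >= min_group_size:
            groups.append(current)
            current, size = [], 0
    if current:  # fold any thin remainder into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return {c: i for i, grp in enumerate(groups) for c in grp}
```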

Iterative Regression Modeling

Once we reduce our feature set to the transformed and recoded versions of the principal components that show the most promise for model development, we begin an iterative model-training routine that seeks to further reduce the final feature set. Our goal is to produce the simplest model possible with the greatest predictive lift. Simpler models improve model stability, which improves the model’s ability to predict accurately (i.e., generalize) during production use.


No modeling process would be complete without cross-validation. A validation set is a subset of the initial dataset held out from model training. Validation indicates whether a model can generalize in production use on a rollout basis. This is standard fare for our process.
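The iterative reduction loop can be sketched abstractly: repeatedly drop the feature whose removal hurts the validation score least, stopping when every removal costs real lift. The `fit_score` callable (fit a model on the listed features and return its validation score) is an assumed placeholder, not a real API.

```python
def backward_select(features, fit_score, min_gain=0.001):
    """Iteratively drop the feature whose removal hurts the validation
    score least, stopping when every removal costs more than min_gain."""
    current = list(features)
    best = fit_score(current)
    while len(current) > 1:
        candidates = [(fit_score([f for f in current if f != drop]), drop)
                      for drop in current]
        score, drop = max(candidates)
        if score >= best - min_gain:
            current.remove(drop)
            best = score
        else:
            break
    return current
```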

Diagnostic Reporting

We provide a suite of reports that illustrate general model diagnostics and model lift, rank features from most influential in the model to least, and profile consumers from highest scored (e.g., most likely to respond or purchase) to lowest. This is an especially useful way to help marketers understand their target audience.
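Lift reporting of this kind is conventionally built on a decile table: rank records by model score, split them into tenths, and compare each tenth's response rate to the overall rate. A minimal sketch of that calculation:

```python
def decile_lift(scores, y, n_bins=10):
    """Rank records by model score, split into equal bins, and report
    each bin's response rate relative to the overall rate (lift)."""
    overall = sum(y) / len(y)
    ranked = [yy for _, yy in sorted(zip(scores, y), key=lambda p: -p[0])]
    size = len(ranked) // n_bins
    return [(sum(ranked[i * size:(i + 1) * size]) / size) / overall
            for i in range(n_bins)]
```

A well-fit model shows lift well above 1.0 in the top deciles, falling toward zero in the bottom ones.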

Simplified Implementation

Our modeling platform automatically recodes the final model as SQL, so you don’t need a proprietary software license to implement it. This makes scoring a breeze in any database environment. Prefer for us to score a file and send it back? That’s fine too.
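To show the idea, here is a toy generator that renders a fitted linear score as plain SQL; the table and column names are illustrative assumptions.

```python
def model_to_sql(intercept, coefficients, table="prospects"):
    """Render a fitted linear score as a plain SQL SELECT, so the model
    can be scored in any database without proprietary software."""
    terms = [f"{coef:+g} * {col}" for col, coef in coefficients.items()]
    expr = f"{intercept:g} " + " ".join(terms)
    return f"SELECT id, {expr} AS model_score FROM {table};"
```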


Learn more about model pricing