A successful outbound lead generation campaign needs to address three main components:

  • Data (targeting strategy, data sources, and quality),
  • Content (sending relevant and engaging content to the right audience), and, finally,
  • Tech Stack (delivering emails to the target audience).

In this in-depth guide, we focus on the first component of outbound lead generation, the data, and explain how to set up a scalable data acquisition process that supports both qualitative and quantitative targeting. With some of these techniques, we were able to develop an efficient lead generation strategy, scale campaigns on behalf of our clients, and send over 1,000 new cold emails daily.

Please refer to our previous guide if you are interested in understanding how to set up a cold emailing infrastructure built to scale (the tech stack).

We often observe the following trend in cold email lead generation campaigns: both clients and agencies tend to focus more on volume (the number of people or businesses reached) than on quality (better targeting, lower volumes but higher conversion rates). In some cases, higher volumes can be justified, for instance, when further segmenting and requalifying costs more than contacting the whole database (or the audience gathered).

In this blog post, we will analyze the possible ways to streamline the data acquisition process while focusing on both the quality and quantity of prospects aiming to maximize the ROI of our lead generation efforts.

Summary of topics covered:

  1. Definition of the ICP
  2. Identification of data sources by use cases
  3. Data extraction
  4. Data uniformization and standardization
  5. Data enrichment and validation

Define your ideal customer profile (ICP)

The first step is always to understand your ideal customer profile, or ICP, based on your market knowledge and current product-market fit. The goal is to have a clear understanding of who your potential customers are, their buyer persona (this works for both B2B and B2C), and how best to pre-qualify them.

Some questions you can ask yourself include the following:

  • Industry: What is the company’s main activity or vertical (e.g., banking, insurance, services, distribution, pharma, etc.)?
  • Company size or revenue: What is the ideal number of employees, the annual revenue, or the headcount of any particular department?
  • Geography: Where is the company located (country, state, city level)?
  • Company department: Which department is responsible for making decisions for your product or service (this works well for medium to large companies)? What are the decision maker’s ideal position and seniority level (C-level, VP, Head of department, etc.)?
  • Buying signal: What external factors can you use to target your prospects more efficiently (raising funds, launching new products, changing positions, M&A, company growth, job offers, presence or absence of specific functions, etc.)?

By the end of this step, you should have a clear vision of your ICP. It is essential to have a precise mapping of problems solved vs. goals achieved for each segment and buyer persona (as granular as it can be).
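
The answers to these questions can be captured as a structured filter, which makes the later extraction and qualification steps easier to automate. The sketch below is only an illustration, not a prescribed schema: the `ICP` and `Company` classes, their field names, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    """A candidate account (all fields are illustrative)."""
    name: str
    industry: str
    employees: int
    country: str
    signals: set[str] = field(default_factory=set)


@dataclass
class ICP:
    """Hypothetical ICP definition built from the questions above."""
    industries: set[str]
    min_employees: int
    max_employees: int
    countries: set[str]
    buying_signals: set[str] = field(default_factory=set)

    def matches(self, company: Company) -> bool:
        """True when every firmographic criterion is met and, if buying
        signals are required, at least one of them is present."""
        if company.industry not in self.industries:
            return False
        if not self.min_employees <= company.employees <= self.max_employees:
            return False
        if company.country not in self.countries:
            return False
        if self.buying_signals and not (self.buying_signals & company.signals):
            return False
        return True


icp = ICP(
    industries={"banking", "insurance"},
    min_employees=200,
    max_employees=5000,
    countries={"US"},
    buying_signals={"raised_funds", "hiring_sales"},
)

candidates = [
    Company("Example Bank", "banking", 1200, "US", {"raised_funds"}),
    Company("Tiny Insurer", "insurance", 40, "US", {"hiring_sales"}),
    Company("Euro Bank", "banking", 900, "FR", {"raised_funds"}),
]

qualified = [c.name for c in candidates if icp.matches(c)]
print(qualified)
```

Keeping the criteria in one object means the same definition can be reused to pre-qualify records coming from any of the data sources discussed later.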

Tools & Tips

  • Ideal Customer Profiles vs. Buyer Personas: ICPs and buyer personas are different. ICP focuses on the company, while buyer personas focus on the person’s position. Example: ICP = top 100 retail banks in the US, while the buyer persona is the top executive in cybersecurity (C-suite, VP, or Head level).
  • Buying signals are usually linked to your product/service and its fit with the company’s situation. Example: a startup that has just raised funds might use them to scale its hiring, tech stack, pool of providers, etc. Another example: a company without an in-house sales team might need help generating leads.

Identification of data sources

This step is essential to understanding where to find the right companies and people to engage with. Depending on your target market, location, and other criteria, the possible data sources range from widely used social platforms like LinkedIn to hyper-specialized directories, such as registers of regulated businesses.

To understand which source you should use, you need to understand your audience’s journey both from the company and personal point of view. Some questions worth asking yourself are related to the industry regulations (do they need a license or certification) and company formation aspects (who approves company creation and what data can be obtained).

Other questions concern the buyer personas’ journey as business professionals: what are their daily tasks and occupations, where do they spend their time, how do they want to be found for new job opportunities, have they been recognized by other industry professionals or received awards, etc.

It is impossible to list all possible sources; however, we have tried to summarize below the most common sources grouped by channels and use cases:

  1. LinkedIn (Sales Navigator, groups, events, company pages)
  2. Directories (YellowPages, Crunchbase, Indeed, memberships, events, etc.)
  3. Location-based (Google Maps, TripAdvisor)
  4. Interest-based (Instagram, Facebook, Twitter)
  5. Existing data vendors

LinkedIn


LinkedIn is a widely used platform to identify potential B2B leads. It has almost 800 million members in over 200 countries, making it the most extensive professional network in the world. Besides the number of users, the platform also offers an excellent search (querying) feature that allows users to look for both their ICP (accounts) and buyer personas (contacts, or “leads” in LinkedIn vocabulary) through its Sales Navigator plan.

It is the ideal source for anyone looking to engage with B2B decision-makers, given that over 250 million users are in senior and C-level executive positions. Another benefit is that the users generate and constantly renew their data, as opposed to data vendors with static databases.

While LinkedIn is widely adopted by professionals across different industries, it is essential to highlight its limits. Specific industries and small businesses are not well covered. For instance, if you need to engage with small business owners in home services (plumbing, roofing, painting, etc.) or personal care services (hairdressers, tattoo artists, etc.), LinkedIn might not be the best place to look for them.

Some targeting criteria one could use on LinkedIn to find ideal prospects include:

  • Regular Sales Navigator query (it offers 29 filters both on account and contact level): you can identify all the decision makers meeting your criteria or run Account Based Marketing by focusing on a list of specific accounts.
  • LinkedIn groups: similar to Facebook groups, professionals often join thematic groups based on their interests (both personal and professional).
  • Job offers: you can identify all the companies that are currently hiring specific positions.
  • Account-Based Marketing: you can first run an account search query to find companies meeting your requirements, then extract all their respective employees to launch hyper-focused campaigns or analyze the presence or absence of specific positions. Example: find companies that do not have an internal head of data science & analytics to offer your data science services.
  • LinkedIn events: you can target all participants of an event (for example, your competitors’ event or an event where the participants are your ideal audience)
  • Publication hashtags: you can identify all profiles having posted a publication with specific hashtags.

Directories


Besides LinkedIn, plenty of other public databases might list the companies you would like to prospect. Each industry has its own specialized directories; the most common ones are highlighted below as examples.

  1. Crunchbase is one of the largest databases of private and public companies mixing funding and investment-related information with founding members and individuals in leadership positions. One could use it to target start-ups that have raised funds, VC firms investing in start-ups, founders of companies in specific locations, etc.
  2. Job sites: Indeed, Glassdoor, and other job listing platforms. The idea is similar to a job search on LinkedIn, but different data points could be used for additional targeting and/or segmentation. You could target companies with low employee scores if you offer corporate leadership improvement or HR-related consulting services. On the other hand, you can target top employers who care about their employees to provide your services.
  3. Startup accelerators: similar to Crunchbase, you can use start-up accelerators (such as Y Combinator, Techstars, and many others) to identify companies in the growth phase, filter by location, graduation year, etc.
  4. Specialized directories: Some listing websites focus on companies holding specific licenses or certifications. Examples include certified small businesses in the US (for each state), directories of sustainable companies like B-Corp certified organizations, professional associations (e.g., the directory of licensed lawyers in Florida), software review websites (e.g., Capterra or G2), etc.
  5. Technology-based directories: some platforms (LinkedIn, BuiltWith, and many others) allow you to search for companies using a specific technology (like Shopify, Mailchimp, etc.). This approach is helpful if you target a particular category of businesses based on the tools they use. For instance, you can target e-commerce shops using Shopify or HubSpot to offer your niche consulting services.
  6. Event / Conference participants & exhibitors: plenty of industry-specific events cater to professionals who happen to be in your ICP. Identifying event participants and exhibitors allows you to understand which companies are proactively looking to create new relationships.
  7. OpenCorporates: the world’s largest open database of companies. Its standardized data covers 210m+ companies and comes from primary public sources. You can get more data in bulk, cross-check the information, or find the company’s principal listed officers. More on other use cases with this data source in “Data uniformization and standardization.”

Local businesses

Geographic targeting is a good way to prospect small local companies with a physical presence (think of businesses that serve their clients from a physical location). Usually, such companies are not very present on the previously mentioned platforms, like LinkedIn.

Nevertheless, most of these businesses rely on local marketing and directories to be found by their customers. These include well-known platforms such as Google Maps with Google My Business, TripAdvisor for restaurants and other hospitality businesses, YellowPages, and even Facebook.

Most of these platforms provide a wealth of data, which usually includes emails, websites, phone numbers, and social links that can be used as engagement channels.

Interest-based platforms


Obtaining data from social media platforms can be an excellent way to generate qualified leads. Social networks, especially platforms like Facebook, Instagram, and Twitter, have created a good ecosystem for companies. They are ideal for finding company pages and their contact information, especially for SMEs, freelancers, and other self-employed professionals.

You can analyze the potential interest areas of your target audience and identify the pages and profiles whose followers best fit your ICP.

  • Facebook: you can find groups for pretty much everything, including business-oriented ones.
  • Instagram: followers of your competition, local businesses, and specialized profiles can be used as targeting criteria. For example, followers of some real estate agent associations are a good source of leads if you target realtors. You can find more on how to use Instagram to generate leads in “Instagram Lead Generation” and here.
  • Twitter: similar to Instagram, you can find key Twitter profiles and target their respective followers. Twitter has nearly 220 million daily active users, who tend to be affluent, educated, and tech-savvy. You can also build competitive insight by searching for relevant keywords and seeing which companies, products, and services are frequently mentioned.

Existing data vendors

The most traditional way is to rely on existing database providers, such as ZoomInfo, Clearbit, Apollo, UpLead, Snov.io, Lusha, and hundreds more. Most of them rely on LinkedIn crawling to build and maintain their databases, so you might be better off using LinkedIn directly.

All of these vendors have their benefits and drawbacks.

  • Benefits: decreased time-to-market, filtering, targeting, large volume of available data, cheaper cost for small data needs.
  • Drawbacks: data quality and segmentation are often not precise enough (industries and activities), email accuracy varies, subscriptions can become pricey (ZoomInfo can cost over $15k a year), and other sales reps have access to and use the same data sets.

While this can be a good starting point for widely covered industries and types of buyer personas, there are also different alternative paths with more benefits for niche markets. Owning the data acquisition process allows you to control both the quality and quantity of data you need for your lead generation needs.

Tools & Tips

  • Data source identification is directly related to the go-to-market strategy and targeting criteria.
  • Compare “build vs. buy” to test the performance of your own data vs. purchased data (from different perspectives, including quality, cost-effectiveness, conversion performance, data granularity, etc.).
  • Try to find the origin of the data; in most cases, there are governing bodies in charge of regulating/approving company creation/renewal processes.
  • Extraction and automation tools are addressed in the next section.

Data extraction

Once you have defined the critical aspects of your Ideal Customer Profile (ICP) and identified the most relevant sources, you can move on to the next step – extracting (or scraping) the data from these sources. This step aims to have the data in a structured and usable form instead of a non-actionable online format.

Depending on the data source, you might get some or all of the following data points:

  • Company name and details fitting your ICP
  • Prospect name and position fitting your buyer persona
  • Prospect contact details (email, phone number, social links, etc.)
  • Any other relevant information for your campaign (buying signals, targeting criteria, etc.)

In case you only have a company name and no contact details, you will need to find complementary data sources to get the buyer persona’s information (LinkedIn, directories, company website, etc.).

Independently of the nature of the data, there are two main techniques when it comes to data extraction (manual copy-pasting being excluded from this scope as it goes against “scalability” principles).

  1. Templatized extraction using existing tools with pre-configured extractors or workflows. These tools cover most of the widely used platforms and websites (LinkedIn, Facebook, Instagram, Twitter, Google Maps, TripAdvisor, Crunchbase, YellowPages, etc.)
  2. Custom scraping agents for less common websites. Two different options exist here: full custom scraper using Python scripts or custom extractors using no-code web scrapers.

As is often the case when it comes to tools, there is no perfect approach or answer that fits all situations. Each context requires at least a minimal understanding of the complexity of each approach and an honest assessment of your familiarity with such tools, as the learning curve can vary significantly.

Templatized data extraction (pre-configured scraping agents and workflows)

On the “simple” side of pre-configured scraping agents, there are plenty of applications that were historically created for SEO purposes and have some basic (nonetheless interesting) data extraction features (email, phone numbers, social links extraction). These tools are often distributed as desktop applications, the most common providers being Scrapebox, Screaming Frog, and URL Profiler.

At the more advanced end of the spectrum (read here “a longer learning curve”), you can find a plethora of automation platforms. The most commonly used platforms for data extraction and automation are Phantombuster, CaptainData, TexAu, Apify, BrightData, and Automatio.

Some of these tools allow configurable workflows ranging from data scraping and enrichment to integration with various outreach or CRM platforms (HubSpot, Pipedrive, Salesforce, etc.). Most offer integration with automation tools like Zapier, Make (formerly Integromat), or n8n. The vast majority are distributed as SaaS; however, some provide desktop apps (namely TexAu), each mode having its own benefits and drawbacks.

A couple of interesting use cases that these tools can address:

  • Simple case “LinkedIn Sales Navigator search”: Input = “LinkedIn Sales Navigator search URL” => Output = “leads enriched”
      1. Extract LinkedIn Sales Navigator search results
      2. Enrich search results with email enrichment tools
  • Simple case “local business”: Input = “Google Maps search results URL” => Output = “list of emails”
      1. Extract Google Maps search results
      2. Crawl the websites found for emails, phone numbers, and social page links (Instagram / Facebook)
      3. Crawl the social page URLs for additional contact details (email / phone)
  • Advanced use case “Account-Based Marketing”: Input = “Company name” => Output = “Buyer persona” with contact details
      1. From the company name, find the company website
      2. From the company name, find the LinkedIn company page
      3. From the LinkedIn company page, identify decision makers (buyer persona) with the Sales Navigator search
      4. Extract the Sales Navigator search results
      5. Enrich search results with email enrichment tools
      6. Engage with the buyer persona:
          • Push into email marketing tools
          • Automate LinkedIn outreach (connection request + follow-up message)
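
To make the “local business” case more concrete, here is a rough sketch of what the second step (pulling emails and social links out of a crawled page) might look like. It is not how any of the platforms above work internally: the regular expressions are deliberately simple, the `extract_contacts` helper and the sample page are made up for illustration, and fetching the page itself is left out.

```python
import re

# Simplified patterns (assumptions): good enough for a first pass,
# but they will miss obfuscated emails and unusual URL formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:instagram|facebook)\.com/[A-Za-z0-9_.\-/]+"
)


def extract_contacts(html: str) -> dict[str, list[str]]:
    """Return the unique emails and Instagram/Facebook links found in a page."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(html)})
    social_links = sorted(set(SOCIAL_RE.findall(html)))
    return {"emails": emails, "social_links": social_links}


# Hypothetical page content standing in for a crawled website.
sample_page = """
<footer>
  <a href="mailto:Hello@example-bakery.com">Email us</a>
  <a href="https://www.instagram.com/examplebakery">Instagram</a>
  <a href="https://facebook.com/examplebakery">Facebook</a>
  <p>Contact: hello@example-bakery.com</p>
</footer>
"""

contacts = extract_contacts(sample_page)
print(contacts)
```

The social links collected here would feed the third step of the workflow, where the same kind of extraction is run against the social pages.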

Custom web-scrapers

Data scraping is an automated process of extracting data from different data sources. As mentioned earlier, there are essentially two main ways to scrape data: using custom scripts (which requires coding) or leveraging no-code extractors (which rely on “point and click” workflow building or automated data format detection).
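
For the custom-script route, a minimal sketch using only Python’s standard library could look like the following. The HTML snippet, the class names (`listing`, `company-name`, `company-city`), and the `ListingParser` class are all hypothetical; a real directory needs its own selectors, and downloading the pages (while respecting the site’s terms of use) is not shown.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Turns a made-up directory page into a list of records.

    Each element with class "listing" starts a new record; elements whose
    class appears in FIELDS provide that record's values.
    """

    FIELDS = {"company-name": "name", "company-city": "city"}

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if css_class == "listing":
            self.rows.append({})  # a new record starts here
        elif css_class in self.FIELDS:
            self._current_field = self.FIELDS[css_class]

    def handle_data(self, data):
        # Store the first text chunk that follows a recognized field tag.
        if self._current_field and self.rows:
            self.rows[-1][self._current_field] = data.strip()
            self._current_field = None


sample_html = """
<div class="listing">
  <span class="company-name"> Acme Roofing </span>
  <span class="company-city">Tampa</span>
</div>
<div class="listing">
  <span class="company-name">Bright Plumbing</span>
  <span class="company-city">Orlando</span>
</div>
"""

parser = ListingParser()
parser.feed(sample_html)
parser.close()
print(parser.rows)
```

No-code extractors do essentially the same job, with the field mapping defined by clicking on page elements instead of writing a parser.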

Overview of main tools

By the end of this step, you should have a good understanding of the different tools available on the market and be able to identify the best approach for your needs. However, focus on the tool only once you’ve identified the best sources for your data needs. Often, people get the order wrong and tend to use only the tools they are familiar with. To quote the famous phrase, “if your only tool is a hammer, then every problem looks like a nail”: different problems require different solutions.

Tools & Tips

  • An advanced technique to scale data extraction from LinkedIn is to apply a series of multi-filter searches so that each one stays within the 2,500-result search limit, which makes it possible to extract tens of thousands of profiles daily (also known as the “sliced bread” technique shared by our friends from TexAu).
  • For additional precision in the enrichment phases, we recommend cleaning the database after the extraction step (e.g., to get rid of acronyms added by users on LinkedIn, such as MBA, CPA, etc.).
  • To eliminate false positives and keep only the people working at the companies of your choice, we recommend reverse matching the company ID against your target accounts (false positives occur when people list more than one current job on LinkedIn).
  • It is possible to add an extra layer of lead scoring to only work with high-priority buyer personas. For instance, if you need to apply a top-down approach, you can decide to attribute X points for positions containing “Chief / C-level” or “VP / Vice President,” Y points to “Head of” or “Director,” and Z points to “Manager” or “Lead.” You can add thresholds to validate/discard leads automatically and have a manual validation process in between.
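
The last two tips (reverse matching on company ID, then scoring by job title) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the point values standing in for X, Y, and Z, the two thresholds, the keyword patterns, and the `company_id` and `title` field names.

```python
import re

# Hypothetical point values for X, Y and Z; the first matching tier wins.
# The "c[a-z]o" pattern is a crude catch for titles such as CEO, CTO or CFO.
TITLE_POINTS = [
    (re.compile(r"\b(chief|c[a-z]o|vp|vice president)\b", re.I), 30),
    (re.compile(r"\b(head of|director)\b", re.I), 20),
    (re.compile(r"\b(manager|lead)\b", re.I), 10),
]
ACCEPT_AT = 30     # validate automatically at or above this score
REJECT_BELOW = 10  # discard automatically below this score


def score_title(title: str) -> int:
    """Return the points of the first seniority tier found in the title."""
    for pattern, points in TITLE_POINTS:
        if pattern.search(title):
            return points
    return 0


def triage(lead: dict, target_company_ids: set[str]) -> str:
    """Return 'accept', 'discard' or 'manual_review' for one lead."""
    if lead["company_id"] not in target_company_ids:
        return "discard"  # false positive: not employed at a target account
    score = score_title(lead["title"])
    if score >= ACCEPT_AT:
        return "accept"
    if score < REJECT_BELOW:
        return "discard"
    return "manual_review"


targets = {"101", "102"}
leads = [
    {"title": "Chief Information Security Officer", "company_id": "101"},
    {"title": "Director of Data Science", "company_id": "102"},
    {"title": "Sales Intern", "company_id": "101"},
    {"title": "CEO", "company_id": "999"},
]

decisions = [triage(lead, targets) for lead in leads]
print(decisions)
```

Leads that land in the middle band are the ones worth a quick manual look before they enter a campaign.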

Data uniformization and standardization

This step is often overlooked in the data acquisition process. Most of the time, data comes in a messy and non-standardized format (e.g., all caps, first and last names containing acronyms, company names with different spellings, non-standardized addresses, etc.).

To maximize the power of your data, you can use tools dedicated to working with messy data and transforming it into clean, actionable data. There are plenty of advanced tools for data cleaning and transformation that are suitable for data scientists and professionals with an IT background. In our case, we work primarily with OpenRefine, an open-source tool that can be used by almost anyone who is proficient with Excel / Google Sheets.

Below you can find a couple of fundamental transformations we like to apply to our datasets:

  • Trim leading and trailing whitespace, and eliminate consecutive whitespace.
  • Cluster cells using fuzzy matching functions (n-gram, Metaphone, Levenshtein, etc.) and merge them into a single value (e.g., “Acme,” “Acme, Inc.” and “ACME INC” all refer to the same company).
  • Standardize format for proper names, locations, zip codes, etc.
  • Split/merge cells depending on the format (e.g., if a full name needs to be transformed into first name + last name).
  • Clean first/last names from acronyms or titles (e.g., Dr, Ph.D., PMP, “she/her,” etc.).
  • Data reconciliation with external sources (such as Wikidata, OpenCorporates, etc.). This helps enrich or standardize data.
      • Example 1: you have the name of the city but not the country; you can reconcile the data with Wikidata and retrieve the country from the matched record.
      • Example 2: you can match the company name with OpenCorporates to find officers’ names, company addresses, and other data.
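
For readers who prefer scripting to a point-and-click tool, a few of the transformations above can be approximated as follows. This is a simplified sketch, not a reproduction of OpenRefine: the clustering uses a basic key-collision “fingerprint” instead of the n-gram, Metaphone, or Levenshtein functions, and the `NAME_NOISE` and `LEGAL_SUFFIXES` lists are assumptions you would extend for your own data.

```python
import re
import string
from collections import defaultdict

# Assumed noise lists; extend them for your own datasets.
NAME_NOISE = {"dr", "phd", "mba", "cpa", "pmp"}
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def tidy(text: str) -> str:
    """Trim the ends and collapse consecutive whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def clean_person_name(raw: str) -> str:
    """Drop titles, acronyms and parenthesized notes, then fix the casing.

    Note: str.title() is naive and will mis-case names such as "McDonald".
    """
    raw = re.sub(r"\(.*?\)", " ", raw)  # e.g. "(she/her)"
    tokens = [t for t in re.split(r"[\s,]+", raw) if t]
    kept = [t for t in tokens if t.replace(".", "").lower() not in NAME_NOISE]
    return tidy(" ".join(kept)).title()


def fingerprint(company: str) -> str:
    """Key used for clustering: lowercase, no punctuation, no legal
    suffix, unique tokens in sorted order."""
    no_punct = company.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    tokens = {t for t in no_punct.split() if t not in LEGAL_SUFFIXES}
    return " ".join(sorted(tokens))


def cluster(names: list[str]) -> dict[str, list[str]]:
    """Group spellings that share the same fingerprint."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(tidy(name))
    return dict(groups)


clean_name = clean_person_name("  Dr. JOHN   DOE, MBA (he/him) ")
clusters = cluster(["Acme", "Acme, Inc.", "ACME INC", "Globex  LLC"])
print(clean_name)
print(clusters)
```

Each cluster can then be merged into one canonical spelling, which is what makes later matching and enrichment more reliable.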

By the end of this step, you should be able to understand the different uniformization use cases and check if your data set requires cleansing / segmenting / enriching. While this step is recommended, it is essential to know that it requires breaking the whole process into non-automated steps, which is essentially the question of volume & time-saving vs. precision & quality. There are cases where volume & time-saving will be preferred over quality & accuracy, and vice-versa.

Tools & Tips

  • OpenRefine can also be used to match cells from another dataset (a sort of VLOOKUP).
  • More sophisticated tools can handle millions of rows for more advanced data matching and table merging needs. Example of use case: you have two datasets from different sources and formats – one contains company-related data and the other prospect contact data with the name of the company – and you want to have all data in one table. The company names are not an exact match and need to be fuzzy matched (see the earlier example with ACME and ACME INC) with a combination of different matching functions to maximize the output.
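
The two-dataset use case can be illustrated at small scale with Python’s built-in `difflib`. This is a sketch under stated assumptions rather than a recommendation for large volumes: it compares every contact against every company, the 0.7 similarity threshold is arbitrary, and the sample records and field names are invented.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and remove commas, periods and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def best_match(name, candidates, threshold=0.7):
    """Return the most similar candidate, or None if below the threshold."""
    scored = [
        (SequenceMatcher(None, normalize(name), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None


# Dataset 1 (hypothetical): company-level data keyed by company name.
companies = {
    "ACME INC": {"industry": "manufacturing"},
    "Globex Corporation": {"industry": "energy"},
}

# Dataset 2 (hypothetical): contacts with free-text company names.
contacts = [
    {"name": "John Doe", "company": "Acme, Inc."},
    {"name": "Jane Roe", "company": "Globex Corp"},
    {"name": "Sam Poe", "company": "Initech"},
]

merged = []
for contact in contacts:
    key = best_match(contact["company"], companies)
    # Unmatched contacts keep their own fields and get no company data.
    merged.append({**contact, "matched_company": key, **companies.get(key, {})})

for row in merged:
    print(row)
```

Rows left without a match are good candidates for a second pass with a different matching function or for manual review.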

Data enrichment and validation

The final step of our process consists of identifying the different engagement or contact points to maximize the efficiency of the lead gen campaigns. Enrichment data points can include email, LinkedIn profile, phone number, etc.

Data enrichment

Email


Email is the most commonly used channel; hence, email enrichment is the easiest, with plenty of vendors specializing in this niche. The approach consists of “guessing” the email format from a simple input (“contact name + company name”) by trying a combination of common structures and pinging the email server to confirm which one is valid (e.g., for John Doe at “company.com,” the address could be john.doe@company.com, jdoe@company.com, john@company.com, etc.).
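
The “guessing” part is easy to picture in code. The sketch below only generates candidate addresses; it does not contact any mail server, and the list of patterns is an assumption based on common conventions, not the logic of any particular vendor.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Build likely addresses for a contact from common naming patterns."""
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("both first and last name are required")
    local_parts = [
        f"{f}.{l}",    # john.doe
        f"{f}{l}",     # johndoe
        f"{f[0]}{l}",  # jdoe
        f,             # john
        f"{f}_{l}",    # john_doe
        f"{l}.{f}",    # doe.john
    ]
    return [f"{local}@{domain.strip().lower()}" for local in local_parts]


guesses = candidate_emails("John", "Doe", "company.com")
for address in guesses:
    print(address)
```

Each candidate would then need to be checked by an enrichment or validation service before being used in a campaign.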

With the emergence of such tools, many companies have implemented “catch-all” email servers, which accept mail for any address and therefore make it impossible to confirm a specific email format. In these cases, vendors rely on their internal databases of formats to provide the most probable combination based on the input provided.

Another way to find emails is to directly provide the targeted prospects’ LinkedIn profile URL. The platform does the heavy lifting by extracting in real time the necessary data points for enrichment (company name + contact name).

Some tools allow searching for specific positions for a list of company names (ex of input would be “position” + “company name”). The platform scans different libraries (internal databases or Google search for LinkedIn profile matches) and uses the matched names + company data for email enrichment.

Among the most common tools for email enrichment, you can find platforms like Hunter, Dropcontact, Apollo.io, Anymailfinder, Icypeas, Datagma, Snov.io, Findymail, etc.

This study analyzes the performance of a few email enrichment tools (enriched to found, found to valid, combination of two and three tools, etc.).

LinkedIn or Twitter profile

Let’s assume you only have contact and company names in your dataset and would like to find their LinkedIn or Twitter profile to add them to your network.

Tools like Phantombuster, TexAu, or CaptainData will do this simple task. You can use Zapier for more advanced automation.

Phone number

A few platforms, like Datagma and Lusha, also specialize in providing prospects’ phone numbers. These can be useful if you want to add a cold-calling or SMS step to your lead-generation process.

Data validation

Data validation is one of the most critical steps in any process that relies on data. Appropriate tools help ensure that your cold email or outbound campaign reaches its objectives while avoiding penalties for spam traps or high bounce rates.

Some of the most used tools for email validation are ZeroBounce, NeverBounce, Bouncer, etc.

Finally, when you have gathered phone numbers and would like to segment landlines from mobile phones, some tools can give you the nature of the number along with other data (like location).

Tools & Tips

  • An example automation scenario: someone uses your contact form, the data is pushed into your CRM or tools like Airtable / Google Sheets, Zapier detects the trigger (new row or new contact added into the CRM) and launches one of the tools, and the data is used to send an automated connection request on LinkedIn.
  • ZeroBounce has an option for scoring “catch-all” types of emails. More on that here.

Final words

Data quality (i.e., targeting and segmentation resulting from the marketing strategy) counts for one-third of a campaign’s success. We can’t stress enough the importance of the quality of data in order to generate qualified leads and scale the process successfully.

The other components responsible for the success of an outbound lead generation campaign include:

  • Infrastructure: to ensure the communication is delivered through the most suitable/scalable channel.
  • Content: to raise interest from the prospects/target audience and capture leads while satisfying the Content-Market-Fit.

In this blog post, we have been solely focusing on the data acquisition process for lead generation. If you are interested in learning how to build your infrastructure for sending a large volume of cold emails, please refer to the following blog post.

If you have any questions, please feel free to book a session with me or ask me for a free session on GrowthMentor.

Want to speak with Sardar?

Talk to Sardar about how to scale your lead generation campaigns