How to Do Keyword Research When You Don’t Speak the Language

There is a specific kind of pressure that comes with managing a website migration for a market where you don’t speak the native tongue. When you are dealing with thousands of SKUs and informational pages across multiple international domains, the margin for error is thin. You aren’t just moving pages; you are moving the organic visibility and brand authority that took years to build.

I recently navigated this during a large-scale project where we needed to launch several international sites simultaneously. We had a professional translation agency—which is the standard approach—but I quickly realized that a translator’s primary objective is linguistic accuracy, not SEO performance. To ensure the new sites didn't lose their footing, I had to establish a definitive list of terms we were already ranking for.

This wasn't a time for gap analysis or searching for new opportunities; that’s a luxury for a post-launch phase. This was a mission of digital preservation. To do this effectively, you have to look past the words and focus on the data patterns. Here is the framework for building a reliable keyword baseline when you can't read the language you’re researching.

Part 1: Establishing the Data Baseline

When you’re tasked with a migration, your first priority is understanding what Google already thinks your site is about in that specific region. While you might be tempted to run your English keyword list through Google Translate, Gemini, or Claude to get a starting point, that only gives you a theoretical list of what "should" be happening. It doesn't tell you what is actually happening on the ground.

The Power of Google Search Console (GSC)

Google Search Console is the only tool that provides the unvarnished truth of your current performance. While third-party tools like Semrush or Ahrefs are excellent for estimating market volume, they rely on their own database snapshots. For a high-stakes migration, you need your own specific site data.

I went directly into the GSC accounts for each individual country domain and language subfolder. By filtering for that specific region, I could see exactly which queries were driving impressions and clicks. This is the only way to see the "long tail" of how people actually interact with your content.

This is the essential first step because GSC shows you the queries that are already "working," even if they aren't grammatically perfect. These are the "hidden" keywords that your translation agency might never suggest because they aren't "proper" language. If your site is already ranking for a specific string of characters, that string represents a cluster of intent that must be accounted for in your new site structure.

Exporting the Raw Reality

When you export this data, don't just look at your top 10 or 20 terms. In a migration, the "boring" functional terms—the specific ways people ask about shipping, returns, or technical specifications—are just as vital as the high-volume head terms. These are the queries that keep your informational and customer support pages alive.

Export the maximum amount of query data available. You want the full list of what has triggered an impression over the last six to twelve months. This raw list serves as your "source of truth." Even if you can't read it yet, this spreadsheet contains the DNA of your international organic performance.
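As a rough illustration of treating the export as a source of truth, the sketch below loads a query export and sorts it by impressions. It assumes a CSV with the column headers "Top queries", "Clicks", and "Impressions"; check the headers in your own export, since they may differ. The function name load_queries is hypothetical.

```python
import csv


def load_queries(csv_lines):
    """Parse a query export into a list of dicts, highest impressions first.

    `csv_lines` is any iterable of text lines, such as an open file.
    The column names below are an assumption about the export format.
    """
    rows = []
    for record in csv.DictReader(csv_lines):
        rows.append(
            {
                "query": record["Top queries"],
                "clicks": int(record["Clicks"]),
                "impressions": int(record["Impressions"]),
            }
        )
    # Keep every row, including low-volume functional queries.
    rows.sort(key=lambda row: row["impressions"], reverse=True)
    return rows


# Example usage (the file name is an assumption):
# with open("Queries.csv", encoding="utf-8-sig", newline="") as handle:
#     all_queries = load_queries(handle)
```

Opening the file with an explicit encoding, rather than relying on the spreadsheet application's default, is one way to reduce the character damage described in the next part.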

Part 2: Analyzing Data Patterns (When You Can’t Read the Words)

Once you have your export, you’ll likely be staring at a wall of text that looks like a foreign language mixed with what appears to be broken code. This is where you have to stop thinking like a linguist and start thinking like a pattern recognizer.

Identifying Encoding and Character Errors

One of the most immediate hurdles in international research is character encoding. If you see "weird" characters in your lists, such as square root symbols (√), copyright symbols (©), or random periods in the middle of words, do not discard them as junk data. These are technical artifacts that provide a roadmap to your most important terms.

These symbols usually appear when special characters do not survive the export from GSC to your spreadsheet. Typical examples are the umlauts in German and Swedish, the æ, ø, and å of Danish, and the accent marks and ñ in Spanish. For example, the Spanish term for crepe paper might appear as papel crep√© instead of the proper papel crepé.

When you see these patterns, you have found a high-intent keyword that has been mangled by software. By identifying these clusters, you can map them back to the correct terms. If you see ten different variations of a word containing a square root symbol, you’ve identified a core term that requires protection during the move. Your job is to find the "proper" version of that mangled word and ensure it is the primary target for your migrated page.
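Mapping a mangled term back to its proper form can sometimes be automated. The sketch below assumes the damage came from UTF-8 text being opened with the wrong legacy encoding, which is how an é typically turns into a square root plus a copyright symbol (Mac Roman) or into "Ã©" (Windows-1252). The function name and the list of candidate encodings are assumptions, and it will not repair every kind of corruption.

```python
def repair_mojibake(text, wrong_encodings=("cp1252", "mac_roman")):
    """Try to undo UTF-8 text that was decoded with a legacy encoding.

    Returns the repaired string, or the original if no candidate applies.
    """
    for encoding in wrong_encodings:
        try:
            # Reverse the bad decode, then read the bytes as UTF-8.
            fixed = text.encode(encoding).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # This encoding cannot explain the damage.
        if fixed != text:
            return fixed
    return text


print(repair_mojibake("papel crep√©"))
print(repair_mojibake("papel crepÃ©"))
```

Both calls should print the clean term, and strings that are already clean pass through unchanged. Treat the output as a suggestion to confirm with a native speaker rather than a guaranteed fix.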

The Phonetic Search Reality

GSC exports exactly how people type, which includes misspellings and phonetic shortcuts. A common example is something like "colordorado" versus the proper "color dorado" (golden color).

Now, it is true that Google’s modern search algorithms are sophisticated enough to group these variations together. If you search for a common misspelling, Google will usually show the results for the correctly spelled "canonical" version of that term. Because of this, you don't need to—and shouldn't—put misspellings in your H1 headers or your primary copy.

However, seeing these phonetic clusters in your data is a critical diagnostic tool. It helps you verify that the "proper" term you are targeting is actually the one the audience is trying to reach. If you find a massive cluster of phonetic searches that don't seem to match your translated keyword list, it’s a red flag that your translation agency may have chosen a word that is technically correct but not used by real people in that market. You use the "messy" data to find the "correct" version that people actually use.
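One simple way to surface these clusters without reading the language is to group queries that become identical once case, accents, and spacing are ignored. The sketch below is a minimal version of that idea. The function names and sample numbers are hypothetical, and it only catches spacing and accent variants, not true misspellings.

```python
import unicodedata
from collections import defaultdict


def cluster_key(query):
    """Reduce a query to lowercase letters with no accents or spaces."""
    decomposed = unicodedata.normalize("NFKD", query.casefold())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(no_accents.split())


def cluster_queries(rows):
    """Group (query, impressions) pairs that share the same cluster key."""
    clusters = defaultdict(list)
    for query, impressions in rows:
        clusters[cluster_key(query)].append((query, impressions))
    return dict(clusters)


def likely_canonical(cluster):
    """Pick the variant with the most impressions as a candidate to verify."""
    return max(cluster, key=lambda item: item[1])[0]


sample_rows = [
    ("colordorado", 120),
    ("color dorado", 900),
    ("Color Dorado", 40),
    ("papel crepé", 300),
    ("papel crepe", 80),
]
sample_clusters = cluster_queries(sample_rows)
```

The highest-impression variant in a cluster is only a candidate for the "proper" term. It still needs to be checked against the translated keyword list.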

Brand Names and Borrowed Terms

Brand names tend to be "borrowed" terms. A product like a "Telecaster" or a "KitKat" usually remains consistent across languages. However, you must stay alert for brand equivalents. For various legal or trademark reasons, a product might have an entirely different name in another country.

In your data exports, look for these constants. If you see a term that looks like a brand name but isn't one you recognize from your English catalog, it might be the local trademarked equivalent. You need to flag these instances to ensure your SKU pages are correctly localized. If you miss a local brand equivalent during a migration, you aren't just missing a keyword; you're missing the entire identity of the product in that market.

Part 3: Validating Units and Local Context

Global content marketing often fails when it forgets that the world uses different systems of measurement. This is one of the easiest ways to spot a site that has been "translated" but not "localized."

The Measurement Friction Point

In your data exports, look specifically for units of measurement. In Spanish-speaking markets, you’ll see "pulgadas" instead of inches. In most of the world, you’ll see millimeters (mm), meters (m), or kilograms (kg).

You may also see queries like "que es un ft?" (what is a foot?) or "how much is a stone?" (a common British unit of weight). If these queries appear in your ranking data, it means your current pages are serving a need for users who are trying to bridge the gap between your provided specs and their local understanding.

When you migrate, you cannot lose these references. If your data shows people are searching for the metric equivalent of your imperial measurements, your new localized pages must handle that conversion. This isn't just an SEO win; it’s a user experience necessity.
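A quick scan of the export can flag which queries mention units at all. The sketch below is one possible approach. The unit list is a small, hypothetical starting point that would need extending for each market, and the names are assumptions.

```python
import re

# Hypothetical lookup of unit tokens; extend it for each market.
UNIT_SYSTEMS = {
    "pulgada": "imperial",
    "pulgadas": "imperial",
    "ft": "imperial",
    "stone": "imperial",
    "mm": "metric",
    "cm": "metric",
    "kg": "metric",
}

# Split a query into numbers and runs of letters, so "25mm" yields "25" and "mm".
TOKEN = re.compile(r"\d+(?:[.,]\d+)?|[^\W\d_]+")


def unit_systems(query):
    """Return the set of measurement systems mentioned in a query."""
    tokens = TOKEN.findall(query.casefold())
    return {UNIT_SYSTEMS[token] for token in tokens if token in UNIT_SYSTEMS}


def unit_queries(queries):
    """Map each query that mentions a known unit to its measurement systems."""
    flagged = {}
    for query in queries:
        systems = unit_systems(query)
        if systems:
            flagged[query] = systems
    return flagged
```

Pages whose flagged queries mention a system different from the one in the published specs are good candidates for an on-page conversion.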

Gendered Terms and Plurality

Many languages mark grammatical gender and number far more heavily than English does. In your data, you will often see what looks like the same word with slightly different suffixes.

In many languages, a singular version of a word might be used for a general search, while the plural version is used when someone is ready to browse a category or make a purchase. Similarly, masculine and feminine versions of a term can carry different intent depending on the product category.

Since you don't speak the language, you don't need to master the grammar. You simply need to ensure that the "canonical" version of the term you are using covers these variations. If GSC shows significant volume for both singular and plural versions, verify with your translation service that the localized copy naturally incorporates both.

Part 4: Building the Migration Map

The final step of this process is turning this mountain of "unreadable" data into a functional baseline for the migration. Since you are limited in how much gap analysis you can do alone, your goal is to use this data to keep the translation agency and the development team honest.

The Redirect Logic

Once you have your cleaned list of ranking terms, use them to inform your 301 redirect map. You aren't just moving /product-a/ to /es/producto-a/. You are ensuring that the page that used to rank for "colordorado" (the phonetic cluster) is moving to a page that is optimized for "color dorado" (the proper term).
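The redirect logic can be checked mechanically before launch. The sketch below pairs each old URL that has ranking queries with its new destination and lists any ranking page that has no redirect yet. The data shapes and the function name are hypothetical and are not tied to any particular CMS or server.

```python
def build_redirect_plan(ranking_queries, redirect_map):
    """Pair old URLs with new ones and report ranking pages left unmapped.

    ranking_queries: dict of old URL -> list of queries that page ranks for.
    redirect_map: dict of old URL -> new URL.
    Returns (plan, missing). `plan` holds (old_url, new_url, status, queries)
    tuples and `missing` lists old URLs that rank but have no destination.
    """
    plan, missing = [], []
    for old_url, queries in sorted(ranking_queries.items()):
        new_url = redirect_map.get(old_url)
        if new_url is None:
            missing.append(old_url)
        else:
            plan.append((old_url, new_url, 301, queries))
    return plan, missing


# Illustrative data only.
ranking = {
    "/product-a/": ["colordorado", "color dorado"],
    "/shipping/": ["envio gratis"],
}
redirects = {"/product-a/": "/es/producto-a/"}
plan, missing = build_redirect_plan(ranking, redirects)
```

Keeping the queries next to each redirect makes it easier to confirm that the destination page is optimized for the proper term behind each cluster.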

Creating a Validation Workflow

Instead of just accepting a batch of translated files, run a spot check. Take the top 50 terms you found in GSC for a specific subfolder and see if they (or their "proper" equivalents) appear in the new H1 or H2 headers provided by the agency.

If there is a mismatch, ask the question. You can point to the data and say, "The search data shows this term is our primary driver of impressions, but the new copy uses a different word. Can we confirm why?" Often, you’ll find the agency used a formal word when a more casual, high-volume term was needed.
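The spot check itself can be sketched in a few lines. The example below compares the top terms against a list of headers and ignores case and accents when matching. It assumes you have already extracted the H1 and H2 text from the agency's files, and the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def terms_missing_from_headers(top_terms, headers, limit=50):
    """Return the top terms that do not appear in any header."""
    normalized_headers = [normalize(header) for header in headers]
    return [
        term
        for term in top_terms[:limit]
        if not any(normalize(term) in header for header in normalized_headers)
    ]
```

Anything this returns is a question for the agency, not proof of an error. A missing term may be covered by a legitimate synonym or an inflected form.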

The Bottom Line

International SEO is less about being a linguist and more about being a steward of data. You are the bridge between the technical reality of how Google sees your site and the linguistic reality of how a translator sees the world.

By focusing on the raw data in Google Search Console, resolving encoding issues, and identifying the "real" terms behind phonetic search patterns, you can launch a migrated site with the confidence that you aren't leaving your hard-earned rankings behind. Stay close to the data, respect the local context of measurements, and never assume that a "perfect" translation is a "perfect" SEO strategy.
