Extract the domain name from a string

Have you ever needed to extract the domain name from a string using PostgreSQL? There are various methods to accomplish this task like using PL/Python and a URL parsing library such as urlparse, but that is a little more complex. Below is probably the simplest method and the regex is portable to other systems/languages.

SELECT replace(substring(column_name from '.*://([^/]*)'),'www.','') as domain_name 
FROM schema_name.table_name;

This SQL will simply return the domain name without the www. This assumes the string contains the full URL including the http:// or https://. This will leave any subdomains in place (other than www). Here is an example:

SELECT replace(substring('http://photos.google.com' from '.*://([^/]*)'),'www.','') as domain_name;
|    domain_name    |
| photos.google.com |
(1 row)

Another interesting method is using ts_debug. It’s PostgreSQL specific but works very well.

SELECT replace((x.ld).token,'www.','')
   SELECT ts_debug(column_name)
   FROM table_name
) x(ld)
WHERE (x.ld).alias = 'host';

Here we are basically parsing out the domain name from the string using a full-text search method then removing the www. if one is found. Below are both methods using my tables. This is just parsing the domain from a Google Places result (JSON).

SELECT place_id, replace((x.ld).token,'www.','')
   SELECT place_id, ts_debug(response->'result'->>'website')
   FROM results_api_google_places_detail
) x(place_id,ld)
WHERE (x.ld).alias = 'host';

, replace(substring(response->'result'->>'website' from '.*://([^/]*)'),'www.','') as domain_name 
FROM results_api_google_places_detail;

Matt McGuire

Proven technology leader with 25+ years of experience in the installation, administration, development, design, and maintenance of various enterprise data related systems.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.