Extract the domain name from a string

Have you ever needed to extract the domain name from a string using PostgreSQL? There are various methods to accomplish this task like using PL/Python and a URL parsing library such as urlparse, but that is a little more complex. Below is probably the simplest method and the regex is portable to other systems/languages.

SELECT replace(substring(column_name from '.*://([^/]*)'),'www.','') as domain_name 
FROM schema_name.table_name;

This SQL will simply return the domain name without the www. This assumes the string contains the full URL including the http:// or https://. This will leave any subdomains in place (other than www). Here is an example:

SELECT replace(substring('http://photos.google.com' from '.*://([^/]*)'),'www.','') as domain_name;
+-------------------+
|    domain_name    |
+-------------------+
| photos.google.com |
+-------------------+
(1 row)

Another interesting method is using ts_debug. It's PostgreSQL specific but works very well.

SELECT replace((x.ld).token,'www.','')
FROM (
   SELECT ts_debug(column_name)
   FROM table_name
) x(ld)
WHERE (x.ld).alias = 'host';

Here we are basically parsing out the domain name from the string using a full-text search method then removing the www. if one is found. Below are both methods using my tables. This is just parsing the domain from a Google Places result (JSON).

SELECT place_id, replace((x.ld).token,'www.','')
FROM (
   SELECT place_id, ts_debug(response->'result'->>'website')
   FROM results_api_google_places_detail
) x(place_id,ld)
WHERE (x.ld).alias = 'host';

SELECT 
  place_id
, replace(substring(response->'result'->>'website' from '.*://([^/]*)'),'www.','') as domain_name 
FROM results_api_google_places_detail;