Extract the domain name from a string
Have you ever needed to extract the domain name from a string using PostgreSQL? There are various methods to accomplish this task like using PL/Python and a URL parsing library such as urlparse, but that is a little more complex. Below is probably the simplest method and the regex is portable to other systems/languages.
SELECT replace(substring(column_name from '.*://([^/]*)'),'www.','') as domain_name
FROM schema_name.table_name;
This SQL will simply return the domain name without the www. This assumes the string contains the full URL including the http:// or https://. This will leave any subdomains in place (other than www). Here is an example:
SELECT replace(substring('http://photos.google.com' from '.*://([^/]*)'),'www.','') as domain_name;
+-------------------+
| domain_name |
+-------------------+
| photos.google.com |
+-------------------+
(1 row)
Another interesting method is using ts_debug. It’s PostgreSQL specific but works very well.
SELECT replace((x.ld).token,'www.','')
FROM (
SELECT ts_debug(column_name)
FROM table_name
) x(ld)
WHERE (x.ld).alias = 'host';
Here we are basically parsing out the domain name from the string using a full-text search method then removing the www. if one is found. Below are both methods using my tables. This is just parsing the domain from a Google Places result (JSON).
SELECT place_id, replace((x.ld).token,'www.','')
FROM (
SELECT place_id, ts_debug(response->'result'->>'website')
FROM results_api_google_places_detail
) x(place_id,ld)
WHERE (x.ld).alias = 'host';
SELECT
place_id
, replace(substring(response->'result'->>'website' from '.*://([^/]*)'),'www.','') as domain_name
FROM results_api_google_places_detail;