Have you ever needed to extract the domain name from a string using PostgreSQL? There are various methods to accomplish this task like using PL/Python and a URL parsing library such as urlparse, but that is a little more complex. Below is probably the simplest method and the regex is portable to other systems/languages.
SELECT replace(substring(column_name from '.*://([^/]*)'),'www.','') as domain_name FROM schema_name.table_name;
This SQL will simply return the domain name without the www. This assumes the string contains the full URL including the http:// or https://. This will leave any subdomains in place (other than www). Here is an example:
SELECT replace(substring('http://photos.google.com' from '.*://([^/]*)'),'www.','') as domain_name; +-------------------+ | domain_name | +-------------------+ | photos.google.com | +-------------------+ (1 row)
Another interesting method is using ts_debug. It’s PostgreSQL specific but works very well.
SELECT replace((x.ld).token,'www.','') FROM ( SELECT ts_debug(column_name) FROM table_name ) x(ld) WHERE (x.ld).alias = 'host';
Here we are basically parsing out the domain name from the string using a full-text search method then removing the www. if one is found. Below are both methods using my tables. This is just parsing the domain from a Google Places result (JSON).
SELECT place_id, replace((x.ld).token,'www.','') FROM ( SELECT place_id, ts_debug(response->'result'->>'website') FROM results_api_google_places_detail ) x(place_id,ld) WHERE (x.ld).alias = 'host'; SELECT place_id , replace(substring(response->'result'->>'website' from '.*://([^/]*)'),'www.','') as domain_name FROM results_api_google_places_detail;