Jul 31 2008, 01:47 PM
Hi, I need some advice in handling cartesian join.
1. Table Unit_CT has IP address converted into number (ex: 192.168.0.1 --> ip_integer value = 1291200..something like this)
We have to take this number and go to another table ip_cntry_map and get the corresponding country for the ip_integer.
below is the structure of the query (cartesian join)
SELECT A.VENDOR, A.POP, A.FIRST_DATE,B.COUNTRY_CD
FROM UNIT_CT A, IP_CNTRY_MAP B
WHERE A.IP_INTEGER BETWEEN B.IP_START_RANGE AND B.IP_END_RANGE;
Below is the data for IP_CNTRY_MAP table
IP_START_RANGE IP_END_RANGE COUNTRY_CD COUNTRY_NAME
1046512928 1046512943 DE Germany
1046512944 1046512967 GB United Kingdom
1046512968 1046512983 DE Germany
1046512984 1046512991 GB United Kingdom
1046512992 1046512999 DE Germany
1046513000 1046513007 GB United Kingdom
1046513008 1046513015 DE Germany
1046513016 1046513031 GB United Kingdom
1046513032 1046513039 DE Germany
1046513040 1046513047 GB United Kingdom
1046513048 1046513055 DE Germany
1046513056 1046513063 GB United Kingdom
The Unit_CT has about 120million rows and the ip_CNTRY_MAP has about 95000 rows.
Appreciate if you can help me in querying for a fast approach.
Jul 31 2008, 01:56 PM
If you know it is a Cartesian join, then why wouldn't you know how to avoid it in the first place? I'm lost here, help me.
Are you looking for advice?
"Do not write queries which result in Cartesian joins."
Jul 31 2008, 01:59 PM
I've no other choice. I know it is cartesian. I'm asking advice if i can avoid cartesian in this case? or if dealt with cartesian itself, what would the fast approach - like using hints or temp tables or any other better way ?
Jul 31 2008, 02:37 PM
Country map table cannot be one to one? List the IP address and its country instead of creating a range. The range version of it, if being used for a lookup, is very poor design.
Jul 31 2008, 02:40 PM
Thx Steve..will have to write a procedure to list down all the ip_integer values between the start range and the end range in a different table. thats what you meant? thats one of my work around i'm working on.
Any other suggestions you can think of?
Thanks and much appreciate.
Jul 31 2008, 03:18 PM
No join, use a subquery to get the country, based on IP >= start_range AND IP <=end_range, or create a function that returns the country.
Jul 31 2008, 05:11 PM
thx Steve. I tried doing the table method. but the difference in ip_start_range and Ip_end_range is huge for some countries. right now the ip table is just 95000. but when i explode it, it might grow huge...massive..may be close to a billion.
IP_START_RANGE IP_END_RANGE COUNTRY_CD COUNTRY_NAME
251658240 289011535 US United States
289011544 323243895 US United States
469762048 540679167 US United States
543844608 603979775 US United States
738197504 771751935 US United States
905969664 956301311 US United States
1610612736 1660944383 US United States
3590324224 3623890943 US United States
Jul 31 2008, 05:29 PM
How many range buckets are there? Put them into range-based partitions (as in partitioned table). And what is to keep a range or an IP associated with a particular country? What is the granularity of the range? If you are doing something like 1-7, then 8-14, then 15-21, it will be a nightmare. Subnet the addresses into larger chunks.
Jul 31 2008, 05:47 PM
Well I cant do that. The reason being, that the range is not consistent..
ex: range 1-7, 10-20, 21-28, 34-50, 70-90..
there are some numbers missing from the previous bucket.
Jul 31 2008, 05:51 PM
So when 8 comes along, what do you do?
Jul 31 2008, 05:54 PM
8 is bucketed to unknown category. This temp table in turn will be joined to another table. There country for 8 will be null..and eventually in the report, it will be shown as *Unknown Country*
Jul 31 2008, 06:06 PM
A lookup function will be your best bet then. You can use that directly in the select statement. Would take about five minutes to code up.
create or replace function get_country (ip in number)
return varchar2 as
v_country varchar2(20) := 'Unknown';
select country into v_country
where ip >= start_range
and ip <= end_range;
when no_data_found then
--when too_many_rows then
--can this happen?
Jul 31 2008, 06:17 PM
thx chris. I've a lookup which is currently being used in informatica. this takes hell lot of time as this is a row by row operation.
I'm working out something with hints directly in the first post i sent. (using some nested loop and so) ..currently taking about 1hr20mins. will research around for some better solutions.
Thx a lot for you are time and effort in addressing. much appreciate this.
Jul 31 2008, 06:29 PM
It is not a row by row lookup. It is a record by record lookup, whatever you feed the function. start_range and end_range are both indexed. There is only one country value in that range, right? And only one condition or row that will match.
create table ip_country
insert into ip_country values (1,6,'US');
insert into ip_country values (7,15,'GE');
SQL> create or replace function get_country (ip in number)
2 return varchar2 as
3 v_country varchar2(20) := 'Unknown';
5 select country into v_country
6 from ip_country
7 where ip >= start_range
8 and ip <= end_range;
9 return v_country;
11 when no_data_found then
12 return v_country;
13 --when too_many_rows then
14 --can this happen?
SQL> set serveroutput on
SQL> select get_country(2) from dual;
SQL> select get_country(16) from dual;
Jul 31 2008, 06:49 PM
Yes. Only one country for a given range.
I'm trying this out. I will put everything in place and will get the stats out.
Will benchmark with my existing process and will let you know.
Looks like this definitely should give some process improvement.
Thanks for your help!
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here