The Query Class
- class dwopt._qry._Qry(operator, from_=None, select=None, join=None, where=None, group_by=None, having=None, order_by=None, sql=None)
Generic query class. There are 2 main usages:
Make sql query.
Make and run summary query on top of the generated sql query. In particular, the sql query is placed into a sub query clause , and the summary query operates on the intermediate result that would be arrived by the query.
These usages work in the conext of a common program pattern where the summary query has preprocessing steps such as a case statement or a where clause. The intermediate tables are often not usefull on it’s own, thus avoiding explicitly materilising it gives performance gain. A example as below:
with x as ( select a.* ,case when amt < 1000 then amt*1.2 else amt end as amt from test a where score > 0.5 ) select time,cat ,count(1) n ,avg(score) avgscore, round(sum(amt)/1e3,2) total from x group by time,cat order by n desc
Query classes should not be instantiated directly by user , the appropriate query object should be returned by the appropriate database operator object’s qry method. Query classes:
dwopt._qry.PgQry: Postgre query class.
dwopt._qry.LtQry: Sqlite query class.
dwopt._qry.OcQry: Oracle query class.
- Parameters
operator (dwopt.db._Db) – Database operator object to operate on generated queries.
from (str) – Query table name str format.
select (str) – Columns in str format.
join (str) – Join clause in str format.
where (str) – Conditions in str format.
group_by (str) – Group by names in str format.
having (str) – Conditions in str format.
order_by (str) – Order by names in str format.
sql (str) – Sql code in str format.
Notes
Alternative to initializing the query object by all desired clauses , various convenience methods are given to augment the query. Use the methods.
Examples
Example of multiple join statements, and the underlying sql.
( lt.qry('test x') .select('x.id','y.id as yid','x.score','z.score as zscore') .join("test y","x.id = y.id+1","x.id <= y.id+1") .join("test z","x.id = z.id+2","x.id >= z.id+1") .where('x.id < 10','z.id < 10') .head() )
with x as ( select x.id,y.id as yid,x.score,z.score as zscore from test x left join test y on x.id = y.id+1 and x.id <= y.id+1 left join test z on x.id = z.id+2 and x.id >= z.id+1 where x.id < 10 and z.id < 10 ) select * from x limit 5
Example of group by and related clauses, and the underlying sql.
( lt.qry('test x') .select('x.cat,y.cat as bcat' ,'sum(x.score) bscore','sum(y.score) yscore','count(1) n') .join("test y","x.id = y.id+1") .where('x.id < 1000') .group_by('x.cat,y.cat') .having('count(1) > 50','sum(y.score) > 100') .order_by('x.cat desc','sum(y.score) desc') .run() )
select x.cat,y.cat as bcat,sum(x.score) bscore,sum(y.score) yscore,count(1) n from test x left join test y on x.id = y.id+1 where x.id < 1000 group by x.cat,y.cat having count(1) > 50 and sum(y.score) > 100 order by x.cat desc,sum(y.score) desc
- case(col, *args, cond=None, els='NULL')
Add a case when statement to select clause in query. Calling this method multiple times would add multiple statements.
- Parameters
col (str) – Column name of the resulting column.
*args (str) – Positional argument in form ‘condition then treatement’.
cond (dict) – Dictionary of condition str to treatment str mappings.
els (str) – Treatment str for else clause, default
NULL
.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry("test").case('col',"x>5 then 'A'").print() select * ,case when x>5 then 'A' else NULL end as col from test >>> lt.qry("test").case('col',cond = {'x>5':'A','x<2':'B'}).print() select * ,case when x>5 then A when x<2 then B else NULL end as col from test >>> lt.qry("test").select('a','b') \ ... .case('col' ... ,'x<2 then B' ... ,cond = {'x>5':'A'} ... ,els = 'C').print() select a,b ,case when x<2 then B when x>5 then A else C end as col from test
- cols()
Fetch column names of the sub query table.
- Return type
Column names as list of str
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").cols() ['col1', 'col2']
- dist(*args)
Count number of distinct occurances of data within specified columns , or combination of columns, of the sub query table.
- Parameters
*args (str or [str]) – Either column names as str, or iterator of column name str.
- Return type
pandas.Series
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").where("col1 < 5").dist('col1','col2',['col1','col2']) count(distinct col1) 5 count(distinct col2) 5 count(distinct col1 || '_' || col2) 5 Name: 0, dtype: int64
- from_(from_)
Add the from clause to query. Alternative to simply specifying table name as the only argument of the qry method. Use the qry method.
- Parameters
from (str) – Table name str.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry().from_("test").print() select * from test
- group_by(*args)
Add the group by clause to query.
- Parameters
*args (str or [str]) – Group by columns in str format, or iterator of them.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test').select('a,count(1) n').group_by('a').print() select a,count(1) n from test group by a >>> lt.qry('test').select('a,b,count(1) n').group_by(['a','b']).print() select a,b,count(1) n from test group by a,b
- hash(*args)
Calculate a simple configuration of oracle hash to arrive at a indicative has value for a number of columns or all columns of a sub query table. Hash value is a number or symbol that is calculated from data , and is sensitive to any small changes in data. It serves as method to detect if any data element in data is changed.
- Parameters
*args (str) – Column names in str. If no value is given, a cols method will be performed to fetch the list of all columns, from which a hash will be calculated.
- Return type
int
Examples
>>> import pandas as pd >>> from dwopt import oc >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> oc.drop('test') >>> oc.create('test',{'col1':'int','col2':'int'}) >>> oc.write(tbl,'test') >>> oc.qry("test").where("col1 < 5").hash()
- having(*args)
Add the having clause to query.
- Parameters
*args (str or [str]) – Conditions in str format, or iterator of them.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test').select('a,count(1) n').group_by('a') \ ... .having("count(1)>5").print() select a,count(1) n from test group by a having count(1)>5
- head()
Fetch top 5 rows of the sub query table.
- Return type
pandas.DataFrame
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").where("col1 < 5").head() col1 col2 0 0 10 1 1 11 2 2 12 3 3 13 4 4 14
- join(tbl, *args, how='left')
Add a join clause to query. Calling this method multiple times would add multiple clauses.
- Parameters
tbl (str) – Table name to join to in str format.
*args (str) – Joining conditions in str format.
how (str) – The join keyword in str format, for example:
inner
,cross
. Defaultleft
.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test x') \ ... .select('x.id','y.id as yid','x.score','z.score as zscore') \ ... .join("test y","x.id = y.id+1","x.id <= y.id+1") \ ... .join("test z","x.id = z.id+2","x.id >= z.id+1") \ ... .print() select x.id,y.id as yid,x.score,z.score as zscore from test x left join test y on x.id = y.id+1 and x.id <= y.id+1 left join test z on x.id = z.id+2 and x.id >= z.id+1
- len()
Length of the sub query table.
- Return type
int
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").len() 10
- mimx(col)
Summarise on max and min values of a column for a sub query table.
- Parameters
col (str) – Column name as str.
- Return type
pandas.Series
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").where("col1 < 5").mimx('col1') max(col1) 4 min(col1) 0 Name: 0, dtype: int64
- order_by(*args)
Add the order by clause to query.
- Parameters
*args (str or [str]) – Order by names in str format, or iterator of them.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test').select('a,count(1) n').group_by('a') \ ... .having("count(1)>5").order_by('a','n desc').print() select a,count(1) n from test group by a having count(1)>5 order by a,n desc
- print()
Print the underlying query.
Examples
>>> from dwopt import lt >>> lt.qry().print() select * from test
- run(sql=None, *args, **kwargs)
Run the underlying query directly, without using it to make summary queries.
- Parameters
sql –
*args – Positional arguments to pass on to database operator’s run method.
**kwargs – Keyword arguments to pass on to database operator’s run method.
- Returns
Returned by the database operator’s run method.
- Return type
pandas.DataFrame or None
Examples
>>> import pandas as pd >>> from dw import lt >>> >>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'text'}) >>> lt.write(tbl,'test') >>> lt.qry("test").where("col2 = 'b'").run() col1 col2 0 2 b
- select(*args, sep=',')
Add the select clause to query.
- Parameters
*args (str or [str]) – Column name str as positional arguments , or an iterator of str column names.
sep (str) – Symbol used for seperating column names, default
,
.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test').select("id,score,amt").print() select id,score,amt from test >>> lt.qry('test').select(["id","score","amt"]).print() select id,score,amt from test >>> lt.qry('test').select("id","score","amt").print() select id,score,amt from test
- sql(sql)
Replace entire query by specified sql. This allows arbituary advanced sql to be incorporated into framework.
- Parameters
sql (str) – Sql code in str format.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry().sql("select * from test \nconnect by level <= 5").print() select * from test connect by level <= 5
- top()
Fetch top row of the sub query table.
- Return type
pandas.Series
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").where("col1 < 5").head() col1 0 col2 10 Name: 0, dtype: int64
- valc(group_by, agg=None, order_by=None, n=True)
Value count of a column or combination of columns. A value count is a group by query, with total number of row of each group calculated. Also allow custom summary calculation, and custom order by clauses to be added.
- Parameters
group_by (str) – Group by clause as str.
agg (str) – Custom aggeregation clause as str.
order_by (str) – Order by clause as str.
n (Bool) – Should the value count column be automatically created or not. Default to be True.
- Return type
pandas.DataFrame
Examples
>>> import pandas as pd >>> from dwopt import lt >>> >>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test') >>> lt.qry("test").case("cat","col1 > 3 then 'A'",els = "'B'") \ ... .where("col1 < 5").valc('cat',"sum(col2) col2") cat n col2 0 B 4 46 1 A 1 14
- where(*args)
Add the where clause to query.
- Parameters
*args (str or [str]) – Conditions in str format, or iterator of condition str.
- Returns
New query object with clause added.
- Return type
Examples
>>> from dwopt import lt >>> lt.qry('test').where('x>5','x<10').print() select * from test where x>5 and x<10 >>> lt.qry('test').where(['x>5','x<10','y <> 5']).print() select * from test where x>5 and x<10 and y <> 5