`The Query Class`

class dwopt._qry._Qry(operator, from_=None, select=None, join=None, where=None, group_by=None, having=None, order_by=None, sql=None)

Generic query class. There are 2 main usages:

Make sql query.
Make and run summary query on top of the generated sql query. In particular, the sql query is placed into a sub query clause , and the summary query operates on the intermediate result that would be arrived by the query.

These usages work in the conext of a common program pattern where the summary query has preprocessing steps such as a case statement or a where clause. The intermediate tables are often not usefull on it’s own, thus avoiding explicitly materilising it gives performance gain. A example as below:

with x as (
    select
        a.*
        ,case when amt < 1000 then amt*1.2 else amt end as amt
    from test a
    where score > 0.5
)
select
    time,cat
    ,count(1) n
    ,avg(score) avgscore, round(sum(amt)/1e3,2) total
from x
group by time,cat
order by n desc

Query classes should not be instantiated directly by user , the appropriate query object should be returned by the appropriate database operator object’s qry method. Query classes:

dwopt._qry.PgQry: Postgre query class.
dwopt._qry.LtQry: Sqlite query class.
dwopt._qry.OcQry: Oracle query class.

Parameters

operator (dwopt.db._Db) – Database operator object to operate on generated queries.
from (str) – Query table name str format.
select (str) – Columns in str format.
join (str) – Join clause in str format.
where (str) – Conditions in str format.
group_by (str) – Group by names in str format.
having (str) – Conditions in str format.
order_by (str) – Order by names in str format.
sql (str) – Sql code in str format.

Notes

Alternative to initializing the query object by all desired clauses , various convenience methods are given to augment the query. Use the methods.

Examples

Example of multiple join statements, and the underlying sql.

(
    lt.qry('test x')
    .select('x.id','y.id as yid','x.score','z.score as zscore')
    .join("test y","x.id = y.id+1","x.id <= y.id+1")
    .join("test z","x.id = z.id+2","x.id >= z.id+1")
    .where('x.id < 10','z.id < 10')
    .head()
)

with x as (
    select x.id,y.id as yid,x.score,z.score as zscore
    from test x
    left join test y
        on x.id = y.id+1
        and x.id <= y.id+1
    left join test z
        on x.id = z.id+2
        and x.id >= z.id+1
    where x.id < 10
        and z.id < 10
)
select * from x limit 5

Example of group by and related clauses, and the underlying sql.

(
    lt.qry('test x')
    .select('x.cat,y.cat as bcat'
        ,'sum(x.score) bscore','sum(y.score) yscore','count(1) n')
    .join("test y","x.id = y.id+1")
    .where('x.id < 1000')
    .group_by('x.cat,y.cat')
    .having('count(1) > 50','sum(y.score) > 100')
    .order_by('x.cat desc','sum(y.score) desc')
    .run()
)

select x.cat,y.cat as bcat,sum(x.score) bscore,sum(y.score) yscore,count(1) n
from test x
left join test y
    on x.id = y.id+1
where x.id < 1000
group by x.cat,y.cat
having count(1) > 50
    and sum(y.score) > 100
order by x.cat desc,sum(y.score) desc

case(col, *args, cond=None, els='NULL')

Add a case when statement to select clause in query. Calling this method multiple times would add multiple statements.

Parameters

col (str) – Column name of the resulting column.
*args (str) – Positional argument in form ‘condition then treatement’.
cond (dict) – Dictionary of condition str to treatment str mappings.
els (str) – Treatment str for else clause, default NULL.

Returns

New query object with clause added.

Return type

dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry("test").case('col',"x>5 then 'A'").print()
    select *
        ,case when x>5 then 'A' else NULL end as col
    from test
>>> lt.qry("test").case('col',cond = {'x>5':'A','x<2':'B'}).print()
    select *
        ,case
            when x>5 then A
            when x<2 then B
            else NULL
        end as col
    from test
>>> lt.qry("test").select('a','b') \
... .case('col'
...     ,'x<2 then B'
...     ,cond = {'x>5':'A'}
...     ,els = 'C').print()
    select a,b
        ,case
            when x<2 then B
            when x>5 then A
            else C
        end as col
    from test

cols()

Fetch column names of the sub query table.

Return type: Column names as list of str

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").cols()
    ['col1', 'col2']

dist(*args)

Count number of distinct occurances of data within specified columns , or combination of columns, of the sub query table.

Parameters: *args (str or [str]) – Either column names as str, or iterator of column name str.
Return type: pandas.Series

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").where("col1 < 5").dist('col1','col2',['col1','col2'])
    count(distinct col1)                   5
    count(distinct col2)                   5
    count(distinct col1 || '_' || col2)    5
    Name: 0, dtype: int64

from_(from_)

Add the from clause to query. Alternative to simply specifying table name as the only argument of the qry method. Use the qry method.

Parameters: from (str) – Table name str.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry().from_("test").print()
    select * from test

group_by(*args)

Add the group by clause to query.

Parameters: *args (str or [str]) – Group by columns in str format, or iterator of them.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test').select('a,count(1) n').group_by('a').print()
    select a,count(1) n
    from test
    group by a
>>> lt.qry('test').select('a,b,count(1) n').group_by(['a','b']).print()
    select a,b,count(1) n
    from test
    group by a,b

hash(*args)

Calculate a simple configuration of oracle hash to arrive at a indicative has value for a number of columns or all columns of a sub query table. Hash value is a number or symbol that is calculated from data , and is sensitive to any small changes in data. It serves as method to detect if any data element in data is changed.

Parameters: *args (str) – Column names in str. If no value is given, a cols method will be performed to fetch the list of all columns, from which a hash will be calculated.
Return type: int

Examples

>>> import pandas as pd
>>> from dwopt import oc
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> oc.drop('test')
>>> oc.create('test',{'col1':'int','col2':'int'})
>>> oc.write(tbl,'test')
>>> oc.qry("test").where("col1 < 5").hash()

having(*args)

Add the having clause to query.

Parameters: *args (str or [str]) – Conditions in str format, or iterator of them.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test').select('a,count(1) n').group_by('a') \
... .having("count(1)>5").print()
    select a,count(1) n
    from test
    group by a
    having count(1)>5

head()

Fetch top 5 rows of the sub query table.

Return type: pandas.DataFrame

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").where("col1 < 5").head()
        col1  col2
    0     0    10
    1     1    11
    2     2    12
    3     3    13
    4     4    14

join(tbl, *args, how='left')

Add a join clause to query. Calling this method multiple times would add multiple clauses.

Parameters

tbl (str) – Table name to join to in str format.
*args (str) – Joining conditions in str format.
how (str) – The join keyword in str format, for example: inner, cross. Default left.

Returns

New query object with clause added.

Return type

dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test x') \
... .select('x.id','y.id as yid','x.score','z.score as zscore') \
... .join("test y","x.id = y.id+1","x.id <= y.id+1") \
... .join("test z","x.id = z.id+2","x.id >= z.id+1") \
... .print()
    select x.id,y.id as yid,x.score,z.score as zscore
    from test x
    left join test y
        on x.id = y.id+1
        and x.id <= y.id+1
    left join test z
        on x.id = z.id+2
        and x.id >= z.id+1

len()

Length of the sub query table.

Return type: int

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").len()
    10

mimx(col)

Summarise on max and min values of a column for a sub query table.

Parameters: col (str) – Column name as str.
Return type: pandas.Series

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").where("col1 < 5").mimx('col1')
    max(col1)    4
    min(col1)    0
    Name: 0, dtype: int64

order_by(*args)

Add the order by clause to query.

Parameters: *args (str or [str]) – Order by names in str format, or iterator of them.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test').select('a,count(1) n').group_by('a') \
... .having("count(1)>5").order_by('a','n desc').print()
    select a,count(1) n
    from test
    group by a
    having count(1)>5
    order by a,n desc

print()

Print the underlying query.

Examples

>>> from dwopt import lt
>>> lt.qry().print()
    select * from test

run(sql=None, *args, **kwargs)

Run the underlying query directly, without using it to make summary queries.

Parameters

sql –
*args – Positional arguments to pass on to database operator’s run method.
**kwargs – Keyword arguments to pass on to database operator’s run method.

Returns

Returned by the database operator’s run method.

Return type

pandas.DataFrame or None

Examples

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").where("col2 = 'b'").run()
    col1  col2
0     2     b

select(*args, sep=',')

Add the select clause to query.

Parameters

*args (str or [str]) – Column name str as positional arguments , or an iterator of str column names.
sep (str) – Symbol used for seperating column names, default ,.

Returns

New query object with clause added.

Return type

dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test').select("id,score,amt").print()
    select id,score,amt
    from test
>>> lt.qry('test').select(["id","score","amt"]).print()
    select id,score,amt
    from test
>>> lt.qry('test').select("id","score","amt").print()
    select id,score,amt
    from test

sql(sql)

Replace entire query by specified sql. This allows arbituary advanced sql to be incorporated into framework.

Parameters: sql (str) – Sql code in str format.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry().sql("select * from test \nconnect by level <= 5").print()
    select * from test
    connect by level <= 5

top()

Fetch top row of the sub query table.

Return type: pandas.Series

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").where("col1 < 5").head()
    col1     0
    col2    10
    Name: 0, dtype: int64

valc(group_by, agg=None, order_by=None, n=True)

Value count of a column or combination of columns. A value count is a group by query, with total number of row of each group calculated. Also allow custom summary calculation, and custom order by clauses to be added.

Parameters

group_by (str) – Group by clause as str.
agg (str) – Custom aggeregation clause as str.
order_by (str) – Order by clause as str.
n (Bool) – Should the value count column be automatically created or not. Default to be True.

Return type

pandas.DataFrame

Examples

>>> import pandas as pd
>>> from dwopt import lt
>>>
>>> tbl = pd.DataFrame({'col1': range(10), 'col2': range(10,20)})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')
>>> lt.qry("test").case("cat","col1 > 3 then 'A'",els = "'B'") \
... .where("col1 < 5").valc('cat',"sum(col2) col2")
       cat  n  col2
    0   B   4    46
    1   A   1    14

where(*args)

Add the where clause to query.

Parameters: *args (str or [str]) – Conditions in str format, or iterator of condition str.
Returns: New query object with clause added.
Return type: dwopt._qry._Qry

Examples

>>> from dwopt import lt
>>> lt.qry('test').where('x>5','x<10').print()
    select * from test
    where x>5
        and x<10
>>> lt.qry('test').where(['x>5','x<10','y <> 5']).print()
    select * from test
    where x>5
        and x<10
        and y <> 5

The Query Class

`The Query Class`