`The Database Operator Class`

class dwopt.db._Db(eng)

Generic database operator class. There are 3 main usages:

Run sql statment.
Run DDL/DML via the convenience methods.
Create query object, which allows running summary query.

This base class should not be instantiated directly by user , it’s child classes relevant to varies databases should be instantiated and used instead. Child classes:

dwopt.db.Pg: Postgre database operator class.
dwopt.db.Lt: Sqlite database operator class.
dwopt.db.Oc: Oracle database operator class.

The operator objects:

dwopt.pg: postgre.
dwopt.lt: sqlite.
dwopt.oc: Oracle.

The operator constructors:

dwopt.Pg(eng): Postgre.
dwopt.Lt(eng): Sqlite.
dwopt.Oc(eng): Oracle.

Parameters: eng (sqlalchemy engine) – Database connection engine to be used.

eng

Database connection engine that is used.

Type: sqlalchemy engine

add_pkey(tbl_nme, pkey)

Make and run an add primary key statement. Example sql code:

alter table tbl_nme add primary key (col1, col2, ...)

Parameters

tbl_nme (str) – Name of the database table to operate on.
tbl_nme – columns names in form col1, col2, ....

Examples

Make example table on sqlite, then add primary key constraint.

>>> from dw import lt
>>>
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.add_pkey('col1,col2')

create(tbl_nme, dtypes=None, **kwargs)

Make and run a create table statment. Example sql code:

create table tbl_nme(
    ,col1 dtype1
    ,col2 dtype2
    ...
)

Parameters

tbl_nme (str) – Name of the table to create.
dtypes (dict, optional) – Dictionary of column names to data types mappings.
**kwargs – Convenient way to add mappings. Keyword to argument mappings will be added to the dtypes dictionary. The keyword cannot be one of the positional parameter names.

Notes

Datatypes

Datatypes varies across databses, common example below:

Type	Sqlite	Postgre	Oracle
integer	integer	bigint	number
float	real	float8	float
string	text	varchar(20)	varchar2(20)
datetime	text	timestamp	timestamp
date	text	date	date

Details:

Other statements

The dtypes mappings also allow other sql statements which are part of a create statement to be added. For example a primary key constraint.

Details:

Examples

Make example table on sqlite.

>>> from dw import lt
...
>>> lt.drop('test')
>>> lt.create('test'
...     ,{
...         'id':'integer'
...         ,'score':'real'
...         ,'amt':'integer'
...         ,'cat':'text'
...         ,'time':'text'
...         ,'constraint df_pk':
...             'primary key (id)'
...     })

drop(tbl_nme)

Make and run a drop table statement. Does not throw error if table not exist. Example sql code:

drop tbl_nme

Oracle sql code:

drop tbl_nme purge

Parameters: tbl_nme (str) – Name of the database table to drop.
Returns: tbl_nme not exist if table not exist , tbl_nme dropped otherwise.
Return type: str

Notes

Error catching

Does not catch specifically the error related to table not exist in this version.

Examples

Make example table on sqlite, then drop it.

>>> from dw import lt
>>>
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.drop('test')

list_cons()

List all constraints.

Return type: pandas.DataFrame

Notes

Postgre sql used:

select * from information_schema.constraint_table_usage

Details:

https://www.postgresql.org/docs/current/infoschema-constraint-table-usage.html

list_tables(owner)

List all tables on database or specified schema.

Parameters: owner (str) – Only applicable for oracle. Name of the schema.
Return type: pandas.DataFrame

Notes

Postgre sql used:

select
    table_catalog,table_schema,table_name
    ,is_insertable_into,commit_action
from information_schema.tables
where table_schema
not in ('information_schema','pg_catalog')

Sqlite sql used:

select * from sqlite_schema
where type ='table'
and name NOT LIKE 'sqlite_%'

Oracle sql used:

select/*+PARALLEL (4)*/ owner,table_name
    ,max(column_name),min(column_name)
from all_tab_columns
where owner = ':owner'
group by owner,table_name

Details:

https://www.postgresql.org/docs/current/infoschema-tables.html https://www.sqlite.org/schematab.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html

qry(*args, **kwargs)

Make query object. Different database operator object method returns different query object.

Parameters

*args – Positional arguments of the query object.
**kwargs – keyword arguments of the query object.

Return type

dwopt._qry._Qry

Examples

Make query object from sqlite database operator object.

>>> from dwopt import lt
>>> lt.qry("test").where("x>5").print()
    select * from test
    where x>5

run(sql=None, args=None, pth=None, mods=None, **kwargs)

Run sql statement.

Support argument passing, text replacement and reading statements from sql script.

Parameters

sql (str, optional) – The sql statement to run. Only 1 statement is allowed.
args (dict, optional) – Dictionary of argument name str to argument str mappings. These arguments are passed to the database, to function as data for the argument names. See the notes and the examples section for details.
pth (str, optional) – Path to sql script, ignored if the sql parameter is not None. The script can hold one and only one sql statement, typically a significant piece of table creation statement.
mods (dict, optional) – Dictionary of modification name str to modification str mappings. Replaces modification name in the sql by the respective modification str. See the notes and the examples section for details.
**kwargs – Convenient way to add modification mappings. Keyword to argument mappings will be added to the mods dictionary. The keyword cannot be one of the positional parameter names.

Returns

Returns dataframe if the database returns any result. Returns dataframe with column names and zero rows if running query that returns zero rows. Returns None otherwise, typically when running DDL/DML statement.

Return type

pandas.DataFrame or None

Notes

The args and the mods parameter

An argument name is denoted in the sql by prepending a colon symbol : before a str.

Similiarly, a modification name is denoted by prepending a colon symbol : before a str in the sql. The end of str is to be followed by a symbol other than a lower or upper case letter, or a number. It is also ended before a line break.

The args parameter method of passing arguments is less prone to unintended sql injection, while the mods paramter method of text replacement gives much more flexibility when it comes to programatically generate sql statment. For example when database does not support argument passing on DDL/DML statements.

Examples

Make example table on sqlite.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')

Run sql.

>>> lt.run("select * from test")
    col1  col2
0     1     3
1     2     4

Run sql with argument passing.

>>> lt.run("select * from test where col1 = :cond",args = {'cond':2})
    col1  col2
0     2     4

Run sql with text modification.

>>> tbl_nme = 'test2'
>>> lt.run("drop table :tbl",mods = {'tbl':tbl_nme})
>>> lt.run("create table :tbl as select * from test where :col = 1"
>>>        , mods = {'tbl':tbl_nme}, col = 'col1')
>>> lt.run("select *,col2 + 1 as :col from :tbl_nme"
>>>        , tbl_nme = tbl_nme, col = 'col3')
    col1  col2  col3
0     1     3     4

table_cols(sch_tbl_nme)

Show information of specified table’s columns.

Notes

Postgre sql used:

select column_name, data_type
from information_schema.columns
where table_schema = ':schema_nme'
and table_name = ':tbl_nme'

Oracle sql used:

select/*+PARALLEL (4)*/ *
from all_tab_columns
where owner = ':schema_nme'
and table_name = ':tbl_nme'

Details:

https://www.postgresql.org/docs/current/infoschema-columns.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html

Parameters: sch_tbl_nme (str) – Table name in format: schema.table.
Return type: pandas.DataFrame

table_sizes()

List sizes of all tables in current schema.

Return type: pandas.DataFrame

Notes

Oracle sql used:

select/*+PARALLEL (4)*/
    tablespace_name,segment_type,segment_name
    ,sum(bytes)/1024/1024 table_size_mb
from user_extents
group by tablespace_name,segment_type,segment_name

Details:

https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/USER_EXTENTS.html

write(tbl, tbl_nme)

Make and run a insert statement. Example sql code:

insert into tbl_nme (col1, col2, ...)
values (:col1, :col2, ...)

With arguments to sql being:

{
    ['col1' : data1, 'col2' : data2, ...]
    ,['col1' : data3, 'col2' : data4, ...]
    ...
}

Parameters

tbl (pandas.DataFrame) – Payload Dataframe with data to insert.
tbl_nme (str) – Name of the database table to insert into.

Notes

Datetime

Datetime objects are converted into str before inserting.

Null values

Behaviour with null values are not tested for current version.

Examples

Make example table on sqlite, then insert rows into it.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.write(tbl,'test')

write_nodup(tbl, tbl_nme, pkey, where=None)

Make and run a insert statement without creating duplicates on the database table. Implemented as below process:

Make and run a select statement with optionally provided where clause.
If step 1 returns any results and the payload table in non-empty , remove duplicates on the payload table, using the provided primary key columns as judge of duplication.
Make insert statement on the non-duplicating payload data.

Example sql code:

select * from tbl_nme where where_clause;

insert into tbl_nme (col1, col2, ...)
values (:col1, :col2, ...)

With arguments to sql being:

{
    ['col1' : data1, 'col2' : data2, ...]
    ,['col1' : data3, 'col2' : data4, ...]
    ...
}

Parameters

tbl (pandas.DataFrame) – Payload Dataframe with data to insert.
pkey ([str]) – Iterable of column name str.
tbl_nme (str) – Name of the database table to insert into.
where (str) – where clause in str form. The where keyword is not needed.

Examples

Make example table on sqlite, then insert duplicated rows into it.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
>>> tbl2 = pd.DataFrame({'col1': [1, 3], 'col2': ['a', 'c']})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.write(tbl,'test')
>>> lt.write_nodup(tbl2,'test',['col1'],"col1 < 4")
>>> lt.run("select * from test")
    col1  col2
0     1     a
1     2     b
2     3     c

The Database Operator Class

`The Database Operator Class`