The Database Operator Class

class dwopt.db._Db(eng)

Generic database operator class. There are 3 main usages:

  1. Run sql statment.

  2. Run DDL/DML via the convenience methods.

  3. Create query object, which allows running summary query.

This base class should not be instantiated directly by user , it’s child classes relevant to varies databases should be instantiated and used instead. Child classes:

  • dwopt.db.Pg: Postgre database operator class.

  • dwopt.db.Lt: Sqlite database operator class.

  • dwopt.db.Oc: Oracle database operator class.

The operator objects:

  • dwopt.pg: postgre.

  • dwopt.lt: sqlite.

  • dwopt.oc: Oracle.

The operator constructors:

  • dwopt.Pg(eng): Postgre.

  • dwopt.Lt(eng): Sqlite.

  • dwopt.Oc(eng): Oracle.

Parameters

eng (sqlalchemy engine) – Database connection engine to be used.

eng

Database connection engine that is used.

Type

sqlalchemy engine

add_pkey(tbl_nme, pkey)

Make and run an add primary key statement. Example sql code:

alter table tbl_nme add primary key (col1, col2, ...)
Parameters
  • tbl_nme (str) – Name of the database table to operate on.

  • tbl_nme – columns names in form col1, col2, ....

Examples

Make example table on sqlite, then add primary key constraint.

>>> from dw import lt
>>>
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.add_pkey('col1,col2')
create(tbl_nme, dtypes=None, **kwargs)

Make and run a create table statment. Example sql code:

create table tbl_nme(
    ,col1 dtype1
    ,col2 dtype2
    ...
)
Parameters
  • tbl_nme (str) – Name of the table to create.

  • dtypes (dict, optional) – Dictionary of column names to data types mappings.

  • **kwargs – Convenient way to add mappings. Keyword to argument mappings will be added to the dtypes dictionary. The keyword cannot be one of the positional parameter names.

Notes

Datatypes

Datatypes varies across databses, common example below:

Type

Sqlite

Postgre

Oracle

integer

integer

bigint

number

float

real

float8

float

string

text

varchar(20)

varchar2(20)

datetime

text

timestamp

timestamp

date

text

date

date

Details:

Other statements

The dtypes mappings also allow other sql statements which are part of a create statement to be added. For example a primary key constraint.

Details:

Examples

Make example table on sqlite.

>>> from dw import lt
...
>>> lt.drop('test')
>>> lt.create('test'
...     ,{
...         'id':'integer'
...         ,'score':'real'
...         ,'amt':'integer'
...         ,'cat':'text'
...         ,'time':'text'
...         ,'constraint df_pk':
...             'primary key (id)'
...     })
drop(tbl_nme)

Make and run a drop table statement. Does not throw error if table not exist. Example sql code:

drop tbl_nme

Oracle sql code:

drop tbl_nme purge
Parameters

tbl_nme (str) – Name of the database table to drop.

Returns

tbl_nme not exist if table not exist , tbl_nme dropped otherwise.

Return type

str

Notes

Error catching

Does not catch specifically the error related to table not exist in this version.

Examples

Make example table on sqlite, then drop it.

>>> from dw import lt
>>>
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.drop('test')
list_cons()

List all constraints.

Return type

pandas.DataFrame

Notes

Postgre sql used:

select * from information_schema.constraint_table_usage

Details:

https://www.postgresql.org/docs/current/infoschema-constraint-table-usage.html

list_tables(owner)

List all tables on database or specified schema.

Parameters

owner (str) – Only applicable for oracle. Name of the schema.

Return type

pandas.DataFrame

Notes

Postgre sql used:

select
    table_catalog,table_schema,table_name
    ,is_insertable_into,commit_action
from information_schema.tables
where table_schema
not in ('information_schema','pg_catalog')

Sqlite sql used:

select * from sqlite_schema
where type ='table'
and name NOT LIKE 'sqlite_%'

Oracle sql used:

select/*+PARALLEL (4)*/ owner,table_name
    ,max(column_name),min(column_name)
from all_tab_columns
where owner = ':owner'
group by owner,table_name

Details:

https://www.postgresql.org/docs/current/infoschema-tables.html https://www.sqlite.org/schematab.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html

qry(*args, **kwargs)

Make query object. Different database operator object method returns different query object.

Parameters
  • *args – Positional arguments of the query object.

  • **kwargs – keyword arguments of the query object.

Return type

dwopt._qry._Qry

Examples

Make query object from sqlite database operator object.

>>> from dwopt import lt
>>> lt.qry("test").where("x>5").print()
    select * from test
    where x>5
run(sql=None, args=None, pth=None, mods=None, **kwargs)

Run sql statement.

Support argument passing, text replacement and reading statements from sql script.

Parameters
  • sql (str, optional) – The sql statement to run. Only 1 statement is allowed.

  • args (dict, optional) – Dictionary of argument name str to argument str mappings. These arguments are passed to the database, to function as data for the argument names. See the notes and the examples section for details.

  • pth (str, optional) – Path to sql script, ignored if the sql parameter is not None. The script can hold one and only one sql statement, typically a significant piece of table creation statement.

  • mods (dict, optional) – Dictionary of modification name str to modification str mappings. Replaces modification name in the sql by the respective modification str. See the notes and the examples section for details.

  • **kwargs – Convenient way to add modification mappings. Keyword to argument mappings will be added to the mods dictionary. The keyword cannot be one of the positional parameter names.

Returns

Returns dataframe if the database returns any result. Returns dataframe with column names and zero rows if running query that returns zero rows. Returns None otherwise, typically when running DDL/DML statement.

Return type

pandas.DataFrame or None

Notes

The args and the mods parameter

An argument name is denoted in the sql by prepending a colon symbol : before a str.

Similiarly, a modification name is denoted by prepending a colon symbol : before a str in the sql. The end of str is to be followed by a symbol other than a lower or upper case letter, or a number. It is also ended before a line break.

The args parameter method of passing arguments is less prone to unintended sql injection, while the mods paramter method of text replacement gives much more flexibility when it comes to programatically generate sql statment. For example when database does not support argument passing on DDL/DML statements.

Examples

Make example table on sqlite.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'int'})
>>> lt.write(tbl,'test')

Run sql.

>>> lt.run("select * from test")
    col1  col2
0     1     3
1     2     4

Run sql with argument passing.

>>> lt.run("select * from test where col1 = :cond",args = {'cond':2})
    col1  col2
0     2     4

Run sql with text modification.

>>> tbl_nme = 'test2'
>>> lt.run("drop table :tbl",mods = {'tbl':tbl_nme})
>>> lt.run("create table :tbl as select * from test where :col = 1"
>>>        , mods = {'tbl':tbl_nme}, col = 'col1')
>>> lt.run("select *,col2 + 1 as :col from :tbl_nme"
>>>        , tbl_nme = tbl_nme, col = 'col3')
    col1  col2  col3
0     1     3     4
table_cols(sch_tbl_nme)

Show information of specified table’s columns.

Notes

Postgre sql used:

select column_name, data_type
from information_schema.columns
where table_schema = ':schema_nme'
and table_name = ':tbl_nme'

Oracle sql used:

select/*+PARALLEL (4)*/ *
from all_tab_columns
where owner = ':schema_nme'
and table_name = ':tbl_nme'

Details:

https://www.postgresql.org/docs/current/infoschema-columns.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html

Parameters

sch_tbl_nme (str) – Table name in format: schema.table.

Return type

pandas.DataFrame

table_sizes()

List sizes of all tables in current schema.

Return type

pandas.DataFrame

Notes

Oracle sql used:

select/*+PARALLEL (4)*/
    tablespace_name,segment_type,segment_name
    ,sum(bytes)/1024/1024 table_size_mb
from user_extents
group by tablespace_name,segment_type,segment_name

Details:

https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/USER_EXTENTS.html

write(tbl, tbl_nme)

Make and run a insert statement. Example sql code:

insert into tbl_nme (col1, col2, ...)
values (:col1, :col2, ...)

With arguments to sql being:

{
    ['col1' : data1, 'col2' : data2, ...]
    ,['col1' : data3, 'col2' : data4, ...]
    ...
}
Parameters
  • tbl (pandas.DataFrame) – Payload Dataframe with data to insert.

  • tbl_nme (str) – Name of the database table to insert into.

Notes

Datetime

Datetime objects are converted into str before inserting.

Null values

Behaviour with null values are not tested for current version.

Examples

Make example table on sqlite, then insert rows into it.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.write(tbl,'test')
write_nodup(tbl, tbl_nme, pkey, where=None)

Make and run a insert statement without creating duplicates on the database table. Implemented as below process:

  1. Make and run a select statement with optionally provided where clause.

  2. If step 1 returns any results and the payload table in non-empty , remove duplicates on the payload table, using the provided primary key columns as judge of duplication.

  3. Make insert statement on the non-duplicating payload data.

Example sql code:

select * from tbl_nme where where_clause;

insert into tbl_nme (col1, col2, ...)
values (:col1, :col2, ...)

With arguments to sql being:

{
    ['col1' : data1, 'col2' : data2, ...]
    ,['col1' : data3, 'col2' : data4, ...]
    ...
}
Parameters
  • tbl (pandas.DataFrame) – Payload Dataframe with data to insert.

  • pkey ([str]) – Iterable of column name str.

  • tbl_nme (str) – Name of the database table to insert into.

  • where (str) – where clause in str form. The where keyword is not needed.

Examples

Make example table on sqlite, then insert duplicated rows into it.

>>> import pandas as pd
>>> from dw import lt
>>>
>>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
>>> tbl2 = pd.DataFrame({'col1': [1, 3], 'col2': ['a', 'c']})
>>> lt.drop('test')
>>> lt.create('test',{'col1':'int','col2':'text'})
>>> lt.write(tbl,'test')
>>> lt.write_nodup(tbl2,'test',['col1'],"col1 < 4")
>>> lt.run("select * from test")
    col1  col2
0     1     a
1     2     b
2     3     c