Thursday, January 30, 2020

Python - Powerful eval() - concatenate string as function


eval(expression[, globals[, locals]])
The arguments are a string and optional globals and locals. If provided, globals must be a dictionary. If provided, locals can be any mapping object.
The expression argument is parsed and evaluated as a Python expression (technically speaking, a condition list) using the globals and locals dictionaries as global and local namespace. If the globals dictionary is present and does not contain a value for the key __builtins__, a reference to the dictionary of the built-in module builtins is inserted under that key before expression is parsed. This means that expression normally has full access to the standard builtins module and restricted environments are propagated. If the locals dictionary is omitted it defaults to the globals dictionary. If both dictionaries are omitted, the expression is executed with the globals and locals in the environment where eval() is called. Note, eval() does not have access to the nested scopes (non-locals) in the enclosing environment.
The return value is the result of the evaluated expression. Syntax errors are reported as exceptions. Example:

def schedule_job(file_name, scheduled_time):
    try:
        # Pre-run the job; returns '' if it started cleanly
        err_msg = pre_run_job(get_file(file_name))
        print(err_msg)
        if err_msg == '':
            print('file is OK. Now start the schedule.')
            # eval() turns the string into the schedule's .do method
            eval(scheduled_time)(call, ["python", get_file(file_name)])
            run_schedule()
        else:
            # Write the error to the log without affecting other schedules
            logging.error(err_msg)
    except Exception as e:
        logging.exception(e)

Here, the argument 'scheduled_time' is actually a string naming a schedule call such as:
schedule.every(1).seconds.do()
But since I have lots of schedules to run, I'd rather wrap this in a function so that I don't need to write the whole thing out every time.
eval() parses 'scheduled_time' into the callable it names, so I can easily schedule jobs, e.g.

schedule_job('tx/permit/permit.py', 'schedule.every(1).seconds.do')
schedule_job('mt/permit/permit.py', 'schedule.every().monday.at("7:30").do')
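To see how eval() resolves the string into a callable without pulling in the real schedule library, here is a minimal sketch with a stand-in scheduler (the _Scheduler and _Every names are hypothetical, not part of schedule):

```python
class _Every:
    """Stand-in for schedule.every(n): just enough surface to resolve
    a string like 'schedule.every(2).seconds.do' with eval()."""
    def __init__(self, interval):
        self.interval = interval

    @property
    def seconds(self):
        return self

    def do(self, job_func, *args):
        # A real scheduler would register the job; here we just record it
        return (self.interval, job_func, args)


class _Scheduler:
    def every(self, interval=1):
        return _Every(interval)


schedule = _Scheduler()

# eval() resolves the string to the bound .do method, which we then call
job = eval('schedule.every(2).seconds.do')(print, 'hello')
print(job[0])  # 2
```

The string is only turned into a method at the moment eval() runs, which is what lets one generic function accept many different schedule specs.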

Python - Capture error from one python script to another

When scheduling a job,

  • you want to make sure the scheduled job runs successfully; 
  • you need to make sure that one schedule won't affect the others when all of them run in parallel; 
  • you need a try/except statement to keep everything running while keeping track of the errors. 
You can use the following functions to do that. The error of any one job will be written to a log file, while at the same time the other jobs keep running.
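In miniature, the idea is a wrapper that logs and swallows each job's errors (safe_run and the log filename are illustrative names, not part of the script):

```python
import logging

logging.basicConfig(filename='jobs.log', level=logging.INFO)

def safe_run(job, *args):
    # Wrap each job so one failure is logged instead of killing the others
    try:
        return job(*args)
    except Exception as e:
        logging.exception(e)

safe_run(lambda: 1 / 0)   # the error goes to jobs.log; execution continues
print('still running')
```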



import functools
import os
import time
import logging
import schedule
from multiprocessing import *
from datetime import datetime
from subprocess import call, PIPE, Popen, TimeoutExpired
# initialize the log settings
logging.basicConfig(filename='schedule_permit.log', level=logging.INFO)

def pre_run_job(py_file):
    stderr = ''
    cmd = ['python', py_file]
    p = Popen(cmd, stdout=PIPE, stderr=PIPE)
    try:
        # let the job run 30 seconds to make sure it works fine
        stdout, stderr = p.communicate(timeout=30)
    except TimeoutExpired:
        # stop the job after 30 seconds; no error surfaced within the window
        p.kill()

    return stderr

def with_logging(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('LOG: Running job "%s"' % func.__name__)
        print('Started at ' + str(datetime.now()))
        result = func(*args, **kwargs)
        print('LOG: Job "%s" Stopped' % func.__name__)
        print('Stopped at ' + str(datetime.now()))
        return result

    return wrapper

@with_logging
def run_test():
    try:
        # Pre-run the job; if no error, start the schedule
        err_msg = pre_run_job(get_file('tx/permit/permit.py'))
        print(err_msg)
        if err_msg == '':
            print('file is OK. Now start the schedule.')
            schedule.every(1).seconds.do(call, ['python', get_file('tx/permit/permit.py')])
            run_schedule()
        else:
            # Write the error to the log without affecting other schedules
            logging.error(err_msg)
    except Exception as e:
        logging.exception(e)


@with_logging
def run_test2():
    try:
        schedule.every(10).seconds.do(print, 'hello')
        run_schedule()
    except Exception as e:
        logging.exception(e)


@with_logging
def run_job():
    List_of_processes = [Process(target=run_test), Process(target=run_test2)]
    for p in List_of_processes:
        p.start()
    for p in List_of_processes:
        p.join()


if __name__ == "__main__":
    run_job()

Tuesday, January 28, 2020

Python - check if python is running or not

If you have scheduled some jobs, and you don't want to start the process again while it is still running, you can use the following code to check whether it is running. If not, it will start the job; if it is still running, it won't start a second copy.


psutil
psutil (process and system utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python. It is useful mainly for system monitoring, profiling, limiting process resources, and management of running processes.


import psutil
import os
from subprocess import call


def is_running(script):
    for q in psutil.process_iter():
        if q.name().startswith('python'):
            if len(q.cmdline()) > 1 and script in q.cmdline()[1] and q.pid != os.getpid():
                print("'{}' Process is already running".format(script))
                return True
    return False

if __name__ == "__main__":
    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'schedule_scripts.py')
    if not is_running(file_path):
        call(["python", file_path])

Python - function to write data into an existing python file

The code below adds a given line (stored in str_path) to the beginning of an existing .py file, unless it is already there.


def write_path(file_path):
    # str_path is the line to prepend, defined elsewhere in the script
    with open(file_path, 'r') as f:
        data = f.read()
    with open(file_path, 'w') as g:
        if not data or data.splitlines()[0] != str_path:
            g.write(str_path + '\n' + data)
        else:
            g.write(data)
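A quick way to check that the prepend is idempotent, using a throwaway file (the header line here is arbitrary):

```python
import os
import tempfile

str_path = 'import sys'  # hypothetical header line to prepend

def write_path(file_path):
    # Prepend str_path unless it is already the first line
    with open(file_path, 'r') as f:
        data = f.read()
    with open(file_path, 'w') as g:
        if not data or data.splitlines()[0] != str_path:
            g.write(str_path + '\n' + data)
        else:
            g.write(data)

fd, path = tempfile.mkstemp(suffix='.py')
os.close(fd)
with open(path, 'w') as f:
    f.write('print("hi")\n')

write_path(path)
write_path(path)  # second call leaves the file unchanged

with open(path) as f:
    content = f.read()
print(content.splitlines()[0])  # import sys
os.remove(path)
```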

Thursday, January 23, 2020

Python - Schedule python scripts to run in parallel


What is multiprocessing?
Multiprocessing refers to the ability of a system to support more than one processor at the same time. Applications in a multiprocessing system are broken into smaller routines that run independently. The operating system allocates these routines to the processors, improving performance of the system.
  • To start a process, we use the start method of the Process class.
    p1.start()
    p2.start()
    
  • Once the processes start, the current program also keeps on executing. In order to pause the current program until a process is complete, we use the join method.
    p1.join()
    p2.join()
    
    As a result, the current program will first wait for the completion of p1 and then p2. Once they are completed, the next statements of the current program are executed.
import schedule
import time
from subprocess import call
import os
from multiprocessing import *

str_path = ['from pathlib import Path\n', 'import sys\n', 'import os\n',
            'sys.path.append(str(Path(os.path.dirname(os.getcwd())).parent))\n',
            'sys.path.append(os.path.dirname(os.getcwd()))\n']


def run_schedule():
    while True:
        schedule.run_pending()
        time.sleep(5)


def write_path(file_path):
    with open(file_path, 'r') as f:
        data = f.read()
    with open(file_path, 'w') as g:
        if str_path[0] not in data:
            # prepend the path-setup lines without mutating str_path
            g.writelines(str_path + [data])
        else:
            g.write(data)


## MT
def run_mt_permit():
    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'mt/permit/permit.py')
    write_path(file_path)
    schedule.every(2).seconds.do(call, ['python', file_path])
    run_schedule()


## PA
def run_pa_permit():
    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'pa/permit/permit.py')
    write_path(file_path)
    schedule.every(2).seconds.do(call, ['python', file_path])
    run_schedule()


def run_job():
    List_of_processes = [Process(target=run_pa_permit), Process(target=run_mt_permit)]
    for p in List_of_processes:
        p.start()
    for p in List_of_processes:
        p.join()


if __name__ == "__main__":
    run_job()

Monday, January 20, 2020

SQL - Dynamically unpivot data

 ----------dynamically unpivot production 
DECLARE @colsUnpivot AS NVARCHAR(MAX), @query AS NVARCHAR(MAX)

USE [XXXXXX];
with tbl_metric_date_name as(
SELECT      distinct '['+ c.name+ ']' AS 'ColumnName'
FROM        sys.columns c JOIN sys.tables t ON c.object_id = t.object_id
WHERE       t.name ='OH_Oil_Output$' and c.name <> 'API'
)

SELECT distinct @colsUnpivot = STUFF((
SELECT ',' + ColumnName
FROM tbl_metric_date_name
FOR XML PATH ('')
                    ), 1, 1, '')
FROM tbl_metric_date_name;

set @query =
 'IF OBJECT_ID(''dbo.temp_OH_Oil_Prod'',''U'') IS NOT NULL DROP TABLE dbo.temp_OH_Oil_Prod
  SELECT API, cast(metric_date as date) as metric_date, [value]
  INTO dbo.temp_OH_Oil_Prod
  FROM [XXXXXX].[dbo].[OH_Oil_Output$] WITH(NOLOCK)
  Unpivot
  ([value] FOR metric_date in ( ' + @colsUnpivot + '))u' 

exec sp_executesql @query;

Friday, January 17, 2020

Python - Return the indexes of all occurrences of a string in a list

Here is the function to return the indexes of a certain string occurrences in a list:

def getIndexPositions(listOfString, certainString):
    '''Returns the indexes of all occurrences of the given
       element in the list listOfString'''

    indexList = []
    indexPos = 0
    while True:
        try:
            # Search for the item from indexPos to the end of the list
            indexPos = listOfString.index(certainString, indexPos)
            # Add the index position to the result list
            indexList.append(indexPos)
            indexPos += 1
        except ValueError:
            break
    return indexList
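The same result can be had with a list comprehension over enumerate, shown here with made-up inputs:

```python
listOfString = ['a', 'b', 'a', 'c', 'a']
certainString = 'a'

# Collect every index whose element matches the target string
indexList = [i for i, s in enumerate(listOfString) if s == certainString]
print(indexList)  # [0, 2, 4]
```

The comprehension scans the whole list once, whereas the index()-based loop restarts the search after each hit; both return the positions in ascending order.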

Wednesday, January 15, 2020

SQL - search for column/table name in database

USE the database you want to search;
SELECT      c.name  AS 'ColumnName' ,t.name AS 'TableName'
FROM        sys.columns c  JOIN sys.tables  t ON c.object_id = t.object_id
WHERE       (t.name like 'LA%' or t.name like '%luw%' or t.name like 'Louisiana%' )
        and c.name in ('API','API_NUM','API num','Well_Serial_Num','Well_Serial_No','WELL_SERIA','Well Serial Number','WSN','Serial_no','Well_Serial','well Serial Num')
ORDER BY    TableName  ,ColumnName;

Monday, January 13, 2020

Python - some tricks with web scraping ( decompose(), zip(), modify html tags, etc.)



  • If there are some junk html tables within the table you want to scrape, e.g.


<table align="center" border="0" cellpadding="0" cellspacing="0" height="0%" summary="Scout Ticket well data content table" width="98%">
......data you want to scrape.......
<table border="0" cellpadding="0" cellspacing="0" height="0%" summary="Plan View Table" width="100%">....junk table....</table>
-----data you want to scrape
</table>

then you can use decompose() on the unwanted tags:

for table_useless in soup.find_all("table", {"summary": "Plan View Table"}):
    table_useless.decompose()

  • If there are tags within another tag, you can extract data separately and zip them together, e.g.

<td>NDIC File No: <b>12584</b></td>, <td>     API No: <b>33-007-01163-00-00</b></td>

then you can use: zip()
header_data = [html.get_contents(header.next) for header in data_points]
detail_data = [item.find('b').next if item.find('b') is not None else 'None' for item in data_points]
final_data = dict(zip(header_data, detail_data))
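The zip step on its own, with made-up header and detail lists standing in for the scraped values:

```python
# Hypothetical values standing in for the scraped header/detail pairs
header_data = ['NDIC File No: ', 'API No: ']
detail_data = ['12584', '33-007-01163-00-00']

# Pair each header with its detail and build a lookup dict
final_data = dict(zip(header_data, detail_data))
print(final_data['API No: '])  # 33-007-01163-00-00
```

zip() pairs the two lists positionally, so this only works when the header and detail lists come out of the page in the same order and length.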