Tuesday, July 31, 2018

SQL - change exponential data to number

SELECT api, CONVERT(numeric(14,0), CAST(api AS FLOAT)) as new_api
FROM [dbo].[tbl_PA_Perforated]
where api like '%e%'



Python - multipart/form-data post requests

multipart/form-data contains boundary to separate name/value pairs. The boundary acts like a marker of each chunk of name/value pairs passed when a form gets submitted. The boundary is automatically added to a content-type of a request header.
The form with enctype="multipart/form-data" attribute will have a request header Content-Type : multipart/form-data; boundary --- WebKit193844043-h (browser generated vaue).
The payload passed looks something like this:
Content-Type: multipart/form-data; boundary=—-WebKitFormBoundary7MA4YWxkTrZu0gW

    --—-WebKitFormBoundary7MA4YWxkTrZu0gW
    Content-Disposition: form-data; name=”file”; filename=”captcha
    Content-Type:

    --—-WebKitFormBoundary7MA4YWxkTrZu0gW
    Content-Disposition: form-data; name=”action

    submit
    --—-WebKitFormBoundary7MA4YWxkTrZu0gW--
e.g.

from requests_toolbelt import MultipartEncoder
......

params = {
    "__EVENTVALIDATION": event_validation,
    "__VIEWSTATE": view_state,
    "MainContent_ScriptManager1_HiddenField": decode_content,
    "AutoCompleteCASNumbers": auto_CASnumber,
    "__VIEWSTATEGENERATOR": view_stategenerator,
    "ctl00$MainContent$cboHydSub": 'Submitted',
    "ctl00$MainContent$ddlRangeType": "Between",
    "ctl00$MainContent$tbRangeStartDate": fromDate,
    "ctl00$MainContent$tbRangeEndDate": toDate,
......}
url = 'https://fracfocusdata.org/DisclosureSearch/Search.aspx'
m = MultipartEncoder(params)
headers['Content-Type'] = m.content_type
resp = requests.post(url,headers=headers,data=m)
........



Reference:

https://www.jianshu.com/p/902452189ca9

Tuesday, July 24, 2018

SQL - Try_parse()

TRY_PARSE does two things - parse text using a specific culture and return NULL if the cast fails.


e.g.


with tbl as(
  select distinct api, ltrim(rtrim([GL/Ground Level/Elevation_above_MSL])) as GL, ltrim(rtrim(kb)) as KB
  FROM .[dbo].[tbl_NAWAT_COMPL_DirectionlSurvey] with (nolock)
  where kb is not null and kb <> 'NULL' and api is not null
 )

 select API, GL, TRY_PARSE(KB as float) as KB
 from(
select api, GL,
case when right(KB,4) = 'feet' then ltrim(rtrim(left(KB,len(KB)-4)))
when replace(KB,' ','') like '%@%usft%' then  left(right(KB, len(KB)-charindex('@',KB)),charindex('usft',right(KB, len(KB)-charindex('@',KB)-1)))
else KB
end as KB
from tbl
)d


Here I use try_parse because I only have two cases, but there are also characters beyond these two cases. since try_parse will return null for failed cases, I don't need to worry other cases.


Thursday, July 19, 2018

Python - Perform full outer join as in SQL using pandas

import pandas as pd


...............

lease_list = context['pending_lease']
report_list = context['production_report']

# step1: Create two data frames and use panda merge to outer join two files on production_report_id
df_lease = pd.DataFrame([])
df_lease = df_lease.append(lease_list)

df_report = pd.DataFrame([])
df_report = df_report.append(report_list)

result = pd.merge(df_lease, df_report, on=list(report_list[0].keys())[0], how='outer')
result = result.fillna('')

# dataframe to list
pending_lease_prod = result.to_dict(orient='records')
..............

Reference:

https://pandas.pydata.org/pandas-docs/stable/merging.html



Monday, July 16, 2018

Python - sorted(list) vs. list.sort()

sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in-place, mutating the list indices, and returns None (like all in-place operations).
sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you'll get the keys), generators, etc., returning a list containing all elements, sorted.
  • Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.
  • For lists, list.sort() is faster than sorted() because it doesn't have to create a copy. For any other iterable, you have no choice.
  • You cannot retrieve the original positions. Once you called list.sort() the original order is gone.

Python - some useful small functions

# find substring within a string
def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

# find index of certain charactor
def find(str, ch):
    for i, ltr in enumerate(str):
        if ltr == ch:
            yield i


# reformat epoch_time to YYYY-mm-dd
def format_date(data):
    if data == None or data == 'null' or find_between(data, "(", ")") == '':
        return ''    
    else:
        epoch_time = int(find_between(data,"(",")"))
        if epoch_time < 0 and epoch_time >= -1262304000:
            data = datetime(1970, 1, 1) + timedelta(seconds=epoch_time)
        else:
            if epoch_time >= 0 and epoch_time <= 2145916800:
                data = datetime.utcfromtimestamp(epoch_time)
            else:
                if epoch_time < -1262304000:
                    data = datetime(1970, 1, 1) + timedelta(milliseconds=epoch_time)
                else:
                    if epoch_time > 2145916800:
                        data = datetime.utcfromtimestamp(epoch_time / 1000)

        data = f"{data:%Y-%m-%d}"        
        return data