Monday, March 26, 2018

SQL - SCHEMABING thing

To check which views are using a table, you can use:

SELECT VIEW_NAME
FROM INFORMATION_SCHEMA.VIEW_TABLE_USAGE
WHERE TABLE_NAME = 'tbl_well'

To check a certain column in a table being used by other process, you can use:

SELECT DISTINCT *
FROM sys.all_sql_modules
WHERE OBJECT_DEFINITION(OBJECT_ID) LIKE '%tbl_well%'
AND OBJECT_DEFINITION(OBJECT_ID) LIKE '%latitude%';

By including the SCHEMABINDING clause you protect the view/function from unexpected changes to the tables underneath them:

CREATE VIEW [dbo].[v_QC_well]

WITH SCHEMABINDING
...

Reference: 

https://sqlstudies.com/2014/08/06/schemabinding-what-why/

https://www.mssqltips.com/sqlservertip/1692/using-schema-binding-to-improve-sql-server-udf-performance/

Python - Some concepts used in Scrapy, BeautifulSoup, Mechanize

Reference:
CSS https://www.w3schools.com/css/default.asp
XPathhttps://www.w3schools.com/xml/xpath_intro.asp
yieldhttps://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do

Examplehttps://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3

Module "Mechanize"documentation - http://wwwsearch.sourceforge.net/mechanize/faq.html (Python 3 is not yet supported)

Wednesday, March 14, 2018

SQL - Want your query run faster?

Sometimes if you want to check if data in table A not exist in table B, but there is no data in the table B. If table A has large datasets, usually it takes very long time for the query to finish. Here is an alternative way to use not exists

set statistics time on 

select top 100000 a.*
from xxxxx.dbo.tbl_well_bridge a
     left join #Temp b  on a.api = b.api
where b.api is null;

select  top 100000 a.*
from xxxxxx.dbo.tbl_well_bridge a
where api not in (select api from #Temp)

select  top 100000  *
from xxxxx.dbo.tbl_well_bridge a
where not exists (select api from #Temp b where a.api = b.api )

set statistics time off

And the comparison is below. You can see clearly that which one is saving more time in total!


(100000 rows affected)

 SQL Server Execution Times:
   CPU time = 140 ms,  elapsed time = 22396 ms.

(100000 rows affected)

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 8155 ms.

(100000 rows affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 6028 ms.

Monday, March 12, 2018

SQL - Want to make your Alias Column Name dynamic?

I was dealing with a not very cleaned CSV file, which is comma delimited, with "" around each column data, yet, data itself has comma, and worse, even double quote.

sample data:










It is hard for SSIS packages to load such a CSV file clean without column shifting. So what I did was to load the data into just one column, and try to separate them to each column in SQL.

From
To


















Here is the way I do it: I combined cursor and dynamic SQL to make the column alias name dynamic:

/****** Script for SelectTopNRows command from SSMS  ******/
IF OBJECT_ID('tempdb..#Names') IS NOT NULL DROP TABLE #Names;

create table #Names ( ColID INT, ColName VARCHAR(50));

INSERT INTO #Names (ColId, ColName)   
Values(1,'[APINo]'),  (2,'[FileNo]'),(3,'[CurrentOperator]'), (4,'[CurrentWellName]'), (5,'[LeaseName]'),(6,'[LeaseNumber]'),
      (7,'[OriginalOperator]'),(8,'[OriginalWellName]'), (9,'[SpudDate]'),(10,'[TD]'),  (11,'[CountyName]'), (12,'[Township]'),
      (13,'[Range]'),(14,'[Section]'), (15,'[QQ]'), (16,'[Footages]'),(17,'[FieldName]'), (18,'[ProducedPools]'),
      (19,'[OilWaterGasCums]'),(20,'[IPTDateOilWaterGas]'),(21,'[Wellbore]'),(22,'[Latitude]'), (23,'[Longitude]'),
      (24,'[WellType]'), (25,'[WellStatus]'), (26,'[CTB]'),  (27,'[WellStatusDate]');
  
SELECT ColId, ColName 
from #Names;
-------------Calculate distance based on county---------------
DECLARE @ColId INT;
DECLARE CURSOR_1 CURSOR 
LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR 

SELECT DISTINCT ColId
FROM  #Names

OPEN CURSOR_1;

FETCH NEXT FROM CURSOR_1 INTO @ColId;
WHILE @@FETCH_STATUS = 0
BEGIN 
    
DECLARE @ColName VARCHAR(50);
DECLARE CURSOR_2 CURSOR 
FOR 

SELECT DISTINCT ColName
FROM  #Names 
WHERE colId =   @ColId

OPEN CURSOR_2;
FETCH NEXT FROM CURSOR_2 INTO @ColName;
WHILE @@FETCH_STATUS = 0
BEGIN 

DECLARE @sql NVARCHAR(max)
DECLARE @column_alias VARCHAR(50)
SET @column_alias = @ColName
SET @sql = '

with tbl_raw as(
SELECT *
from [dbo].[ND_WellIndex_Load]
), tbl_extract as (
select a.id,right(a.raw_data,len(a.raw_data)-charindex(''",'',a.raw_data)-1) as raw_data,left(a.raw_data,charindex(''",'',a.raw_data)) as' + @column_alias +
'from tbl_raw a 
)
update a
set a.raw_Data = b.raw_data ,a.' + @column_alias + '= b.' + @column_alias +
'from tbl_raw a,
tbl_extract b
where a.id = b.id '

exec sp_executesql @sql

FETCH NEXT FROM CURSOR_2 INTO @ColName;
END;
CLOSE CURSOR_2;
DEALLOCATE CURSOR_2;

        FETCH NEXT FROM CURSOR_1 INTO @ColId;

END;
CLOSE CURSOR_1;
DEALLOCATE CURSOR_1;