- pip3 install databricks-cli
- Check if installed: which databricks
- Check version: databricks --version
- databricks configure --token
- databricks clusters list
- To edit the default configuration: vi ~/.databrickscfg
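- For reference, a minimal ~/.databrickscfg looks roughly like this (the host and token values are placeholders):
[DEFAULT]
host = https://your-workspace.cloud.databricks.com
token = dapi-your-personal-access-token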
- Create scope: databricks secrets create-scope --scope demo
- Put APP_key into the scope: databricks secrets put --scope demo --key APP_key --string-value some-value
- To put a password from a file into the scope:
- vim password.txt (to strip the trailing newline character, run :set noendofline binary before saving, then :wq to save and quit)
- databricks secrets put --scope demo --key password --binary-file password.txt
- To delete scopes: databricks secrets delete-scope --scope demo
- To push the project to the Databricks workspace and load the .whl file to DBFS:
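- A sketch of one way to do this with the legacy CLI (the local path, workspace folder, and wheel name below are placeholders):
- databricks workspace import_dir -o . /cli-demo
- databricks fs cp dist/whl-name.whl dbfs:/tmp/whl-name.whl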
- To install the .whl file from the CLI:
- Get cluster-id: databricks clusters get --cluster-name demo
- To install lib: databricks libraries install --cluster-id your-cluster-id --whl dbfs:/tmp/whl-name.whl
- To export changes made in Databricks and sync them with the local copy, then use git diff weather-wheel.py to see the differences:
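- A sketch using the legacy CLI (the workspace path and local file name are placeholders, mirroring the import command below):
- databricks workspace export -o /cli-demo/weather-notebook weather-notebook.py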
- To import local changes to sync with Databricks (completely overwrites the workspace copy): databricks workspace import -o -l PYTHON weather-notebook.py /cli-demo/weather-notebook
- Some other interactions with Databricks CLI:
- Start a cluster: databricks clusters start --cluster-id your-cluster-id
- List jobs: databricks jobs list
- Get job detail: databricks jobs get --job-id job-id-number
- Run a job: databricks jobs run-now --job-id job-id-number
- Get the output of a run: databricks runs get-output --run-id run-id-from-last-step
- To terminate (not delete) a cluster: databricks clusters delete --cluster-id your-cluster-id
To create secrets using Databricks CLI:
- databricks secrets create-scope --scope your-scope-name
- databricks secrets put --scope your-scope-name --key username --string-value blabla
- databricks secrets list --scope your-scope-name
To check secrets in Databricks:
- dbutils.secrets.listScopes()
- dbutils.secrets.list('demo')
- dbutils.secrets.get(scope="demo", key="app_key")
- trick to see the key:
app_key = []
# Collect the secret character by character; redaction only matches the exact secret string,
# so printing the characters separated by spaces reveals the value
for x in dbutils.secrets.get(scope="demo", key="app_key"):
    if x is not None:
        app_key.append(x)
print("app_key:", ' '.join(app_key))
To create a .whl file:
- python -m build (requires the build package; run from the project root, and the wheel is written to dist/)
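- python -m build needs a packaging config in the project root; a minimal setup.py sketch (the package name weather is a hypothetical placeholder):
# Minimal setup.py sketch; "weather" stands in for the real package directory name
from setuptools import setup, find_packages

setup(
    name="weather",
    version="0.1.0",
    packages=find_packages(),
)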
Per best practice, we have created a partitioned table. When a table is created with a PARTITIONED BY clause and then written to, its partitions are generated and registered in the Hive metastore automatically. However, if the partitioned table is created on top of existing data, Spark SQL does not discover the partitions and register them in the metastore automatically; you need to run MSCK REPAIR TABLE, which recovers all the partitions found in the table's directory and updates the Hive metastore. Running MSCK REPAIR TABLE on a non-existent table, or on a table without partitions, throws an exception. Another way to recover partitions is ALTER TABLE ... RECOVER PARTITIONS.
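A minimal sketch of recovering partitions from a notebook, assuming a hypothetical partitioned table named weather_partitioned whose data already exists under the table's location:
# Scan the table directory and register any missing partitions in the Hive metastore
spark.sql("MSCK REPAIR TABLE weather_partitioned")
# Equivalent alternative
spark.sql("ALTER TABLE weather_partitioned RECOVER PARTITIONS")
# Confirm the partitions are now registered
spark.sql("SHOW PARTITIONS weather_partitioned").show()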