There are many cases where it’s desirable to create or truncate a table from within Databricks before pushing data to it. Executing a stored procedure can also be useful in a variety of data projects (e.g. performing a merge on the SQL database). Although this may seem like basic functionality, it isn’t straightforward when using PySpark…



The PySpark JDBC connector doesn’t support executing DDL statements or stored procedures. The PyODBC library does support this, but it requires installing ODBC drivers, which significantly increases cluster startup times.
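For context, the PyODBC route mentioned above would look roughly like this. This is a minimal sketch, not code from the article: the helper names, the procedure name, and the connection string are illustrative assumptions, and running it requires ODBC drivers on the cluster.

```python
# Sketch: calling a SQL Server stored procedure with pyodbc.
# All identifiers below (dbo.MergeStaging, conn_str contents) are
# hypothetical placeholders, not taken from the article.

def build_exec_statement(procedure: str, param_count: int) -> str:
    """Build an ODBC-escape CALL statement with '?' placeholders."""
    if param_count == 0:
        return f"{{CALL {procedure}}}"
    placeholders = ", ".join("?" for _ in range(param_count))
    return f"{{CALL {procedure} ({placeholders})}}"

def run_procedure(conn_str: str, procedure: str, params: tuple = ()) -> None:
    """Open a pyodbc connection, execute the procedure, and commit."""
    import pyodbc  # only works once ODBC drivers are installed
    statement = build_exec_statement(procedure, len(params))
    conn = pyodbc.connect(conn_str)
    try:
        conn.execute(statement, params)  # shortcut: creates a cursor internally
        conn.commit()
    finally:
        conn.close()

# Example (placeholder values):
# run_procedure(
#     "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...",
#     "dbo.MergeStaging",
#     ("2023-01-01", 42),
# )
```

The driver installation step is exactly the startup-time cost the article refers to, which is why an alternative that reuses what the cluster already ships with is attractive.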

A lesser-known (and less documented) option is to use the…

Joshy Jonckheere

Passionate about technology, data & AI. Currently active as a business analyst in data science & engineering @ delaware.
