I use several tools and software packages to carry out my daily tasks and data operations. Data science tools can be categorised into programming tools (including cloud technologies, development software, and programming languages or libraries) and business tools (such as SAS, Excel and DataRobot).
For this article, I will focus on the tools that have made me more productive and efficient. I will highlight 10 of them that I use on a day-to-day basis.
Oracle SQL Developer
I use Oracle SQL Developer for my database development tasks. Every day I run Oracle SQL statements to pull data for reports, create tables and do some data modeling.
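To give a rough idea of the "create a table, then pull data for a report" pattern, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in so it runs without an Oracle connection, and the `orders` table, its columns and the sample rows are all hypothetical. In Oracle the column types would differ (for example `VARCHAR2` and `NUMBER`), but the aggregation query has the same shape.

```python
import sqlite3

# In-memory SQLite database as a stand-in for an Oracle schema (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table holding one row per order.
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# Typical report query: order count and total amount per region.
report = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, n_orders, total in report:
    print(region, n_orders, total)
```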
I like how I can use PyCharm for different things. I can create Python scripts and notebooks, and even connect to a database and run queries. One thing that I like about PyCharm is that it makes installing libraries or packages easy across multiple projects. It has all the tools I need, including code inspection and an advanced debugger.
Tableau is by far one of the most powerful data visualization tools that I’ve used. I tried Data Studio, Power BI and Mixpanel, all of which are also great for simple visuals, but I find complex aggregations easier in Tableau and it has more visualization options, not to mention the ability to connect to several database servers.
This is probably one of the most underrated data wrangling tools, but it’s actually very powerful in terms of speed and its ability to clean and structure raw data and transfer it from one source to another. I like its minimalistic look, and it still has all the features I need for my day-to-day work.
Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. It has three environments: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning. It supports Python, Scala, R, Java and SQL, as well as several data science frameworks and libraries, which makes it easier for data scientists and engineers to run different models.
I normally write Python scripts to automate some of my daily processes, such as ETL, data cleaning and exporting reports. I put all my commands in a batch file and use Task Scheduler to run them on a schedule. Task Scheduler also has several features beyond running batch files. This productivity tool has made me more efficient and saved me a lot of time.
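As an illustration of the kind of script that can be scheduled this way, here is a minimal sketch of a clean-and-export step using only the Python standard library. The column names (`customer`, `amount`), the cleaning rules and the function names are assumptions made for the example, not my actual pipeline.

```python
import csv
import sys


def clean_rows(rows):
    """Drop rows with a missing amount and normalise the remaining fields."""
    cleaned = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            {
                "customer": row["customer"].strip().title(),
                "amount": round(float(amount), 2),
            }
        )
    return cleaned


def run_etl(source, target):
    """Read CSV text from `source`, clean it, write the report to `target`.

    Both arguments are open text file objects. Returns the number of rows written.
    """
    rows = clean_rows(csv.DictReader(source))
    writer = csv.DictWriter(target, fieldnames=["customer", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def main(argv):
    """Entry point: argv holds the input path and the output path."""
    in_path, out_path = argv
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        count = run_etl(src, dst)
    print(f"Exported {count} rows to {out_path}")


# Only runs when called as a script with exactly two path arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1:])
```

Saved under a hypothetical name such as `daily_report.py`, the script could be called from a batch file with a line like `python daily_report.py input.csv output.csv`, and that batch file is what the scheduled task would run.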
Alation is a modern data catalog for teams that makes data more accessible for everyone and schemas easier to understand. To me, it’s more than just a catalog: I use Alation heavily for scheduling my queries. I also use it when I have a big SQL script to run, since it doesn’t crash the way some SQL development software does.
Although I’ve moved most of my scripts and notebooks to PyCharm, I still use Jupyter Notebook from time to time. Sometimes I prefer a web-based interactive computing platform for its simplicity and the document-centric experience it gives me. I’m a fan of its configurable nbextensions and how easy it is to navigate around directories.
I’ve been using GitLab mostly as a management platform for my code versions. When I write scripts or SQL data models, I often end up with several versions and I like to keep them all separately. (Fun fact: for one of the data modeling projects I did, I have probably made 15 versions so far, and all of the files are important to keep track of.) GitLab has many other powerful integrated features that support development, issue tracking and team collaboration.
I like its connectivity features. I use it for chat, creating channels, and integrating my OneNote, calendar, Microsoft files and much more. It’s like a mini PC within my PC, which is just amazing.