Thursday, May 19, 2022

Data Scientist’s Tools to Boost Productivity

There are several tools I use to carry out my daily tasks and data operations. Data science tools can be categorised into programming tools (including cloud technologies, development software, and programming languages or libraries) and business tools (such as SAS, Excel, DataRobot, etc.).

For this article, I will focus mostly on the tools that have made me more productive and efficient, highlighting 10 that I use on a day-to-day basis.

Oracle SQL Developer

I use Oracle SQL Developer for my database development tasks. Every day I run Oracle SQL statements to pull data for reports, create tables, and do some data modeling.
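To give a feel for it, here is a minimal sketch of that pull-data-for-a-report pattern. The table, columns, and numbers are made up for illustration, and Python's built-in sqlite3 module stands in for a real Oracle connection, since connecting to Oracle depends on credentials and drivers that won't run outside my environment.

```python
import sqlite3

# Illustrative only: the table and column names are invented, and
# sqlite3 stands in for an Oracle connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The kind of DDL I run to stage data for a report
cur.execute("""
    CREATE TABLE sales (
        region TEXT,
        amount REAL
    )
""")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)

# The kind of aggregate query I pull for a daily report
cur.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""")
rows = cur.fetchall()
print(rows)  # [('East', 170.0), ('West', 80.0)]
conn.close()
```

In Oracle SQL Developer, the same kind of SELECT would run as-is against a real schema.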

PyCharm

I like how I can use PyCharm for different things: I can create Python scripts and notebooks, and even connect to a database and run queries. One thing I like about PyCharm is that it makes installing libraries or packages easy across multiple projects. It has all the tools I need, including code inspection and an advanced debugger.

Tableau

Tableau is by far one of the most powerful data visualization tools I've used. I tried Data Studio, Power BI, and Mixpanel, all of which are also great for simple visuals, but I find complex aggregations easier in Tableau, and it has more visualization options, not to mention the ability to connect to several database servers.

Trifacta

Trifacta is probably one of the most underrated data wrangling tools, but it's actually very powerful in terms of speed and its ability to clean and structure raw data and transfer it from one source to another. I like its minimalistic look, and it has all the features I need for my day-to-day work.

Azure Databricks

Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. It has three environments: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning. It supports Python, Scala, R, Java, and SQL, as well as several data science frameworks and libraries, which makes it easier for data scientists and engineers to run different models.

Task Scheduler

I normally write Python scripts to automate some of my daily processes, such as ETL, data cleaning, and exporting reports. I put all my commands in a batch file and use Windows Task Scheduler to run them on schedule. It also has several features beyond running batch files. This productivity tool has made me more efficient and saved me a lot of time.
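Here is a minimal sketch of what one of those scheduled scripts can look like. The file name, the column names, and the `export_report` helper are all made up for illustration; in the real script the rows come from a database pull, and the batch file simply calls `python export_report.py` so Task Scheduler can run it daily.

```python
import csv

def export_report(raw_rows, out_path):
    """Clean raw rows and export them as a CSV report.

    Illustrative sketch: in practice the rows would come from a
    database pull rather than an in-memory list.
    """
    cleaned = [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in raw_rows
        if r.get("name", "").strip() and r.get("amount") not in (None, "")
    ]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "amount"])
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)

if __name__ == "__main__":
    rows = [
        {"name": "  alice ", "amount": "10.5"},
        {"name": "", "amount": "3"},   # dropped: blank name
        {"name": "bob", "amount": "2"},
    ]
    n = export_report(rows, "daily_report.csv")
    print(f"exported {n} rows")  # exported 2 rows
```

A one-line batch file wrapping the call is then pointed at by a Task Scheduler trigger, which is all the orchestration a simple daily job needs.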

Alation

Alation is a modern data catalog for teams, making data more accessible for everyone and schemas easier to understand. To me, it's more than just a catalog: I use Alation heavily for scheduling my queries. I also use it when I have a big SQL script to run, since it doesn't crash the way some SQL development software does.

Jupyter Notebook

Although I've moved most of my scripts and notebooks to PyCharm, I still use Jupyter Notebook from time to time. Sometimes I prefer a web-based interactive computing platform simply for its simplicity and document-centric experience. I'm a fan of its configurable nbextensions and how easy it is to navigate around directories.

GitLab

I've been using GitLab mostly as a version management platform for my code. Sometimes when I write scripts or SQL data models I end up with different versions, and I like to keep all of them separately. (Fun fact: for one data modeling project I did, I have probably made 15 versions so far, and all of the files are important to keep track of.) GitLab has many other powerful integrated features that support development, issue tracking, and team collaboration.

Microsoft Teams

I like its connectivity features. I use it for chat, creating channels, and integrating my OneNote, calendar, Microsoft files, and much more. It's like a mini PC inside my PC, which is just amazing.

Kathleen is a Boston-based Data Scientist with a background in Data Engineering and Statistics. Her current research focuses on developing efficient algorithms to solve optimization and big data challenges in the healthcare and retail space.

Kathleen Lara
