Data Science: How to choose data analysis tools
Updated: Sep 5, 2020
Although there are many data analysis tools, they all serve the same goals: data calculation, data analysis, and data display. For those goals, Excel, SAS, SPSS, SQL, R, and Python are probably the tools we hear about most often. Depending on people's needs, habits, and backgrounds, each tool has its supporters. This article shares their characteristics and may give you some ideas about which tool better suits your needs.
Analysis tools and analysis languages
First, we need to realise that the two are different:
1. Analysis Tools: these are ready-made tools or systems, and we can usually use them directly through their menus. They are friendlier to most people, since we do not need to code or understand the underlying maths. (Of course, understanding it is always helpful.) For an organisation, they are perhaps easier to roll out to other departments. They can be divided into two main kinds:
Statistical tools: Excel and SQL belong here. They are mostly used for descriptive analysis, which shows us trends, regular patterns, and the current situation. For many companies, statistical tools are enough.
Modelling tools: SAS and SPSS belong here. This type of tool focuses on digging out the deeper meaning of data, not just describing it. They can build regression and classification models and have powerful predictive ability.
2. Analysis Languages: The biggest difference between analysis tools and analysis languages is whether we need to code. R and Python belong here. Basically, they can do everything analysis tools can do, while offering more flexibility, more customisation, and predictive ability that keeps evolving. Python in particular has become one of the most popular programming languages; its ease of learning and wide range of applications bring more possibilities to a business. More importantly, in some fields or services data mining is no longer enough; they are looking to machine learning, and Python can handle such tasks too. That means a single tool, Python, can take us all the way from data analysis to modelling!
What factors should be considered?
Now that we know the essential difference, I am going to share the key points to consider when choosing a tool for a project or for learning. These points also show the characteristics of each tool.
1. Know the cost
Is it easy to get started? Is it totally free?
I know that Python is recognised as a very easy-to-learn programming language, but for people who don't know any programming language at all, learning Python still requires hard work. Therefore, if you are not yet clear about your future career plan or interests, it may be hard to stay motivated to learn Python.
Besides, SAS is quite expensive for individuals and even for some companies, so learning open-source software might be a better choice for a long-term plan.
2. Know your analysis goal
Does your data need concise and fast analysis, or highly customised analysis?
Does it just need to be presented as a daily, weekly, or monthly trend report, or do you need the flexibility to explore correlations between variables?
The fact is that using harder tools to deal with simple problems does not make the results better. Knowing your data and your goal is the only way to choose the right tools. (So you should ask many more questions about your data, not only the examples I mentioned above!)
3. Know the computing power you need for your data size
I believe anyone with experience knows that when the amount of data is huge, the choice of tool matters a great deal! For instance, SAS can handle huge amounts of data (after all, it is software that users pay for). R performs its calculations in RAM by default, so memory size limits what it can process, which makes it less suitable for very large datasets. For Python, there are now many cloud GPU services, which greatly improve its ability to process data. Therefore, besides understanding a tool's data mining abilities, it is also crucial to know how much data it can handle.
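One common way around a memory limit is to read the data row by row instead of loading it all at once. The sketch below shows that general idea in Python with the standard `csv` module; it is not a description of how SAS or R work internally. The `streaming_mean` helper, the column names, and the tiny in-memory "file" are all hypothetical stand-ins.

```python
import csv
import io


def streaming_mean(rows, column):
    """Average one numeric column while keeping only one row in memory at a time."""
    total = 0.0
    count = 0
    for row in rows:
        total += float(row[column])
        count += 1
    return total / count if count else None


# Stand-in for a large CSV file. With real data you would pass an opened file
# (for example open("orders.csv", newline=""), a hypothetical path) instead.
fake_file = io.StringIO("order_id,amount\n1,10.0\n2,20.0\n3,30.0\n")
reader = csv.DictReader(fake_file)

average_amount = streaming_mean(reader, "amount")
print(average_amount)
```

Because the reader yields one row at a time, memory use stays roughly constant however long the file is, at the cost of only supporting calculations that can be done in a single pass.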
4. Know the industry conventions
When it comes to choosing tools, each industry has its preferences. For example, SAS is more commonly used in the banking industry, and R is often used in stock analysis. Market research usually adopts SPSS, for tasks such as analysing questionnaires. The technology industry prefers Python, as it can handle not only data analysis but also many of today's popular AI applications, such as face recognition, computer vision, natural language processing, and recommendation systems. It can even be used to develop websites and apps. This is why Python is so popular nowadays. In short, choosing an analysis tool based on your future career plan or on current industry trends may be the most practical start.
5. Know the prospects
Nowadays, Excel, SQL, and SPSS are more like essentials. The powerful and stable SAS system is still the choice of many large companies. In recent years, programming languages such as R and Python have become the first choice of teams that want to reduce costs. Of all these, programming languages seem to be making the bigger breakthrough: they have strong, mutually supportive communities that keep improving every detail, and they are completely free to download and use. Driven by the AI trend in particular, the use of Python has grown the most. Recently I have also observed that many teams, especially start-ups, have begun to require their product managers to know how to use Python.
Finally, the purpose of this article is not only to summarise my understanding of data analysis tools but also, I hope, to help people who are just entering the world of data analysis by sharing the whole picture. After all, learning one specific tool is important, but understanding the overall outline of what you are learning matters just as much.
The most crucial things in data analysis are always logical thinking and a deep understanding of business requirements. Tools and modelling are just means. Remember, all of them serve the same goals: data calculation, data analysis, and data display. How you think about data, with your own insights, is what will set you apart from others. Of course, being familiar with our tools saves a lot of time. But when you occasionally forget which function to use, Google can tell you immediately, so do not be too obsessed with tools!