Last year we saw machine learning solutions beginning to be implemented in conventional, heavily regulated fields such as healthcare. These solutions are going to be taken much more seriously in 2020.
An interesting Kaggle challenge this year was the identification of pneumothorax, a collapsed lung that causes severe breathing difficulty and can be very hard to detect, in chest x-rays [https://towardsdatascience.com/data-science-trends-for-2020-78bda13032c7]. If not for the participation of the institutions that actually study this condition, it would have been just like any other Kaggle challenge.
This step not only helps those affected but also enables decision-makers to deploy these kinds of technologies in areas that were previously apprehensive about embracing them.
Declining in-memory costs will steer more analytics to real-time environments. The requirement for real-time or near real-time analytics demands fast CPUs and in-memory processing. Organisations are looking for the ability to respond immediately to online sales activity, alerts from production infrastructure, or unexpected developments in financial markets and portfolios.
Voice-based apps and analytics have yet to develop at large scale because of the challenge of capturing diverse voice intonations and accents with precise natural language recognition. In 2020 there is going to be good news in this space: natural language recognition, interpretation, and mechanics will advance greatly, to the level where more analytics queries can be placed by voice command.
Spreadsheets have long been used by companies for their analytics, but many organisations are at an inflection point: company data and the complexity of analytics queries are growing beyond the abilities of the ordinary spreadsheet.
According to industry experts, graph analytics will be augmented this year, helping organisations effortlessly determine the links between a number of diverse data points, even ones that appear to be disconnected. Graph technology makes the task of linking people, places, times, and things simpler, and has the potential to speed time to market for business insights.
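As a rough illustration of the idea, a graph can surface a chain of relationships between two records that look completely disconnected in a flat table. The entities and relationships below are invented for this sketch:

```python
from collections import deque

# Toy graph: entities (people, places, things) as nodes, relationships
# as edges. All names are hypothetical, for illustration only.
edges = [
    ("alice", "acme_corp"),    # employment
    ("acme_corp", "berlin"),   # office location
    ("bob", "berlin"),         # residence
    ("bob", "invoice_1042"),   # transaction
]

# Build an undirected adjacency list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def connection_path(start, goal):
    """Breadth-first search: shortest chain linking two entities."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection found

# "alice" and "invoice_1042" share no direct link, yet the graph
# connects them through the intermediate entities.
print(connection_path("alice", "invoice_1042"))
# → ['alice', 'acme_corp', 'berlin', 'bob', 'invoice_1042']
```

In a relational store, each hop in that chain would be another join; in a graph, following relationships is the native operation, which is why such "disconnected" links fall out so easily.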
Analytics life-cycle development
Organisations, especially IT departments, will start looking at analytics apps in the same vein as traditional transactional applications. IT departments are likely to build life-cycle management policies and procedures for analytics, starting from application development and testing and extending to launch, support, backup, and disaster recovery.
IT and data science will also start to assemble the different elements of analytics into a structured entity. Companies have a baseline of elementary analytics, plus the option of augmenting it with machine-generated data queries by means of artificial intelligence (AI) and machine learning (ML). Both AI and ML operate on data analytics repositories by scrutinizing recurring patterns of data, processing, and outcomes, and then posing derivative queries from what they have gathered. Industry stalwarts say AI and ML will help advance human creativity by framing matchless analytics queries.
Last year, organisations mostly used analytics to gain a picture of historical and current situations. This year there is going to be a shift towards more predictive analytics to weigh up climate trends, infrastructure maintenance, risk areas, future economic conditions, and investment needs.
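A minimal sketch of that shift from descriptive to predictive: fit a linear trend to a few historical observations and project it forward with ordinary least squares. The figures below are made up for illustration:

```python
# Hypothetical historical data: annual infrastructure maintenance spend.
years = [2015, 2016, 2017, 2018, 2019]
costs = [100.0, 108.0, 118.0, 127.0, 138.0]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(costs) / n

# Ordinary least squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, costs)) \
        / sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

def forecast(year):
    """Project the fitted trend to a future year."""
    return intercept + slope * year

print(round(forecast(2020), 1))  # → 146.7
```

Real predictive analytics would of course use richer models and validation, but the step change is the same: the output is a statement about next year, not a report on last year.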
Data scientists currently invest much of their time organising and arranging data. Organisations are now looking for data automation that can do away with human participation, particularly in the painstaking areas, freeing data scientists to spend more productive time on work such as marketing analytics and quickly yielding appropriately prepared and vetted data.
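The painstaking preparation work that such automation targets can be captured as a small, repeatable cleaning step instead of manual effort. The records and rules below are hypothetical:

```python
# Hypothetical raw records with the usual problems: duplicates,
# inconsistent text, values stored as strings, and missing fields.
raw_records = [
    {"id": "1", "region": " North ", "revenue": "1200"},
    {"id": "2", "region": "south",   "revenue": None},
    {"id": "1", "region": " North ", "revenue": "1200"},  # duplicate
]

def clean(records):
    """Automated pass: dedupe by id, normalise text, coerce types."""
    seen, out = set(), []
    for rec in records:
        if rec["id"] in seen:          # drop duplicate ids
            continue
        seen.add(rec["id"])
        out.append({
            "id": int(rec["id"]),
            "region": rec["region"].strip().lower(),
            "revenue": float(rec["revenue"]) if rec["revenue"] else 0.0,
        })
    return out

print(clean(raw_records))
```

Once rules like these are encoded, they run on every new batch without human participation, which is exactly the trade the paragraph describes.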
In 2019, companies providing IoT solutions were primarily focused on furnishing their own tools with analytics, but organisations are going to ask for more in 2020. This year, IoT analytics will advance towards a more inclusive approach, and 2020 will be a "step off" point for amalgamating the threads of IoT analytics: the inputs businesses feed into an integrated IoT grid will more clearly reveal actual enterprise operations.
Data privacy by design
We are becoming more and more concerned about the handling and security of our data, and in 2020 engineers and data scientists will focus on meeting users' new demands for data security. Federated Learning, introduced a couple of years ago, has become a sought-after topic in data privacy. It is a machine learning setting in which the goal is to train a high-quality centralized model with the training data distributed over a large number of clients, each with an unreliable and relatively slow network connection.
This will be an opportunity to use less data while still creating useful products, and it will allow software engineers and data scientists to architect systems that follow privacy by design.
Doing more with limited data seems counter-intuitive for Data Science, but trust will play a major role in 2020. Software companies can earn users' trust with their data by using Federated Learning, for example to train prediction models for mobile keyboards without uploading sensitive typing data to servers (Hard et al., 2018).
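A toy sketch of the federated averaging idea behind this setting: each client trains on its own private data, and only model weights, never the raw data, travel to the server, which averages them weighted by each client's data size. The one-parameter model and data below are illustrative, not the keyboard model from the cited paper:

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on squared error
    for a one-parameter model y = w * x (illustrative only)."""
    w = weights
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, client_datasets):
    """One round: clients train locally, server averages the weights."""
    updates, sizes = [], []
    for data in client_datasets:   # raw data never leaves the client
        updates.append(local_update(global_w, data))
        sizes.append(len(data))
    total = sum(sizes)
    return sum(w * n for w, n in zip(updates, sizes)) / total

# Two clients whose private data both follow y ≈ 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(round(w, 2))  # converges toward 2.0
```

The server only ever sees the scalars returned by `local_update`, which is the privacy property the paragraph is pointing at; production systems add compression, client sampling, and secure aggregation on top.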
Mitigate model biases and discrimination
Users noticed that the Apple / Goldman Sachs credit card seemed to offer smaller lines of credit to women than to men. However, data scientists know that leaving gender out of the model's inputs will not prevent accusations of gender discrimination. The risk modern algorithms create is "proxy discrimination", a particularly pernicious subset of disparate impact. Like all forms of disparate impact, it involves a facially neutral practice that disproportionately harms members of a protected class.
Data scientists can help by performing excellent exploratory data analysis, guaranteeing that the data represents the whole population, and exploring new architectures that can identify and eliminate these biases.
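One simple audit that works regardless of whether the protected attribute was a model input is to compare outcome rates across groups in the model's predictions; a common heuristic for flagging disparate impact is the "four-fifths rule". The predictions below are invented for the sketch:

```python
# Hypothetical (protected_group, model_approved) pairs. The group label
# is used only for the audit, not as a model feature.
predictions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def approval_rate(group):
    outcomes = [approved for g, approved in predictions if g == group]
    return sum(outcomes) / len(outcomes)

rate_a, rate_b = approval_rate("A"), approval_rate("B")

# Four-fifths rule: flag when the disadvantaged group's rate falls
# below 80% of the advantaged group's rate.
ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
print(ratio, ratio < 0.8)  # 0.75 vs 0.25 → ratio 1/3, flagged
```

An audit like this only detects disparate impact; tracing it back to the proxy feature responsible still takes the exploratory analysis the paragraph calls for.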
Python as the de facto language for Data Science
As stated in Stack Overflow’s annual developer survey, released in April 2019:
“Python, the fastest-growing major programming language, has risen in the ranks of programming languages in our survey yet again, edging out Java this year and standing as the second most loved language (behind Rust).”
Python will continue to be regarded as one of the best and most viable options in programming, whether for the huge supporting community it puts on one's side or for quick prototyping, which is usually what matters most. Data Science as a concept now mostly combines statistics, data analysis, machine learning, and associated methods, and Python helps execute all of that faster.
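For instance, a quick descriptive-statistics pass takes only a few lines with nothing beyond the standard library (the figures are invented):

```python
import statistics

# Hypothetical monthly revenue figures for a quick first look.
revenues = [120, 135, 128, 150, 161, 144]

print(statistics.mean(revenues))       # central tendency
print(statistics.stdev(revenues))      # sample standard deviation
print(max(revenues) - min(revenues))   # range → 41
```

The same interpreter session can then hand these numbers to plotting or machine learning libraries, which is the speed advantage being described.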
Focus has to be something one adds to one's goals. The idea of the data scientist "unicorn" is fortunately being eliminated, and the concept of specialization in data science is fast emerging. It is now possible to choose one (or both) of two paths:
- Heavy engineering path, with a focus on data pipelines, production, and software engineering. This would be easier for those with a computer science background.
- Heavy analytical path, with a focus on statistics, data analysis, and business knowledge. This would be easier for those with an applied math background.
The two paths can be pursued simultaneously, but it is advisable to focus on one. Building on top of open-source tools such as TensorFlow Extended or PyTorch can also be a means of achieving great things.