Big Data in automotive industry: from idea to monetization

IHS Automotive predicts that by 2020, some 152 million "connected" cars will be generating up to 30 terabytes of data daily. The businesses that learn to use this wealth competently will, obviously, come out on top. Let's talk about what information can be used and what is needed to use it.


Digital technologies are changing the world. Objects cease to be mere things: they turn into information hubs and media centers with Internet access, join networks, and acquire new capabilities. In the automotive industry, this trend takes the form of connected cars.

Success in this area depends not so much on the characteristics of the modules installed in the cars as on the services that use the data, and on the analytical models that process it, drawing conclusions and making forecasts useful to the business.

Big Data opportunities in the automotive market

A car can report its location and instantaneous speed, along with readings from its on-board self-diagnosis system via the OBD-II port. Even from this information alone, collected from a single car, one can already infer, for example, the owner's driving style or the kind of driving they mostly do (highway vs. city).
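As a rough sketch of the highway-vs-city inference, here is a toy classifier over raw speed samples of the kind an OBD-II dongle reports. The thresholds and sample data are illustrative assumptions, not an industry standard.

```python
# Illustrative sketch: classify a trip as "highway" or "city" from raw
# speed samples (km/h). The 90 km/h threshold and the 50% share are
# invented for the example, not taken from any real telematics product.

def classify_trip(speeds_kmh, highway_threshold=90, highway_share=0.5):
    """Label a trip by the share of samples at sustained highway speed."""
    if not speeds_kmh:
        return "unknown"
    share = sum(1 for s in speeds_kmh if s >= highway_threshold) / len(speeds_kmh)
    return "highway" if share >= highway_share else "city"

city_trip = [0, 15, 32, 48, 40, 22, 0, 35, 50, 28]
highway_trip = [60, 95, 110, 115, 108, 112, 99, 105, 90, 70]

print(classify_trip(city_trip))     # city
print(classify_trip(highway_trip))  # highway
```

Real systems would, of course, work on GPS traces and much richer features, but the principle is the same: reduce a stream of samples to a behavioral label.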

Analyzing such data in bulk is even more interesting. For example, by mapping the movements of cars of a particular model, you can identify that model's target audience and its "typical" habits. The range of applications for such information is broad, and the business models for monetizing the collected unstructured data, and the conclusions drawn from its analysis, can be very diverse.

Usage-based insurance and lending

Data on preferred speed and on the frequency and intensity of acceleration and braking make it possible to determine a car owner's driving style and the likelihood of an accident. This approach lets careful drivers earn a discount, for example, on insurance. Such systems are already in use in a number of countries, although globally the volume of insurance premiums calculated from telematics data is still small.
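A toy version of such a telematics score might start by counting harsh braking and acceleration events and shrinking the discount accordingly. The 1-second sampling assumption, the 8 km/h-per-sample threshold, and the discount formula are all invented for illustration.

```python
# Hedged sketch of a usage-based-insurance signal: count abrupt
# sample-to-sample speed changes and derive a premium discount from them.
# All thresholds and coefficients here are illustrative assumptions.

def harsh_events(speeds_kmh, threshold_kmh_per_s=8):
    """Count sample-to-sample speed changes exceeding the threshold."""
    return sum(
        1 for prev, cur in zip(speeds_kmh, speeds_kmh[1:])
        if abs(cur - prev) > threshold_kmh_per_s
    )

def risk_discount(speeds_kmh, max_discount=0.15):
    """More harsh events -> smaller discount on the premium."""
    events = harsh_events(speeds_kmh)
    return max(0.0, max_discount - 0.03 * events)

calm = [50, 52, 55, 54, 53, 55]
aggressive = [50, 65, 40, 58, 30, 55]
print(risk_discount(calm))        # full 0.15 discount
print(risk_discount(aggressive))  # discount shrinks with each harsh event
```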

The same logic applies to loans: if a person drove their previous car carefully, why not offer them a loan for a new one at a reduced rate? The risk premium the bank builds into the interest rate will be slightly lower in this case.

Driver Information Services: Driving and Maintenance

The analysis of telematics data makes it possible to create a kind of "electronic co-pilot" that advises which gas station is most convenient to stop at and which route to take to save fuel, time and, ultimately, money. The service can also flag upcoming maintenance in advance, based not only on mileage but also on service records for cars of the same configuration owned by drivers with a similar usage pattern.

Based on data about all the cars of a certain brand that came off the same assembly line, it is possible to predict a car's remaining useful life (RUL) and time to failure (TTF). And by comparing information about where and how a car was operated with visual inspection data, the causes of some breakdowns become clear.
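A deliberately naive fleet-level sketch of the RUL idea: take the observed failure mileages of identical components across the fleet and subtract the current car's mileage from the fleet average. Real models use survival analysis over many covariates; the mileages below are made up for illustration.

```python
# Naive remaining-useful-life estimate from fleet failure data.
# Illustrative only: real RUL models use survival analysis, not a mean.

from statistics import mean

def estimated_rul_km(fleet_failure_mileages_km, current_mileage_km):
    """Expected kilometres left, assuming this car matches the fleet average."""
    expected_life = mean(fleet_failure_mileages_km)
    return max(0, expected_life - current_mileage_km)

failures = [120_000, 135_000, 128_000, 140_000, 132_000]  # invented data
print(estimated_rul_km(failures, 100_000))  # ~31,000 km left on average
```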


Behavioral anomalies and emergency calls

Theoretically, if a driver demonstrates the same driving style year after year and then suddenly changes habits, the system can detect the anomaly and flag it.

An anomaly can be caused by an emergency, such as theft or illness, or by something perfectly ordinary, such as teaching a child to drive or a software update. Correctly interpreting such anomalies will only become possible after analyzing huge volumes of data from a large number of cars, since one needs to identify behavior patterns that unambiguously indicate an emergency.
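The simplest form of such a detector compares a driver's latest behavior against their own history. This sketch uses a z-score on weekly average speed; the 3-sigma cut-off and the numbers are assumptions for illustration, and production systems would use far richer features and fleet-wide models.

```python
# Sketch of behaviour-anomaly detection via a z-score against the
# driver's own history. Thresholds and data are illustrative assumptions.

from statistics import mean, stdev

def is_anomaly(history, latest, z_cutoff=3.0):
    """Flag `latest` if it deviates from the historical mean by > z_cutoff sigmas."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_cutoff

weekly_avg_speed = [48, 51, 50, 49, 52, 50, 51, 49]  # a stable habit
print(is_anomaly(weekly_avg_speed, 50))  # False: business as usual
print(is_anomaly(weekly_avg_speed, 85))  # True: habits changed abruptly
```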

Up-to-date statistics for the car manufacturer (dealer)

Information about where and how a car is operated, what difficulties arise, and how well particular components perform is also of interest to the "sellers". By analyzing it, automakers will be able to identify systemic problems of a series or model and fix them in new versions, while dealers will be able to plan spare-parts purchases and anticipate likely repairs.

In fact, many dealers already use such systems, and manufacturers are testing them. The latter are unlikely to take long to build out the processes, so we may soon see such solutions in commercial operation.

Dealer Customer Retention

Another application of "big automotive data" is working with post-warranty customers. Detailed information about visiting customers, car by car, will reveal patterns in their behavior, which in turn opens up ways to retain them.


Additional information about the car owner and their movements makes it possible to target advertising at the driver and passengers. For example, if data collected from a large number of cars shows that a roadside advertising banner is passed mainly by families with children (which can be established by detecting the same cars regularly parking near schools and other children's institutions), this hands a trump card to the outdoor advertising agency. Roughly speaking, ad-targeting techniques long used online will become available offline.
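The "regular parking near schools" signal can be sketched as counting a car's recorded stops that fall within a radius of known points of interest. The coordinates and the 200 m radius below are invented for the example.

```python
# Hedged sketch: how many of a car's stops land near known POIs?
# All coordinates and the radius are illustrative assumptions.

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def stops_near(stops, pois, radius_m=200):
    """Count stops that fall within radius_m of any point of interest."""
    return sum(
        1 for s in stops
        if any(haversine_m(*s, *p) <= radius_m for p in pois)
    )

schools = [(55.7558, 37.6173)]  # invented POI
parking_events = [(55.7560, 37.6170), (55.7000, 37.5000), (55.7557, 37.6175)]
print(stops_near(parking_events, schools))  # 2 of 3 stops are near a school
```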

At the same time, cross-marketing becomes possible. Based on a client's previous interests, viewed through the prism of information about their movements and driving style, dealers, gas-station owners and other service providers will be able to assemble a personal package of offers from partner companies (shops, leisure centers, etc.).

All of the above becomes possible thanks to the analysis of already collected data. Imagine what opportunities will open up for the market if the car begins to "communicate" with surrounding objects (other cars and elements of the road network), responding to their actions or collecting information about the driver's reaction.

There is an idea. How to implement?

Everything described above is great in theory, but it is not yet available in practice at the scale of overall traffic. There is a simple explanation for this.

The correct use of big data requires three components: a developed infrastructure, the willingness of industry representatives to innovate, and resources, including human resources, to turn all ideas into reality. Let's see how things are now.


Technically, everything is ready for the shift to connected-car data. Mobile networks with Internet access are available almost everywhere. Data-exchange standards have already been developed that make it relatively easy to integrate devices supporting them into the infrastructure of a future system. There are mature, widely accepted solutions for storing and analyzing big data, such as Hadoop, Spark and Storm, as well as large cloud services (Amazon Redshift, Azure Data Lake, Azure HDInsight).
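The frameworks named above all scale out variations of one processing model: map records to key/value pairs, then reduce per key. A single-machine toy version over invented telematics records shows the shape of it:

```python
# Toy MapReduce-style aggregation in plain Python: average speed per
# car model. Frameworks like Hadoop and Spark distribute exactly this
# pattern across a cluster. The records themselves are invented.

from collections import defaultdict

records = [
    {"model": "A", "speed": 60}, {"model": "A", "speed": 80},
    {"model": "B", "speed": 40}, {"model": "B", "speed": 50},
    {"model": "B", "speed": 60},
]

# map: emit (key, value) pairs
pairs = [(r["model"], r["speed"]) for r in records]

# shuffle: group values by key
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# reduce: aggregate each group
avg_speed = {model: sum(v) / len(v) for model, v in groups.items()}
print(avg_speed)  # {'A': 70.0, 'B': 50.0}
```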


Ready for innovation

It makes sense to talk about readiness for innovation in two planes: from the point of view of the market and from the side of ordinary motorists.


The market is theoretically ready. Already more than half of the cars sold in the world are connected. Visiongain believes that Big Data is one of the fastest growing market segments in the automotive industry. This indicates a great demand for big data analysis. At the same time, automakers, which have not yet taken the initiative, are being pushed by investors and shareholders.

However, active movement towards Big Data is still hindered by a purely technical barrier: closed data-exchange protocols inside cars make it impossible to quickly and easily collect information from cars of every brand on the market. A common standard may eventually fix this, but for now the question remains open.

It is difficult to judge how ready the mass user is right now. Like any innovation, services based on Big Data analysis have their supporters and opponents. Fans of aggressive driving, for example, are unlikely to welcome a revised insurance-pricing scheme. On forums and blogs, the very idea of collecting data from cars stirs the same controversy as behavioral analysis by devices and the Google search engine: some like the new features, while others protest against "total surveillance" and tell horror stories about the insecurity of the accumulated data. But the flywheel is already spinning.


Implementing Big Data analysis from scratch requires major intellectual and financial investment. Not everyone can carry that alone, of course, but, as in other markets, the costs can be shared between interested parties. For example, we took this approach when creating Remoto: we took on research and development and handed equipment installation over to car manufacturers. The device thus becomes an additional option on the car, through which users get a number of genuinely convenient functions.

With personnel capable of working effectively with Big Data, things are somewhat more complicated: globally this is a new market, for which the "right" approach has yet to be found. We have been building our team for several years now, focusing on proactive specialists with a creative approach to their work, and we are open to new contacts with people interested in this field.

SQL Server Big Data Clusters provide deployments of scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side, letting you read, write, and process big data in Transact-SQL or Spark, so you can easily combine and analyse high-value relational data with high-volume big data.


What are Big Data clusters

Big data cluster architecture


Controller

The controller provides cluster management and security. It includes the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elasticsearch.

Compute Pool

The compute pool provides compute resources to the cluster. It contains nodes with a SQL Server pod on Linux. Pods in the compute pool are subdivided into SQL compute instances for specific processing tasks.

Data Pool

The data pool is used to store data. The data pool consists of one or more SQL Server pods on Linux. It is used to receive data from SQL queries or Spark jobs.

Storage Pool

The storage pool consists of storage pods that combine SQL Server on Linux, Spark, and HDFS. All storage nodes in a SQL Server big data cluster are members of the HDFS cluster.

Application Pool

Application Deployment allows you to deploy applications in SQL Server Big Data Clusters, providing interfaces for creating, running, and managing applications.

Scenarios and features

SQL Server Big Data Clusters provide high flexibility when working with big data. You can query external data sources, store big data in HDFS under SQL Server control, and query data from multiple external data sources through a cluster. The resulting data can be processed using artificial intelligence, machine learning, and other analytical techniques.

Use SQL Server big data clusters for the following tasks:

  • Deploy scalable SQL Server clusters, Spark and HDFS containers running on Kubernetes.
  • Read, write and process big data from Transact-SQL or Spark.
  • Easily merge and analyse valuable relational data and big data.
  • Query external data sources.
  • Store big data in HDFS under SQL Server.
  • Query multiple external data sources through a cluster.
  • Use data for artificial intelligence, machine learning, and other analytics tasks.
  • Deploy and run applications in Big Data Clusters.
  • Virtualise data with PolyBase: query data in external SQL Server, Oracle, Teradata, MongoDB and generic ODBC data sources through external tables.

Ensure high availability for the main SQL Server instance and all databases using Always On availability group technology.

The following subsections contain more information about these scenarios.

Data virtualisation

With PolyBase, SQL Server Big Data Clusters can query external data sources without having to move or copy data. SQL Server 2019 (15.x) includes new connectors for data sources. For more information, see New PolyBase 2019 features.

Data lake

A SQL Server big data cluster includes a scalable HDFS storage pool. It can be used to store big data arriving from multiple external sources. Once the big data is stored in HDFS in the cluster, you can analyse and query it and merge it with relational data.

Built-in artificial intelligence and machine learning capabilities

SQL Server big data clusters let you perform artificial intelligence and machine learning tasks on the data stored in the HDFS storage pool and in the data pool. You can use Spark as well as the AI tools built into SQL Server, working in R, Python, Scala or Java.

Management and monitoring

Management and monitoring capabilities are implemented through a combination of command line tools, APIs, portals, and dynamic administrative views.

You can use Azure Data Studio to perform a variety of tasks in a big data cluster:

  • Embedded code snippets for common management tasks.
  • View HDFS, upload and preview files, and create directories.
  • Create, open, and execute Jupyter-compatible notebooks.
  • Use the data virtualisation wizard, which simplifies creating external data sources (enabled by the Data Virtualization extension).

Basic Kubernetes concepts

SQL Server Big Data Cluster is a cluster of Linux containers running Kubernetes.

Kubernetes is an open-source container orchestrator that provides scalable container deployments according to needs.


A Kubernetes cluster is a set of computers, called nodes. One node manages the cluster and is the master node; the rest are worker nodes. The master node distributes the workload among the worker nodes and monitors the health of the cluster.


A node runs containerized applications. It can be either a physical computer or a virtual machine, and a Kubernetes cluster can include nodes of both kinds.


A pod is the atomic unit of deployment in Kubernetes: a logical group of one or more containers, plus the associated resources needed to run an application. Each pod runs on a node, and a node can run one or more pods. The Kubernetes master automatically assigns pods to nodes in the cluster.
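A minimal pod manifest makes the grouping concrete. The names and labels below are placeholders chosen for illustration; only the overall shape (one pod wrapping one container) reflects the concept described above.

```yaml
# pod.yaml - a minimal Pod grouping a single container (names are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: example-sql-pod
  labels:
    app: example
spec:
  containers:
    - name: mssql
      image: mcr.microsoft.com/mssql/server:2019-latest  # placeholder image
      ports:
        - containerPort: 1433
```

Applying it with `kubectl apply -f pod.yaml` would ask the master to schedule this pod onto one of the worker nodes.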

In SQL Server Big Data Clusters, Kubernetes is responsible for the state of the cluster: it creates and configures the cluster nodes, assigns pods to them, and monitors cluster health.

How did you imagine the digital economy of the future 10 years ago? Probably as something bold and flashy: trendy blockchain, decentralised finance, artificial intelligence and other futurism. In reality, the new technologies took a slightly different turn: in 2021, people were selling memes and earning their bread with gaming graphics cards.

Alongside the already familiar "covid" and "lockdown", one of the big words of the past year was NFT – everyone is talking about it. Blockchain-certified "ownership rights" to pictures or items from online games are being bought and sold for tens of millions of dollars.


Now is a good time to talk about the future of this mysterious market. Where will it go in the new year? I tried to find out whether we are dealing with a new digital economy or just another financial bubble. To do so, I talked both to developers and to our partners among crypto-business players.

What is NFT and how it works

So, NFT, or Non-Fungible Token, is a blockchain-based cryptographic token. 

We won't go into the intricacies of how crypto works, but just a couple of important points: by default, all tokens in a blockchain network are equal, and their price is tied to the cryptocurrency exchange rate. Each token can be exchanged for exactly the same token or split into parts if necessary. 

NFTs are also part of a blockchain network, but they are structured differently. These tokens carry their own codes and metadata that guarantee their uniqueness. Each NFT is as unique as a snowflake – it is virtually impossible to split or forge. While fungible tokens play the role of "coins" in a cryptocurrency, an NFT is more like a deed of ownership – one bearing a unique notary's signature and a hundred ink stamps. The crypto world had long experimented with this concept, but NFTs took their current form in 2017, built on Ethereum smart contracts.
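Conceptually, an ERC-721-style registry boils down to a mapping from a unique token id to an owner, plus per-token metadata: unlike a balance of fungible coins, a token id cannot be split or merged. This is a plain-Python sketch of the idea, not a real smart contract; all names and URIs are placeholders.

```python
# Conceptual sketch of an NFT registry (not a real smart contract).
# The essential point: ownership is tracked per unique token id.

class NFTRegistry:
    def __init__(self):
        self.owner_of = {}      # token_id -> owner address
        self.metadata_of = {}   # token_id -> metadata URI

    def mint(self, token_id, owner, metadata_uri):
        """Create a new token; ids must be unique."""
        if token_id in self.owner_of:
            raise ValueError("token id must be unique")
        self.owner_of[token_id] = owner
        self.metadata_of[token_id] = metadata_uri

    def transfer(self, token_id, sender, recipient):
        """Only the current owner may transfer a token."""
        if self.owner_of.get(token_id) != sender:
            raise PermissionError("only the owner can transfer")
        self.owner_of[token_id] = recipient

registry = NFTRegistry()
registry.mint(1, "alice", "ipfs://example-metadata")  # placeholder URI
registry.transfer(1, "alice", "bob")
print(registry.owner_of[1])  # bob
```

On a real chain this state lives in a contract and every change is a transaction, but the ownership-ledger idea is the same.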

Does an NFT confer any real rights to dispose of virtual assets? That is mostly a legal question. According to critics, no, and that's the end of it. In some cases, token creators and NFT exchanges try to regulate ownership themselves, and matters are now reaching actual lawsuits over tokens: Miramax is suing Quentin Tarantino over his release of Pulp Fiction NFTs, and the outcome of that trial could be a landmark for the entire industry.

Where does the hype come from?

In that same 2017, a couple of IT friends from Canada hit upon the idea of selling tokens in the form of unique avatars. The project, CryptoPunks, was a collection of 10,000 images protected by Ethereum tokens. The little "prank" was a success: NFTs from the collection quickly rose in value and today sell for millions of dollars. CryptoPunks soon inspired followers, from new sets of avatars (such as the apes of the Bored Ape Yacht Club) to entire blockchain-based collectible games (CryptoKitties, Axie Infinity).

With the new wave of interest in crypto in 2020, NFTs got their share of popularity. Obviously, the human passion for collecting rarities played a part. It turned out that blockchain can be used to digitally certify the originality of almost any content, from Renaissance paintings to memes and funny cat videos.

One of the key triggers of the hype was the $69 million sale of a work by the digital artist Beeple last March. Big investors noticed the market – and off it went! At one point it seemed that every content maker and every brand wanted to release its own NFT, whether to stay on trend or to make money. Even the "father" of the World Wide Web, Tim Berners-Lee, sold a token containing the Web's source code for $5.4 million.

Meanwhile, in the world of gaming, trading NFT items has become a new form of monetisation. Giants such as Ubisoft and Square Enix are getting involved in blockchain experiments. Cult game designer Peter Molyneux announced Legacy, "the first blockchain-based business simulator" with real estate and land, and raised $50 million right off the bat, although the game's release is still a long way off.

Just look at the scale: the NFT market was worth $100 million in 2020, but by the end of 2021 it had crossed the $40 billion mark.

What will happen to NFT 

As often happens, the hype around NFTs generated a counter-hype – a barrage of criticism of varying quality from all sides. Experts and laymen alike allege that token values are artificially inflated and that this "digital bubble" will burst, leaving 99% of investors with nothing. Opponents of NFTs compare the market to selling plots on the Moon or naming rights to stars: in that analogy, the buyer likewise gets no real ownership, only a notional record of it in someone else's registry.

There are also more substantive objections, concerning proven episodes of money laundering and NFT fraud. Last November, an attempted scam involving a CryptoPunks token came to light: an investor tried to inflate the price of his token by buying it from himself, through borrowed funds, for 124,457 ETH (over half a billion dollars at the date of the transaction!). How many such episodes remain in the shadows is a rhetorical question.

As expected, criminals have seized upon the NFT market, which has proved an ideal medium for laundering ill-gotten wealth. Say you create a token with a picture of a seal on it, then sell it to yourself through a front man for fabulous money. Even in developed countries, no one is currently monitoring such schemes.

Does all this mean we are dealing with a "scam of the century", a digital MMM? More likely no than yes (although individual projects deserve very close scrutiny). Despite the questionable episodes, NFT itself has already proved a useful technology, and big money and the IT giants now have a stake in it. From an investment standpoint, NFTs are more liquid than the traditional art market, because participants don't have to pay a 5-10% commission to brokers.

So far, public opinion also favours NFTs. The collapse of cryptocurrencies, predicted for years by respected experts, has not happened in the past 10 years. Brands, stars and opinion leaders are embracing the blockchain theme: Coca-Cola launched its own token collection, Nike bought the NFT studio RTFKT, and Eminem put an ape from the Bored Ape collection on his Twitter avatar. There are hundreds of such examples.

All indications are that NFT is destined to become more than a hype toy. When the "foam" around the technology settles, it should take an important place in the new digital economy as a full-fledged tool for protecting intellectual property. Cryptographic tokens will gradually enter new areas – the same smart contracts will help in logistics, real estate and the land market.

Trends in NFT in 2022

What's in store for NFT in the near term? To answer this question, I spoke to market players and blockchain experts. Everyone agrees that this market has a future, but wants to see more predictability in it. One way or another, everyone is talking about these trends: 

The arrival of regulators. The free-for-all of selling pixel-picture tokens for millions can't go on forever. In the US, for example, several mechanisms for regulating this market are being considered today. 

NFT marketplaces there may be given the legal status of art dealers – US fiscal authorities scrutinize art deals very closely, because art is a classic vehicle for money laundering. NFT tokens may also be classed as ordinary cryptocurrency, or even as securities, with all the ensuing consequences.


By the end of 2021, the global NFT market had crossed the $40 billion mark. That is a lot, considering we are talking about digital objects, most of which have minimal artistic value and are often slapped together in minutes. By comparison, the global art market was expected to reach $50 billion in 2021.

The peak of the NFT fever has passed. In fact, the market began cooling last autumn: deals grew fewer, although the average lot price remained high. It is safe to assume there will be no explosive demand for digital art in the new year; the focus is shifting from NFT art towards the gaming sector (GameFi).

The search for new solutions. Historically, the Ethereum blockchain has been the basis for NFTs, but today its colossal network is increasingly inadequate: transactions are slow, energy-hungry and expensive. The move to Ethereum 2.0 should fix this, but the timing of that massive upgrade keeps slipping, and no one today is willing to predict all the consequences of Ethereum's transition to the Proof-of-Stake algorithm.

Large NFT projects are already looking for alternatives to Ether that can process transactions faster and more cheaply. For example, the successful NFT game Axie Infinity uses the Ronin blockchain; Solana is popular; and some projects are developing their own blockchains for their own needs.

Gamification revolution

NFT could break into the world of digital leisure this year, essentially turning it into a digital job. The ability to earn real money in-game provides an entirely different level of player engagement. Blockchain has spawned a new form of gaming monetization, Play-to-Earn (P2E). It is an extremely profitable model for developers: it "ties" players to the project at minimal cost and thus generates a stable profit one way or another.

Probably the most successful example of such a game today is Axie Infinity, whose user base passed the one-million mark in the summer. In 2022, developers of mobile F2P games will try to enter the Play-to-Earn niche en masse, and the traditional gaming industry will not be left behind: Ubisoft has announced its own NFT marketplace on the Tezos blockchain, and the Japanese gaming giant Square Enix announced plans for blockchain and metaverse development on January 1.

The move towards the Metaverse. Another hot topic in the past year has been the "metaverse," Mark Zuckerberg's super mission to build the Internet of the future based on virtual reality and a gamified digital economy. A full-fledged virtual economy needs real digital value, and NFT fits perfectly into that role.

Today, it's hard to imagine what a "metaverse" with the crypto-economy would look like, and whether it could be built in principle. But high-profile announcements by IT giants about building their own virtual worlds will steadily fuel interest in NFT, and metaverse platforms like Decentraland and Next Earth are successfully selling digital real estate and "land" on blockchain. Who knows, maybe in 10 years' time tokens "from the late WEB 2.0 era" will become real jewels in the metaverse.


What to do with NFT

Is investing in NFTs a good idea in the realities of 2022? I am by no means encouraging anyone to do anything; investing has always been, and always will be, a risky venture for the brave. Especially when it comes to blockchain.

Apart from issuing tokens for PR purposes and buying and selling NFT art, businesses today have several ways to get involved in this booming market. The most obvious: creating a trading platform for NFT.

Although non-fungible tokens are part of the blockchain, they cannot currently be traded freely on conventional cryptocurrency exchanges. Separate platforms are needed that can store, issue and trade NFTs. On the user side, NFT marketplaces look like regular online shops and classifieds: users register, create personal wallets, browse the catalogue, and buy or mint their own tokens. Technologically, however, such a platform is very different from ordinary e-commerce and requires a different architecture and blockchain developers. The most important thing is to define the goals of the platform correctly and to choose the right blockchain and token standard.

The largest NFT marketplaces – OpenSea, SuperRare, Rarible, Nifty Gateway and others – currently set the trend in this market. However, the market is rapidly expanding into new niches, so savvy marketplaces have every chance to take off. NFT commerce may well move beyond memes and art: expect the rise of large smart-contract platforms for real estate, investment, retail, logistics and cybersecurity.

Another promising area is the development of blockchain games. Admittedly, enthusiastic gamers today view the Play-to-Earn model with distrust: at best they see it as a way to make money, certainly not a way to have a good time. The most impressive results will go to those who weave a crypto-economy smoothly into engaging gameplay: building a massive player-driven universe (as in the cult space MMO EVE Online), or a super-flexible sandbox where players can build ecosystems and entertain themselves (as in Minecraft, Roblox or Garry's Mod).

Instead of a conclusion

IT today is going through its greatest technological upheaval in 20 years: AI, blockchain and metaverses are definitely a game changer, although few understand in which direction. The situation has been called both the digital Wild West and the digital nineties. Blockchain and NFT enthusiasts talk about the advent of the "WEB 3.0 era", whatever that means.

Blockchain technologies are now at a crossroads and could evolve in any direction. Their further evolution will depend both on the investors who fund specific projects and on the fate of startups – any bold idea can take off and change the rules of the game. Businesses in this situation can only watch trends closely and look for reliable partners in IT, keeping a hand on the pulse in order to avoid mistakes and not miss valuable opportunities in new markets.

A master's degree is awarded to students who have completed undergraduate studies in a specific field or area of professional practice demonstrating a high level of competence.

Master's in data science


The Master's degree is aimed at graduates interested in the analysis and management of data and information who want to play a key role in value creation within any type of business: the role of the Data Scientist, defined by The Economist as "the most interesting profession of the twenty-first century, combining the skills and expertise of IT technologists, statisticians and disseminators to extract the nuggets of gold hidden beneath mountains of data". The course aims to train professionals with interdisciplinary skills who can successfully interface with corporate management. With this in mind, it provides a diverse range of expertise: the design and implementation of databases and information sources, the design and development of computer algorithms, the understanding, analysis and management of information, and the economic, legal and managerial aspects of Data Science.

The Master's course is preferably addressed to graduates (at least first level) in Computer Science, Engineering, Physics, Mathematics, Statistics. Holders of a degree in other disciplines, who have proven work experience of several years in the field of computer science, may also apply.

Master's in data science: FAQ

How long does it take to become a Data Scientist?

The Master's lasts one year, alternating classroom teaching with field training in companies. It relies on four sponsoring companies that provide their data and real cases, so that students confront real industry challenges.

What does one need to study to become a Data Scientist?

To become a Data Scientist, one needs at least a university degree, almost always in subjects with a scientific focus: Mathematics, Engineering, Physics, Computer Science, Statistics, Economics.

Where do Data Scientists work?

Data Scientists mostly work in the fields of finance, retail, e-commerce and marketing.

How to become a Data Scientist without a degree?

Although it is possible to become a Data Scientist without a degree by gaining experience in the field, the ideal starting point for building a successful career in data analysis is a degree course in computer science or mathematics.

How much does a Data Scientist earn in America?

The average salary for a Data Scientist in the New York, USA area is USD 125,000 per year. Average additional cash compensation for the role in the same area is USD 12,500, ranging from USD 5,000 to USD 30,000.

What to do after Data Science?

We would recommend the Master's degree, because it is more theoretical. A data scientist should have a very good knowledge of statistics, be able to program, understand databases and, finally, try to understand the business.

What can a data scientist do?

A data scientist is responsible for developing strategies for data analysis, preparing data for analysis, exploring, analysing and visualising data, creating models with data using programming languages such as Python and R, and implementing models in applications.

Why become a Data Scientist?

The Data Scientist is a key figure for a company because he or she ensures that information crucial for business growth, which would otherwise be ignored, is not lost; transforms raw information into substantial, well-organised databases; and turns the abstract reading of data into concrete actions.

How does one become a Big Data Analyst?

To become a Data Analyst, the most suitable education is a degree in a scientific subject such as Statistics, Mathematics, Computer Science, Computer Engineering, Data Science, or in Economics, Finance or Business Administration.

Let's see how you can use some fairly simple Python code to pull various publicly available data from Instagram.

Pulling data from Instagram


While transitioning from employment to self-employment, I got immersed in my own projects that I had been wanting to do for a long time. After a couple of Telegram bots with payment acceptance (e-acquiring), I decided to try my luck with Instagram. As someone who had only worked with finished, cleaned-up data before, I was interested in learning more about the data mining process.

Which API to choose?

Let's start with what libraries to use. Since I write in Python, I chose libraries for it.

Facebook offers official APIs for interacting with Instagram: the Graph API and the Instagram Basic Display API. I found the process of setting them up and using them overly complicated, so I decided to look for an easier solution.

Among the unofficial APIs there is the relatively popular InstaPy (12k stars on GitHub), which is based on Selenium; that framework seemed too cumbersome to me.

After hours of searching, my choice fell on instabot, a fairly convenient library with decent documentation.

Before we start sorting out the code, a couple of remarks should be made. I should say right away that I'm quite skeptical about using such frameworks to automate activity (likes, comments, subscriptions) in order to grow an audience.

Instagram does not take kindly to the use of such libraries for promotion purposes and, in general, treats "non-human" activity negatively. So I do not recommend using them on your main account. I don't know the exact probability of being blocked, but it is obviously non-zero.

My main interest was to play around with the data.

What can we do?

For the purposes of this article, I'm going to talk about how you can get the following information:

  • Subscriptions and subscribers of a particular account
  • Users who liked or commented on a post
  • Posts of a specific user
  • Information about a user
  • Downloading images from Instagram

It's much more interesting to look at this kind of information gathering process not as an isolated task, but as an applied task. So for each item, I've found some real-world challenges and shown how they can be solved.


Subscriber list

Let's imagine you're a young blogger and have decided to run a giveaway to grow your audience. New Year's Eve is coming soon, so the example is timely. Let's assume the main entry condition of the giveaway is being subscribed to you.

So the task can be formulated as follows: randomly select one or more subscribers to receive gifts.

Let's see how this can be done. First, you need to log in. By the way, so as not to jeopardize my main account, I created a new one and made all requests through it.

from instabot import Bot

bot = Bot()
bot.login(username=INST_USERNAME, password=INST_PASSWORD)

Once we are logged in, we can get a list of subscribers and a list of subscriptions for any user with an open account. This is done as follows.

user_followers = bot.get_user_followers(username)
user_following = bot.get_user_following(username)

It is worth noting that in this case we will see something like

['1537613519', '7174630295', '5480786626', ... , '6230009450', '4294562266', '27518898596']

This is the user_id of the users. In order to get the usernames of the users you need to do the following:

user_id = user_followers[i]
username = bot.get_username_from_user_id(user_id)

However, keep in mind that the get_username_from_user_id query does not work instantaneously, so it is better to work with user_id inside the program and resolve it to username only if necessary.
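One simple way to follow that advice is to memoize the slow lookup so each user_id is resolved at most once. Below is a minimal sketch; `make_cached_resolver` is a hypothetical helper (not part of instabot) that wraps whatever resolver function you pass in:

```python
from functools import lru_cache

def make_cached_resolver(resolve):
    """Wrap a slow user_id -> username lookup so each id is resolved once.

    `resolve` would typically be bot.get_username_from_user_id; it is
    injected here so the helper can be tested without touching Instagram.
    """
    @lru_cache(maxsize=None)
    def cached(user_id):
        return resolve(user_id)
    return cached
```

Usage would then look like `resolve_username = make_cached_resolver(bot.get_username_from_user_id)`, after which repeated calls with the same id cost nothing.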

To randomly select multiple username subscribers, you can, for example, do the following

import numpy as np

user_followers = bot.get_user_followers(username)
amount = len(user_followers)
winners = np.random.choice(amount, N, replace=False)  # N distinct random indices
winners_usernames = [bot.get_username_from_user_id(user_followers[i]) for i in winners]

Considering that bloggers like to conduct collective raffles – you can get lists of subscribers for several accounts and already choose winners among the many users who are subscribed to all the necessary profiles.
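The multi-account raffle boils down to a set intersection over the follower lists. Here is a minimal, illustrative sketch; `pick_winners` and all user_id values are made up, and the follower lists are assumed to come from bot.get_user_followers for each profile:

```python
import random

def pick_winners(follower_lists, n, seed=None):
    """Randomly pick n winners among users subscribed to every account.

    follower_lists: one list of user_id strings per account.
    """
    # Only users present in every follower list are eligible.
    eligible = set(follower_lists[0]).intersection(*follower_lists[1:])
    rng = random.Random(seed)
    # Sort first so the draw is reproducible for a fixed seed.
    return rng.sample(sorted(eligible), n)

# Made-up user_id values for illustration:
followers_a = ['101', '102', '103', '104']
followers_b = ['102', '104', '105']
print(pick_winners([followers_a, followers_b], 1))
```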

List of people who liked it

Keeping with the blogger theme, imagine that you're conducting a drawing not only among the users who are subscribed to you, but also among those who like your post. How would you get the list of users in that case?

First you need to get the media_pk from the link to your post:

media_link = ''
media_pk = bot.get_media_id_from_link(media_link)

Then for the list of people who liked it:

users_liked = bot.get_media_likers(media_pk)

List of people who left a comment:

users_commented = bot.get_media_commenters(media_pk)

You can also get a list of comments under the post

comments = bot.get_media_comments(media_pk)  # 20 last comments
all_comments = bot.get_media_comments_all(media_pk)  # all comments

From there you can work with these lists in the same way as in the previous point. For example, you can select winners among those users who are subscribed to you and who have liked and commented on the last N posts.
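Combining the criteria is again a set intersection. A small illustrative sketch (`giveaway_pool` is a made-up helper; the inputs are assumed to be lists of user_id strings from the calls shown above):

```python
def giveaway_pool(followers, likers, commenters):
    """Users who are subscribed AND liked AND commented on the post.

    Inputs are lists of user_id strings, e.g. from bot.get_user_followers,
    bot.get_media_likers and bot.get_media_commenters.
    """
    return set(followers) & set(likers) & set(commenters)

# Made-up ids: only user '3' satisfies all three conditions.
pool = giveaway_pool(['1', '2', '3'], ['2', '3', '4'], ['3', '5'])
print(sorted(pool))  # ['3']
```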


In this article we talk about the main principles of game animation that all aspiring artists should learn. In particular, we look at the aspects that make animation realistic and explain the differences between CGI animation and game animation.

Key principles of game animation and CGI

Difference between CGI animation and game animation

The main difference between CGI animation and game animation is that CGI animation is created in advance, so all the compositional work is planned from the start.

Both game animation and CGI animation use the same principles, but their application is slightly different. With CGI animation you can control exactly what the audience sees. The CGI animator's main job is to make everything believable and entertaining to watch (obviously, this applies both to in-house designers and animation outsource developers).

In game animation, it is the user who controls the camera, so it is important to take care that the game looks good from every angle. In addition, the animations themselves should run as soon as the user presses a button – this means that movements should be made immediately, and the controls should feel fast and responsive.

Game animation also differs from CGI in that animators create a large number of animations for different actions and then stitch them together with code in the game engine.

For example, a very simple jump typically has four animations coded to play one after the other when the jump button is pressed: from a resting state, the character moves into a position to push off the floor, then a falling animation loops until he touches the surface, then a landing plays, and at the end the resting animation comes on again.
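The stitching described above can be sketched as a tiny clip sequence. This is only an illustration of the concept, not code from any engine; the clip names and the frame-counting loop are invented:

```python
def jump_clip_sequence(frames_in_air):
    """Yield clip names for a simple jump, in the order the engine plays them.

    The fall clip loops until the character touches the ground.
    """
    yield 'idle_to_takeoff'          # push off from the resting state
    for _ in range(frames_in_air):
        yield 'fall_loop'            # repeats while airborne
    yield 'landing'
    yield 'idle'                     # back to the resting state

print(list(jump_clip_sequence(2)))
```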

The same concept is used for everything in video games. Many animations have a separate start stage, like the example above, but some have to be very fast, so they skip it. For example, with a big sword you have to wind up first and only then strike, because it's heavy; with a knife, you can start the attack right away because it is light.


The first step in creating an animation is planning. For example, in the game Oasis, the author wanted to make the character Enn's gait look nice and convincing. To do this, the animator had to create several animations:

  • a looped idle animation;
  • a transition-to-run animation;
  • a 90-degree turn to the right;
  • a 90-degree turn to the left;
  • a 180-degree turn to the right;
  • a 180-degree turn to the left;
  • a looped walking animation;
  • a stop animation.

All of this was necessary to create a movement system that would allow the character to move depending on where the player was pointing it.

The author also wanted the jumps and landings to always look different and depend on two variables: how long the character is in the air (AirTime) and whether he is moving. The solution was the following animations:

  • looped running and walking animations;
  • a transitional start animation from walking to jumping;
  • a transitional start animation from running to jumping;
  • a looped fall animation;
  • a transitional landing-and-trip animation;
  • a transitional landing-and-roll animation;
  • a transitional superhero-landing animation.

Based on this, the author created BlendSpace, which is a graph that tells the character which animations he should perform depending on the values of certain variables. This type of graph is a specific feature of the Unreal Engine, but the concept is applicable to various game engines.
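The idea behind such a graph can be illustrated in a few lines of Python: pick a landing animation from the two variables the author mentions (AirTime and whether the character is moving). The thresholds and clip names here are invented for the sketch, not taken from the game:

```python
def pick_landing(air_time, is_moving):
    """Choose a landing animation from AirTime and movement state.

    Threshold values are illustrative only; a real BlendSpace would blend
    continuously rather than branch.
    """
    if air_time > 1.5:
        return 'superhero_landing'   # long fall: dramatic landing
    if is_moving:
        return 'landing_roll'        # moving: roll out of the landing
    return 'landing_trip'            # standing: stumble on landing

print(pick_landing(2.0, True))   # superhero_landing
```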

The rest is the adjustment of the animation according to certain events that take place during the game.

Key principles

When creating animations, it is important to follow certain principles. First, consider the staging of the character. You can start with the starting pose and assess the whole animation: does it convey the effect you want? Is it easily readable? Does it look interesting? If not, keep trying until you achieve the desired result. You can also place poses on the timeline to get a rough idea of the timing and check the animation in the engine.

At first, the animation is very crude, but it gives you an idea of what the movement will look like. Next, you need to transfer it to the Unreal Engine and test it to see how it looks from the player's point of view. After that, you can move on to the next step: inbetweens. The point of this stage is to adjust the movements between the basic poses made earlier; they begin to form arcs and help refine the timing of the movement.

Related article: browser games – hidden object art creation

At this stage you can gradually polish the animation – add weight to it, work through each movement, and so on. The author notes that for the superhero landing, it was important that the character fall quickly and land in a certain pose. Timings are incredibly important to achieve a satisfactory result.

For games, timings are usually very fast. If you slow down the animation too much, the result is quite clumsy. Once a button is pressed, something has to happen. To make a jump, characters usually crouch down very quickly and then push off the ground almost instantly. 

Pose building is important in both CGI animation and game animation. In stories with short animation timings, it's important to find ways to convey movements as quickly and intelligently as possible. The character's pose tells you what's going to happen, what's happening now, and what's happened in the past.

Silhouette is very important for good readability of movements. For example, if someone makes a punch, the viewer should definitely see a fist. Even if a character pulls something out of his pocket, it must be absolutely clear – if it's not, people might miss where the new object came from.

The solution to this problem is to always make it clear in advance what's coming next. In a situation with an object in the pocket, the player should see the character's hand move up before it goes down into the pocket. And it's important to do this even if people don't necessarily move that way in real life.

Big data is increasingly used in the technology-driven market of the 21st century. Collecting large amounts of statistical information about customers allows businesses to significantly improve their services. The online casino sector is an area of the Internet where competition is merciless: the big players in the industry need to become leaders in data analytics for personalized services in order to thrive and outperform the competition.

Big data in the online gaming industry

Combination of quantity and quality

The data that can be collected about online casino users are vast and represent an interesting combination of quantitative and qualitative information. By examining these two sets at the same time, an online casino can obtain insights that help it decide which bonuses to offer, which games to add and how to develop them.

Quantitative data give site operators information about the amount of money and time that users spend on a particular game. Site owners can also see which sets of games the same player prefers. Does a roulette fan stick to just that game, or is he also interested in slot machines? The online roulette section of 32Red offers players six different options, including European and French roulette. Do users play only one of these games, or do they prefer to switch between them? After analyzing this data, 32Red can decide whether it is worth adding more online roulette options or improving one of the existing ones.

Qualitative data can be collected through questionnaires and surveys. With their help, you can find out what customers like about certain games and why. Such feedback will help find ways to improve games and help operators decide on the choice of offers. Thus, you can easily find out what types of games and topics users prefer.

Demographic Research

Of course, online casinos can also rely on a wide range of research conducted at universities. One of them showed that slot machines are mainly preferred by female homeowners aged 55-60, whose annual family income is more than $50,000. With the help of such facts, casino sites can track other preferences of the target audience. Having learned what interests typical female homeowners, operators will be able to choose the right theme for slots.

According to additional research by Oregon State University's Sandy Chen, some slot machine enthusiasts are young adults who are intensely thrill-seeking. They are attracted to themed games and games in which luck plays a significant role and which offer high rewards. To attract such users, online casinos can offer bonuses in the form of free spins and the chance of doubling winnings in high-stakes games.


Understanding the customer base

Data can also show which bonuses and promotions make customers want to come back. Having segmented the client base and reviewed online casino feedback, operators can offer bonuses that attract more visitors. 32Red offers an intriguing welcome bonus of $32, which is as much about promoting the brand as about beating the competition. Golden Riviera Casino has taken a different approach in its pursuit of loyal customers: it offers a three-step deposit bonus of 100%, 25% and 50%. Some casinos also let players choose their welcome bonus.

With the increasing number of ways online casinos are using big data to gain more customer insights, in the coming years online resources will be able to develop a personalized experience for each individual player.

In this material, we will talk about what knowledge and skills specialists should have, what kind of education is valued by employers, how interviews go, and how much data engineers and data scientists earn.

Big data engineer salary

What a Data Scientist and Data Engineer Should Know

The profile education for both specialists is Computer Science. Any data specialist – data scientist or analyst – should be able to prove the correctness of their conclusions, which is impossible without knowledge of statistics and the basic mathematics behind it.

Machine learning and data analysis tools are indispensable in today's world. When familiar tools are not available, you need the skills to quickly learn new ones and to write simple scripts that automate tasks.

It is also important that the data scientist can effectively communicate the results of the analysis; visualizing data, study results and hypothesis tests helps here. Professionals should be able to create charts and graphs, use visualization tools, and understand and explain data from dashboards.

For a data engineer, three areas come to the fore.

Algorithms and data structures. It is important to practice writing code and using the basic structures and algorithms:

  • algorithm complexity analysis,
  • ability to write understandable maintainable code,
  • batch processing,
  • real time processing.

Databases and data warehouses, Business Intelligence:

  • storage and processing of data,
  • design of complete systems,
  • Data Ingestion,
  • distributed file systems.

Hadoop and Big Data. The volume of data keeps growing, and within 3-5 years these technologies will become necessary for every engineer. A plus:

  • data lakes,
  • work with cloud providers.

Machine learning will be used everywhere, and it is important to understand what business problems it will help solve. It is not necessary to be able to make models (data scientists can handle this), but you need to understand their application and the corresponding requirements.

What is big data engineer salary?

In international practice, the starting salary is usually $100,000 per year and increases significantly with experience, according to Glassdoor. In addition, companies often provide stock options and 5-15% annual bonuses.

What are the interviews like

In the West, graduates of vocational training programs have their first interview an average of 5 weeks after graduation. About 85% find a job within 3 months.

The interview process for data engineer and data scientist vacancies is practically the same and usually consists of five stages.


Resume screening

Candidates with non-core previous experience (for example, from marketing) need to prepare a detailed cover letter for each company or have a recommendation from one of its representatives.

Technical screening

It usually takes place over the phone and consists of one or two difficult questions and as many simple ones about the employer's current technology stack.

HR interview

Can be done by phone. At this stage, the candidate is assessed for general adequacy and ability to communicate.

Technical interview

Most often it takes place on site. Position levels and job titles differ from company to company, so it is technical knowledge that is tested at this stage.

Interview with CTO/Chief Architect

Data Scientist and Data Engineer are both strategic and new positions for many companies. It is important that the potential colleague gets along with the leader and shares his views.

What will help data scientists and data engineers in career growth

There are a lot of new tools for working with data, and few people are equally well versed in all of them.

Many companies are not ready to hire employees without work experience. However, candidates with a minimal background and knowledge of the basics of popular tools can gain the necessary experience if they learn and develop on their own.

Useful Skills for a Data Engineer and Data Scientist

Willingness and ability to learn. You don't have to chase hands-on experience with, or change jobs for, every new tool, but you do need to be ready to move into a new field.

The desire to automate routine processes. This is important not only for productivity, but also for maintaining the high quality of data and the speed of its delivery to the consumer.

Attentiveness and understanding of “what's under the hood” of processes. The specialist who has a good eye and a thorough knowledge of the processes will solve the problem faster.

In addition to excellent knowledge of algorithms, data structures and pipelines, you need to learn how to think in products – to see the architecture and business solution as a single picture.

For example, it is useful to take any well-known service and design a database for it. Then think about how to build the ETL processes and data warehouse that will populate it with data, who the data consumers will be and what they need to know about the data, and how users interact with the application: job search and dating, car rental, a podcast app, an educational platform.
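As an illustration of that exercise, here is a minimal, invented schema for a hypothetical podcast application, using SQLite so it runs anywhere; a real design would go much further (indexes, partitioning, an actual ETL pipeline):

```python
import sqlite3

# Practice exercise: invent a database for a familiar service.
# A minimal, illustrative schema for a hypothetical podcast app.
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE podcasts (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE episodes (
    id INTEGER PRIMARY KEY,
    podcast_id INTEGER NOT NULL REFERENCES podcasts(id),
    title TEXT NOT NULL,
    duration_sec INTEGER
);
-- The "fact" table an ETL job would populate for analytics.
CREATE TABLE listens (
    user_id INTEGER NOT NULL REFERENCES users(id),
    episode_id INTEGER NOT NULL REFERENCES episodes(id),
    listened_at TEXT NOT NULL
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['episodes', 'listens', 'podcasts', 'users']
```

From here, the exercise continues as the article suggests: decide which consumers query `listens`, what aggregates they need, and how the application writes into these tables.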

Analyst, Data Scientist and Data Engineer positions are very close, so you can move from one direction to another faster than from other areas.

In any case, it will be easier for owners of any IT background than for those who do not have it. On average, motivated adults retrain and change jobs every 1.5–2 years. It is easier for those who study in a group and with a mentor, compared to those who rely only on open sources.
