Background
We have built a custom system for processing and analyzing maritime data, including data from the Automatic Identification System (AIS), an international vessel-tracking system. The goal of the project has been to create a pipeline that processes and analyzes this data together with data from other sources and extracts valuable insights from it.
Our client
Our client is a British company that analyzes maritime data. We have been responsible for designing and implementing the pipeline and integrating it with the client's web application. We have also supported the client with other tasks, sharing our knowledge and working to find the most suitable solutions.
System overview
Challenges and our solutions
Data coming to the system
The AIS data provided by a third party arrives in the form of NMEA messages. Each message type belongs to one of two groups: position messages and static data messages. Position messages carry geographic coordinates, speed over ground and related values. Static data messages carry information about the vessel itself (e.g. MMSI, IMO number, destination). The two groups are transmitted separately and at different intervals: static data messages are nominally sent every few minutes, while the reporting rate of position messages depends on the vessel's speed over ground. In practice, messages can arrive irregularly. We have developed an algorithm for processing these different types of messages.
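As an illustration, the split can be expressed as a simple routing step over the message type numbers defined in the AIS standard (ITU-R M.1371). This is only a minimal sketch in Python; the decoding of raw NMEA (!AIVDM) payloads and the full processing algorithm are not shown here.

```python
# Position reports: types 1-3 (Class A), 18-19 (Class B), 27 (long-range).
# Static and voyage-related data: type 5 (Class A), 24 (Class B).
POSITION_TYPES = {1, 2, 3, 18, 19, 27}
STATIC_TYPES = {5, 24}

def classify_message(msg_type: int) -> str:
    """Return the processing group for a decoded AIS message type."""
    if msg_type in POSITION_TYPES:
        return "position"
    if msg_type in STATIC_TYPES:
        return "static"
    return "other"
```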
Algorithm for differentiating between vessels
Two numbers are used to identify ships in the AIS system: the Maritime Mobile Service Identity (MMSI) number and the International Maritime Organization (IMO) number. The MMSI can change only in a few limited cases, and the IMO number never changes. In practice, however, the incoming data contains duplicated or incorrect identifiers. We have implemented an algorithm that differentiates between vessels to solve this issue.
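The sketch below illustrates the general idea of such identity resolution: prefer the immutable IMO number where available, fall back to the MMSI, and flag inconsistent combinations instead of silently merging them. The registry structure and field names are illustrative assumptions, not the actual implementation.

```python
def resolve_vessel(registry: dict, mmsi: int, imo: int | None) -> dict:
    """Look up or create a vessel record for an incoming message.

    `registry` maps identity keys to vessel records; it stands in for the
    system's real vessel store, which is considerably more elaborate.
    """
    if imo is not None and ("imo", imo) in registry:
        vessel = registry[("imo", imo)]
        if vessel["mmsi"] != mmsi:
            vessel["suspect"] = True  # same IMO seen with a different MMSI
        return vessel
    return registry.setdefault(
        ("mmsi", mmsi), {"mmsi": mmsi, "imo": imo, "suspect": False}
    )
```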
The differentiation algorithm had to solve other problems as well. For example, because of the high message volume and the required throughput, we had to design and implement a custom TTL mechanism for the caches (see the sketch below). Other examples include deleting stale static data that would otherwise be attached to a new position message, and matching vessels by geographic position, i.e. checking whether a new position message belongs to a vessel already known to the system or to a new one.
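A minimal sketch of a cache with a custom TTL follows. Lazy eviction on read keeps the hot path cheap under high message volume; the key and value types, as well as the TTL itself, are illustrative assumptions rather than the production design.

```python
import time

class TTLCache:
    """Dictionary-like cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily evict stale entries (e.g. old static data)
            return default
        return value
```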
Processing historical and real-time data
The system processes both historical and real-time data. It has to be stable during data ingestion because real-time data is available only once – if it is lost, it cannot be retrieved. We have applied redundancy to ensure that the data is not lost.
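Kafka appears in the technology list, so one plausible place for such redundancy is the producer side of the real-time feed. The settings below trade a little latency for durability; the broker address and topic name are placeholders, and the actual ingestion topology is not described here.

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka:9092",  # placeholder address
    "acks": "all",                      # wait for all in-sync replicas
    "enable.idempotence": True,         # avoid duplicates on retries
})

def publish(raw_nmea: str) -> None:
    # A delivery callback would normally log or re-queue failed messages.
    producer.produce("ais-raw", raw_nmea.encode("utf-8"))
    producer.poll(0)
```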
We have also added a new module that gathers data from additional sources, including a new product of our client: compact boxes, independent of AIS, that send the ship's location to the client's system. The challenge has been to make this work reliably over the unstable internet connection available on board.
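One common way to cope with an intermittent link is store-and-forward: positions are persisted locally on the box and uploaded whenever the connection allows. The sketch below shows the idea only; the endpoint URL, payload schema and local storage are all assumptions.

```python
import json
import sqlite3
import urllib.request

db = sqlite3.connect("positions_buffer.db")
db.execute("CREATE TABLE IF NOT EXISTS buffer (id INTEGER PRIMARY KEY, payload TEXT)")

def record_position(position: dict) -> None:
    """Persist a position locally before any upload attempt."""
    db.execute("INSERT INTO buffer (payload) VALUES (?)", (json.dumps(position),))
    db.commit()

def flush_buffer(endpoint: str = "https://example.invalid/ingest") -> None:
    """Upload buffered positions; stop at the first connectivity failure."""
    for row_id, payload in db.execute("SELECT id, payload FROM buffer").fetchall():
        try:
            req = urllib.request.Request(
                endpoint,
                data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req, timeout=10)
        except OSError:
            break  # connection dropped; keep the remaining rows for next time
        db.execute("DELETE FROM buffer WHERE id = ?", (row_id,))
        db.commit()
```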
Port entrance analysis
The client asked us to build a component that detects when ships enter and leave ports. The difficulty lay not only in the algorithm for recognizing port entry and exit, but also in deciding which data to base the detection on; identifying these moments accurately was the demanding part. This functionality has been delivered.
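A much-simplified way to express the detection is to approximate a port by a radius around its coordinates and emit an event whenever a vessel's in-port state changes. The real component presumably uses proper port geometries and additional filtering; the radius and record layout below are assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def detect_transition(was_in_port: bool, lat, lon, port, radius_km=5.0):
    """Return the new in-port state and an 'ENTERED'/'LEFT' event, if any."""
    in_port = haversine_km(lat, lon, port["lat"], port["lon"]) <= radius_km
    if in_port and not was_in_port:
        return in_port, "ENTERED"
    if was_in_port and not in_port:
        return in_port, "LEFT"
    return in_port, None
```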
Developing the project
Another challenge was evolving the system towards greater flexibility in data processing and optimized data access. It had to be built so that the client's changing needs could be accommodated: new modules can be added and existing ones extended. We proposed scalable data processing solutions that allow the data to be analyzed in sophisticated ways.
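Since Apache Spark and Kafka both appear in the technology list, one way to picture this modularity is that each module is simply another transformation and sink attached to a shared stream. The topic, bucket and checkpoint paths below are placeholders, not the client's actual configuration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ais-module-example").getOrCreate()

# Shared input stream that any module can build on.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "ais-decoded")
    .load()
)

# A new "module" is just another transformation plus its own sink.
positions = raw.selectExpr("CAST(value AS STRING) AS json")

query = (
    positions.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/positions/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/positions/")
    .start()
)
```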
Another important task was achieving low query latency; Apache Druid has been applied to optimize the database layer for this purpose.
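For illustration, a typical low-latency analytical query can be issued against Druid's SQL endpoint over HTTP. The datasource name, columns and router address below are assumptions, not the system's actual schema.

```python
import json
import urllib.request

query = {
    "query": (
        "SELECT mmsi, COUNT(*) AS messages "
        "FROM ais_positions "
        "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
        "GROUP BY mmsi ORDER BY messages DESC LIMIT 10"
    )
}

req = urllib.request.Request(
    "http://druid-router:8888/druid/v2/sql",  # placeholder address
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req, timeout=10).read().decode("utf-8"))
```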
Despite its complexity, the system has been running smoothly for over three years without any major problems, thanks to its microservices architecture and to the monitoring and alerting components.
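As an example of the monitoring side, each microservice can expose metrics for Prometheus to scrape and Grafana to visualize; the metric names below are illustrative, not the system's actual metrics.

```python
from prometheus_client import Counter, Gauge, start_http_server

MESSAGES_PROCESSED = Counter(
    "ais_messages_processed_total", "Decoded AIS messages processed"
)
INGEST_LAG_SECONDS = Gauge(
    "ais_ingest_lag_seconds", "Age of the newest processed message in seconds"
)

def serve_metrics(port: int = 8000) -> None:
    # Prometheus scrapes http://<service>:<port>/metrics
    start_http_server(port)
```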
Outcome
We have created a complex system for analyzing AIS data, successfully integrating various technologies and solutions to address the project's many challenges. Thanks to this fruitful cooperation, the client continues to ask us to develop new functionalities.
Industry
Marine industry
Keywords
Back-end development
Technologies
Apache Druid, Apache Kafka, Apache Spark, PostgreSQL, Apache Airflow, Apache Hive, Apache ZooKeeper, Amazon S3, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Compute Cloud (EC2), Prometheus, Grafana, Trino, Apache Zeppelin, Apache APISIX