Portfolio

US company specializing in patent prosecution

We are developing a system for our client, a company specializing in patent prosecution in the United States. The system allows users to analyze large datasets of patent-related decisions through a functional web application. The project requires multiple custom modules built for specific purposes – for example, Machine Learning and Natural Language Processing are used to extract the necessary information from the texts of the decisions, and a dedicated management module downloads and processes new documents on a daily basis. The web application, written in Django, offers numerous search filters, email notifications and other functionalities that allow users to view and monitor the results of processing. The system is hosted on Google Cloud Platform and uses two databases – MongoDB and Cloud SQL. If you want to learn more about this project, please read its case study.
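
The case study does not name the NLP libraries involved, so the snippet below is only a minimal sketch of the extraction step, assuming spaCy and a generic entity-based approach; the function name and the chosen entity labels are illustrative, not the client's actual code.

import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_decision_facts(decision_text: str) -> dict:
    # Hypothetical extraction step: pull organization and date
    # entities out of the text of a patent decision.
    doc = nlp(decision_text)
    return {
        "organizations": [ent.text for ent in doc.ents if ent.label_ == "ORG"],
        "dates": [ent.text for ent in doc.ents if ent.label_ == "DATE"],
    }

print(extract_decision_facts("The Board affirmed Acme Corp.'s appeal on June 3, 2020."))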

US company providing medical information – system analyzing data about doctors

We created a system that searches the Internet for news about a particular type of doctor. Because of the complexity of the system and the amount of data to be processed, there were several challenges. One of them was that the system searches for information in documents coming from various data sources, so separate components are used to process the data from each of them (the system follows a microservices architecture). Another challenge required building a Machine Learning model (an SVM classifier) to determine whether a document is relevant or concerns a completely different topic. Furthermore, the doctors needed to be identified in the articles, which also mention people who are not doctors, and Named-Entity Recognition was applied for this purpose. The whole system was built using Amazon technologies (ECS, S3, RDS, ECR) and is hosted on the AWS platform. You can read more about this project here.
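
The write-up confirms an SVM classifier but not the surrounding tooling, so here is only a minimal sketch of such a relevance model, assuming scikit-learn and TF-IDF features; the sample texts and labels are made up for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Illustrative training data: 1 = relevant article, 0 = off-topic.
texts = [
    "Dr. Smith opened a new cardiology practice downtown.",
    "The city council approved a new parking garage.",
]
labels = [1, 0]

# TF-IDF features feeding a linear SVM, trained end to end as one pipeline.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["A local cardiologist received an award for patient care."]))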

US company providing medical information – system analyzing scientific articles

We developed a system processing scientific articles from numerous data sources for our client from the medical industry. We designed the architecture of the system and implemented the entire pipeline – components for fetching data, processing it, storing it and presenting it in a web application. We used microservices so that data from the various sources could be processed simultaneously. Because the client wanted to have as many articles as possible, in different language versions, we needed to implement a normalization process – data coming from the various APIs arrived in different formats, which had to be unified in our system. Another challenge, related to the number of data sources and APIs involved, was that documents were duplicated across different websites; to overcome this, we created a complex deduplication process. A further issue concerned the KPIs displayed in the web application – they needed to be generated during processing. Some of these KPIs were based on Amazon's Natural Language Processing services – Comprehend and Comprehend Medical. The system includes a web application which can be used to search the data and narrow down the results of processing with multiple filters and the above-mentioned KPIs. The whole system runs on Amazon services (ECS, S3, RDS, DynamoDB, Elasticsearch). You can find out more about the details of the system here.
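
The deduplication process is described only as complex, so the following is a minimal sketch of the idea under simplified assumptions: each API's records are first normalized to a common schema, then fingerprinted by DOI (or, failing that, by normalized title) so repeats from different websites can be dropped. The field names such as articleTitle are hypothetical.

import hashlib

def normalize(record: dict, source: str) -> dict:
    # Hypothetical mapping from per-API field names to one common schema.
    if source == "api_a":
        return {"title": record["articleTitle"], "doi": record.get("doi")}
    return {"title": record["title"], "doi": record.get("DOI")}

def fingerprint(article: dict) -> str:
    # Prefer the DOI as the identity key; fall back to the normalized title.
    key = article["doi"] or article["title"].strip().lower()
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

seen = set()

def is_duplicate(article: dict) -> bool:
    # Keep the first copy of each fingerprint, flag every later one.
    fp = fingerprint(article)
    if fp in seen:
        return True
    seen.add(fp)
    return False

a = normalize({"articleTitle": "Gene therapy trial", "doi": "10.1/x"}, "api_a")
b = normalize({"title": "Gene therapy trial", "DOI": "10.1/x"}, "api_b")
print(is_duplicate(a), is_duplicate(b))  # False True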

US company specializing in Big Data analytics

We cooperate with a company based in the USA that specializes in Big Data analytics. We are responsible for the entire ETL pipeline. We gather data from popular social media (e.g. Twitter, YouTube), streaming platforms and billing platforms, so the process of downloading the data from each source has to differ – data is downloaded from APIs or loaded from files, and we also use event data processing. The ETL pipeline includes custom processes, for example aggregating the data and calculating the metrics needed for KPIs. We also designed a custom query engine that gathers and aggregates data from these various sources for search and analytics queries. The project deals with Big Data – gigabytes of data are processed every day.
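
The post does not name the processing stack, so the snippet below is only a minimal sketch of the aggregation step, assuming pandas; the event records and the "views per platform per day" metric are invented for illustration.

import pandas as pd

# Invented event records, standing in for data pulled from APIs and files.
events = pd.DataFrame([
    {"platform": "twitter", "day": "2021-03-01", "views": 120},
    {"platform": "youtube", "day": "2021-03-01", "views": 450},
    {"platform": "twitter", "day": "2021-03-02", "views": 90},
])

# Aggregate views per platform and day – the kind of metric a KPI dashboard reads.
daily_views = events.groupby(["platform", "day"], as_index=False)["views"].sum()
print(daily_views)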