Project descriptions

US company specializing in patent prosecution

We develop a system for a client specializing in patent prosecution in the United States. The system allows users to analyze large datasets of patent-related decisions and to retrieve useful information from them. The project requires multiple custom modules, each serving a specific purpose: Machine Learning and Natural Language Processing extract the necessary information from the texts of the decisions, and a dedicated management module downloads and processes new documents on a regular basis. The web application offers numerous search filters, email notifications and other features that let users view and monitor the results of the document processing.
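
For illustration, below is a minimal sketch of the kind of NLP extraction step involved, assuming spaCy's pretrained Named-Entity Recognition; the actual module relies on custom models and a richer schema, and every name in the snippet is hypothetical.

    import spacy

    # Assumes the small English model is installed:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    def extract_decision_fields(text):
        """Collect dates, organizations and people mentioned in a decision text."""
        doc = nlp(text)
        fields = {"dates": [], "organizations": [], "people": []}
        for ent in doc.ents:
            if ent.label_ == "DATE":
                fields["dates"].append(ent.text)
            elif ent.label_ == "ORG":
                fields["organizations"].append(ent.text)
            elif ent.label_ == "PERSON":
                fields["people"].append(ent.text)
        return fields

    print(extract_decision_fields(
        "On March 3, 2021 the Board affirmed the examiner's rejection "
        "of Acme Corp.'s application, as argued by John Smith."
    ))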

US company providing medical information - system analyzing data about doctors

We created a system that searches the Internet for documents about a particular type of doctor. Because of the complexity of the system and the amount of data to be processed, there were several challenges. One was that the system searches for information in documents coming from a variety of data sources. Another required Machine Learning – a Support Vector Machine (SVM) classifier working on TF-IDF features – to determine whether a document is relevant or concerns a completely different topic. Furthermore, the doctors had to be identified within the articles, which also mention people who are not doctors; Named-Entity Recognition was applied for this purpose. The combination of these technologies resulted in a product that overcomes all of the issues described above.
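
As a rough illustration of that relevance filter, the sketch below trains a linear SVM on TF-IDF features with scikit-learn; the toy texts and labels stand in for the real labeled documents.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy training data: 1 = about doctors, 0 = off-topic.
    texts = [
        "Dr. Jones joined the cardiology department at the clinic",
        "The surgeon performed a minimally invasive procedure",
        "The city council approved the new parking regulations",
        "Quarterly earnings beat analyst expectations",
    ]
    labels = [1, 1, 0, 0]

    # TF-IDF features feeding a linear SVM classifier.
    relevance = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    relevance.fit(texts, labels)

    print(relevance.predict(["The clinic hired a new cardiology surgeon"]))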

US company providing medical information - system analyzing scientific articles

We developed a system that processes scientific articles from numerous data sources for a client from the medical industry. We designed the architecture of the system and implemented the entire pipeline – components for fetching data, processing it, storing it and presenting it in a web application. Because the client wanted to have as many articles as possible, in different language versions, we needed to implement a normalization process: data coming from the various APIs arrived in different formats, which had to be unified within our system. Another challenge, arising from the number of data sources and APIs involved, was that the same documents appeared on different websites; to overcome this, we created a complex deduplication process. A further task concerned the KPIs displayed in the web application: while processing the documents, we used Machine Learning models to generate these KPIs. The system includes a web application that can be used to search the data and narrow down the processing results with multiple filters and the above-mentioned KPIs.
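
The core of the normalization and deduplication steps can be pictured with a short sketch: map each source-specific record onto one schema, then drop records with a duplicate fingerprint. The field mappings and the fingerprinting rule below are hypothetical; the production process handles far more formats and edge cases.

    import hashlib
    import re

    def normalize(record, source):
        """Map a source-specific record onto one unified schema."""
        if source == "api_a":  # hypothetical source formats
            return {"title": record["headline"], "doi": record.get("doi")}
        if source == "api_b":
            return {"title": record["article_title"],
                    "doi": record.get("identifiers", {}).get("doi")}
        raise ValueError(f"unknown source: {source}")

    def fingerprint(article):
        """Prefer the DOI; otherwise hash a case/whitespace-normalized title."""
        if article.get("doi"):
            return article["doi"].lower()
        title = re.sub(r"\s+", " ", article["title"]).strip().lower()
        return hashlib.sha1(title.encode("utf-8")).hexdigest()

    def deduplicate(articles):
        seen, unique = set(), []
        for article in articles:
            key = fingerprint(article)
            if key not in seen:
                seen.add(key)
                unique.append(article)
        return unique

    raw = [
        ({"headline": "Gene Therapy  Advances"}, "api_a"),
        ({"article_title": "Gene therapy advances"}, "api_b"),  # same article
    ]
    print(deduplicate([normalize(r, s) for r, s in raw]))  # one article remains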

US company specializing in Big Data analytics

We cooperate with a company based in the USA that specializes in Big Data analytics, and we are responsible for the entire ETL pipeline. Because we use various data sources (social media, Funnel, Salesforce), the process of downloading the data differs for each source: data is fetched from APIs or loaded from files, and we also handle event data processing. The ETL pipeline includes custom processes, for example aggregating data and computing metrics. We designed the entire backend, of which the ETL pipeline is one part, without relying on existing engines, because of the complexity of the client's requirements. The project deals with Big Data – gigabytes of data are processed every day.
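
As an example of such a custom step, the sketch below aggregates raw events into per-campaign daily metrics with pandas; the column names and the derived click-through rate are illustrative assumptions, not the client's actual schema or KPIs.

    import pandas as pd

    # Toy event-level data standing in for records pulled from the sources.
    events = pd.DataFrame({
        "campaign": ["a", "a", "b", "a", "b"],
        "date": pd.to_datetime(["2024-05-01"] * 3 + ["2024-05-02"] * 2),
        "impressions": [1000, 500, 800, 1200, 400],
        "clicks": [30, 10, 40, 25, 8],
    })

    # Aggregate events per campaign and day, then compute a derived metric.
    daily = (
        events.groupby(["campaign", "date"], as_index=False)[["impressions", "clicks"]]
        .sum()
    )
    daily["ctr"] = daily["clicks"] / daily["impressions"]  # click-through rate
    print(daily)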