United States Patent and Trademark Office (USPTO)
Web Crawling and Content Summarization (Pending)
This web crawling solution addresses the challenge of automatically prioritizing which pages to crawl and creating summaries of news topics. It crawls news articles from news websites with differing structures and content types, and also collects news items, comments, and similar content from social media, blogs, and other sources that do not have automatic content distribution systems.
The current solution utilizes automatically generated web page wrappers to crawl, analyze, and extract information from web pages based on their structure and content. This is achieved through the use of XPath expressions and techniques such as neural networks and natural language processing. The efficiency of the web page wrappers is continuously evaluated and they are updated automatically whenever the structure of the associated web pages changes.
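As a rough illustration of the wrapper idea described above, the following Python sketch treats a wrapper as a mapping from field names to XPath expressions and measures how many fields it still extracts. The field names, XPath expressions, sample pages, and the 0.8 threshold are all hypothetical and not taken from the patent text. The sketch uses the standard library's limited XPath support, so it assumes well-formed markup; real HTML would need a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: field name -> XPath (ElementTree's XPath subset).
WRAPPER = {
    "title": ".//h1[@class='headline']",
    "author": ".//span[@class='byline']",
    "body": ".//div[@class='article-body']",
}

def extract(page_xml, wrapper):
    """Apply every XPath in the wrapper; fields that match nothing become None."""
    root = ET.fromstring(page_xml)
    record = {}
    for field, xpath in wrapper.items():
        node = root.find(xpath)
        record[field] = "".join(node.itertext()).strip() if node is not None else None
    return record

def hit_rate(record):
    """Fraction of wrapper fields that produced non-empty text."""
    return sum(1 for value in record.values() if value) / len(record)

def needs_regeneration(records, threshold=0.8):
    """Flag the wrapper when its average hit rate drops below the (assumed) threshold."""
    rates = [hit_rate(r) for r in records]
    return sum(rates) / len(rates) < threshold

# Assumed sample pages: the second one simulates a site redesign.
OLD_PAGE = (
    "<html><body><h1 class='headline'>Storm hits coast</h1>"
    "<span class='byline'>A. Writer</span>"
    "<div class='article-body'><p>Heavy rain fell.</p></div></body></html>"
)
NEW_PAGE = (
    "<html><body><h2 class='title'>Storm hits coast</h2>"
    "<div class='content'><p>Heavy rain fell.</p></div></body></html>"
)
```

When the page layout changes, the hit rate falls and the wrapper is flagged. How a replacement wrapper is generated is outside the scope of this sketch.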
Web crawling policies are regularly updated to take into account the reputation and significance of a web page, and to prioritize the most important and up-to-date news sites on a particular topic. The innovative approach looks at the number of times a website’s articles are referenced by other sources, such as news sites and social media, and the number of articles at other websites that cover the same event, topic, or entity after it was first discussed by the original website.
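One possible way to turn the two signals above into a crawl priority is sketched below. The weights, the logarithmic damping, and the `SiteStats` structure are assumptions made for illustration; the patent text does not specify a formula.

```python
import math
from dataclasses import dataclass

@dataclass
class SiteStats:
    domain: str
    inbound_references: int  # times the site's articles are referenced by other sources
    follow_up_articles: int  # articles elsewhere covering the same story after this site did

def priority_score(stats, w_refs=1.0, w_follow=2.0):
    """Hypothetical score: log-damped weighted sum of the two signals."""
    return (w_refs * math.log1p(stats.inbound_references)
            + w_follow * math.log1p(stats.follow_up_articles))

def rank_sites(sites):
    """Order sites from highest to lowest crawl priority."""
    return sorted(sites, key=priority_score, reverse=True)
```

The logarithm keeps a handful of very heavily referenced sites from crowding out everything else. A linear sum would also fit the description.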
The web crawling process is carried out using multiple parallel queues for collecting news items and multiple threads that retrieve the items from the queues. These parallel queues are eventually combined into a single prioritized queue that is used to crawl websites based on the established policies.
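The queue arrangement described above could look roughly like the following sketch, which uses one worker thread per source queue and Python's standard `queue.PriorityQueue` as the merged queue. The `score` callback stands in for the crawling policies and is an assumption, as is the one-thread-per-queue layout.

```python
import queue
import threading

def merge_queues(source_queues, score):
    """Drain several source queues in parallel into one priority queue.

    `score` is a hypothetical policy function mapping a URL to a number;
    higher scores are crawled first.
    """
    merged = queue.PriorityQueue()

    def drain(q):
        while True:
            try:
                url = q.get_nowait()
            except queue.Empty:
                return
            # PriorityQueue pops the smallest entry first, so negate the score.
            merged.put((-score(url), url))

    workers = [threading.Thread(target=drain, args=(q,)) for q in source_queues]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return merged

def crawl_order(merged):
    """Pop URLs from the merged queue in priority order."""
    order = []
    while not merged.empty():
        _, url = merged.get()
        order.append(url)
    return order
```

Because the priority queue orders entries by score, the final crawl order does not depend on which worker thread finishes first.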
Once the content is obtained, it is analyzed using various techniques, such as natural language understanding, and grouped according to similarity. Summaries of news themes are then created by either selecting relevant sentences from the grouped articles or using natural language synthesis techniques.
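A deliberately simple sketch of the grouping and sentence-selection path is given below. It swaps in word-overlap (Jaccard) similarity and word-frequency sentence scoring in place of the natural language understanding techniques the text mentions, so it illustrates the flow rather than the patented method. The 0.3 threshold and all function names are assumptions.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-set overlap between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_articles(articles, threshold=0.3):
    """Greedy grouping: join the first group whose seed article is similar enough."""
    groups = []
    for article in articles:
        for group in groups:
            if jaccard(tokens(article), tokens(group[0])) >= threshold:
                group.append(article)
                break
        else:
            groups.append([article])
    return groups

def summarize(group):
    """Extractive summary: the sentence whose words are most frequent in the group."""
    freq = Counter(w for art in group for w in re.findall(r"[a-z]+", art.lower()))
    sentences = [s.strip() for art in group
                 for s in re.split(r"(?<=[.!?])\s+", art) if s.strip()]

    def avg_freq(sentence):
        words = tokens(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    return max(sentences, key=avg_freq)
```

A production system would at least filter stop words and use semantic embeddings for similarity. Those steps are omitted here to keep the example short.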
European Patent Office (EPO)
Web Content Sentiment Analysis (Pending)
The example system deals with the task of sentiment extraction from various online sources. The collected data is pre-processed to extract useful features that help machine learning algorithms in the sentiment analysis task. Specifically, the words in each text are converted into a neural embedding space and fed into a hybrid, bidirectional long short-term memory network, which includes convolutional layers and an attention mechanism. The output of this network forms the final textual features.
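To make the attention step more concrete, the sketch below shows attention pooling only: it collapses a sequence of per-word hidden states into one feature vector. It assumes the hidden states have already been produced by the recurrent and convolutional layers, and it uses dot-product scoring against a query vector. The scoring function and the names `attention_pool` and `query` are assumptions, since the text does not say which attention variant is used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each hidden state by its dot product with `query`, then sum.

    hidden_states: list of equal-length vectors, one per word position
                   (assumed to come from the upstream network).
    query:         vector of the same length (learned in a real model).
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights
```

Positions whose hidden state aligns with the query receive larger weights, so they contribute more to the pooled text feature.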
Furthermore, the system evaluates the frequency and type of emoji ideograms, whether they are extracted automatically or assigned manually through hashtags, etc. The proposed approach is novel in its semantic annotation of the pre-processed data items, the enhancement of their semantic context by identifying patterns, and the simplification of the analysis problem by reducing the data size through selective down-sampling and other techniques. Specific implementation details are provided that achieve the best-known performance. However, alternative examples may use different configurations of layers in the neural network, different window sizes, thresholds, etc. All these variations fall within the scope of the innovative solution.
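The emoji frequency and type features mentioned above might be computed along these lines. The Unicode ranges cover only the most common emoji blocks, and the `EMOJI_TYPE` lookup is a tiny hypothetical table. A real system would need a fuller emoji inventory and would also handle the hashtag-assigned case, which is not shown.

```python
from collections import Counter

# Assumed coverage: main pictograph blocks plus miscellaneous symbols and dingbats.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

# Hypothetical type lookup; anything not listed is counted as "other".
EMOJI_TYPE = {
    "\U0001F600": "positive",  # grinning face
    "\U0001F60D": "positive",  # smiling face with heart-eyes
    "\U0001F621": "negative",  # pouting face
    "\U0001F622": "negative",  # crying face
}

def is_emoji(ch):
    """True if the character falls in one of the assumed emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)

def emoji_features(text):
    """Count emoji per symbol and per (hypothetical) type."""
    counts = Counter(ch for ch in text if is_emoji(ch))
    by_type = Counter(EMOJI_TYPE.get(e, "other") for e in counts.elements())
    return {
        "total": sum(counts.values()),
        "counts": dict(counts),
        "by_type": dict(by_type),
    }
```

The resulting dictionary can be appended to the textual features before classification.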
This web crawler prioritizes the crawling of web pages and creates summaries for news topics of different formats and content types. It also includes comments, posts, and other types of interactions from social media and blogs. It utilizes automatically generated web wrappers to detect, analyze, and extract web content based on its structure and content using methods such as XPath expressions, neural networks, and natural language processing and understanding. The effectiveness of these web page wrappers is evaluated, and they are automatically updated when the structure of the corresponding web pages changes. Additionally, web crawling policies are regularly updated to take into account factors such as the reputation, influence, and impact of a website.
Hellenic Industrial Property Organisation (OBI)
Web Crawling and Content Summarization (No 1010585)
The web crawler analyzes, aggregates, and summarizes the detected content. It prioritizes web crawling and creates news summaries for topics with various structures and content types, including comments, posts, and other interactions from social media and blogs. The crawler uses automatically generated web page wrappers to detect, analyze, and extract information from web pages based on their structure and content, utilizing XPath expressions, neural networks, and natural language processing and understanding. The effectiveness of the web page wrappers is evaluated and they are updated automatically when the structure of the associated web pages changes. The web crawling policies are also continuously updated to account for factors such as reputation, influence, impact, and references to a website. The crawling process uses multiple parallel queues, each implementing different micro-processes, that converge into a single priority queue used for crawling based on the crawling policies.
Sentiment Analysis of Web Page Content (No 1010537)
A system and approach for extracting sentiment from web data elements collected from various web sources are provided. The collected data is preprocessed to extract useful features that help machine learning algorithms in the sentiment analysis task. The words in each text are transformed into a neural embedding space and fed into a hybrid, bidirectional long short-term memory network with convolutional layers and an attention mechanism, which extracts the final textual features. Additionally, various document metadata such as emoji symbols are extracted, which further assist in detecting sentiment in the data elements and improve pattern recognition. The analysis problem is also simplified by reducing the data size through selective down-sampling and other methods.
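The text does not say what makes the down-sampling selective. One common reading, assumed here, is to cap over-represented sentiment classes while keeping rare ones intact. The function name, the `max_per_label` parameter, and the fixed random seed are all illustrative.

```python
import random
from collections import defaultdict

def downsample(items, label_of, max_per_label, seed=0):
    """Keep at most `max_per_label` items per label, sampling the rest away.

    items:     any list of records
    label_of:  function returning the class label of a record
    seed:      fixed so the sketch is reproducible
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)

    kept = []
    for group in by_label.values():
        if len(group) > max_per_label:
            group = rng.sample(group, max_per_label)
        kept.extend(group)
    return kept
```

Classes that are already under the cap pass through unchanged, so only the dominant classes shrink.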
- #1: Use the industry's top artificial intelligence to handle the heavy work for you and gain insights in minutes.
- #2: Receive an alert if something major occurs near your customer.
- #3: Identify the influencers, material, and messaging required to generate success in real-time.
- #4: Manage cross-channel campaigns with multidisciplinary groups and infinite channels.
- #5: Monitor engagement and sentiment to get valuable insights.
- #6: Monitor trending topics of discussion among users.