Build Your Own Threat Intel Platform (No Budget Required) | DIY TIP Guide

Summary

This presentation guides the audience through building a DIY Threat Intelligence Platform (TIP). It emphasizes that a TIP is more of a thought process than just a tool, focusing on contextualizing information into actionable intelligence. The speaker shares their personal journey of creating a TIP due to organizational needs and lack of existing solutions. Key takeaways include the distinction between information and intelligence, the importance of context and critical thinking, and the need to deliver intelligence effectively to stakeholders. The presentation also delves into sourcing, finding RSS feeds, data storage, automation, and the limitations of Generative AI in threat intelligence.

Key Insights

Information is not intelligence; context is crucial for actionable insights.

The speaker strongly emphasizes the difference between raw data (information) and processed, contextualized data (intelligence). Many vendors provide Indicators of Compromise (IoCs) or data feeds, but they often lack the organizational context, relevance, and critical thinking needed to make that information truly valuable for decision-making. Intelligence requires understanding the specific organization's priorities, stakeholders, and business goals to determine what is relevant and how it impacts them. This contextualization, paired with critical thinking to project, forecast, or provide guidance, transforms information into intelligence. Simply collecting vast amounts of data without this process does not equate to building intelligence.

Focus on a specific problem and audience to build an effective, minimum viable TIP.

The presentation advises against trying to build a comprehensive TIP that attempts to be everything to everyone. Instead, it advocates for a 'minimum viable DIY' approach. This involves identifying a very specific problem, a particular type of information to collect, a defined stakeholder or tool that will receive the intelligence, and the desired organizational action or security improvement. This narrow focus allows for a more manageable and effective development process, ensuring the resulting intelligence is relevant and impactful for its intended audience. Trying to do too much at once leads to a 'nice-to-have' output that isn't operationally useful or impactful.

Generative AI has significant limitations in threat intelligence and should not replace human critical thinking.

While Generative AI (GenAI) can assist with tasks like summarization and pattern identification, it has critical limitations for threat intelligence. The speaker highlights its weaknesses in attribution and malware analysis, its susceptibility to hallucinations, and its inconsistent output. GenAI struggles with multi-step instructions and can exhibit 'first-third' bias in data retrieval. Crucially, the speaker warns against feeding proprietary or sensitive internal data into public commercial GenAI models due to security risks and vendor incentives, emphasizing that these models are not designed for secure operations. For true intelligence, human oversight, critical thinking, and validation against reliable sources are essential, rather than blindly trusting AI outputs.

Sections

Introduction to DIY Threat Intelligence Platform (TIP)

The session focuses on building a functional threat intelligence platform from scratch.

The speaker clarifies that this is not a typical GenAI session and that the exercises are take-home. The presentation aims to provide a refreshingly practical approach to building a threat intelligence platform, which the speaker initially developed out of personal necessity.

The speaker's personal journey led to developing a TIP.

The speaker explains how, before transitioning to cybersecurity, they struggled with managing and finding downloaded NIST reports. This personal challenge, exacerbated by ADHD, led to an attempt at organizing digital files. The realization that this process mirrored a threat intelligence platform occurred after joining the cybersecurity field.

A TIP is a thought process, not just a platform or tool.

The speaker defines a TIP as less of a rigid platform and more of a thought process. The framework involves identifying reliable sources, triaging information, storing key data, contextualizing it, and distributing actionable intelligence to relevant stakeholders. The process aims to enable informed decision-making, even if the outcome is risk acceptance.

The core framework of developing a TIP is transferable.

The speaker emphasizes that the process and framework for building a TIP are universal, even if the specific tools, skills, or reasons for building it may differ for each individual. The goal is to offer this adaptable framework to the audience.

Understanding the TIP Process

The TIP process involves several key stages from source to action.

The process begins with identifying a reliable source that produces new, useful information. This information enters a channel (e.g., RSS feed), is triaged from noise, and important bits are extracted. The extracted data is stored, contextualized, and finally delivered to those who need it to take action. Even risk acceptance by a stakeholder is considered a successful outcome.

Distributing intelligence effectively is crucial for stakeholder engagement.

The intelligence produced must be transmitted in a way that enables organizational action. This involves ensuring stakeholders process the information, not just glance at it. This distribution and contextualization aspect is often what commercial TIPs offer at a high cost.

Various methods exist for intelligence distribution, from simple to automated.

Intelligence can be distributed via email, social media, or more advanced automated methods. The speaker notes that they sometimes developed custom formats to translate information effectively, essentially reinventing the wheel to meet specific communication needs.

Key Principles for Effective Threat Intelligence

Information requires context to become intelligence.

Raw data like IoCs are not intelligence on their own. Intelligence is derived from contextualizing this information for a specific organization, its priorities, and its stakeholders. It involves critical thinking to project a forecast or provide guidance, not just automating data collection or scraping feeds.

Collecting data does not equal knowing its contents or value.

Simply downloading reports or collecting feeds doesn't mean the information is understood or actionable. To claim expertise or ensure compliance, the information must be processed, understood, and, most importantly, transmitted effectively to drive organizational progress. Collection without transmission is essentially worthless.

Focus on the end goal, not the tools or technical impressiveness.

The primary goal is to protect the organization by delivering actionable intelligence. The tools used (e.g., official CTI tools vs. simple scripting, or WYSIWYG editors vs. complex coding) are secondary to achieving the objective. Impressive technical skills are less valuable than successfully convincing stakeholders to invest in necessary security measures.

Exercise 1: Defining Your Minimum Viable TIP

Be highly specific about the purpose and scope of your TIP.

The first exercise involves defining why you want to build a TIP. It's crucial to be extremely specific, as you cannot be everything to everyone. Narrowing the focus on a specific audience or problem makes your message more attention-grabbing and credible.

Start with one component: a stakeholder, a tool, or a process.

When building your TIP, don't try to do everything at once. Pick one piece of information, one stakeholder, or one tool to get right initially. This minimum viable product (MVP) approach is essential for a successful DIY project.

Align your TIP's focus with the needs and interests of the intended audience.

The speaker's personal failure stemmed from creating a TIP that was 'cool' to them but not useful or important to subscribers. To ensure adoption, the TIP must address what the target tool or person considers important, relevant, or impactful for their operations.

Define a specific persona, information type, recipient, and desired outcome.

The exercise requires identifying: 1. A very specific person or archetype (e.g., a developer, your boss). 2. A specific type of information (e.g., exploited vulnerabilities in a certain tech stack, not all cybersecurity headlines). 3. Who the resulting intelligence will go to (e.g., a SIEM, a firewall, yourself). 4. The desired organizational security improvement or action enabled by the intelligence.
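One way to keep the four answers honest is to write them down as a small record and check that none is left blank. The sketch below is illustrative only; the class name, field names, and example values are assumptions, not something shown in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TipScope:
    """The four answers the exercise asks for (names are hypothetical)."""
    persona: str           # 1. the specific person or archetype
    information_type: str  # 2. the narrow kind of information to collect
    recipient: str         # 3. the person or tool that receives the output
    desired_outcome: str   # 4. the action or improvement it should enable

    def unanswered(self) -> list:
        """Return the names of any fields that are still blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


scope = TipScope(
    persona="a developer on one product team",
    information_type="exploited vulnerabilities in that team's tech stack",
    recipient="SIEM",
    desired_outcome="",  # not decided yet
)
print(scope.unanswered())
```

If any field comes back unanswered, the scope is probably not yet narrow enough to build against.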

Examples of narrowed scope include focusing on exploited vulnerabilities and specific threat behaviors.

The speaker shares their own feed's scope: specifically exploited vulnerabilities or risky behaviors, rather than major breaches or patches. This narrow focus allows other tools to handle broader information, while their TIP delivers highly relevant, actionable intelligence.

Sourcing Information for Your TIP

Prioritize primary sources over secondary media outlets.

While secondary sources like news media summarize technical topics into impact statements, they are often delayed and spun. The speaker recommends drilling down to the original primary sources for more timely and accurate information. Reputable outlets usually hyperlink their sources.

Primary sources can include research, blogs, or even individuals with valuable insights.

Primary sources don't have to be formal research papers. They can be individuals who effectively explain difficult concepts, curate important information, or provide unique perspectives. The key is that they provide information that fits your process and is usable.

Evaluate source credibility by considering academic standards and potential bias.

Two rules of thumb for source evaluation: 1. Would your high school teacher accept this source for an essay? If not, reconsider its validity or seek corroborating information. 2. Understand the source's bias and motive. Why are they sharing this information? What do they gain? This context is vital for interpreting the information.

Cybersecurity vendors' motives should be considered when evaluating their content.

The speaker cautions that vendors offering free talks or content often have a motive, such as marketing their products. Understanding this bias helps in critically evaluating the information presented and recognizing that information alone is not intelligence.

A list of starting sources is provided, but vetting remains essential.

The presentation offers a list of initial sources to help attendees get started but stresses that personal common sense and vetting are required. The speaker cannot guarantee the veracity or security of any recommended source.

Finding and Creating RSS Feeds

Many primary sources obscure RSS feeds to drive traffic to their sites.

Vendors often hide RSS feeds to encourage users to visit their websites, where they can be marketed to. However, blogs, bulletins, and notifications commonly use backend systems that support RSS, even if not prominently displayed.

Google's advanced search can create custom RSS feeds.

A valuable technique is using Google's advanced search capabilities with specific boolean queries to create custom RSS feeds. The RSS icon, which resembles a Wi-Fi symbol, often indicates an RSS feed link on search results pages. While this can generate a lot of noise, it effectively captures information.

Regularly check RSS feeds to ensure they are still active.

RSS feeds are not a 'set and forget' process. The speaker recommends checking them periodically (at least weekly) to confirm they are still working and do not need to be reset.
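A periodic check like this can be partly scripted. The following is a minimal sketch, assuming an RSS 2.0 feed whose items carry `pubDate` elements and assuming a 30-day silence threshold; both the threshold and the function names are illustrative choices, not values from the talk. It works on feed text that has already been downloaded, so it makes no network calls.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _parse_date(text: str) -> Optional[datetime]:
    """Parse an RFC 822 style date; return None if it cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Treat dates without a usable offset as UTC so they can be compared.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def newest_item_date(rss_xml: str) -> Optional[datetime]:
    """Return the most recent item pubDate in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    dates = []
    for node in root.findall("./channel/item/pubDate"):
        parsed = _parse_date(node.text or "")
        if parsed is not None:
            dates.append(parsed)
    return max(dates) if dates else None


def is_stale(rss_xml: str, now: datetime,
             max_age: timedelta = timedelta(days=30)) -> bool:
    """True if the feed has no dated items or its newest item is too old."""
    newest = newest_item_date(rss_xml)
    return newest is None or now - newest > max_age
```

A feed flagged as stale is only a prompt to look at it by hand: some sources legitimately publish rarely, so the threshold should match the source.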

Use browser developer tools to find hidden RSS feeds in website source code.

By viewing a webpage's source code and searching for 'RSS', 'atom', or 'xml', users can often find the hidden RSS feed links. This involves right-clicking on the page and selecting 'View Page Source'.
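The same search can be scripted. Many sites advertise their feeds in the page head with a `<link rel="alternate">` tag whose `type` is `application/rss+xml` or `application/atom+xml`. The sketch below, using only the Python standard library, looks for those tags in HTML that has already been saved; the function and class names are illustrative, and sites that do not use this convention will return nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_values and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def find_feeds(html: str, base_url: str) -> list:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

When this returns an empty list, the manual search for 'RSS', 'atom', or 'xml' in the source is still worth doing, since some feeds are linked only from the page body or footer.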

Brute-forcing feed URLs and checking sitemaps are alternative methods.

If direct methods fail, one can try appending common feed terms like '/feed' or '/rss' to a blog's URL. Navigating through a website's sitemap or robots.txt file can also sometimes lead to RSS feed locations, though these methods can be more complex.
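As a rough sketch of that guessing approach, the snippet below builds a list of candidate URLs to try by hand or with a script. Only '/feed' and '/rss' come from the presentation; the other paths are common conventions added here as assumptions, and the function name is hypothetical.

```python
from urllib.parse import urljoin

# "feed" and "rss" are from the talk; the rest are assumed common conventions.
COMMON_FEED_PATHS = ["feed", "rss", "feed.xml", "rss.xml", "atom.xml", "index.xml"]


def candidate_feed_urls(blog_url: str) -> list:
    """Return feed URLs worth trying for a given blog URL."""
    base = blog_url if blog_url.endswith("/") else blog_url + "/"
    return [urljoin(base, path) for path in COMMON_FEED_PATHS]


for url in candidate_feed_urls("https://blog.example/research"):
    print(url)
```

Each candidate still has to be requested and checked; a page that loads is not necessarily a feed.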

Practice identifying and subscribing to RSS feeds is recommended.

An interactive exercise is included where attendees are encouraged to practice finding RSS feeds on websites. This hands-on approach ensures they can apply the techniques discussed. The speaker notes that finding feeds on mobile devices can be challenging.

RSS feed creation tools can help if direct finding is difficult.

Several applications and services exist that can help create or find RSS feeds if direct methods are unsuccessful. Attendees are cautioned to use these tools at their own risk as the speaker has not tested all of them extensively.

Organizing and Reviewing Feed Data

A dashboard approach helps manage feed overwhelm.

With numerous feeds, a dashboard is essential for reviewing information without getting overwhelmed. The speaker suggests dedicating about 15 minutes a day to this task and avoiding letting feeds pile up. Having a backup plan is also advised.

Categorize feeds to enable efficient API connections and automation.

The speaker organizes feeds into categories like 'Niche Media' (individual researchers), 'Mid Media' (tech outlets like Dark Reading, Hacker News), and 'Big Media' (mainstream news like WSJ). This categorization facilitates API hooks for automation and filtering.

Newsletters can be integrated into RSS feeds to avoid inbox clutter.

For newsletters, particularly those with subscription models like Substack, integrating them into an RSS feed (often a paid feature) prevents them from getting lost in a crowded inbox.

Save primary resources like PDFs or specific articles as artifacts.

When a valuable piece of information is found, especially if it cites a larger PDF or resource, save that artifact. This is important because niche researchers' content can disappear if companies fold or individuals move on. URLs can become dead links over time.

Use boards or tags within your system for further filtering and analysis.

The speaker uses 'boards' in their system, which function as tags that can trigger automation. These tags are pulled into a database as variables, allowing for more granular filtering later on.

Feed collection tools can be integrated into communication platforms like Slack or Teams.

Many teams already use Slack or Teams for information sharing. RSS feeds can be piped into these platforms. Bots can be configured to react to specific emojis, triggering further actions or organization within these tools, promoting visibility of work.

Automation in Threat Intelligence

Automation is essential for managing large volumes of data and preventing burnout.

Automation predates GenAI and is crucial for managing the influx of information. It's not just about efficiency; it's about sustainability in the role, preventing the need to manually process everything.

Metadata is key for synthesizing and categorizing information.

Basic variables (metadata) are necessary to understand and process collected information. This includes data sensitivity labels, which are critical for controlling access and appropriate distribution, ensuring sensitive information is handled correctly.

Data sensitivity labels are crucial for controlling access and distribution.

Sensitivity labels help determine who can see what information. This can range from formal TLP (Traffic Light Protocol) classifications to internal labels determining if certain departments should see specific data to avoid unnecessary alarm or action.
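A label only helps if something enforces it. The sketch below filters items by a sensitivity ceiling per audience, using the TLP 2.0 label names. Treating TLP as a simple ordered scale is a simplification made for illustration (each TLP label carries its own sharing rules), and the item structure and function name are assumptions.

```python
# TLP 2.0 labels, ordered here from least to most restrictive.
TLP_ORDER = ["TLP:CLEAR", "TLP:GREEN", "TLP:AMBER", "TLP:AMBER+STRICT", "TLP:RED"]
TLP_RANK = {label: rank for rank, label in enumerate(TLP_ORDER)}


def visible_items(items: list, audience_ceiling: str) -> list:
    """Keep items whose label is at or below the audience's ceiling.

    Items with a missing or unknown label are withheld, on the assumption
    that unlabeled data should be treated as the most sensitive.
    """
    ceiling = TLP_RANK[audience_ceiling]
    withheld_rank = len(TLP_ORDER)
    return [
        item for item in items
        if TLP_RANK.get(item.get("tlp"), withheld_rank) <= ceiling
    ]


items = [
    {"title": "Public advisory", "tlp": "TLP:CLEAR"},
    {"title": "Partner report", "tlp": "TLP:AMBER"},
    {"title": "Unlabeled note"},
]
print([item["title"] for item in visible_items(items, "TLP:GREEN")])
```

Internal labels (for example, which departments should see an item) could be handled the same way with a separate field.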

Vendor TIPs can be restrictive with data sharing and TLP classifications.

Commercial TIPs and their data feeds may have strict rules regarding how data, especially restricted information such as items labeled TLP:AMBER, can be shared or integrated into automated systems. Misusing this data can lead to sanctions.

Leverage HTML and schema.org for extracting metadata from web sources.

The structure of web pages (HTML) and standards like schema.org provide standardized ways to format information for search engines, which can be exploited to extract metadata programmatically. This is where search bots and automation begin.
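One common way sites publish schema.org data is a JSON-LD block inside a `<script type="application/ld+json">` tag. The sketch below pulls a few schema.org properties (`headline`, `datePublished`, `keywords`, `url`) out of such a block using only the standard library. It assumes the page uses JSON-LD with a single top-level object; pages that use microdata, RDFa, or nested `@graph` structures would need more handling. The class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD documents from script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than fail the whole page.


def extract_metadata(html: str) -> dict:
    """Return selected properties from the first JSON-LD object with a headline."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for doc in parser.documents:
        if isinstance(doc, dict) and "headline" in doc:
            keys = ("headline", "datePublished", "keywords", "url")
            return {key: doc.get(key) for key in keys}
    return {}
```

Properties the page does not supply come back as `None`, which makes gaps visible before the record reaches a database.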

Automation can extract metadata, preserve relevant data, and populate a database.

A typical automation process involves pulling data from feeds, extracting metadata (like descriptions, tags, source URLs), preserving essential elements, and organizing it into a chosen database. This process can be triggered by various conditions.
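The flow described above can be sketched end to end in a few lines. This example assumes an RSS 2.0 feed that has already been downloaded and uses SQLite as a stand-in for whichever database is chosen; the table name, column names, and function names are all illustrative, and it is not the speaker's implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_items(rss_xml: str):
    """Yield one metadata dict per RSS item."""
    root = ET.fromstring(rss_xml)
    for item in root.findall("./channel/item"):
        categories = [c.text.strip() for c in item.findall("category") if c.text]
        yield {
            "url": (item.findtext("link") or "").strip(),
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "tags": ",".join(categories),
        }


def store_items(conn: sqlite3.Connection, items) -> int:
    """Insert items not seen before; return how many new rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS intel ("
        "url TEXT PRIMARY KEY, title TEXT, description TEXT, tags TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM intel").fetchone()[0]
    rows = [item for item in items if item["url"]]  # skip items with no link
    conn.executemany(
        "INSERT OR IGNORE INTO intel (url, title, description, tags) "
        "VALUES (:url, :title, :description, :tags)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM intel").fetchone()[0]
    return after - before
```

Using the source URL as the primary key means the same feed can be processed repeatedly without creating duplicates, which suits a job triggered on a schedule.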

Consider archiving older references as threat intelligence has a decay rate.

Threat intelligence research and data decay over time. Mechanisms should be in place to archive older or outdated information, acknowledging that IP addresses rotate and research needs updating. Threat reports, for instance, require periodic review.
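A simple archiving rule is an age limit per record type. In the sketch below, the record kinds and the limits (30 days for IP addresses, 365 days for threat reports, 180 days otherwise) are placeholder assumptions chosen only to show the mechanism; appropriate values depend on the organization and the source.

```python
from datetime import date, timedelta

# Placeholder limits; tune these to the data and the organization.
MAX_AGE = {
    "ip_address": timedelta(days=30),
    "threat_report": timedelta(days=365),
}
DEFAULT_MAX_AGE = timedelta(days=180)


def split_by_freshness(records: list, today: date):
    """Split records into (current, archive) lists based on their age."""
    current, archive = [], []
    for record in records:
        limit = MAX_AGE.get(record["kind"], DEFAULT_MAX_AGE)
        if today - record["collected"] > limit:
            archive.append(record)
        else:
            current.append(record)
    return current, archive
```

Archived records are moved aside rather than deleted, so an older report can still be reviewed and refreshed later.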

Automation possibilities range from simple to complex, custom-coded solutions.

Whether using cloud services, open-source tools, or custom scripts, the focus should be on achieving the goal of getting intelligence processed and delivered, not on creating overly complex or impressive systems. The goal is utility, not just technical prowess.

Storing Threat Intelligence Data

Choose a database platform that suits your needs and technical skills.

Various database options exist, from personal databases like Airtable (a sophisticated spreadsheet) to enterprise solutions like Azure. Even a simple table of contents linking to file folders can work, but the speaker found that approach cumbersome for extensive data.

Develop a structured database with detailed descriptions for rich data retrieval.

The speaker's personal database uses Airtable. For cyber threat intelligence, detailed descriptions are crucial, especially for AI processing, as they make information modular, sequential, and tagged. This plain text summary is more valuable than a PDF for contextual understanding.

Plain text summaries with section headers aid AI processing and searchability.

When extracting information, converting it into plain text with clear section headers (corresponding to database variables) significantly improves its usability, particularly for AI tools. This allows for easier searching and retrieval of specific intelligence, such as all resources on 'Scattered Spider'.
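To illustrate, the sketch below renders a database record as plain text with one header per field and then searches across records. The field names and the sample record are hypothetical; in practice the headers would mirror whatever variables the database actually uses.

```python
DEFAULT_FIELDS = ("Title", "Source", "Threat Actor", "TTPs", "Summary")


def to_plain_text(record: dict, fields=DEFAULT_FIELDS) -> str:
    """Render a record as plain text with one upper-case header per field."""
    sections = []
    for field in fields:
        value = record.get(field)
        if isinstance(value, (list, tuple)):
            value = "\n".join(f"- {entry}" for entry in value)
        sections.append(f"{field.upper()}\n{value or 'Not recorded'}")
    return "\n\n".join(sections)


def search(records: list, term: str) -> list:
    """Return records whose plain-text form mentions the term."""
    needle = term.lower()
    return [r for r in records if needle in to_plain_text(r).lower()]


record = {
    "Title": "Example help-desk social engineering write-up",
    "Source": "https://example.com/report",
    "Threat Actor": "Scattered Spider",
    "TTPs": ["Help-desk impersonation", "MFA fatigue"],
}
print(to_plain_text(record))
```

Missing fields are printed as 'Not recorded' rather than dropped, so a reviewer (or an AI tool) can see what was never captured.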

AI can summarize articles, but human review is needed for accuracy and completeness.

The speaker provides an example of an AI-summarized article on a 'Striker hack', noting that while AI provides a summary, human review is necessary to ensure accuracy, completeness, and to catch missed crucial details like TTPs (Tactics, Techniques, and Procedures).

Consider available database tools, but focus on selecting one that works for you.

A selection of database platforms is presented, emphasizing that the best choice is the one the user can effectively implement and use. The goal is to organize collected information in a way that makes sense for analysis and retrieval.

Distributing Threat Intelligence

Distribution is key; avoid hoarding information without sharing.

Intelligence is only valuable if it reaches the right people in the right format. Simply sharing a link is insufficient. The act of sharing must enable decision-making and action within the organization.

Tailor intelligence delivery to your audience's needs and decision-making processes.

Focus on the audience rather than on your own preferences. The intelligence must be packaged in a way that the recipient (person or tool) can use it to make a decision. This might involve producing briefings or reports in a specific, desired format.

Proactively build trust and educate stakeholders before crises occur.

Threat analysts should not wait for incidents to happen to communicate. Being proactive in educating and informing stakeholders builds trust and credibility, making them more receptive to critical information when emergencies arise. It helps build buy-in for security measures.

Synthesize and double-check intelligence before distribution.

When preparing intelligence for distribution, it's vital to synthesize the information and rigorously double-check its accuracy and completeness. AI summaries might miss critical details like living-off-the-land tactics or specific TTPs, requiring human correction.

Automate formatting or adapt to stakeholder preferences for faster, more effective delivery.

The speaker shares an example where they adapted their information format to suit the needs of the emerging threat team, even if it wasn't their preferred method. This pragmatic approach, focused on getting people safer faster, is more important than personal preference.

Utilize various distribution channels, including internal communication platforms.

Distribution channels can include Slack, Teams, or formal briefings. Using emoji codes in Slack can signal new briefings or covered tactics. Adaptability in presentation format, even if it requires extra work, ensures the intelligence is used effectively.
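As one possible shape for the Slack case, the sketch below builds the JSON body that a Slack incoming webhook accepts (an object with a `text` field) and prefixes it with an emoji code. The mapping of emoji codes to meanings is invented for illustration, as is the function name, and the snippet only constructs the payload; sending it to a webhook URL is left out.

```python
import json

# Hypothetical team convention: which emoji code signals which kind of post.
EMOJI_BY_KIND = {
    "new_briefing": ":newspaper:",
    "tactic_covered": ":white_check_mark:",
}


def build_slack_payload(kind: str, title: str, url: str) -> str:
    """Return a JSON string suitable as the body of an incoming-webhook POST."""
    emoji = EMOJI_BY_KIND.get(kind, ":information_source:")
    return json.dumps({"text": f"{emoji} {title}\n{url}"})


print(build_slack_payload("new_briefing", "Weekly briefing", "https://example.com/briefing"))
```

Keeping the emoji convention in one place makes it easier to change when stakeholders ask for a different format.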

Generative AI in Threat Intelligence: Capabilities and Limitations

GenAI can perform tasks like summarization and pattern identification but requires significant skill and oversight.

GenAI tools can summarize content and identify patterns, which can be useful for initial processing. However, the speaker stresses that this requires skill to implement effectively and should not be mistaken for fully automated intelligence production. The output often needs correction.

GenAI struggles with attribution, malware, and consistently applied multi-step instructions.

Key limitations of current GenAI include poor attribution accuracy, difficulty with complex malware analysis, inconsistencies in output (even with the same prompt), and a tendency to hallucinate or invent information. It also struggles to reliably follow multi-step instructions.

Do not outsource your critical thinking or trust GenAI with sensitive internal data.

The speaker strongly advises against relying on GenAI for critical thinking or analysis. Furthermore, feeding proprietary or internal data into public commercial GenAI models is a significant security risk, as these models prioritize data extraction and are not designed for secure handling of sensitive information.

Commercial GenAI models are not incentivized for security or data privacy.

Publicly available GenAI models are built by companies incentivized to develop fast, gather data, and potentially break things. They are not typically run by security-focused individuals, and their operational motives may not align with secure data practices. Trusting them implicitly is risky.

GenAI can be a useful tool when used cautiously, with verification and grounding techniques (like RAG).

When used as a tool, GenAI can assist with certain tasks like summarization or providing definitions. However, outputs must be diligently checked against known good sources. While Retrieval-Augmented Generation (RAG) can enhance accuracy, setting it up requires significant expertise. Users must always demand sources for AI-generated information.

Homework assignment: Compare GenAI outputs over time and with varying temperature settings.

Attendees are given a homework assignment: use a GenAI tool for summarization multiple times on the same text and compare the outputs. They are also encouraged to experiment with 'temperature' settings (controlling randomness) to see how it affects accuracy and output variation. This highlights GenAI's inconsistency.
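For the comparison step, a rough similarity score between saved outputs can make the variation easier to see. The sketch below uses Python's `difflib.SequenceMatcher`, which measures character-level overlap; that is a crude proxy and says nothing about whether a summary is accurate, so it supplements reading the outputs rather than replacing it. The function names are illustrative, and the outputs are assumed to have been collected by hand and pasted in as strings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(outputs: list) -> list:
    """Return (i, j, ratio) for every pair of outputs; ratio 1.0 means identical."""
    return [
        (i, j, SequenceMatcher(None, outputs[i], outputs[j]).ratio())
        for i, j in combinations(range(len(outputs)), 2)
    ]


def least_consistent_pair(outputs: list):
    """Return the pair of runs whose outputs differ the most."""
    return min(pairwise_similarity(outputs), key=lambda entry: entry[2])


runs = [
    "Attackers used phishing to gain initial access.",
    "Attackers used phishing to gain initial access.",
    "The group exploited a VPN flaw for initial access.",
]
for i, j, ratio in pairwise_similarity(runs):
    print(f"run {i} vs run {j}: {ratio:.2f}")
```

Recording the temperature setting alongside each saved output makes it possible to see whether lower settings narrow the spread.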
