Machine Learning and Security

Why Machine Learning & Security?

Spam

As soon as academics and scientists had hooked enough computers together via the internet to create a communications network that provided value, other people realized that this medium of free transmission and broad distribution was a perfect way to advertise sketchy products, steal account credentials, and spread computer viruses.

Machine Learning

Machine learning was not invented by the spam fighters, but it was quickly adopted by statistically inclined technologists who saw its potential in dealing with a constantly evolving source of abuse. Email providers and Internet Service Providers (ISPs) have access to a wealth of email content, metadata, and user behavior. Using email data, content-based models can be built to create a generalizable approach to recognizing spam. Metadata and entity reputation can be extracted from emails to predict the likelihood that an email is spam without even looking at its content. By instantiating a user behavior feedback loop, the system can build a collective intelligence and improve over time with the help of its users.
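The reputation and feedback-loop ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SenderReputation`, its methods, and the smoothing pseudo-counts are hypothetical and not taken from any real mail system. It scores a sender purely from user reports, without reading message content.

```python
from collections import defaultdict


class SenderReputation:
    """Hypothetical helper: per-sender spam score built from user reports."""

    def __init__(self, prior_spam=1.0, prior_ham=1.0):
        # Pseudo-counts (an assumption) so an unseen sender starts at 0.5.
        self.prior_spam = prior_spam
        self.prior_ham = prior_ham
        self.spam_reports = defaultdict(int)
        self.ham_reports = defaultdict(int)

    def record_feedback(self, sender, is_spam):
        """Feedback loop: called when a user marks a message spam / not spam."""
        if is_spam:
            self.spam_reports[sender] += 1
        else:
            self.ham_reports[sender] += 1

    def spam_score(self, sender):
        """Smoothed fraction of reports that flagged this sender as spam."""
        spam = self.spam_reports[sender] + self.prior_spam
        ham = self.ham_reports[sender] + self.prior_ham
        return spam / (spam + ham)


reputation = SenderReputation()
for _ in range(3):
    reputation.record_feedback("bulk@example.com", is_spam=True)
print(reputation.spam_score("bulk@example.com"))  # 0.8
print(reputation.spam_score("new@example.com"))   # 0.5 (no reports yet)
```

Every report nudges the score, so the estimate improves as more users contribute. A production system would also have to handle spoofed senders and malicious reports, which this sketch ignores.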

Email filters have thus gradually evolved to deal with the growing diversity of circumvention methods that spammers have thrown at them. Even though 85% of all emails sent today are spam (according to one research group), the best modern spam filters block more than 99.9% of all spam. These results demonstrate an enormous advance over the simplistic spam filtering techniques developed in the early days of the internet, which made use of simple word filtering and email metadata reputation to achieve modest results.
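To make "simple word filtering" concrete, here is a minimal sketch of a keyword blocklist filter. The word list and the function name `is_spam_by_keywords` are made up for illustration. It also shows why such filters achieve only modest results: a trivial character substitution slips past an exact-match list.

```python
import re

# Illustrative blocklist only; real lists were far longer.
BLOCKLIST = {"viagra", "lottery", "winner"}


def is_spam_by_keywords(message, blocklist=BLOCKLIST):
    """Flag a message if any token exactly matches a blocklisted word."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return any(token in blocklist for token in tokens)


print(is_spam_by_keywords("You are a WINNER of the lottery"))  # True
print(is_spam_by_keywords("You are a w1nner of the l0ttery"))  # False
print(is_spam_by_keywords("Lunch at noon?"))                   # False
```

The second message is obviously the same spam to a human reader, yet the filter misses it. This kind of circumvention is what pushed filters toward learned, generalizable models.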


Cyber Threat Landscape

Malware
        Short for "malicious software," any software designed to cause harm or gain unauthorized access to computer systems.

Worm
        Standalone malware that replicates itself in order to spread to other computer systems.

Trojan
        Malware disguised as legitimate software to avoid detection.

Spyware
        Malware installed on a computer system without the permission and/or knowledge of the operator, for the purposes of espionage and information collection. Keyloggers fall into this category.

Adware
        Malware that injects unsolicited advertising material (e.g., pop-ups, banners, videos) into a user interface, often when a user is browsing the web.

Ransomware
        Malware designed to restrict availability of computer systems until a sum of money (ransom) is paid.

Rootkit
        A collection of (often) low-level software designed to enable access to or gain control of a computer system. ("Root" denotes the most powerful level of access to a system.)

Backdoor
        An intentional hole placed in the system perimeter to allow for future accesses that can bypass perimeter protections.

Bot
        A variant of malware that allows attackers to remotely take over and control computer systems, making them zombies.

Botnet
        A large network of bots.

Exploit
        A piece of code or software that exploits specific vulnerabilities in other software applications or frameworks.

Scanning
        Attacks that send a variety of requests to computer systems, often in a brute-force manner, with the goal of finding weak points and vulnerabilities as well as gathering information.

Sniffing
        Silently observing and recording network and in-server traffic and processes without the knowledge of network operators.

Keylogger
        A piece of hardware or software that (often covertly) records the keys pressed on a keyboard or similar computer input device.

Spam
        Unsolicited bulk messaging, usually for the purposes of advertising. Typically email, but could be SMS or through a messaging provider (e.g., WhatsApp).

Login attack
        Multiple, usually automated, attempts at guessing credentials for authentication systems, either in a brute-force manner or with stolen/purchased credentials.

Account takeover (ATO)
        Gaining access to an account that is not your own, usually for the purpose of downstream selling, identity theft, monetary theft, and so on. Typically the goal of a login attack, but can also be small scale and highly targeted (e.g., spyware, social engineering).

Phishing (aka masquerading)
        Communications in which an attacker pretends to be a reputable entity or person in order to induce the target to reveal personal information or to hand over private assets.

Spear phishing
        Phishing that is targeted at a particular user, making use of information about that user gleaned from outside sources.

Social engineering
        Information exfiltration (extraction) from human beings using nontechnical methods such as lying, trickery, bribery, blackmail, and so on.

Incendiary speech
        Discriminatory, discrediting, or otherwise harmful speech targeted at an individual or group.

Denial of service (DoS) and distributed denial of service (DDoS)
        Attacks on the availability of systems through high-volume bombardment and/or malformed requests, often also breaking down system integrity and reliability.

Advanced persistent threats (APTs)
        Highly targeted network or host attacks in which a stealthy intruder remains intentionally undetected for long periods of time in order to steal and exfiltrate data.

Zero-day vulnerability
        A weakness or bug in computer software or systems that is unknown to the vendor, allowing for potential exploitation (called a zero-day attack) before the vendor has a chance to patch or fix the problem.


What is Machine Learning?

A long-standing goal in computing is to teach computers to reason and make "intelligent" decisions in the way that humans do, by drawing generalizations and distilling concepts from complex information sets without explicit instructions.

Machine learning refers to one aspect of this goal, specifically to algorithms and processes that "learn" in the sense of being able to generalize from past data and experiences in order to predict future outcomes. At its core, machine learning is a set of mathematical techniques, implemented on computer systems, that enables a process of information mining, pattern discovery, and drawing inferences from data.

Types of Machine Learning

Supervised machine learning methods adopt a Bayesian approach to knowledge discovery, using probabilities of previously observed events to infer the probabilities of new events.
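As a rough illustration of using previously observed events to infer the probability of a new one, here is a tiny naive Bayes spam classifier in plain Python. It is a sketch, not a reference implementation: the function names, the add-one smoothing, and the four training messages are assumptions made for this example, and it expects both labels to appear in the training data.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_messages):
    """labeled_messages: iterable of (text, label), label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in labeled_messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def spam_probability(text, model):
    """Posterior P(spam | words) from Bayes' rule with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Prior: fraction of training messages carrying this label.
        log_score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            count = word_counts[label][word] + 1
            log_score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = log_score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


# Made-up labeled examples (the "previously observed events").
training = [
    ("win cash now", "spam"),
    ("cheap cash prize", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train_naive_bayes(training)
print(spam_probability("cash prize", model) > 0.5)        # True
print(spam_probability("meeting tomorrow", model) > 0.5)  # False
```

The labels are what make this supervised: the model only learns which words signal spam because each training message was tagged in advance.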

Unsupervised machine learning methods draw abstractions from unlabeled datasets and apply these to new data.
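One common unsupervised technique is k-means clustering. The sketch below is a one-dimensional toy version with hypothetical function names and made-up "logins per hour" values, none of which come from the original text. It groups unlabeled numbers into clusters and then assigns a new observation to the nearest one.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def assign(value, centroids):
    """Apply the learned abstraction: index of the closest centroid."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


# Made-up, unlabeled login counts per hour for a handful of accounts.
logins_per_hour = [1, 2, 3, 2, 40, 42, 45]
centroids = kmeans_1d(logins_per_hour, centroids=[0, 50])
print(assign(4, centroids))   # 0 (low-activity cluster)
print(assign(38, centroids))  # 1 (high-activity cluster)
```

Nothing in the data says which accounts are suspicious. The algorithm only discovers that there are two groups, and an analyst still has to decide what each group means.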

Both families of methods can be applied to problems of classification (assigning observations to categories) or regression (predicting numerical properties of an observation).
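The difference between the two problem types shows up in the output: regression returns a number, classification returns a category. A minimal sketch, assuming made-up data relating links per email to user complaints and an arbitrary threshold of 5 (all names and numbers here are hypothetical):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_complaints(links, slope, intercept):
    """Regression: the output is a numerical property."""
    return slope * links + intercept


def classify(links, slope, intercept, threshold=5.0):
    """Classification: the output is a category (here via a threshold)."""
    predicted = predict_complaints(links, slope, intercept)
    return "spam" if predicted >= threshold else "ham"


# Made-up observations: links per email vs. complaints received.
links_per_email = [1, 2, 3, 4]
complaints = [2, 4, 6, 8]
slope, intercept = fit_line(links_per_email, complaints)
print(predict_complaints(5, slope, intercept))  # 10.0
print(classify(1, slope, intercept))            # ham
print(classify(4, slope, intercept))            # spam
```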



