Sunday, February 14, 2016

Research about overview of Machine Learning and Data Science

Research Findings:
1.  Engineers who are great in both fields are basically unicorns and are at least 10x as valuable as someone who is great in just one of the fields. These are the engineers who don’t just work on algorithms or systems all day but instead launch personalization products in the market. These are the types of engineers who are behind the personalization teams at companies such as Amazon, Netflix, LinkedIn and many successful personalization startups
2.  Machine learning is about both breadth and depth. You are expected to know the basics of the most important algorithms (see my answer to What are the top 10 data mining or machine learning algorithms?). On the other hand, you are also expected to understand the low-level details of algorithms and their implementations. I think the approach I am describing addresses both these dimensions and I have seen it work.
3.  There are lots of folks in the market who are great engineers, and there are also lots of folks who are great at machine learning, but there is a severe shortage of great Machine Learning Engineers.
4.  Knowing R or Python really well might amount to building a model faster or allow you to integrate it into software better, but it says nothing about your ability to choose the right model, or build one that truly speaks to the challenge at hand.
5.  The art of being able to do machine learning well comes from seeing the core concepts inside the algorithms and how they overlap with the pain points being addressed. Great practitioners start to see interesting overlaps before ever touching a keyboard.
6.  Data science and machine learning are not synonyms, but much of the core algorithmic variety used by data scientists comes from machine learning.
7.  R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
8.  Just when you became an R ninja, Python came around the corner and became the de facto standard. Just when you finally mastered how to lay out a kick-ass data pipeline using Hadoop, Spark became the new thing.
9.  Many machine learning jobs look for people who have a PhD in some quantitative discipline. The reason for this is that the PhD is the only degree that trains its students in the discipline of advanced research. Importantly, obtaining a PhD means you have solved something truly original and defended that in front of leaders in the field.
10. Although many vendors would have you believe machine learning can be a "service" or something that is packaged, real machine learning takes real research.  If it could be a packaged solution, everyone would do it and there would be no competitive advantage in either academia or enterprise.
11.  Thinking like a researcher means working towards something truly original, regardless of how small that piece of originality might be.  Don't look for obvious solutions; look for that thing that models the system in an original way and that offers you, your client, or your employer something truly competitive and interesting.
12.  Revisit the core concepts regularly, practice solving problems relentlessly, and work towards trying to solve something original.
14.  Jump in and have fun.  Machine learning is an exciting field and will help solve many of today's most difficult challenges.
15.  Add more algorithms (decision trees, neural nets, etc.). Learn more domains and problems. Study techniques for handling unstructured data. There are wonderful courses in the thread.
16.  If you're looking for a slow but mature growth into machine learning, familiarity with the basic concepts of computer science (algorithms, data structures, and complexity) and mathematical maturity in discrete math, matrix math, and probability and statistics are all important for understanding the more complex parts of machine learning you can get into.
17.  Importantly, there are a lot of algorithms/paradigms in machine learning. While you should have some understanding of these, it is equally important to have the basic intuition of machine learning - concepts of bias-variance tradeoff, overfitting, regularization, duality, etc. These concepts are often used in most or all of machine learning, in some form or the other.
18.  Machine learning is responsible for everything from Siri and Google Now to the recommendation engines on YouTube and Netflix, and even driverless cars. Surely, machine learning is something that every computer scientist will encounter sooner or later, and it is important to learn this well.
19.  Machine learning was the byproduct of the early endeavours to develop artificial intelligence. The aim was to make a machine learn via data. But the use of this approach often resulted in reinventing already existing statistical models. This, coupled with the rise of knowledge- and logic-based approaches to AI, put machine learning out of favour among the AI community.
20.  Machine learning soon became a subfield of statistics and data mining.
21.  Machine Learning has become a separate field of its own. Instead of striving to achieve artificial intelligence, the main aim of machine learning has become more towards tackling solvable problems. It borrows techniques from statistics and probability to focus on predictions derived from data.
22.  AI includes concepts like symbolic logic, evolutionary algorithms and Bayesian statistics and many other concepts that don’t fall under the purview of machine learning.
23.  Apart from machine learning, AI tries to achieve a broad range of goals like Reasoning, Knowledge representation, Automated planning and scheduling, Natural language processing, Computer vision, Robotics and General intelligence.
24.  Machine learning, on the other hand, focuses on solving tangible, domain-specific problems through data and self-learning algorithms.
25.  In your free time, read papers like Google MapReduce [34], Google File System [35], Google Bigtable [36], The Unreasonable Effectiveness of Data [37], etc. There are great free machine learning books online and you should read those also [38][39][40].
26.  “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.
27.  Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data.
28.  If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.
29.  If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive.
30.  What data scientists do is make discoveries while swimming in data.
31.  But we would say the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.
32.  Some of the best and brightest data scientists are PhDs in esoteric fields like ecology and systems biology.
33.  The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?
34.  There simply aren’t a lot of people with their combination of scientific background and computational and analytical skills.
35.  Think of big data as an epic wave gathering now, starting to crest. If you want to catch it, you need people who can surf.
36.  A data scientist requires comprehensive mastery of a number of fields, such as software development, data munging, databases, statistics, machine learning and data visualization.
37.  A data scientist with a software engineering background might excel at a company like this, where it’s more important that a data scientist make meaningful data-like contributions to the production code and provide basic insights and analyses.
38.  Data Scientists in this setting likely focus more on producing great data-driven products than they do answering operational questions for the company. Companies that fall into this group could be consumer-facing companies with massive amounts of data or companies that are offering a data-based service.
39.  Google is currently using Machine Learning a lot - in my estimate, over a hundred places in their systems have been replaced by Deep Learning and other ML techniques in the past few years.
40.  Machine learning uses statistics (mostly inferential statistics) to develop self-learning algorithms.
41.  Artificial Intelligence is the science of developing systems or software that mimic how a human responds and behaves in a given circumstance.
42.  Machine Learning is a technology within the sphere of 'Artificial Intelligence'.
43.  The pioneering technology within Machine Learning is the neural network (NN), which mimics (to a very rudimentary level) the pattern recognition abilities of the human brain by processing thousands or even millions of data points. Pattern recognition is pivotal in terms of intelligence.
44.

Companies:
A Data Scientist is a Data Analyst Who Lives in San Francisco
Please Wrangle Our Data
We Are Data. Data Is Us
Reasonably Sized Non-Data Companies Who Are Data-Driven

Academic:
Maths
Statistics
Physics

Topics:
Logistic Regression and Linear Kernel SVMs
PCA vs. Matrix Factorization
Regularization
Gradient descent
LibSVM, Weka, scikit-learn
Big Data
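
Several of the topics above (logistic regression, regularization, gradient descent) fit together in one small example. The following is a minimal sketch in plain Python, not taken from any of the referenced courses: the function names, the toy data set, the learning rate and the penalty strength are all illustrative assumptions. It minimizes the mean log-loss plus an L2 penalty on the weights using batch gradient descent.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_l2(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    X is a list of feature lists and y a list of 0/1 labels.
    The bias b is left unregularized (a common convention, assumed here).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Gradient of the loss plus the gradient of the L2 penalty.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical one-feature toy data, centered around zero.
X = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_l2(X, y, lam=0.01)
w_strong, _ = train_logistic_l2(X, y, lam=1.0)
predictions = [predict(w, b, x) for x in X]
print(predictions, w, w_strong)
```

On this toy data the stronger penalty (lam=1.0) should leave a visibly smaller weight than the weaker one, which is the shrinking effect regularization is meant to have.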

Steps:
1. Andrew Ng's Course
2. Recommender Systems, Mining Massive Datasets
3. Read books
4. Try to apply it in a startup with Python

Algorithms:
L2-regularized Logistic Regression
k-means
LDA (Latent Dirichlet Allocation)
SVMs
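
Of the algorithms listed, k-means is the easiest to write down from scratch. Below is a rough sketch of the standard alternating procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name, the choice of the first k points as starting centroids, and the sample points are assumptions made for illustration; real libraries use better initialization.

```python
def kmeans(points, k, max_iters=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid (squared distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```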

Software:
Python packages
R
MATLAB

Maths:
Statistics
linear algebra [ feature vectors, eigenvectors, singular value decompositions, and PCA]
probability
optimization
vectors, matrices
Multivariable Calculus

Tools:
Coursera
MIT OpenCourseWare
Stanford Courses
Videos on YouTube

Top Academics:
Stanford
MIT
CMU
Harvard

Jobs:
Data Scientist
PhD

Responsibilities:
Data Analyst
Data Analytics
Data Engineer

Terms:
Artificial Intelligence
Machine Learning
Data Mining
Data Science

Resources:
Books
Articles
Research papers

AI and Machine Learning:
Python, R, Java
Probability and Statistics
Applied Maths + Algorithms
Distributed Computing, Data Science, Hadoop, MapReduce
Expert in Unix Tools
Hadoop Subprojects - HBase, Zookeeper, Hive, Mahout
Learn about advanced signal processing techniques

Getting Started:
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic

Course:
https://www.coursera.org/learn/machine-learning/home/week/1

Examples:
LinkedIn



References:
https://www.quora.com/How-do-I-learn-machine-learning-1
https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms/answer/Xavier-Amatriain
http://www.datascienceontology.com/
https://en.wikipedia.org/wiki/R_(programming_language)?q=get%20wiki%20data
https://www.linkedin.com/pulse/20141113191054-103457178-the-only-skill-you-should-be-concerned-with
http://blog.hackerearth.com/2016/01/getting-started-with-machine-learning.html?utm_source=Quora&utm_medium=Content&utm_content=MachineLearning&utm_campaign=BlogQuora
https://www.quora.com/What-is-the-best-language-to-use-while-learning-machine-learning-for-the-first-time/answer/Travis-Addair
https://www.quora.com/What-skills-are-needed-for-machine-learning-jobs/answer/Joseph-Misiti?srid=pBDw&share=4fa777bd
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
http://blog.udacity.com/2014/11/data-science-job-skills.html
http://www.stat.cmu.edu/~larry/=sml/

Friday, February 12, 2016

Research about Complexity Theory in Time

Research Findings:
1. In computational complexity theory, a complexity class is a set of problems of related resource-based complexity.
2. A typical complexity class has a definition of the form: the set of problems that can be solved by an abstract machine M using O(f(n)) of resource R, where n is the size of the input.
3. Complexity classes are concerned with the rate of growth of the requirement in resources as the input n increases.
4. Cobham's thesis states that polynomial time is a synonym for "tractable", "feasible", "efficient", or "fast".
5.  According to the thesis, any problem that is not in P is infeasible, but if a real-world problem can be solved by an algorithm in P, such an algorithm will generally be discovered eventually.
6.  In a similar spirit, the NC complexity class can be thought of as capturing problems "effectively solvable" on a parallel computer.
7.  Typical input lengths that users and programmers are interested in are approximately between 100 and 1,000,000. Consider an input length of n=100 and a polynomial algorithm whose running time is n^2. This is a typical running time for a polynomial algorithm.
8.  A typical CPU will be able to do approximately 10^9 operations per second (this is extremely simplified).
9.  An algorithm that runs in exponential time might have a running time of 2^n.
10.  Mathematically speaking, for big enough inputs, any polynomial time algorithm will beat any exponential time algorithm, and by arbitrarily large amounts.
11.  There are many lines of objection to Cobham's thesis. The thesis essentially states that "P" means "easy, fast, and practical," while "not in P" means "hard, slow, and impractical."
12.  Mathematical symbols can designate numbers (constants), variables, operations, functions, punctuation, grouping, and other aspects of logical syntax.
14.  Problems are said to be tractable if they can be solved in terms of a closed-form expression.
15.  The class of expressions considered to be analytic expressions tends to be wider than that for closed-form expressions. In particular, special functions such as the Bessel functions and the gamma function are usually allowed, and often so are infinite series and continued fractions. On the other hand, limits in general, and integrals in particular, are typically excluded.
16.  If a is an algebraic number different from 0 and 1, and b is an irrational algebraic number, then all the values of a^b are transcendental numbers (that is, not algebraic).
17.  In mathematics, the logarithm is the inverse operation to exponentiation.
18.  Trigonometric functions are important in the study of triangles and modeling periodic phenomena, among many other applications.
19.  Semantics is the study of meaning. Formal semantics is about attaching meaning to expressions.
20.  Algebra studies two main families of equations: polynomial equations and, among them, linear equations. Polynomial equations have the form P(x) = 0, where P is a polynomial. Linear equations have the form a(x) + b = 0, where a is a linear function and b is a vector. To solve them, one uses algorithmic or geometric techniques, coming from linear algebra or mathematical analysis.
21.  The P versus NP problem is a major unsolved problem in computer science. Informally speaking, it asks whether every problem whose solution can be quickly verified by a computer can also be quickly solved by a computer.
22.  When employed in industrial contexts, machine learning methods may be referred to as predictive analytics or predictive modelling.
23.  In essence, a Turing machine is imagined to be a simple computer that reads and writes symbols one at a time on an endless tape by strictly following a set of rules.
24.  A tuple is a finite ordered list of elements. In mathematics, an n-tuple is a sequence (or ordered list) of n elements, where n is a non-negative integer.
25.  Turing machines, first described by Alan Turing in (Turing 1937), are simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.
26.  Turing was interested in the question of what it means for a task to be computable, which is one of the foundational questions in the philosophy of computer science. Intuitively a task is computable if it is possible to specify a sequence of instructions which will result in the completion of the task when they are carried out by some machine. Such a set of instructions is called an effective procedure, or algorithm, for the task. The problem with this intuition is that what counts as an effective procedure may depend on the capabilities of the machine used to carry out the instructions. In principle, devices with different capabilities may be able to complete different instruction sets, and therefore may result in different classes of computable tasks
27.  Turing proposed a class of devices that came to be known as Turing machines. These devices lead to a formal notion of computation that we will call Turing-computability. A task is Turing computable if it can be carried out by some Turing machine.
28.  Turing machines are not physical objects but mathematical ones
29.  A Turing machine is a kind of state machine. At any time the machine is in any one of a finite number of states. Instructions for a Turing machine consist in specified conditions under which the machine will transition between one state and another.
30.  Even when a problem is decidable and thus computationally solvable in principle, it may not be solvable in practice if the solution requires an inordinate amount of time or memory.
31.  In worst-case analysis, the form we consider here, we consider the longest running time of all inputs of a particular length. In average-case analysis, we consider the average of all the running times of inputs of a particular length.
32.  In one convenient form of estimation, called asymptotic analysis, we seek to understand the running time of the algorithm when it is run on large inputs. We do so by considering only the highest order term of the expression for the running time of the algorithm, disregarding both the coefficient of that term and any lower order terms, because the highest order term dominates the other terms on large inputs.
33.  The expression O(1) represents a value that is never more than a fixed constant.
34.  Any language that can be decided in o(n log n) time on a single-tape Turing machine is regular.
35.  We exhibited a two-tape TM M3 that decides A in O(n) time. Hence the time complexity of A on a single-tape TM is O(n log n) and on a two-tape TM it is O(n). Note that the complexity of A depends on the model of computation selected.
36.  This discussion highlights an important difference between complexity theory and computability theory. In computability theory, the Church-Turing thesis implies that all reasonable models of computation are equivalent; that is, they all decide the same class of languages. In complexity theory, the choice of model affects the time complexity of languages. Languages that are decidable in, say, linear time on one model aren't necessarily decidable in linear time on another.
37.  In complexity theory, we classify computational problems according to their time complexity.
38.  For our purposes, polynomial differences in running time are considered to be small, whereas exponential differences are considered to be large.
39.  Polynomial time algorithms are fast enough for many purposes, but exponential time algorithms rarely are useful.
40.  Exponential time algorithms typically arise when we solve problems by exhaustively searching through a space of solutions, called brute-force search. Sometimes, brute-force search may be avoided through a deeper understanding of a problem, which may reveal a polynomial time algorithm of greater utility.
41.  All reasonable deterministic computational models are polynomially equivalent.
42.  Our aim is to present the fundamental properties of computation, rather than properties of Turing machines or any other special model.
43.  P is invariant for all models of computation that are polynomially equivalent to the deterministic single-tape Turing machine.
44.  P roughly corresponds to the class of problems that are realistically solvable on a computer.
45.  When we present a polynomial time algorithm, we give a high-level description of it without reference to features of a particular computational model. Doing so avoids tedious details of tapes and head motions. We need to follow certain conventions when describing an algorithm so that we can analyze it for polynomiality
46.  We describe algorithms with numbered stages. The notion of a stage of an algorithm is analogous to a step of a Turing machine, though of course, implementing one stage of an algorithm on a Turing machine, in general, will require many Turing machine steps.
47.  When we analyze an algorithm to show that it runs in polynomial time, we need to do two things. First, we have to give a polynomial upper bound (usually in big-O notation) on the number of stages that the algorithm uses when it runs on an input of length n. Then, we have to examine the individual stages in the description of the algorithm to be sure that each can be implemented in polynomial time on a reasonable deterministic model.
48.  When both tasks have been completed, we can conclude that the algorithm runs in polynomial time because we have demonstrated that it runs for a polynomial number of stages, each of which can be done in polynomial time, and the composition of polynomials is a polynomial.
49.  We continue to use the angle-bracket notation ⟨·⟩ to indicate a reasonable encoding of one or more objects into a string, without specifying any particular encoding method.
50.  We can avoid brute-force search in many problems and obtain polynomial time solutions.
51.  However, attempts to avoid brute force in certain other problems, including many interesting and useful ones, haven't been successful, and polynomial time algorithms that solve them aren't known to exist.
52.  One remarkable discovery concerning this question shows that the complexities of many problems are linked. A polynomial time algorithm for one such problem can be used to solve an entire class of problems.
53.  Verifying the existence of a Hamiltonian path may be much easier than determining its existence.
54.  A polynomial time algorithm for testing whether a number is prime or composite was discovered, but it is considerably more complicated than the preceding method for verifying compositeness.
55.  NP is the class of languages that have polynomial time verifiers.
56.  The term NP comes from nondeterministic polynomial time and is derived from an alternative characterization by using nondeterministic polynomial time Turing machines.
57.  Problems in NP are sometimes called NP-problems.
58.  We show how to convert a polynomial time verifier to an equivalent polynomial time NTM and vice versa. The NTM simulates the verifier by guessing the certificate. The verifier simulates the NTM by using the accepting branch as the certificate.
59.  The class NP is insensitive to the choice of reasonable nondeterministic computational model because all such models are polynomially equivalent.
60.  When describing and analyzing nondeterministic polynomial time algorithms, we follow the preceding conventions for deterministic polynomial time algorithms. Each stage of a nondeterministic polynomial time algorithm must have an obvious implementation in nondeterministic polynomial time on a reasonable nondeterministic computational model. We analyze the algorithm to show that every branch uses at most polynomially many stages.
61.  Verifying that something is not present seems to be more difficult than verifying that it is present.
62.  We make a separate complexity class, called coNP, which contains the languages that are complements of languages in NP. We don't know whether coNP is different from NP.
63.  As we have been saying, NP is the class of languages that are solvable in polynomial time on a nondeterministic Turing machine, or, equivalently, it is the class of languages whereby membership in the language can be verified in polynomial time. P is the class of languages where membership can be tested in polynomial time.
64.  where we loosely refer to polynomial time solvable as solvable "quickly."
65.  P = the class of languages for which membership can be decided quickly.
66.  NP = the class of languages for which membership can be verified quickly.
67.  The power of polynomial verifiability seems to be much greater than that of polynomial decidability. But, hard as it may be to imagine, P and NP could be equal. We are unable to prove the existence of a single language in NP that is not in P
68.  The question of whether P = NP is one of the greatest unsolved problems in theoretical computer science and contemporary mathematics.
70.  The best method known for solving languages in NP deterministically uses exponential time.
71.  Stephen Cook and Leonid Levin discovered certain problems in NP whose individual complexity is related to that of the entire class. If a polynomial time algorithm exists for any of these problems, all problems in NP would be polynomial time solvable. These problems are called NP-complete.
72.  On the theoretical side, a researcher trying to show that P is unequal to NP may focus on an NP-complete problem. If any problem in NP requires more than polynomial time, an NP-complete one does. Furthermore, a researcher attempting to prove that P equals NP only needs to find a polynomial time algorithm for an NP-complete problem to achieve this goal.
73.  Even though we may not have the necessary mathematics to prove that the problem is unsolvable in polynomial time, we believe that P is unequal to NP, so proving that a problem is NP-complete is strong evidence of its nonpolynomiality.
74.  A Boolean formula is satisfiable if some assignment of 0s and 1s to the variables makes the formula evaluate to 1.
75.  If one language is polynomial time reducible to a language already known to have a polynomial time solution, we obtain a polynomial time solution to the original language.
76.  Showing that SAT is in NP is easy, and we do so shortly. The hard part of the proof is showing that any language in NP is polynomial time reducible to SAT.
77.  NP-completeness can be proved with a polynomial time reduction from a language that is already known to be NP-complete.
78.  If you seek a polynomial time algorithm for a new NP-problem, spending part of your effort attempting to prove it NP-complete is sensible because doing so may prevent you from working to find a polynomial time algorithm that doesn't exist.
79.  Our general strategy is to exhibit a polynomial time reduction from 3SAT to the language in question, though we sometimes reduce from other NP-complete languages when that is more convenient.
80.  To show that VERTEX-COVER is NP-complete we must show that it is in NP and that all NP-problems are polynomial time reducible to it.
81.  Time and space are two of the most important considerations when we seek practical solutions to many computational problems.
82.  NP-complete languages represent the most difficult languages in NP.
83.  Complete problems are important because they are examples of the most difficult problems in a complexity class.
84.  A complete problem is most difficult because any other problem in the class is easily reduced into it, so if we find an easy way to solve the complete problem, we can easily solve all other problems in the class.
85.  The reduction must be easy, relative to the complexity of typical problems in the class, for this reasoning to apply. If the reduction itself were difficult to compute, an easy solution to the complete problem wouldn't necessarily yield an easy solution to the problems reducing to it.
86.  Whenever we define complete problems for a complexity class, the reduction model must be more limited than the model used for defining the class itself.
87.  To show that TQBF is in PSPACE we give a straightforward algorithm that assigns values to the variables and recursively evaluates the truth of the formula for those values. From that information the algorithm can determine the truth of the original quantified formula.
88.  To show that every language A in PSPACE reduces to TQBF in polynomial time, we begin with a polynomial space-bounded Turing machine for A. Then we give a polynomial time reduction that maps a string to a quantified Boolean formula X that encodes a simulation of the machine on that input. The formula is true iff the machine accepts.
89.  For the purposes of this section, a game is loosely defined to be a competition in which opposing parties attempt to achieve some goal according to prespecified rules.
90.  Games appear in many forms, from board games such as chess to economic and war games that model corporate or societal conflict.
91.  To illustrate the correspondence between games and quantifiers, we turn to an artificial game called the formula game.
92.  We say that Player E has a winning strategy for this game. A player has a winning strategy for a game if that player wins when both sides play optimally.
93.  To prove that GG is PSPACE-hard, we give a polynomial time reduction from FORMULA-GAME to GG.
94.  The only space required by this algorithm is for storing the recursion stack. Each level of the recursion adds a single node to the stack, and at most m levels occur, where m is the number of nodes in G. Hence the algorithm runs in linear space.
95.  Sublinear space algorithms allow the computer to manipulate the data without storing all of it in main memory
96.  Logarithmic space is just large enough to solve a number of interesting computational problems, and it has attractive mathematical properties such as robustness even when machine model and input encoding method change. Pointers into the input may be represented in logarithmic space, so one way to think about the power of log space algorithms is to consider the power of a fixed number of input pointers.
97.  Most computer-aided proofs to date have been implementations of large proofs-by-exhaustion of a mathematical theorem. The idea is to use a computer program to perform lengthy computations, and to provide a proof that the result of these computations implies the given theorem.
98.  A Boolean formula is satisfiable if some assignment of 0s and 1s to the variables makes the formula evaluate to 1.
99.  The satisfiability problem is to test whether a Boolean formula is satisfiable.
100.  A cnf-formula is a 3cnf-formula if all the clauses have three literals.
101.  In a satisfiable cnf-formula, each clause must contain at least one literal that is assigned 1.
102.
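
The arithmetic behind findings 7 to 10 can be made concrete with a few lines of Python. This is only a back-of-the-envelope sketch: it reuses the simplified figure of 10^9 operations per second from the notes, and the function names and the chosen input sizes are assumptions for illustration.

```python
OPS_PER_SECOND = 10**9            # simplified CPU speed from the notes
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_polynomial(n):
    """Time for an algorithm that performs n^2 operations."""
    return n**2 / OPS_PER_SECOND

def seconds_exponential(n):
    """Time for an algorithm that performs 2^n operations."""
    return 2**n / OPS_PER_SECOND

for n in (10, 50, 100):
    poly = seconds_polynomial(n)
    expo = seconds_exponential(n)
    print(f"n={n}: n^2 needs {poly:.2e} s, "
          f"2^n needs {expo:.2e} s ({expo / SECONDS_PER_YEAR:.2e} years)")
```

For n=100 the quadratic algorithm finishes in about a hundredth of a millisecond, while the exponential one would need on the order of 10^13 years under the same assumption.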

Theoretical CS:
Theory
Definitions, Theorems, Lemmas, Corollaries
Formula
Methodology
Complexity
Topics
Research

Time Complexity: P, NP, NP-Complete, NP-Hardness:
1.  P is a complexity class that represents the set of all decision problems that can be solved in polynomial time. That is, given an instance of the problem, the answer yes or no can be decided in polynomial time.
2.  Decision problem: A problem with a yes or no answer.
3.  NP is a complexity class that represents the set of all decision problems for which the instances where the answer is "yes" have proofs that can be verified in polynomial time.
4.  NP-Complete is a complexity class which represents the set of all problems X in NP for which it is possible to reduce any other NP problem Y to X in polynomial time.
5.  Intuitively, NP-Hard are the problems that are at least as hard as the NP-complete problems. Note that NP-hard problems do not have to be in NP, and they do not have to be decision problems.
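The phrase "proofs that can be verified in polynomial time" can be made concrete with a verifier for the subset-sum problem. This is a hedged sketch: the choice of subset-sum, the certificate format (a list of indices), and the function name are assumptions for illustration only.

```python
def verify_subset_sum(numbers, target, certificate):
    """Check a claimed 'yes' proof: distinct, valid indices whose values sum to target."""
    if len(set(certificate)) != len(certificate):
        return False  # an index was used twice
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False  # an index points outside the instance
    return sum(numbers[i] for i in certificate) == target


numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [2, 4]))   # 4 + 5
print(verify_subset_sum(numbers, 9, [0, 1]))   # 3 + 34
```

Checking a certificate takes time roughly linear in its length, even though no polynomial-time method is known for finding one. That gap between verifying and finding is what the definition of NP captures.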

Turing Machine:
1. A Turing machine has an infinite one-dimensional tape divided into cells. Traditionally we think of the tape as being horizontal with the cells arranged in a left-right orientation. The tape has one end, at the left say, and stretches infinitely far to the right. Each cell is able to contain one symbol, either ‘0’ or ‘1’.
2. The machine has a read-write head which is scanning a single cell on the tape. This read-write head can move left and right along the tape to scan successive cells.
3. The action of a Turing machine is determined completely by (1) the current state of the machine (2) the symbol in the cell currently being scanned by the head and (3) a table of transition rules, which serve as the “program” for the machine.
4. Each transition rule reads: if the machine is in state State_current and the cell being scanned contains Symbol, then move into state State_next taking Action.
5. A rule is written as the quadruple ⟨ State_current, Symbol, State_next, Action ⟩.
6. As actions, a Turing machine may either write a symbol on the tape in the current cell (which we will denote with the symbol in question), or move the head one cell to the left or right, which we will denote by the symbols « and » respectively.
7. In modern terms, the tape serves as the memory of the machine, while the read-write head is the memory bus through which data is accessed (and updated) by the machine. 
8. Two idealizations are built into this model. The first concerns the definition of the machine itself, namely that the machine's tape is infinite in length. This corresponds to an assumption that the memory of the machine is infinite. The second concerns the definition of Turing-computable, namely that a function will be Turing-computable if there exists a set of instructions that will result in a Turing machine computing the function regardless of the amount of time it takes. One can think of this as assuming the availability of infinite time to complete the computation.
9. If the machine reaches a situation in which there is no unique transition rule to be carried out, i.e., there is none or more than one, then the machine halts.
10. In order to speak about a Turing machine that does something useful, we will have to provide an interpretation of the symbols recorded on the tape.
11.  a Turing machine is a model of computation that completely captures the notion of computability, while remaining simple to reason about, without all the specific details of your PC's architecture.
12.  The (generally accepted) "Church-Turing thesis" asserts that no device or model of computation is more powerful than a Turing machine.
14.  So many theoretical problems (e.g. classes like P and NP, the notion of "polynomial-time algorithm", and so on) are formally stated in terms of a Turing machine, although of course they can be adapted to other models as well.
15.  People talk about Turing machines because they are a precise and fully specified way to say what a "computer" is, without having to describe every detail of the CPU's architecture, its constraints, and so on.
16.  To be very technical, there is one key difference between a PC and a Turing Machine: Turing Machines have infinite memory. This means a PC is not (and no tangible device ever could be) technically as powerful as a Turing Machine, although in practice the difference is trivial.
17.  A Turing-machine is a theoretical machine that can be used to reason about the limits of computers. Simply put, it is an imaginary computer with infinite memory.
18.  We care about Turing-machines because they help us discover what is impossible to accomplish with real computers (like your IBM PC). If it is impossible for a Turing machine to perform a particular computation (like deciding the Halting Problem), then it stands to reason that it is impossible for your IBM PC to perform that same computation
19.  Turing machines are basic abstract symbol-manipulating devices which, despite their simplicity, can be adapted to simulate the logic of any computer algorithm. They were described in 1936 by Alan Turing. Turing machines are not intended as a practical computing technology, but a thought experiment about the limits of mechanical computation. Thus they were not actually constructed. Studying their abstract properties yields many insights into computer science and complexity theory.
20.  A Turing machine that is able to simulate any other Turing machine is called a Universal Turing machine (UTM, or simply a universal machine). A more mathematically-oriented definition with a similar "universal" nature was introduced by Alonzo Church, whose work on lambda calculus intertwined with Turing's in a formal theory of computation known as the Church-Turing thesis. The thesis states that Turing machines indeed capture the informal notion of effective method in logic and mathematics, and provide a precise definition of an algorithm or 'mechanical procedure'.
21.  Using this scheme, we can demonstrate a method for giving evidence that certain problems are computationally hard, even if we are unable to prove that they are.
22.  You have several options when you confront a problem that appears to be computationally hard. First, by understanding which aspect of the problem is at the root of the difficulty, you may be able to alter it so that the problem is more easily solvable. Second, you may be able to settle for less than a perfect solution to the problem. In certain cases finding solutions that only approximate the perfect one is relatively easy. Third, some problems are hard only in the worst case situation, but easy most of the time. Depending on the application, you may be satisfied with a procedure that occasionally is slow but usually runs quickly. Finally, you may consider alternative types of computation, such as randomized computation, that can speed up certain tasks.
23.  The theories of computability and complexity are closely related. In complexity theory, the objective is to classify problems as easy ones and hard ones, whereas in computability theory the classification of problems is by those that are solvable and those that are not. Computability theory introduces several of the concepts used in complexity theory.
24.  Directed graphs are a handy way of depicting binary relations.
25.  Strings of characters are fundamental building blocks in computer science.
26.  Theorems and proofs are the heart and soul of mathematics and definitions are its spirit.
27.  Researchers sometimes work for weeks or even years to find a single proof.
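The quadruple rules ⟨state, symbol, next state, action⟩ described in the notes above can be sketched as a tiny simulator. Several details are assumptions made for the example: unvisited cells hold '0', moving left at the leftmost cell leaves the head in place, and a step budget stands in for the idealized "infinite time". The function and rule names are hypothetical.

```python
LEFT, RIGHT = "«", "»"


def run_turing_machine(rules, tape, state, max_steps=10_000):
    """Run quadruple rules until no unique rule applies; return (state, tape, head)."""
    cells = list(tape)
    head = 0
    for _ in range(max_steps):
        if head == len(cells):
            cells.append("0")  # assumption: cells never written so far contain '0'
        matching = [r for r in rules if r[0] == state and r[1] == cells[head]]
        if len(matching) != 1:
            # none or more than one applicable rule: the machine halts
            return state, "".join(cells), head
        _, _, state, action = matching[0]
        if action == LEFT:
            head = max(head - 1, 0)  # assumption: the head cannot fall off the left end
        elif action == RIGHT:
            head += 1
        else:
            cells[head] = action  # the action is the symbol to write
    raise RuntimeError("step budget exhausted; the machine may not halt")


# Erase a block of 1s: overwrite each 1 with 0, then step right.
erase_ones = [
    ("scan", "1", "move", "0"),
    ("move", "0", "scan", RIGHT),
]

print(run_turing_machine(erase_ones, "111", "scan"))
```

The machine halts in state "scan" on the first 0 to the right of the block, because no rule exists for that state and symbol.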

Learning:
1.  Scientific research uses R&D to find hidden theories, which help humans understand a subject further. These theories are therefore very important, and they are the foundation of further research.

Theory:
Savitch's Theorem
Cook-Levin theorem

Term:
Turing Machine
Non-deterministic Turing Machine [NTM]
Deterministic Turing Machine [DTM]
Asymptotic Analysis
Polynomial Bounds
Exponential Bounds
linear time
Polynomial difference
polynomiality
nonpolynomiality
Cook-Levin theorem
PSPACE-Hard
Sublinear Space Complexity

Expressions:
Arithmetic Expression
Polynomial Expression
Algebra Expression
Closed-form Expression
Analytic Expression
Mathematical Expression

Books:
Introduction to the Theory of Computation

Questions:
What are the fundamental capabilities and limitations of computers?
What makes some problems computationally hard and others easy?




People:
Turing is widely considered to be the father of theoretical computer science and artificial intelligence.

References:
https://en.wikipedia.org/wiki/Cobham%27s_thesis
https://en.wikipedia.org/wiki/Polynomial#Generalizations_of_polynomials
https://en.wikipedia.org/wiki/Trigonometric_functions
https://en.wikipedia.org/wiki/Millennium_Prize_Problems#P_versus_NP
https://en.wikipedia.org/wiki/Non-deterministic_Turing_machine
http://plato.stanford.edu/entries/turing-machine/
https://en.wikipedia.org/wiki/State_diagram
http://stackoverflow.com/questions/236000/whats-a-turing-machine
https://en.wikipedia.org/wiki/NP-hardness
http://stackoverflow.com/questions/1857244/what-are-the-differences-between-np-np-complete-and-np-hard
https://en.wikipedia.org/wiki/Computer-assisted_proof


Sunday, February 7, 2016

Research about Registered Agent

Research Findings:
1.  Commercial law, also known as business law, is the body of law that applies to the rights, relations, and conduct of persons and businesses engaged in commerce, merchandising, trade, and sales
2.  In United States business law, a registered agent, also known as a resident agent or statutory agent, is a business or individual designated to receive service of process (SOP) when a business entity is a party in a legal action such as a lawsuit or summons.
3.  The registered agent's address may also be where the state sends the paperwork for the periodic renewal of the business entity's charter (if required).
4.  The registered agent for a business entity may be an officer or employee of the company, or a third party, such as the organization's lawyer or a service company. Failure to properly maintain a registered agent can affect a company negatively.
5.  Most businesses are not individuals but instead business entities such as corporations or limited liability companies (LLCs). This is because there are substantive (and substantial) liability protections as well as tax advantages to being "incorporated" as opposed to being "self-employed".
6.  The purpose of a Registered Agent is to provide a legal address (not a P.O. Box) within that jurisdiction where there are persons available during normal business hours to facilitate legal service of process being served in the event of a legal action or lawsuit.
7.  Generally, the registered agent is also the person to whom the state government sends all official documents required each year for tax and legal purposes, such as franchise tax notices and annual report forms.
8.  Registered Agents generally will also notify business entities if their state government filing status is in "Good Standing" or not.
9.  The reason that these notifications are a desired function of a registered agent is that it is difficult for a business entity to keep track of legislative changes and report due dates for multiple jurisdictions given the disparate laws of different states.
10.  Failing to maintain a registered agent will generally cause a jurisdiction to revoke a business’s corporate or LLC legal status and, in some cases, to assess additional penalty fees on the entity.
11.  This is one of the most common reasons that business entities generally will utilize a third party as their Registered Agent be it a commercial service company, an attorney, or in some cases, a CPA.
12.  The person at the business entity that maintains contact with the registered agent is the corporate secretary or governance officer.
14.  No matter where you’re starting your business, if you’re forming an LLC or corporation, you’re required to have a registered agent and a registered office.
15.  A registered agent is simply a person or entity appointed to accept service of process and official mail on your business’ behalf. You can appoint yourself, or in many states, you can appoint your business to be its own registered agent.
16.  A lawsuit against your business cannot move forward in court without your business being properly notified first.
17.  The registered agent must have a physical address within that state and be available during business hours so someone suing you can easily find you. This requirement gives the court an easy way to notify you. This also eliminates the possibility of big corporations hiding behind thousands of employees. The registered agent is your business’ point of contact with the state and for service of process. The theory behind the requirements of the listed registered agent is to ensure your business maintains a reliable way to be contacted.
18.  The majority of small businesses (10 employees or fewer) do not hire registered agents. That said, there are some specific reasons why some business owners do opt to hire registered agent services; you’ll find those reasons listed below:
19.  If you hire a registered agent service your registered agent should have a system in place to track and notify you when annual reports are due to keep your business in compliance with the state, so you don’t have to worry about it. Also, all your important documents will be kept in one place and you don’t have to bother keeping track of notices.
20.  In each state your business operates, you’ll need to register with that state, and in every state you register, you need a registered agent with a physical location in that state.
21.  A registered agent receives important legal and tax documents on behalf of a business, including important mail sent by the state (annual reports or statements), tax documents sent by the state’s department of taxation, and Service of Process—sometimes called Notice of Litigation, which initiates a lawsuit.
22.  Sometimes a registered agent is called a statutory agent or an agent for service of process.
23.  Service of process is the procedure by which a party to a lawsuit gives an appropriate notice of initial legal action to another party (such as a defendant), court, or administrative body in an effort to exercise jurisdiction over that person so as to enable that person to respond to the proceeding before the court, body, or other tribunal.
24.  In many cases, you or another member of your business, such as a partner, a member of your LLC, or an officer of your corporation, will serve as the registered agent, and the address for the registered agent will be your business location.
25.  the top three registered agents by volume in the U.S. are CSC, CT, and Incorp.


Functionalities:
track the official notices and annual report due dates with the state.
Document Organization
Compliance Management
Reminds you of upcoming compliance requirements and deadlines, such as the due date for the annual report required by your state of incorporation.
Provides software to manage important corporate records and documents.
Offers secure, online access to important documents.
Monitors your company's status in the state(s) where it is registered.
online account access to manage notifications

Questions?
1.  What is the best practice?

References:
https://en.wikipedia.org/wiki/Commercial_law
https://en.wikipedia.org/wiki/Registered_agent
http://www.northwestregisteredagent.com/registered-agent-marketshare.html
http://www.tccorporate.com.au/why-tc-corporate
http://www.bizfilings.com/learn/what-is-registered-agent.aspx
https://en.wikipedia.org/wiki/Service_of_process


Friday, February 5, 2016

Research about Computability Theory

Research Findings:
1.  Finite automata are good models for devices that have a small amount of memory. Pushdown automata are good models for devices that have an unlimited memory that is usable only in the last in, first out manner of a stack.
2.  Similar to a finite automaton but with an unlimited and unrestricted memory, a Turing machine is a much more accurate model of a general purpose computer.
3.  Note that the input alphabet Σ does not contain the blank symbol, so the first blank appearing on the tape marks the end of the input.
4.  As a Turing machine computes, changes occur in the current state, the current tape contents, and the current head location. A setting of these three items is called a configuration of the Turing machine.
5.  Call a language Turing-decidable or simply decidable if some Turing machine decides it.
6.  Every decidable language is Turing-recognizable.
7.  A Turing machine is a general example of a CPU that controls all data manipulation done by a computer, with the canonical machine using sequential memory to store data. More specifically, it is a machine (automaton) capable of enumerating some arbitrary subset of valid strings of an alphabet; these strings are part of a recursively enumerable set.
8.  In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever.
9.  Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. A key part of the proof was a mathematical definition of a computer and program, which became known as a Turing machine; the halting problem is undecidable over Turing machines. It is one of the first decision problems proven to be undecidable.
10.  A Turing machine that is able to simulate any other Turing machine is called a universal Turing machine (UTM, or simply a universal machine).
11.  It is often said that Turing machines, unlike simpler automata, are as powerful as real machines, and are able to execute any operation that a real program can.
12.  Turing machines are not intended to model computers, but rather they are intended to model computation itself.
14.  Anything a real computer can compute, a Turing machine can also compute.
15.  In this section we describe some of these variants and the proofs of equivalence in power. We call this invariance to certain changes in the definition robustness. Both finite automata and pushdown automata are somewhat robust models, but Turing machines have an astonishing degree of robustness.
16.  we can convert any TM with the "stay put" feature to one that does not have it. We do so by replacing each stay put transition with two transitions, one that moves to the right and the second back to the left.
17.  Theorems and proofs are the heart and soul of mathematics and definitions are its spirit.
18.  The computation of a nondeterministic Turing machine is a tree whose branches correspond to different possibilities for the machine.
19.  The idea behind the simulation is to have D try all possible branches of N's nondeterministic computation. If D ever finds the accept state on one of these branches, D accepts. Otherwise, D's simulation will not terminate.
20.  All of these variants share the essential feature of Turing machines, namely unrestricted access to unlimited memory, which distinguishes them from weaker models such as finite automata and pushdown automata.
21.  Any two computational models that satisfy certain reasonable requirements can simulate one another and hence are equivalent in power.
22.  Informally speaking, an algorithm is a collection of simple instructions for carrying out some task. Commonplace in everyday life, algorithms sometimes are called procedures or recipes.
23.  A polynomial is a sum of terms, where each term is a product of certain variables and a constant called a coefficient.
24.  Hilbert did not use the term algorithm but rather "a process according to which it can be determined by a finite number of operations."
25.  The definition came in the 1936 papers of Alonzo Church and Alan Turing. Church used a notational system called the λ-calculus to define algorithms. Turing did it with his "machines." These two definitions were shown to be equivalent. This connection between the informal notion of algorithm and the precise definition has come to be called the Church-Turing thesis.
26.  Intuitive notion of algorithms = Turing machine algorithms
27.  We have come to a turning point in the study of the theory of computation. We continue to speak of Turing machines, but our real focus from now on is on algorithms. That is, the Turing machine merely serves as a precise model for the definition of algorithm. We skip over the extensive theory of Turing machines themselves and do not spend much time on the low-level programming of Turing machines. We need only to be comfortable enough with Turing machines to believe that they capture all algorithms.
28.  The input to a Turing machine is always a string. If we want to provide an object other than a string as input, we must first represent that object as a string. Strings can easily represent polynomials, graphs, grammars, automata, and any combination of those objects. A Turing machine may be programmed to decode the representation so that it can be interpreted in the way we intend. Our notation for the encoding of an object O into its representation as a string is ⟨O⟩.
29.  In our format, we describe Turing machine algorithms with an indented segment of text within quotes. We break the algorithm into stages, each usually involving many individual steps of the Turing machine's computation. We indicate the block structure of the algorithm with further indentation. The first line of the algorithm describes the input to the machine. If the input description is simply w, the input is taken to be a string. If the input description is the encoding of an object as in ⟨A⟩, the Turing machine first implicitly tests whether the input properly encodes an object of the desired form and rejects it if it doesn't.
30.  We introduced the Turing machine as a model of a general purpose computer and defined the notion of algorithm in terms of Turing machines by means of the Church-Turing thesis.
31.  You are probably familiar with solvability by algorithms because much of computer science is devoted to solving problems. The unsolvability of certain problems may come as a surprise.
32.  First, knowing when a problem is algorithmically unsolvable is useful because then you realize that the problem must be simplified or altered before you can find an algorithmic solution. Like any tool, computers have capabilities and limitations that must be appreciated if they are to be used well.
33.  The second reason is cultural. Even if you deal with problems that clearly are solvable, a glimpse of the unsolvable can stimulate your imagination and help you gain an important perspective on computation.
34.  Showing that the language is decidable is the same as showing that the computational problem is decidable.
36.  for decidability purposes, presenting the Turing machine with a DFA, NFA, or regular expression are all equivalent because the machine is able to convert one form of encoding to another.
37.  The problem of determining whether a CFG generates a particular string is related to the problem of compiling programming languages.
38.  The general problem of software verification is not solvable by computer.
39.  The proof of the undecidability of the halting problem uses a technique called diagonalization, discovered by mathematician Georg Cantor in 1873.
40.  we can't use the counting method to determine the relative sizes of infinite sets.
41.  Cantor proposed a rather nice solution to this problem. He observed that two finite sets have the same size if the elements of one set can be paired with the elements of the other set. This method compares the sizes without resorting to counting. We can extend this idea to infinite sets.
42.  A set A is countable if either it is finite or it has the same size as N, the set of natural numbers.
43.  The preceding theorem has an important application to the theory of computation. It shows that some languages are not decidable or even Turing recognizable, for the reason that there are uncountably many languages yet only countably many Turing machines. Because each Turing machine can recognize a single language and there are more languages than Turing machines, some languages are not recognized by any Turing machine. Such languages are not Turing-recognizable, as we state in the following corollary.
44.  a compiler is a program that translates other programs. A compiler for the language Pascal may itself be written in Pascal, so running that program on itself would make sense.
45.  the complement of a language is the language consisting of all strings that are not in the language.
46.  Turing machine as our model of a general purpose computer.
47.  The primary method for proving that problems are computationally unsolvable is called reducibility.
48.  A reduction is a way of converting one problem to another problem in such a way that a solution to the second problem can be used to solve the first problem.
49.  In terms of computability theory, if A is reducible to B and B is decidable, A also is decidable. Equivalently, if A is undecidable and reducible to B, B is undecidable. This last version is key to proving that various problems are undecidable.
50.  our method for proving that a problem is undecidable will be to show that some other problem already known to be undecidable reduces to it.
51.  But you may not be able to determine whether M is looping, and in that case your simulation will not terminate. That's bad, because you are a decider and thus never permitted to loop.
52.  The computation history method is an important technique for proving that A_TM is reducible to certain languages.
53.  Computation histories are finite sequences. If M doesn't halt on w, no accepting or rejecting computation history exists for M on w.
54.  The idea for detecting when M is looping is that, as M computes on w, it goes from configuration to configuration. If M ever repeats a configuration it would go on to repeat this configuration over and over again and thus be in a loop.
55.  If M on w has not halted within qng^n steps (q states, g tape symbols, tape length n), it must be repeating a configuration according to Lemma 5.8 and therefore looping. That is why our algorithm rejects in this instance.
56.  Roughly speaking, being able to reduce problem A to problem B by using a mapping reducibility means that a computable function exists that converts instances of problem A to instances of problem B. If we have such a conversion function, called a reduction, we can solve A with a solver for B. The reason is that any instance of A can be solved by first using the reduction to convert it to an instance of B and then applying the solver for B. A precise definition of mapping reducibility follows shortly.
57. A Turing machine computes a function by starting with the input to the function on the tape and halting with the output of the function on the tape.
58.  All usual arithmetic operations on integers are computable functions
59.  represent computational problems by languages.
60.  A mapping reduction of A to B provides a way to convert questions about membership testing in A to membership testing in B.
61.  If one problem is mapping reducible to a second, previously solved problem, we can thereby obtain a solution to the original problem.
62.  The recursion theorem is a mathematical result that plays an important role in advanced work in the theory of computability. It has connections to mathematical logic, the theory of self-reproducing systems, and even computer viruses.
63.  A tempting but flawed argument against self-reproducing machines goes as follows: any machine A that constructs a machine B must be more complex than B. But a machine cannot be more complex than itself. Consequently, no machine can construct itself, and thus self-reproduction is impossible. The conclusion is wrong.
64.  Making machines that reproduce themselves is possible. The recursion theorem demonstrates how
65.  The recursion theorem provides the ability to implement the self-referential this into any programming language
66.  Mathematical logic is the branch of mathematics that investigates mathematics itself.
67.  We sketch the proof of Kurt Gödel's celebrated incompleteness theorem. Informally, this theorem says that, in any reasonable system of formalizing the notion of provability in number theory, some true statements are unprovable.
68.  The concepts algorithm and information are fundamental in computer science.
69.  We define the quantity of information contained in an object to be the size of that object's smallest representation or description. By a description of an object we mean a precise and unambiguous characterization of the object so that we may recreate it from the description alone.
70.  we restrict our attention to objects that are binary strings. Other objects can be represented as binary strings, so this restriction doesn't limit the scope of the theory.
71.  A finite state machine is a mathematical abstraction used to design algorithms.
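Cantor's diagonalization, mentioned in the findings above, can be illustrated on a finite table. This is only a finite sketch of an argument about infinite sets, and the function name and the use of '0'/'1' strings are assumptions for the example.

```python
def diagonal_complement(rows):
    """Build a string that differs from row i at position i.

    Expects n binary strings, each of length at least n.
    """
    return "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))


rows = ["0000", "1111", "0101", "1010"]
new_row = diagonal_complement(rows)
print(new_row, new_row in rows)
```

Because the new string disagrees with every listed row in at least one position, it cannot appear in the list. The same construction applied to an infinite list is what shows that some sets are uncountable.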

Reducibility:
1.  For example, suppose that you want to find your way around a new city. You know that doing so would be easy if you had a map. Thus you can reduce the problem of finding your way around the city to the problem of obtaining a map of the city.
2.  reducibility says nothing about solving A or B alone, but only about the solvability of A in the presence of a solution to B
3.  The problem of traveling from Boston to Paris reduces to the problem of buying a plane ticket between the two cities. That problem in turn reduces to the problem of earning the money for the ticket. And that problem reduces to the problem of finding a job.
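The idea of solving A "in the presence of a solution to B" can be shown with a deliberately tiny mapping reduction. The two problems chosen here (evenness and divisibility by four) and all function names are assumptions picked for illustration.

```python
def divisible_by_four(n):
    """Solver for problem B."""
    return n % 4 == 0


def reduction(n):
    """Computable function f with: n is even  <=>  f(n) is divisible by 4."""
    return 2 * n


def is_even(n):
    """Solve problem A using only the reduction and the solver for B."""
    return divisible_by_four(reduction(n))


print([n for n in range(10) if is_even(n)])
```

Note that `is_even` never inspects its input directly. It only converts the instance and asks the solver for B, which is the pattern that undecidability proofs use in the opposite direction.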

Undecidability:
The undecidability of A_TM, the problem of determining whether a Turing machine accepts a given input.
E_TM is undecidable
REGULAR_TM is undecidable
HALT_TM is undecidable

Turing Machine:
1. The Turing machine model uses an infinite tape as its unlimited memory. It has a tape head that can read and write symbols and move around on the tape.

Computing Models:
Finite Automata
Pushdown Automata
Turing Machine
Universal Turing Machine

Theory:
Lambda Calculus

Types of Turing Machines:
Turing Machine
Multitape Turing Machine
Nondeterministic Turing Machine [Use Breadth First Search, not Depth First Search]
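The note about breadth first search can be sketched as follows. The nondeterministic machine is abstracted as a function that maps a configuration to its possible next configurations. That abstraction, the step budget, and every name below are assumptions for the example. A real simulation has no budget and simply does not terminate when no branch accepts.

```python
from collections import deque


def bfs_accepts(start, successors, is_accepting, max_configs=100_000):
    """Explore the computation tree level by level, looking for an accepting configuration."""
    queue = deque([start])
    explored = 0
    while queue:
        config = queue.popleft()
        if is_accepting(config):
            return True
        explored += 1
        if explored >= max_configs:
            raise RuntimeError("budget exhausted without finding an accepting branch")
        queue.extend(successors(config))
    return False  # only reached when the whole tree is finite and has no accepting node


def toy_successors(config):
    """Toy tree: the first branch never ends, the second accepts at depth 3."""
    branch, depth = config
    if branch == "root":
        return [("loop", 0), ("work", 0)]
    if branch == "loop":
        return [("loop", depth + 1)]
    return [("work", depth + 1)] if depth < 3 else []


def toy_accepting(config):
    return config == ("work", 3)


print(bfs_accepts(("root", 0), toy_successors, toy_accepting))
```

A depth first search that followed the "loop" branch first would never return, even though an accepting configuration sits only a few levels down the other branch.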
Enumerator

Terms:
Definitions
Mathematical Statements
Proofs
Theorems
Lemmas
Corollary
Example

Polynomial:
Term
Coefficient
Root
integral root

Turing Recognizable Language
Recursively Enumerable Language

Turing Machine:
Recognizer
Decider

Turing Machine Algorithms
Algorithmic Solvability
Unsolvability



Terms:
Halting Problem
Correspondence
Uncountable
diagonalization
Co-Turing recognizable
Decidability
Reducibility
Linear bounded Automata/LBA
Undecidability
Post Correspondence Problem /PCP
Modified Post Correspondence Problem/MPCP
Mapping Reducibility
Computable Functions
halt
many-one reducibility
paradox
tenet
fixed point
Precise Language
formula
well-known formula
atomic formula
arity
scope
prenex normal form
free variable
sentence/statement
Universe
model
language of model
oracle Turing machine
description language

Theorems:
1. The halting problem is undecidable.
2. a language is decidable exactly when both it and its complement are Turing-recognizable.
3. Rice's theorem states that testing any nontrivial property of the languages recognized by Turing machines is undecidable.
4. EQ_TM is undecidable.
5. E_LBA is undecidable.
6. ALL_CFG is undecidable.
7. PCP is undecidable.
8. A fixed point of a function is a value that isn't changed by application of the function. In this case we consider functions that are computable transformations of Turing machine descriptions. We show that for any such transformation some Turing machine exists whose behavior is unchanged by the transformation. This theorem is sometimes called the fixed-point version of the recursion theorem.
9. Th(N,+) is decidable.
10. Th(N,+,×) is undecidable.
11.  Theorem 6.24 shows that a string's minimal description is never much longer than the string itself.
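The recursion theorem's claim that a program can obtain its own description has a familiar practical counterpart: a program that prints its own source. The sketch below builds such a two-line program from a template and runs it to check the claim. The template trick is a standard quine construction, not the proof of the theorem, and the helper names are hypothetical.

```python
import contextlib
import io

# Template with a hole (%r) where its own quoted text will be inserted.
TEMPLATE = 's = %r\nprint(s %% s)'

# The complete self-reproducing program: the template filled with itself.
PROGRAM = TEMPLATE % TEMPLATE


def run_and_capture(source):
    """Execute Python source text and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run_and_capture(PROGRAM)
print(output == PROGRAM + "\n")
```

The program consists of a data part (the string `s`) and a code part that prints the data twice, once quoted and once as code. This loosely mirrors the two-part construction used for self-reproducing machines.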

Levels:
1. [Formal Description] The first is the formal description that spells out in full the Turing machine's states, transition function, and so on. It is the lowest, most detailed, level of description.
2. [Implementation Description] The second is a higher level of description, called the implementation description, in which we use English prose to describe the way that the Turing machine moves its head and the way that it stores data on its tape. At this level we do not give details of states or transition function.
3. [High Level Description] Third is the high-level description, wherein we use English prose to describe an algorithm, ignoring the implementation details. At this level we do not need to mention how the machine manages its tape or head.

References:
https://en.wikipedia.org/wiki/Halting_problem
https://en.wikipedia.org/wiki/Partial_function