Research Findings:
2. Engineers who are great in both fields are basically unicorns and are at least 10x as valuable as someone who is great in just one of the fields. These are the engineers who don’t just work on algorithms or systems all day but instead launch personalization products in the market. These are the types of engineers who are behind the personalization teams at companies such as Amazon, Netflix, LinkedIn and many successful personalization startups.
2. Machine learning is about both breadth and depth. You are expected to know the basics of the most important algorithms (see my answer to What are the top 10 data mining or machine learning algorithms?). On the other hand, you are also expected to understand the complicated low-level details of algorithms and their implementations. I think the approach I am describing addresses both these dimensions and I have seen it work.
3. There are lots of folks in the market that are great engineers and there are also lots of folks who are great at machine learning, but there is a severe shortage of great Machine Learning Engineers.
4. Knowing R or Python really well might amount to building a model faster or allow you to integrate it into software better, but it says nothing about your ability to choose the right model, or build one that truly speaks to the challenge at hand.
5. The art of being able to do machine learning well comes from seeing the core concepts inside the algorithms and how they overlap with the pain points being addressed. Great practitioners start to see interesting overlaps before ever touching a keyboard.
6. Data science and machine learning are not synonyms, but much of the core algorithmic variety used by data scientists comes from machine learning.
7. R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
8. Just when you became an R ninja, Python came around the corner and became the de facto standard. Just when you finally mastered how to lay out a kick-ass data pipeline using Hadoop, Spark became the new thing.
9. Many machine learning jobs look for people that have a PhD in some quantitative discipline. The reason for this is that the PhD is the only degree that trains its students in the discipline of advanced research. Importantly, obtaining a PhD means you have solved something truly original and defended that in front of leaders in the field.
10. Although many vendors would have you believe machine learning can be a "service" or something that is packaged, real machine learning takes real research. If it could be a packaged solution, everyone would do it and there would be no competitive advantage in either academia or enterprise.
11. Thinking like a researcher means working towards something truly original, regardless of how small that piece of originality might be. Don't look for obvious solutions; look for that thing that models the system in an original way and that offers you, your client, or your employer something truly competitive and interesting.
12. Revisit the core concepts regularly, practice solving problems relentlessly, and work towards trying to solve something original.
14. Jump in and have fun. Machine learning is an exciting field and will help solve many of today's most difficult challenges.
15. Add more algorithms (decision trees, neural nets etc.). Learn more domains and problems. Study techniques for handling unstructured data. There are wonderful courses in the thread.
16. If you're looking for a slow but mature growth into machine learning, familiarity with basic computer science concepts (algorithms, data structures, and complexity) and mathematical maturity in discrete math, matrix math, and probability and statistics are all important for understanding the more complex parts of machine learning.
17. Importantly, there are a lot of algorithms/paradigms in machine learning. While you should have some understanding of these, it is equally important to have the basic intuition of machine learning - concepts of bias-variance tradeoff, overfitting, regularization, duality, etc. These concepts are often used in most or all of machine learning, in some form or the other.
18. Machine learning is responsible for everything from Siri, Google Now, and the recommendation engines on YouTube and Netflix to driverless cars. Surely, machine learning is something that every computer scientist will encounter sooner or later, and it is important to learn this well.
19. Machine learning was the byproduct of the early endeavours to develop artificial intelligence. The aim was to make a machine learn via data. But the use of this approach often resulted in reinventing already existing statistical models. This, coupled with the rise of knowledge-based and logic-based approaches to AI, put machine learning out of favour among the AI community.
20. Machine learning soon became a subfield of statistics and data mining.
21. Machine Learning has become a separate field of its own. Instead of striving to achieve artificial intelligence, the main aim of machine learning has shifted towards tackling solvable problems. It borrows techniques from statistics and probability to focus on predictions derived from data.
22. AI includes concepts like symbolic logic, evolutionary algorithms and Bayesian statistics and many other concepts that don’t fall under the purview of machine learning.
23. Apart from machine learning, AI tries to achieve a broad range of goals like Reasoning, Knowledge representation, Automated planning and scheduling, Natural language processing, Computer vision, Robotics and General intelligence.
24. Machine learning, on the other hand, focusses on solving tangible, domain-specific problems through data and self-learning algorithms.
25. In your free time, read papers like Google Map-Reduce [34], Google File System [35], Google Big Table [36], The Unreasonable Effectiveness of Data [37], etc. There are great free machine learning books online and you should read those too [38][39][40].
26. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.
27. Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data.
28. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.
29. If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive.
30. What data scientists do is make discoveries while swimming in data.
31. But we would say the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.
32. Some of the best and brightest data scientists are PhDs in esoteric fields like ecology and systems biology.
33. The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?
34. There simply aren’t a lot of people with their combination of scientific background and computational and analytical skills.
35. Think of big data as an epic wave gathering now, starting to crest. If you want to catch it, you need people who can surf.
36. A data scientist requires comprehensive mastery of a number of fields, such as software development, data munging, databases, statistics, machine learning and data visualization.
37. A data scientist with a software engineering background might excel at a company like this, where it’s more important that a data scientist make meaningful data-like contributions to the production code and provide basic insights and analyses.
38. Data Scientists in this setting likely focus more on producing great data-driven products than they do answering operational questions for the company. Companies that fall into this group could be consumer-facing companies with massive amounts of data or companies that are offering a data-based service.
39. Google is currently using Machine Learning a lot - in my estimate, over a hundred places in their systems have been replaced by Deep Learning and other ML techniques in the past few years.
40. Machine learning uses statistics (mostly inferential statistics) to develop self-learning algorithms.
41. Artificial Intelligence is the science of developing systems or software that mimic how humans respond and behave in a given circumstance.
42. Machine Learning is a technology within the sphere of 'Artificial Intelligence'.
43. The pioneering technology within Machine Learning is the neural network (NN), which mimics (to a very rudimentary level) the pattern recognition abilities of the human brain by processing thousands or even millions of data points. Pattern recognition is pivotal in terms of intelligence.
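Item 17 above names overfitting and regularization among the core intuitions. The following is a minimal sketch of both, not taken from any of the quoted sources: it fits a degree-9 polynomial to a dozen noisy samples of a sine curve, once by plain least squares and once with an L2 (ridge) penalty. The data, the polynomial degree, the penalty strength of 1e-3, and all function names are illustrative assumptions, and NumPy is assumed to be installed. For simplicity the intercept term is penalized as well.

```python
import numpy as np


def polynomial_features(x, degree):
    """Return the matrix whose columns are x**0, x**1, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    """Least squares with an L2 penalty lam * ||w||^2 (lam = 0 means no penalty)."""
    if lam == 0.0:
        # Plain least squares; lstsq copes with the ill-conditioned design matrix.
        return np.linalg.lstsq(X, y, rcond=None)[0]
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))


# Hypothetical toy data: a noisy sine for training, a clean sine for testing.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

X_train = polynomial_features(x_train, 9)
X_test = polynomial_features(x_test, 9)

w_plain = fit_ridge(X_train, y_train, 0.0)
w_ridge = fit_ridge(X_train, y_train, 1e-3)

for name, w in [("no penalty", w_plain), ("L2 penalty", w_ridge)]:
    print(
        name,
        "| train MSE:", mse(X_train, y_train, w),
        "| test MSE:", mse(X_test, y_test, w),
        "| weight norm:", float(np.linalg.norm(w)),
    )
```

The unpenalized fit always has the lower training error, because that is exactly what least squares minimizes, while the penalized fit always has the smaller weight vector. Whether the penalty also improves the test error depends on the data and on the chosen strength, which is the bias-variance tradeoff in miniature.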
Companies:
A Data Scientist is a Data Analyst Who Lives in San Francisco
Please Wrangle Our Data
We Are Data. Data Is Us
Reasonably Sized Non-Data Companies Who Are Data-Driven
Academic:
Maths
Statistics
Physics
Topics:
Logistic Regression and Linear Kernel SVMs,
PCA vs. Matrix Factorization,
regularization, or gradient descent.
LibSVM, Weka, scikit-learn
Big Data
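Two of the topics above, regularization and gradient descent, can be sketched together on logistic regression. This is a hedged illustration rather than a reference implementation: the two-blob dataset, the learning rate, the penalty strength, and the function names are all assumptions made for the example, NumPy is assumed to be installed, and the bias weight is penalized along with the others to keep the code short.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(w, X, y, lam):
    """Mean log-loss plus an L2 penalty 0.5 * lam * ||w||^2."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return float(log_loss + 0.5 * lam * np.dot(w, w))


def gradient(w, X, y, lam):
    """Gradient of the penalized loss with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w


def gradient_descent(X, y, lam=0.1, lr=0.5, steps=500):
    """Batch gradient descent from w = 0; returns the weights and the loss history."""
    w = np.zeros(X.shape[1])
    history = [loss(w, X, y, lam)]
    for _ in range(steps):
        w -= lr * gradient(w, X, y, lam)
        history.append(loss(w, X, y, lam))
    return w, history


# Hypothetical toy data: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X_neg = rng.normal(-1.0, 1.0, size=(100, 2))
X_pos = rng.normal(1.0, 1.0, size=(100, 2))
X = np.vstack([X_neg, X_pos])
X = np.hstack([X, np.ones((200, 1))])  # constant column acts as the bias term
y = np.concatenate([np.zeros(100), np.ones(100)])

w, history = gradient_descent(X, y)
accuracy = float(np.mean((sigmoid(X @ w) > 0.5) == (y == 1)))
print("initial loss:", history[0], "final loss:", history[-1], "accuracy:", accuracy)
```

Replacing the log-loss with a hinge loss in `loss` and `gradient` would turn the same loop into a linear SVM trained by subgradient descent, which is one way to see why the two models are so often compared.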
Steps:
1. Andrew Ng's Course
2. Recommender Systems, Mining Massive Datasets
3. Read books
4. Try to apply it in a startup with Python
Algorithms:
L2-regularized Logistic Regression
k-means
LDA (Latent Dirichlet Allocation)
SVMs
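Of the algorithms listed above, k-means is the shortest to write out. The sketch below follows the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The six-point dataset and the function names are assumptions for illustration, and NumPy is assumed to be installed.

```python
import numpy as np


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm, starting from k distinct points chosen at random."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its points; keep it if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical toy data: two tight groups far apart.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
centroids, labels = k_means(points, k=2)
print("centroids:", centroids.tolist())
print("labels:", labels.tolist())
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful. On less cleanly separated data the result can also depend on the starting centroids, which is why practical implementations usually run several restarts.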
Software:
Python Package
R
Matlab
Maths:
Statistics
linear algebra [ feature vectors, eigenvectors, singular value decompositions, and PCA]
probability
optimization
vectors, matrices
Multivariable Calculus
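The linear algebra items above (eigenvectors, singular value decompositions, and PCA) are closely linked: the principal components of a dataset are the right singular vectors of its centered data matrix, and they are also eigenvectors of its covariance matrix. A minimal sketch of that connection follows, assuming NumPy is installed; the function name and the toy data are illustrative only.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the centered data matrix.

    Returns the principal directions (one per row), the variance explained
    by each direction, and the data projected onto those directions.
    """
    X_centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    projected = X_centered @ components.T
    return components, explained_variance, projected


# Hypothetical toy data: five points lying exactly on the line y = 2x.
t = np.arange(5.0)
X = np.column_stack([t, 2.0 * t])

components, explained_variance, projected = pca(X, n_components=2)
print("first direction:", components[0])
print("explained variance:", explained_variance)

# The first direction is an eigenvector of the sample covariance matrix,
# with the explained variance as its eigenvalue.
covariance = np.cov(X, rowvar=False)
print(np.allclose(covariance @ components[0], explained_variance[0] * components[0]))
```

Because the points lie on a single line, all of the variance falls on the first direction, which is parallel to (1, 2), and the second explained variance is zero up to rounding. The sign of each direction is arbitrary.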
Tools:
Coursera
Open MIT
Stanford Courses
Videos in Youtube
Top Academics:
Stanford
MIT
CMU
Harvard
Jobs:
Data Scientist
PhD
Responsibilities:
Data Analyst
Data Analytics
Data Engineer
Terms:
Artificial Intelligence
Machine Learning
Data Mining
Data Science
Resources:
Books
Articles
Research
AI and Machine Learning:
Python, R, Java
Probability and Statistics
Applied Maths + Algorithms
Distributed Computing, Data Science, Hadoop, MapReduce
Expert in Unix Tools
Hadoop Subprojects - HBase, Zookeeper, Hive, Mahout
Learn about advanced signal processing techniques
Getting Started:
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic
Course:
https://www.coursera.org/learn/machine-learning/home/week/1
Examples:
LinkedIn
References:
https://www.quora.com/How-do-I-learn-machine-learning-1
https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms/answer/Xavier-Amatriain
http://www.datascienceontology.com/
https://en.wikipedia.org/wiki/R_(programming_language)?q=get%20wiki%20data
https://www.linkedin.com/pulse/20141113191054-103457178-the-only-skill-you-should-be-concerned-with
http://blog.hackerearth.com/2016/01/getting-started-with-machine-learning.html?utm_source=Quora&utm_medium=Content&utm_content=MachineLearning&utm_campaign=BlogQuora
https://www.quora.com/What-is-the-best-language-to-use-while-learning-machine-learning-for-the-first-time/answer/Travis-Addair
https://www.quora.com/What-skills-are-needed-for-machine-learning-jobs/answer/Joseph-Misiti?srid=pBDw&share=4fa777bd
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
http://blog.udacity.com/2014/11/data-science-job-skills.html
http://www.stat.cmu.edu/~larry/=sml/