I put data science here for two reasons:
- I am too lazy to make a new page tab.
- The key insights that result from proper massaging of data can help consultants and executives make data-driven decisions!
Just some definitions before we get started:
Big Data: Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
Metadata: Data that describes other data, giving insight and information about another (often larger) dataset, such as who created it, when, and what its fields mean.
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing.
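To make "looking for patterns" a bit more concrete, here is a minimal sketch in Python. It assumes a tiny, made-up list of shopping baskets, and the name `count_item_pairs` is my own, not from any data mining library. Real data mining tools do far more than this, but the core idea of counting what shows up together is the same.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each inner list is one customer's basket.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]


def count_item_pairs(baskets):
    """Count how many baskets contain each pair of items."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


if __name__ == "__main__":
    # Show the pairs that are bought together most often.
    for pair, count in count_item_pairs(BASKETS).most_common(3):
        print(pair, count)
```

A pair that appears in many baskets is the kind of pattern a marketing team might act on, for example by promoting the two items together.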
For a decision to be properly validated and appropriate, it must be data-driven. Relying purely on qualitative metrics will lead to ill-informed decisions.
Before we get started, here are a few programs, languages, and software packages for facilitating data manipulation:
- Excel (yes, Excel can do data analysis, and it is very powerful! Just read this article here about Excel VBA).
- Stata13
- Splunk
- Hadoop (Apache)
- R (yes it is just an “R” it isn’t a spelling mistake)
- MapReduce (Apache Hadoop)
- Hive
- Pig (Apache)
- Spark SQL and DataFrames
- pandas (Python)
If you can master all of these tools, then (a) you are a genius, and (b) you will have a complete set of skills to master and manipulate any dataset! If you need a little boost, check out this free resource here: Udemy!
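If MapReduce in the list above sounds mysterious, here is a rough sketch of the idea in plain Python. This is not Hadoop's actual API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and everything runs on one machine. It only illustrates the three steps: map each record to key/value pairs, group the pairs by key, then reduce each group to a single result.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

In a real cluster, the map and reduce steps run in parallel across many machines, which is what makes the approach scale to very large datasets.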
An exciting advancement in data languages is their integration with cloud computing. What do I mean by that...
Okay, okay, not that. I mean pairing the two to provide a cost-effective and scalable infrastructure to support big data and business analytics.
Another cool way to explore and display data is a practice called data visualization. This is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.
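As a rough illustration of why a picture helps, here is a small Python sketch that turns a few numbers into a text bar chart using only the standard library. The sales figures and the function name `text_bar_chart` are made up for this example, and it assumes all values are positive. The charting tools listed in this post produce far nicer output.

```python
def text_bar_chart(values, width=20):
    """Return one line per label, with a bar scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        # Scale each bar so the largest value fills the full width.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<6}{bar} {value}")
    return lines


if __name__ == "__main__":
    # Hypothetical monthly sales figures.
    monthly_sales = {"Jan": 50, "Feb": 100, "Mar": 75}
    print("\n".join(text_bar_chart(monthly_sales)))
```

Even in this crude form, it is quicker to spot the biggest month from the bars than from the raw numbers.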
Here are some of the more popular tools to help you get started!
- D3 (GitHub)
- Fusion Charts
- Chart.js
- Charts by Google Developers
Here are some less technical alternatives for the not-so-tech-savvy person!
I had a cool conversation with a private asset manager from RBC Wealth, and he was telling me how much he used Excel VBA. He did have a quant team; however, at a more elementary level (at the risk of sounding conceited), he used Excel to draw insights from large aggregates of data.
What is a Macro?
A Microsoft Office macro (this functionality applies to several of the MS Office applications) is simply Visual Basic for Applications (VBA) code saved inside a document. As an analogy, think of a document as HTML and a macro as JavaScript. In much the same way that JavaScript can manipulate HTML on a webpage, a macro can manipulate a document.
Macros are incredibly powerful and can do pretty much anything your imagination can conjure. Here is a (very) short list of things you can do with a macro:
- Apply style and formatting.
- Manipulate data and text.
- Communicate with data sources (database, text files, etc.).
- Create entirely new documents.
- Any combination, in any order, of any of the above.
You can also achieve similar functionality (for example, web scraping) on multiple platforms using Python.
Want a detailed tutorial on how to create a macro? Check it here.
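To give a feel for the Python side, here is a minimal sketch of a macro-style task: read some data, tidy it up, and produce a new document. The column names (`region`, `amount`), the sample rows, and the function name `summarize_by_region` are all assumptions made for this example. It uses only Python's built-in `csv` and `io` modules.

```python
import csv
import io
from collections import defaultdict


def summarize_by_region(csv_text):
    """Read region,amount rows and return a new CSV with one total per region."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Clean up inconsistent spacing and capitalization before grouping.
        region = row["region"].strip().title()
        totals[region] += float(row["amount"])

    # Build the new "document" as CSV text, one line per region.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical messy input, as it might come out of a spreadsheet export.
    raw = "region,amount\nwest,10.5\nEAST,4\nWest ,2.5\n"
    print(summarize_by_region(raw))
```

The same steps (clean, group, write out) are what a typical Excel macro would automate, except that this runs anywhere Python does.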
What Is Python?
Python refers to the Python programming language (with syntax rules for writing what is considered valid Python code) and the Python interpreter software that reads source code (written in the Python language) and performs its instructions. The Python interpreter is free to download from http://python.org/, and there are versions for Linux, macOS, and Windows.
The name Python comes from the surreal British comedy group Monty Python, not from the snake. Python programmers are affectionately called Pythonistas, and both Monty Python and serpentine references usually pepper Python tutorials and documentation.
This online book goes into so much detail, and it is free, so why not, right? Check it here.
Basically, it talks about automating repetitive work, especially in large data sets. Check out this Facebook group for a detailed discussion about automation in business.
Best,
#YBK
_________________________________________________________________
**This blog is not all original content, and I do not own all the content. The purpose of this blog is to collect valuable insights across various channels, publications, and articles, and present them in a digestible and current way. Some material has been copied and is referenced in some articles; it should not be treated as original work.