This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
To get you started, whether you’re on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.
Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.
- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on plain text, CSV, HTML/XML, and JSON
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow using Drake
- Create reusable tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines using GNU Parallel
- Model data with dimensionality reduction, clustering, regression, and classification algorithms
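
As a rough illustration of what combining small command-line tools can look like, here is a minimal sketch that scrubs and explores a tiny data set using only standard Unix utilities. The sample data and the variable names `data` and `top` are made up for this example and do not come from the book.

```shell
# Hypothetical sample data: one fruit name per line, with inconsistent case.
data='Apple
banana
apple
Cherry
banana
APPLE'

# Scrub: normalize everything to lowercase.
# Explore: count each distinct value, rank by count, keep the most frequent.
top=$(printf '%s\n' "$data" \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{print $2 ": " $1}')

echo "$top"   # prints "apple: 3"
```

Each stage does one small job and passes plain text to the next, so a stage can be swapped or removed without touching the rest of the pipeline.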
Best Computer Science books
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs.
Distributed Computing Through Combinatorial Topology describes techniques for analyzing distributed algorithms based on award-winning combinatorial topology research. The authors present a solid theoretical foundation relevant to many real systems reliant on parallelism with unpredictable delays, such as multicore microprocessors, wireless networks, distributed systems, and Internet protocols.
"TCP/IP Sockets in C# is an excellent book for anyone interested in writing network applications using the Microsoft .NET Framework. It is a unique combination of well-written, concise text and a rich, carefully selected set of working examples. For the beginner in network programming, it’s a good starting book; on the other hand, professionals can also benefit from the handy sample code snippets and the material on topics like message parsing and asynchronous programming."
Additional info for Data Science at the Command Line: Facing the Future with Time-Tested Tools
CHAPTER 2
Getting Started
In this chapter, we are going to make sure that you have all the prerequisites for doing data science at the command line. The prerequisites fall into two parts: (1) having a proper environment with all the command-line tools that we employ in this book, and (2) understanding the essential concepts that come into play when using the command line.
First, we describe how to install the Data Science Toolbox, which is a virtual environment based on GNU/Linux that contains all the necessary command-line tools. Subsequently, we explain the essential command-line concepts through examples.
By the end of this chapter, you’ll have everything you need in order to continue with the first step of doing data science, namely obtaining data.
Overview
In this chapter, you’ll learn:
- How to set up the Data Science Toolbox
- Essential concepts and tools necessary to do data science at the command line
Setting Up Your Data Science Toolbox
In this book we use many different command-line tools. The distribution of GNU/Linux that we are using, Ubuntu, comes with a whole bunch of command-line tools pre-installed. Moreover, Ubuntu offers many packages that contain other, relevant command-line tools. Installing these packages yourself is not too difficult. However, we also use command-line tools that are not available as packages and require a more manual, and more involved, installation. In order to acquire the necessary command-line tools without having to go through the involved installation process of each, we encourage you to install the Data Science Toolbox.
If you prefer to run the command-line tools natively rather than inside a virtual machine, then you can install the command-line tools individually. However, be aware that this is a very time-consuming process. Appendix A lists all the command-line tools used in the book. The installation instructions are for Ubuntu only, so check the book’s website for up-to-date information on how to install the command-line tools natively on other operating systems. The scripts and data sets used in the book can be obtained by cloning this book’s GitHub repository.
The Data Science Toolbox is a virtual environment that allows you to get started doing data science in minutes. The default version comes with commonly used software for data science, including the Python scientific stack and R together with its most popular packages. Additional software and data bundles are easily installed. These bundles can be specific to a certain book, course, or organization. You can read more about the Data Science Toolbox at its website.
There are two ways to set up the Data Science Toolbox: (1) installing it locally using VirtualBox and Vagrant or (2) launching it in the cloud using Amazon Web Services. Both ways result in exactly the same environment. In this chapter, we explain how to set up the Data Science Toolbox for Data Science at the Command Line locally.
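
For readers weighing the native route against the Toolbox, one way to see how much manual installation would be needed is to check which tools are already on the machine. The sketch below is not taken from the book: the function name `missing_tools` is hypothetical, and the tool names passed to it are placeholders rather than the Appendix A list.

```shell
# Print the name of every listed tool that is not found on this machine.
# missing_tools is a hypothetical helper; the arguments are illustrative only.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the name cannot be resolved on PATH
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call: sort and grep are normally present, the last name is deliberately bogus.
missing_tools sort grep no-such-tool-xyz
```

An empty result means every listed tool is already available. A long list of missing names suggests that setting up the virtual environment is likely the quicker path.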