This Silk is a structured database listing tools and resources that (data) journalists might want to include in their toolkit. We tried to cover the main steps of the data-driven journalism (ddj) process: from data collection and scraping to data cleaning and enhancement; from analysis to data visualization and publishing. We especially try to showcase tools that are free or freemium and open source, but you will find a bit of everything.

This Silk is updated regularly: we have collected a list of hundreds of tools, which we tag manually (Are they open source? Free? Meant for interactive data visualizations?). Make sure you follow this Silk, so you won't miss an update!

Tools Currently Listed: 250

Caveat: because the information (price, source code availability, tags, usage, developers...) is curated manually, there could be oversights or things you don't agree with. Always check each tool's website for more details.

This is a work in progress and we'd love to get feedback from the community. Drop us an email, or submit more tools through this form.

Latest Added Tools - Filterable Table

Huginn
Usage: Data Collection | Tags: Scraping, Social Media Mining | Price Plan: Free | Open Source: Yes | Developer: Andrew Cantino
Huginn is a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn's Agents create and consume events, propagating them along a directed graph. Think of it as a hackable Yahoo! Pipes plus IFTTT on your own server. You always know who has your data. You do. Here are some of the things that you can do with Huginn: Track the weather and get an email when it's going to rain (or snow) tomorrow ("Don't forget your umbrella!"); List terms that you care about and receive emails when their occurrence on Twitter changes (for example, want to know when something interesting has happened in the world of Machine Learning? Huginn will watch the term "machine learning" on Twitter and tell you when there is a spike in discussion); Watch for air travel or shopping deals; Follow your project names on Twitter and get updates when people mention them; Scrape websites and receive emails when they change; Connect to Adioso, HipChat, Basecamp, Growl, FTP, IMAP, Jabber, JIRA, MQTT, nextbus, Pushbullet, Pushover, RSS, Bash, Slack, StubHub, translation APIs, Twilio, Twitter, Wunderground, and Weibo, to name a few; Send digest emails with things that you care about at specific times during the day; Track counts of high frequency events and send an SMS within moments when they spike, such as the term "san francisco emergency"; Send and receive WebHooks; Run custom JavaScript or CoffeeScript functions; Track your location over time; Create Amazon Mechanical Turk workflows as the inputs, or outputs, of agents (the Amazon Turk Agent is called the "HumanTaskAgent"). For example: "Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."

Dat
Usage: Publishing | Tags: Dataset Publishing, Data Source / Data Discovery | Price Plan: Free | Open Source: Yes | Developers: Max Ogden, Mathias Buus, Karissa McKelvey
Dat is an open source, decentralized data tool for sharing datasets, small and large. Version, fork, and sync data over a peer-to-peer network.

Model.js
Usage: Data Visualization | Tags: Interactive Charts | Price Plan: Free | Open Source: Yes | Developer: Curran Kelleher
Model.js manages the execution flow of the data flow graphs you define. Kind of like Backbone and React, but simpler and designed specifically for making D3 easier to use. Also check out Chiasm, a visualization runtime engine built on Model.js.

FoamTree
Usage: Data Visualization | Tags: Interactive Charts, Hierarchy Tree | Price Plan: Freemium | Open Source: No | Developer: Carrot Search
FoamTree is a JavaScript tree map visualization with innovative layout algorithms and animations. It aids understanding of hierarchical data, such as groups of documents, network domains or site maps. (Note: The package available from this page contains FoamTree with locked branding options. If you hold a FoamTree license, you can download the newest unlimited version of FoamTree from)

ContextMiner
Usage: Data Collection | Tags: Scraping, Social Media Mining | Price Plan: Free | Developer: Chirag Shah
ContextMiner is a framework to collect, analyze, and present the contextual information along with the data. It is based on the idea that while describing or archiving an object, contextual information helps to make sense of that object or to preserve it better. This website provides tools to collect data, metadata, and contextual information off the Web by automated crawls. At present, ContextMiner supports automated crawls from blogs, YouTube, Flickr, Twitter, and the open Web. It also collects inlinks information for YouTube videos from the Web. Additional sources will continue to be added. ContextMiner helps you (1) run automated crawls on various social media sources on the Web and collect data as well as contextual information, (2) analyze and add value to collected data and context, and (3) monitor digital objects of interest. The following is a typical flow of using ContextMiner: 1. Start a new campaign based on some story, concept, or object. 2. Choose the sources (Web, Blogs, YouTube, Twitter, Flickr) that you want ContextMiner to run your searches and crawls on. 3. Once you provide all the required parameters, ContextMiner can immediately start running your campaign. You can access all your campaigns and collected data as well as contextual information through this website. 4. You can manipulate individual items as well as related items that are collected by the above processes to add your interpretation and meaning to the campaign.

Geojson.io
Usage: Data Visualization, Data Cleaning & Enhancement | Tags: Maps | Price Plan: Free | Open Source: Yes | Developer: Mapbox
We are trying to make it easier to draw, change, and publish maps. Some of the most important geospatial data is the information we know, observe, and can draw on a napkin. This is the kind of data that we also like to collaborate on, like collecting bars that have free wifi or favorite running routes. Geojson.io aims to fix that. It’s an open source project built with MapBox.js, GitHub’s powerful new Gist and GeoJSON features, and an array of microlibraries that power import, export, editing, and lots more.

MooWheel
Usage: Data Visualization | Tags: Network Graphs | Price Plan: Free | Open Source: Yes | Developer: Joshua Gross
The purpose of this script is to provide a unique and elegant way to visualize data using JavaScript and the <canvas> object. This type of visualization can be used to display connections between many different objects, be they people, places, things, or otherwise. The script is licensed under an MIT-style license.

Textures.js
Usage: Data Visualization | Tags: Diagnostic Tool | Price Plan: Free | Open Source: Yes | Developer: Riccardo Scalco
Textures.js is a JavaScript library for creating SVG patterns.

Density.io
Developer: Data Team
Density is a people counter. Our sensor gets attached to a place’s entrance, measures anonymous movement as people come and go, and generates real-time and historical data that can be integrated anywhere.

Swarmize
Usage: Data Collection, Data Analysis, Data Visualization | Tags: Crowdsourcing | Price Plan: Free | Open Source: Yes | Developer: The Guardian
Swarmize is a stack of tools to make crowd-powered number-gathering a lot easier. Swarmize is a data journalism platform. It helps editors collect, analyze and output information. News teams can create simple surveys in minutes or build engaging interactives and custom data collectors with front-end developers using the Swarmize toolchain. The platform includes a survey wizard, embeddable web forms, dashboards and charts, CSV outputs and developer-friendly APIs optimized for high volume, fast-paced news organizations. Swarmize does the heavy lifting behind the scenes and makes it easy to deploy and manage data journalism projects quickly and cheaply. Project Status: Swarmize is currently in alpha. The platform is accessible for use by Guardian staff only, but the code is available for download and reuse with an open license on GitHub. The project was funded by a grant from the Knight News Challenge and also supported by The Guardian. The developers are Tom Armitage and Graham Tackley. If you would like to discuss working on the project together please contact Matt McAlister.

7ella.com
Usage: Data Collection | Tags: Data Source / Data Discovery | Price Plan: Free | Open Source: No | Developers: Development Seed, Pursue
We recently worked with Pursue to improve tracking of service breakdowns in Palestinian refugee camps across Lebanon. Pursue has been working with community organizations in all twelve refugee camps for the past five years. With more attention on the refugee situation in Lebanon, there are opportunities to push for improved service delivery with better, timely data. Working closely with Pursue, we created a mobile reporting system, verification tools, and a public website showing the extent to which refugees are denied basic services. We built the system entirely from open source tools, and we designed it with security and privacy in mind.

Moebio Framework
Usage: Data Analysis, Data Visualization | Tags: Statistics, Interactive Charts, Network Graphs | Price Plan: Free | Open Source: Yes | Developer: Moebio Labs
Moebio Framework is a JavaScript toolkit for performing data analysis and creating visualizations. At its core is a set of data types as well as operators to manipulate them and derive meaning from your data. These include Lists, Tables, Intervals, Networks and many more. Additionally, Moebio Framework provides a canvas-based drawing framework and a collection of graphics and geometry related functions to empower the creation of data visualizations.

Datacopia
Usage: Data Visualization | Tags: Interactive Charts | Price Plan: Freemium | Open Source: No | Developer: Datacopia Team
Creating charts is easy. Creating good charts is tough. Yes, Excel makes it look easy: select some data; press the chart wizard; and voila - you have your chart. But have you ended up with the chart that is going to best convey your ideas to other people? Our goal at Datacopia is to eliminate this arbitrary decision-making process. We believe the data should speak for itself. Anyone should be able to quickly turn an opaque table of data into the most representative and informative graphics possible, without needing a degree in data science!

Open Street Map Metadata Utility
Usage: Data Collection | Tags: Data Source / Data Discovery | Price Plan: Free | Open Source: Yes | Developers: Development Seed, American Red Cross
OSM-meta-util is a tool to tap into OpenStreetMap changeset metadata. We built the tool in partnership with the American Red Cross as part of the infrastructure for tracking efforts such as #MissingMaps. OpenStreetMap changesets are an incredibly rich source of information. In 2014 alone, users committed over 6 million changesets to OSM. Changeset metadata includes information such as the username making the edits, number of edits, which editor was used, the commit message, etc. Metadata is helpful in understanding the changing nature of OSM. With metadata, we can track hashtags, analyze commit text or aggregate user metrics over time. This data also provides insight into the motivations of people contributing to one of the largest crowdsourcing exercises on the planet. OSM-meta-util aims to make it easier and faster to explore this rich set of data.

Crossfilter
Usage: Data Visualization | Tags: Interactive Charts | Price Plan: Free | Open Source: Yes | Developer: Square
Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records; we built it to power analytics for Square Register, allowing merchants to slice and dice their payment history fluidly.

Event Registry
Usage: Data Collection | Tags: Data Source / Data Discovery | Price Plan: Freemium | Open Source: No | Developer: Gregor Leban
Event Registry is a system for real-time collection and analysis of news published by news outlets globally. Events mentioned in the news are identified and relevant information about them is automatically extracted and stored in a searchable form.

TAGS
Usage: Data Collection | Tags: Scraping, Social Media Mining | Price Plan: Free | Open Source: Yes | Developer: Martin Hawksey
TAGS is a free Google Sheet template which lets you set up and run automated collection of search results from Twitter.

Closr
Usage: Data Visualization | Tags: Images / Graphics | Price Plan: Freemium | Open Source: No | Developer: Closr Team
Create Zoomable Stories From Your Big Images

JpGraph
Usage: Data Visualization | Tags: Images / Graphics | Price Plan: Freemium | Open Source: Yes | Developer: Asial Corporation
JpGraph is an object-oriented graph-creating library for PHP from 5.1 to 5.6. The library is completely written in PHP and ready to be used in any PHP scripts (both CGI/APXS/CLI versions of PHP are supported).

Wolfram Alpha
Usage: Data Collection | Tags: Data Source / Data Discovery | Price Plan: Freemium | Open Source: No | Developers: Stephen Wolfram, Wolfram Alpha Team
Wolfram Alpha (also styled WolframAlpha and Wolfram|Alpha) is a computational knowledge engine or answer engine developed by Wolfram Research. It is an online service that answers factual queries directly by computing the answer from externally sourced "curated data", rather than providing a list of documents or web pages that might contain the answer as a search engine might. Wolfram Alpha, which was released on May 18, 2009, is based on Wolfram's earlier flagship product Mathematica, a computational platform or toolkit that encompasses computer algebra, symbolic and numerical computation, visualization, and statistics capabilities. Additional data is gathered from both academic and commercial websites such as the CIA's The World Factbook, the United States Geological Survey, a Cornell University Library publication called All About Birds, Chambers Biographical Dictionary, Dow Jones, the Catalogue of Life, CrunchBase, Best Buy, the FAA and optionally a user's Facebook account.
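
If you want to work with a listing like the one above outside of Silk, the filtering and grouping it offers can be approximated in a few lines of Python. This is only a minimal sketch: the `Tool` record and the `free_and_open` and `count_by_usage` helpers are hypothetical names chosen for illustration, not part of Silk, and the sample records are a handful of the entries listed above (ContextMiner has no open source value listed, so it is stored as `None`).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tool:
    """One catalog entry (hypothetical structure, for illustration only)."""
    name: str
    usages: Tuple[str, ...]
    price_plan: str
    open_source: Optional[bool]  # None when the listing gives no value


# A few entries copied from the listing above.
TOOLS = [
    Tool("Huginn", ("Data Collection",), "Free", True),
    Tool("Dat", ("Publishing",), "Free", True),
    Tool("FoamTree", ("Data Visualization",), "Freemium", False),
    Tool("ContextMiner", ("Data Collection",), "Free", None),
    Tool("Geojson.io", ("Data Visualization", "Data Cleaning & Enhancement"), "Free", True),
    Tool("Crossfilter", ("Data Visualization",), "Free", True),
    Tool("Wolfram Alpha", ("Data Collection",), "Freemium", False),
]


def free_and_open(tools):
    """Keep only tools that are both free and explicitly open source."""
    return [t for t in tools if t.price_plan == "Free" and t.open_source is True]


def count_by_usage(tools):
    """Count tools per usage; a tool with several usages counts once for each."""
    counts = Counter()
    for tool in tools:
        counts.update(tool.usages)
    return counts


if __name__ == "__main__":
    print([t.name for t in free_and_open(TOOLS)])
    for usage, n in count_by_usage(TOOLS).most_common():
        print(f"{usage}: {n}")
```

Treating a missing open source value as "unknown" rather than "No" keeps manually curated gaps from skewing the counts.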

Statistics about the tools listed

Most tools are free and open source

Price Plan?

Open Source?

Most tools listed are currently focused on crafting data visualizations

Number of Tools Grouped by Usage

Number of Tools with Data Visualization as Usage, grouped by Tags

The developers behind most listed ddj tools

Number of Tools grouped by Main Developer

