I have processed more of the Daniel Morgan data, and so have an updated network. Below is a visualisation produced by extracting the network structure from Neo4J using R and iGraph, then saving the network as a gexf file and importing it into Gephi. The network is more complete and now also has edge labels.
Category: Neo4J
What do you do with the Panama Data?
The released Panama data comes as a Neo4J database, or as the files you can build one from, and it seems to me a little tricky to do much with. There is no detail beyond the attributes of the different entities, which limits us to looking at the relationships alone, and it is hard to judge the significance of the relationships without the context. That said, it's a fun data set to play with.
I decided to draw out some graphs of how things are connected via other things. Below is one of Officers connected to other Officers via *something* else, generated in R using iGraph from the Neo4J data set. This produces a few clusters, each containing a relatively small number of connected nodes. The query that produces the graph is:
MATCH (n:Officers)-[:`officer of`]->(o)<-[:`officer of`]-(m:Officers) WHERE NOT id(n)=id(m) AND id(n)<id(m) RETURN n.name AS Officer1, m.name AS Officer2, count(o) AS Weight
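To make it clearer what the query counts, here is a minimal Python sketch of the same idea on a toy data set. The officer and entity names are made up for illustration and are not from the Panama data, and `co_officer_weights` is a hypothetical helper, not part of the R code I used.

```python
from collections import defaultdict
from itertools import combinations


def co_officer_weights(officer_of):
    """Count, for each pair of officers, how many things they are both an officer of.

    `officer_of` is an iterable of (officer, entity) pairs, a stand-in for the
    (:Officers)-[:`officer of`]->(o) relationships in the graph.
    """
    officers_by_entity = defaultdict(set)
    for officer, entity in officer_of:
        officers_by_entity[entity].add(officer)

    weights = defaultdict(int)
    for officers in officers_by_entity.values():
        # sorted() yields each unordered pair once, like id(n) < id(m) in the query
        for a, b in combinations(sorted(officers), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up example data, not taken from the Panama files
toy_edges = [
    ("Officer A", "Entity 1"),
    ("Officer B", "Entity 1"),
    ("Officer A", "Entity 2"),
    ("Officer B", "Entity 2"),
    ("Officer C", "Entity 2"),
]
weights = co_officer_weights(toy_edges)
```

One difference worth noting: the sketch stores officers in a set, so a duplicated relationship is counted once, whereas the Cypher query counts every matching path.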
Daniel Morgan Murder
After listening to the Daniel Morgan podcast, Untold, I became really interested in the murder investigation. To help me follow it I started building a network of all the key people, organisations, and events in the case. The networks this produces can be seen here, and you can keep up to date with the progress on the network here.
There is an updated network image here.
The Case
The story is a compelling one, and I suggest you either listen to the podcast or read the book. Very briefly, it looks into the murder of Daniel Morgan, the subsequent investigations into the murder, and the police handling of the case. The book builds a compelling story of decades of struggle by the Morgan family to get justice, and the difficulty they have had in discovering the truth.
The network is not complete; at the time of writing I have only put in the ‘easy’ bits. The network stores objects as the nodes, so people, companies, organisations. The lines, or edges, store the relationships between the objects, e.g. Alistair Morgan is ‘brother_of’ Daniel Morgan. The visualisation is produced using Alchemy, and the data is stored in Neo4J. I intend to continue to develop the network, and the visualisation, which still needs things like edge labels. Once the network is more complete it would be interesting to see if there is any useful analysis that can be done on it. It would also be interesting to expand the data to include other related and interesting cases, such as the Stephen Lawrence murder and the Leveson Inquiry. This will likely form a part of Algorithmic Indexing in the future.
Here is a picture of the network in Neo4J:
Panama Revisited
The people over at The International Consortium of Investigative Journalists have updated the released Panama data. It's not clear to me whether there is more data than they had already released, or whether the difference is that this time it is a ready-made Neo4J database. They provide two versions of the database, Windows and Mac. It's easy to get it to work in Linux: just copy graph.db out of the archive into the databases directory of your Neo4J install.
I made a quick query to look for officers with the same address. It seems there are some, but it would need something more sophisticated to dig any deeper.
MATCH (n:Officer)--(a:Address)--(m:Officer) RETURN n,a,m LIMIT 25
Java Panama Papers Neo4J Network Generator
Further to the first attempt at importing the Panama Papers network data into Neo4J, I wrote a very quick Java program that creates an embedded Neo4J database. It needs a bit of checking, as it finds nodes that have the same node_id, which I assume is some sort of mistake in the program or the data. It also looks like there are some duplicate relationships.
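The sort of checking I have in mind can be sketched in a few lines of Python. This is only an illustration on made-up rows, not the Java program itself; the column names `node_id`, `node_1`, `rel_type` and `node_2` are the ones used in the Cypher import, and the two helper functions are hypothetical.

```python
from collections import Counter


def duplicate_node_ids(rows):
    """Return the node_id values that appear in more than one row."""
    counts = Counter(row["node_id"] for row in rows)
    return sorted(node_id for node_id, n in counts.items() if n > 1)


def duplicate_edges(edges):
    """Return the (node_1, rel_type, node_2) triples that appear more than once."""
    counts = Counter((e["node_1"], e["rel_type"], e["node_2"]) for e in edges)
    return sorted(triple for triple, n in counts.items() if n > 1)


# Made-up rows standing in for what csv.DictReader would yield
nodes = [{"node_id": "1"}, {"node_id": "2"}, {"node_id": "1"}]
edges = [
    {"node_1": "1", "rel_type": "officer_of", "node_2": "2"},
    {"node_1": "1", "rel_type": "officer_of", "node_2": "2"},
    {"node_1": "2", "rel_type": "intermediary_of", "node_2": "1"},
]
dup_nodes = duplicate_node_ids(nodes)
dup_edges = duplicate_edges(edges)
```

Running the same two checks over the real CSV files would show whether the duplicates come from the data or from the program.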
This program generates relationships of the different types, such as ‘officer_of’, rather than using the hack needed to get Cypher to import the data (see earlier post).
The code can be found on my new GitHub.
Below is Blairmore, Ian Cameron, the intermediary, and loads of other companies that use the same intermediary.
Not many direct links to Blairmore.
Panama Papers: Import Data to Neo4J using Cypher
I downloaded the Panama Papers network data. I was hoping it would be all the data; sadly not. It is still interesting, however. The import process is not too tricky. The following Cypher commands will get the data into a running Neo4J database. Note there is a \" in the Addresses file that will break the import. Search for it and replace it with \ ". Data can be downloaded from here.
To get the relationships in we have to do a bit of a hack, as you cannot generate a relationship type on the fly from a CSV file with Cypher. I will do this properly with a bit of Java soon.
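If you would rather script the search and replace than do it in an editor, a minimal Python sketch might look like this. The function names are hypothetical, the example line is made up, and the UTF-8 encoding is an assumption about the file.

```python
from pathlib import Path


def fix_escaped_quote(text):
    """Replace every backslash-quote sequence with backslash, space, quote."""
    return text.replace('\\"', '\\ "')


def fix_file(src, dst, encoding="utf-8"):
    """Write a repaired copy of src to dst (the encoding is an assumption)."""
    Path(dst).write_text(
        fix_escaped_quote(Path(src).read_text(encoding=encoding)),
        encoding=encoding,
    )


# Made-up line for illustration, not a real row from Addresses.csv
broken = 'an address ending in a backslash\\",more'
fixed = fix_escaped_quote(broken)
```

Writing to a separate output file keeps the original download untouched in case the replacement needs checking.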
Change the paths! This is for the Addresses:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/Addresses.csv' AS line CREATE (:Addresses { address: line.address, icij_id: line.icij_id, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, node_id: toInt(line.node_id), sourceID: line.sourceID})
For the Intermediaries:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/Intermediaries.csv' AS line CREATE (:Intermediaries { name: line.name, internal_id: line.internal_id, address: line.address, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, status: line.status, node_id: toInt(line.node_id), sourceID: line.sourceID})
Officers:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/Officers.csv' AS line CREATE (:Officers { name: line.name, icij_id: line.icij_id, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, node_id: toInt(line.node_id), sourceID: line.sourceID})
Entities:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/Entities.csv' AS line CREATE (:Entities { name: line.name, original_name: line.original_name, former_name: line.former_name, jurisdiction: line.jurisdiction, jurisdiction_description: line.jurisdiction_description, company_type: line.company_type, address: line.address, internal_id: line.internal_id, incorporation_date: line.incorporation_date, inactivation_date: line.inactivation_date, struck_off_date: line.struck_off_date, dorm_date: line.dorm_date, status: line.status, service_provider: line.service_provider, ibcRUC: toInt(line.ibcRUC) , country_codes: line.country_codes, countries: line.countries, note: line.note, valid_until: line.valid_until, node_id: toInt(line.node_id), sourceID: line.sourceID})
Finally the relationships, or edges. Note the hack: all relationships are of type ACCOC, with the real type stored in a role property. This isn’t a big problem but offends me a little bit. I will post Java code that generates the graph directory from the files. The nodes are matched on the node_id property set by the CREATE statements above.
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1 { node_id: toInt(csvLine.node_1)}),(n2 { node_id: toInt(csvLine.node_2)})
CREATE (n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)
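One way around the hack, short of the Java program, would be to group the edges by rel_type outside Cypher and build one CREATE statement per type. Below is a minimal Python sketch of that idea on a made-up CSV snippet. The helper names are hypothetical, the statements are only built as strings and not sent to a database, and the `$name` parameter syntax is that of more recent Neo4J versions.

```python
import csv
import io
from collections import defaultdict


def edges_by_type(csv_text):
    """Group (node_1, node_2) pairs by the value in their rel_type column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["rel_type"]].append((int(row["node_1"]), int(row["node_2"])))
    return dict(groups)


def cypher_for_type(rel_type):
    """Build a parameterised CREATE statement that uses rel_type as the relationship type."""
    if "`" in rel_type:
        # Keep the sketch simple: refuse names that would need escaping
        raise ValueError("backtick in relationship type: " + rel_type)
    return (
        "MATCH (n1 {node_id: $node_1}), (n2 {node_id: $node_2}) "
        f"CREATE (n1)-[:`{rel_type}`]->(n2)"
    )


# Made-up rows for illustration, using the column names from all_edges.csv
toy_csv = """node_1,rel_type,node_2
1,officer_of,10
2,officer_of,10
3,intermediary_of,10
"""
groups = edges_by_type(toy_csv)
statements = {rel_type: cypher_for_type(rel_type) for rel_type in groups}
```

Each statement would then be run once per pair in its group, with the pair supplied as the node_1 and node_2 parameters.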