Comments on Thomas Jungblut's Blog: Controlling Hadoop MapReduce Job recursion

Anonymous (2015-04-01 07:57):
Yes, it is definitely possible in MapReduce. Write your own key class (a WritableComparable) that treats (1,2) and (2,1) as the same key. The reducer then receives (1,2) or (2,1) as the key and their counts as values. You can sum those up for the total tweets, and to identify whether the relationship is mutual you just need to check whether the number of values (the count, not the sum) is more than one.
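A minimal sketch of such a key, assuming the user IDs fit into longs (the class and field names are made up for illustration):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // Normalizes the two user IDs so that (1,2) and (2,1) serialize, compare
    // and hash identically, and therefore reach the same reduce() call.
    public class UserPair implements WritableComparable<UserPair> {

      private long smaller;
      private long larger;

      public UserPair() {
      }

      public void set(long a, long b) {
        smaller = Math.min(a, b);
        larger = Math.max(a, b);
      }

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeLong(smaller);
        out.writeLong(larger);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        smaller = in.readLong();
        larger = in.readLong();
      }

      @Override
      public int compareTo(UserPair other) {
        int cmp = Long.compare(smaller, other.smaller);
        return cmp != 0 ? cmp : Long.compare(larger, other.larger);
      }

      @Override
      public int hashCode() {
        // keeps the default HashPartitioner consistent with equals()
        return 31 * Long.hashCode(smaller) + Long.hashCode(larger);
      }

      @Override
      public boolean equals(Object o) {
        if (!(o instanceof UserPair)) {
          return false;
        }
        UserPair p = (UserPair) o;
        return p.smaller == smaller && p.larger == larger;
      }
    }

In the reducer you would then sum the values for the total count and flag the pair as mutual when more than one value arrives.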
Anonymous (2015-03-30 11:42):
Hi, I found this blog really very helpful.

I am actually dealing with a Twitter dataset, where I have information on how many times a particular user retweeted another user's tweets. So I have rows saying:
1 -> 2 (20 times)
2 -> 1 (5 times)
I am writing a MapReduce job to process this data and am trying to aggregate this information as:
1 -- 2 (25 times, and the relationship is mutual)
Can it be done using MapReduce? The dataset is quite large and I am having a difficult time figuring this out.
Thanks.

Anonymous (2013-09-09 13:10):
This comment has been removed by the author.
Thomas Jungblut (2013-04-19 13:27):
Hi M,

to 1: The enum can be declared anywhere, but it must be accessible from the controller class that submits the job as well as from the Mapper/Reducer classes.
to 2: Exactly, the counters are incremented in the mapper and/or in the reducer.
to 3: Yes.
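A minimal sketch of points 1 to 3, where computeNewState() and stateChanged() are made-up placeholders for the domain-specific logic:

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Point 1: a top-level enum, visible to the controller that submits the
    // job and to the mapper/reducer classes alike.
    enum UpdateCounter {
      UPDATED
    }

    // Points 2 and 3: the reducer increments the counter once per key whose
    // state changed, so an iteration where nothing changes leaves it at 0.
    public class RecursionReducer extends Reducer<Text, Text, Text, Text> {

      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        Text newState = computeNewState(values); // placeholder merge logic
        context.write(key, newState);
        if (stateChanged(key, newState)) {       // placeholder convergence check
          context.getCounter(UpdateCounter.UPDATED).increment(1);
        }
      }

      private Text computeNewState(Iterable<Text> values) {
        return new Text(); // domain-specific in a real job
      }

      private boolean stateChanged(Text key, Text newState) {
        return false; // domain-specific in a real job
      }
    }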
NeOAxEs (2013-04-19 13:24):
Hi,

Thanks for the info you have shared. I have a couple of quick questions:

1. Is the enum declared in the Mapper/Reducer class, since it is their count that we need to monitor?
2. Is context.getCounter(...).increment(...) called in the mapper or in the reducer?
3. I am implementing this in a reducer, so even if only one reducer instance runs, it will increment the counter and we exit the loop?

Thanks,
M

Thomas Jungblut (2013-02-25 09:26):
That will work, yes. It then writes to and reads from the HDFS you configured.

CECUEG (2013-02-25 09:25):
Thanks Thomas, that's what I did, and it is running locally with the configuration set to localhost and the ports used by the Hadoop installation on my machine. But I was afraid that with the actual setup this won't work by just replacing the URL with the URL of the namenode?

Thomas Jungblut (2013-02-25 09:16):
The Mahout jars must reside on your classpath, and the jars that are in Hadoop's lib folder should be there too.

CECUEG (2013-02-25 07:52):
I couldn't get what you mean by "The jars should reside on both sides." For example, if I am running an example that calls Mahout code on my machine, and I want it to start a job on an external Hadoop cluster (I don't have Hadoop installed on my machine), do you mean I have to copy the jar file for my program to the Hadoop namenode? Or what do you mean by both sides?
Thanks a lot & best regards

Thomas Jungblut (2013-02-24 08:36):
Hi, exactly like you told. Or if you are more the XML kind of guy, you can copy the hdfs-site.xml/core-site.xml of your cluster, which contain this information, to the other server and use conf.addResource(...). The jars should reside on both sides.
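Both variants in one hedged sketch; the host names, ports and file paths are placeholders, and the property names are the classic pre-YARN ones this thread refers to:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class RemoteClusterConf {

      // Builds a client-side Configuration for a remote cluster;
      // pick one of the two options below.
      public static Configuration create() {
        Configuration conf = new Configuration();

        // Option 1: set the namenode and jobtracker addresses directly.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        // Option 2: copy the cluster's config files to this machine
        // and load them instead.
        conf.addResource(new Path("/path/to/cluster/core-site.xml"));
        conf.addResource(new Path("/path/to/cluster/hdfs-site.xml"));

        return conf;
      }
    }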
CECUEG (2013-02-24 06:26):
Hi, I want to ask a question not related directly to this article. I am new to Hadoop and wanted to ask how I can submit Hadoop jobs externally, from a machine that is not in the Hadoop cluster (not a namenode or a datanode). Will just including the Hadoop libraries/jars and setting the Configuration object with the HDFS and jobtracker URLs do the job?

Thomas Jungblut (2012-08-05 10:07):
Hi, the counter never decreases, but in every MapReduce job this counter starts from 0. So if the reducer does not increment the counter, it will return 0.
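That reset is what ends the recursion: each pass is a fresh Job with fresh counters. A minimal sketch of the driver loop discussed in this thread (setupJob() is a made-up helper standing in for the usual job wiring):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Resubmits the job while at least one reducer of the previous pass
    // incremented UpdateCounter.UPDATED; a pass with no updates returns 0
    // and the loop stops.
    public class RecursionDriver {

      public static void main(String[] args) throws Exception {
        long updated = 1; // enter the loop at least once
        int iteration = 0;
        while (updated > 0) {
          Job job = setupJob(new Configuration(), iteration++);
          job.waitForCompletion(true);
          updated = job.getCounters()
              .findCounter(UpdateCounter.UPDATED).getValue();
        }
      }

      // Made-up helper: mapper/reducer classes and input/output paths
      // would be set here.
      private static Job setupJob(Configuration conf, int iteration)
          throws Exception {
        return Job.getInstance(conf, "recursion pass " + iteration);
      }
    }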
Anonymous (2012-08-05 09:36):
Hello, I want to ask you a question. I understand that in the reduce() function, if a vertex is updated (and set to activated), the counter is incremented by 1; if a vertex is not updated (and will be set to not activated), the counter is not incremented. And in the main() function you use while (counter > 0) as the loop condition.
My question is: in the beginning iterations the counter gets incremented in reduce(), so the while loop keeps executing. But once an iteration's reduce() does not increment the counter, meaning no vertex was updated, the loop should stop; yet I never see the counter being decremented, so while (counter > 0) would still be true and the loop would continue. How does the counter decrease, so that the loop condition stops?

Anonymous (2012-07-27 05:12):
Nice

Thomas Jungblut (2012-05-18 09:33):
No, there is no built-in graph structure within Hadoop.
Graphs are somewhat abstract, so you can actually express one as an adjacency list by using a key and a list of keys as the value.

In Hadoop generics that would look something like:

<Text, ArrayWritable>

where the ArrayWritable consists of Text keys. This is then your adjacency list. Now you can run fancy graph algorithms on it ;)
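A small sketch of that representation (TextArrayWritable is a made-up name, but subclassing ArrayWritable with a fixed element class is the standard Hadoop pattern):

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    // Fixing the element class lets Hadoop deserialize the Text entries
    // without any extra type information in the stream.
    public class TextArrayWritable extends ArrayWritable {

      public TextArrayWritable() {
        super(Text.class);
      }

      public TextArrayWritable(Text[] neighbours) {
        super(Text.class, neighbours);
      }
    }

    // Usage in a mapper or reducer: the key is the vertex, the value its
    // neighbour list.
    // context.write(new Text("A"),
    //     new TextArrayWritable(new Text[] { new Text("B"), new Text("C") }));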
RS (2012-05-17 23:14):
Hi, is there any output type in Hadoop which can output a graph structure? Or how are graphs in general implemented in Hadoop?
Engineer (2012-05-10 17:47):
Oh! I got the idea of your example!
All I have to do is make sure that the non-converged reducer tasks increment the counter, and then check whether the counter is still > 0.
Thanks very much again!

Engineer (2012-05-10 17:35):
You mean if some reducers increment it but some do not, this mechanism for recursion is not suitable?
Thomas Jungblut (2012-05-10 17:31):
Hey,

it checks the counters after the job run, so it takes the sum over all reducers of a single job.
If you have jobs where the reducers do not increment the counter at all, well, then this won't work and you have to find another metric.

Engineer (2012-05-10 17:30):
Hi, thanks for this tutorial, it's really helpful.
But what if some reducer tasks increment the counter and some do not?

Thomas Jungblut (2012-04-16 07:28):
The counter is incremented in the MapReduce job that is run inside the while loop, which has the breaking condition (counter > 0).

Praveen Sripati (2012-04-16 05:43):
Where is the counter incremented in the code, and when do we get out of the loop?

cielle (2012-01-21 11:53):
This comment has been removed by a blog administrator.
Thomas Jungblut (2012-01-16 09:59):
Hi,

if you are in distributed mode, the distribution of the splits is "completely" random, so Hadoop itself won't benefit from caching.

If you are searching for a fully cached solution, you should take a look at Spark:

http://www.spark-project.org

Or you can take a look at Apache Hama; there you can use caching very well, and the iteration is much faster than with Hadoop MapReduce.

Abdulrahman Kaitoua (2012-01-15 14:22):
Hi,

Your work is so helpful.
Is there a way to cache the output of one job for the next job, to skip the HDD writing, or at least to skip the HDD reading of the input?

Regards,