Woozle Wuzzle
find, xargs and spaces

I commonly use the following (or something close to it) to rip through third-party source to find something I'm interested in:

find . -name "*.java" -type f | xargs grep someMethod

The problem that I usually run into is that directories will have spaces in their names, and xargs treats whitespace as a delimiter, so a single path gets split into several bogus arguments. The trick to getting around this is the following:

find . -name "*.java" -type f -print0 | xargs -0 grep someMethod

The -print0 on find separates filenames with a null character instead of a newline, and the -0 on xargs tells it to split its input on that null character only. This article has more information.
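
To see the difference for yourself, here is a minimal sketch that builds a throwaway tree and runs both pipelines against it. The directory name "third party", the file Foo.java and the variable names are made up for illustration, and it assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do).

```shell
#!/bin/sh
# Build a throwaway tree with a space in a directory name (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/third party/src"
printf 'void someMethod() {}\n' > "$demo/third party/src/Foo.java"

cd "$demo" || exit 1

# Plain pipe: xargs splits "./third party/src/Foo.java" at the space,
# so grep is handed two paths that do not exist and lists no files.
plain_matches=$(find . -name "*.java" -type f | xargs grep -l someMethod 2>/dev/null | wc -l | tr -d ' ')

# NUL-delimited pipe: the whole path reaches grep as a single argument.
nul_matches=$(find . -name "*.java" -type f -print0 | xargs -0 grep -l someMethod | wc -l | tr -d ' ')

echo "plain: $plain_matches, nul-delimited: $nul_matches"

# Clean up the throwaway tree.
cd / && rm -rf "$demo"
```

With that layout the plain pipe should report no matching files while the null-delimited one reports the single file, which is the whole point of the trick.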

And for those of you that are wondering why I have spaces in directories / filenames, A) it's 2005 people, B) I'm developing on Windows where it's more common and C) I'm looking at third-party source (i.e. go b*tch to someone else).

Comments
Comment by bob at October 28, 2005 11:38 PM

xargs(1) is very unsafe. stick with the -exec in find(1).

 

Comment by rgrzywinski at October 29, 2005 08:51 AM

Define "unsafe"! Am I going to lose an arm? Is my wife going to be abducted? Is it going to wipe my hard drive?!?

The "-exec" option in "find" is orders of magnitude slower than "xargs" for what I use it for. Unless "unsafe" means that lasers are going to shoot out of my mouse and burn my eye balls out, I think that I'm OK.

Let us know!

 

Comment by bob at December 3, 2005 04:17 AM

maybe youll be working at a client site and execute something malicious (think of the irssi problems in 2003 just from stdout in terminals) and then have to sell off your arm or sell your wife to pay for the lawsuit....or you could just think of it as safe computing practices...though i see youre the type of developer who isnt really aware of these things

 

Comment by rgrzywinski at December 3, 2005 09:20 AM

I hate to be negative in comments but you're touching a nerve here. What you're telling me is that because some operation has the potential for causing harm in some limited and identifiable domain that I should use it in no other -- even in those where it can have no impact.

So by your logic, I should not use malloc since it's known to cause memory leaks which lead to computer crashes, or I shouldn't increment pointers since there's a chance that I might run past the end of a block and cause a seg fault. These are unsafe computing practices, right? Hell, I shouldn't even turn on my computer since it's possible for someone to hack in and use it as a platform for causing harm to other computers -- computers are *very* unsafe. I shouldn't turn on lights since they might spark and cause fires -- home wiring is very unsafe. I shouldn't exhale since I expel carbon dioxide which is contributing to global warming and that's unsafe. Maybe I should just off myself and save everyone the problem but then, oh crap, my decomposing body will release methane and other noxious gases into the air and that's unsafe. I guess I'm just not aware of what I'm doing. Not only am I *not* the "type of developer who isnt really aware of these things", I'm not the type of human being that is really aware of these things.

Thank you for pointing out my ignorance and incompetence even though I *clearly* identified the domain in which I use this highly unsafe and should-be-stricken-from-the-earth technology and that domain has no chance of ever causing an impact.

Yes, my friend, even the most unsafe practices *must* be accompanied with a domain in which they are either safe to use (i.e. cause no undesired impact) or are unsafe. You can't simply make a statement that says "XYZ is very unsafe".

You might simply have said something to the effect of "xargs(1) is not recommended in environments where there is the potential for executing malicious code that might cause a negative impact (such as at a client's site) since there are known cases XYZ where it has been shown to be possible". This is what is commonly referred to as a "well laid out argument": it has a statement of concern and its outcome, a domain of applicability, and supporting evidence.

Thank you for your time and concern but please peddle your goods elsewhere.

 

Comment by n/a at December 12, 2006 12:44 AM

I think the commenter was trying to point out that calling exec(3) from find(1) is a bit safer because you're still stuck in the find(1) memory space, forked and not pipelined. It gets tricky though: what operating platform was the commenter discussing? Eh, they prolly won't check here again, but pipelining on some platforms might be better off if you have stringent constraints on what any program making system calls to the kernel can do (see OpenBSD's systrace(4)).

Some machines hosted by monkey.org were claimed to have been hacked through some code execution in the actual TTY (from stdout) years ago. For that reason, systrace(4) was developed by dugsong. How this relates to your post, I don't know...this is just rambling now.

p.s. s/incompitence/incompetence/g :)

 

Comment by rgrzywinski at December 12, 2006 06:24 AM

Spelling updated. Thanks! No matter how hard I try to remember that one, having the two 'e's next to each other looks odd and I change the first to an 'i'.

 

Comment by n/a at December 15, 2006 08:22 AM

The one good thing about firefox 2.0 is spellcheck...

...when the browser isn't using up 100TB of memory or trying to restore non-existent sessions or not crashing or not rendering xml too strictly or....

 

Comment by rgrzywinski at December 16, 2006 08:11 AM

I can agree with that, but unfortunately I've rolled back to 1.5 (see the relevant posting).

 

Creative Commons License Unless otherwise expressly stated, all original material of whatever nature created by Rob Grzywinski and included in this weblog and any related pages, including the weblog's archives, is licensed under a Creative Commons License.