RyanJ Posted April 25, 2010
Hey there guys! I have a question for anyone. Maybe some code already exists for this, or a good algorithm is already written down somewhere. Basically, what I'm trying to do is make an unknown file type identifier. I'm trying to find out whether there are any .NET implementations of a file-type matching algorithm out there, or a good description of an implementation in another language, but I can't seem to find one. If there are none, can anyone point me in the general direction of writing one? I'm looking for help with the project, so if anyone else is interested in this, let me know. Cheers!
khaled Posted April 27, 2010
I don't know, but file types have an extension .xxx, and their content starts with a unique code [ CODE ... ]
Icefire Posted April 27, 2010
Probably one that takes the characters after the last period in the name (there can be more than one period, but only the last one is important, except for exceptions such as .tar.gz), then goes to a site and inputs it as a search, then reformats the page by removing excess info (not the best idea if the website constantly redesigns itself...)
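The extension rule described in this post (take what follows the last period, but treat cases such as .tar.gz as a unit) can be sketched as below. This is only an illustration in Python, not code from the thread: the function name `file_extension` and the `COMPOUND_EXTENSIONS` list are hypothetical, and the list is deliberately incomplete.

```python
from pathlib import PurePath

# Hypothetical, incomplete list of multi-part extensions to keep whole.
COMPOUND_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz")

def file_extension(name: str) -> str:
    """Return the lowercased extension of a file name, or "" if it has none."""
    # Drop any directory part and normalize case before matching.
    lowered = PurePath(name).name.lower()
    # Check the known exceptions first, so ".tar.gz" is not reduced to ".gz".
    for compound in COMPOUND_EXTENSIONS:
        if lowered.endswith(compound):
            return compound
    # Otherwise only the part after the last period counts.
    return PurePath(lowered).suffix
```

For example, `file_extension("backup.tar.gz")` gives `".tar.gz"`, while `file_extension("report.final.PDF")` gives `".pdf"`. The result says nothing about what the file actually contains.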
D H Posted April 27, 2010
I think Ryan is talking about identifying the file type by looking at the file's contents rather than at the file's name. The Unix file tool does a semi-decent job of doing just that.
RyanJ (author) Posted April 27, 2010
D H said: "I think Ryan is talking about identifying the file type by looking at the file's contents rather than at the file's name. The Unix file tool does a semi-decent job of doing just that."
That's the idea. Though that tool uses a trick called magic numbers, which misses most file types these days.
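For readers unfamiliar with the magic-number trick mentioned here, a minimal sketch follows, in Python rather than .NET. It compares the first bytes of a file against a small table of well-known signatures. The table and the names `MAGIC_SIGNATURES` and `identify_by_magic` are assumptions made for illustration; this is not how the Unix file tool is actually implemented, and a real signature database is far larger.

```python
# Illustrative signature table: (leading bytes, label). Deliberately tiny.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x1f\x8b", "gzip compressed data"),
]

def identify_by_magic(data: bytes) -> str:
    """Return a label for the first signature that matches the start of data."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    # No signature matched: the weakness the post points out.
    return "unknown"

def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_by_magic(handle.read(16))
```

Anything whose leading bytes are not in the table comes back as "unknown", which is why a signature-only approach needs a fallback for formats without a fixed header.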
StringJunky Posted April 27, 2010
This program does the kind of job you are looking to do, I think: http://www.freedownloadsplace.com/Products/38290/TrID-File-Identifier
RyanJ (author) Posted April 27, 2010
I know. It uses a pattern-matching approach. However, it also fails with files that have no structure, such as ISO files.
jryan Posted April 29, 2010
You mean something like this? http://filext.com/
I think there may be some trouble writing a program that can determine the exact application a file is connected to, since file extension nomenclature is not strictly policed. As a result, you will have file types with the same names that are actually completely different formats. To really determine the root application, you will need to get in, read the files themselves, and match them against a structure database, in the same way a virus scanner sniffs out infected files.
RyanJ (author) Posted April 29, 2010
I know that. Actually, that's what I'm working on at the moment. I'm thinking of using pattern matching and possibly techniques such as entropy and compressibility in cases where patterns are not clear-cut. The program is being written in C# and WPF, so if anyone wants to take a look and work on this with me, feel free to let me know.
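The entropy and compressibility measures mentioned in this post can be sketched roughly as follows. This is an illustration in Python, not the C# program from the thread: the function names are hypothetical, and zlib is only one possible stand-in for "compressibility".

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at maximum level."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)
```

Entropy close to 8 bits per byte together with a ratio near or above 1.0 suggests data that is already compressed or encrypted, while low values suggest text or sparse binary data. Any thresholds used to separate those cases would have to be tuned on real samples.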