"Word Lists" for Software Security Test Cases Word lists, Dictionary Files, Attack Strings, Miscellaneous Datasets and Proof-of-Concept Test Cases With a Collection of Tools for Penetration Testers Brief Introduction to werdlists Inspiration Taken from Similar Projects Unique Features Only Available With werdlists Repository Directory Hierarchy and Structure Naming Scheme, Syntax and Meaning Folder Names and Description of Contents Brief Introduction to werdlists ?? This project is a collection of word lists--they are mostly whitespace-delimited or line-based. Although the passes-dicts folder contains inputs for password cracking , overall the files amassed here are intended to be useful in facilitating the creation of insecure program state (with the help of a black-box fuzzer or scanning tool.) The vast majority of files are simply ASCII with the UNIX style newline . Beware that this project does not attempt in any way to be minimalist or lack verbosity! Inspiration Taken From Similar Projects ?? werdlists is very similar to fuzzdb and SecLists . SecLists is maintained by my former colleague at IOActive , Daniel Miessler . Admittedly, werdlists is quite similar in mission as it's a centralized attack strings and input data resource. Regardless, werdlists expands on a number of concepts: it has its own unique style, organization, original hand-crafted contents, dataset creation/management/validation scripts, scanner springboards, etc. Unique Features Only Available With werdlists ?? werdlists cross-references between the code repositories of third-party scanners and its own datasets that each tool will benefit from. Moreover, there are specialized parsing scripts exclusive to werdlists that extract results produced through pairing test tools with its own data. Output strings are gathered from those results and fed back into the test tools. In other words, there are a number of interactive and/or tunable feedback loops implemented. 
Quite a few of the werdlists data files were created this way.

## Repository Directory Hierarchy and Structure

The `scripts` folder consists of shell scripts used for repository maintenance. Its `init` sub-directory stores the scripts that initialize data files. If the filename of a script stored in `init` contains two dashes, then its output should reflect the contents of the associated data file. For example, compare `manpages-environ` and `clib-package-names`. All scripts were written using bash syntax.

The `contrib` folder stores scripts contributed via pull request, and the `utils` folder contains utilities that aren't necessarily specific to the werdlists project, such as scripts for managing any wordlist file.

Other data files were composed by hand, and a small handful were created by recycling output strings back into input parameter lists, e.g. `dirbdirs-feedback`.

The `tools` folder lists security tools that can take the datasets in this repository as input. Individual folders are detailed in the Folder Names and Description of Contents section below. All files in each dataset directory are detailed in the local `README.md` file for that folder (as opposed to the global `README.md` in the root directory being read now).

## Naming Scheme, Syntax and Meaning

- Most files have the `*.txt` extension, signifying the `text/plain` MIME type.
- Often-used formats besides plain text include Comma-Separated Values (`text/csv`), Extensible Markup Language (`application/xml`), HyperText Markup Language (`text/html`), etc.
- Any file that is larger than 1MB uncompressed will be compressed with `xz` according to the commands in the `scripts/xzlarge-files` bash script.
- Other file extensions in use are: `*.ans`, `*.asc`, `*.bin`, `*.c`, `*.conf`, `*.cpp`, `*.csv`, `*.html`, `*.inf`, `*.ini`, `*.json`, `*.md`, `*.rpz`, `*.rst`, `*.sh`, `*.txt`, `*.xml`, `*.yaml`, `*.yml`, `*.zip`, and `*.zone`.

## Folder Names and Description of Contents
| Folder Name | Description of Contents |
| --- | --- |
| `apple-paths` | Pathnames found on macOS file systems |
| `apple-data` | Data identifiers and such from Apple's macOS operating system |
| `arpa-headers` | Header fields transmitted over RFC 2822 style protocols like SMTP |
| `ascii-art` | "Low bit" a.k.a. 7-bit ASCII art items without control characters |
| `biology-info` | Reference information useful in the study of biological issues |
| `browser-data` | Data related to GUI browser software like Chrome, Firefox, etc. |
| `cert-data` | Information commonly used in cryptographic certificate materials |
| `char-encodes` | Various character encodings provided by different locales/charsets |
| `char-sequence` | Various character sequences modeled after `ctype.h` |
| `chat-data` | Additional data on IRC, XMPP and other such messaging protocols |
| `cipher-data` | Data denoting or used by cryptographic algorithm implementations |
| `cmd-usage` | Help text shown in a terminal when attempting to execute CLI programs |
| `code-keywords` | Computer language identifiers, reserved words and similar syntax |
| `cpu-arch` | Low-level computer architecture and hardware subjects |
| `crypt-output` | Cipher text string outputs created by cryptographic hash functions |
| `database-strs` | Strings often encountered when working with database software |
| `dns-domains` | A list of domains that may have been found in the live DNS tree at one point |
| `dns-hostnames` | The host name part of an FQDN |
| `dns-records` | Data specific to RRs in the DNS system |
| `dns-servers` | Data provided to, produced by or related to DNS name servers |
| `dns-toplevel` | TLDs, or Top Level Domains, in the uppermost part of the DNS hierarchy |
| `environ-vars` | Environment variable names, settings, etc. |
| `exploit-info` | Technical information on exploitation of security vulnerabilities |
| `file-extens` | Information on filename extensions, i.e. the part after the dot |
| `file-specs` | File format specifications as distributed by vendor(s)/author(s) |
| `ftp-data` | Various FTP data from RFCs and elsewhere |
| `glibc-data` | Data taken from the source code of the GNU C Library |
| `html-words` | Words commonly encountered when parsing HTML dialects |
| `http-agents` | Software version banners for HTTP User Agents, also known as browsers |
| `http-headers` | Header fields sent in requests/responses by browser/server software |
| `http-methods` | Names of request methods browsers send in the first line of an HTTP request |
| `http-params` | Parameters browsers sometimes send when requesting server URI paths |
| `http-security` | HTTP security info such as Content Security Policy |
| `http-servers` | Information related to the usage of web server software |
| `http-status` | Numeric HTTP status codes in server replies, as specified by RFC 7231 |
| `inet-addrs` | Numeric Internet addresses a.k.a. IP addresses, mostly version 4 |
| `inet-routes` | Data useful in the maintenance and use of an Internet routing table |
| `inet-services` | Lists of Internet protocols/daemons, similar to `/etc/services` |
| `infosec-people` | Noteworthy individuals known from information security communities |
| `iso-codes` | Codes, numbers and such as standardized by ISO |
| `java-data` | Data found in or related to source code of programs written in Java |
| `linux-data` | Data identifiers and such from the Linux operating system |
| `linux-paths` | Pathnames found on file systems created by Linux installations |
| `malware-iocs` | IOCs (indicators of compromise) for identifying malware infections |
| `mobile-devs` | Mobile device development for "handheld" form factors |
| `net-attacks` | Info about attacks on telecommunications and internetworks |
| `net-ifaces` | Detailed information which can be extracted from network interfaces |
| `ntfs-paths` | File paths expected to be seen in NTFS folders |
| `owasp-data` | Data from or for OWASP |
| `passes-dicts` | Dictionary files for brute-force attacks against account passwords |
| `passes-sites` | Hashed or unencrypted passwords that were publicized after the breach of a well-known site |
| `perl-data` | Data often seen in Perl (Practical Extraction and Report Language) |
| `php-data` | Files containing information about the PHP programming language |
| `postal-data` | United States Postal Service information |
| `python-data` | Data used by the Python scripting language interpreter at runtime |
| `radio-data` | Things commonly used in radio frequency transmissions |
| `regex-data` | Regular expression patterns used to launch/detect attacks |
| `ruby-data` | Data typically seen within the syntax of the Ruby scripting language |
| `search-dorks` | General purpose search-engine queries likely to find insecure sites |
| `smtp-messages` | Messages (i.e. signatures, auto-replies, etc.) sent by SMTP servers |
| `soap-messages` | SOAP (Simple Object Access Protocol) messages |
| `social-data` | Sociological or social media related data sets including logins and user names |
| `software-strs` | Strings describing software engineering, programming languages, etc. |
| `string-enums` | Enumerations of values that aren't too terribly unusual |
| `system-admin` | System administration and BOFH related materials |
| `system-notices` | Disclaimer/warning messages shown by networked computer systems |
| `telco-data` | Voice telecommunications technologies: POTS, PCS, VoIP, SMS, etc. |
| `text-files` | Zine articles and such, like those archived at Jason Scott's textfiles.com |
| `text-words` | Lists of words likely to be found in an actual hard copy dictionary |
| `top-secret` | Files and/or data related to documents that were/are classified |
| `unicode-data` | Unicode character usage and representation |
| `unix-data` | Data associated with various flavors of the UNIX OS and its clones |
| `unix-paths` | File path names found in various UNIX file systems |
| `uri-attacks` | Malicious URI materials specially crafted for attack targets |
| `uri-schemes` | Lists containing references for URI schemes (the part before the colon) |
| `uri-data` | Uniform Resource Identifier related data |
| `vuln-data` | Information about security vulnerabilities found in server software |
| `webapp-attacks` | Proof-of-concept samples demonstrating attacks against web applications |
| `webapp-data` | Data associated with applications hosted on web servers |
| `webapp-dirs` | Directories related to applications running on a web server |
| `webapp-files` | Files related to applications running on a web server |
| `webapp-paths` | Path names related to applications running on a web server |
| `webapp-words` | Words related to applications running on a web server |
| `web-sites` | Addresses to and/or information on significant WWW sites |
| `wifi-networks` | IEEE 802.11 Wi-Fi network information |
| `windows-data` | Data only found within the Microsoft Windows series of OSes |
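
As noted in the naming scheme, most lists are line-based text and anything over 1MB is stored xz-compressed, so a consumer has to handle both forms. The following is a minimal sketch in Python of one way to do that; it is not part of the repository. The function name `iter_entries` is hypothetical, and the sketch assumes that compressed lists carry a trailing `.xz` suffix and that blank lines are not meaningful entries.

```python
import lzma
from pathlib import Path
from typing import Iterator


def iter_entries(path: str) -> Iterator[str]:
    """Yield non-empty lines from a plain or xz-compressed word list.

    Hypothetical helper: the name and behavior are illustrative only.
    """
    p = Path(path)
    # Assumption: compressed lists are recognizable by a trailing ".xz" suffix.
    opener = lzma.open if p.suffix == ".xz" else open
    # errors="replace" keeps iteration going if a list holds stray non-UTF-8 bytes;
    # newline="\n" treats only the UNIX newline as a line terminator.
    with opener(p, mode="rt", encoding="utf-8", errors="replace", newline="\n") as handle:
        for line in handle:
            entry = line.rstrip("\n")
            if entry:  # assumption: empty lines are not test inputs
                yield entry
```

A caller would loop over `iter_entries(...)` with the path of any list and hand each entry to a fuzzer or scanner. Because the function is a generator, large compressed lists are streamed rather than loaded into memory at once.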