Monday, March 20, 2017

User Agent Strings : Rules of Thumb


Gathering Server Usage Statistics via User Agent Strings

Virtually all incoming traffic to a web server carries a "user agent": a string value that clients send with their web requests in the User-Agent HTTP header. The user agent string usually contains information about the device hardware, the device's OS, the browser type, and the browser's version. Thus, if you have access to a web server's traffic logs, you have a great way to gather statistics about your clients and the devices they use.

Unfortunately, the contents of user agent strings are not standardized. There is some consistency within a given device manufacturer or browser type, but one or more of these pieces of information are often missing or ambiguous.
Interpreting user agent strings therefore takes a little investigating to reverse-lookup these pieces of information.

Recently I went through this exercise, and I developed a few rules of thumb.

Analysis Caveats - Precisely Measuring the Imprecise


  • If 80% accuracy is all you're looking for, then these rules of thumb will serve you well. 
    • These rules encompass major PCs, smart phones, and tablets. These rules do not cover miscellaneous devices such as iPods, smart TVs, and smart watches.
    • Understand the business decision you're trying to make. Is 80% acceptable?
    • These rules represent the device market in 2017. New devices are coming out constantly with potentially new combinations of user agent strings. These rules may become dated in a couple years. Be prepared to adjust the rules of thumb accordingly.
    • Some of these rules are self-explanatory, some are quirky.
  • If you're looking for 100% accuracy, then beware:
    • It is not easy to automate. There are some online tools that attempt to do this; I found them all to be inadequate, and I kept falling back to manual analysis.
    • Don't waste your time analyzing every unique string. Set yourself a cutoff value; aim for the big fish. For example: during my research I deemed any string with fewer than 100 hits per week not worth my time.
    • Any client (e.g. a bot) can spoof its user agent string. There is a non-trivial margin of error baked into this kind of statistics gathering.
    • Duplicate strings. Manufacturers may, out of laziness or on purpose, use the same non-unique string for multiple devices. The resolution of your results has some coarseness baked into it. Don't try to precisely measure the imprecise.
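
The cutoff idea above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes you have already pulled the user agent strings for one week out of your logs into a list, and the function name and the `min_hits` parameter are my own inventions for illustration.

```python
from collections import Counter

def frequent_user_agents(user_agents, min_hits=100):
    """Return (user_agent, hits) pairs with at least min_hits hits, most common first."""
    counts = Counter(user_agents)
    # most_common() sorts by hit count, so the big fish come first.
    return [(ua, hits) for ua, hits in counts.most_common() if hits >= min_hits]
```

Everything below the cutoff is simply dropped from the analysis, which is the point: the long tail of rare strings is where most of the manual effort would otherwise go.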

User Agent String : Rules of Thumb

Ignore Junk

  • The front of the user-agent string is almost always "Mozilla" (typically "Mozilla/5.0"). Completely ignore it. User-agent strings from practically all devices and all browsers contain "Mozilla". This is because of historical reasons.
  • The middle part of the user-agent string from WebKit-based browsers such as Chrome and Safari is always "AppleWebKit ... (KHTML, like Gecko)". Firefox carries a plain "Gecko" token instead, and IE 11 ends with "like Gecko". Completely ignore these tokens; they do not identify the browser. Again, historical reasons.


Interpret Browser
Generally, browser info is located near the end of the string.

  • Chrome
    • Chrome on Android devices will contain both strings "Chrome" and "Safari". Completely ignore the "Safari" part of the string.
    • Chrome on an iOS device will contain the string "CriOS".
    • Chrome's version is directly after the "Chrome" string (or after "CriOS" on iOS).
  • Safari
    • Safari on an iOS device will only make reference to "Safari" (i.e. it will make no reference to "Chrome")
    • Safari's version is directly after the "Version" string
  • Firefox
    • Contains the string "Firefox"
    • Firefox's version is directly after the "Firefox" string
  • IE
    • Will contain the string "MSIE" or "Trident"
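
Here is a rough Python sketch of the browser rules above. The function name and the sample string are made up for illustration (the sample is not from my logs), and the sketch only knows the four browsers listed here. Note that the order of the checks matters: Chrome strings also contain "Safari", so Chrome has to be tested first.

```python
import re

def detect_browser(ua):
    """Guess (browser, version) from a user agent string; version may be None."""
    # Chrome strings also contain "Safari", so the Chrome rules must come first.
    rules = [
        ("Chrome", r"CriOS/([\d.]+)"),            # Chrome on iOS
        ("Chrome", r"Chrome/([\d.]+)"),
        ("Firefox", r"Firefox/([\d.]+)"),
        ("IE", r"MSIE ([\d.]+)"),
        ("IE", r"Trident/.*rv:([\d.]+)"),         # IE 11 style strings
        ("Safari", r"Version/([\d.]+).*Safari"),  # Safari's version follows "Version"
    ]
    for name, pattern in rules:
        match = re.search(pattern, ua)
        if match:
            return name, match.group(1)
    # Fall back to the bare marker strings when no version could be extracted.
    if "Trident" in ua:
        return "IE", None
    if "Safari" in ua:
        return "Safari", None
    return "Unknown", None

# Illustrative sample string (hypothetical, not taken from real logs).
sample = ("Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) "
          "AppleWebKit/537.36 (KHTML, like Gecko) "
          "Chrome/56.0.2924.87 Mobile Safari/537.36")
print(detect_browser(sample))  # ('Chrome', '56.0.2924.87')
```

Browsers outside these rules are not handled. For example, Microsoft Edge strings also contain "Chrome", so this sketch would count them as Chrome.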


Interpreting Device OS
Generally, OS info is near the front of the string.
  • Android
    • Android will contain the string "Linux; Android", or sometimes just "Android"
    • The Android version directly follows the "Android" string (e.g. "Android 7.0"). Easy.
  • iOS
    • iOS will contain the string "iPhone" or "iPad"
    • The iOS version directly follows the "OS" string, written with underscores (e.g. "OS 10_2_1"). Easy.

  • Windows
    • A Windows device with "Windows NT 10.0" corresponds to a Windows 10 operating system
    • A Windows device with "Windows NT 6.2" corresponds to a Windows 8 operating system
    • A Windows device with "Windows NT 6.1" corresponds to a Windows 7 operating system
  • OSX
    • Will contain the string "Macintosh"
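
The OS rules above translate into something like the following Python sketch. The function name, the lookup table name, and the return format are my own choices for illustration. The table only holds the three Windows NT numbers listed above; any other value is passed through as "NT x.y" rather than guessed.

```python
import re

# Only the three Windows NT mappings listed above; others fall through as "NT x.y".
WINDOWS_NT_VERSIONS = {"10.0": "10", "6.2": "8", "6.1": "7"}

def detect_os(ua):
    """Guess (os_name, os_version) from a user agent string; version may be None."""
    if "Android" in ua:
        match = re.search(r"Android ([\d.]+)", ua)
        return "Android", (match.group(1) if match else None)
    if "iPhone" in ua or "iPad" in ua:
        # iOS reports its version with underscores, e.g. "OS 10_2_1".
        match = re.search(r"OS (\d+(?:_\d+)*)", ua)
        return "iOS", (match.group(1).replace("_", ".") if match else None)
    match = re.search(r"Windows NT ([\d.]+)", ua)
    if match:
        nt = match.group(1)
        return "Windows", WINDOWS_NT_VERSIONS.get(nt, "NT " + nt)
    if "Macintosh" in ua:
        return "OSX", None
    return "Unknown", None
```

The sketch does not try to extract an OSX version, since the rules above only identify OSX by the "Macintosh" string.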


Interpreting Device Hardware
Generally, device info is near the front of the string.

  • Android
    • Most Android devices include a unique 5-10 character hardware (model) string. Google the string to see which device name it corresponds to.
    • Android smart phones will include the string "Mobile Safari" (even if Chrome was used)
    • Android tablets will include the string "Safari" (i.e. the word "Mobile" will be omitted).
  • iOS
    • iPads will contain the string "iPad"
    • iPhones will contain the string "iPhone"
    • iOS devices are otherwise ambiguous and do not contain specific hardware strings. Usually you can only narrow it down to a range of devices (e.g. between iPhone 5 and iPhone 7)
      • The only hints we get are the iOS version and the 5-6 character string for the iOS build version. Do reverse lookups on iOS device/version charts. Sometimes a build version is only used by 1 or 2 devices.
  • OSX
    • Assumed to be a laptop or desktop
  • Windows
    • Assumed to be a laptop or desktop, unless:
    • A Windows phone will contain the string "Windows Phone"
    • A Microsoft Surface tablet will contain the string "Touch"
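
Finally, a hedged Python sketch of the hardware rules above. The function name, the device labels, and the two regular expressions are my own assumptions: I am assuming the Android model string sits directly before a "Build/" token and that the iOS build version follows "Mobile/", which is how the strings I looked at were laid out. The second value returned is only a hint for the manual reverse lookup, not a device name.

```python
import re

def detect_device(ua):
    """Guess (device_class, lookup_hint) from a user agent string; hint may be None."""
    lowered = ua.lower()
    # Check Windows phones first, before the Android and desktop Windows rules.
    if "windows phone" in lowered:
        return "Windows phone", None
    if "ipad" in lowered or "iphone" in lowered:
        # Assumption: the iOS build version is the token after "Mobile/".
        build = re.search(r"Mobile/(\w+)", ua)
        device = "iPad" if "ipad" in lowered else "iPhone"
        return device, (build.group(1) if build else None)
    if "android" in lowered:
        # Assumption: the model string is the token right before "Build/".
        model = re.search(r";\s*([^;)]+?)\s+Build/", ua)
        device = "Android phone" if "mobile safari" in lowered else "Android tablet"
        return device, (model.group(1) if model else None)
    if "windows" in lowered:
        return ("Windows tablet" if "touch" in lowered else "PC"), None
    if "macintosh" in lowered:
        return "PC", None
    return "Unknown", None
```

For Android the hint is the model string to Google; for iOS it is the build version to look up in the device/version charts.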

External References used during research
http://www.useragentstring.com/index.php
https://developer.chrome.com/multidevice/user-agent
https://msdn.microsoft.com/en-us/library/ms537503(v=vs.85).aspx
http://www.enterpriseios.com/wiki/Complete_List_of_iOS_User_Agent_Strings (outdated!)
https://en.wikipedia.org/wiki/IOS_version_history
https://ipsw.me/9.3.5
https://ipsw.me/10.0.2
https://ipsw.me/10.1
https://ipsw.me/10.1.1
https://ipsw.me/10.2
https://ipsw.me/10.2.1
